"open source" entries

Four short links: 3 July 2015

Storage Interference, Open Source SSL, Pub-Sub Reverse-Proxy, and Web Components Checklist

  1. The Storage Tipping Point — the performance optimization technologies of the last decade – log structured file systems, coalesced writes, out-of-place updates and, soon, byte-addressable NVRAM – are conflicting with similar-but-different techniques used in SSDs and arrays. The software we use is written for dumb storage; we’re getting smart storage; but smart+smart = fragmentation, write amplification, and over-consumption. (A worked write-amplification example follows this list.)
  2. s2n — Amazon’s open source SSL implementation.
  3. pushpin — a reverse proxy server that makes it easy to implement WebSocket, HTTP streaming, and HTTP long-polling services. It communicates with backend web applications using regular, short-lived HTTP requests (GRIP protocol). This allows backend applications to be written in any language and use any webserver. (A minimal GRIP backend sketch follows this list.)
  4. The Gold Standard Checklist for Web Components — This is a working draft of a checklist to define a “gold standard” for web components that aspire to be as predictable, flexible, reliable, and useful as the standard HTML elements.
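
Write amplification in item 1 is just a ratio: bytes physically written to flash divided by bytes the host asked to write. Here is a toy worked example in Python; the numbers are invented purely to make the arithmetic concrete.

```python
# Write amplification = bytes physically written to flash / bytes the host wrote.
# Invented illustrative numbers: a 4 KB logical update that lands in a 256 KB
# erase block can force the SSD to rewrite the whole block.
host_write_bytes = 4 * 1024        # what the application asked to write
flash_write_bytes = 256 * 1024     # what the device actually rewrote

write_amplification = flash_write_bytes / host_write_bytes
print(write_amplification)         # 64.0 -- and stacked "smart" layers compound it
```
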
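For item 3, here is a minimal sketch of the GRIP pattern: the backend answers an ordinary short-lived HTTP request with headers telling Pushpin to hold the client connection open on a channel. The header names and publish endpoint follow Pushpin's documentation, but treat the port and channel name as illustrative assumptions.

```python
# Minimal GRIP backend sketch (Python stdlib only).
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # Instruct Pushpin to keep this client connected, subscribed to "updates".
    start_response('200 OK', [
        ('Content-Type', 'text/plain'),
        ('Grip-Hold', 'stream'),      # hold the connection open as an HTTP stream
        ('Grip-Channel', 'updates'),  # channel this connection will receive data on
    ])
    return [b'stream opened\n']

# Later, any process can push data to every held connection, e.g.:
#   curl -d '{"items":[{"channel":"updates","formats":{"http-stream":{"content":"hi\n"}}}]}' \
#        http://localhost:5561/publish/
make_server('', 8080, app).serve_forever()
```
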
Four short links: 30 June 2015

Ductile Systems, Accessibility Testing, Load Testing, and CRAP Data

  1. Brittle Systems — More than two decades ago at Sun, I was convinced that making systems ductile (the opposite of brittle) was the hardest and most important problem in system engineering.
  2. tota11y — an accessibility testing toolkit from Khan Academy.
  3. Locust — an open source load testing tool. (A minimal locustfile sketch follows this list.)
  4. Impala: a Modern, Open-source SQL Engine for Hadoop (PDF) — CRAP, aka Create, Read, and APpend, as coined by an ex-colleague at VMware, Charles Fan (note the absence of update and delete capabilities). (via A Paper a Day)
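
As a minimal sketch of what a Locust test looks like, assuming Locust's current API (the 2015 API used HttpLocust/TaskSet and differs); the host and paths are placeholders:

```python
# locustfile.py -- simulated users hit "/" and "/about" with think time between tasks.
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 5)   # each simulated user pauses 1-5 s between tasks

    @task(3)                    # weight 3: fetched three times as often as /about
    def index(self):
        self.client.get("/")

    @task
    def about(self):
        self.client.get("/about")

# Run with: locust -f locustfile.py --host http://localhost:8000
```
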
Four short links: 22 June 2015

Power Analysis, Data at Scale, Open Source Fail, and Closing the Virtuous Loop

  1. Power Analysis of a Typical Psychology Experiment (Tom Stafford) — What this means is that if you don’t have a large effect, studies with between-groups analysis and an n of less than 60 aren’t worth running. Even if you are studying a real phenomenon you aren’t using a statistical lens with enough sensitivity to be able to tell. You’ll get to the end and won’t know if the phenomenon you are looking for isn’t real or if you just got unlucky with who you tested. (The underlying power calculation is reproduced in the sketch after this list.)
  2. The Future of Data at Scale — Data curation, on the other hand, is “the 800-pound gorilla in the corner,” says Stonebraker. “You can solve your volume problem with money. You can solve your velocity problem with money. Curation is just plain hard.” The traditional solution of extract, transform, and load (ETL) works for 10, 20, or 30 data sources, he says, but it doesn’t work for 500. To curate data at scale, you need automation and a human domain expert.
  3. Why Are We Still Explaining? (Stephen Walli) — Within 24 hours we received our first righteous patch. A simple 15-line change that provided a 10% boost in Just-in-Time compiler performance. And we politely thanked the contributor and explained we weren’t accepting changes yet. Another 24 hours and we received the first solid bug fix. It was golden. It included additional tests for the test suite to prove it was fixed. And we politely thanked the contributor and explained we weren’t accepting changes yet. And that was the last thing that was ever contributed.
  4. Blood Donors in Sweden Get a Text Message When Their Blood Helps Someone (Independent) — great idea to close the feedback loop. If you want to get more virtuous behaviour, make it a relationship and not a transaction. And if a warm feeling is all you have to offer in return, then offer it!
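
A sketch of the kind of power calculation behind item 1 (not Stafford's exact numbers): for a two-sample t-test to detect a medium effect with conventional settings, statsmodels puts the required sample size at roughly 64 per group.

```python
# How many subjects per group does a between-groups t-test need?
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,   # a "medium" effect size (Cohen's d)
    power=0.8,         # conventional 80% chance of detecting a true effect
    alpha=0.05,        # conventional significance threshold
)
print(round(n_per_group))  # ~64 per group -- well past the n < 60 studies cited
```
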

.NET open source

Microsoft .NET Team Program Manager Beth Massi on the open source .NET Core.

You might have heard the news that .NET is open source. In this post I’m going to explain what exactly we open sourced, why we did it, and how you can get involved.

Defining .NET

If you’re not familiar with .NET, it’s a managed execution environment that provides a variety of services to its running applications, including automatic memory management, type safety, native interop, and multiple modern programming languages that make it easier to build all kinds of apps, for nearly any device, quickly. The first version of .NET was released in 2002 and quickly picked up steam in many businesses. Today there are over 1.8 billion active installs of the .NET Framework and 6 million .NET developers in the world.

The .NET Framework consists of these major components: the common language runtime (CLR), which is the execution engine that handles running applications; the .NET Framework Base Class Libraries (BCL), which provide a library of tested, reusable code that developers can call from their own applications; and the managed languages and compilers for C#, F#, and Visual Basic. Application models extend the common libraries of the .NET Framework to provide additional libraries that developers can use to build specific types of applications, such as web, desktop, and mobile device apps. For more information on all the components in .NET 2015, see Understanding .NET 2015.

There are multiple implementations of .NET, some from Microsoft and others from third-party companies or open source projects. In this post, I’ll focus on .NET Core from Microsoft.

Four short links: 12 June 2015

OLAP Datastores, Timely Dataflow, Paul Ford is God, and Static Analysis

  1. pinot — a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally.
  2. Naiad: A Timely Dataflow System — in Timely Dataflow, the first two features are needed to execute iterative and incremental computations with low latency. The third feature makes it possible to produce consistent results, at both outputs and intermediate stages of computations, in the presence of streaming or iteration.
  3. What is Code (Paul Ford) — What the coders aren’t seeing, you have come to believe, is that the staid enterprise world that they fear isn’t the consequence of dead-eyed apathy but rather détente. Words and feels.
  4. Facebook Infer Open Sourced — the static analyzer I linked to yesterday, released as open source today.
Four short links: 8 June 2015

Software Psychology, Virus ID, Mobile Ads, and Complex Coupling

  1. Psychology of Software Architecture — a wonderful piece of writing, but this stood out: It comes down to behavioral economics and game theory. The license we choose modifies the economics of those who use our work.
  2. Single Blood Test to ID Every Virus You’ve Ever Had — As Elledge notes, “in this paper alone we identified more antibody/peptide interactions to viral proteins than had been identified in the previous history of all viral exploration.”
  3. Internet Users Increasingly Blocking Ads, Including on Mobiles (The Economist) — mobile networks are working on ad blockers for their customers. If lots of mobile subscribers did switch it on, it would give European carriers what they have long sought: some way of charging giant American online firms for the strain those firms put on their mobile networks. Google and Facebook, say, might have to pay the likes of Deutsche Telekom and Telefónica to get on to their whitelists.
  4. Connascence (Wikipedia) — a taxonomy of (systems) coupling. Two components are connascent if a change in one would require the other to be modified in order to maintain the overall correctness of the system. (Via Ben Gracewood.) (A short before-and-after example follows this list.)
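
To make item 4 concrete, a small Python illustration (the function names are invented): positional arguments create connascence of position, and keyword-only arguments weaken it to connascence of name.

```python
# Connascence of position: every caller must agree with the callee on argument ORDER.
def create_user_positional(name, email, is_admin):
    return {"name": name, "email": email, "is_admin": is_admin}

create_user_positional("Ada", "ada@example.com", True)  # reordering params breaks all call sites

# Weakened to connascence of name: callers depend only on parameter NAMES,
# so reordering the definition no longer silently breaks anything.
def create_user_named(*, name, email, is_admin=False):
    return {"name": name, "email": email, "is_admin": is_admin}

create_user_named(name="Ada", email="ada@example.com", is_admin=True)
```
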
Four short links: 5 June 2015

IoT and New Hardware Movement, OpenCV 3, FBI vs Crypto, and Transactional Datastore

  1. New Hardware and the Internet of Things (Jon Bruner) — The Internet of Things and the new hardware movement are not the same thing. The new hardware movement is driven by new tools for: Prototyping (inexpensive 3D printers, CNC machine tools, cheap and powerful microcontrollers, high-level programming languages on embedded systems); Fundraising and business development (Highway1, Lab IX); Manufacturing (PCH, Seeed); Marketing (Etsy, Quirky). The IoT is driven by: Ubiquitous connectivity; Cheap hardware (i.e., the new hardware movement); Inexpensive data processing and machine learning.
  2. OpenCV 3.0 Released — I hadn’t realised how much hardware acceleration comes out of the box with OpenCV.
  3. FBI: Companies Should Help us Prevent Encryption (WaPo) — as Mike Loukides says, we are in a Post-Modern age where we don’t trust our computers and they don’t trust us. It’s jarring to hear the organisation that (over-zealously!) investigates computer crime arguing that citizens should not be able to secure their communications. It’s like police arguing against locks.
  4. cockroach — a scalable, geo-replicated, transactional datastore. The Wired piece about it drops the factoid that the creators of GIMP worked on Google’s massive BigTable-successor, Colossus. From Photoshop-alike to massive file systems. Love it. (A hedged transaction sketch follows this list.)
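
A hedged sketch of what a transaction against CockroachDB looks like. The project as it exists today speaks the PostgreSQL wire protocol, so an ordinary Postgres driver works; the port, database, and table here are assumptions for illustration.

```python
# Transfer money between two accounts atomically; psycopg2's connection
# context manager commits on success and rolls back on error.
import psycopg2

conn = psycopg2.connect(host="localhost", port=26257,  # CockroachDB's default SQL port
                        dbname="bank", user="root", sslmode="disable")
with conn:
    with conn.cursor() as cur:
        # Both updates commit together or not at all, even if the rows
        # live on different nodes of the cluster.
        cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
        cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
conn.close()
```
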
Four short links: 3 June 2015

Filter Design, Real-Time Analytics, Neural Turing Machines, and Evaluating Subjective Opinions

  1. How to Design Applied Filters — The most frequently observed issue during usability testing was filtering values changing placement when the user applied them – either to another position in the list of filtering values (typically the top) or to an “Applied filters” summary overview. During testing, the subjects were often confounded as they noticed that the filtering value they just clicked was suddenly “no longer there.”
  2. Twitter Heron — a real-time analytics platform that is fully API-compatible with Storm […] At Twitter, Heron is used as our primary streaming system, running hundreds of development and production topologies. Since Heron is efficient in terms of resource usage, after migrating all Twitter’s topologies to it we’ve seen an overall 3x reduction in hardware, causing a significant improvement in our infrastructure efficiency.
  3. ntm — an implementation of neural Turing machines. (via @fastml_extra)
  4. Bayesian Truth Serum — a scoring system for eliciting and evaluating subjective opinions from a group of respondents, in situations where the user of the method has no independent means of evaluating respondents’ honesty or their ability. It leverages respondents’ predictions about how other respondents will answer the same questions. Through these predictions, respondents reveal their meta-knowledge, which is knowledge of what other people know. (A sketch of the scoring rule follows this list.)
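
For item 4, a hedged numpy sketch of Prelec's scoring rule as it is commonly stated: answers that are "surprisingly common" relative to the crowd's predictions score highest. Treat the exact form and the toy data as illustrative.

```python
import numpy as np

def bts_scores(x, y, alpha=1.0, eps=1e-9):
    """x: (N, K) one-hot answers; y: (N, K) each respondent's predicted answer distribution."""
    x = np.asarray(x, float)
    y = np.asarray(y, float) + eps
    xbar = x.mean(axis=0) + eps              # actual frequency of each answer
    log_ybar = np.log(y).mean(axis=0)        # log geometric mean of predictions
    info = x @ (np.log(xbar) - log_ybar)     # "surprisingly common" information score
    pred = alpha * ((np.log(y) - np.log(xbar)) @ xbar)  # prediction-accuracy score
    return info + pred

# Three respondents, two possible answers:
x = [[1, 0], [1, 0], [0, 1]]                 # what each respondent answered
y = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]     # what each predicted others would answer
print(bts_scores(x, y))
```
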
Four short links: 1 June 2015

AI Drives, Decent Screencaps, HTTP/2 Antipatterns, and Time Series

  1. The Basic AI Drives (PDF) — Surely, no harm could come from building a chess-playing robot, could it? In this paper, we argue that such a robot will indeed be dangerous unless it is designed very carefully. Without special precautions, it will resist being turned off, will try to break into other machines and make copies of itself, and will try to acquire resources without regard for anyone else’s safety. These potentially harmful behaviors will occur not because they were programmed in at the start, but because of the intrinsic nature of goal-driven systems.
  2. PreTTY — how to take a good-looking screencap of your terminal app in action.
  3. Why Some of Yesterday’s HTTP Best Practices are HTTP/2 Antipatterns — also functions as an overview of HTTP/2 for those of us who didn’t keep up with the standardization efforts.
  4. Tisean — a software project for the analysis of time series with methods based on the theory of nonlinear deterministic dynamical systems. (via @aphyr)
Four short links: 29 May 2015

Data Integration, Carousel, Distributed Project Management, and Costs of Premature Build

  1. Using Logs to Build Solid Data Infrastructure (Martin Kleppmann) — For lack of a better term, I’m going to call this the problem of ‘data integration.’ With that, I really just mean ‘making sure that the data ends up in all the right places.’ Whenever a piece of data changes in one place, it needs to change correspondingly in all the other places where there is a copy or derivative of that data. (A toy log-replay sketch follows this list.)
  2. TremulaJS — sweet carousel, with PHYSICS.
  3. Project Advice (Daniel Bachhuber) — for leaders of distributed teams. Focus on clearing blockers above all else. Because you’re working on an open source project with contributors across many timezones, average time to feedback will optimistically be six hours. More likely, it will be 24-48 hours. This slow feedback cycle can kill progress on pull requests. As a project maintainer, prioritize giving feedback, clearing blockers, and making decisions.
  4. YAGNI (Martin Fowler) — cost of build, cost of delay, cost of carry, cost of repair. Every product manager has felt these costs.
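
A toy Python sketch of the log-centric idea in item 1 (everything here is invented for illustration): an append-only log is the source of truth, and every other copy of the data is derived by replaying it, so the copies cannot drift apart.

```python
log = []  # the authoritative, append-only change log of (key, value) events

def append(event):
    log.append(event)

def build_cache(log):
    """Derived view 1: latest value per key."""
    cache = {}
    for key, value in log:
        cache[key] = value
    return cache

def build_index(log):
    """Derived view 2: which keys ever held a given value."""
    index = {}
    for key, value in log:
        index.setdefault(value, set()).add(key)
    return index

append(("user:1", "alice"))
append(("user:2", "bob"))
append(("user:1", "alicia"))   # an update is just another log entry

# Both derived stores replay the same ordered log, so they always agree
# on what changed and in what order.
print(build_cache(log))   # {'user:1': 'alicia', 'user:2': 'bob'}
print(build_index(log))   # {'alice': {'user:1'}, 'bob': {'user:2'}, 'alicia': {'user:1'}}
```
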