The Toxoplasma of Rage — It’s in activists’ interests to destroy their own causes by focusing on the most controversial cases and principles, the ones that muddy the waters and make people oppose them out of spite. And it’s in the media’s interest to help them and egg them on.
Samza: LinkedIn’s Stream-Processing Engine — Samza’s goal is to provide a lightweight framework for continuous data processing. Unlike batch processing systems such as Hadoop, which typically has high-latency responses (sometimes hours), Samza continuously computes results as data arrives, which makes sub-second response times possible.
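Here is a minimal sketch of the batch/stream contrast in plain Python. It is not Samza's API, which is Java, and the page-view events and function names are hypothetical. The batch version can only answer once all input has been read. The streaming version updates its state and emits a fresh result as each event arrives.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def batch_counts(events: Iterable[str]) -> Counter:
    """Batch style: no answer until the whole input has been read."""
    return Counter(events)


def streaming_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Streaming style: keep running state and emit an updated count per event."""
    state: Counter = Counter()
    for page in events:
        state[page] += 1
        yield page, state[page]


if __name__ == "__main__":
    page_views = ["home", "search", "home", "profile", "home"]
    for page, count in streaming_counts(page_views):
        print(f"{page}: {count}")  # a result is available after every event
    print(batch_counts(page_views))  # a result only once the input is exhausted
```

The per-event result is what makes low latency possible. The cost is that the processor must hold its state (here, the running counts) between events.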
Program Synthesis Explained — The promise of program synthesis is that programmers can stop telling computers how to do things, and focus instead on telling them what they want to do. Inductive program synthesis tackles this problem with fairly vague specifications and, although many of the algorithms seem intractable, in practice they work remarkably well.
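Below is a toy version of the idea. It assumes a tiny hypothetical expression language: the variable x, the constants 1 and 2, and binary + and *. The search is bottom-up enumeration, one simple inductive technique and not necessarily what the article's algorithms do. It builds ever larger expressions and returns the first one consistent with every input/output example. The examples are the "fairly vague" specification.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

# An expression is its source text plus a function that evaluates it for a given x.
Expr = Tuple[str, Callable[[int], int]]


def grow(exprs: List[Expr]) -> List[Expr]:
    """Combine every ordered pair of existing expressions with + and *."""
    new: List[Expr] = []
    for (sa, fa), (sb, fb) in product(exprs, repeat=2):
        # Default arguments bind fa/fb now, avoiding Python's late-binding closures.
        new.append((f"({sa} + {sb})", lambda x, fa=fa, fb=fb: fa(x) + fb(x)))
        new.append((f"({sa} * {sb})", lambda x, fa=fa, fb=fb: fa(x) * fb(x)))
    return new


def synthesize(examples: List[Tuple[int, int]], max_rounds: int = 2) -> Optional[str]:
    """Return the first enumerated expression matching all (input, output) pairs."""
    exprs: List[Expr] = [("x", lambda x: x), ("1", lambda x: 1), ("2", lambda x: 2)]
    for round_ in range(max_rounds + 1):
        for src, f in exprs:
            if all(f(i) == o for i, o in examples):
                return src
        if round_ < max_rounds:
            exprs = exprs + grow(exprs)
    return None


if __name__ == "__main__":
    print(synthesize([(1, 3), (2, 5), (3, 7)]))  # prints (x + (x + 1))
```

The candidate pool grows from 3 to 21 to 903 expressions in two rounds, and a third round would produce over 1.6 million. That blow-up hints at why such algorithms can look intractable. Practical synthesizers prune equivalent programs and search far more cleverly.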
Ev Williams on Metrics — a master-class in how to think about and measure what matters. If what you care about — or are trying to report on — is impact on the world, it all gets very slippery. You’re not measuring a rectangle, you’re measuring a multi-dimensional space. You have to accept that things are very imperfectly measured and just try to learn as much as you can from multiple metrics and anecdotes.
Nature, the IT Wizard (Nautilus) — a fun walk through the connections between information theory, computation, and biology.
Google’s Philosopher — interesting take on privacy. Now that the mining and manipulation of personal information has spread to almost all aspects of life, for instance, one of the most common such questions is, “Who owns your data?” According to Floridi, it’s a misguided query. Your personal information, he argues, should be considered as much a part of you as, say, your left arm. “Anything done to your information,” he has written, “is done to you, not to your belongings.” Identity theft and invasions of privacy thus become more akin to kidnapping than stealing or trespassing. Informational privacy is “a fundamental and inalienable right,” he argues, one that can’t be overridden by concerns about national security, say, or public safety. “Any society (even a utopian one) in which no informational privacy is possible,” he has written, “is one in which no personal identity can be maintained.”
S-1 for a Bitcoin Trust (SEC) — always interesting to read through the risks list to see what’s there and what’s not.
Computationally Modelling Human Emotion (ACM) — our work seeks to create true synergies between computational and psychological approaches to understanding emotion. We are not satisfied simply to show our models “fit” human data but rather seek to show they are generative in the sense of producing new insights or novel predictions that can inform understanding. From this perspective, computational models are simply theories, albeit more concrete ones that afford a level of hypothesis generation and experimentation difficult to achieve through traditional theories.
Opinion Formation Models on a Gradient (PLoSONE) — Many opinion formation models embedded in two-dimensional space have only one stable solution, namely complete consensus, in particular when they implement deterministic rules. In reality, however, deterministic social behavior and perfect agreement are rare – at least one small village of indomitable Gauls always holds out against the Romans. […] In this article we tackle the open question: can opinion dynamics, with or without a stochastic element, fundamentally alter percolation properties such as the clusters’ fractal dimensions or the cluster size distribution? We show that in many cases we retrieve the scaling laws of independent percolation. Moreover, we also give one example where a slight change of the dynamic rules leads to a radically different scaling behavior.
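For readers unfamiliar with this family of models, here is a generic sketch of one, not the model from the paper. Binary opinions sit on a small square grid with wrap-around edges. At each step a randomly chosen site adopts the majority opinion of its four neighbours, keeping its own opinion on a 2-2 tie. The `noise` parameter is a hypothetical stand-in for the "stochastic element": with that probability the site picks an opinion at random instead.

```python
import random
from typing import List

Grid = List[List[int]]  # 0/1 opinions on an n x n grid with wrap-around (torus) edges


def step(grid: Grid, noise: float, rng: random.Random) -> None:
    """Update one randomly chosen site in place."""
    n = len(grid)
    i, j = rng.randrange(n), rng.randrange(n)
    if rng.random() < noise:
        # Stochastic element: ignore the neighbours and pick an opinion at random.
        grid[i][j] = rng.randint(0, 1)
        return
    ones = (grid[(i - 1) % n][j] + grid[(i + 1) % n][j]
            + grid[i][(j - 1) % n] + grid[i][(j + 1) % n])
    if ones > 2:
        grid[i][j] = 1
    elif ones < 2:
        grid[i][j] = 0
    # ones == 2 is a tie: the site keeps its current opinion.


if __name__ == "__main__":
    rng = random.Random(42)
    n = 20
    grid = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
    for _ in range(50_000):
        step(grid, noise=0.01, rng=rng)
    print(f"share holding opinion 1: {sum(map(sum, grid)) / (n * n):.2f}")
```

With `noise` set to 0 the update rule itself is deterministic; only the choice of site is random.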
Gearpump — Intel’s “actor-driven streaming framework”. Initial benchmarks show that it can process 2 million messages/second (100 bytes per message) with latency of around 30ms on a cluster of 4 nodes.
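A quick back-of-the-envelope reading of those figures. It assumes that MB means 10^6 bytes and that the load is spread evenly across the four nodes; the benchmark states neither.

```latex
\begin{aligned}
2 \times 10^{6}\ \tfrac{\text{msg}}{\text{s}} \times 100\ \tfrac{\text{B}}{\text{msg}}
  &= 2 \times 10^{8}\ \tfrac{\text{B}}{\text{s}} = 200\ \text{MB/s} \\
\frac{2 \times 10^{6}\ \text{msg/s}}{4\ \text{nodes}}
  &= 5 \times 10^{5}\ \text{msg/s per node} \approx 50\ \text{MB/s per node}
\end{aligned}
```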
Foundations of Data Science (PDF) — These notes are a first draft of a book being written by Hopcroft and Kannan [of Microsoft Research] and in many places are incomplete. However, the notes are in good enough shape to prepare lectures for a modern theoretical course in computer science.
Fix Mac OS X — each time you start typing in Spotlight (to open an application or search for a file on your computer), your local search terms and location are sent to Apple and third parties (including Microsoft) under default settings on Yosemite (10.10). See also Net Monitor, an open source toolkit for finding phone-home behaviour.
A/B Testing at Netflix (ACM) — Using a combination of static analysis to build a dependency tree, which is then consumed at request time to resolve conditional dependencies, we’re able to build customized payloads for the millions of unique experiences across Netflix.com.
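Here is a minimal sketch of the general idea, not Netflix's implementation. It assumes a hypothetical module tree in which some edges apply only to users allocated to a particular A/B test cell. The tree is written out ahead of time, standing in for the static-analysis step. At request time a depth-first walk keeps only the dependencies whose condition matches the user's allocation. Module names, test names, and the tree format are invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical tree: module -> [(dependency, condition)].
# A condition of None means "always"; otherwise (test_name, cell) must match the user.
Dep = Tuple[str, Optional[Tuple[str, str]]]
TREE: Dict[str, List[Dep]] = {
    "app": [("player", None),
            ("row-a", ("homepage-test", "control")),
            ("row-b", ("homepage-test", "variant"))],
    "player": [("utils", None)],
    "row-a": [("utils", None)],
    "row-b": [("utils", None), ("carousel", None)],
    "utils": [],
    "carousel": [],
}


def resolve(entry: str, allocation: Dict[str, str]) -> List[str]:
    """Return the modules this user needs, dependencies before dependents."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(module: str) -> None:
        if module in seen:
            return
        seen.add(module)
        for dep, cond in TREE[module]:
            # Keep unconditional edges, and conditional ones whose cell matches.
            if cond is None or allocation.get(cond[0]) == cond[1]:
                visit(dep)
        order.append(module)  # post-order: all kept dependencies are already listed

    visit(entry)
    return order


if __name__ == "__main__":
    print(resolve("app", {"homepage-test": "variant"}))
    # prints ['utils', 'player', 'carousel', 'row-b', 'app']
```

The walk assumes the tree is acyclic. A real build would also need cycle detection and a way to turn the ordered list into an actual payload.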
Leslie Lamport Interview Summary — One idea about formal specifications that Lamport tries to dispel is that they require mathematical capabilities that are not available to programmers: “The mathematics that you need in order to write specifications is a lot simpler than any programming language […] Anyone who can write C code should have no trouble understanding simple math, because C code is a hell of a lot more complicated than” first-order logic, sets, and functions. When I was at uni, profs worked on distributed data, distributed computation, and formal correctness. We have the first two, but so much flawed software that I can only dream of the third arriving.
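As an illustration of the kind of math Lamport means, here is a specification of sorting of our own, written in ordinary notation rather than TLA+ syntax. It uses only sets, functions, and first-order logic, and it says what the output must be, not how to compute it. Let n = Len(in).

```latex
\begin{aligned}
\mathit{SortSpec}(in, out) \;\triangleq\;\;
  & \mathrm{Len}(out) = n \\
  & \land\; \exists\, p \in [1..n \to 1..n] :\;
      p \text{ is a bijection} \;\land\; \forall i \in 1..n : out[i] = in[p(i)] \\
  & \land\; \forall i \in 1..n-1 : out[i] \le out[i+1]
\end{aligned}
```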
Fake Identity — generate fake identity data when testing systems.
Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing (PDF) — paper by Googlers on the database holding Google’s ad data. Trillions of rows, petabytes of data, point queries with 99th percentile latency in the hundreds of milliseconds and overall query throughput of trillions of rows fetched per day, continuous updates on the order of millions of rows updated per second, strong consistency and repeatable query results even if a query involves multiple datacenters, and no single point of failure. (via Greg Linden)
Thumbstopping (Salon) — The prime goal of a Facebook ad campaign is to create an ad “so compelling that it would get people to stop scrolling through their news feeds,” reports the Times. This is known, in Facebook land, as a “thumbstopper.” And thus, the great promise of the digital revolution is realized: The best minds of our generation are obsessed with manipulating the movement of your thumb on a smartphone touch-screen.
Microsoft’s Development Practices (Ars Technica) — they get the devops religion but call it “combined engineering”. They get the idea of shared code bases, but call it “open source”. At least when they got the agile religion, they called it that. Check out the horror story of where they started: a two-year development process in which only about four months would be spent writing new code. Twice as long would be spent fixing that code. MSFT’s waterfall was the equivalent of American football, where there are 11 minutes of actual play in the average 3h 12m game.