Gearpump — Intel’s “actor-driven streaming framework”, initial benchmarks shows that we can process 2 million messages/second (100 bytes per message) with latency around 30ms on a cluster of 4 nodes.
Foundations of Data Science (PDF) — These notes are a first draft of a book being written by Hopcroft and Kannan [of Microsoft Research] and in many places are incomplete. However, the notes are in good enough shape to prepare lectures for a modern theoretical course in computer science.
Fix Mac OS X — each time you start typing in Spotlight (to open an application or search for a file on your computer), your local search terms and location are sent to Apple and third parties (including Microsoft) under default settings on Yosemite (10.10). See also Net Monitor, an open source toolkit for finding phone-home behaviour.
A/B Testing at Netflix (ACM) — Using a combination of static analysis to build a dependency tree, which is then consumed at request time to resolve conditional dependencies, we’re able to build customized payloads for the millions of unique experiences across Netflix.com.
Leslie Lamport Interview Summary — One idea about formal specifications that Lamport tries to dispel is that they require mathematical capabilities that are not available to programmers: “The mathematics that you need in order to write specifications is a lot simpler than any programming language […] Anyone who can write C code, should have no trouble understanding simple math, because C code is a hell of a lot more complicated than” first-order logic, sets, and functions. When I was at uni, profs worked on distributed data, distributed computation, and formal correctness. We have the first two, but so much flawed software that I can only dream of the third arriving.
Fake Identity — generate fake identity data when testing systems.
Project Naptha — automatically applies state-of-the-art computer vision algorithms on every image you see while browsing the web. The result is a seamless and intuitive experience, where you can highlight as well as copy and paste and even edit and translate the text formerly trapped within an image. Chrome extension. (via Anil Dash)
Garbage Trucks and FedEx Vans (IEEE) — Foo alum, Ian Wright, found traction for his electric car biz by selling powertrains for garbage trucks and Fedex vans. Trucks have 20-30y lifetime, but powertrains are replaced several times; the trucks for fleets are custom; and “The average garbage truck in the U.S. spends $55,000 a year on fuel, and up to $30,000 a year on maintenance, mostly brake replacements.”
Microsoft’s Quantum Mechanics (MIT TR) — the race for the “topological qubit”, involving newly-discovered fundamental particles and large technology companies racing to be the first to make something that works.
Floodwatch — a Chrome extension that tracks the ads you see as you browse the internet. It offers tools to help you understand both the volume and the types of ads you’re being served during the course of normal browsing, with the goal of increasing awareness of how advertisers track your browsing behavior, build their version of your online identity, and target their ads to you as an individual.
slfsrv — create simple, cross-platform GUI applications, or wrap GUIs around command-line applications, using HTML/JS/CSS and your own browser.
Review Ninja — a lightweight code review tool that works with GitHub, providing a more structured way to use pull requests for code review. ReviewNinja dispenses with elaborate voting systems, and supports hassle-free committing and merging for acceptable changes.
Liquibase — source control for your database. Apache 2.0 licensed.
A Few Useful Things to Know About Machine Learning (PDF) — This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions. My fave: First-timers are often surprised by how little time in a machine learning project is spent actually doing machine learning. But it makes sense if you consider how time-consuming it is to gather data, integrate it, clean it and pre-process it, and how much trial and error can go into feature design.