Machine Learning Done Wrong — When dealing with small amounts of data, it’s reasonable to try as many algorithms as possible and to pick the best one since the cost of experimentation is low. But as we hit “big data”, it pays off to analyze the data upfront and then design the modeling pipeline (pre-processing, modeling, optimization algorithm, evaluation, productionization) accordingly.
Ten Simple Rules for Lifelong Learning According to Richard Hamming (PLoScompBio) — Exponential growth of the amount of knowledge is a central feature of the modern era. As Hamming points out, since the time of Isaac Newton (1642/3-1726/7), the total amount of knowledge (including but not limited to technical fields) has doubled about every 17 years. At the same time, the half-life of technical knowledge has been estimated to be about 15 years. If the total amount of knowledge available today is x, then in 15 years the total amount of knowledge can be expected to be nearly 2x, while the amount of knowledge that has become obsolete will be about 0.5x. This means that the total amount of knowledge thought to be valid has increased from x to nearly 1.5x. Taken together, this means that if your daughter or son was born when you were 34 years old, the amount of knowledge she or he will be faced with on entering university at age 17 will be more than twice the amount you faced when you started college.
Apache NiFi — incubated open source project for data flow.
Tug Hospital Robot (Wired) — It may have an adult voice, but Tug has a childlike air, even though in this hospital you’re supposed to treat it like a wheelchair-bound old lady. It’s just so innocent, so earnest, and at times, a bit helpless. If there’s enough stuff blocking its way in a corridor, for instance, it can’t reroute around the obstruction. This happened to the Tug we were trailing in pediatrics. “Oh, something’s in its way!” a woman in scrubs says with an expression like she herself had ruined the robot’s day. She tries moving the wheeled contraption but it won’t budge. “Uh, oh!” She shoves on it some more and finally gets it to move. “Go, Tug, go!” she exclaims as the robot, true to its programming, continues down the hall.
Improving the Robustness of Complex Networks with Preserving Community Structure (PLoSone) — To improve robustness while minimizing the above three costly changes, we first seek to verify that the community structure of networks actually do identify the robustness and vulnerability of networks to some extent. Then, we propose an effective 3-step strategy for robustness improvement, which retains the degree distribution of a network, as well as preserves its community structure.
Update on indie.vc — We’ve worked with the team at Cooley to create an investment instrument that has elements of both debt and equity. Debt in that we will not be purchasing equity initially, but, unlike debt, there is no maturity date, no collateralization of assets and no recourse if it’s never paid back. The equity element will only become a factor if the participating company chooses to raise a round of financing or sell out to an acquiring company. We don’t have a clever acronym or name for this instrument yet, but I’m sure we’ll come up with something great.
How Nathan Barley Came True (Guardian) — if you haven’t already seen Nathan Barley, you should. It’s by the guy who did Black Mirror, and it’s both awful and authentic and predictive and retro and … painfully accurate about the horrors of our Internet/New Media industry. (via BoingBoing)
Trust Engineers (Radio Lab) — Facebook has a created a laboratory of human behavior the likes of which we’ve never seen. We peek into the work of Arturo Bejar and a team of researchers who are tweaking our online experience, bit by bit, to try to make the world a better place. Radio show of goodness. (via Flowing Data)
DARPA’S Haptix Project — The goal of the HAPTIX program is to provide amputees with prosthetic limb systems that feel and function like natural limbs, and to develop next-generation sensorimotor interfaces to drive and receive rich sensory content from these limbs. Today it’s prosthetic limbs for amputees, but within five years it’ll be augmented ad-driven realities for virtual currency ambient social recommendations.
Real World Active Learning — the point at which algorithms fail is precisely where there’s an opportunity to insert human judgment to actively improve the algorithm’s performance. An O’Reilly report with CrowdFlower.
Hearing With Your Tongue (BoingBoing) — The tongue contains thousands of nerves, and the region of the brain that interprets touch sensations from the tongue is capable of decoding complicated information. “What we are trying to do is another form of sensory substitution,” Williams said.
Making Wrong Code Look Wrong (Joel Spolsky) — This makes mistakes even more visible. Your eyes will learn to “see” smelly code, and this will help you find obscure security bugs just through the normal process of writing code and reading code.
Simple Testing Can Prevent Most Critical Failures — We found the majority of catastrophic failures could easily have been prevented by performing simple testing on error handling code – the last line of defense – even without an understanding of the software design. We extracted three simple rules from the bugs that have lead to some of the catastrophic failures, and developed a static [Java] checker, Aspirator, capable of locating these bugs. One of the tests is a FIXME or TODO in an exception handler.
Quantum Machine Learning Algorithms: Read the Fine Print (Scott Aaronson) — In the years since HHL, quantum algorithms achieving “exponential speedups over classical algorithms” have been proposed for other major application areas […]. With each of them, one faces the problem of how to load a large amount of classical data into a quantum computer (or else compute the data “on-the-fly”), in a way that is efficient enough to preserve the quantum speedup.
Global Forecast System — National Weather Service open sources its weather forecasting software. Hope you have a supercomputer and all the data to make use of it …
High-reproducibility and high-accuracy method for automated topic classification — Latent Dirichlet allocation (LDA) is the state of the art in topic modeling. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results that are not accurate in inferring the most suitable model parameters. Adapting approaches from community detection in networks, we propose a new algorithm that displays high reproducibility and high accuracy and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure.
MIT Faculty Search — two open gigs at MIT, one around climate change and one “undefined.” Great job ad.
Scalability at What Cost? — evaluation of these systems, especially in the academic context, is lacking. Folks have gotten all wound-up about scalability, despite the fact that scalability is just a means to an end (performance, capacity). When we actually look at performance, the benefits the scalable systems bring start to look much more sketchy. We’d like that to change.
Building the Workplace We Want (Slack) — culture is the manifestation of what your company values. What you reward, who you hire, how work is done, how decisions are made — all of these things are representations of the things you value and the culture you’ve wittingly or unwittingly created. Nice (in the sense of small, elegant) explanation of what they value at Slack.
The Internet of Things Has Four Big Data Problems (Alistair Croll) — What the IoT needs is data. Big data and the IoT are two sides of the same coin. The IoT collects data from myriad sensors; that data is classified, organized, and used to make automated decisions; and the IoT, in turn, acts on it. It’s precisely this ever-accelerating feedback loop that makes the coin as a whole so compelling. Nowhere are the IoT’s data problems more obvious than with that darling of the connected tomorrow known as the wearable. Yet, few people seem to want to discuss these problems.
Keysweeper — a stealthy Arduino-based device, camouflaged as a functioning USB wall charger, that wirelessly and passively sniffs, decrypts, logs, and reports back (over GSM) all keystrokes from any Microsoft wireless keyboard in the vicinity. Designs and demo videos included.
The Toxoplasma of Rage — It’s in activists’ interests to destroy their own causes by focusing on the most controversial cases and principles, the ones that muddy the waters and make people oppose them out of spite. And it’s in the media’s interest to help them and egg them on.
Samza: LinkedIn’s Stream-Processing Engine — Samza’s goal is to provide a lightweight framework for continuous data processing. Unlike batch processing systems such as Hadoop, which typically has high-latency responses (sometimes hours), Samza continuously computes results as data arrives, which makes sub-second response times possible.
Program Synthesis Explained — The promise of program synthesis is that programmers can stop telling computers how to do things, and focus instead on telling them what they want to do. Inductive program synthesis tackles this problem with fairly vague specifications and, although many of the algorithms seem intractable, in practice they work remarkably well.
Ev Williams on Metrics — a master-class in how to think about and measure what matters. If what you care about — or are trying to report on — is impact on the world, it all gets very slippery. You’re not measuring a rectangle, you’re measuring a multi-dimensional space. You have to accept that things are very imperfectly measured and just try to learn as much as you can from multiple metrics and anecdotes.
Nature, the IT Wizard (Nautilus) — a fun walk through the connections between information theory, computation, and biology.