Crowdsourcing Isn’t Broken — great rundown of ways to keep crowdsourcing on track. As with open sourcing something, just throwing open the doors and hoping for the best has a low probability of success.
etcd Hits 2.0 — first major stable release of an open source, distributed, consistent key-value store for shared configuration, service discovery, and scheduler coordination.
You Can’t Play 20 Questions With Nature and Win (PDF) — There is, I submit, a view of the scientific endeavor that is implicit (and sometimes explicit) in the picture I have presented above. Science advances by playing 20 questions with nature. The proper tactic is to frame a general question, hopefully binary, that can be attacked experimentally. Having settled that bits-worth, one can proceed to the next. The policy appears optimal – one never risks much, there is feedback from nature at every step, and progress is inevitable. Unfortunately, the questions never seem to be really answered, the strategy does not seem to work. An old paper, but still resonant today. (via Mind Hacks)
Program Synthesis Explained — The promise of program synthesis is that programmers can stop telling computers how to do things, and focus instead on telling them what they want to do. Inductive program synthesis tackles this problem with fairly vague specifications and, although many of the algorithms seem intractable, in practice they work remarkably well.
Ev Williams on Metrics — a master-class in how to think about and measure what matters. If what you care about — or are trying to report on — is impact on the world, it all gets very slippery. You’re not measuring a rectangle, you’re measuring a multi-dimensional space. You have to accept that things are very imperfectly measured and just try to learn as much as you can from multiple metrics and anecdotes.
Nature, the IT Wizard (Nautilus) — a fun walk through the connections between information theory, computation, and biology.
Roaring Bitmaps — compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise. In some instances, they can be hundreds of times faster and they often offer significantly better compression.
Two Eras of the Internet: From Pull to Push (Chris Dixon) — in which the consumer becomes the infinite sink for an unending and constant stream of updates, media, and social mobile local offers to swipe right on brands near you.
Brain Time (David Eagleman) — the visual system is a distributed system with some flexible built-in consistency. So if the visual brain wants to get events correct timewise, it may have only one choice: wait for the slowest information to arrive. To accomplish this, it must wait about a tenth of a second. In the early days of television broadcasting, engineers worried about the problem of keeping audio and video signals synchronized. Then they accidentally discovered that they had around a hundred milliseconds of slop: As long as the signals arrived within this window, viewers’ brains would automatically resynchronize the signals; outside that tenth-of-a-second window, it suddenly looked like a badly dubbed movie.
CS Bumper Stickers (PDF) — Allocate four digits for the year part of a date: a new millenium is coming. —David Martin. From 1985.
ASCIIcam — real-time ASCII output from your videocamera. This is doing terrible things to my internal chronometer. Is it 2014 or 1984? Yes!
Michael Ossman and the NSA Playset — the guy who read the leaked descriptions of the NSA’s toolchest, built them, and open sourced the designs. One device, dubbed TWILIGHTVEGETABLE, is a knock off of an NSA-built GSM cell phone that’s designed to sniff and monitor Internet traffic. The ANT catalog lists it for $15,000; the NSA Playset researchers built one using a USB flash drive, a cheap SDR, and an antenna, for about $50. The most expensive device, a drone that spies on WiFi traffic called PORCUPINEMASQUERADE, costs about $600 to assemble. At Defcon, a complete NSA Playset toolkit was auctioned by the EFF for $2,250.
Gates Foundation Announces World’s Strongest Policy on Open Access Research (Nature) — Once made open, papers must be published under a license that legally allows unrestricted re-use — including for commercial purposes. This might include ‘mining’ the text with computer software to draw conclusions and mix it with other work, distributing translations of the text, or selling republished versions. CC-BY! We believe that published research resulting from our funding should be promptly and broadly disseminated.
Xenotix — an advanced Cross Site Scripting (XSS) vulnerability detection and exploitation framework. It provides Zero False Positive scan results with its unique Triple Browser Engine (Trident, WebKit, and Gecko) embedded scanner. It is claimed to have the world’s 2nd largest XSS Payloads of about 4700+ distinctive XSS Payloads for effective XSS vulnerability detection and WAF Bypass. Xenotix Scripting Engine allows you to create custom test cases and addons over the Xenotix API. It is incorporated with a feature-rich Information Gathering module for target Reconnaissance. The Exploit Framework includes offensive XSS exploitation modules for Penetration Testing and Proof of Concept creation.
Firing Range — Google’s open source set of web security test cases for scanners.
Wearable Power Assist Device Goes on Sale in Japan (WSJ, Paywall) — The Muscle Suit, which weighs 5.5 kilograms (12 pounds), can be worn knapsack-style and uses a mouthpiece as its control. Unlike other similar suits that rely on motors, it uses specially designed rubber tubes and compressed air as the source of its power. The Muscle Suit can help users pick up everyday loads with about a third of the usual effort. […] will sell for about ¥600,000 ($5,190), and is also available for rent at about ¥30,000 to ¥50,000 per month. Prof. Kobayashi said he expected the venture would ship 5,000 of them in 2015. (via Robot Economics)
Building a Complete Tweet Index (Twitter) — engineering behind the massive searchable Tweet collection: indexes roughly half a trillion documents and serves queries with an average latency of under 100ms.
Data Capture for the Real World (Cameron Neylon) — there’s a huge opportunity for science IT: tracking data as scientists do their work, and then with massive audit trails and provenance info. Think Salesforce for experiments.
Colossus — I/O and Microservice library for Scala from Tumblr engineering.
TweetNLP — CMU open source natural language parsing tools for making sense of Tweets.
Interview with Google X Life Science’s Head (Medium) — I will have been here two years this March. In nineteen months we have been able to hire more than a hundred scientists to work on this. We’ve been able to build customized labs and get the equipment to make nanoparticles and decorate them and functionalize them. We’ve been able to strike up collaborations with MIT and Stanford and Duke. We’ve been able to initiate protocols and partnerships with companies like Novartis. We’ve been able to initiate trials like the baseline trial. This would be a good decade somewhere else. The power of focus and money.
Schooloscope Open Data Post-Mortem — The case of Schooloscope and the wider question of public access to school data challenges the belief that sunlight is the best disinfectant, that government transparency would always lead to better government, better results. It challenges the sentiments that see data as value-neutral and its representation as devoid of politics. In fact, access to school data exposes a sharp contrast between the private interest of the family (best education for my child) and the public interest of the government (best education for all citizens).
M-Lab Observatory — explorable data on the data experience (RTT, upload speed, etc) across different ISPs in different geographies over time.
Gearpump — Intel’s “actor-driven streaming framework”, initial benchmarks shows that we can process 2 million messages/second (100 bytes per message) with latency around 30ms on a cluster of 4 nodes.
Foundations of Data Science (PDF) — These notes are a first draft of a book being written by Hopcroft and Kannan [of Microsoft Research] and in many places are incomplete. However, the notes are in good enough shape to prepare lectures for a modern theoretical course in computer science.