Crowdsourcing Isn’t Broken — great rundown of ways to keep crowdsourcing on track. As with open sourcing something, just throwing open the doors and hoping for the best has a low probability of success.
etcd Hits 2.0 — first major stable release of an open source, distributed, consistent key-value store for shared configuration, service discovery, and scheduler coordination.
You Can’t Play 20 Questions With Nature and Win (PDF) — There is, I submit, a view of the scientific endeavor that is implicit (and sometimes explicit) in the picture I have presented above. Science advances by playing 20 questions with nature. The proper tactic is to frame a general question, hopefully binary, that can be attacked experimentally. Having settled that bits-worth, one can proceed to the next. The policy appears optimal – one never risks much, there is feedback from nature at every step, and progress is inevitable. Unfortunately, the questions never seem to be really answered, the strategy does not seem to work. An old paper, but still resonant today. (via Mind Hacks)
The Uncanny Valley of Speech Recognition (Zach Holman) — I’m reminded of driving up US-280 in 2003 or so with @raelity, a Kiwi and a South African trying every permutation of American accent from Kentucky to Yosemite Sam in order to get TellMe to stop giving us the weather for zipcode 10000. It didn’t recognise the swearing either. (Caution: features similarly strong language.)
TuPAQ: An Efficient Planner for Large-scale Predictive Analytic Queries (PDF) — an integrated PAQ [Predictive Analytic Queries] planning architecture that combines advanced model search techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching. The resulting system, TUPAQ, solves the PAQ planning problem with comparable accuracy to exhaustive strategies but an order of magnitude faster, and can scale to models trained on terabytes of data across hundreds of machines.
p2pvc — point-to-point video chat. In an 80×25 terminal window.
Real World Active Learning — the point at which algorithms fail is precisely where there’s an opportunity to insert human judgment to actively improve the algorithm’s performance. An O’Reilly report with CrowdFlower.
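One common way to act on that idea is confidence-based routing: accept the model's answer when it is confident, and queue everything else for a person to label. The sketch below is only an illustration of that general pattern, not the method described in the report; the function name, the `(item_id, label, confidence)` tuple shape, and the 0.7 threshold are all assumptions made up for the example.

```python
from typing import List, Tuple

Prediction = Tuple[str, str, float]  # (item_id, predicted_label, confidence)


def route_by_confidence(
    predictions: List[Prediction],
    threshold: float = 0.7,  # hypothetical cut-off; tune per task
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split model output into auto-accepted labels and items for human review."""
    accepted: List[Tuple[str, str]] = []
    needs_review: List[str] = []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            accepted.append((item_id, label))
        else:
            # The model is unsure here, so a human label is most valuable.
            needs_review.append(item_id)
    return accepted, needs_review


predictions = [
    ("doc-1", "spam", 0.98),
    ("doc-2", "ham", 0.55),
    ("doc-3", "spam", 0.71),
    ("doc-4", "ham", 0.40),
]
accepted, needs_review = route_by_confidence(predictions)
print(accepted)      # [('doc-1', 'spam'), ('doc-3', 'spam')]
print(needs_review)  # ['doc-2', 'doc-4']
```

The human-labelled items would then be fed back as training data, which is where the "active" part comes in.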
Hearing With Your Tongue (BoingBoing) — The tongue contains thousands of nerves, and the region of the brain that interprets touch sensations from the tongue is capable of decoding complicated information. “What we are trying to do is another form of sensory substitution,” Williams said.
Making Wrong Code Look Wrong (Joel Spolsky) — This makes mistakes even more visible. Your eyes will learn to “see” smelly code, and this will help you find obscure security bugs just through the normal process of writing code and reading code.
Simple Testing Can Prevent Most Critical Failures — We found the majority of catastrophic failures could easily have been prevented by performing simple testing on error handling code – the last line of defense – even without an understanding of the software design. We extracted three simple rules from the bugs that have led to some of the catastrophic failures, and developed a static [Java] checker, Aspirator, capable of locating these bugs. One of the rules flags a FIXME or TODO comment inside an exception handler.
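To get a feel for how cheap such a check can be, here is a rough sketch of the FIXME/TODO rule as a text scan over Java source. It is not Aspirator and does not reflect how that tool is implemented: it uses a regular expression and naive brace counting (so braces inside strings or comments will confuse it), and the empty-handler check is an extra added for illustration. All names are hypothetical.

```python
import re
from typing import List, Tuple

CATCH_RE = re.compile(r"catch\s*\([^)]*\)\s*\{")
MARKER_RE = re.compile(r"\b(TODO|FIXME)\b")


def find_suspect_handlers(java_source: str) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for catch blocks that look unfinished."""
    findings: List[Tuple[int, str]] = []
    for match in CATCH_RE.finditer(java_source):
        body_start = match.end()
        depth, pos = 1, body_start
        # Walk forward until the brace that closes this catch block.
        while pos < len(java_source) and depth > 0:
            if java_source[pos] == "{":
                depth += 1
            elif java_source[pos] == "}":
                depth -= 1
            pos += 1
        if depth != 0:
            continue  # unbalanced braces; skip rather than guess
        body = java_source[body_start:pos - 1]
        line_no = java_source.count("\n", 0, match.start()) + 1
        if not body.strip():
            findings.append((line_no, "empty handler"))
        elif MARKER_RE.search(body):
            findings.append((line_no, "TODO/FIXME in handler"))
    return findings


SAMPLE = """\
try {
    flush();
} catch (IOException e) {
    // TODO: handle this properly
}
try {
    close();
} catch (IOException e) {
}
try {
    sync();
} catch (IOException e) {
    log.error("sync failed", e);
    throw e;
}
"""

print(find_suspect_handlers(SAMPLE))
# [(3, 'TODO/FIXME in handler'), (8, 'empty handler')]
```

A real checker would work on the parsed syntax tree rather than raw text, but even this much would catch the most obvious placeholders.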
Quantum Machine Learning Algorithms: Read the Fine Print (Scott Aaronson) — In the years since HHL, quantum algorithms achieving “exponential speedups over classical algorithms” have been proposed for other major application areas […]. With each of them, one faces the problem of how to load a large amount of classical data into a quantum computer (or else compute the data “on-the-fly”), in a way that is efficient enough to preserve the quantum speedup.
Global Forecast System — National Weather Service open sources its weather forecasting software. Hope you have a supercomputer and all the data to make use of it …
High-reproducibility and high-accuracy method for automated topic classification — Latent Dirichlet allocation (LDA) is the state of the art in topic modeling. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results that are not accurate in inferring the most suitable model parameters. Adapting approaches from community detection in networks, we propose a new algorithm that displays high reproducibility and high accuracy and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure.
Comcast (GitHub) — Comcast is a tool designed to simulate common network problems like latency, bandwidth restrictions, and dropped/reordered/corrupted packets. On BSD-derived systems such as OS X, we use tools like ipfw and pfctl to inject failure. On Linux, we use iptables and tc. Comcast is merely a thin wrapper around these controls.
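To make "thin wrapper" concrete, the sketch below builds the kind of Linux `tc` command such a tool might issue, using the kernel's `netem` queueing discipline to add delay and packet loss. This is not taken from Comcast's source and says nothing about the flags Comcast itself accepts; the function name and parameters are hypothetical, and the code only constructs the argument list rather than running it (running it needs root and changes live network behaviour).

```python
from typing import List


def netem_command(device: str, latency_ms: int = 0, loss_pct: float = 0.0) -> List[str]:
    """Build a `tc qdisc add ... netem` argument list for the given device."""
    cmd = ["tc", "qdisc", "add", "dev", device, "root", "netem"]
    if latency_ms > 0:
        cmd += ["delay", f"{latency_ms}ms"]
    if loss_pct > 0:
        # :g drops a trailing ".0", so 5.0 becomes "5%"
        cmd += ["loss", f"{loss_pct:g}%"]
    return cmd


print(" ".join(netem_command("eth0", latency_ms=250, loss_pct=5.0)))
# tc qdisc add dev eth0 root netem delay 250ms loss 5%
```

Undoing the change is a matter of deleting the root qdisc on the same device (`tc qdisc del dev eth0 root`).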
The UX Reader — This ebook is a collection of the most popular articles from our [MailChimp] UX Newsletter, along with some exclusive content.
Bad Assumptions — Apple lost more money to currency fluctuations than Google makes in a quarter.
Note and Vote (Google Ventures) — nifty meeting hack to surface ideas and identify popular candidates to a decision maker.
Applying Psychology to Improve Online Behaviour — an online game runs massive experiments (with researchers validating the findings) to improve its players’ behaviour. Some of Riot’s experiments are causing the game to evolve. For example, one product is a restricted chat mode that limits the number of messages abusive players can type per match. It’s a temporary punishment that has led to a noticeable improvement in player behavior afterward: on average, individuals who went through a period of restricted chat saw 20 percent fewer abuse reports filed by other players. The restricted chat approach also proved 4 percent more effective at improving player behavior than the usual punishment method of temporarily banning toxic players. Even the smallest improvements in player behavior can make a huge difference in an online game that attracts 67 million players every month.
Decentralised Autonomous Corporations — Charlie Stross’s near-future fiction of Accelerando comes closer to reality: Malice – revenge for waking him up – sharpens Manfred’s voice. “The president of agalmic.holdings.root.184.97.AB5 is agalmic.holdings.root.184.97.201. The secretary is agalmic.holdings.root.184.D5, and the chair is agalmic.holdings.root.184.E8.FF. All the shares are owned by those companies in equal measure, and I can tell you that their regulations are written in Python. Have a nice day, now!” He thumps the bedside phone control and sits up, yawning, then pushes the do-not-disturb button before it can interrupt again. After a moment he stands up and stretches, then heads to the bathroom to brush his teeth, comb his hair, and figure out where the lawsuit originated and how a human being managed to get far enough through his web of robot companies to bug him.
Coding is Not the New Literacy (Chris Grainger) — We build mental models of everything – from how to tie our shoes to the way macro-economic systems work. With these, we make decisions, predictions, and understand our experiences. If we want computers to be able to compute for us, then we have to accurately extract these models from our heads and record them. Writing Python isn’t the fundamental skill we need to teach people. Modeling systems is. Amen!
Pattern — a web mining module for the Python programming language. It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization.