SAMOA — Yahoo!’s distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. (via Introducing SAMOA)
madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
Data Portraits: Connecting People of Opposing Views — Yahoo! Labs research to break the filter bubble. Connect people who disagree on issue X (e.g., abortion) but who agree on issue Y (e.g., Latin American interventionism), and present the differences and similarities visually (they used wordclouds). Our results suggest that organic visualisation may revert the negative effects of providing potentially sensitive content. (via MIT Technology Review)
Algorithms and Accountability — Thus, the appearance of an autocompletion suggestion during the search process might make people decide to search for this suggestion although they didn’t have the intention to. A recent paper by Baker and Potts (2013) consequently questions “the extent to which such algorithms inadvertently help to perpetuate negative stereotypes”. (via New Aesthetic Tumblr)
Triage — iPhone app to quickly triage your email in your downtime. See also the backstory. Awesome UI.
Webcam Pulse Detector — I was wondering how long it would take someone to do the Eulerian video magnification in real code. Now I’m wondering how long it will take the patent-inspired takedown…
How Microsoft Quietly Built the City of the Future — The team now collects 500 million data transactions every 24 hours, and the smart buildings software presents engineers with prioritized lists of misbehaving equipment. Algorithms can balance out the cost of a fix in terms of money and energy being wasted with other factors such as how much impact fixing it will have on employees who work in that building. Because of that kind of analysis, a lower-cost problem in a research lab with critical operations may rank higher priority-wise than a higher-cost fix that directly affects few. Almost half of the issues the system identifies can be corrected in under a minute, Smith says.
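The ranking trade-off described above can be sketched in a few lines. This is only an illustration: the `Fault` fields, the multiplicative scoring formula, and every number below are hypothetical and are not taken from Microsoft's software.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    name: str
    wasted_cost_per_day: float  # hypothetical: money and energy wasted, in dollars
    criticality: float          # hypothetical weight for how critical the space is
    people_affected: int        # hypothetical headcount affected by the fault


def priority(fault: Fault) -> float:
    # Assumed scoring rule: wasted cost scaled by criticality and by headcount.
    return fault.wasted_cost_per_day * fault.criticality * (1 + fault.people_affected)


def prioritize(faults):
    # Highest score first, mirroring a "prioritized list of misbehaving equipment".
    return sorted(faults, key=priority, reverse=True)


faults = [
    Fault("storage room air handler", wasted_cost_per_day=200.0, criticality=1.0, people_affected=2),
    Fault("research lab freezer", wasted_cost_per_day=50.0, criticality=5.0, people_affected=40),
]
ranked = prioritize(faults)
for fault in ranked:
    print(fault.name, priority(fault))
```

With these made-up weights, the cheaper lab fault outranks the costlier storage room fault, which is the behaviour the article describes.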
UDOO (Kickstarter) — mini PC that could run either Android or Linux, with an Arduino-compatible board embedded. Like a faster Raspberry Pi, but with Arduino Due-compatible I/O.
Aaron’s Army — powerful words from Carl Malamud. Aaron was part of an army of citizens that believes democracy only works when the citizenry are informed, when we know about our rights—and our obligations. An army that believes we must make justice and knowledge available to all—not just the well born or those that have grabbed the reins of power—so that we may govern ourselves more wisely.
Vaurien the Chaos TCP Monkey — a TCP proxy inspired by Chaos Monkey, a project at Netflix to enhance infrastructure tolerance. The Chaos Monkey will randomly shut down some servers or block some network connections, and the system is supposed to survive these events. It’s a way to verify the high availability and tolerance of the system. (via Pete Warden)
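The fault-injection idea can be illustrated with a minimal sketch. It does not use Vaurien's or Chaos Monkey's actual APIs: `FlakyBackend` and `call_with_retries` are hypothetical names, and the "network" here is just a wrapped Python callable that fails a configurable fraction of calls.

```python
import random


class FlakyBackend:
    """Wraps a handler and makes a configurable fraction of calls fail."""

    def __init__(self, handler, failure_rate, rng=None):
        self.handler = handler
        self.failure_rate = failure_rate
        self.rng = rng or random.Random()

    def call(self, request):
        # Inject a fault with probability `failure_rate`.
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected fault")
        return self.handler(request)


def call_with_retries(backend, request, attempts=5):
    """A client that is supposed to survive injected faults by retrying."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return backend.call(request)
        except ConnectionError as exc:
            last_error = exc
    raise last_error


healthy = FlakyBackend(str.upper, failure_rate=0.0)
flaky = FlakyBackend(str.upper, failure_rate=0.5, rng=random.Random(0))
dead = FlakyBackend(str.upper, failure_rate=1.0)
```

A test suite would point the client at `flaky` and check that requests still succeed, then at `dead` and check that the failure surfaces cleanly instead of hanging.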
All Trials Registered — Ben Goldacre steps up his campaign to ensure trial data is reported and used accurately. I’m astonished that there are people who would withhold data, obfuscate results, or opt out of the system entirely, let alone that those people would vigorously assert that they are, in fact, professional scientists.
Patching Binaries — a patch for a crashing bug during import of account transactions or when changing a payee of a downloaded transaction in Microsoft Money Sunset Deluxe. Written with no source, simply by debugging the executable as it shipped for XP.
Book Crossing Dataset — Contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.
Network Games Market Update (Cartagena Capital) — The myth that players use mobile only ‘on the go’ has been shattered. Smartphones and tablets are now mainstream gaming platforms in their own right and a significant proportion of players play in stationary use case scenarios. Stats abound, including that 38% of tablet gamers play more than five hours per week, compared to 20% of mobile phone gamers.
Peter Molyneux Profile in Wired — worth reading for: (a) Molyneux’s contribution to the genre; (b) the lovely inspiration he drew from his satirical Twitter mirror (@PeterMolydeux); and (c) the game jams to build the fake Molyneux games, where satire becomes reality. (via Andy Baio)
Teaching Web Development in Africa — I used the resources that Pamela Fox helpfully compiled at teaching-materials.org to mentor twelve students who all built their own websites, such as websites for their karate club, fashion club, and traditional dance troupe. One student made a website to teach others about the hardware components of computers, and another website discussing the merits of a common currency in the East African Community. The two most advanced students began programming their own computer game to help others practice touch typing, and it allows players to compete across the network with WebSockets.
Transient Faces (Jeff Howard) — only displaying the unchanging parts of a scene, effectively removing people using computer vision. Disconcerting and elegant. (via Greg Borenstein)
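One common way to keep only the unchanging parts of a scene is a per-pixel temporal median over several frames. The sketch below assumes that technique for illustration; the source does not say which method Jeff Howard used, and `static_background` and the tiny one-row "frames" are hypothetical.

```python
from statistics import median


def static_background(frames):
    """Per-pixel temporal median over equally sized grayscale frames.

    Each frame is a list of rows; each row is a list of pixel intensities.
    A pixel that is briefly covered by a passer-by keeps its background value,
    because the background value is the one seen in most frames.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


# A bright "person" (200) walks across a dark wall (10), one pixel per frame.
frames = [
    [[200, 10, 10, 10]],
    [[10, 200, 10, 10]],
    [[10, 10, 200, 10]],
    [[10, 10, 10, 200]],
    [[10, 10, 10, 10]],
]
background = static_background(frames)
print(background)
```

Using an odd number of frames keeps every output value equal to an intensity that was actually observed, since the median is then a real sample rather than an average of two.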
Reproducibility Initiative (Science Exchange) — a service offering researchers who will attempt to reproduce your work. Validated studies will receive a Certificate of Reproducibility acknowledging that their results have been independently reproduced as part of the Reproducibility Initiative. Researchers have the opportunity to publish the replicated results as an independent publication in the PLOS Reproducibility Collection, and can share their data via the figshare Reproducibility Collection repository. The original study will also be acknowledged as independently reproduced if published in a supporting journal. See also writeup in Nature.
Designing Open Projects (PDF) — IBM report with very sensible advice on steps to take when creating open projects for engagement and participation. Should be recommended reading for all who hope to get others to help.
Urban Camouflage Workshop — Most of the day was spent crafting urban camouflage intended to hide the wearer from the Kinect computer vision system. By the end of the workshop we understood how to dress to avoid detection for the three different Kinect formats. (via Beta Knowledge)
Starting a Django Project The Right Way (Jeff Knupp) — I wish more people did this: it’s not enough to learn syntax these days. Projects live in a web of best practices for source code management, deployment, testing, and migrations.
FailCon — a one-day conference for technology entrepreneurs, investors, developers and designers to study their own and others’ failures and prepare for success. Figure out how to learn from failures—they’re far more common than successes. (via Krissy Mo)
Google Fiber in the Real World (GigaOM) — These tests show one of the limitations of Google’s Fiber network: other services. Since Google Fiber is providing virtually unheard of speeds for their subscribers, companies like Apple and I suspect Hulu, Netflix and Amazon will need to keep up. Are you serving DSL speeds to fiber customers? (via Jonathan Brewer)
The Internet of Things That Do What You Tell Them — Cory Doctorow passionately explains how computers are already entwined in our lives, which means laws that support lock-in are much more than inconveniences.