Hidden Biases in Big Data — with every big data set, we need to ask which people are excluded. Which places are less visible? What happens if you live in the shadow of big data sets? (via Quinn Norton)
CoreObject — a version-controlled object database for Objective-C that supports powerful undo, semantic merging, and real-time collaborative editing.
Machine Learning Done Wrong — [M]ost practitioners pick the modeling algorithm they are most familiar with rather than pick the one which best suits the data. In this post, I would like to share some common mistakes (the don’t-s).
Bandits for Recommendations — A common problem for internet-based companies is: which piece of content should we display? Google has this problem (which ad to show), Facebook has this problem (which friend’s post to show), and RichRelevance has this problem (which product recommendation to show). Many of the promising solutions come from the study of the multi-armed bandit problem.
Droplets — the Droplet is almost spherical, can self-right after being poured out of a bucket, and has the hardware capabilities to organize into complex shapes with its neighbors due to accurate range and bearing. Droplets are available open-source and use cheap vibration motors and a 3D printed shell. (via Robohub)
Apple’s App Store Approval Guidelines — some of the plainest English I’ve seen, especially the Introduction. I can only aspire to that clarity. If your App looks like it was cobbled together in a few days, or you’re trying to get your first practice App into the store to impress your friends, please brace yourself for rejection. We have lots of serious developers who don’t want their quality Apps to be surrounded by amateur hour.
Cockroach — a distributed key/value datastore which supports ACID transactional semantics and versioned values as first-class features. The primary design goal is global consistency and survivability, hence the name. Cockroach aims to tolerate disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention. Cockroach nodes are symmetric; a design goal is one binary with minimal configuration and no required auxiliary services.
Linux Foundation Providing for Core Infrastructure Projects — press release, but interested in how they’re tackling sustainability—they’re taking on identifying worthies (glad I’m not the one who says “you’re not worthy” to a project) and being the non-profit conduit for the dosh. Interesting: implies they think the reason companies weren’t supporting necessary open source projects was some combination of being unsure who to support (projects you use, surely?) and how to get them money (ask?). (Sustainability of open source projects is a pet interest of mine)
Ori — a distributed file system built for offline operation and empowers the user with control over synchronization operations and conflict resolution. We provide history through light weight snapshots and allow users to verify the history has not been tampered with. Through the use of replication instances can be resilient and recover damaged data from other nodes.
RoboEarth — a Cloud Robotics infrastructure, which includes everything needed to close the loop from robot to the cloud and back to the robot. RoboEarth’s World-Wide-Web style database stores knowledge generated by humans – and robots – in a machine-readable format. Data stored in the RoboEarth knowledge base include software components, maps for navigation (e.g., object locations, world models), task knowledge (e.g., action recipes, manipulation strategies), and object recognition models (e.g., images, object models).
Mother — domestic sensors and an app with an appallingly presumptuous name. (Also, wasn’t “Mother” the name of the ship computer in Alien?) (via BoingBoing)
Google’s Seven Robotics Companies (IEEE) — The seven companies are capable of creating technologies needed to build a mobile, dexterous robot. Mr. Rubin said he was pursuing additional acquisitions. Rundown of those seven companies.
Hebel (Github) — GPU-Accelerated Deep Learning Library in Python.
What We Learned Open Sourcing — my eye was caught by the way they offered APIs to closed source code, found and solved performance problems, then open sourced the fixed code.
SAMOA — Yahoo!’s distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. (via Introducing SAMOA)
madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
Data Portraits: Connecting People of Opposing Views — Yahoo! Labs research to break the filter bubble. Connect people who disagree on issue X (e.g., abortion) but who agree on issue Y (e.g., Latin American interventionism), and present the differences and similarities visually (they used wordclouds). Our results suggest that organic visualisation may revert the negative effects of providing potentially sensitive content. (via MIT Technology Review)
Living Light — 3D printed cephalopods filled with bioluminescent bacteria. PAGING CORY DOCTOROW, YOUR ORGASMATRON HAS ARRIVED. (via Sci Blogs)
Repacking Lego Batteries with a CNC Mill — check out the video. Patrick programmed a CNC machine to drill out the rivets holding the Mindstorms battery pack together. Coding away a repetitive task like this is gorgeous to see at every scale. We don’t have to teach our kids a particular programming language, but they should know how to automate cruft.
My Thoughts on Google+ (YouTube) — when your fans make hatey videos like this one protesting Google putting the pig of Google Plus onto the lipstick that was YouTube, you are Doin’ It Wrong.
The Amen Break (YouTube) — fascinating 20m history of the amen break, a handful of bars of drum solo from a forgotten 1969 song which became the origin of a huge amount of popular music from rap to jungle and commercials, and the contested materials at the heart of sample-based music. Remix it and weep. (via Beta Knowledge)
Lab41 (Github) — open sourced code from a spook hacklab in Silicon Valley.
Fanulus — open sourced Hadoop-based graph analytics engine for analyzing graphs represented across a multi-machine compute cluster. A breadth-first version of the graph traversal language Gremlin operates on graphs stored in the distributed graph database Titan, in any Rexster-fronted graph database, or in HDFS via various text and binary formats.