Nat has chaired the O'Reilly Open Source Convention and other O'Reilly conferences for over a decade. He ran the first web server in New Zealand, co-wrote the best-selling Perl Cookbook, and was one of the founding Radar bloggers. He lives in New Zealand and consults in the Asia-Pacific region.
Discussion Graph Tool (Microsoft Research) — simplifies social media analysis by making it easy to extract high-level features and co-occurrence relationships from raw data.
Superlinear Productivity in Collective Group Actions (PLoS ONE) — study of open source projects shows small groups exhibit non-linear productivity increases by size, which drop off at larger sizes. we document a size effect in the strength and variability of the superlinear effect, with smaller groups exhibiting widely distributed superlinear exponents, some of them characterizing highly productive teams. In contrast, large groups tend to have a smaller superlinearity and less variability.
coop — cheat sheet of the most common concurrency program flows in Go.
Tessera — set of open source tools around Hadoop, R, and visualization.
Dynamics of Correlated Novelties (Nature) — paper on “the adjacent possible”. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological, or conceptual space that enlarges whenever a novelty occurs. The model, a generalization of Polya’s urn, predicts statistical laws for the rate at which novelties happen (Heaps’ law) and for the probability distribution on the space explored (Zipf’s law), as well as signatures of the process by which one novelty sets the stage for another. (via Steven Strogatz)
Mining of Massive Datasets (PDF) — book by Stanford profs, focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort.
Lessons from Iceland’s Failed Crowdsourced Constitution (Slate) — Though the crowdsourcing moment could have led to a virtuous deliberative feedback loop between the crowd and the Constitutional Council, the latter did not seem to have the time, tools, or training necessary to process carefully the crowd’s input, explain its use of it, let alone return consistent feedback on it to the public.
Thread a ZigBee Killer? — Thread is Nest’s home automation networking stack, which can use the same hardware components as ZigBee, but which is not compatible, also not open source. The Novell NetWare of Things. Nick Hunn makes argument that Google (via Nest) are taking aim at ZigBee: it’s Google and Nest saying “ZigBee doesn’t work”.
HP’s IoT Security Research (PDF) — 70% of devices use unencrypted network services, 90% of devices collected at least one piece of personal information, 60% of those that have UIs are vulnerable to things like XSS, 60% didn’t use encryption when downloading software updates, …
USB Security Flawed From Foundation (Wired) — The element of Nohl and Lell’s research that elevates it above the average theoretical threat is the notion that the infection can travel both from computer to USB and vice versa. Any time a USB stick is plugged into a computer, its firmware could be reprogrammed by malware on that PC, with no easy way for the USB device’s owner to detect it. And likewise, any USB device could silently infect a user’s computer. “It goes both ways,” Nohl says. “Nobody can trust anybody.” [...] “In this new way of thinking, you can’t trust a USB just because its storage doesn’t contain a virus. Trust must come from the fact that no one malicious has ever touched it,” says Nohl. “You have to consider a USB infected and throw it away as soon as it touches a non-trusted computer. And that’s incompatible with how we use USB devices right now.”
AdBlock vs AdBlock Plus — short answer: the genuinely open source AdBlock Plus, because AdBlock resiled from being open source, phones home, has misleading changelog entries, …. No longer trustworthy.
Offline First is the New Mobile First — Luke Wroblewski’s notes from John Allsopp’s talk about “Breaking Development” in Nashville. Offline technologies don’t just give us sites that work offline, they improve performance, and security by minimizing the need for cookies, http, and file uploads. It also opens up new possibilities for better user experiences.
Winograd Schemas as Alternative to Turing Test (IEEE) — specially constructed sentences that are surface ambiguous and require deeper knowledge of the world to disambiguate, e.g. “Jim comforted Kevin because he was so upset. Who was upset?”. Our WS [Winograd schemas] challenge does not allow a subject to hide behind a smokescreen of verbal tricks, playfulness, or canned responses. Assuming a subject is willing to take a WS test at all, much will be learned quite unambiguously about the subject in a few minutes. (that last from the paper on the subject)
Reclaiming Your Nest (Forbes) — Like so many connected devices, Nest devices regularly report back to the Nest mothership with usage data. Over a month-long period, the researchers’ device sent 32 MB worth of information to Nest, including temperature data, at-rest settings, and self-entered information about the home, such as how big it is and the year it was built. “The Nest doesn’t give us an option to turn that off or on. They say they’re not going to use that data or share it with Google, but why don’t they give the option to turn it off?” says Jin. Jailbreak your Nest (technique to be discussed at Black Hat), and install less chatty software. Loose Lips Sink Thermostats.
SyncNet — decentralised browser: don’t just pull pages from the source, but also fetch from distributed cache (implemented with BitTorrent Sync).
streisand — sets up a new server running L2TP/IPsec, OpenSSH, OpenVPN, Shadowsocks, Stunnel, and a Tor bridge. It also generates custom configuration instructions for all of these services. At the end of the run you are given an HTML file with instructions that can be shared with friends, family members, and fellow activists.
Neglected Machine Learning Ideas — Perhaps my list is a “send me review articles and book suggestions” cry for help, but perhaps it is useful to others as an overview of neat things.
First Crowdfunded Book on Booker Shortlist — Booker excludes self-published works, but “The Wake” was through Unbound, a Threadless-style “if we hit this limit, the book is printed and you have bought a copy” site.
Watson Can Debate Its Opponents (io9) — Speaking in nearly perfect English, Watson/The Debater replied: “Scanned approximately 4 million Wikipedia articles, returning ten most relevant articles. Scanned all 3,000 sentences in top ten articles. Detected sentences which contain candidate claims. Identified borders of candidate claims. Assessed pro and con polarity of candidate claims. Constructed demo speech with top claim predictions. Ready to deliver.”
ipfs — a global, versioned, peer-to-peer file system. It combines good ideas from Git, BitTorrent, Kademlia, and SFS. You can think of it like a single BitTorrent swarm, exchanging Git objects, making up the web. IPFS provides an interface much simpler than HTTP, but has permanence built in.. (via Sourcegraph)
Talking to Big Machines (Jon Bruner) — “Selfless machines” coordinate across networks and modify their own operation to improve the output of the entire system.
Docker Security — Containers do not contain and Stop assuming that Docker and the Linux kernel protect you from malware.
Your Voice Assistant is Mine (PDF) — Through Android Intent mechanism, VoicEmployer triggers Google Voice Search to the foreground, and then plays prepared audio ﬁles (like “call number 1234 5678”) in the background. Google Voice Search can recognize this voice command and execute corresponding operations. With ingenious designs, our GVS-Attack can forge SMS/Email, access privacy information, transmit sensitive data and achieve remote control without any permission.
escher (GitHub) — choiceless programming and non-Turing coding. Mind: blown.