Discussion Graph Tool (Microsoft Research) — simplifies social media analysis by making it easy to extract high-level features and co-occurrence relationships from raw data.
Superlinear Productivity in Collective Group Actions (PLoS ONE) — study of open source projects shows that productivity in small groups grows superlinearly with group size, an effect that weakens in larger groups. We document a size effect in the strength and variability of the superlinear effect, with smaller groups exhibiting widely distributed superlinear exponents, some of them characterizing highly productive teams. In contrast, large groups tend to have a smaller superlinearity and less variability.
coop — cheat sheet of the most common concurrency program flows in Go.
Tessera — set of open source tools around Hadoop, R, and visualization.
Dynamics of Correlated Novelties (Nature) — paper on “the adjacent possible”. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological, or conceptual space that enlarges whenever a novelty occurs. The model, a generalization of Polya’s urn, predicts statistical laws for the rate at which novelties happen (Heaps’ law) and for the probability distribution on the space explored (Zipf’s law), as well as signatures of the process by which one novelty sets the stage for another. (via Steven Strogatz)
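A quick way to get a feel for the urn model is to simulate it. The sketch below assumes the commonly described "urn with triggering" rules: every drawn ball goes back with `rho` extra copies, and the first time a color is drawn, `nu + 1` brand-new colors are added to the urn. The function name, parameter names, and default values are illustrative choices, not taken from the paper.

```python
import random

def simulate_urn(steps, rho=4, nu=3, n0=5, seed=0):
    """Return D(t): the number of distinct colors seen after each draw."""
    rng = random.Random(seed)
    urn = list(range(n0))       # start with one ball of each initial color
    next_color = n0             # id of the next never-used color
    seen = set()
    distinct = []
    for _ in range(steps):
        color = rng.choice(urn)
        urn.extend([color] * rho)           # reinforcement: rho extra copies
        if color not in seen:               # a novelty enlarges the space
            seen.add(color)
            urn.extend(range(next_color, next_color + nu + 1))
            next_color += nu + 1
        distinct.append(len(seen))
    return distinct

growth = simulate_urn(10_000)
print(growth[99], growth[999], growth[9999])
```

With `nu` smaller than `rho`, the count of distinct colors should grow more slowly than the number of draws, which is the sublinear shape that Heaps' law describes. Plotting `growth` on log-log axes is the easiest way to eyeball it.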
Virtual Economies — new book from MIT Press on economics in games. The book will enable developers and designers to create and maintain successful virtual economies, introduce social scientists and policy makers to the power of virtual economies, and provide a useful guide to economic fundamentals for students in other disciplines.
The Wisdom of Smaller, Smarter Crowds — in domains in which some crowd members have demonstrably more skill than others, smart sub-crowds could possibly outperform the whole. The central question this work addresses is whether such smart subsets of a crowd can be identified a priori in a large-scale prediction contest that has substantial skill and luck components. (via David Pennock)
NYC’s Dollar Vans (New Yorker) — New York’s unofficial shuttles, called “dollar vans” in some neighborhoods, make up a thriving transportation system that operates where the subway and buses don’t. A somewhat invisible economy. (via Seb Chan)
Eigenmorality — caution: linear algebra and morality, two subjects that many programmers struggle with. (via Pete Warden)
Reflections on Solid Conference — recap of the conference, great for those of us who couldn’t make it. “Software is eating the world…. Hardware gives it teeth.” – Renee DiResta
Cybernation: The Silent Conquest (1962) — [When] computers acquire the necessary capabilities…speeded-up data processing and interpretation will be necessary if professional services are to be rendered with any adequacy. Once the computers are in operation, the need for additional professional people may be only moderate [...] There will be a small, almost separate, society of people in rapport with the advanced computers. These cyberneticians will have established a relationship with their machines that cannot be shared with the average man any more than the average man today can understand the problems of molecular biology, nuclear physics, or neuropsychiatry. Indeed, many scholars will not have the capacity to share their knowledge or feeling about this new man-machine relationship. Those with the talent for the work probably will have to develop it from childhood and will be trained as intensively as the classical ballerina. (via Simon Wardley)
UK Copyright Law Permits Researchers to Data Mine — changes mean copyright holders can require researchers to pay to access their content but cannot then restrict text or data mining for non-commercial purposes thereafter, under the new rules. However, researchers that use the text or data they have mined for anything other than a non-commercial purpose will be said to have infringed copyright, unless the activity has the consent of rights holders. In addition, the sale of the text or data mined by researchers is prohibited. The derivative works will be very interesting: if a university mines the journals and finds a new possibility for a Thing, and that possibility is verified experimentally, is that Thing the university’s to license commercially for profit?
Efficient Online Summary of Microblogging Streams (PDF) — research paper. The algorithm we propose uses a word graph, along with optimization techniques such as decaying windows and pruning. It outperforms the baseline in terms of summary quality, as well as time and memory efficiency.
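To make "word graph with decaying windows and pruning" concrete, here is a toy sketch. It is not the paper's algorithm: it only shows the general idea of weighting word-to-word edges, ageing them as new posts arrive, and dropping edges that have faded. The class name, the `decay` and `prune_below` parameters, and the sample posts are all made up for illustration.

```python
from collections import defaultdict

class DecayingWordGraph:
    """Toy word graph: edges link adjacent words and fade over time."""

    def __init__(self, decay=0.9, prune_below=0.05):
        self.decay = decay
        self.prune_below = prune_below
        self.edges = defaultdict(float)     # (word, next_word) -> weight

    def add_post(self, text):
        # Age every existing edge, pruning those that have faded away.
        for edge in list(self.edges):
            self.edges[edge] *= self.decay
            if self.edges[edge] < self.prune_below:
                del self.edges[edge]
        # Reinforce the edges that appear in the new post.
        words = text.lower().split()
        for a, b in zip(words, words[1:]):
            self.edges[(a, b)] += 1.0

    def top_edges(self, k=3):
        return sorted(self.edges.items(), key=lambda kv: -kv[1])[:k]

graph = DecayingWordGraph(decay=0.5, prune_below=0.2)
for post in ["goal for brazil", "goal for germany",
             "goal for germany", "goal for germany"]:
    graph.add_post(post)
print(graph.top_edges())
```

Touching every edge on every post is the simplest way to show decay, but it would be too slow for a real stream. A practical version would more likely store a last-updated timestamp per edge and apply the decay lazily.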
Statistical Shortcomings in Standard Math Libraries — or “Why C Derivatives Are Not Popular With Statistical Scientists”. The following mathematical functions are necessary for implementing any rudimentary statistics application; and yet they are general enough to have many applications beyond statistics. I hereby propose adding them to the standard C math library and to the libraries which inherit from it. For purposes of future discussion, I will refer to these functions as the Elusive Eight.
fail2ban — open source tool that scans logfiles for signs of malice, and triggers actions (e.g., iptables updates).
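The scan-then-act idea is easy to sketch. The code below is not fail2ban's implementation, only a minimal illustration of the pattern: match failed SSH logins with a regex, count them per IP inside a sliding time window, and report an IP once it crosses a threshold. The `max_retry` and `find_time` parameters are loosely named after fail2ban's `maxretry` and `findtime` settings; the regex, the function name, and the sample log lines are assumptions for the example.

```python
import re
from collections import defaultdict, deque

# Assumed sshd-style failure line; real log formats vary by system.
FAILED_LOGIN = re.compile(
    r"Failed password for .* from (\d{1,3}(?:\.\d{1,3}){3})"
)

def find_offenders(lines, max_retry=3, find_time=600):
    """Yield each IP once it has max_retry failures within find_time seconds.

    Each item in `lines` is a (timestamp_seconds, log_line) pair.
    """
    recent = defaultdict(deque)     # ip -> timestamps of recent failures
    reported = set()
    for ts, line in lines:
        match = FAILED_LOGIN.search(line)
        if not match:
            continue
        ip = match.group(1)
        if ip in reported:
            continue
        window = recent[ip]
        window.append(ts)
        while ts - window[0] > find_time:   # drop failures outside the window
            window.popleft()
        if len(window) >= max_retry:
            reported.add(ip)
            yield ip

log = [
    (0, "sshd[1]: Failed password for root from 203.0.113.7 port 22 ssh2"),
    (10, "sshd[1]: Failed password for invalid user admin from 203.0.113.7 port 22 ssh2"),
    (15, "sshd[1]: Accepted password for alice from 198.51.100.2 port 22 ssh2"),
    (20, "sshd[1]: Failed password for root from 203.0.113.7 port 22 ssh2"),
]
offenders = list(find_offenders(log))
for ip in offenders:
    # The real tool would trigger an action here (e.g., a firewall rule).
    print("would ban", ip)
```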
Building a Solid World — O’Reilly research paper about the “software-enhanced networked physical world”. Gonna be mighty interesting in a world where our stuff knows more and is better connected than its owners.
What Did Not Happen at Mt Gox — interesting analysis of some of the popular theories. Overall, Bitcoin has been an ongoing massive online course on economics and distributed systems for the libertarian masses. It’s ironic that Mt. Gox turned into a chapter on fractional reserve banking.
Papers We Love (Github) — a collection of papers from the computer science community to read and discuss.
Understanding Understanding Source Code with Functional Magnetic Resonance Imaging (PDF) — we observed 17 participants inside an fMRI scanner while they were comprehending short source-code snippets, which we contrasted with locating syntax errors. We found a clear, distinct activation pattern of five brain regions, which are related to working memory, attention, and language processing. I’m wary of fMRI studies but welcome more studies that try to identify what we do when we code. (Or, in this case, identify syntax errors; if they wanted to observe real programming, they’d watch subjects creating syntax errors.) (via Slashdot)
Oobleck Security (O’Reilly Radar) — if you missed or skimmed this, go back and reread it. The future will be defined by the objects that turn on us. ’50s sci-fi was so close, but instead of human-shaped positronic robots, it’ll be our cars, HVAC systems, light bulbs, and TVs. Reminds me of the excellent Old Paint by Megan Lindholm.
Most Winning A/B Test Results are Illusory (PDF) — Statisticians have known for almost a hundred years how to ensure that experimenters don’t get misled by their experiments [...] I’ll show how these methods ensure equally robust results when applied to A/B testing.
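One long-established safeguard of this kind is deciding the sample size before the test starts, using a power calculation. Whether that matches the paper's exact recipe is an assumption on my part. The sketch below uses the textbook normal-approximation formula for comparing two conversion rates, and the function name and example rates are made up.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(p_base, p_variant, alpha=0.05, power=0.8):
    """Approximate visitors needed per arm for a two-sided test
    comparing two conversion rates (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # significance threshold
    z_beta = z.inv_cdf(power)            # desired power
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    effect = p_base - p_variant
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Detecting a lift from a 5% to a 6% conversion rate.
n = sample_size_per_arm(0.05, 0.06)
print(n)
```

Even a one-point lift on a 5% baseline needs thousands of visitors per arm at these settings. Stopping as soon as a dashboard shows a "winner" usually means stopping well short of that, which is one way winning results end up illusory.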