Four short links: 4 August 2014

Web Spreadsheet, Correlated Novelty, A/B Ethics, and Replicated Data Structures

  1. EtherCalc — open source web-based spreadsheet.
  2. Dynamics of Correlated Novelties (Nature) — paper on “the adjacent possible”. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological, or conceptual space that enlarges whenever a novelty occurs. The model, a generalization of Polya’s urn, predicts statistical laws for the rate at which novelties happen (Heaps’ law) and for the probability distribution on the space explored (Zipf’s law), as well as signatures of the process by which one novelty sets the stage for another. (via Steven Strogatz)
  3. On The Media Interview with OKCupid CEO — relevant to the debate on ethics of A/B tests. I preferred this to Tim Carmody’s rant.
  4. CRDTs as Alternative to APIs — when using CRDTs to tie your system together, you don’t need to resort to using impoverished representations that simply never come anywhere near the representational power of the data structures you use in your programs at runtime. See also this paper on Convergent and Commutative Replicated Data Types.
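
For readers new to CRDTs, here is a minimal sketch of one of the simplest, a grow-only counter (G-Counter). It is an illustration only, not code from the linked post or paper; the class name `GCounter` and the replica ids `"a"` and `"b"` are assumptions made up for the example. Each replica increments its own slot, and replicas converge by taking the per-slot maximum when they merge.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter (hypothetical example class): one slot per replica."""
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a = GCounter("a")
b = GCounter("b")
a.increment(2)   # replica a counts two events
b.increment(3)   # replica b counts three events
a.merge(b)
b.merge(a)
b.merge(a)       # merging again changes nothing
```

After the merges both replicas hold the same state, with no coordinator and no request/response API between them.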
Facebook Research, Mountain Game, Dollar Vans, and Eigenmorality

  1. Experimental Evidence of Massive-scale Emotional Contagion Through Social Networks — I suspect many more people have expressed an opinion on the research than have read the research.
  2. Mountain — a new game in which you are (wait for it) a mountain. From the creator of the fake game in Her. (via Chris McDowall)
  3. NYC’s Dollar Vans (New Yorker) — New York’s unofficial shuttles, called “dollar vans” in some neighborhoods, make up a thriving transportation system that operates where the subway and buses don’t. A somewhat invisible economy. (via Seb Chan)
  4. Eigenmorality — caution: linear algebra and morality, two subjects that many programmers struggle with. (via Pete Warden)
Statistical Sensitivity, Scientific Mining, Data Mining Books, and Two-Sided Smartphones

  1. Car Alarms and Smoke Alarms (Slideshare) — how to think about and draw the line between sensitivity and specificity.
  2. 101 Uses for Content Mining — between the list in the post and the comments from readers, it’s a good introduction to some of the value to be obtained from full-text structured and unstructured access to scientific research publications.
  3. 12 Free-as-in-beer Data Mining Books — for your next flight.
  4. Dual-Touch Smartphone Concept — brilliant design sketches for interactivity using the back of the phone as a touch-sensitive input device.
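
To make the sensitivity/specificity trade-off from the first link concrete, here is a small sketch. The scores, labels, and thresholds are invented for illustration and are not taken from the slides. Lowering the alarm threshold catches every real event at the cost of false alarms (the car alarm); raising it removes false alarms but misses a real event.

```python
def confusion(scores, labels, threshold):
    """Count true/false positives and negatives at a given alarm threshold."""
    tp = fp = tn = fn = 0
    for score, is_real in zip(scores, labels):
        alarm = score >= threshold
        if alarm and is_real:
            tp += 1
        elif alarm and not is_real:
            fp += 1
        elif not alarm and is_real:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    # Fraction of real events that triggered the alarm.
    return tp / (tp + fn)


def specificity(tn, fp):
    # Fraction of non-events that stayed quiet.
    return tn / (tn + fp)


# Hypothetical detector scores and ground truth (1 = real event).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

results = {}
for threshold in (0.35, 0.65):
    tp, fp, tn, fn = confusion(scores, labels, threshold)
    results[threshold] = (sensitivity(tp, fn), specificity(tn, fp))
    print(threshold, results[threshold])
```

Where to draw the line depends on what a miss costs relative to a false alarm, which is the point of the car alarm versus smoke alarm comparison.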
Your Brain on Code, Internet of Compromised Things, Waiting for Wearables, and A/B Illusions

  1. Understanding Understanding Source Code with Functional Magnetic Resonance Imaging (PDF) — we observed 17 participants inside an fMRI scanner while they were comprehending short source-code snippets, which we contrasted with locating syntax errors. We found a clear, distinct activation pattern of five brain regions, which are related to working memory, attention, and language processing. I’m wary of fMRI studies but welcome more studies that try to identify what we do when we code. (Or, in this case, identify syntax errors—if they wanted to observe real programming, they’d watch subjects creating syntax errors.) (via Slashdot)
  2. Oobleck Security (O’Reilly Radar) — if you missed or skimmed this, go back and reread it. The future will be defined by the objects that turn on us. 50s scifi was so close but instead of human-shaped positronic robots, it’ll be our cars, HVAC systems, light bulbs, and TVs. Reminds me of the excellent Old Paint by Megan Lindholm.
  3. Google Readying Android Watch — just as Samsung moves away from Android for smart watches and I buy me and my wife a Pebble watch each for our anniversary. Watches are in the same space as Goggles and other wearables: solutions hunting for a problem, a use case, a killer tap. “OK Google, show me offers from brands I love near me” isn’t it (and is a low-lying operating system function anyway, not a userland command).
  4. Most Winning A/B Test Results are Illusory (PDF) — Statisticians have known for almost a hundred years how to ensure that experimenters don’t get misled by their experiments […] I’ll show how these methods ensure equally robust results when applied to A/B testing.
Floating Point, Secure Distributed FS, Cloud Robotics, and Domestic Sensors

  1. What Every Computer Scientist Should Know About Floating Point Arithmetic — in short, “it will hurt you.”
  2. Ori — a distributed file system built for offline operation that empowers the user with control over synchronization operations and conflict resolution. We provide history through lightweight snapshots and allow users to verify the history has not been tampered with. Through the use of replication, instances can be resilient and recover damaged data from other nodes.
  3. RoboEarth — a Cloud Robotics infrastructure, which includes everything needed to close the loop from robot to the cloud and back to the robot. RoboEarth’s World-Wide-Web style database stores knowledge generated by humans – and robots – in a machine-readable format. Data stored in the RoboEarth knowledge base include software components, maps for navigation (e.g., object locations, world models), task knowledge (e.g., action recipes, manipulation strategies), and object recognition models (e.g., images, object models).
  4. Mother — domestic sensors and an app with an appallingly presumptuous name. (Also, wasn’t “Mother” the name of the ship computer in Alien?) (via BoingBoing)
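
A few lines of Python show the kind of hurt the floating point link is warning about. This is an illustrative sketch using only the standard library, not an example from the paper: decimal fractions such as 0.1 have no exact binary representation, so equality tests and long sums drift.

```python
import math
from decimal import Decimal

total = 0.1 + 0.2

# Exact equality fails because neither operand is stored exactly.
naive_equal = (total == 0.3)

# Comparing within a tolerance is the usual workaround.
close = math.isclose(total, 0.3)

# Decimal(float) reveals the value the float actually stores for 0.1.
stored_tenth = Decimal(0.1)

# Repeated addition accumulates rounding error; fsum tracks it and corrects.
running = sum([0.1] * 10)
accurate = math.fsum([0.1] * 10)

print(naive_equal, close)
print(stored_tenth)
print(running, accurate)
```

The practical habits that follow: compare floats with a tolerance, use `math.fsum` or a compensated sum for long accumulations, and reach for `Decimal` when the values are money.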
Mating Math, Precogs Are Coming, Tor Bad Guys, and Mind Maps

  1. How a Math Genius Hacked OkCupid to Find True Love (Wired) — if he doesn’t end up working for OkCupid, productising this as a new service, something is wrong with the world.
  2. Humin: The App That Uses Context to Enable Better Human Connections (WaPo) — Humin is part of a growing trend of apps and services attempting to use context and anticipation to better serve users. The precogs are coming. I knew it.
  3. Spoiled Onions — analysis identifying bad actors in the Tor network. Since September 2013, we discovered several malicious or misconfigured exit relays[…]. These exit relays engaged in various attacks such as SSH and HTTPS MitM, HTML injection, and SSL stripping. We also found exit relays which were unintentionally interfering with network traffic because they were subject to DNS censorship.
  4. My Mind (Github) — a web application for creating and managing Mind maps. It is free to use and you can fork its source code. It is distributed under the terms of the MIT license.

Design, Math, and Data

Lessons from the design community for developing data-driven applications

By Dean Malmgren

When someone says, “that is a nice infographic” or “check out this sweet dashboard,” many people infer that the work is “well-designed.” But creating accessible (or for the cynical, “pretty”) content is only part of what makes good design powerful. The design process is geared toward solving specific problems. This process has been formalized in many ways (e.g., IDEO’s Human Centered Design, Marc Hassenzahl’s User Experience Design, or Braden Kowitz’s Story-Centered Design), but the basic idea is that you have to explore the breadth of the possible before you can isolate truly innovative ideas. We, at Datascope Analytics, argue that the same is true of designing effective data science tools, dashboards, engines, etc.: to design an effective dashboard, you must first know what is possible.

Zombie Drones, Algebra Through Code, Data Toolkit, and Crowdsourcing Antibiotic Discovery

  1. Skyjack — drone that takes over other drones. Welcome to the Malware of Things.
  2. Bootstrap World — a curricular module for students ages 12-16, which teaches algebraic and geometric concepts through computer programming. (via Esther Wojcicki)
  3. Harvest — open source BSD-licensed toolkit for building web applications for integrating, discovering, and reporting data. Designed for biomedical data first. (via Mozilla Science Lab)
  4. Project ILIAD — crowdsourced antibiotic discovery.
Hardware Market, Bio Patent History Lesson, Multiplayer Mathematics, and TV Numbers (Down)

  1. Huaqiang Bei Map for Makers — excellent resource for visitors to an iconic huge electronics market in Shenzhen. (via Bunnie Huang)
  2. A 16th Century Dutchman Can Tell us Everything We Need to Know about GMO Patents — There’s nothing wrong with this division of labor, except that it means that fewer people are tinkering. We’ve centralized the responsibility for agricultural innovation among a few engineers, even fewer investors, and just a handful of corporations. (and check out the historical story—it’s GREAT)
  3. Polymath Projects — massively multiplayer mathematical proving ground. Let the “how many mathematicians does it take” jokes commence. (via Slashdot)
  4. Stats on Dying TV — like a Mary Meeker preso, accumulation of evidence that TV screens and cable subscriptions are dying and mobile-consumed media are taking its place.
Coding for Unreliability, AirBnB JS Style, Category Theory, and Text Processing

  1. Quantitative Reliability of Programs That Execute on Unreliable Hardware (MIT) — As MIT’s press release put it: Rely simply steps through the intermediate representation, folding the probability that each instruction will yield the right answer into an estimation of the overall variability of the program’s output. (via Pete Warden)
  2. AirBnB’s JavaScript Style Guide (Github) — A mostly reasonable approach to JavaScript.
  3. Category Theory for Scientists (MIT Courseware) — Scooby snacks for rationalists.
  4. Textblob — Python open source text processing library with sentiment analysis, PoS tagging, term extraction, and more.