P Values are not Error Probabilities (PDF) — In particular, we illustrate how this mixing of statistical testing methodologies has resulted in widespread confusion over the interpretation of p values (evidential measures) and α levels (measures of error). We demonstrate that this confusion was a problem between the Fisherian and Neyman–Pearson camps, is not uncommon among statisticians, is prevalent in statistics textbooks, and is well nigh universal in the pages of leading (marketing) journals. This mass confusion, in turn, has rendered applications of classical statistical testing all but meaningless among applied researchers.
Modern Methods for Sentiment Analysis — Recently, Google developed a method called Word2Vec that captures the context of words, while at the same time reducing the size of the data. Gentle introduction, with code.
Duplicate SSH Keys Everywhere — It looks like all devices with the fingerprint are Dropbear SSH instances that have been deployed by Telefonica de Espana. It appears that some of their networking equipment comes set up with SSH by default, and the manufacturer decided to reuse the same operating system image across all devices.
Style.ONS — UK govt style guide covers the elements of writing about statistics. It aims to make statistical content more open and understandable, based on editorial research and best practice. (via Hadley Beeman)
Warren Ellis on the Apple Watch — I, personally, want to put a gold chain on my phone, pop it into a waistcoat pocket, and refer to it as my “digital fob watch” whenever I check the time on it. Just to make the point in as snotty and high-handed a way as possible: This is the decadent end of the current innovation cycle, the part where people stop having new ideas and start adding filigree and extra orifices to the stuff we’ve got and call it the future.
Clustering Bitcoin Accounts Using Heuristics (O’Reilly Radar) — In theory, a user can go by many different pseudonyms. If that user is careful and keeps the activity of those different pseudonyms separate, completely distinct from one another, then they can really maintain a level of, maybe not anonymity, but again, cryptographically it’s called pseudo-anonymity. […] It turns out in reality, though, the way most users and services are using bitcoin, was really not following any of the guidelines that you would need to follow in order to achieve this notion of pseudo-anonymity. So, basically, what we were able to do is develop certain heuristics for clustering together different public keys, or different pseudonyms.
A Primer on Hardware Security: Models, Methods, and Metrics (PDF) — Camouflaging: This is a layout-level technique to hamper image-processing-based extraction of gate-level netlist. In one embodiment of camouflaging, the layouts of standard cells are designed to look alike, resulting in incorrect extraction of the netlist. The layout of nand cell and the layout of nor cell look different and hence their functionality can be extracted. However, the layout of a camouflaged nand cell and the layout of camouflaged nor cell can be made to look identical and hence an attacker cannot unambiguously extract their functionality.
Prompter: A Domain-Specific Language for Versu (PDF) — literally a scripting language (you write theatrical-style scripts, characters, dialogues, and events) for an inference engine that lets you talk to characters and have a different story play out each time.
Failing at Microservices — deconstructed a failed stab at microservices. Category three engineers also presented a significant problem to our implementation. In many cases, these engineers implemented services incorrectly; in one example, an engineer had literally wrapped and hosted one microservice within another because he didn’t understand how the services were supposed to communicate if they were in separate processes (or on separate machines). These engineers also had a tough time understanding how services should be tested, deployed, and monitored because they were so used to the traditional “throw the service over the fence”to an admin approach to deployment. This basically lead to huge amounts of churn and loss of productivity.
Dynamics of Correlated Novelties (Nature) — paper on “the adjacent possible”. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological, or conceptual space that enlarges whenever a novelty occurs. The model, a generalization of Polya’s urn, predicts statistical laws for the rate at which novelties happen (Heaps’ law) and for the probability distribution on the space explored (Zipf’s law), as well as signatures of the process by which one novelty sets the stage for another. (via Steven Strogatz)
NYC’s Dollar Vans (New Yorker) — New York’s unofficial shuttles, called “dollar vans” in some neighborhoods, make up a thriving transportation system that operates where the subway and buses don’t. A somewhat invisible economy. (via Seb Chan)
Eigenmorality — caution: linear algebra and morality, two subjects that many programmers struggle with. (via Pete Warden)
101 Uses for Content Mining — between the list in the post and the comments from readers, it’s a good introduction to some of the value to be obtained from full-text structured and unstructured access to scientific research publications.
Understanding Understanding Source Code with Functional Magnetic Resonance Imaging (PDF) — we observed 17 participants inside an fMRI scanner while they were comprehending short source-code snippets, which we contrasted with locating syntax error. We found a clear, distinct activation pattern of five brain regions, which are related to working memory, attention, and language processing. I’m wary of fMRI studies but welcome more studies that try to identify what we do when we code. (Or, in this case, identify syntax errors—if they wanted to observe real programming, they’d watch subjects creating syntax errors) (via Slashdot)
Oobleck Security (O’Reilly Radar) — if you missed or skimmed this, go back and reread it. The future will be defined by the objects that turn on us. 50s scifi was so close but instead of human-shaped positronic robots, it’ll be our cars, HVAC systems, light bulbs, and TVs. Reminds me of the excellent Old Paint by Megan Lindholm.
Most Winning A/B Test Results are Illusory (PDF) — Statisticians have known for almost a hundred years how to ensure that experimenters don’t get misled by their experiments […] I’ll show how these methods ensure equally robust results when applied to A/B testing.
Ori — a distributed file system built for offline operation and empowers the user with control over synchronization operations and conflict resolution. We provide history through light weight snapshots and allow users to verify the history has not been tampered with. Through the use of replication instances can be resilient and recover damaged data from other nodes.
RoboEarth — a Cloud Robotics infrastructure, which includes everything needed to close the loop from robot to the cloud and back to the robot. RoboEarth’s World-Wide-Web style database stores knowledge generated by humans – and robots – in a machine-readable format. Data stored in the RoboEarth knowledge base include software components, maps for navigation (e.g., object locations, world models), task knowledge (e.g., action recipes, manipulation strategies), and object recognition models (e.g., images, object models).
Mother — domestic sensors and an app with an appallingly presumptuous name. (Also, wasn’t “Mother” the name of the ship computer in Alien?) (via BoingBoing)