statistics Archives - O'Reilly Radar

Four short links: 7 April 2015

JavaScript Numeric Methods, Misunderstood Statistics, Web Speed, and Sentiment Analysis

NumericJS — numerical methods in JavaScript.
P Values are not Error Probabilities (PDF) — In particular, we illustrate how this mixing of statistical testing methodologies has resulted in widespread confusion over the interpretation of p values (evidential measures) and α levels (measures of error). We demonstrate that this confusion was a problem between the Fisherian and Neyman–Pearson camps, is not uncommon among statisticians, is prevalent in statistics textbooks, and is well nigh universal in the pages of leading (marketing) journals. This mass confusion, in turn, has rendered applications of classical statistical testing all but meaningless among applied researchers.
Breaking the 1000ms Time to Glass Mobile Barrier (YouTube) —
See also slides. Stay under 250 ms to feel “fast.” Stay under 1000 ms to keep users’ attention.
Modern Methods for Sentiment Analysis — Recently, Google developed a method called Word2Vec that captures the context of words, while at the same time reducing the size of the data. Gentle introduction, with code.

Four short links: 26 March 2015

GPU Graph Algorithms, Data Sharing, Build Like Google, and Distributed Systems Theory

by Nat Torkington | @gnat | +Nat Torkington | March 26, 2015

gunrock — a CUDA library for graph primitives that refactors, integrates, and generalizes best-of-class GPU implementations of breadth-first search, connected components, and betweenness centrality into a unified code base useful for future development of high-performance GPU graph primitives. (via Ben Lorica)
How to Share Data with a Statistician — some instruction on the best way to share data to avoid the most common pitfalls and sources of delay in the transition from data collection to data analysis.
Bazel — a build tool, i.e. a tool that will run compilers and tests to assemble your software, similar to Make, Ant, Gradle, Buck, Pants, and Maven. Google’s build tool, to be precise.
You Can’t Have Exactly-Once Delivery — not about the worst post office ever. FLP and the Two Generals Problem are not design complexities, they are impossibility results.

Four short links: 18 February 2015

Sales Automation, Clone Boxes, Stats Style, and Extra Orifices

by Nat Torkington | @gnat | +Nat Torkington | February 18, 2015

Systematising Sales with Software and Processes — sweet use of Slack as UI for sales tools.
Duplicate SSH Keys Everywhere — It looks like all devices with the fingerprint are Dropbear SSH instances that have been deployed by Telefonica de Espana. It appears that some of their networking equipment comes set up with SSH by default, and the manufacturer decided to reuse the same operating system image across all devices.
Style.ONS — UK govt style guide covers the elements of writing about statistics. It aims to make statistical content more open and understandable, based on editorial research and best practice. (via Hadley Beeman)
Warren Ellis on the Apple Watch — I, personally, want to put a gold chain on my phone, pop it into a waistcoat pocket, and refer to it as my “digital fob watch” whenever I check the time on it. Just to make the point in as snotty and high-handed a way as possible: This is the decadent end of the current innovation cycle, the part where people stop having new ideas and start adding filigree and extra orifices to the stuff we’ve got and call it the future.

Four short links: 19 December 2014

Statistical Causality, Clustering Bitcoin, Hardware Security, and A Language for Scripts

by Nat Torkington | @gnat | +Nat Torkington | December 19, 2014

Distinguishing Cause and Effect using Observational Data — research paper evaluating effectiveness of the “additive noise” test, a nifty statistical trick to identify causal relationships from observational data. (via Slashdot)
Clustering Bitcoin Accounts Using Heuristics (O’Reilly Radar) — In theory, a user can go by many different pseudonyms. If that user is careful and keeps the activity of those different pseudonyms separate, completely distinct from one another, then they can really maintain a level of, maybe not anonymity, but again, cryptographically it’s called pseudo-anonymity. […] It turns out in reality, though, the way most users and services are using bitcoin, was really not following any of the guidelines that you would need to follow in order to achieve this notion of pseudo-anonymity. So, basically, what we were able to do is develop certain heuristics for clustering together different public keys, or different pseudonyms.
A Primer on Hardware Security: Models, Methods, and Metrics (PDF) — Camouflaging: This is a layout-level technique to hamper image-processing-based extraction of gate-level netlist. In one embodiment of camouflaging, the layouts of standard cells are designed to look alike, resulting in incorrect extraction of the netlist. The layout of nand cell and the layout of nor cell look different and hence their functionality can be extracted. However, the layout of a camouflaged nand cell and the layout of camouflaged nor cell can be made to look identical and hence an attacker cannot unambiguously extract their functionality.
Prompter: A Domain-Specific Language for Versu (PDF) — literally a scripting language (you write theatrical-style scripts, characters, dialogues, and events) for an inference engine that lets you talk to characters and have a different story play out each time.

Four short links: 21 October 2014

Data Delusions, OS Robotics, Insecure Crypto, and Free Icons

by Nat Torkington | @gnat | +Nat Torkington | October 21, 2014

The Delusions of Big Data (IEEE) — When you have large amounts of data, your appetite for hypotheses tends to get even larger. And if it’s growing faster than the statistical strength of the data, then many of your inferences are likely to be false. They are likely to be white noise.
ROSCON 2014 — slides and videos of talks from Chicago open source robotics conference.
Making Sure Crypto Stays Insecure (PDF) — Daniel J. Bernstein talk: This talk is actually a thought experiment: how could an attacker manipulate the ecosystem for insecurity?
Material Design Icons — Google’s CC-licensed (attribution, sharealike) collection of sweet, straightforward icons.

Four short links: 15 October 2014

Recognising Uncertainty, Responsive Screenshots, Rapid Prototyping, and SD Drones

by Nat Torkington | @gnat | +Nat Torkington | October 15, 2014

Guidance Note on Uncertainty (PDF) –expert advice to IPCC scientists on identifying, quantifying, and communicating uncertainty. Everyone deals with uncertainty, but none are quite so ruthless in their pursuit of honesty about it as scientists. (via Peter Gluckman)
pageres — Responsive website screenshots. (via infovore)
SparkFun Rapid Prototyping Lab — with links to some other expert advice on creative spaces. Some very obvious software parallels, too. E.g., this from Adam Savage’s advice: The right tool for the job – Despite his oft-cited declaration that ‘every tool is a hammer,’ Adam can usually be relied on to geek-out about purpose-built tools. If you’re having trouble learning a new skill, check that you’re using the right tools. The right tool is the one that does the hard work for you. There’s no point in dropping big bucks on tools you’re almost certainly not going to use, but don’t be afraid to buy the cheap version of the snap-setter, or leather punch, or tamper bit before trying to jerry-rig something that will end up making your life harder.
Dudes with Drones (The Atlantic) — ghastly title (“Bros with Bots”, “Bangers with Clangers”, and “Fratboys with Phat Toys” were presumably already taken), interesting article. San Diego is the Palo Alto of drones. Interesting to compare software startups with the hardware crews’ stance on the FAA. “We want them to regulate us,” Maloney says. “We want nothing more than a framework to allow us to continue to operate safely and legally.”

Four short links: 9 June 2014

SQL against Text, Fake Social Networks, Hidden Biases, and Versioned Data

by Nat Torkington | @gnat | +Nat Torkington | June 9, 2014

textql — execute SQL against structured text like CSV or TSV.
Social Network Structure of Fake Friends — author bought 4,000 Twitter followers and studied their relationships.
Hidden Biases in Big Data — with every big data set, we need to ask which people are excluded. Which places are less visible? What happens if you live in the shadow of big data sets? (via Quinn Norton)
CoreObject — a version-controlled object database for Objective-C that supports powerful undo, semantic merging, and real-time collaborative editing.

Four short links: 26 May 2014

Statistical Sensitivity, Scientific Mining, Data Mining Books, and Two-Sided Smartphones

by Nat Torkington | @gnat | +Nat Torkington | May 26, 2014

Car Alarms and Smoke Alarms (Slideshare) — how to think about and draw the line between sensitivity and specificity.
101 Uses for Content Mining — between the list in the post and the comments from readers, it’s a good introduction to some of the value to be obtained from full-text structured and unstructured access to scientific research publications.
12 Free-as-in-beer Data Mining Books — for your next flight.
Dual-Touch Smartphone Concept — brilliant design sketches for interactivity using the back of the phone as a touch-sensitive input device.

Four short links: 21 May 2014

Funnel Tool, Security Tools, Inside Mac Malware, and Everything is Broken

by Nat Torkington | @gnat | +Nat Torkington | May 21, 2014

EventHub — open source funnel/cohort/a-b analysis tool.
Mantra — a collection of free/open source security tools, integrated into a browser (Firefox or Chromium).
Reverse Engineering Mac Malware (PDF) — fascinating to see how it’s shipped, bundled, packaged, and distributed.
Everything is Broken (Quinn Norton) — Computers have gotten incredibly complex, while people have remained the same gray mud with pretensions of Godhood. Today’s required read, because everything is broken and it’s the defining characteristic of this age of software. We have built computers in our image: our cancerous STD-addled diabetic alcoholic lead-sniffing telomere-decaying bacteria- and virus-addled image.

Four short links: 3 March 2014

Vanishing Money, Car Hackery, Data Literacy Course, and Cheaper CI

by Nat Torkington | @gnat | +Nat Torkington | March 3, 2014

The Programming Error That Cost Mt Gox 2609 Bitcoins — in the unforgiving world of crypto-currency, it’s easy to miscode and vanish your money.
Ford Invites Open-Source Community to Tinker Away — One example: Nelson has re-tasked the motor from a Microsoft Xbox 360 game controller to create an OpenXC shift knob that vibrates to signal gear shifts in a standard-transmission Mustang. The 3D-printed prototype shift knob uses Ford’s OpenXC research platform to link devices to the car via Bluetooth, and shares vehicle data from the on-board diagnostics port. Nelson has tested his prototype in a Ford Mustang Shelby GT500 that vibrates at the optimal time to shift.
Making Sense of Data — Google online course on data literacy.
Cost-Efficient Continuous Integration at Mozilla — CI on a big project can imply hundreds if not thousands of VMs on Amazon spinning up to handle compiles and tests. This blog post talks about Mozilla’s efforts to reduce its CI-induced spend without reducing the effectiveness of its CI practices.

"statistics" entries