P Values are not Error Probabilities (PDF) — In particular, we illustrate how this mixing of statistical testing methodologies has resulted in widespread confusion over the interpretation of p values (evidential measures) and α levels (measures of error). We demonstrate that this confusion was a problem between the Fisherian and Neyman–Pearson camps, is not uncommon among statisticians, is prevalent in statistics textbooks, and is well nigh universal in the pages of leading (marketing) journals. This mass confusion, in turn, has rendered applications of classical statistical testing all but meaningless among applied researchers.
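The distinction can be seen in a small simulation. This is a minimal sketch, not taken from the paper: it assumes a two-sided z-test with known standard deviation and a true null hypothesis, and the function names and parameters are illustrative. Over many repetitions the rejection rate at a fixed α stays close to α (the long-run error rate), while each individual p value is a property of one sample and varies from run to run.

```python
import math
import random


def two_sided_p_value(sample, mu0=0.0, sigma=1.0):
    """p value of a two-sided z-test of mean == mu0 with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # Standard normal CDF evaluated at |z|, via the error function.
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)


def simulate_null(trials=20000, n=30, alpha=0.05, seed=0):
    """Draw samples with the null true; return all p values and the rejection rate."""
    rng = random.Random(seed)
    p_values = [
        two_sided_p_value([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    rejection_rate = sum(p < alpha for p in p_values) / trials
    return p_values, rejection_rate
```

Calling simulate_null() returns the full list of p values alongside the single rejection rate, which makes it easy to compare the spread of the former with the stability of the latter.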
Modern Methods for Sentiment Analysis — Recently, Google developed a method called Word2Vec that captures the context of words, while at the same time reducing the size of the data. Gentle introduction, with code.
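One way to picture "capturing the context of words": the skip-gram variant of Word2Vec is trained on pairs of a word and a nearby word. The sketch below only generates those pairs; it is not the linked post's code, does no training, and the function name and window size are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for each token and its neighbours within `window`."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs
```

A model trained to predict the second element of each pair from the first ends up giving similar vectors to words that appear in similar contexts.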
gunrock — a CUDA library for graph primitives that refactors, integrates, and generalizes best-of-class GPU implementations of breadth-first search, connected components, and betweenness centrality into a unified code base useful for future development of high-performance GPU graph primitives. (via Ben Lorica)
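For readers unfamiliar with the primitives named above, here is breadth-first search written as a level-by-level frontier expansion. This is a plain Python illustration only, not gunrock's API and not a GPU implementation; the adjacency-dict representation and the function name are assumptions made for the sketch.

```python
def bfs_levels(adjacency, source):
    """Return {vertex: depth} for every vertex reachable from `source`."""
    depth = {source: 0}
    frontier = [source]
    while frontier:
        next_frontier = []
        # On a GPU the vertices of one frontier would be expanded in parallel;
        # here they are simply visited in order.
        for u in frontier:
            for v in adjacency.get(u, ()):
                if v not in depth:
                    depth[v] = depth[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return depth
```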
How to Share Data with a Statistician — some instruction on the best way to share data to avoid the most common pitfalls and sources of delay in the transition from data collection to data analysis.
Bazel — a build tool, i.e. a tool that will run compilers and tests to assemble your software, similar to Make, Ant, Gradle, Buck, Pants, and Maven. Google’s build tool, to be precise.
Duplicate SSH Keys Everywhere — It looks like all devices with the fingerprint are Dropbear SSH instances that have been deployed by Telefonica de Espana. It appears that some of their networking equipment comes set up with SSH by default, and the manufacturer decided to reuse the same operating system image across all devices.
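Spotting this kind of reuse amounts to grouping hosts by key fingerprint. The following is a minimal sketch under stated assumptions: the host-to-key mapping is hypothetical input (for example, collected from a scan), keys are given as base64-encoded public key blobs, and the colon-separated MD5 form is used because that was the customary fingerprint display at the time. It is not the method used in the linked post.

```python
import base64
import hashlib
from collections import defaultdict


def md5_fingerprint(key_b64):
    """Colon-separated MD5 fingerprint of a base64-encoded public key blob."""
    digest = hashlib.md5(base64.b64decode(key_b64)).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def hosts_sharing_keys(host_keys):
    """Map fingerprint -> sorted hosts, keeping only fingerprints seen on 2+ hosts."""
    groups = defaultdict(list)
    for host, key_b64 in host_keys.items():
        groups[md5_fingerprint(key_b64)].append(host)
    return {fp: sorted(hosts) for fp, hosts in groups.items() if len(hosts) > 1}
```

Any fingerprint that survives the filter identifies a private key shared by more than one device.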
Style.ONS — UK govt style guide covers the elements of writing about statistics. It aims to make statistical content more open and understandable, based on editorial research and best practice. (via Hadley Beeman)
Warren Ellis on the Apple Watch — I, personally, want to put a gold chain on my phone, pop it into a waistcoat pocket, and refer to it as my “digital fob watch” whenever I check the time on it. Just to make the point in as snotty and high-handed a way as possible: This is the decadent end of the current innovation cycle, the part where people stop having new ideas and start adding filigree and extra orifices to the stuff we’ve got and call it the future.
Clustering Bitcoin Accounts Using Heuristics (O’Reilly Radar) — In theory, a user can go by many different pseudonyms. If that user is careful and keeps the activity of those different pseudonyms separate, completely distinct from one another, then they can really maintain a level of, maybe not anonymity, but again, cryptographically it’s called pseudo-anonymity. […] It turns out in reality, though, the way most users and services are using bitcoin, was really not following any of the guidelines that you would need to follow in order to achieve this notion of pseudo-anonymity. So, basically, what we were able to do is develop certain heuristics for clustering together different public keys, or different pseudonyms.
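The excerpt does not say which heuristics were used. As an illustration only, one heuristic widely described in the literature treats all addresses that appear together as inputs to a single transaction as belonging to the same user. The sketch below applies that assumption with a union-find structure; the input format (one list of input addresses per transaction) and the function name are hypothetical.

```python
def cluster_by_shared_inputs(transactions):
    """Group addresses that co-occur as inputs of a transaction (union-find)."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        for address in inputs:
            find(address)  # register every address, even singletons
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = {}
    for address in parent:
        clusters.setdefault(find(address), set()).add(address)
    return sorted(sorted(members) for members in clusters.values())
```

Because clusters merge transitively, a single careless transaction that spends from two otherwise separate pseudonyms is enough to link everything associated with either one.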
A Primer on Hardware Security: Models, Methods, and Metrics (PDF) — Camouflaging: This is a layout-level technique to hamper image-processing-based extraction of gate-level netlist. In one embodiment of camouflaging, the layouts of standard cells are designed to look alike, resulting in incorrect extraction of the netlist. The layout of nand cell and the layout of nor cell look different and hence their functionality can be extracted. However, the layout of a camouflaged nand cell and the layout of camouflaged nor cell can be made to look identical and hence an attacker cannot unambiguously extract their functionality.
Prompter: A Domain-Specific Language for Versu (PDF) — literally a scripting language (you write theatrical-style scripts, characters, dialogues, and events) for an inference engine that lets you talk to characters and have a different story play out each time.
The Delusions of Big Data (IEEE) — When you have large amounts of data, your appetite for hypotheses tends to get even larger. And if it’s growing faster than the statistical strength of the data, then many of your inferences are likely to be false. They are likely to be white noise.
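The arithmetic behind that warning is short. As a simplified sketch (assuming every tested hypothesis is actually null and the tests are independent, which real data rarely satisfies exactly), the expected number of false positives grows linearly with the number of hypotheses, and the chance of at least one approaches certainty. The function names are illustrative.

```python
def expected_false_positives(num_hypotheses, alpha=0.05):
    """Expected count of false rejections when every null hypothesis is true."""
    return num_hypotheses * alpha


def prob_at_least_one_false_positive(num_hypotheses, alpha=0.05):
    """Chance of one or more false rejections across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_hypotheses
```

With the default α of 0.05, testing 100 such hypotheses gives an expected 5 spurious "findings" and a better than 99% chance of at least one.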
ROSCon 2014 — slides and videos of talks from the open source robotics conference held in Chicago.
Making Sure Crypto Stays Insecure (PDF) — Daniel J. Bernstein talk: This talk is actually a thought experiment: how could an attacker manipulate the ecosystem for insecurity?
Material Design Icons — Google’s CC-licensed (attribution, sharealike) collection of sweet, straightforward icons.
Guidance Note on Uncertainty (PDF) — expert advice to IPCC scientists on identifying, quantifying, and communicating uncertainty. Everyone deals with uncertainty, but none are quite so ruthless in their pursuit of honesty about it as scientists. (via Peter Gluckman)
SparkFun Rapid Prototyping Lab — with links to some other expert advice on creative spaces. Some very obvious software parallels, too. E.g., this from Adam Savage’s advice: The right tool for the job – Despite his oft-cited declaration that ‘every tool is a hammer,’ Adam can usually be relied on to geek-out about purpose-built tools. If you’re having trouble learning a new skill, check that you’re using the right tools. The right tool is the one that does the hard work for you. There’s no point in dropping big bucks on tools you’re almost certainly not going to use, but don’t be afraid to buy the cheap version of the snap-setter, or leather punch, or tamper bit before trying to jerry-rig something that will end up making your life harder.
Dudes with Drones (The Atlantic) — ghastly title (“Bros with Bots”, “Bangers with Clangers”, and “Fratboys with Phat Toys” were presumably already taken), interesting article. San Diego is the Palo Alto of drones. Interesting to compare software startups with the hardware crews’ stance on the FAA. “We want them to regulate us,” Maloney says. “We want nothing more than a framework to allow us to continue to operate safely and legally.”
Hidden Biases in Big Data — with every big data set, we need to ask which people are excluded. Which places are less visible? What happens if you live in the shadow of big data sets? (via Quinn Norton)
CoreObject — a version-controlled object database for Objective-C that supports powerful undo, semantic merging, and real-time collaborative editing.
101 Uses for Content Mining — between the list in the post and the comments from readers, it’s a good introduction to some of the value to be obtained from full-text structured and unstructured access to scientific research publications.
Everything is Broken (Quinn Norton) — Computers have gotten incredibly complex, while people have remained the same gray mud with pretensions of Godhood. Today’s required read, because everything is broken and it’s the defining characteristic of this age of software. We have built computers in our image: our cancerous STD-addled diabetic alcoholic lead-sniffing telomere-decaying bacteria- and virus-addled image.
Ford Invites Open-Source Community to Tinker Away — One example: Nelson has re-tasked the motor from a Microsoft Xbox 360 game controller to create an OpenXC shift knob that vibrates to signal gear shifts in a standard-transmission Mustang. The 3D-printed prototype shift knob uses Ford’s OpenXC research platform to link devices to the car via Bluetooth, and shares vehicle data from the on-board diagnostics port. Nelson has tested his prototype in a Ford Mustang Shelby GT500 that vibrates at the optimal time to shift.
Cost-Efficient Continuous Integration at Mozilla — CI on a big project can imply hundreds if not thousands of VMs on Amazon spinning up to handle compiles and tests. This blog post talks about Mozilla’s efforts to reduce its CI-induced spend without reducing the effectiveness of its CI practices.