The Delusions of Big Data (IEEE) — When you have large amounts of data, your appetite for hypotheses tends to get even larger. And if it’s growing faster than the statistical strength of the data, then many of your inferences are likely to be false. They are likely to be white noise.
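The point about hypotheses outgrowing the data can be seen in a small simulation. The sketch below is an illustration, not anything from the IEEE interview: it tests many hypotheses against pure noise and counts how many look "significant". The function names, the z-test with known unit variance, and the Bonferroni correction are all assumptions chosen to keep the example short.

```python
import math
import random


def two_sided_p_value(sample):
    """p-value for H0: mean == 0, assuming known unit variance (z-test)."""
    n = len(sample)
    z = sum(sample) / n * math.sqrt(n)
    # Two-sided standard normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_positives(num_hypotheses=1000, sample_size=50, alpha=0.05, seed=0):
    """Test many hypotheses on pure noise; every 'discovery' is false."""
    rng = random.Random(seed)
    p_values = [
        two_sided_p_value([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
        for _ in range(num_hypotheses)
    ]
    # Naive threshold: each test is judged on its own at level alpha.
    uncorrected = sum(p < alpha for p in p_values)
    # Bonferroni: divide alpha by the number of hypotheses tested.
    bonferroni = sum(p < alpha / num_hypotheses for p in p_values)
    return uncorrected, bonferroni


if __name__ == "__main__":
    naive, corrected = count_false_positives()
    print(f"false positives without correction: {naive}")
    print(f"false positives with Bonferroni:    {corrected}")
```

With 1,000 hypotheses and a 0.05 threshold, about 50 noise-only "findings" are expected before any correction. The corrected threshold removes almost all of them, at the cost of statistical power.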
ROSCon 2014 — slides and videos of talks from the open source robotics conference in Chicago.
Making Sure Crypto Stays Insecure (PDF) — Daniel J. Bernstein talk: This talk is actually a thought experiment: how could an attacker manipulate the ecosystem for insecurity?
Material Design Icons — Google’s CC-licensed (attribution, sharealike) collection of sweet, straightforward icons.
Guidance Note on Uncertainty (PDF) — expert advice to IPCC scientists on identifying, quantifying, and communicating uncertainty. Everyone deals with uncertainty, but none are quite so ruthless in their pursuit of honesty about it as scientists. (via Peter Gluckman)
SparkFun Rapid Prototyping Lab — with links to some other expert advice on creative spaces. Some very obvious software parallels, too. E.g., this from Adam Savage’s advice: The right tool for the job – Despite his oft-cited declaration that ‘every tool is a hammer,’ Adam can usually be relied on to geek-out about purpose-built tools. If you’re having trouble learning a new skill, check that you’re using the right tools. The right tool is the one that does the hard work for you. There’s no point in dropping big bucks on tools you’re almost certainly not going to use, but don’t be afraid to buy the cheap version of the snap-setter, or leather punch, or tamper bit before trying to jerry-rig something that will end up making your life harder.
Dudes with Drones (The Atlantic) — ghastly title (“Bros with Bots”, “Bangers with Clangers”, and “Fratboys with Phat Toys” were presumably already taken), interesting article. San Diego is the Palo Alto of drones. Interesting to compare software startups with the hardware crews’ stance on the FAA. “We want them to regulate us,” Maloney says. “We want nothing more than a framework to allow us to continue to operate safely and legally.”
Hidden Biases in Big Data — with every big data set, we need to ask which people are excluded. Which places are less visible? What happens if you live in the shadow of big data sets? (via Quinn Norton)
CoreObject — a version-controlled object database for Objective-C that supports powerful undo, semantic merging, and real-time collaborative editing.
101 Uses for Content Mining — between the list in the post and the comments from readers, it’s a good introduction to some of the value to be obtained from full-text structured and unstructured access to scientific research publications.
Everything is Broken (Quinn Norton) — Computers have gotten incredibly complex, while people have remained the same gray mud with pretensions of Godhood. Today’s required read, because everything is broken and it’s the defining characteristic of this age of software. We have built computers in our image: our cancerous STD-addled diabetic alcoholic lead-sniffing telomere-decaying bacteria- and virus-addled image.
Ford Invites Open-Source Community to Tinker Away — One example: Nelson has re-tasked the motor from a Microsoft Xbox 360 game controller to create an OpenXC shift knob that vibrates to signal gear shifts in a standard-transmission Mustang. The 3D-printed prototype shift knob uses Ford’s OpenXC research platform to link devices to the car via Bluetooth, and shares vehicle data from the on-board diagnostics port. Nelson has tested his prototype in a Ford Mustang Shelby GT500 that vibrates at the optimal time to shift.
Cost-Efficient Continuous Integration at Mozilla — CI on a big project can imply hundreds if not thousands of VMs on Amazon spinning up to handle compiles and tests. This blog post talks about Mozilla’s efforts to reduce its CI-induced spend without reducing the effectiveness of its CI practices.
Understanding Understanding Source Code with Functional Magnetic Resonance Imaging (PDF) — we observed 17 participants inside an fMRI scanner while they were comprehending short source-code snippets, which we contrasted with locating syntax errors. We found a clear, distinct activation pattern of five brain regions, which are related to working memory, attention, and language processing. I’m wary of fMRI studies but welcome more studies that try to identify what we do when we code. (Or, in this case, identify syntax errors—if they wanted to observe real programming, they’d watch subjects creating syntax errors.) (via Slashdot)
Oobleck Security (O’Reilly Radar) — if you missed or skimmed this, go back and reread it. The future will be defined by the objects that turn on us. 50s scifi was so close but instead of human-shaped positronic robots, it’ll be our cars, HVAC systems, light bulbs, and TVs. Reminds me of the excellent Old Paint by Megan Lindholm.
Most Winning A/B Test Results are Illusory (PDF) — Statisticians have known for almost a hundred years how to ensure that experimenters don’t get misled by their experiments [...] I’ll show how these methods ensure equally robust results when applied to A/B testing.
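One long-established safeguard of this kind is to fix the sample size before the test starts, using a power calculation, rather than stopping as soon as a result looks good. The sketch below assumes the standard normal-approximation formula for comparing two conversion rates. It is not taken from the paper, and the function name and default values are illustrative.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, variant_rate, alpha=0.05, power=0.8):
    """Visitors needed in each arm to detect the given difference in rates.

    Uses the normal approximation for a two-sided, two-proportion test.
    """
    if baseline_rate == variant_rate:
        raise ValueError("rates must differ; a zero effect is undetectable")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = (baseline_rate * (1 - baseline_rate)
                + variant_rate * (1 - variant_rate))
    effect = variant_rate - baseline_rate
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Hypothetical example: 10% baseline conversion, hoping to detect 12%.
    print(sample_size_per_arm(0.10, 0.12))
```

For a lift from 10% to 12%, this comes to a few thousand visitors per arm. Halving the lift roughly quadruples the requirement, which is why small tests that are stopped early so often produce winners that later vanish.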
tooldiag — a collection of methods for statistical pattern recognition. Implemented in C.
Hacking MicroSD Cards (Bunnie Huang) — In my explorations of the electronics markets in China, I’ve seen shop keepers burning firmware on cards that “expand” the capacity of the card — in other words, they load a firmware that reports the capacity of a card is much larger than the actual available storage. The fact that this is possible at the point of sale means that most likely, the update mechanism is not secured. MicroSD cards come with embedded microcontrollers whose firmware can be exploited.
30c3 — recordings from the 30th Chaos Communication Congress.
Reform Government Surveillance — hard not to view this as a demarcation dispute. “Ruthlessly collecting every detail of online behaviour is something we do clandestinely for advertising purposes, it shouldn’t be corrupted because of your obsession over national security!”
BayesDB — lets users query the probable implications of their data as easily as a SQL database lets them query the data itself. Using the built-in Bayesian Query Language (BQL), users with no statistics training can solve basic data science problems, such as detecting predictive relationships between variables, inferring missing values, simulating probable observations, and identifying statistically similar database entries. Open source.