- The Care and Feeding of Weird Machines Found in Executable Metadata (YouTube) — talk from 29th Chaos Communication Congress, on using tricking the ELF linker/loader into arbitrary computation from the metadata supplied. Yes, there’s a brainfuck compiler that turns code into metadata which is then, through a supernatural mix of pixies, steam engines, and binary, executed. This will make your brain leak. Weird machines are everywhere.
- European Libraries May Digitise Books Without Permission — “The right of libraries to communicate, by dedicated terminals, the works they hold in their collections would risk being rendered largely meaningless, or indeed ineffective, if they did not have an ancillary right to digitize the works in question,” the court said. Even if the rights holder offers a library the possibility of licensing his works on appropriate terms, the library can use the exception to publish works on electronic terminals, the court ruled. “Otherwise, the library could not realize its core mission or promote the public interest in promoting research and private study,” it said.
- CausalImpact (GitHub) — Google’s R package for estimating the causal effect of a designed intervention on a time series. (via Google Open Source Blog)
- Laws of Crappy Dashboards — (caution, NSFW language … “crappy” is my paraphrase) so true. Not talking to users will result in a [crappy] dashboard. You don’t know if the dashboard is going to be useful. But you don’t talk to the users to figure it out. Or you just show it to them for a minute (with someone else’s data), never giving them a chance to figure out what the hell they could do with it if you gave it to them.
Trina Chiasson argues that data has arrived at the same threshold as coding: code or be coded; learn to use data or be data.
Arguments from all sides have surrounded the question of whether or not everyone should learn to code. Trina Chiasson, co-founder and CEO of Infoactive, says learning to code changed her life for the better. “These days I don’t spend a lot of time writing code,” she says, “but it’s incredibly helpful for me to be able to communicate with our engineers and communicate with other people in the industry.”
Though helpful for her personally, she admits that it takes quite a lot of time and commitment to learn to code to any level of proficiency, and that it might not be the best use of time for everyone. What should people commit time to learn? How to use data. Read more…
A new operator from the magrittr package makes it easier to use R for data analysis.
In every data analysis, you have to string together many tools. You need tools for data wrangling, visualisation, and modelling to understand what’s going on in your data. To use these tools effectively, you need to be able to easily flow from one tool to the next, focusing on asking and answering questions of the data, not struggling to jam the output from one function into the format needed for the next. Wouldn’t it be nice if the world worked this way! I spend a lot of my time thinking about this problem, and how to make the process of data analysis as fast, effective, and expressive as possible. Today, I want to show you a new technique that I’m particularly excited about.
R, at its heart, is a functional programming language: you do data analysis in R by composing functions. However, the problem with function composition is that a lot of it makes for hard-to-read code. For example, here’s some R code that wrangles flight delay data from New York City in 2013. What does it do? Read more…
D3 doesn’t stand for data-design dictator
Designers and developers making data visualizations on the web are buzzing about d3.js. But why? Read more…
- SAMOA — Yahoo!’s distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. (via Introducing SAMOA)
- madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
- Data Portraits: Connecting People of Opposing Views — Yahoo! Labs research to break the filter bubble. Connect people who disagree on issue X (e.g., abortion) but who agree on issue Y (e.g., Latin American interventionism), and present the differences and similarities visually (they used wordclouds). Our results suggest that organic visualisation may revert the negative effects of providing potentially sensitive content. (via MIT Technology Review)
- Disguise Detection — using Raspberry Pi, Arduino, and Python.
3D Fossils, Changing Drone Uses, High Scalability, and Sim Redux
- CT Scanning and 3D Printing for Paleo (Scientific American) — using CT scanners to identify bones still in rock, then using 3D printers to recreate them. (via BoingBoing)
- Growing the Use of Drones in Agriculture (Forbes) — According to Sue Rosenstock, 3D Robotics spokesperson, a third of their customers consist of hobbyists, another third of enterprise users, and a third use their drones as consumer tools. “Over time, we expect that to change as we make more enterprise-focused products, such as mapping applications,” she explains. (via Chris Anderson)
- Serving 1M Load-Balanced Requests/Second (Google Cloud Platform blog) — 7m from empty project to serving 1M requests/second. I remember when 1 request/second was considered insanely busy. (via Forbes)
- Boil Up — behind the scenes for the design and coding of a real-time simulation for a museum’s science exhibit. (via Courtney Johnston)
Disk Over Ethernet, Inside Elite, Polar Charts, and R Videos
- Seagate Kinetic Storage — In the words of Geoff Arnold: The physical interconnect to the disk drive is now Ethernet. The interface is a simple key-value object oriented access scheme, implemented using Google Protocol Buffers. It supports key-based CRUD (create, read, update and delete); it also implements third-party transfers (“transfer the objects with keys X, Y and Z to the drive with IP address 22.214.171.124”). Configuration is based on DHCP, and everything can be authenticated and encrypted. The system supports a variety of key schemas to make it easy for various storage services to shard the data across multiple drives.
- Masters of Their Universe (Guardian) — well-written and fascinating story of the creation of the Elite game (one founder of which went on to make the Raspberry Pi). The classic action game of the early 1980s – Defender, Pac Man – was set in a perpetual present tense, a sort of arcade Eden in which there were always enemies to zap or gobble, but nothing ever changed apart from the score. By letting the player tool up with better guns, Bell and Braben were introducing a whole new dimension, the dimension of time.
- Micropolar (github) — A tiny polar charts library made with D3.js.
- Introduction to R (YouTube) — 21 short videos from Google.
Visual Arduino Coding, Hardware Iteration, Segmenting Images, and Client-Side Adjustable Data View
- Visually Programming Arduino — good for little minds.
- Rapid Hardware Iteration at Scale (Forbes) — It’s part of the unique way that Xiaomi operates, closely analyzing the user feedback it gets on its smartphones and following the suggestions it likes for the next batch of 100,000 phones. It releases them every Tuesday at noon Beijing time.
- Machine Learning of Hierarchical Clustering to Segment 2D and 3D Images (PLoS One) — We propose an active learning approach for performing hierarchical agglomerative segmentation from superpixels. Our method combines multiple features at all scales of the agglomerative process, works for data with an arbitrary number of dimensions, and scales to very large datasets.
- Kratu — an Open Source client-side analysis framework to create simple yet powerful renditions of data. It allows you to dynamically adjust your view of the data to highlight issues, opportunities and correlations in the data.