- SAMOA — Yahoo!’s distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. (via Introducing SAMOA)
- madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
- Data Portraits: Connecting People of Opposing Views — Yahoo! Labs research to break the filter bubble. Connect people who disagree on issue X (e.g., abortion) but who agree on issue Y (e.g., Latin American interventionism), and present the differences and similarities visually (they used wordclouds). Our results suggest that organic visualisation may revert the negative effects of providing potentially sensitive content. (via MIT Technology Review)
- Disguise Detection — using Raspberry Pi, Arduino, and Python.
3D Fossils, Changing Drone Uses, High Scalability, and Sim Redux
- CT Scanning and 3D Printing for Paleo (Scientific American) — using CT scanners to identify bones still in rock, then using 3D printers to recreate them. (via BoingBoing)
- Growing the Use of Drones in Agriculture (Forbes) — According to Sue Rosenstock, 3D Robotics spokesperson, a third of their customers consist of hobbyists, another third of enterprise users, and a third use their drones as consumer tools. “Over time, we expect that to change as we make more enterprise-focused products, such as mapping applications,” she explains. (via Chris Anderson)
- Serving 1M Load-Balanced Requests/Second (Google Cloud Platform blog) — 7m from empty project to serving 1M requests/second. I remember when 1 request/second was considered insanely busy. (via Forbes)
- Boil Up — behind the scenes for the design and coding of a real-time simulation for a museum’s science exhibit. (via Courtney Johnston)
Disk Over Ethernet, Inside Elite, Polar Charts, and R Videos
- Seagate Kinetic Storage — In the words of Geoff Arnold: The physical interconnect to the disk drive is now Ethernet. The interface is a simple key-value object oriented access scheme, implemented using Google Protocol Buffers. It supports key-based CRUD (create, read, update and delete); it also implements third-party transfers (“transfer the objects with keys X, Y and Z to the drive with IP address 126.96.36.199”). Configuration is based on DHCP, and everything can be authenticated and encrypted. The system supports a variety of key schemas to make it easy for various storage services to shard the data across multiple drives.
- Masters of Their Universe (Guardian) — well-written and fascinating story of the creation of the Elite game (one of whose creators went on to make the Raspberry Pi). The classic action game of the early 1980s – Defender, Pac Man – was set in a perpetual present tense, a sort of arcade Eden in which there were always enemies to zap or gobble, but nothing ever changed apart from the score. By letting the player tool up with better guns, Bell and Braben were introducing a whole new dimension, the dimension of time.
- Micropolar (github) — A tiny polar charts library made with D3.js.
- Introduction to R (YouTube) — 21 short videos from Google.
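The key-based CRUD and third-party transfer described in the Seagate Kinetic entry above can be sketched roughly as follows. This is a toy in-memory model, not the Kinetic API: the `Drive` class, its method names, and the `shard_for` helper are all hypothetical, and the real drives speak Protocol Buffers over Ethernet rather than Python method calls. The hash-based sharding is just one possible key schema, and whether a transfer copies or moves objects is an assumption (this sketch copies).

```python
import hashlib


class Drive:
    """Hypothetical in-memory stand-in for one key-value drive."""

    def __init__(self, address):
        self.address = address  # illustrative label, e.g. an address handed out by DHCP
        self._store = {}        # key (bytes) -> value (bytes)

    def create(self, key, value):
        # Create fails if the key is already present.
        if key in self._store:
            raise KeyError(f"key already exists: {key!r}")
        self._store[key] = value

    def read(self, key):
        return self._store[key]

    def update(self, key, value):
        # Update fails if the key is missing.
        if key not in self._store:
            raise KeyError(f"no such key: {key!r}")
        self._store[key] = value

    def delete(self, key):
        del self._store[key]

    def transfer(self, keys, target):
        # Third-party transfer: the client names the keys and the target,
        # and the objects go drive to drive without passing through the client.
        # Assumption: objects are copied, not moved.
        for key in keys:
            target._store[key] = self._store[key]


def shard_for(key, drives):
    """Pick a drive by hashing the key (one possible sharding scheme)."""
    digest = hashlib.sha256(key).digest()
    return drives[int.from_bytes(digest[:8], "big") % len(drives)]


if __name__ == "__main__":
    a, b = Drive("drive-a"), Drive("drive-b")
    a.create(b"X", b"first object")
    a.update(b"X", b"first object, revised")
    a.transfer([b"X"], b)
    print(b.read(b"X"))
    print(shard_for(b"X", [a, b]).address)
```

Hashing the key means any client can work out which drive holds an object without consulting a central directory, which is one reason a storage service might choose that kind of schema.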
Visual Arduino Coding, Hardware Iteration, Segmenting Images, and Client-Side Adjustable Data View
- Visually Programming Arduino — good for little minds.
- Rapid Hardware Iteration at Scale (Forbes) — It’s part of the unique way that Xiaomi operates, closely analyzing the user feedback it gets on its smartphones and following the suggestions it likes for the next batch of 100,000 phones. It releases them every Tuesday at noon Beijing time.
- Machine Learning of Hierarchical Clustering to Segment 2D and 3D Images (PLoS One) — We propose an active learning approach for performing hierarchical agglomerative segmentation from superpixels. Our method combines multiple features at all scales of the agglomerative process, works for data with an arbitrary number of dimensions, and scales to very large datasets.
- Kratu — an Open Source client-side analysis framework to create simple yet powerful renditions of data. It allows you to dynamically adjust your view of the data to highlight issues, opportunities and correlations in the data.
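For readers unfamiliar with the agglomerative segmentation mentioned in the PLoS One entry above, here is a minimal sketch of the basic idea: start from small regions (superpixels) and repeatedly merge the most similar adjacent pair. It is not the paper's method. The paper learns its merge criterion through active learning, whereas this sketch substitutes a fixed rule (smallest difference in mean intensity), and the function and parameter names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def agglomerate(regions, edges, n_final):
    """Greedily merge adjacent regions until n_final remain.

    regions: dict mapping region id -> list of intensities in that region
    edges:   iterable of (id, id) pairs marking adjacent regions
    Returns the merged regions and the merge order (the hierarchy).
    """
    regions = {rid: list(vals) for rid, vals in regions.items()}
    edges = {frozenset(e) for e in edges}
    merges = []
    while len(regions) > n_final and edges:
        # Choose the adjacent pair whose mean intensities differ least;
        # the pair itself breaks ties so the result is deterministic.
        a, b = min(
            (tuple(sorted(e)) for e in edges),
            key=lambda p: (abs(mean(regions[p[0]]) - mean(regions[p[1]])), p),
        )
        regions[a].extend(regions.pop(b))  # b is absorbed into a
        merges.append((a, b))
        # Re-point b's adjacencies at a and drop the collapsed a-b edge.
        rewired = set()
        for e in edges:
            e = frozenset(a if r == b else r for r in e)
            if len(e) == 2:
                rewired.add(e)
        edges = rewired
    return regions, merges


if __name__ == "__main__":
    superpixels = {0: [1.0], 1: [1.1], 2: [5.0], 3: [5.2]}
    neighbours = [(0, 1), (1, 2), (2, 3)]
    segments, order = agglomerate(superpixels, neighbours, n_final=2)
    print(segments)
    print(order)
```

Restricting merges to adjacent regions is what makes this a segmentation rather than a plain clustering: two regions with similar intensity on opposite sides of the image are never joined directly.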
Rich Text Editing, Structural Visualisation, DDoS Protection, Realtime DDoS Map
- Sir Trevor — nice rich-text editing. Interesting how Markdown has become the way to store formatted text without storing HTML (and thus exposing the XSS-inducing HTML-escaping stuckfastrophe).
- Slate for Excel — visualising spreadsheet structure. I’d be surprised if it took MSFT or Goog 30 days to acquire them.
- Project Shield — Google project to protect against DDoSes.
- Digital Attack Map — DDoS attacks going on around the world. (via Jim Stogdill)
No Managers, Bezos Pearls, Visualising History, and Scalable Key-Value Store
- No Managers — If we could find a way to replace the function of the managers and focus everyone on actually producing for our Students (customers) then it would actually be possible to be a #NoManager company. In my future posts I’ll explain how we’re doing this at Treehouse.
- The 20 Smartest Things Jeff Bezos Has Ever Said (Motley Fool) — I feel like the 219th smartest thing Jeff Bezos has ever said is still smarter than the smartest thing most business commentators will ever say. (He says, self-referentially) “Invention requires a long-term willingness to be misunderstood.”
- Putting Time in Perspective — nifty representations of relative timescales and history. (via BoingBoing)
- Sophia — BSD-licensed small C library implementing an embeddable key-value database “for a high-load environment”.
Modern data processing tools, many of them open source, allow more clinical studies at lower costs
This guest posting was written by Yadid Ayzenberg (@YadidAyzenberg on Twitter). Yadid is a PhD student in the Affective Computing Group at the MIT Media Lab. He has designed and implemented cloud platforms for the aggregation, processing and visualization of bio-physiological sensor data. Yadid will speak on this topic at the Strata Rx conference.
A few weeks ago, I learned that the Framingham Heart Study would lose $4 million (a full 40 percent of its funding) from the federal government due to automatic spending cuts. This seminal study, begun in 1948, set out to identify the contributing factors to cardiovascular disease (CVD) by following a group of 5,209 men and women, tracking their lifestyle habits, and performing regular physical examinations and lab tests. This study was responsible for finding the major risk factors for CVD, such as high blood pressure and lack of exercise. The costs associated with such large-scale clinical studies are prohibitive, making them accessible only to organizations with sufficient financial resources or through government funding.
Health data can go beyond the averages and first order patient characteristics to find long-term trends
This article was written with Arijit Sengupta, CEO of BeyondCore. Tim and Arijit will speak at Strata Rx 2013 on the topic of this post.
Current healthcare cost prevention efforts focus on the top 1% of highest risk patients. As care coordination efforts expand to a larger set of the patient population, the critical question is: If you’re a care manager, which patients should you offer additional care to at any given point in time? Our research shows that focusing on patients with the highest risk scores or highest current costs creates suboptimal roadmaps. In this article we share an approach to predict patients whose costs are about to skyrocket, using a hypothesis-free micro-segmentation analysis. From there, working with physicians and care managers, we can formulate appropriate interventions.
Better Tutorials, Self-Talk, Better AI, and Visualised Mechanics
- pineapple.io — attempt to crowdsource rankings for tutorials for important products, so you’re not picking your way through Google search results littered with tutorials written by incompetent illiterates for past versions of the software.
- BBC Forum — American social psychologist Aleks Krotoski has been looking at how the internet affects the way we talk to ourselves. Podcast (available for next 30 days) from BBC. (via Vaughan Bell)
- Why Can’t My Computer Understand Me (New Yorker) — using anaphora as the basis of an intelligence test, as example of what AI should be striving for. It’s not just that contemporary A.I. hasn’t solved these kinds of problems yet; it’s that contemporary A.I. has largely forgotten about them. In Levesque’s view, the field of artificial intelligence has fallen into a trap of “serial silver bulletism,” always looking to the next big thing, whether it’s expert systems or Big Data, but never painstakingly analyzing all of the subtle and deep knowledge that ordinary human beings possess. That’s a gargantuan task— “more like scaling a mountain than shoveling a driveway,” as Levesque writes. But it’s what the field needs to do.
- 507 Mechanical Movements — an old basic engineering textbook, animated. Me gusta.
Aural Viz, SPOF ID, Information Asymmetry, and Support IA
- choir.io explained (Alex Dong) — Sound is the perfect medium for wearable computers to talk back to us. Sound has a dozen properties that we can tune to convey different levels of emotion and intrusiveness. Different sound packs would fit into various contexts.
- Identity Single Point of Failure (Tim Bray) — continuing his excellent series on federated identity. There’s this guy here at Google, Eric Sachs, who’s been doing Identity stuff in the white-hot center of the Internet universe for a lot of years. One of his mantras is “If you’re typing a password into something, unless they have 100+ full-time engineers working on security and abuse and fraud, you should be nervous.” I think he’s right.
- What Does It Really Matter If Companies Are Tracking Us Online? (The Atlantic) — Rather, the failures will come in the form of consumers being systematically charged more than they would have been had less information about that particular consumer. Sometimes, that will mean exploiting people who are not of a particular class, say upcharging men for flowers if a computer recognizes that he’s looking for flowers the day after his anniversary. A summary of Ryan Calo’s paper. (via Slashdot)
- Life Inside Brewster’s Magnificent Contraption (Jason Scott) — I’ve been really busy. Checking my upload statistics, here’s what I’ve added to the Internet Archive: Over 169,000 individual objects, totaling 245 terabytes. You should subscribe and keep them in business. I did.