# "reproducibility" entries

## ResourceMiner: Toppling the Tower of Babel in the lab

### An open source project aims to crowdsource a common language for experimental design.

Contributing author: Tim Gardner

Editor’s note: This post originally appeared on PLOS Tech; it is republished here with permission.

From Gutenberg’s invention of the printing press to the Internet of today, technology has enabled faster communication, and faster communication has accelerated technology development. Today, we can zip photos from a mountaintop in Switzerland back home to San Francisco with hardly a thought, but that wasn’t so trivial just a decade ago. It’s not just selfies that are being sent; it’s also product designs, manufacturing instructions, and research plans — all of it enabled by invisible technical standards (e.g., TCP/IP) and language standards (e.g., English) that allow machines and people to communicate.

But in the laboratory sciences (life, chemical, material, and other disciplines), communication remains inhibited by practices more akin to the oral traditions of a blacksmith shop than the modern Internet. In a typical academic lab, the reference description of an experiment is the long-form narrative in the “Materials and Methods” section of a paper or a book. Similarly, industry researchers depend on basic text documents in the form of Standard Operating Procedures. In both cases, essential details of the materials and protocol for an experiment are typically written somewhere in a long-forgotten, hard-to-interpret lab notebook (paper or electronic). More typically, details are simply left to the experimenter to remember and to the “lab culture” to retain.

At the dawn of science, when a handful of researchers were working on fundamental questions, this may have been good enough. But nowadays this archaic method of protocol record keeping and sharing is so lacking that half of all biomedical studies are estimated to be irreproducible, wasting $28 billion each year of U.S. government funding. With more than$400 billion invested each year in biological and chemical research globally, the full cost of irreproducible research to the public and private sector worldwide could be staggeringly large. Read more…

## Four short links: 20 May 2015

### Robots and Shadow Work, Time Lapse Mining, CS Papers, and Software for Reproducibility

1. Rise of the Robots and Shadow Work (NY Times) — In “Rise of the Robots,” Ford argues that a society based on luxury consumption by a tiny elite is not economically viable. More to the point, it is not biologically viable. Humans, unlike robots, need food, health care and the sense of usefulness often supplied by jobs or other forms of work. Two thought-provoking and related books about the potential futures as a result of technology-driven change.
2. Time Lapse Mining from Internet Photos (PDF) — First, we cluster 86 million photos into landmarks and popular viewpoints. Then, we sort the photos by date and warp each photo onto a common viewpoint. Finally, we stabilize the appearance of the sequence to compensate for lighting effects and minimize flicker. Our resulting time-lapses show diverse changes in the world’s most popular sites, like glaciers shrinking, skyscrapers being constructed, and waterfalls changing course.
3. Git Repository of CS PapersThe intention here is to both provide myself with backups and easy access to papers, while also collecting a repository of links so that people can always find the paper they are looking for. Pull the repo and you’ll never be short of airplane/bedtime reading.
4. Software For Reproducible ScienceThis quality is indeed central to doing science with code. What good is a data analysis pipeline if it crashes when I fiddle with the data? How can I draw conclusions from simulations if I cannot change their parameters? As soon as I need trust in code supporting a scientific finding, I find myself tinkering with its input, and often breaking it. Good scientific code is code that can be reused, that can lead to large-scale experiments validating its underlying assumptions.

## Democratizing biotech research

### The O'Reilly Radar Podcast: DJ Kleinbaum on lab automation, virtual lab services, and tackling the challenges of reproducibility.

The convergence of software and hardware, and the growing ubiquitousness of the Internet of Things is affecting industry across the board, and biotech labs are no exception. For this Radar Podcast episode, I chatted with DJ Kleinbaum, co-founder of Emerald Therapeutics, about lab automation, the launch of Emerald Cloud Laboratory, and the problem of reproducibility.

Subscribe to the O’Reilly Radar Podcast

Kleinbaum and his co-founder Brian Frezza started Emerald Therapeutics to research cures for persistent viral infections. They didn’t set out to spin up a second company, but their efforts to automate their own lab processes proved so fruitful, they decided to launch a virtual lab-as-a-service business, Emerald Cloud Laboratory. Kleinbaum explained:

“When Brian and I started the company right out of graduate school, we had this platform anti-viral technology, which the company is still working on, but because we were two freshly minted nobody Ph.D.s, we were not going to be able to raise the traditional $20 or$30 million that platform plays raise in the biotech space.

“We knew that we had to be much more efficient with the money we were able to raise. Brian and I both have backgrounds in computer science. So, from the beginning, we were trying to automate every experiment that our scientists ran, such that every experiment was just push a button, walk away. It was all done with process automation and robotics. That way, our scientists would be able to be much more efficient than your average bench chemist or biologist at a biotech company.

“After building that system internally for three years, we looked at it and realized that every aspect of a life sciences laboratory had been encapsulated in both hardware and software, and that that was too valuable a tool to just keep internally at Emerald for our own research efforts. Around this time last year, we decided that we wanted to offer that as a service, that other scientists, companies, and researchers could use to run their experiments as well.” Read more…

## Beyond lab folklore and mythology

### What the future of science will look like if we’re bold enough to look beyond centuries-old models.

Editor’s note: this post is part of our ongoing investigation into synthetic biology and bioengineering. For more on these areas, download the latest free edition of BioCoder.

Over the last six months, I’ve had a number of conversations about lab practice. In one, Tim Gardner of Riffyn told me about a gene transformation experiment he did in grad school. As he was new to the lab, he asked two more experienced scientists for their protocol: one said it must be done exactly at 42 C for 45 seconds, the other said exactly 37 C for 90 seconds. When he ran the experiment, Tim discovered that the temperature actually didn’t matter much. A broad range of temperatures and times would work.

In an unrelated conversation, DJ Kleinbaum of Emerald Cloud Lab told me about students who would only use their “lucky machine” in their work. Why, given a choice of lab equipment, did one of two apparently identical machines give “good” results for a some experiment, while the other one didn’t? Nobody knew. Perhaps it is the tubing that connects the machine to the rest of the experiment; perhaps it is some valve somewhere; perhaps it is some quirk of the machine’s calibration.

The more people I talked to, the more stories I heard: labs where the experimental protocols weren’t written down, but were handed down from mentor to student. Labs where there was a shared common knowledge of how to do things, but where that shared culture never made it outside, not even to the lab down the hall. There’s no need to write it down or publish stuff that’s “obvious” or that “everyone knows.” As someone who is more familiar with literature than with biology labs, this behavior was immediately recognizable: we’re in the land of mythology, not science. Each lab has its own ritualized behavior that “works.” Whether it’s protocols, lucky machines, or common knowledge that’s picked up by every student in the lab (but which might not be the same from lab to lab), the process of doing science is an odd mixture of rigor and folklore. Everybody knows that you use 42 C for 45 seconds, but nobody really knows why. It’s just what you do.

Despite all of this, we’ve gotten fairly good at doing science. But to get even better, we have to go beyond mythology and folklore. And getting beyond folklore requires change: changes in how we record data, changes in how we describe experiments, and perhaps most importantly, changes in how we publish results. Read more…

## Four short links: 22 October 2014

### Docker Patterns, Better Research, Streaming Framework, and Data Science Textbook

1. Eight Docker Development Patterns (Vidar Hokstad) — patterns for creating repeatable builds that result in as-static-as-possible server environments.
2. How to Make More Published Research True (PLOSmedicine) — overview of efforts, and research on those efforts, to raise the proportion of published research which is true.
3. Gearpump — Intel’s “actor-driven streaming framework”, initial benchmarks shows that we can process 2 million messages/second (100 bytes per message) with latency around 30ms on a cluster of 4 nodes.
4. Foundations of Data Science (PDF) — These notes are a first draft of a book being written by Hopcroft and Kannan [of Microsoft Research] and in many places are incomplete. However, the notes are in good enough shape to prepare lectures for a modern theoretical course in computer science.

## Four short links: 13 June 2014

### Decentralized Web, Reproducibility Talk, Javascript Microcontroller, and Docker Maturity

1. Mapping the Decentralized Movement (Jon Udell) — the pendulum is about to swing back toward a more distributed Web.
2. John Ioannidis: Reproducible Research, True or False? (YouTube) — his talk at Google. (via Paul Kedrosky)
3. Tessel — a microcontroller that runs Javascript. For those who can’t handle C.</troll>
4. Docker MisconceptionsThis is not impossible and can all be done – several large companies are already using Docker in production, but it’s definitely non-trivial. This will change as the ecosystem around Docker matures (via Flynn, Docker container hosting, etc), but currently if you’re going to attempt using Docker seriously in production, you need to be pretty skilled at systems management and orchestration.