- A Cyber Attack Against Israel Shut Down a Road — The hackers targeted the Tunnels’ camera system which put the roadway into an immediate lockdown mode, shutting it down for twenty minutes. The next day the attackers managed to break in for even longer during the heavy morning rush hour, shutting the entire system for eight hours. Because all that is digital melts into code, and code is an unsolved problem.
- Random Decision Forests (PDF) — “Due to the nature of the algorithm, most Random Decision Forest implementations provide an extraordinary amount of information about the final state of the classifier and how it derived from the training data.” (via Greg Borenstein)
- BITalino — 149 Euro microcontroller board full of physiological sensors: muscles, skin conductivity, light, acceleration, and heartbeat. A platform for healthcare hardware hacking?
- How to Be a Programmer — a braindump from a guru.
The Internot of Things, Explainy Learning, Medical Microcontroller Board, and Coder Sutra
The Internet of Americas, Pharma Pricey, Who's Watching, and Data Mining Course
- Bradley Manning and the Two Americas (Quinn Norton) — The first America built the Internet, but the second America moved onto it. And they both think they own the place now. The best explanation you’ll find for wtf is going on.
- Staggering Cost of Inventing New Drugs (Forbes) — $5BB to develop a new drug; and subject to an inverse-Moore’s law: A 2012 article in Nature Reviews Drug Discovery says the number of drugs invented per billion dollars of R&D invested has been cut in half every nine years for half a century.
- Who’s Watching You — (Tim Bray) threat modelling. Everyone should know this.
- Data Mining with Weka — learn data mining with the popular open source Weka platform.
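The halving claim in the Forbes item above is easier to appreciate with a little arithmetic. The sketch below takes the two figures quoted there (a nine-year halving period, sustained for fifty years) and computes the cumulative decline. The function and constant names are illustrative only, not anything from the article.

```python
# Back-of-the-envelope check of the "inverse Moore's law" claim:
# drugs invented per billion R&D dollars halve every nine years.

HALVING_PERIOD_YEARS = 9   # figure quoted from Nature Reviews Drug Discovery
SPAN_YEARS = 50            # "half a century"


def productivity_ratio(years: float, halving_period: float = HALVING_PERIOD_YEARS) -> float:
    """Fraction of the starting drugs-per-billion-dollars left after `years`."""
    return 0.5 ** (years / halving_period)


if __name__ == "__main__":
    remaining = productivity_ratio(SPAN_YEARS)
    print(f"Fraction of original productivity remaining: {remaining:.4f}")
    print(f"Fold decline over {SPAN_YEARS} years: {1 / remaining:.1f}x")
```

Under those two assumptions the decline compounds to a factor in the mid-forties, which is why a steady nine-year halving adds up to such a large cost per drug.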
Spatial Verbs, Open Source Malaria, Surviving Management, and Paper-like UAV
- Operative Design — A catalogue of spatial verbs. (via Adafruit)
- Open Source Malaria — open science drug discovery.
- Surviving Being (Senior) Tech Management (Kellan Elliott-McCrea) — Perspective is the thin line between a challenging but manageable problem, and chittering balled up in the corner.
- Disposable UAVs Inspired by Paper Planes (DIY Drones) — The first design, modeled after a paper plane, is created from a cellulose sheet that has electronic circuits ink-jet printed directly onto its body. Once the circuits have been laid on the plane’s frame, the craft is exposed to a UV curing process, turning the plane’s body into a flexible circuit board. These circuits are then connected to the plane’s “avionics system”, two elevons attached to the rear of the craft, which give the UAV the ability to steer itself to its destination.
Fit2Cure taps the public's visual skills to match compounds to targets
In the inspiring tradition of Foldit, the game for determining protein shapes, Fit2Cure crowdsources the problem of finding drugs that can cure the many under-researched diseases of developing countries. Fit2Cure appeals to the player’s visual–even physical–sense of the world, and requires much less background knowledge than Foldit.
There are about 7,000 rare diseases, fewer than 5% of which have cures. The number of people currently engaged in making drug discoveries is by no means adequate to study all these diseases. A recent gift to Harvard shows the importance that medical researchers attach to filling the gap. As an alternative approach, abstracting the drug discovery process into a game could empower thousands, if not millions, of people to contribute to this process and make discoveries in diseases that get little attention from scientists or pharmaceutical companies.
The biological concept behind Fit2Cure is that medicines have specific shapes that fit into the proteins of the victim’s biological structures like jig-saw puzzle pieces (but more rounded). Many cures require finding a drug that has the same jig-saw shape and can fit into the target protein molecule, thus preventing it from functioning normally.
How the field of genetics is using data within research and to evaluate researchers
Editor’s note: Earlier this week, Part 1 of this article described Sage Bionetworks, a recent Congress they held, and their way of promoting data sharing through a challenge.
Data sharing is not an unfamiliar practice in genetics. Plenty of cell lines and other data stores are publicly available from such places as the TCGA data set from the National Cancer Institute, Gene Expression Omnibus (GEO), and ArrayExpress (all of which can be accessed through Synapse). So to some extent the current revolution in sharing lies not in the data itself but in critical related areas.
First, many of the data sets are weakened by metadata problems. A Sage programmer told me that the famous TCGA set is enormous but poorly curated. For instance, different data sets in TCGA may refer to the same drug by different names, generic versus brand name. Provenance–a clear description of how the data was collected and prepared for use–is also weak in TCGA.
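To make the naming problem concrete, here is a minimal sketch of the kind of normalization a curator would have to apply before records that mention the same drug can be aggregated. The synonym table, the record layout, and the function names are all assumptions made for illustration; they do not describe TCGA's actual schema or any Sage tooling.

```python
# Hypothetical synonym table mapping brand names to generic names.
# A real curation effort would draw on a maintained drug vocabulary.
BRAND_TO_GENERIC = {
    "gleevec": "imatinib",
    "herceptin": "trastuzumab",
    "tarceva": "erlotinib",
}


def normalize_drug_name(name: str) -> str:
    """Lower-case and trim a drug name, then map a brand name to its generic."""
    key = name.strip().lower()
    return BRAND_TO_GENERIC.get(key, key)


def group_by_drug(records):
    """Group sample ids by normalized drug name.

    `records` is an iterable of dicts with "sample_id" and "drug" keys
    (an invented layout, used only for this example).
    """
    grouped = {}
    for record in records:
        drug = normalize_drug_name(record["drug"])
        grouped.setdefault(drug, []).append(record["sample_id"])
    return grouped


if __name__ == "__main__":
    records = [
        {"sample_id": "S1", "drug": "Gleevec"},
        {"sample_id": "S2", "drug": "imatinib"},
        {"sample_id": "S3", "drug": "Herceptin"},
    ]
    print(group_by_drug(records))
```

Without the normalization step, samples S1 and S2 would land in separate groups even though they were treated with the same compound.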
In contrast, GEO records tend to contain good provenance information (see an example), but only as free-form text, which presents the same barriers to searching and aggregation as free-form text in medical records. Synapse is developing a structured format for presenting provenance based on the W3C’s PROV standard. One researcher told me this was the most promising contribution of Synapse toward the shared use of genetic information.
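The difference between free-form and structured provenance can be sketched in a few lines. The record below is loosely organized around three core PROV concepts (entities, activities, and agents) and three of its relations (`used`, `wasGeneratedBy`, `wasAssociatedWith`). The identifiers, the dictionary layout, and the `inputs_of` helper are invented for this example and should not be read as Synapse's actual format.

```python
# Hypothetical provenance record using PROV-style concepts and relation names.
# The layout is a simplification for illustration, not a standard serialization.
provenance = {
    "entity": {
        "raw_reads": {"type": "sequencing output"},
        "expression_matrix": {"type": "normalized expression data"},
    },
    "activity": {
        "normalization": {"method": "quantile"},
    },
    "agent": {
        "lab_a": {"type": "organization"},
    },
    "used": [
        {"activity": "normalization", "entity": "raw_reads"},
    ],
    "wasGeneratedBy": [
        {"entity": "expression_matrix", "activity": "normalization"},
    ],
    "wasAssociatedWith": [
        {"activity": "normalization", "agent": "lab_a"},
    ],
}


def inputs_of(record, entity_id):
    """Return ids of entities used by the activities that generated `entity_id`."""
    activities = {
        link["activity"]
        for link in record["wasGeneratedBy"]
        if link["entity"] == entity_id
    }
    return sorted(
        link["entity"] for link in record["used"] if link["activity"] in activities
    )


if __name__ == "__main__":
    print(inputs_of(provenance, "expression_matrix"))
```

A question such as "what was this matrix derived from?" becomes a lookup over explicit links, whereas answering it from a free-text description requires a person (or a fragile text parser) to read every record.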
Observations from Sage Congress and collaboration through its challenge
The glowing reports we read of biotech advances almost cause one’s brain to ache. They leave us thinking that medical researchers must command the latest in all technological tools. But the engines of genetic and pharmaceutical innovation are stuttering for lack of one key fuel: data. Here they are left with the equivalent of trying to build skyscrapers with lathes and screwdrivers.
Sage Congress, held this past week in San Francisco, investigated the multiple facets of data in these fields: gene sequences, models for finding pathways, patient behavior and symptoms (known as phenotypic data), and code to process all these inputs. A survey of efforts by the organizers, Sage Bionetworks, and other innovations in genetic data handling can show how genetics resembles and differs from other disciplines.
An intense lesson in code sharing
At last year’s Congress, Sage announced a challenge, together with the DREAM project, intended to galvanize researchers in genetics while showing off the growing capabilities of Sage’s Synapse platform. Synapse ties together a number of data sets in genetics and provides tools for researchers to upload new data, while searching other researchers’ data sets. Its challenge highlighted the industry’s need for better data sharing, and some ways to get there.
In which the question of whether research subjects have any rights to their data is pondered.
The GET (Genomes, Environments and Traits) conference is a confluence of parties interested in the advances being made in human genomes, the measurement of how the environment impacts individuals, and how the two come together to produce traits. Sponsored by the organizers of the Personal Genome Project (PGP) at Harvard, it is a two-day event whose topics range from the appropriate amount of access that patients should have to their genetics data to the ways that Hollywood can be convinced to portray genomics more accurately.
It also is a yearly meeting place for the participants in the Personal Genome Project (one of whom is your humble narrator), people who have agreed to participate in an “open consent” research model. Among other things, this means that PGP participants agree to let their cell lines be used for any purposes (research or commercial). They also acknowledge ahead of time that because their genomes and phenotypic traits are being released publicly, there is a high likelihood that interested parties may be able to identify them from their data. The long-term goal of the PGP is to enroll 100,000 participants and perform whole genome sequencing of their DNA; they currently have nearly 2,300 enrolled participants and have sequenced around 165 genomes.
Patenting Preventing Placebos, Simulating Malaria, Pricing Experiments, and Mining Bitcoin
- Patent on Medical Trial Design to Reduce Placebo Effect — drug companies say these failures are happening not because their drugs are ineffective, but because placebos have recently become more effective in clinical trials. […] The whole idea that placebo effect is getting in the way of producing meaningful results is repugnant, I think, to anyone with scientific training. What’s even more repugnant, however, is that Fava’s group didn’t stop with a mere paper in Psychotherapy and Psychosomatics. They went on to apply for, and obtain, U.S. patents on SPCD. (via Ben Goldacre)
- OpenMalaria (Google Code) — an open source C++ program for simulating malaria epidemiology and the impacts on that epidemiology of interventions against malaria. It is based on microsimulations of Plasmodium falciparum malaria in humans, originally developed for simulating malaria vaccines. (via Victoria Stodden)
- Pricing Experiments You Might Not Know But Can Learn From — compendium of ideas and experiments for pricing.
- Retrominer — mining Bitcoins on a NES. I’m delighted by the conceit, and noticing that Bitcoin is now sufficiently part of the zeitgeist as to feature in playful hacks.
Open Source Cancer Informatics, NPR Framework, Littery Junk, BitTorrent Sync
- Open Source Cancer Informatics Software (NCIP) — we have tackled the main recommendation that came out of our June meeting with open-source thought leaders: Keep it simple. Make barriers to entry as low as possible, and reuse available resources. Specifically, we have adopted a software license that is approved by the Open Source Initiative (OSI) and have begun to migrate the code developed under the cancer Biomedical Informatics Grid® (caBIG®) Program to a public repository. Our goal in taking these steps is to remove as many barriers as possible to community participation in the continuing development of these assets. Awesome! (via John Scott)
- NPR’s Framework for Easy Apps — their three architectural maxims: Servers are for chumps; If it doesn’t work on mobile, it doesn’t work; and Build for use. Refactor for reuse.
- Random Junk in People’s Labs (Reddit) — reminded me of the contents of my “tmp” and “Downloads” and “Documents” directories: unstructured historical crap with no expiration and no current use. (Caution: swearing in the title of the Reddit post) (via Mihalyi Csikszentmihalyi)
- Sync — BitTorrent’s alpha-level tech to “automatically sync files between computers via secure, distributed technology.” Not only is it “slick for alpha” (as one friend described), it’s bloody useful: I know at least one multimillion-dollar project built on their own homegrown implementation of this same idea. (via Jason Ryan)
Medical Data Commons, Verizon Sell You, Doctor Watson, and Weedkilling Drones
- Let’s Pool Our Medical Data (TED) — John Wilbanks (of Science Commons fame) gives a strong talk for creating an open, massive, mine-able database of data about health and genomics from many sources. Money quote: Facebook would never make a change to something as important as an advertising algorithm with a sample size as small as a Phase 3 clinical trial.
- Verizon Sells App Use, Browsing Habits, Location (CNet) — Verizon Wireless has begun selling information about its customers’ geographical locations, app usage, and Web browsing activities, a move that raises privacy questions and could brush up against federal wiretapping law. To Verizon, even when you do pay for it, you’re still the product. Carriers: they’re like graverobbing organ harvesters but without the strict ethical standards.
- IBM Watson About to Launch in Medicine (Fast Company) — This fall, after six months of teaching their treatment guidelines to Watson, the doctors at Sloan-Kettering will begin testing the IBM machine on real patients. […] On the screen, a colorful globe spins. In a few seconds, Watson offers three possible courses of chemotherapy, charted as bars with varying levels of confidence–one choice above 90% and two above 80%. “Watson doesn’t give you the answer,” Kris says. “It gives you a range of answers.” Then it’s up to [the doctor] to make the call. (via Reddit)
- Robot Kills Weeds With 98% Accuracy — During tests, this automated system gathered over a million images as it moved through the fields. Its Computer Vision System was able to detect and segment individual plants – even those that were touching each other – with 98% accuracy.