Report from OpenClinica conference
Although open source has not conquered the lucrative market for electronic health records (EHRs) used by hospital systems and increasingly by doctors, it is making strides in many other important areas of health care. One example is clinical research, as evidenced by OpenClinica in the field of Electronic Data Capture (EDC) and LabKey in data integration. Last week I attended a conference for people who use OpenClinica in their research or want to make their software work with it.
At any one time, hundreds of thousands of clinical trials are going on around the world, many listed on an FDA site. Many are low-budget and would be reduced to using Excel spreadsheets to store data if they didn’t have the Community edition of OpenClinica. Like most companies with open-source products, OpenClinica uses the “open core” model of an open Community edition and proprietary enhancements in an Enterprise edition. There are about 1200 OpenClinica installations around the world, although such estimates are always hard to make for open source projects.
What is Electronic Data Capture? As the technologically archaic name indicates, the concept goes back to the 1970s and refers simply to the storage of data about patients and their clinical trials in a database. It has traditionally been useful for reporting results to funders, audit trails, printing in various formats, and similar tasks in data tracking.
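The audit-trail requirement is the part of EDC that most distinguishes it from an ordinary spreadsheet: every change to a subject's data must be preserved, not overwritten. The following sketch shows the idea in miniature, using SQLite; the table and field names are illustrative and are not OpenClinica's actual schema.

```python
import sqlite3

# Minimal sketch of an EDC-style store: every change to a subject's
# data item is recorded alongside an audit-trail row. Table and field
# names here are illustrative, not OpenClinica's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_data (
    subject_id TEXT, item_name TEXT, value TEXT,
    PRIMARY KEY (subject_id, item_name)
);
CREATE TABLE audit_log (
    subject_id TEXT, item_name TEXT,
    old_value TEXT, new_value TEXT, changed_by TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
""")

def set_item(subject_id, item_name, value, user):
    """Update a data item and append an audit-trail entry recording who changed what."""
    row = conn.execute(
        "SELECT value FROM item_data WHERE subject_id=? AND item_name=?",
        (subject_id, item_name)).fetchone()
    old = row[0] if row else None
    conn.execute(
        "INSERT INTO item_data VALUES (?,?,?) "
        "ON CONFLICT(subject_id, item_name) DO UPDATE SET value=excluded.value",
        (subject_id, item_name, value))
    conn.execute(
        "INSERT INTO audit_log (subject_id, item_name, old_value, new_value, changed_by) "
        "VALUES (?,?,?,?,?)",
        (subject_id, item_name, old, value, user))
    conn.commit()

set_item("S001", "systolic_bp", "120", "investigator_a")
set_item("S001", "systolic_bp", "118", "monitor_b")   # a correction, which stays audited
```

The current value can always be corrected, but the original entry and the identity of each editor survive in the audit log, which is what regulators and funders want to see.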
Report from 2013 Health Privacy Summit
The timing was superb for last week’s Health Privacy Summit, held on June 5 and 6 in Washington, DC. First, it immediately followed the 2000-strong Health Data Forum (Health Datapalooza), where concern for patients’ rights came up repeatedly. Second, scandals about US government spying were breaking out, providing a fitting backdrop for talking about protecting our most sensitive personal information–our health data.
The health privacy summit, now in its third year, provides a crucial spotlight on the worries patients and their doctors have about their data. Did you know that two out of three doctors (and probably more–this statistic counts only the ones who admit to it on a survey) have left data out of a patient’s record upon the patient’s request? I have found that the summit offers the most sophisticated and realistic assessment of data protection in health care available, which is why I look forward to it each year. (I’m also on the planning committee for the summit.) For instance, it took a harder look than most observers at how health care would be affected by patient access to data, and at the practice of sharing selected subsets of data, called segmentation.
What effect would patient access have?
An odd perceptual discontinuity exists around patient access to health records. If you go to your doctor and ask to see your records, chances are you will be turned down outright or forced to go through expensive and frustrating rituals. One wouldn’t know that HIPAA has long explicitly required doctors to give patients their data, or that the most recent meaningful use rules from the Department of Health and Human Services require doctors to let patients view, download, and transmit their information within four business days of its addition to the record.
Report from the Health Data Forum
Computing practices that used to be relegated to experimental outposts are now taking up residence at the center of the health care field. From natural language processing to machine learning to predictive modeling, you could see people at the Health Data Forum (Health Datapalooza IV) promising to do it all in production environments.
We need to provide data to patients in a form they can understand
Would you take a morning off from work to discuss health care costs and consumer empowerment in health care? Over a hundred people in the Boston area did so on Monday, May 6, for the conference “Empowering Healthcare Consumers: A Community Conversation Conference” at the Suffolk Law School. This fast-paced and wide-ranging conference lasted just long enough to show that hopes of empowering patients and cutting health care costs (which was the real agenda of most of the conference organizers) run up against formidable hurdles–many involving the provision of data to these consumers.
Review of Mayer-Schönberger and Cukier's Big Data
Measuring a world-shaking trend with feet planted in every area of human endeavor cannot be achieved in a popular book of 200 pages, but one has to start somewhere. I am happy to recommend the adept efforts of Viktor Mayer-Schönberger and Kenneth Cukier as a starting point. Their recent book Big Data: A Revolution That Will Transform How We Live, Work, and Think (recently featured in a video interview on the O’Reilly Strata site) does not quite unravel the mystery of the zeal for recording and measurement that is taking over governments and business, but it does what a good popularization should: alert us to what’s happening, provide some frameworks for talking about it, and provide a launchpad for us to debate the movement’s good and evil.
Because readers of this blog have been grappling with these concerns for some time, I’ll provide the barest summary of topics covered in Mayer-Schönberger and Cukier’s extensive overview, then offer some complementary ideas of my own.
Talks with the Association for Computing Machinery, Open Technology Institute, and Open Source Initiative
Taking advantage of a recent trip to Washington, DC, I had the privilege of visiting three non-profit organizations who are leaders in the application of computers to changing society. First, I attended the annual meeting of the Association for Computing Machinery’s US Public Policy Council (USACM). Several members of the council then visited the Open Technology Institute (OTI), which is a section of New America Foundation (NAF). Finally, I caught the end of the first general-attendance meeting of the Open Source Initiative (OSI).
In different ways, these organizations are all putting in tremendous effort to provide the benefits of computing to more people of all walks of life and to preserve the vigor and creativity of computing platforms. I found out through my meetings what sorts of systemic change are required to achieve these goals and saw these organizations grapple with a variety of strategies to get there. This report is not a statement from any of these groups, just my personal observations.
Quality and security drive adoption, but community is rising fast
I recently talked to two managers of Black Duck, the first company formed to help organizations deal with the licensing issues involved in adopting open source software. With Tim Yeaton, President and CEO, and Peter Vescuso, Executive Vice President of Marketing and Business Development, I discussed the seventh Future of Open Source survey, from which I’ll post a few interesting insights later. But you can look at the slides for yourself, so this article will focus instead on some of the topics we talked about in our interview. While I cite some ideas from Yeaton and Vescuso, many of the observations below are purely my own.
The spur to collaboration
One theme in the slides is the formation of consortia that develop software for entire industries. One recent example everybody knows about is OpenStack, but many industries have their own impressive collaboration projects, such as GENIVI in the auto industry.
What brings competitors together to collaborate? In the case of GENIVI, it’s the impossibility of any single company meeting consumer demand through its own efforts. Car companies typically take five years to put a design out to market, but customers are used to product releases more like those of cell phones, where you can find something enticingly new every six months. In addition, the range of useful technologies—Bluetooth, etc.—is so big that a company has to become expert at everything at once. Meanwhile, according to Vescuso, the average high-end car contains more than 100 million lines of code. So the pace and complexity of progress is driving the auto industry to work together.
All too often, the main force uniting competitors is the fear of another vendor and the realization that they can never beat a dominant vendor on its own turf. Open source becomes a way of changing the rules out from under the dominant player. OpenStack, for instance, took on VMware in the virtualization space and Amazon.com in the IaaS space. Android attracted phone manufacturers and telephone companies as a reaction to the iPhone.
A valuable lesson can be learned from the history of the Open Software Foundation, which was formed in reaction to an agreement between Sun and AT&T. In the late 1980s, Sun had become the dominant vendor of Unix, which was still being maintained by AT&T. Their combination panicked vendors such as Digital Equipment Corporation and Apollo Computer (you can already get a sense of how much good OSF did them), who promised to create a single, unified standard that would give customers increased functionality and more competition.
The name Open Software Foundation was deceptive, because it was never open. Instead, it was a shared repository into which various companies dumped bad code so they could cynically claim to be interoperable while continuing to compete against each other in the usual way. It soon ceased to exist in its planned form, but did survive in a fashion by merging with X/Open to become the Open Group, an organization of some significance because it maintains the X Window System. Various flavors of BSD failed to dislodge the proprietary Unix vendors, probably because each BSD team did its work in a fairly traditional, closed fashion. It remained up to Linux, a truly open project, to unify the Unix community and ultimately replace the closed Sun/AT&T partnership.
Collaboration can be driven by many things, therefore, but it usually takes place in one of two fashions. In the first, somebody throws out into the field some open source code that everybody likes, as Rackspace and NASA did to launch OpenStack, or IBM did to launch Eclipse. Less common is the GENIVI model, in which companies realize they need to collaborate to compete and then start a project.
A bigger pie for all
The first thing on most companies’ minds when they adopt open source is to improve interoperability and defend themselves against lock-in by vendors. The Future of Open Source survey indicates that the top reasons for choosing open source are its quality (slide 13) and security (slide 15). This is excellent news because it shows that the misconceptions about open source are shattering, and the arguments by proprietary vendors that they can ensure better quality and security will increasingly be seen as hollow.
Fit2Cure taps the public's visual skills to match compounds to targets
In the inspiring tradition of Foldit, the game for determining protein shapes, Fit2Cure crowdsources the problem of finding drugs that can cure the many under-researched diseases of developing countries. Fit2Cure appeals to the player’s visual–even physical–sense of the world, and requires much less background knowledge than Foldit.
There are about 7,000 rare diseases, fewer than 5% of which have cures. The number of people currently engaged in making drug discoveries is by no means adequate to study all these diseases. A recent gift to Harvard shows the importance that medical researchers attach to filling the gap. As an alternative approach, abstracting the drug discovery process into a game could empower thousands, if not millions, of people to contribute to this process and make discoveries in diseases that get little attention from scientists or pharmaceutical companies.
The biological concept behind Fit2Cure is that medicines have specific shapes that fit into the proteins of the victim’s biological structures like jigsaw puzzle pieces (but more rounded). Many cures require finding a drug whose shape fits into the target protein molecule, thus preventing it from functioning normally.
How the field of genetics is using data within research and to evaluate researchers
Editor’s note: Earlier this week, Part 1 of this article described Sage Bionetworks, a recent Congress they held, and their way of promoting data sharing through a challenge.
Data sharing is not an unfamiliar practice in genetics. Plenty of cell lines and other data stores are publicly available from such places as the TCGA data set from the National Cancer Institute, Gene Expression Omnibus (GEO), and ArrayExpress (all of which can be accessed through Synapse). So to some extent the current revolution in sharing lies not in the data itself but in critical related areas.
First, many of the data sets are weakened by metadata problems. A Sage programmer told me that the famous TCGA set is enormous but poorly curated. For instance, different data sets in TCGA may refer to the same drug by different names, generic versus brand name. Provenance–a clear description of how the data was collected and prepared for use–is also weak in TCGA.
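The drug-name problem is a concrete example of how a small curation step makes a big difference when aggregating data sets. A minimal sketch of the kind of normalization TCGA records need might look like the following; the synonym table is hypothetical and would in practice be far larger, though the brand/generic pairs shown (Taxol/paclitaxel, Adriamycin/doxorubicin) are real ones.

```python
# Illustrative sketch of drug-name curation: map brand names and
# spelling variants to one canonical generic name before aggregating
# records across data sets. The synonym table here is hypothetical.
DRUG_SYNONYMS = {
    "taxol": "paclitaxel",
    "paclitaxel": "paclitaxel",
    "adriamycin": "doxorubicin",
    "doxorubicin": "doxorubicin",
}

def canonical_drug(name):
    """Return the canonical generic name, or the cleaned input if unknown."""
    key = name.strip().lower()
    return DRUG_SYNONYMS.get(key, key)

# Three raw records referring to drugs by mixed brand and generic names
records = [{"drug": "Taxol"}, {"drug": "paclitaxel "}, {"drug": "Adriamycin"}]
normalized = {canonical_drug(r["drug"]) for r in records}
# After normalization, only two distinct agents remain
```

Without such a mapping, a query for all patients who received paclitaxel would silently miss every record entered under the brand name.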
In contrast, GEO records tend to contain good provenance information (see an example), but only as free-form text, which presents the same barriers to searching and aggregation as free-form text in medical records. Synapse is developing a structured format for presenting provenance based on the W3C’s PROV standard. One researcher told me this was the most promising contribution of Synapse toward the shared use of genetic information.
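To give a feel for what structured provenance buys you over free-form text, here is a minimal record in the spirit of the W3C PROV model, which describes entities (data sets), activities (processing steps), agents (people or software), and the relations among them. This is an illustrative PROV-JSON-style structure with made-up identifiers, not Synapse's actual representation.

```python
import json

# A minimal provenance record in the spirit of W3C PROV: entities,
# activities, agents, and the relations linking them. Identifiers
# (ex:...) are illustrative, not from any real data set.
prov = {
    "entity": {
        "ex:raw_expression_data": {"prov:label": "Raw expression matrix"},
        "ex:normalized_data": {"prov:label": "Quantile-normalized matrix"},
    },
    "activity": {
        "ex:normalization": {"prov:label": "Quantile normalization run"},
    },
    "agent": {
        "ex:analyst": {"prov:label": "Researcher who ran the pipeline"},
    },
    # The normalization activity consumed the raw data...
    "used": {
        "_:u1": {"prov:activity": "ex:normalization",
                 "prov:entity": "ex:raw_expression_data"},
    },
    # ...produced the normalized data...
    "wasGeneratedBy": {
        "_:g1": {"prov:entity": "ex:normalized_data",
                 "prov:activity": "ex:normalization"},
    },
    # ...and was carried out by a named agent.
    "wasAssociatedWith": {
        "_:a1": {"prov:activity": "ex:normalization",
                 "prov:agent": "ex:analyst"},
    },
}
print(json.dumps(prov, indent=2))
```

Because every step and relation is a named field rather than a sentence, a researcher can query across thousands of data sets for, say, everything generated by a particular normalization procedure, something free-form GEO provenance text cannot support.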
Observations from Sage Congress and collaboration through its challenge
The glowing reports we read of biotech advances almost cause one’s brain to ache. They leave us thinking that medical researchers must command the latest in all technological tools. But the engines of genetic and pharmaceutical innovation are stuttering for lack of one key fuel: data. Here they are left with the equivalent of trying to build skyscrapers with lathes and screwdrivers.
Sage Congress, held this past week in San Francisco, investigated the multiple facets of data in these fields: gene sequences, models for finding pathways, patient behavior and symptoms (known as phenotypic data), and code to process all these inputs. A survey of efforts by the organizers, Sage Bionetworks, and other innovations in genetic data handling can show how genetics resembles and differs from other disciplines.
An intense lesson in code sharing
At last year’s Congress, Sage announced a challenge, together with the DREAM project, intended to galvanize researchers in genetics while showing off the growing capabilities of Sage’s Synapse platform. Synapse ties together a number of data sets in genetics and provides tools for researchers to upload new data, while searching other researchers’ data sets. Its challenge highlighted the industry’s need for better data sharing, and some ways to get there.