Collaborative genetics, Part 1: The ambitious goals of Sage Commons Congress
In a field rife with drug-addicted industries that derive billions of dollars from a single product, and stocked with researchers who scramble for government grants (sadly cut back by the recent US federal budget), the open sharing of genetic data and tools may seem a dream. But it must be more than a dream when the Sage Commons Congress can draw 150 attendees (turning away many more) from research institutions such as the Netherlands Bioinformatics Centre and Massachusetts General Hospital, leading universities from the US and Europe, a whole roster of drug companies (Pfizer, Merck, Novartis, Lilly, Genentech), tech companies such as Microsoft and Amazon.com, foundations such as Alfred P. Sloan, and representatives from the FDA and the White House.
I felt distinctly ill at ease trying to fit into such a well-educated crowd, but was welcomed warmly and soon found myself using words such as "phenotype" and "patient stratification."
Money is not the only complicating factor when trying to share knowledge about our genes and their effect on our health. The complex relationships around how information is generated, and how credit is handed out for it, make biomedical data a case study all its own.
Update, May 25, 2011: presentations from the Sage Commons Congress are now available online.
The complexity of health research data
I listened a couple of weeks ago as researchers at this congress, held by Sage Bionetworks, questioned some of their basic practices, and I realized that they are on the leading edge of redefining what we consider information. For most of the history of science, information consisted of a published paper, and the scientist tucked his raw data in a moldy desk drawer. Now we are seeing a trend in scientific journals toward requiring authors to release the raw data with the paper (one such repository in biology is Dryad). But this is only the beginning.
Consider what remains to be done:
A repeated theme at the Congress was "going beyond the narrative." The narrative here is the published article. Each article tells a story and draws conclusions. But a lot goes on behind the scenes in the art and science of medicine. Furthermore, letting new hypotheses emerge from data is just as important as verifying the narrative provided by one's initial hypothesis.
One of the big questions raised in my mind--and not covered in the conference--was what effect it would have on the education of the next generation of scientists if teams exposed all those hidden aspects of data: the workflows, the curation and validation techniques, the interpretations. Perhaps you wouldn't need to attend the University of California at Berkeley to get a Berkeley education, or risk so many parking tickets along the way. Certainly, young researchers would have powerful resources for developing their craft, just as programmers have with the source code for free software.
I've just gone over a bit of the material that the organizers of the Sage Commons Congress want their field to share. Let's turn to some of the structures and mechanisms.
Of networks
Take a step back. Why do geneticists need to share data? There are oodles of precedents, of course: the Human Genome Project, biobricks, the Astrophysics Data System (shown off in a keynote by Alyssa A. Goodman from Harvard), open courseware, open access journals, and countless individual repositories put up by scientists. A particularly relevant data sharing initiative is the International HapMap Project, working on a public map of the human genome "which will describe the common patterns of human DNA sequence variation." This is not a loose crowdsourcing project, but more like a consortium of ten large research centers promising to release results publicly and forgo patents on the results.
The field of genetics presents specific challenges that frustrate old ways of working as individuals in labs that hoard data.
Basically, networks of genetic expression require networks of researchers to untangle them. In the beginning, geneticists modeled activities in the cell through linear paths. A particular protein would activate or inhibit a particular gene that would then trigger other activities with ultimate effects on the human body.
They found that relatively few activities could be explained linearly, though. The action of a protein might be stymied by the presence of others. And those other actors have histories of their own, with different pathways triggering or inhibiting pathways at many points.
Stephen Friend, President of Sage Bionetworks, offers the example of an important gene implicated in breast cancer, the Human Epidermal growth factor Receptor 2, HER2/neu. The drugs that target this protein are weakened when another protein, Akt, is present.
Trying to map these behaviors, scientists come up with meshes of paths. The field depends now on these network models. And one of its key goals is to evaluate these network models--not as true or false, right or wrong, because they are simply models that represent the life of the cell about as well as the New York subway map represents the life of the city--but for the models' usefulness in predicting outcomes of treatments.
Network models contain many actors and many paths--that's why collaborations among research projects could contribute to our understanding of genetic expression. But geneticists have no forum for storing and exchanging networks. And nobody records them in the same format, which makes them difficult to build, trade, evaluate, and reuse.
The Human Genome Project is a wonderful resource for scientists, but it contains nothing about gene expression, nothing about the network models and workflows and methods of curation mentioned earlier, nothing about software tools and templates to promote sharing, and ultimately nothing that can lead to treatments.
This huge, multi-dimensional area is what the Sage Commons Congress is taking on. More collaboration, and a better understanding of network models, may save a field that is approaching crisis. The return on investment for pharmaceutical research, according to researcher Aled Edwards, has gone down over the past 20 years. In 2009, American companies spent one hundred billion dollars on research but got only 21 drugs approved, and only 7 of those were truly novel. Meanwhile, 90% of drug trials fail. And to throw in a statistic from another talk (Vicki Seyfert-Margolis from the FDA), drug side effects create medical problems in 7% of patients who take the drugs, and require medical interventions in 3% or more of cases.
This posting is one of a five-part series. Next installment: Five Easy Pieces, Sage's Federation.