Citizens as partners in the use of clinical data

A Knowledge Currency Exchange for health and wellness

This article was written together with Mike Kellen, Director of Technology at Sage Bionetworks, and Christine Suver, Senior Scientist at Sage Bionetworks.

The current push towards patient engagement, when clinical researchers trace the outcomes of using pharmaceuticals or other treatments, is a crucial first step towards rewiring the medical-industrial complex with the citizen at the center. For far too long, clinicians, investigators, the government, and private funders have been the key decision makers. The citizen has been at best a research “subject,” and far too often simply a resource from which data and samples can be extracted. The average participant in a clinical study never receives the outcomes of the study, never has contact with those analyzing the data, never knows where her samples flow over time (witness the famous story of Henrietta Lacks), and until the past year didn’t even have access to the published research without paying a hefty rental fee.

This is changing. The recent grants by the Patient-Centered Outcomes Research Institute (PCORI) are the most visible evidence of change, but throughout the medical system one finds green shoots of direct patient engagement.

However, we must not stop at the idea of engaging patients to tell us about their outcomes. The promise of patient-centered research is not simply to survey people about their health, medicines, and results. This is instead the first step in a three-step dance: patient engagement into clinical study, syndication of engaged patient data to networks of researchers who can analyze and re-analyze the data, and a feedback loop of results to the participants.

Patient engagement into clinical study

Sage Bionetworks is a 501(c)(3) nonprofit biomedical research organization created to accelerate biological understanding by promoting collaborations and enabling community-based discovery. At Sage Bionetworks, we believe in the importance of iteratively generating and testing novel hypotheses transparently and collaboratively. We also believe that successful biomedical research requires a cultural shift and the active participation of all stakeholders. We are re-imagining the role of citizens in research and are building a software platform, called BRIDGE, to empower them to contribute both their data and expertise to research as they see fit.

The BRIDGE software platform is a virtual meeting place where people partner together in communities to contribute their health data, concerns, suggestions, expertise, and wisdom about diseases to advance research. Within BRIDGE, participants can access educational material, track their data and health timelines, and connect with each other as virtual teams to solve the problems that matter most to them. The health data contributed by individual participants on BRIDGE are coded and collated with data from other participants into aggregated, de-identified datasets that are then transferred to Synapse, the Sage Bionetworks collaborative research platform. One key difference between BRIDGE and many patient support forums is that patients control who gets their data at all times.

The goal of BRIDGE is to enable communities to foster research, recognizing that this endeavor will require a continuous link between participants and their data and thus cannot be completed using the traditional siloed approaches for protection of human data. As such, BRIDGE is orthogonal to most existing clinical protocols, which limit data use to a predefined study as a means to protect the interests of research participants. It is important to note that participation in BRIDGE is completely voluntary and does not affect participation in other traditional research endeavors.

The ability of individuals to participate in managing their health and to access their personal health data has exploded in recent years. There is a growing trend for citizens to organize their own personal health records and supplement this information by generating additional data on their own, such as genomic profiles or food and activity logs. Indeed, we are now capable of creating data about our own health using direct-to-consumer genotyping, legally demanding our health records from insurance companies and health providers, or using “self tracking” devices and applications. These records may empower individuals to learn from their own data and potentially aid researchers to better understand health and human disease. These new capabilities provide an unprecedented opportunity to draw individual citizens and researchers together to enable the meaningful analysis of health data and to develop innovative approaches to treatment and health management.

Many digital health tracker platforms are available to capture self-contributed citizen health and lifestyle data and then collate this information into private databases or biobanks. These platforms typically use citizen-contributed data as a commodity, selling the information to interested researchers or using their databases as a fee-for-service patient recruitment tool. The data collected is often aggregated, repackaged, and sold for data mining to insurance, pharmaceutical, or other companies.

Data subjects sign up for these platforms because they offer a service, such as assistance in record keeping or aid in health prognosis, but they have no knowledge of where their data is being used, by whom, or for what purpose. In contrast, individuals who join BRIDGE collaborate as equal partners with scientists. Within BRIDGE, individuals can contribute their knowledge and data to ethically approved research studies of their choice.

The key features of BRIDGE are:

  • Personal health trackers where citizens can record and track their health information, history, and timeline in a meaningful way and see it change over time.
  • Study databases where data is aggregated into statistics that let users compare, rate, and learn from the averaged experience of other users.
  • Synapse, an analysis platform that shares de-identified aggregate data with analysts for community research, knowledge and insights.
  • Community resource and education centers where information relevant to a community such as blogs, study announcements, study conclusions, and literature informs and supports the disease communities.

Sage Bionetworks has consulted with multiple stakeholders in the development of BRIDGE: public and patient advocacy groups, research scientists, physicians, industry leaders in pharmaceuticals and biotechnology, research funders, and regulators. By and large, these groups want to see citizens and scientists working together to form communities around diseases, lifestyles, and therapies to accelerate biomedical research that can lead to faster cures. Many communities have asked to participate in the beta testing of BRIDGE.

The data in BRIDGE is not meant for medical diagnosis. It is used for education, community engagement, and research only. Sage Bionetworks is not a medical or health care provider, health plan, or clearinghouse. It is not a HIPAA-covered entity subject to the HIPAA Privacy Rule set forth by the US Department of Health and Human Services. However, we uphold the protection of health information and participant autonomy, and we are submitting BRIDGE to rigorous independent ethical and institutional review.

Syndication of engaged patient data to researchers

One of the most persistent problems in patient-centered research is the gap between the people who have data or can generate data about themselves and the people who can analyze that data to generate new insights. The average individual may well be able to sequence her genome, track her fitness and food, and access her medical record–and pool it with others like her via BRIDGE–but finding someone to make sense of that data is an additional challenge.

At Sage Bionetworks, we have spent years building a data repository and computational analysis platform called Synapse where registered users can compute and co-analyze datasets, use tools to build models of diseases, and interact to derive maximal information from analyses. Synapse is currently hosted online free for public use, the source code is available under an open source license on GitHub, and our developer wiki documentation and bug tracker are also publicly available. Synapse currently supports a growing user base of data scientists, predominately in large-scale collaborative projects like the DREAM challenges or The Cancer Genome Atlas (TCGA) working groups.

The Synapse platform is a set of shared web services that support a website designed to facilitate collaboration among scientific teams, and integrations with analysis tools and programming environments. The Synapse web portal is a collaborative online environment for scientists to interact and share data, models, and analysis methods, both in the context of specific research projects, and broadly across otherwise disparate projects. The organizing construct of the portal revolves around allowing users to define their own online project spaces to which they can post content (data, code, analysis history, and results) and document their work online immediately upon production. Publication simply exposes completed work to a larger audience, ensuring that the broader community is eventually able to access the same resources as project team members.

As with our BRIDGE project, Synapse has been developed with advice from independent ethical advisors, legal counsel, and an IRB to implement appropriate governance policies and procedures to support Synapse operations. These efforts enable the contribution and use of human data for research purposes, while protecting personal information and respecting individuals’ expectations for privacy protection.

Within Synapse, aggregated BRIDGE datasets will be made available to researchers. Analysis results will then be returned to BRIDGE communities to inform participants of research progress. TCGA provides an example of how this works: 250 researchers from 30 different institutions participated in the TCGA Pan-Cancer consortium to do integrative analysis on data that encompassed six different biomolecular technologies (RNA-Seq, RPPA, methylation arrays, CNV-arrays, DNA-Seq, and miRNA-Seq) acquired on 12 different tumor types. Individual research teams engaged in over 60 distinct sub-projects, each based at least in part on this common data resource. Many of these projects were interdependent, requiring multi-stage analysis and sharing of intermediate results, such that results from one group were used as input for the analyses of other groups.

Synapse distributed versioned sets of the raw and initially-processed forms of the data, providing a common starting point for analysts that minimized common preliminary steps of data processing. Individual researchers then worked in their own subfolders of the master TCGA project, allowing them the freedom to work independently and merge in their own additional data.

Now that the Pan-Cancer consortium has published a Nature focus of 17 papers, this Synapse resource has become public as a compendium to these papers. As a result, we are seeing much of the TCGA Pan-Cancer Working Group’s data and results being reused by independent research efforts such as the Progenitor Cell Biology Consortium and the TCGA gastric cancer and TCGA skin cutaneous melanoma working groups.

This is precisely the sort of analytic network effect and cognitive surplus that can power the analysis of BRIDGE-collected data. The Synapse architecture facilitates new ways for analysts to work, just as BRIDGE facilitates new ways for patients to self-organize.

Closing the loop: returning results to participants

The question of how best to return insights generated via Synapse analysis to BRIDGE participants does not have a simple answer. Statistical analysis is by nature usually incomplete: inferences and probabilities change as new data comes in, as models evolve, as the consensus of analysts changes. And legal tools that attempt to compel information return are complex: at what point must a researcher share an insight, and in what form, and with what enforcement?

Our approach for now has been to enforce transparency inside Synapse for participants in BRIDGE. This makes it easy for a participant to see, instantly, where her data might be in use or driving insights in analysis. Once a participant has enrolled in a study, she is given a random, unique Participant Study Identifier that is associated with her data in lieu of a name and is distinct from her User Identifier. A user who participates in numerous studies will have numerous distinct Participant Study Identifiers but only one User Identifier.
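
The relationship between the two kinds of identifier can be sketched in a few lines of Python. This is an illustration only, not how BRIDGE is implemented: the Registry class, its method names, and the use of uuid4 to generate random identifiers are all assumptions made for the example.

```python
import uuid


class Registry:
    """Hypothetical registry: one User Identifier per person and one
    random Participant Study Identifier per (user, study) pair."""

    def __init__(self):
        self._user_ids = {}   # account name -> User Identifier
        self._study_ids = {}  # (User Identifier, study) -> Participant Study Identifier

    def user_id(self, account):
        # Issue the User Identifier once and reuse it afterwards.
        if account not in self._user_ids:
            self._user_ids[account] = uuid.uuid4().hex
        return self._user_ids[account]

    def enroll(self, account, study):
        # A fresh random identifier per study, so records held by
        # different studies cannot be linked by comparing identifiers.
        key = (self.user_id(account), study)
        if key not in self._study_ids:
            self._study_ids[key] = uuid.uuid4().hex
        return self._study_ids[key]

    def studies_using(self, account):
        # Lets a participant list the studies that hold her data.
        uid = self.user_id(account)
        return sorted(study for (u, study) in self._study_ids if u == uid)
```

Because each study sees only its own Participant Study Identifier, only the registry can answer the question "where is my data in use?", which is the kind of lookup that transparency for participants depends on.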

The study data she contributes will be pooled with that from other study participants in an aggregate dataset viewable in the study community space and in Synapse. Participants can explore the aggregate data and relevant statistics to compare their own data to that of the average study population. Study participants can ask questions in the community forum and contact a study monitor to share their questions, concerns, and comments. Within a community, individuals can aggregate their research questions and connect with each other as virtual teams to solve problems.
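
As a rough sketch of what comparing one's own data to the study average could look like, the Python function below reports the study mean and how far a single participant's value sits from it. The function name, the returned fields, and the choice of a z-score as the comparison are assumptions for illustration, not a description of the statistics BRIDGE actually presents.

```python
from statistics import mean, pstdev


def compare_to_study(own_value, study_values):
    """Return the study mean and how far one participant's value is from it."""
    avg = mean(study_values)
    spread = pstdev(study_values)  # population standard deviation of the pool
    # The z-score is undefined when every participant reported the same
    # value, so fall back to 0.0 in that case.
    z = (own_value - avg) / spread if spread > 0 else 0.0
    return {"study_mean": avg, "difference": own_value - avg, "z_score": z}
```

For example, a participant who logs 10 hours of weekly exercise in a study whose pooled values are 6, 8, and 10 would see a study mean of 8 and a positive difference of 2.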

We also have implemented access requirements for certain studies in Synapse that require peer-reviewed publications be made available on the Internet, free of charge, within a year of first publication (consistent with the US NIH public access policy). This ensures that participants donating data will be able to read the insights published in more traditional settings.

The rationale for implementing transparency and data return is simple. When a person makes the choice to engage in patient-centered research, she should get a return. And it’s entirely unclear if economic returns are the appropriate ones: the cash value of any one person’s data is quite low. But the knowledge value of data invested into the kind of open research that BRIDGE and Synapse enable can be massive, and ensuring the ability to see the insights derived from one’s own data is simply the right thing to do.


We sit at a moment of tremendous opportunity. The outlines of the coming data revolution in health have emerged: genetic sequence data, laboratory results, medical records, patient-reported outcomes, and mobile device data. But far too many of the conversations are focused simply on the collection of data. We need to ensure these conversations recognize the entire life cycle in which data is converted to knowledge, and to embed the patient at all stages of that life cycle. It’s time we stopped thinking about people as sources of data for research, and started thinking of them as our partners in research. Only when citizens become partners can we create a knowledge currency exchange. Here, data and insights from citizens, doctors, and researchers alike can generate open, dynamic knowledge around wellness and disease that is robust enough to drive real-time decisions about lifestyle and care.
