# Genomics and Privacy at the Crossroads

## Would you let people know about your dandruff problem if it might mean a cure for Lupus?

Two weeks ago, I had the privilege of attending the 2013 Genomes, Environments and Traits (GET) conference in Boston, as a participant in Harvard Medical School’s Personal Genome Project. Several hundred of us attended the conference, eager to learn what new breakthroughs might be in the works using the data and samples we have contributed, and to network with the researchers and each other.

The Personal Genome Project (PGP) is a very different beast from the traditional research study, in several ways. To begin with, it is an Open Consent study, which means that all the data that participants donate is available for research by anyone without further consent from the subject. In other words, now that I have consented to participate in the PGP, anyone can download my genome sequence, look at my phenotypic traits (my physical characteristics and medical history), or even order some of my blood from a cell line that has been established at the Coriell biobank, and they do not need to gain specific consent from me to do so. By contrast, in most research studies, data and samples can be collected only for one specific study, and used for no other purpose. This restriction is meant to protect the privacy of the participants, which was famously violated in the establishment of the HeLa cell line.

The other big difference is that in most studies, the participants rarely receive any information back from the researchers. For example, if the researcher does a brain MRI to gather data about the structure of a part of your brain, and sees a huge tumor, they are under no obligation to inform you about it, or even to give you a copy of the scan. This is because researchers are not certified as clinical laboratories, and thus are not authorized to report medical findings. This makes sense, to a certain extent, with traditional medical tests, as the research version may not be calibrated to detect the same things, and the researcher is not qualified to interpret the results for medical purposes.

But this model falls apart when you are talking about Whole Genome Sequencing (WGS). In most cases, the sequencing ordered by a researcher is done in the same facility that a commercial clinical laboratory would use. And a WGS isn’t like a traditional medical test: it’s a snapshot of your entire genetic makeup, and contains a wealth of information that has nothing to do with whatever a specific researcher may be investigating. Shouldn’t you know if a WGS ordered by a researcher to look at autism also discovers that you have one of the “bad” BRCA1 mutations for breast cancer?

Historically, the high cost of WGS has made the problem largely academic, but not anymore. The cost of WGS in bulk is now approaching or under $2000, with $1000 expected to be the going rate very shortly. At this kind of price, it becomes an invaluable tool for scientists looking for links between genetic mutations and particular traits (good and bad). They can use a technique called a Genome Wide Association Study (GWAS) to search for correlations between changes in DNA and diseases, for example.

The increasing use of GWAS is precisely why the PGP and its Open Consent model were created. Suppose you have 20 people who have all had gallstones, and you want to find out if they share a common mutation. Because there are so many random mutations in our DNA, there are likely to be a large number of mutations that they will share by happenstance. What you need is a large control population without gallstones, so that you can rule out mutations that also occur in people who have not gone on to develop the condition. There are databases that tell you how often a mutation occurs in the general public, but they don’t tell you how often it occurs in people without gallstones. Because the PGP participants have not only consented to have their data used by anyone who wants it, but have contributed (and continue to contribute) a rich set of phenotypic trait data, you can find PGP members who have or have not developed stones, and download their genomes.
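To make the case-versus-control idea concrete, here is a rough sketch in Python. It is only an illustration of the filtering logic described above, not how a real GWAS is run (real studies use statistical tests over very large cohorts). The function names, the `max_control_freq` threshold, and the variant labels are all hypothetical and invented for this example.

```python
def shared_by_all(genomes):
    """Return the variants carried by every genome in the list."""
    if not genomes:
        return set()
    shared = set(genomes[0])
    for genome in genomes[1:]:
        shared &= set(genome)
    return shared


def candidate_variants(cases, controls, max_control_freq=0.0):
    """Keep variants shared by all cases that are rare enough among controls."""
    candidates = set()
    for variant in shared_by_all(cases):
        # Count how many control genomes also carry this variant.
        carriers = sum(1 for genome in controls if variant in genome)
        freq = carriers / len(controls) if controls else 0.0
        if freq <= max_control_freq:
            candidates.add(variant)
    return candidates


# Hypothetical toy data: each genome is a set of made-up variant labels.
cases = [{"v1", "v2", "v3"}, {"v1", "v2", "v4"}, {"v1", "v2"}]
controls = [{"v1", "v5"}, {"v1"}, {"v6"}]

print(candidate_variants(cases, controls))
```

In this toy data, `v1` and `v2` are shared by every case, but `v1` also turns up in two of the three controls, so only `v2` survives as a candidate worth a closer look.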

The price that PGP members ask for the free and open use of their existing data is that new data be returned to the PGP members and made available for others to use. For example, I’ll be getting copies of my brain MRI and uploading them to my PGP profile, and the data on my microbiome (the bacteria in my gut and on my skin) has already been placed there by the University of Colorado “American Gut” research project. Not only does this let other researchers gain access to the new data, but it lets the more curious of the PGP participants learn things about themselves (woohoo, 3.2% of my stool bacteria is Ruminococcus!). One of the things that PGP members have to agree to is that any data they receive is not to be used for diagnostic purposes, although in practice several participants have used their PGP WGS data to determine the cause of previously unexplained illnesses.

The future of GWAS, and genomic research in general, rests on the availability of a rich and diverse group of participants willing to serve as controls and cases for new studies, without the researchers having to go to the effort and cost of “consenting” the study sample each and every time. The goal of the PGP is to eventually enroll 100,000 members, to help meet this need.

But there’s a larger issue lurking beyond the question of consent, and that is privacy. There’s not much likelihood that a researcher or other entity could identify you from an MRI scan of your brain, but as public databases of genomic data grow, the chance that at least your surname can be inferred from your genome is becoming less a possibility than a fact. This was demonstrated at GET 2013, along with the fact that with only three pieces of data (age, gender and zip code), it is almost always possible to narrow down to a single individual using publicly available data. At a minimum, this means that someone with your genome and a list of your traits is in a good position to link you to your medical problems, which could cause a problem when applying for life insurance, for example. It gets more complex still if you imagine what would happen if some seriously detrimental mutation is discovered at a later date. Suppose it were suddenly common knowledge that you had an allele that was strongly linked to psychopathy?
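The "three pieces of data" point is easy to check on any dataset you have on hand. The sketch below is a minimal illustration, assuming records are plain dictionaries with hypothetical `age`, `gender` and `zip_code` fields. The people in it are invented for the example and say nothing about real-world re-identification rates.

```python
from collections import Counter


def unique_fraction(records, keys=("age", "gender", "zip_code")):
    """Return the share of records whose combination of `keys` is one of a kind."""
    if not records:
        return 0.0
    # Tally how many records share each combination of the chosen fields.
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Hypothetical toy records, made up for illustration only.
people = [
    {"age": 34, "gender": "F", "zip_code": "02115"},
    {"age": 34, "gender": "F", "zip_code": "02115"},
    {"age": 51, "gender": "M", "zip_code": "02115"},
    {"age": 34, "gender": "F", "zip_code": "80302"},
]

print(unique_fraction(people))  # 0.5
```

Here two of the four made-up people can be singled out by those three fields alone. The finer-grained the fields, the fewer people share any given combination.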

As a result, the PGP participants have recently been given notice by the project researchers that they should no longer depend on the expectation of privacy at all. All of the participants knew this risk going in, as it was explicitly spelled out as part of the original test that you had to pass in order to participate, but it’s now the reality on the ground. Rather than cling to the hope that they will remain anonymous, many PGP members have publicly revealed their PGP identifiers (I’m PGP65 for example, should you wish to learn about my valiant battle with gallstones revealed in my phenotypic data), and the project is considering adding photos and real names as optional data available on PGP records.

As we learned at the conference, in the Canadian version of the PGP there are essentially no concerns about privacy. Canada, in fact, lacks even the minimal protections that the Genetic Information Nondiscrimination Act (GINA) provides in the United States. But since Canada has a single-payer health care system, the concern that a mutation might be considered a pre-existing condition is eliminated, which evidently provides enough reassurance to Canadians that they are willing to share their genetic data. This is in spite of the risks of job or life insurance discrimination, both of which are possible in Canada at the moment.

So where does this leave us? The reality of WGS, which will probably be as routinely ordered as a chest X-ray within a few years, is upon us. Because of the ability of our genetic data to uniquely identify us, and the way that Big Data is now linking more and more of our lives into a common thread, the day may not be too far off when you get a popup ad in your browser advertising a cure for a disease you didn’t even know you had. We can either choose to embrace our lack of privacy, as the PGP members are doing, trading it for greater insight into ourselves and the potential to help improve the quality of life for others, or we can try to put the genie back in the bottle.
