Genomics and the Role of Big Data in Personalizing the Healthcare Experience

Increasingly available data spurs organizations to make analysis easier

This article was written with Ellen M. Martin and Tobi Skotnes. Dr. Feldman will deliver a webinar on this topic on September 18 and will speak about it at the Strata Rx conference.

Genomics is making headlines in both academia and the celebrity world. With intense media coverage of Angelina Jolie’s recent double mastectomy after genetic tests revealed that she was predisposed to breast cancer, genetic testing and genomics have been propelled to the front of many more minds.

In this new, data-rich field, companies are taking a variety of approaches to collecting data, analyzing it, and turning it into usable information.

What is Genomics?

Genomics is the study of the complete genetic material (genome) of organisms.  The field includes sequencing, mapping, and analyzing a wide range of RNA and DNA codes, from viruses and mitochondria to many species across the kingdoms of life. Most pertinent here are intensive efforts to determine the entire DNA sequence of many individual humans in order to map and analyze individual genes and alleles as well as their interactions. The primary goal that drives these efforts is to understand the genetic basis of heritable traits, and especially to understand how genes work in order to prevent or cure diseases.

The amount of data being produced by sequencing, mapping, and analyzing genomes propels genomics into the realm of Big Data. Each human genome has 20,000-25,000 genes and comprises about 3 billion base pairs. This amounts to 100 gigabytes of data, equivalent to 102,400 photos. Sequencing many human genomes would quickly add up to hundreds of petabytes of data, and the data created by analysis of gene interactions multiplies that further.
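As a rough check on these figures, the sketch below works through the arithmetic. It takes the 100 gigabytes per genome quoted above as given and assumes binary units (1 GB = 1024 MB) and about 1 MB per photo, which is what the 102,400-photo comparison implies. The function names are illustrative only.

```python
# Back-of-the-envelope storage arithmetic for the figures quoted in the text.
# Assumptions (not from the article): binary units and ~1 MB per photo.

GB_PER_GENOME = 100          # figure quoted in the article
MB_PER_GB = 1024
GB_PER_PB = 1024 * 1024


def photos_equivalent(gigabytes, mb_per_photo=1):
    """Number of photos of the assumed size that fit in the given volume."""
    return gigabytes * MB_PER_GB // mb_per_photo


def genomes_for_petabytes(petabytes, gb_per_genome=GB_PER_GENOME):
    """Number of genomes whose data would fill the given number of petabytes."""
    return petabytes * GB_PER_PB // gb_per_genome


if __name__ == "__main__":
    print(photos_equivalent(GB_PER_GENOME))   # photos per genome
    print(genomes_for_petabytes(100))         # genomes per 100 PB
```

Under these assumptions, 100 petabytes corresponds to roughly a million genomes, which gives a sense of the scale at which hundreds of petabytes would be reached.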

Forces driving data analysis of genomics

Genomics Fuels Personalized Medicine

Personal genomics, the understanding of each individual’s genome, is a necessary foundation for predictive medicine, which draws on a patient’s genetic data to determine the most appropriate treatments. Medicine should accommodate people of different shapes and sizes. By combining sequenced genomic data with other medical data, physicians and researchers can get a better picture of disease in an individual. The vision is that treatments will reflect an individual’s illness rather than follow the one-treatment-fits-all approach that is too often the norm today.

Genomics Analysis Techniques

Three notable genomic start-ups have different approaches to turning genomic data into usable information that improves individual and population health.

  • Bina Technologies has created a platform that helps users access and analyze genomic sequence data. They use a hybrid architecture that keeps some data on the premises and some in the cloud, pushing computation back to where the data is, in order to reduce the data 1000-fold while speeding up sequencing time and facilitating the movement of the data.
  • Portable Genomics offers a mobile visualization platform for genomics, comparable to the iTunes platform that consumers already know. The visualization concept presents genomics to consumers and professionals in a simple way that is immediately understandable and usable in personalized and preventive medicine.
  • NextBio offers a platform that sits on top of existing systems to aggregate and analyze genomics data in relation to other relevant medical data.

These different approaches illustrate the rapidly evolving state of the art in using genomics to help personalize medicine. Bina illustrates the power of genomics to improve population health, Portable Genomics exemplifies bringing the power of information to the individual, and NextBio is the epitome of a one-stop shop, aggregating and analyzing large volumes of data from different streams to personalize individual treatments.

Human Genome: Then and Now

Research in the field of genomics has come a long way in the past 60 years. The pioneering effort in studying the human genome and its effect on disease was the Human Genome Project (1990-2003), which changed sequencing from a manual process to an automated one.

Timeline for Human Genome Project

Driven by advances in technology that have dramatically reduced costs, genome-wide association studies (GWAS) are expanding on the Human Genome Project in discovering connections between genes and diseases. GWAS test single nucleotide polymorphisms (SNPs) for association with diseases. (A single nucleotide polymorphism is a variation in which two versions of the same stretch of DNA differ by a single nucleotide. For example, two sequenced DNA fragments, AAGCCTA and AAGCTTA, differ only at the fifth nucleotide. In old-fashioned genetic terminology, these would be different alleles of the same gene.)
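To make the SNP example concrete, here is a minimal sketch that compares two aligned fragments of equal length and reports where they differ. It is only an illustration of the idea, not how GWAS pipelines call variants, and the function name is hypothetical.

```python
# Illustrative only: find single-nucleotide differences between two
# already-aligned DNA fragments of the same length.

def differing_positions(seq_a, seq_b):
    """Return (position, base_in_a, base_in_b) for each mismatch.

    Positions are 1-based, matching how the text counts nucleotides.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("fragments must be aligned and of equal length")
    return [
        (index + 1, base_a, base_b)
        for index, (base_a, base_b) in enumerate(zip(seq_a, seq_b))
        if base_a != base_b
    ]


if __name__ == "__main__":
    # The two fragments from the example in the text.
    print(differing_positions("AAGCCTA", "AAGCTTA"))  # [(5, 'C', 'T')]
```

A single reported position, as here, is what makes the difference a single nucleotide polymorphism; several mismatches would indicate a larger variation or a misalignment.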

More than 1600 genomics publications have reported 2000 gene associations with more than 300 common human disease traits.

So far, GWAS haven’t proven directly useful for guiding individual health, but we may be on the brink of changing this. There are three near-future clinical applications for GWAS:

  • Building predictive models to identify high-risk patients, for example in Type 1 Diabetes.
  • Classifying disease subtypes, of potential use for more precisely guided clinical trials and targeted treatments (e.g. cancers).
  • Providing better information for screening drug candidates for toxicity and efficacy before clinical trials.

The Era of the $1000 Genome: the Archon X-prize

By the time the Human Genome Project was completed, the cost of sequencing the human genome was $40 million, down from $95 million just two years before. Academics and companies have been working hard to make sequencing affordable and therefore available to the public. Today an individual human genome can be sequenced for around $5000 consistently and accurately.
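The scale of this decline is easier to appreciate as ratios. The sketch below uses only the dollar figures quoted in this article (plus the $1000 target discussed later); the helper name is illustrative.

```python
# Fold reductions between the sequencing costs quoted in the article.

def fold_reduction(old_cost, new_cost):
    """How many times cheaper new_cost is than old_cost."""
    return old_cost / new_cost


COST_TWO_YEARS_BEFORE_COMPLETION = 95_000_000  # USD, quoted in the text
COST_AT_COMPLETION = 40_000_000                # USD, quoted in the text
COST_TODAY = 5_000                             # USD, quoted in the text
COST_TARGET = 1_000                            # USD, the "$1000 Genome" goal

if __name__ == "__main__":
    print(fold_reduction(COST_TWO_YEARS_BEFORE_COMPLETION, COST_AT_COMPLETION))
    print(fold_reduction(COST_AT_COMPLETION, COST_TODAY))
    print(fold_reduction(COST_TODAY, COST_TARGET))
```

On these numbers, the cost fell a little more than twofold in the project's last two years, a further 8000-fold since its completion, and needs another fivefold drop to reach the $1000 mark.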

The rapidly declining cost of gene sequencing

The current Holy Grail in genomics is the “$1000 Genome,” the attempt to make sequencing and mapping individual genomes cheap enough to be a part of every patient’s medical record. The Archon X-PRIZE “$1000 Genome” prize competition challenges researchers to decrease the price of sequencing to under $1000. By spurring the development of rapid, inexpensive, and accurate whole genome sequencing technologies, the competition ultimately aims to usher in an era of personalized medicine. It will take place this fall, and will award a purse of $10 million to the first team to successfully sequence the whole genomes of 100 centenarians within 30 days at a maximum cost of $1000 per genome and an error rate no greater than 1 per million base pairs.
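The criteria above can be read as a simple checklist. The sketch below encodes only the four thresholds named in this article; it is not the official judging procedure, the class and field names are hypothetical, and the genome length of about 3 billion base pairs is an approximate figure assumed for the error calculation.

```python
# Illustrative checklist for the prize criteria as described in the article.
from dataclasses import dataclass


@dataclass
class Entry:
    """A hypothetical competition result."""
    genomes_sequenced: int
    days_taken: int
    cost_per_genome_usd: float
    errors_per_million_bp: float


def meets_criteria(entry):
    """True if the entry satisfies all four thresholds from the text."""
    return (
        entry.genomes_sequenced >= 100
        and entry.days_taken <= 30
        and entry.cost_per_genome_usd <= 1000
        and entry.errors_per_million_bp <= 1
    )


def max_errors_per_genome(genome_bp=3_000_000_000, errors_per_million=1):
    """Errors allowed across one genome at the given error rate.

    genome_bp is an assumed approximate human genome length.
    """
    return genome_bp // 1_000_000 * errors_per_million


if __name__ == "__main__":
    example = Entry(100, 28, 950.0, 0.8)  # made-up values for illustration
    print(meets_criteria(example))
    print(max_errors_per_genome())
```

Under the assumed genome length, an error rate of 1 per million base pairs still permits about 3000 errors per genome, which shows how demanding accuracy requirements are at this scale.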
