Khaled El Emam
Exploring an upcoming Strata Rx 2013 session on big data and privacy
Databases of health data are widely shared among researchers and for commercial purposes, and they are even put online to promote health research and data-driven health app development, so preserving patient privacy is critical. But are these data sets properly de-identified? If not, the records in them could be re-identified. Just look at the two high-profile re-identification attacks that have been publicized in recent months.
The first attack involved individuals who voluntarily published their genomic data online as a way to support open data for research. Besides their genomic data, they posted their basic demographics such as date of birth and zip code. The demographic data, not their genomic data, was used to re-identify a subset of the individuals.
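To make the role of demographics concrete, here is a minimal sketch of how one might check which records are unique on a combination of fields such as date of birth and zip code. It is an illustration only, not a description of how the attack was actually carried out: the records are entirely made up, and the `unique_on` helper and the field names `dob` and `zip` are hypothetical.

```python
from collections import Counter

# Entirely made-up records for illustration; not drawn from any real data set.
records = [
    {"id": "p1", "dob": "1970-03-14", "zip": "02139"},
    {"id": "p2", "dob": "1970-03-14", "zip": "02139"},
    {"id": "p3", "dob": "1985-11-02", "zip": "94110"},
    {"id": "p4", "dob": "1962-07-30", "zip": "10027"},
]


def unique_on(records, keys):
    """Return the ids of records whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return [r["id"] for r in records if combos[tuple(r[k] for k in keys)] == 1]


# Records that are unique on (date of birth, zip code) are the ones most
# exposed to being matched against another source that carries names.
at_risk = unique_on(records, ["dob", "zip"])
print(at_risk)
```

In this toy data, two people share the same date of birth and zip code, so neither can be singled out on those fields alone, while the other two are unique. A record that is unique on such fields is easier to link to a named individual, even when the genomic data itself is never examined.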