"health data" entries
The "six C's": understanding the health data terrain in the era of precision medicine.
Ian Eslick, Tuhin Sinha, and Rob Rustad contributed to this post.
Download a free copy of “Navigating the Health Data Ecosystem,” the first in a series of reports covering our recent investigation into the health data ecosystem, funded by the Robert Wood Johnson Foundation.
A few years ago, O’Reilly became interested in health topics, running the Strata Rx conference, writing a report on How Data Science is Transforming Health Care: Solving the Wanamaker Dilemma, and publishing Hacking Healthcare. Our social network grew to include people in the health care space, informing our nascent thoughts about data in the age of the Affordable Care Act and the problems and opportunities facing the health care industry. We had the notion that aggregating data from traditional and new device-based sources could change much of what we understand about medicine; those thoughts are now captured by the concept of “precision medicine.”
From that early thinking, we developed the framework for a grant with the Robert Wood Johnson Foundation (RWJF) to explore the technical, organizational, legal, privacy, and other issues around aggregating health-related data for research — to provide empirical lessons for organizations also interested in pushing for data in health care initiatives. Our new free report, Navigating the Health Data Ecosystem, begins the process of sharing what we’ve learned.
After decades of maturing in more aggressive industries, data-driven technologies are being adopted, developed, funded, and deployed throughout the health care market at an unprecedented scale. February 2015 marked the inaugural working group meeting of the newly announced NIH Precision Medicine Initiative designed to aggregate a million-person cohort of genotype/phenotype dense longitudinal health data, where donors provide researchers with the raw epidemiological evidence to develop better decision-making, treatment, and potential cures for diseases like cancer. In the past several years, many established companies and new startups have also started to apply collective intelligence and “big data” platforms to health and health care problems. All these efforts encounter a set of unique challenges that experts coming from other disciplines do not always fully appreciate. Read more…
How do we motivate sustained behavior change when the external motivation disappears—like it's supposed to?
If you’ve ever tried to count calories, go on a diet, start a new exercise program, change your sleep patterns, spend less time sitting, or make any other type of positive health change, then you know how difficult it is to form new habits. New habits usually require a bit of willpower to get going, and we all know that that’s a scarce resource. (Or at least, a limited one.)
Change is hard. But the real challenge comes after you’ve got a new routine going—because now you’ve got to keep it going, even though your original motivations to change may no longer apply. Why keep dieting when you no longer need to lose weight? We’ve all had the idea at some point that we really should reward ourselves for that five-pound weight loss with a cupcake, right?
When the death of trust meets the birth of BYOD
Dr. Andrew Litt, Chief Medical Officer at Dell, published a thoughtful blog post last week about the trade-offs inherent in designing for both the security and accessibility of medical data, especially in an era of BYOD (bring your own device) and the IoT (internet of things). As we begin to see more internet-enabled diagnostic and monitoring devices, Litt writes, “The Internet of Things (no matter what you think of the moniker), is related to BYOD in that it could, depending on how hospitals set up their systems, introduce a vast array of new access points to the network. … a very scary thought when you consider the sensitivity of the data that is being transmitted.”
As he went on to describe possible security solutions (e.g., store all data in central servers rather than on local devices), I was reminded of a post my colleague Simon St.Laurent wrote last fall about “security after the death of trust.” In the wake of some high-profile security breaches, including news of NSA activities, St.Laurent says, we have a handful of options when it comes to data security—and you’re not going to like any of them.
The 30,000-foot view and the nitty gritty details of working with electronic health data
Ever wonder what the heck “meaningful use” really means? By now, you’ve probably heard it come up in discussions of healthcare data. You might even know that it specifically pertains to electronic health records (EHRs). But what is it really about, and why should you care?
If you’ve ever had to carry a large folder of paper between specialists, or fill out the same medical history form in different offices over and over—with whatever details you happen to remember off the top of your head that day—then you already have some idea of why EHRs are a desirable thing. The idea is that EHRs will lead to better care—and better research data—through more complete and accurate record-keeping, and will eventually become part of health information exchanges (HIEs) with features like trend analysis and push-notifications. However, the mere installation of EHR software isn’t enough; we need not just cursory use but meaningful use of EHRs, and we need to ensure that the software being used meets certain standards of efficiency and security.
By Julie Yoo, Chief Product Officer at Kyruus
Once upon a time, a world-renowned surgeon, Dr. Michael DeBakey, was summoned by the President when the Shah of Iran, a figure of political and strategic importance, fell ill with an enlarged spleen due to cancer. Dr. DeBakey was whisked away to Egypt to meet the Shah, made a swift diagnosis, and recommended an immediate operation to remove the spleen. The surgery lasted 80 minutes; the spleen, which had grown to 10 times its normal size, was removed, and the Shah made a positive recovery in the days following the surgery – that is, until he took a turn for the worse, and ultimately died from surgical complications a few weeks later. 
Sounds like a routine surgery gone awry, yes? But consider this: Dr. DeBakey was a cardiovascular surgeon – in other words, a surgeon who specialized in operating on the heart and blood vessels, not the spleen. He was best known for his open heart bypass surgery techniques, and the vast majority of his peer-reviewed articles cover cardiac operating techniques. High profile or not, why was a cardiovascular surgeon selected to perform an abdominal surgery?
Exploring an upcoming Strata Rx 2013 session on big data and privacy
Databases of health data are widely shared among researchers and for commercial purposes, and they are even put online in order to promote health research and data-driven health app development, so preserving the privacy of patients is critical. But are these data sets de-identified properly? If not, they could be re-identified. Just look at the two high-profile re-identification attacks that have been publicized in recent months.
The first attack involved individuals who voluntarily published their genomic data online as a way to support open data for research. Besides their genomic data, they posted their basic demographics such as date of birth and zip code. The demographic data, not their genomic data, was used to re-identify a subset of the individuals.
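To see why date of birth plus zip code is risky, here is a minimal sketch that counts how many records share each combination of those fields. The records and field names are invented for illustration, and this is not the method used in the attack described above. It only shows that a combination held by a single person works as a fingerprint.

```python
from collections import Counter

# Hypothetical records: illustrative values only, not real data.
records = [
    {"id": "A", "dob": "1970-03-14", "zip": "02139"},
    {"id": "B", "dob": "1970-03-14", "zip": "02139"},
    {"id": "C", "dob": "1982-11-02", "zip": "02139"},
    {"id": "D", "dob": "1991-07-30", "zip": "94110"},
]


def quasi_key(record, fields=("dob", "zip")):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[f] for f in fields)


def unique_records(rows, fields=("dob", "zip")):
    """Return ids of rows whose quasi-identifier combination appears exactly once."""
    counts = Counter(quasi_key(r, fields) for r in rows)
    return [r["id"] for r in rows if counts[quasi_key(r, fields)] == 1]


# Records that are alone in their (dob, zip) group are the easiest to link
# to an outside source, such as a public list with names attached.
at_risk = unique_records(records)
print(at_risk)
```

Dropping or coarsening a field (for example, keeping only the year of birth) makes the groups larger, which is the basic idea behind most de-identification rules.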
Researchers begin to scale up pattern recognition, machine-learning, and data management tools.
My first job after leaving academia was as a quant for a hedge fund, where I performed (what are now referred to as) data science tasks on financial time-series. I primarily used techniques from probability & statistics, econometrics, and optimization, with occasional forays into machine-learning (clustering, classification, anomalies). More recently, I’ve been closely following the emergence of tools that target large time series and decided to highlight a few interesting bits.
Time-series and big data:
Over the last six months I’ve been encountering more data scientists (outside of finance) who work with massive amounts of time-series data. The rise of unstructured data has been widely reported; the growing importance of time-series much less so. Sources include data from consumer devices (gesture recognition & user interface design), sensors (apps for “self-tracking”), machines (systems in data centers), and health care. In fact, some research hospitals have troves of EEG and ECG readings that translate to time-series data collections with billions (even trillions) of points.
Five ways we can improve the information we collect to help us solve hard problems in health care.
I was honored to chair O’Reilly’s inaugural edition of Strata Rx, our conference on data science in health care, this past October along with Colin Hill. As we’re beginning to plan this year’s event, I find myself thinking a lot about a theme that emerged from some of the keynotes last fall: in order to solve the problems we’re facing in health care — to lower costs and provide more personal, targeted treatments to patients — we don’t just need more data; we need better data.
Much has been made of the era of big data we find ourselves in. But though the data we collect is straining the limits of our tools and models, we’re still not making the kind of headway we hoped for in areas like health care. So big data isn’t enough. We need better data.
What does it mean to have better data in health care? Here are some things on my list; perhaps you can think of others. Read more…
Which data formats should the DocGraph project support?
The DocGraph project has an interesting issue that I think will become a common one as the open data movement continues. For those who have not been keeping up, DocGraph was announced at Strata Rx, described carefully on this blog, and will be featured again at Strata 2013. For those who do not care to click links, DocGraph is a crowdfunded open data set, which merges open data sources on doctors and hospitals.
As I recently described on the DocGraph mailing list, work is underway to acquire the data sets that we set out to merge. The issue deals with file formats.
The core identifier for doctors, hospitals, and other healthcare entities is the National Provider Identifier (NPI). This is something like a Social Security number for doctors and hospitals. In fact, it was created in part so that doctors would not need to use their Social Security numbers or other identifiers in order to participate in healthcare financial transactions (i.e., getting paid by insurance companies for their services). The NPI is the “one number to rule them” in healthcare and we want to map data from other sources accurately to that ID.
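One cheap sanity check before mapping anything to an NPI is the number's built-in check digit. An NPI is 10 digits, and the last digit is a Luhn check digit computed with the prefix 80840 placed in front of the number. The sketch below is an illustration under that rule, not part of the DocGraph tooling, and the function name is made up. Passing the check only means the number is well formed, not that it has been issued to anyone.

```python
def is_valid_npi(npi: str) -> bool:
    """Check a 10-digit NPI using the Luhn formula with the 80840 prefix."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    digits = [int(c) for c in "80840" + npi]
    total = 0
    # Walk right to left; double every second digit, starting with the one
    # just left of the check digit, and fold two-digit results back to one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# 1234567893 is a placeholder commonly used to illustrate the check digit.
print(is_valid_npi("1234567893"))
print(is_valid_npi("1234567890"))
```

A check like this catches typos and truncated values in a state file before they turn into silent mismatches during a merge.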
Each state releases zero, one, or several data files that can be purchased and that also contain doctor data. But these file downloads are in “random file format X.” Of course we are not yet done with our full survey of the files and their formats, but I can assure you that they are mostly CSV files and a troubling number of PDF files. It is our job to take these files and merge them against the NPI, in order to provide a cohesive picture for data scientists.
But the data available from each state varies greatly. Sometimes they will have addresses, sometimes not. Sometimes they will have fax numbers, sometimes not. Sometimes they will include medical school information, sometimes not. Sometimes they will simply include the name of the medical school, sometimes they will use a code. Sometimes when they use codes they will make up their own …
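The merge described above can be sketched roughly as follows for the CSV case. Everything here is hypothetical: the column names, the per-state column map, and the sample rows are invented, and the real files will be messier. The point is only that each state needs its own mapping onto a common schema, with missing fields carried as explicit blanks instead of being guessed.

```python
import csv
import io

# Hypothetical NPI reference rows and state file; all names, columns, and
# values are invented for illustration.
NPI_CSV = """npi,name
1234567893,Example Clinic
"""

STATE_CSV = """NPI,Fax,MedSchool
1234567893,,Example Medical School
0000000000,555-0100,
"""

# Hypothetical mapping from one state's column names to a common schema.
COLUMN_MAP = {"NPI": "npi", "Fax": "fax", "MedSchool": "medical_school"}


def normalize(row, column_map):
    """Rename columns to the common schema; blank values become None."""
    return {
        common: (row.get(source) or "").strip() or None
        for source, common in column_map.items()
    }


def merge_on_npi(npi_file, state_file, column_map):
    """Attach state fields to NPI records; return (merged, unmatched)."""
    merged = {row["npi"]: dict(row) for row in csv.DictReader(npi_file)}
    unmatched = []
    for raw in csv.DictReader(state_file):
        row = normalize(raw, column_map)
        target = merged.get(row["npi"])
        if target is None:
            # Keep rows that do not match any NPI so they can be reviewed.
            unmatched.append(row)
        else:
            target.update({k: v for k, v in row.items() if k != "npi"})
    return merged, unmatched


merged, unmatched = merge_on_npi(
    io.StringIO(NPI_CSV), io.StringIO(STATE_CSV), COLUMN_MAP
)
print(merged)
print(unmatched)
```

Keeping the unmatched rows separate, instead of dropping them, makes it easier to spot a state that uses its own identifiers or codes.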
I am not complaining here. We knew what we were getting ourselves into when we took on the DocGraph project. The community at large has paid us well to do this work! But now we have a question: what data formats should we support? Read more…
A joint effort by New York City, San Francisco, and Yelp brings government health data into Yelp reviews.
One of the key notions in my “Government as a Platform” advocacy has been that there are other ways to partner with the private sector besides hiring contractors and buying technology. One of the best of these is to provide data that can be used by the private sector to build or enrich their own citizen-facing services. Yes, the government runs a weather website, but it’s more important that data from government weather satellites shows up on the Weather Channel, your local TV and radio stations, Google and Bing weather feeds, and so on. Those outlets already have more eyeballs and ears combined than the government could or should possibly acquire for its own website.
That’s why I’m so excited to see a joint effort by New York City, San Francisco, and Yelp to incorporate government health inspection data into Yelp reviews. I was involved in some early discussions and made some introductions, and have been delighted to see the project take shape.
My biggest contribution was to point to GTFS as a model. Bibiana McHugh at the city of Portland’s TriMet transit agency reached out to Google, Bing, and others with the question: “If we came up with a standard format for transit schedules, could you use it?” Google Transit was the result — a service that has spread to many other U.S. cities. When you rejoice in the convenience of getting transit timetables on your phone, remember to thank Portland officials as well as Google. Read more…