The "six C's": understanding the health data terrain in the era of precision medicine.
Ian Eslick, Tuhin Sinha, and Rob Rustad contributed to this post.
Download a free copy of “Navigating the Health Data Ecosystem,” the first in a series of reports covering our recent investigation into the health data ecosystem, funded by the Robert Wood Johnson Foundation.

A few years ago, O’Reilly became interested in health topics: running the Strata Rx conference, writing the report How Data Science is Transforming Health Care: Solving the Wanamaker Dilemma, and publishing Hacking Healthcare. Our social network grew to include people in the health care space, informing our nascent thoughts about data in the age of the Affordable Care Act and the problems and opportunities facing the health care industry. We had the notion that aggregating data from traditional and new device-based sources could change much of what we understand about medicine — thoughts now captured by the concept of “precision medicine.”
From that early thinking, we developed the framework for a grant with the Robert Wood Johnson Foundation (RWJF) to explore the technical, organizational, legal, privacy, and other issues around aggregating health-related data for research — to provide empirical lessons for organizations also interested in pushing for data in health care initiatives. Our new free report, Navigating the Health Data Ecosystem, begins the process of sharing what we’ve learned.
After decades of maturing in more aggressive industries, data-driven technologies are being adopted, developed, funded, and deployed throughout the health care market at an unprecedented scale. February 2015 marked the inaugural working group meeting of the newly announced NIH Precision Medicine Initiative, designed to assemble a million-person cohort contributing dense longitudinal genotype and phenotype data, with donors providing researchers the raw epidemiological evidence to develop better decision-making, treatments, and potential cures for diseases like cancer. In the past several years, many established companies and new startups have also started to apply collective intelligence and “big data” platforms to health and health care problems. All these efforts encounter a set of unique challenges that experts coming from other disciplines do not always fully appreciate. Read more…
How generating conversations can become one of the most important data assets for any organization.
At O’Reilly Research, we focus our attention on trends in technology adoption — which tools are adopted and in which industries. In doing so, we uncover interesting cross-disciplinary opportunities and discover what we can learn from innovations in other fields.
We’ve recently learned about the increasing role of data in the fashion industry, so we set out to uncover some of the players who are making disruptive changes using technology and analytics.
Our team asked Liza Kindred, founder of Third Wave Fashion, and Julie Steele, coauthor of Beautiful Visualization and Designing Data Visualizations, to take a closer look at these developments in their new report, “Fashioning Data: How fashion industry leaders innovate with data and what you can learn from what they know.” We think you’ll find some surprising applications of data and analytics in the fashion industry — applications that are useful regardless of the industry or organization you work within. And, we know we’re just at the beginning of what is likely a growing trend. Read more…
In-memory data storage, SQL, data preparation and asking the right questions all emerged as key trends at Strata + Hadoop World.
At our successful Strata + Hadoop World conference (a success that included avoiding Hurricane Sandy), a few themes emerged that resonated with my interests and experience as a hands-on data analyst and as a researcher who tracks technology adoption trends. Keep in mind that these themes reflect my personal biases; others will have a different take on their own key takeaways from the conference.
1. In-memory data storage for faster queries and visualization
Interactive or real-time query for large datasets is seen as a key to analyst productivity (“real-time” meaning query times fast enough to keep the user in the flow of analysis, from sub-second to a few minutes). Existing large-scale data management schemes aren’t fast enough, and they reduce analytical effectiveness when users can’t explore the data by quickly iterating through queries. We see companies with large data stores building out their own in-memory tools, e.g., Dremel at Google, Druid at Metamarkets, and Sting at Netflix, as well as new tools, like Cloudera’s Impala announcement at the conference, UC Berkeley AMPLab’s Spark, SAP HANA, and Platfora.
We saw this coming a few years ago, when analysts we pay attention to started building their own in-memory data store sandboxes, often in key/value data management tools like Redis, when trying to make sense of new, large-scale data stores. I know from my own work that there’s no better way to explore a new or unstructured data set than to quickly run through a series of iterative queries, each informed by the last. Read more…
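The in-memory sandbox pattern described above can be sketched in a few lines. This is an illustrative toy, not any of the tools named in the post: it uses Python's built-in sqlite3 `:memory:` database (rather than Redis) and invented sample data, but it shows the workflow of loading a slice of data into memory so each exploratory query returns fast enough to inform the next one.

```python
import sqlite3

# Load a sample of a larger dataset into an in-memory store so that
# iterative, exploratory queries stay fast enough to keep the analyst
# "in the flow" of analysis. Table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT, ms INTEGER)")
rows = [(1, "view", 120), (1, "click", 340), (2, "view", 95), (2, "view", 210)]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# First pass: summarize by action; the result suggests the next query.
by_action = conn.execute(
    "SELECT action, COUNT(*), AVG(ms) FROM events GROUP BY action"
).fetchall()
print(by_action)

# Follow-up query, informed by the last: drill into the slow actions.
slow = conn.execute("SELECT user_id, action FROM events WHERE ms > 200").fetchall()
print(slow)
```

Because the whole working set lives in memory, each refinement of the query costs seconds rather than minutes, which is the productivity point the post is making.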
What does winning look like? No enemy has been vanquished, but open source is now mainstream and a new norm.
I heard the same comment a few times at the 14th OSCON: the conference has lost its edge. The comments resonated with my own experience — a shift in demeanor, a more purposeful, optimistic attitude, less itching for a fight. Yes, the conference has lost its edge; it doesn’t need one anymore.
Open source won. It’s not that an enemy has been vanquished or that proprietary software is dead; it’s that there’s not much left to argue about when it comes to adopting open source. After more than a decade of the low-cost, lean startup culture successfully developing on open source tools, it’s clearly a legitimate, mainstream option for technology tools and innovation.
And open source is not just for hackers and startups. A new class of innovative, widely adopted technologies has emerged from the open source culture of collaboration and sharing — turning the old model of replicating proprietary software as open source projects on its head. Think Git, D3, Storm, Node.js, Rails, Mongo, Mesos or Spark.
We see more enterprise and government folks intermingling with the stalwart open source crowd who have been attending OSCON for years. And, these large organizations are actively adopting many of the open source technologies we track, e.g., web development frameworks, programming languages, content management, data management and analysis tools.
We hear fewer concerns about support or needing geek-level technical competency to get started with open source. In the Small and Medium Business (SMB) market we see mass adoption of open source for content management and ecommerce applications — even for self-identified technology newbies.
Agility, simplicity, and curiosity will define the next generation of apps and devices.
The speakers at the recent Webstock conference in New Zealand gravitated toward many of the same themes. Taken together, these themes create a framework for building the next generation of services, applications and devices.
How a days-long data process was completed in minutes.
We recently faced the type of big data challenge we expect to become increasingly common: scaling up the performance of a machine learning classifier for a large set of unstructured data. In this post, we explain how a set-oriented approach led to huge performance gains.
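The full post has the details of the actual pipeline; as a hedged illustration of the general idea only (a toy, not the authors' classifier), here is the typical contrast between row-at-a-time processing and a set-oriented rewrite, using an invented keyword-matching task:

```python
# Toy illustration of a set-oriented rewrite: flag documents that share
# any token with a known vocabulary. All data here is hypothetical.
spam_vocab = ["free", "winner", "prize", "claim"]
docs = [["claim", "your", "prize"], ["meeting", "at", "noon"]]

# Row-at-a-time: nested loops, with a linear scan of the vocabulary
# list for every token of every document.
slow_labels = []
for doc in docs:
    hit = False
    for token in doc:
        if token in spam_vocab:  # O(len(vocab)) scan per token
            hit = True
            break
    slow_labels.append(hit)

# Set-oriented: build the vocabulary set once, then do a single bulk
# intersection per document instead of per-token scans.
vocab_set = set(spam_vocab)
fast_labels = [bool(vocab_set & set(doc)) for doc in docs]

print(slow_labels, fast_labels)  # both [True, False]
```

The two produce identical labels, but the set-oriented version replaces many small operations with a few bulk ones — the same shift in shape that, at scale, can turn a days-long job into minutes.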
Lessons from adjacent media companies can inform publishers' ebook strategies.
Will internal constituencies bias how publishers value print book and ebook business models? Roger Magoulas examines that question and looks at the complementary relationship between print and electronic forms.