NASA technology leads to better medical decisions

NASA's Chris Mattmann discusses object-oriented data and health IT.

Running time: 19:03

Can a data-sharing technology developed at NASA’s Jet Propulsion Laboratory create better outcomes for medicine?

In fact, it already does.

In this podcast, Chris Mattmann, a senior computer scientist at the NASA Jet Propulsion Laboratory, talks with me about object-oriented data technology (OODT) and health IT.

Mattmann dives into the following questions:

  • What is object-oriented data technology (OODT) and how does it relate to health IT?
  • How did NASA’s Jet Propulsion Laboratory get involved with applying OODT to health IT?
  • What’s it been like for a NASA project to work within the Apache Incubator and the open source community?
  • What is the Virtual Pediatric Intensive Care Unit?
  • How will data-driven tools help doctors, researchers and patients make better medical decisions?

Health IT at OSCON 2010

Chris Mattmann will speak about grid software and healthcare IT in the health track at next month’s OSCON conference.

  • Alex Tolley

    It would be a lot more useful to have data on whether the goals are really being met, and at what cost, rather than rhapsodizing about the data platform.

    We know that simply using checklists is very low-hanging fruit for improving medical care by reducing mistakes.

    Changing the way medical research is published to eliminate bogus positive results for various drug and equipment trials might be another.

    Analyzing patient data is definitely going to help, although I suspect that this will focus on the data that is available, which may miss the key factors. By analogy, analyzing data on the survival of wounded soldiers in the early 19th century wouldn’t have helped much, as no one was even thinking about sterilizing surgical tools or having surgeons wash their hands. Data analysis is good, but it should come after human insights gained through observation.

    If evidence-based analysis is to be applied evenly, it should also apply to the techniques being promulgated, in this case databases and data management as a means to improve medical outcomes. If we don’t do this, we will forever be tantalized by the next shiny new idea that will change everything if implemented…

  • Chris Mattmann

    Hi Alex,

    Thanks for your insightful comments. It is important both to study the collected, historical data and to experiment, observe, and see what happens, since experimentation definitely helps in escaping what I would call “local” minima, or peaks-and-valleys syndrome. In computer science, there are whole classes of algorithms related to this notion, e.g., simulated annealing, hill climbing, etc.

    So, I definitely agree that it’s important to consider both. At the same time, I believe it’s important to look for observations in historical data and to develop repeatable, accurate models of the decision-making processes that led to specific outcomes and events. Sensitivity, specificity, and accuracy are important in these models and should be examined extremely carefully, but my feeling is that the historical observations are also important to study.
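The local-minimum idea Mattmann mentions can be illustrated with a toy Python sketch. This is an illustration only and not code from OODT or the podcast. The cost landscape, the temperature schedule (`t0`, `cooling`, `steps`), and all function names are hypothetical. Greedy hill climbing stops at the first position where no neighbor is lower. Simulated annealing sometimes accepts an uphill move, and that can let it escape such a local minimum.

```python
import math
import random

# Hypothetical 1-D "landscape": a cost at each integer position 0..7.
LANDSCAPE = [5, 3, 4, 6, 2, 1, 3, 7]


def neighbors(i, n):
    """Positions adjacent to i that stay inside [0, n)."""
    return [j for j in (i - 1, i + 1) if 0 <= j < n]


def hill_climb(costs, start):
    """Greedy descent: move to the lowest neighbor until none is lower."""
    current = start
    while True:
        best = min(neighbors(current, len(costs)), key=lambda j: costs[j])
        if costs[best] >= costs[current]:
            return current  # no lower neighbor: a local minimum
        current = best


def simulated_annealing(costs, start, t0=5.0, cooling=0.95, steps=500, seed=0):
    """Random neighbor moves; an uphill move of size delta is accepted
    with probability exp(-delta / T), and T shrinks geometrically."""
    rng = random.Random(seed)
    current = best = start
    temp = t0
    for _ in range(steps):
        candidate = rng.choice(neighbors(current, len(costs)))
        delta = costs[candidate] - costs[current]
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
        if costs[current] < costs[best]:
            best = current  # remember the lowest-cost position seen so far
        temp = max(temp * cooling, 1e-6)  # keep the temperature positive
    return best


if __name__ == "__main__":
    hc = hill_climb(LANDSCAPE, start=0)
    sa = simulated_annealing(LANDSCAPE, start=0)
    print("hill climbing:", hc, LANDSCAPE[hc])
    print("simulated annealing:", sa, LANDSCAPE[sa])
```

Starting at position 0, `hill_climb` halts at position 1 (cost 3), even though position 5 (cost 1) is lower. Whether a given annealing run reaches position 5 depends on the seed and the cooling schedule. That uncertainty is the usual trade-off with stochastic search.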



  • Alex Tolley


    I would be interested in reading about a study that shows that this data analysis approach both works and is cost-effective.

    “…to develop repeatable, accurate models of the decision making processes that led to specific outcomes and events.”

    In practice, I think there is going to be a relatively low signal-to-noise ratio. Consider the variability of patient responses in drug trials. This occurs despite careful selection of patients, double-blinding to eliminate investigator bias, and experienced researchers who are aware of other sources of bias. The patient pool is not a genetically identical cohort of Sprague-Dawley rats with near-identical environmental backgrounds. It is a highly varied population with widely different histories, and its members also respond to their environment, including how much they like the duty nurse, their fellow patients, etc. It might seem that the variability can be reduced by using a very large number of patient histories, and that will certainly help, but it is likely that many of the key variables are not recorded. For example, we know that genetics (genomics) is important, but that data is almost certainly not going to be available. Having lots of data therefore might not be as useful as one might think. As a NASA person, you might find it useful to examine whether having data on tap would have exposed the O-ring problem behind the Challenger disaster, to name one high-profile issue that seemed to escape the internal investigators.

    The next issue, to my mind, is the problem of data mining. Once a lot of data is available, there is a natural inclination to mine it for interesting relationships. What is often forgotten (or ignored) is that with enough variables, the “curse of dimensionality” will create all sorts of spurious relationships. We see that effect in drug trial papers that purport to show some effect, even when that effect was not the purpose of the trial, just a convenient outcome.

    But assume that everything I’ve said above can be dismissed. My next question is: is it worth the effort, and what are the consequences of taking the data-driven approach? I don’t know the answers to that, although I’ve been subjected to needless form-filling for companies that believe computer analysis will expose success factors, even though those factors never seemed to be communicated. (Age can be a bit jaundicing.)

    None of what I’ve said should be interpreted as being against data-driven, experience-based research. I think that making rational decisions based on evidence rather than belief is very important, and I applaud efforts in that direction. What I am saying is that thoughtful, small-scale experiments might be better (and more cost-effective) than attempting to build large data capture systems. But as I said, I would like to see evidence that this approach actually works.
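Tolley's point about spurious relationships appearing once enough variables are tested is usually called the multiple-comparisons problem. That problem is related to the curse of dimensionality but is not the same thing. The following is a minimal Python sketch built on stated assumptions. The tests are independent and use a 5% significance level. All of the data is random and hypothetical. The first function computes the chance of at least one false positive across many tests. The second correlates noise "features" with a noise "outcome" and counts how many exceed a |r| cutoff of 0.28, which is roughly the two-sided 5% threshold for 50 samples. The function names, sample sizes, and cutoff are illustrative choices.

```python
import math
import random


def familywise_error_rate(num_tests: int, alpha: float) -> float:
    """P(at least one false positive) over independent tests when every null is true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient (no external libraries)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def count_spurious_hits(num_patients=50, num_features=200, threshold=0.28, seed=1):
    """Correlate a random 'outcome' with purely random 'features' and count how
    many exceed the |r| threshold; every hit is spurious by construction."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(num_patients)]
    hits = 0
    for _ in range(num_features):
        feature = [rng.gauss(0, 1) for _ in range(num_patients)]
        if abs(pearson_r(feature, outcome)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for m in (1, 20, 200):
        print(m, "tests:", round(familywise_error_rate(m, 0.05), 3))
    print("Bonferroni, 20 tests:", round(familywise_error_rate(20, 0.05 / 20), 3))
    print("spurious hits among 200 noise features:", count_spurious_hits())
```

With 20 independent tests at α = 0.05, the chance of at least one false positive is about 0.64. The Bonferroni correction divides α by the number of tests, and that brings the rate back under 0.05 at the cost of statistical power.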

  • Dave


    While you bring up interesting points, I think you are being needlessly pessimistic about the prospects of this direction of research. A host of exactly this kind of work is taking place right now and producing promising (albeit preliminary) results: Stuart Russell’s work at Berkeley as part of the C-BICC project, MIT’s MIMIC 2 database and the PhysioNet project more generally, and UCLA’s Neural Systems and Dynamics Lab.

    However, responding more directly to your criticisms, I would like to add the following:

    1. This project is not building large, expensive data capture systems. Those are already in place and are called electronic health records (EHRs), which we all know and love now that the two most recent administrations have sold the American public on the premise that they’ll transform medicine. Switching to EHRs, particularly at large institutions, costs tens or hundreds of millions of dollars, but once the switch is made, the EHR vendors begin capturing data AND storing it themselves. That data, however, remains inaccessible to clinical researchers. The goal here is to spend a fraction of the money already spent on the switch simply to enable access to, and sharing of, this data for research and for developing new decision support tools. There is some expense in developing this framework, but it is dwarfed (by orders of magnitude) by the cost of actually moving to EHRs. Once the move is made and the data is already captured, the real waste is NOT using the data.

    2. Your criticism about the nature of the results obtained from analyzing this data applies even more to the methodologies currently used in retrospective clinical research in general. This project uses exactly the same data that clinical researchers at hospitals around the world use to conduct studies and publish papers every year. The difference is that it aims to enable access to much larger datasets and to automate the analysis using modern machine learning and data mining techniques. Clinical researchers are thrilled to work with datasets on the scale of tens of thousands of patients (rather than tens or hundreds), and they rightly believe that this data should be made easily accessible for research and analysis in the era of EHRs.

    3. The true goal is to produce research results and decision support tools that realistically inform the way clinicians ACTUALLY practice medicine. Double-blind, randomized, highly controlled studies are wonderful, but they often have a limited, long-delayed impact on how clinicians practice at the bedside. Any honest clinician will tell you that they are instead much more influenced by their direct experience of “natural” experiments at the bedside. At the same time, however, humans have strong biases built into the way they manage data and learn from it. If we can build tools that learn from the same data that people are learning from and provide immediate feedback and context at the bedside (rather than results published a year later), then perhaps we can actually change the way medicine is practiced.

    I would like to acknowledge that everything you said about the signal-to-noise ratio, the curse of dimensionality, and similar worries is true. However, these are challenges that inform how we go about executing such a project, rather than death knells suggesting we should give up.

    Finally, I would respond to your last sentence, that you would like to see evidence that this approach actually works: we don’t have evidence either way, because it’s never been tried (although the work at MIT and Berkeley is quite promising). I guess we’ll find out!

  • Chris Mattmann

    Hi Alex,

    Again, thanks for your comments. For specific studies on the importance of increasing the amount of data available to “analyze,” I would recommend checking out the special issue of Nature on Big Data from 2008:

    I found it very enlightening.

    Thanks very much, again.


  • Dave

    There are a variety of ongoing projects at prestigious universities that are demonstrating great success in this very domain (intensive care health data):

  • Alex Tolley

    Chris, thanks for the Nature link.

    While it does have material relating to the collection and storage of data, there is nothing about successful analysis or cost-effectiveness, especially in fuzzier domains like health care.

    Having worked with genomic data, I know that microarray data sets are quite variable even from one source, and data from different sources is often not comparable, even when using the same technology and protocols. To me this is a clear example of “big data” not being useful. I would expect this sort of problem to occur in other domains too. This is not a problem for single-source data creators like CERN’s LHC, the Hubble telescope, or the Sanger Center. It makes sense for them to generate a lot of data for analysis, especially CERN, which is looking for the proverbial needles in a haystack.

  • Duval and Stachenfeld

    Wow. Just thinking about this makes my head spin. It will be so interesting to see how this all plays out, and to think about how this type of technology might be used in ways that are not in the public eye or within the reach of the law.