Four short links: 25 June 2010

More NoSQL, Data Medicine, Startups to Government, and Cake-and-eat-it Open Source

  1. Membase — an open-source (Apache 2.0 license) distributed, key-value database management system optimized for storing data behind interactive web applications. These applications must service many concurrent users; creating, storing, retrieving, aggregating, manipulating and presenting data in real-time. Supporting these requirements, membase processes data operations with quasi-deterministic low latency and high sustained throughput. (via Hacker News)
  2. Sergey’s Search (Wired) — Sergey Brin, one of the Google founders, learned he had a gene allele that gave him much higher odds of getting Parkinson’s. His response has been to help medical research, both with money and through 23andMe. Langston decided to see whether the 23andMe Research Initiative might be able to shed some insight on the correlation, so he rang up 23andMe’s Eriksson, and asked him to run a search. In a few minutes, Eriksson was able to identify 350 people who had the mutation responsible for Gaucher’s. A few clicks more and he was able to calculate that they were five times more likely to have Parkinson’s disease, a result practically identical to the NEJM study. All told, it took about 20 minutes. “It would’ve taken years to learn that in traditional epidemiology,” Langston says. “Even though we’re in the Wright brothers early days with this stuff, to get a result so strongly and so quickly is remarkable.”
  3. (YouTube) — Anil Dash talk at Personal Democracy Forum on applying insights from startups to government. I hope the more people say this, the greater the odds it’ll be acted on.
  4. Open Core Software — Marten Mickos (ex-MySQL) talks up “open core” (open source base, proprietary extensions) as a way to resolve the conflict of “change the world with open source” and “make money”. Brian Aker disagrees: There has been no successful launch of an open core company that has reached any significant size, especially of the size that Marten hints at in the article. My take: there are three reasons for open source (freedoms, price, and development scale) and if you close the source to part of your product then the whole product loses those benefits. If you open source enough that the open source bit has massive momentum, then you probably don’t have enough left proprietary to gain huge financial benefit.
  • Alex Tolley

    re: Sergey’s Search

    Yet again we get a breathless article on the wonders of big data, suggesting that the 23andMe genetic database is a gold mine. This particular story contains the extract that the link between Gaucher’s and Parkinson’s, found by researchers published in the NEJM (appeal to authority), supports the claim that there is real gold in them thar genetic mines.

    Firstly, this is a correlation. And whilst we all know that that does not imply causation, every day you can read about some new epidemiology study that links X with Y, pure correlations, every one. And every day we can read about a study that contradicts an earlier study. We are just seeing randomly generated “significant results”, which will get worse with data mining.

    The problem with this approach is that exhaustive data mining of large data sets will generate a very large number of purely random, spurious links between variables, of which only a very few will turn out to be real.

    If we are going to [inevitably?] engage in large scale data mining, we are going to need to develop better tools to eliminate these spurious links to better narrow them down to those that are worth doing scientific research on.

    A good question to ask of the 23andMe database is: “how many variables [SNPs?] would also generate the same link with Parkinson’s as Gaucher’s syndrome?”
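
The multiple-comparisons worry raised in this comment can be illustrated with a small simulation. This is only a sketch under stated assumptions, not anything taken from 23andMe or the NEJM study: the population size, disease prevalence, marker frequency and function names are all made up, and every marker is generated independently of the disease, so any "significant" link it reports is spurious by construction. It uses a two-proportion z-test (normal approximation) and compares a naive 0.05 threshold with a Bonferroni-corrected one.

```python
import math
import random


def two_proportion_p_value(hits_a, n_a, hits_b, n_b):
    """Two-sided p-value for a difference in proportions (normal approximation)."""
    if n_a == 0 or n_b == 0:
        return 1.0
    pooled = (hits_a + hits_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        return 1.0
    z = (hits_a / n_a - hits_b / n_b) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def count_spurious_links(n_people=2000, n_markers=1000, alpha=0.05, seed=0):
    """Count random markers that look 'linked' to a disease purely by chance.

    All numbers are hypothetical. Markers are drawn independently of the
    disease, so every hit counted here is a false positive.
    """
    rng = random.Random(seed)
    disease = [rng.random() < 0.1 for _ in range(n_people)]
    total_cases = sum(disease)
    naive = 0      # hits at the usual p < alpha threshold
    corrected = 0  # hits at the Bonferroni threshold p < alpha / n_markers
    for _ in range(n_markers):
        carrier = [rng.random() < 0.3 for _ in range(n_people)]
        n_carriers = sum(carrier)
        cases_in_carriers = sum(1 for c, d in zip(carrier, disease) if c and d)
        p = two_proportion_p_value(
            cases_in_carriers, n_carriers,
            total_cases - cases_in_carriers, n_people - n_carriers,
        )
        if p < alpha:
            naive += 1
        if p < alpha / n_markers:
            corrected += 1
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = count_spurious_links()
    print(f"naive threshold: {naive} spurious links")
    print(f"Bonferroni threshold: {corrected} spurious links")
```

With 1,000 unrelated markers, roughly 5% should pass the naive threshold by chance alone, while very few or none should survive the corrected one. Bonferroni is just one simple correction and is conservative; it is used here only to show the scale of the effect.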

  • Alex Tolley

    follow up comment.

    Given that Sergey’s wife runs 23andMe, one might think that there is already a small team of people data mining the patient data with at least an eye open for anything that might be relevant to Parkinson’s to report back on. So why wasn’t that link with Gaucher’s found before the NEJM article was published?

    My guess is that even with Google’s resources, this is a problem that is more than exponentially hard to crack. In other words it is easy to find the link when you know what you are looking for, but hard to find that needle in a haystack of false positives.

  • Alex Popescu

    For those impatient to find out more about Membase, I have summarized the most important aspects on the NoSQL-focused blog myNoSQL.

  • Jim Stogdill

    Open Core attempts to answer the question “how can we intentionally commoditize this stuff for massive and rapid distribution while retaining pricing leverage so we eventually go big in philanthropy like Bill?” The actual answer is, you can’t. Or, maybe open core actually answers the question “how should we structure our open source project to preserve a future poison pill for Oracle when they buy us?”

  • Hamranhansenhansen

    > There has been no successful launch
    > of an open core company that has reached
    > any significant size

    Apple. The core software in all of their products is open source.

  • Sam Penrose

    I would like Sergey (or Larry, I’m not picky) to please develop whatever disease is going to cause the most tragedy in my family. They need to do it now, so that my beloved has as much lead time as possible. I don’t know what the disease is, sorry, but maybe they can ask some programmers to figure that out.

    Congress is full of Republicans who vote to slash spending on health except for Disease X, where Disease X afflicts one of their children. May I humbly suggest that the personal experiences of rich and powerful people are perhaps not the best heuristic for making big choices about health care?