The Solid State Storage Revolution: If you haven’t seen it, I recommend you watch Andy Bechtolsheim’s keynote at the recent MySQL Conference. We covered SSDs in our just-published report on Big Data management technologies. Since then, we’ve gotten additional signals from our network of alpha geeks, and our interest in SSDs remains high.
R and Linked Data Streams: I had a chance to visit with Dataspora founder and blogger Mike Driscoll, an enthusiastic advocate for the use of the open source statistical computing language, R. After founding and leading online retailer CustomInk.com, Mike went back to grad school and earned a doctorate in Bioinformatics. He has applied data analysis and programming in a variety of domains including retail, biotech, academia, and government projects.
Having been an avid user of S/S-Plus in the 1990s, I switched over to R seamlessly in the early 2000s. To this day, I consider the S/S-Plus user manuals to be the best reference and introductory books on the R programming language. (Mike wholeheartedly agrees.) R has been popular in the statistics community for many years, but I’ve been noticing that its visualization and analytic capabilities are attracting interest from developers. Moreover, recent efforts by the R community to improve its ability to scale to large data sets (see the brief update from Jay Emerson) will strengthen R’s place in the Big Data stack.
While we talked about statistics and R, our main focus was Big Data. Mike is particularly excited about the growing number of open data sources, and the potential for linking them together to create interesting applications. The growing importance of data is something we’ve covered in recent years. Tim highlighted early on that companies that accumulate data are usually able to develop interesting services, many of which involve non-obvious uses of their vast data collections (see “Data as the new Intel Inside”). In addition, the concept of linking different data sources was at the heart of our Money:Tech conference.