Strata Gems: Three key data trends for 2011

Data markets, real-time technology, and the race for developers

We’ve published a Strata Gem each day this December through to the 24th. Yesterday’s Gem: CouchDB in the browser.

To conclude our Strata Gems series, let’s take a look at what lies in store for the data world in 2011.

The market for data

Data marketplaces will come of age in 2011. Startup Infochimps is growing fast and recently acquired DataMarketplace.com. Microsoft launched its Windows Azure Data Market in 2010, and social data provider Gnip scored a coup by becoming the official data reseller for Twitter.

Marketplaces mean two things. Firstly, it’s easier than ever to find data to power applications. That will enable new projects and startups and raise the level of expectation: integration with social data sources, for instance, will become the norm rather than a novelty. Secondly, it will become simpler and more economical to monetize data, especially in specialist domains.

The knock-on effect of this commoditization is that good-quality, unique databases will become increasingly valuable and an important competitive advantage. There will also be key roles for trusted middlemen: if competitors can safely share data with one another, they can all gain an improved view of their customers and opportunities.

The rise of real-time

This year’s big data poster child, Hadoop, has limitations when it comes to responding in real time to changing inputs. Despite efforts by companies such as Facebook to cut Hadoop’s MapReduce processing time to 30 seconds after user input, that is still too slow for many purposes.

Google led the way in 2010: its Caffeine indexer replaced the old MapReduce-based indexing system, allowing Google’s search index to be updated continuously as new data is crawled. It’s important to note that MapReduce hasn’t gone away; rather, systems are becoming hybrids that pair a real-time layer with the MapReduce batch layer.
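The hybrid pattern described above can be illustrated with a toy sketch. It is not how Caffeine, HBase, or any production system actually works: it assumes a simple event-counting workload, and the names `batch_view` and `HybridCounter` are hypothetical. A slow batch layer recomputes totals from the full event log, a real-time layer absorbs new events as they arrive, and queries merge the two.

```python
from collections import Counter


def batch_view(events):
    """Hypothetical batch layer: recompute counts from the full event log.

    Stands in for a periodic MapReduce-style job; here it is just an in-memory Counter.
    """
    return Counter(events)


class HybridCounter:
    """Hypothetical hybrid store: a batch view plus real-time increments."""

    def __init__(self, historical_events):
        self.batch = batch_view(historical_events)  # slow, complete view
        self.realtime = Counter()                   # fast, recent-only view

    def record(self, event):
        # New input becomes visible to queries immediately via the real-time layer.
        self.realtime[event] += 1

    def query(self, key):
        # Merge both layers; a missing key counts as 0 in a Counter.
        return self.batch[key] + self.realtime[key]

    def rebuild(self, all_events):
        # A fresh batch run absorbs everything, so the real-time increments are discarded.
        self.batch = batch_view(all_events)
        self.realtime.clear()


if __name__ == "__main__":
    log = ["a", "b", "a"]
    store = HybridCounter(log)
    store.record("a")
    log.append("a")
    print(store.query("a"))  # → 3, before any batch rebuild
    store.rebuild(log)
    print(store.query("a"))  # → 3, now served entirely from the batch view
```

The design choice is that the real-time layer only needs to be correct until the next batch run, which is what lets it stay simple and fast. In practice the two layers would be separate systems (for example a MapReduce job and a NoSQL store); keeping both in memory here is purely for illustration.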

The drive to real-time, especially in analytics and advertising, will continue to fuel demand for NoSQL databases. Expect growth to continue for Cassandra and MongoDB. In the Hadoop world, HBase will become ever more important, as it supports both real-time access to data and batch MapReduce processing, making a hybrid approach practical.

The race for developers

Much of the change in the way data is handled, celebrated at Strata, comes from the grassroots. Open source projects and cloud infrastructure mean developers can evaluate and learn to love technologies without requiring support or approval from above.

As the industry once again recognizes that developers are the kingmakers, keeping barriers to adoption low will be key. Industry incumbents will need to ensure that their products are compatible with the likes of Hadoop as a means of remaining a relevant technology choice in the eyes of developers.

On the other side of the equation, Hadoop’s reach into traditional database shops will be extended by glue technologies that present Hadoop through a familiar interface such as SQL or spreadsheets. This opportunity has already spawned strong startups in Karmasphere and Datameer, and will continue to be a growth area.

  • Find out more about the changing data world by joining us at Strata, 1-3 February 2011, Santa Clara.