Data is the Intel Inside: One More Small Sign

I’ve long argued that one of the important changes that’s part of Web 2.0 is that data is the “Intel Inside” of the next generation of computer applications. I’ve also remarked on Hal Varian’s comment that SQL is the new HTML. But it was still a bit of a surprise to see this week’s treemap visualization of Bookscan’s point of sale data on computer book sales:

[Treemap image: the database category now sits in the upper-left corner]

If you don’t look at these treemaps every week, you might not notice anything significant. But as Roger Magoulas, our Director of Research, pointed out in his weekly summary: “Database is now the top left category, displacing programming languages, which has traditionally appeared in that position. This week, database sales (11,170 units), exceeded programming languages (11,109 units) for the first time, creating the change in layout. (The treemap algorithm places the largest items in the top left.)”
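Roger's parenthetical is the key to why such a tiny swing changed the picture: treemap algorithms lay out rectangles in decreasing order of size, so whichever category is largest anchors the top-left corner. As a rough illustration, here is a minimal slice-and-dice layout sketch in Python — not the actual algorithm behind our treemaps, just the sorting behavior that made a 61-unit difference flip the layout. The two categories and their unit counts are from the post above.

```python
def treemap(items, x, y, w, h, horizontal=True):
    """Lay out (label, size) pairs inside the rectangle (x, y, w, h),
    slicing it proportionally along one axis, largest item first."""
    items = sorted(items, key=lambda it: it[1], reverse=True)
    total = sum(size for _, size in items)
    rects = []
    offset = 0.0
    for label, size in items:
        frac = size / total
        if horizontal:
            # Largest item gets the leftmost slice, i.e. the top-left corner.
            rects.append((label, x + offset, y, w * frac, h))
            offset += w * frac
        else:
            rects.append((label, x, y + offset, w, h * frac))
            offset += h * frac
    return rects

sales = [("Programming Languages", 11109), ("Database", 11170)]
for label, rx, ry, rw, rh in treemap(sales, 0, 0, 100, 100):
    print(f"{label}: x={rx:.1f} width={rw:.1f}")
```

Because the items are sorted by size before placement, "Database" now lands at x=0 — the upper-left slot — and one week of sales going the other way would put "Programming Languages" back there.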

It was a small change — database book sales up 1% for the week, programming language book sales down 1% — but it was enough to change the layout. This might be only a minor weekly variance that will shortly reverse, or a tipping point, but I thought it was worth a note. Is it significant that in this one week, more books sold on databases than on programming languages? Maybe not. But it’s certainly thought-provoking.

(Aside: That’s one of the things I love about visualization tools — they help you spot changes that you’d otherwise miss. These two categories are so nearly even — a difference of less than half a percent — that you’d never notice the shift in a table of raw numbers.)

For more information on our book data and visualization methodology, see Book Sales as a Technology Trend Indicator, or my various State of the Computer Book Market postings.

Comments:
  • Risento T. Becker

    Well, if pay is anything to go by, it seems like it’s a good time to know SQL as well as Java or .NET.

    http://money.cnn.com/galleries/2007/news/0702/gallery.jobs_in_demand/6.html

  • http://dev.aol.com/blog/kevinfarnham/tim-oreilly-data-google-vger Kevin Farnham

    To me, this is a kind of tipping point. Significant enough that I spent a lot of this weekend thinking about it and writing about it!

    It appears that we’ve entered a realm where concise and useful designs for aggregating and indexing data have become more important than the languages that are applied to perform the aggregation.

    It makes me think of how, in Web 1.0, a company like AOL could become the one-stop portal, but that couldn’t last because the Web grew too fast and too wide for any one company to possibly encompass all of it.

    Today people look at Google, and think that they may indeed accomplish their stated mission of bringing all the world’s information into their complexly indexed platform.

    I doubt this will last, if it ever happens. Can even Google’s ingenious method of indexing data suffice for 2027’s needs? That would be surprising to me, given the changes I’ve seen in the past 40 years.

Data — its aggregation, its ordering, its indexing, its organization via methods we’ve not yet heard of — is the defining and critically important “substance” of the future.

    Databases take over the top book sales spot, ahead of programming languages. Then, you look at AIM Pages, where the developers shunned even database software as adding too much code complexity, and chose to store the pages in HTML form on disk, constructing the viewed pages within each user’s browser through spontaneous population of the microformat elements that compose the page. The Web itself becomes the database!

  • http://www.koona.com Tomas Sancio

A 486 Intel chip in an old PC can still handle word processing and calculations that don’t require much horsepower today. An old version of MS Excel running on that chip will still give precise results.

How much mid-nineties data is useful today? If a kid fills up his MySpace page with how much he loves the TV shows and rock bands of 2007, will that data be of use in 2009? Google a word today and then again tomorrow; the results change.

Sorry, it’s just that the analogy is quite confusing. Perhaps it’s because the real advantage is the speed at which the data is processed rather than the data itself, and I just didn’t get it. Still, Google is not as transparent as an Intel chip — an adding machine that will do anything the software asks it to. Google and other web aggregators will index the data the way they want, like it or not.

  • http://www.base4.net/blog.aspx?Tag=Data2.0 Alex James

    Tim,

I have long argued (although not as long as you, probably) that the future belongs to data. This is just a reflection of that… I’ve been banging on about ‘Data2.0’ for a while now, for want of a better term.

  • http://tim.oreilly.com Tim O'Reilly

    Tomas — I hear you that it’s a bit of a stretch to equate “data” to Intel’s near monopoly on the PC processor. But there is a lot of meat to the analogy. The point is that Intel got their lock by being a single source for an important component.

And that’s what Web 2.0 companies are doing today: becoming single sources, despite the open standards and ubiquitous nature of the web. If you build a big enough database, with the right kind of network effects, it becomes self-reinforcing — a kind of single source.

Put up an auction site using technology that’s better than eBay’s. Why don’t people come to you? Because eBay has the critical mass of buyers and sellers. That critical mass isn’t “data” per se, but all the data those buyers and sellers have invested in eBay is what keeps them there rather than easily moving somewhere else.

Google AdWords is similarly self-reinforcing. People get better results from Google, so they keep diverting more and more of their ad dollars there. Google is well on their way to becoming a single source for search. Amazon is well on their way to becoming a single source for online books.

  • http://www.isUseful.com Matt Hobbs

Sounds like great news to me – I’d much rather refactor or replace the code base on top of a well-designed and well-executed database than fix a bad database design once it’s full of active data. Although there have been many attempts to replace it, SQL is still the best bang for the buck short of hand-rolling your own optimized data store.

  • http://www.pineywoodspicknparlor.com/ Charles Goodwin

Definitely with you, Matt, on fixing a bad database design. I hope this is how the future is shaping up. It will be interesting to look back on this data in a year or two.