Big Data: An opportunity in search of a metaphor

Big data as a discipline or a conference topic is still in its formative years.

The crowd at the Strata Conference could be divided into two broad contingents:

  1. Those attending to learn more about data, having recently discovered its potential.
  2. Long-time data enthusiasts watching with mixed emotions as their interest is legitimized, experiencing a feeling not unlike when a band that you’ve been following for years suddenly becomes popular.

A data-oriented event like this, outside a specific vertical, could not have drawn a large crowd with this level of interest, even two years ago. Until recently, data was mainly an artifact of business processes. It now takes center stage; organizationally, data has left the IT department and become the responsibility of the product team.

Of course, “data,” in its abstract sense, has not changed. But our ability to obtain, manipulate, and comprehend data certainly has. Today, data merits top billing due to a number of confluent factors, not least its increased accessibility via on-demand platforms and tools. Server logs are the new cash-for-gold: act now to realize the neglected riches within your upper drive bay.

But the idea of “big data” as a discipline, as a conference subject, or as a business remains in its formative years and has yet to be satisfactorily defined. This immaturity is perhaps best illustrated by the array of language employed to define big data’s merits and its associated challenges. Commentators are employing very distinct wording to make the ill-defined idea of “big data” more familiar; their metaphors fall cleanly into three categories:

  • Natural resources (“the new oil,” “goldrush” and of course “data mining”): Highlights the singular value inherent in data, tempered by the effort required to realize its potential.
  • Natural disasters (“data tornado,” “data deluge,” “data tidal wave”): Frames data as a problem of near-biblical scale, with subtle undertones of assured disaster if proper and timely preparations are not made.
  • Industrial devices (“data exhaust,” “firehose,” “Industrial Revolution”): A convenient grab-bag of terminologies that usually portrays data as a mechanism created and controlled by us, but one that will prove harmful if used incorrectly.

If Strata’s Birds-of-a-Feather conference sessions are anything to go by, the idea of “big data” requires the definition and scope these metaphors attempt to provide. Over lunch you could have met with like-minded delegates to discuss big data analysis, cloud computing, Wikipedia, peer-to-peer collaboration, real-time location sharing, visualization, data philanthropy, Hadoop (natch), data mining competitions, dev ops, data tools (but “not trivial visualizations”), Cassandra, NLP, GPU computing, or health care data. There are two takeaways here: the first is that we are still figuring out what big data is and how to think about it; the second is that any alternative is probably an improvement on “big data.”

Strata is about “making data work” — the tenor of the conference was less of a “how-to” guide, and more about defining the problem and shaping the discussion. Big data is a massive opportunity; we are searching for its identity and the language to define it.


  • http://www.mendeley.com/profiles/stephen-henderson/ Stephen

    The older phrase used to be ‘data mining’, which I think encapsulated facets of machine learning and stats, novel information discovery (patterns or outliers), larger datasets, and dense visualisation; the name suggested commercial application too.

    Indeed when I read ‘big data’ my brain just replaces it with ‘data mining’.

  • John B

    Perhaps, looking at genomics, “Information’s emergent behavior from raw data” or the like?

  • Cristian

    OK, data mining... yes.
    But that used to refer to private enterprise information on the order of gigabytes of data.
    Big data represents something very different in terms of the size and scope of information: we’re talking about petabytes of public or semi-public information.