Big Data: Technologies and Techniques for Large-Scale Data

Our belief that proficiency in managing and analyzing large amounts of data distinguishes market-leading companies led to a recent report designed to help users understand the different large-scale data management techniques. Our report on Big Data Technologies was the result of interviews with over thirty experts, including research scientists, open-source hackers, vendors, data analysts, and entrepreneurs. Rather than endorse specific vendors and technologies, we provide a framework to help readers navigate the wide variety of options available. (NOTE: If you’re interested in purchasing the report as a single issue of Release 2.0, we can provide you with a DISCOUNT CODE. Contact information is at the end of the video clip below.)

I recently sat down with my co-author, Roger Magoulas (Director of Research at O’Reilly), who agreed to talk about our report and Big Data in general. Roger begins by speaking passionately about the importance of data management and analysis. He then highlights what we believe are the key technology dimensions for evaluating data management solutions. The video ends with a glimpse into future technologies and general advice for organizations interested in improving their proficiency in handling data.

The full program is available in four extended clips:

  • What is Big Data and why is it important? (3:33 minutes)
  • Big Data Technologies (1:35 minutes)
  • Key Technology Dimensions (4:52 minutes)
  • A Look Into The Future and Closing Summary (3:42 minutes)
  • [ Head over to O’Reilly Media’s YouTube channel for other interesting videos. ]



    • Ben and Roger –

      Your interview gives a great overview of some of the nuts and bolts choices in Big Data management — Hadoop vs RDBMS (column vs row-based), MapReduce vs SQL, and real-time solutions.

      I’d like to hear more about what happens once Big Data is managed, namely how it’s analyzed. What questions can firms now answer that they previously couldn’t? Ultimately it’s this analysis that delivers a competitive advantage.

      • Hi Michael,
        Thanks for your comments. In the report we highlighted the three key aspects of data: acquisition, management, and analysis/insight. We focused on data management because we know many companies that have been struggling to handle their data, to the point that even rudimentary analysis (counts, averages) becomes extremely difficult. At the same time, some of the companies we track, using different data management tools and techniques, seemed to be doing fine with similar amounts of data.

        All companies eventually want to get to the analysis/insight phase; we wanted to help them get there quickly by making the task of data management less daunting.

        Given my background, I’m equally interested in the analysis/insight phase. We’ll have more to say on that front in the near future.


    • Ben,

      We have released a horizontally scaled open source semantic web / graph database named bigdata(R) [1]. The core implementation is a distributed B+Tree architecture with dynamic sharding. The semantic web database is layered over that, but you can use bigdata(R) for other applications as well. Unlike Hadoop or HBase, you can efficiently execute joins and high-level queries against multiple indices — without the overhead of an RDBMS.
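      The dynamic sharding idea above (splitting an index's key range into shards that divide as they fill) can be sketched in a few lines of Python. This toy class is an invented illustration, not bigdata(R)'s actual code: each shard holds a sorted run of (key, value) pairs and splits at a small size threshold.

```python
import bisect

class RangeShardedIndex:
    """Toy key-range sharding: each shard is a sorted run of (key, value)
    pairs that splits in two once it exceeds a size threshold."""

    def __init__(self, max_shard_size=4):
        self.max_shard_size = max_shard_size
        self.shards = [[]]     # each shard: sorted list of (key, value)
        self.split_keys = []   # split_keys[i] = first key of shards[i + 1]

    def _shard_for(self, key):
        # Route a key to its shard by binary search over the split points.
        return bisect.bisect_right(self.split_keys, key)

    def put(self, key, value):
        idx = self._shard_for(key)
        shard = self.shards[idx]
        i = bisect.bisect_left(shard, (key,))
        if i < len(shard) and shard[i][0] == key:
            shard[i] = (key, value)        # overwrite existing key
        else:
            shard.insert(i, (key, value))  # insert in sorted position
        self._maybe_split(idx)

    def _maybe_split(self, idx):
        # Dynamic sharding: split an oversized shard at its median key.
        shard = self.shards[idx]
        if len(shard) <= self.max_shard_size:
            return
        mid = len(shard) // 2
        left, right = shard[:mid], shard[mid:]
        self.shards[idx:idx + 1] = [left, right]
        self.split_keys.insert(idx, right[0][0])

    def get(self, key):
        shard = self.shards[self._shard_for(key)]
        i = bisect.bisect_left(shard, (key,))
        if i < len(shard) and shard[i][0] == key:
            return shard[i][1]
        return None
```

      In a real distributed store the shards would live on different nodes and splits would move data between them; here the shard list stands in for the cluster so the routing and splitting logic is visible.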

      People do not automatically associate the semantic web with large-scale data sets, but we think that is about to change. The combination of a horizontally scaled architecture with the semantic web standards is an excellent tool for people interested in analytics since you can fluidly mash up very large data sets, declaratively align the data using some semantic web primitives (owl:sameAs, owl:equivalentClass, etc.), and then slice and dice the data in any manner you choose.
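      The owl:sameAs alignment described above can be sketched with a toy example: treat sameAs links as edges in a union-find structure, then re-key every triple to its entity's canonical identifier. The identifiers and values below are hypothetical, and a real store does this with far more machinery (inference, indexing), but the merging idea is the same.

```python
from collections import defaultdict

# Hypothetical triples from two datasets; the owl:sameAs link declares
# that ex1:acme and ex2:acmeCorp name the same entity.
triples = [
    ("ex1:acme", "revenue", "12.5M"),
    ("ex2:acmeCorp", "employees", "250"),
    ("ex1:acme", "owl:sameAs", "ex2:acmeCorp"),
]

# Union-find over identifiers: owl:sameAs is symmetric and transitive,
# so identifiers linked by chains of sameAs collapse into one entity.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for s, p, o in triples:
    if p == "owl:sameAs":
        union(s, o)

# Re-key every remaining triple to its canonical identifier, so
# properties asserted under either name land on one merged record.
merged = defaultdict(dict)
for s, p, o in triples:
    if p != "owl:sameAs":
        merged[find(s)][p] = o
```

      After the merge, the revenue and employee counts asserted under the two different identifiers show up on a single record, which is what makes the "fluid mash up" of independently published data sets possible.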