Big data, but with a familiar face

Martin Hall explains how Karmasphere is integrating Hadoop into enterprises.

To prepare for O’Reilly’s upcoming Strata Conference, we’re continuing our series of conversations with some of the leading innovators working with big data and analytics. Today, we have a brief chat with Martin Hall, co-founder, president, and CEO of Karmasphere.

Karmasphere is one of several companies shipping commercial tools that make big data more accessible to developers and analysts. Hall said the company’s products do this by integrating with tools and languages developers already know, such as SQL.

“We’re focused on providing a new kind of software for working with big data stored in Hadoop clusters,” Hall said during a recent interview. “In particular, tools for developers and analysts, and doing it in such a way that they get familiar tools and familiar environments and can quickly be very productive analyzing and transforming data stored in Hadoop clusters.”


The integration of big data into business will be discussed at the Executive Summit at the upcoming Strata Conference (Feb. 1-3, 2011). Save 30% on registration with the code STR11RAD.


Karmasphere Studio is the company’s main product for developers. It’s a graphical interface for programming and debugging MapReduce jobs, and it integrates within IDEs like Eclipse and NetBeans. The company recently announced Karmasphere Analyst, which offers a familiar SQL interface for querying Hadoop clusters.

Hall said businesses typically dip their toes into big data with a small research and development cluster. “Once they see success with that, they deploy it into production. Once they have it in production, they’re looking to connect it with other data sources.”

Over the past 18 months, customers have been asking Karmasphere for more and better visualization tools, not only at the front end where decision-makers need insights, but for developers who need “more ability to see what’s going on in the cluster, to see the progress of their jobs, to analyze and debug what’s going on.” Hall said they’re working on more hook-ins with existing visualization packages.

“We don’t expect people who are embracing Hadoop to have to sweep away everything they’ve invested in, in terms of skill sets, hardware, or software,” Hall said. “It’s an integration story.”


  • Michaela

    I was trying to explain “big data” to someone not in this field. What definition would you give it?

  • Davd Sims

    My friend Edd Dumbill uses a very simple definition that I’ve adopted: Big Data is what we call a set of data when it has grown too large to be managed by conventional methods and tools. It’s not an entirely new problem: web searching tools, financial transactions, and data from sensors have been accumulating pools of Big Data for years. But the tools that have come out of what’s been learned, and their availability to a wider group of developers & analysts, are part of a changing dynamic. The term also seems like it’s helpful as a shorthand when discussing the opportunities presented by large data sets, whether that’s in finding information, serving ads, measuring risk, observing patterns, predicting change, or any of a plethora of other uses.

    How does that strike you?