Working in the Hadoop Ecosystem

Working with big data and open source software

I recently sat down with Mark Grover (@mark_grover), a Software Engineer at Cloudera, to talk about the Hadoop ecosystem. He is a committer on Apache Bigtop and a contributor to Apache Hadoop, Hive, Sqoop, and Flume. He also contributed to O’Reilly Media’s Programming Hive title.

Key highlights include:

  • Mark spends a lot of time in and around the Hadoop ecosystem, so I asked him to give an overview of the environment and explain why someone would want to use these tools. He describes how Hadoop is applied in the finance, marketing, advertising, and healthcare industries, and how it has completely changed the way data is mined and put to use. [Discussed at 0:24]
  • Hadoop is a step in the right direction to handle big data regardless of whether it’s structured or unstructured. [Discussed at 1:32]
  • While Hadoop offers a lower cost per terabyte, its flexibility in handling today’s increasing amounts of unstructured data makes it a worthwhile big data environment regardless of cost. [Discussed at 2:39]
  • Mark gives examples of Hadoop’s use in cancer research and suicide prevention. [Discussed at 4:10]
  • How to get support from the community, including vendors, enthusiasts, and developers. [Discussed at 5:26]
  • Finding and choosing a vendor. [Discussed at 6:46]
  • Creating a test environment. [Discussed at 7:40]
  • How to get involved in an open source project. [Discussed at 8:44]

