Dealing with Data in the Hadoop Ecosystem

Hadoop, Sqoop, and ZooKeeper

Kathleen Ting (@kate_ting), Technical Account Manager at Cloudera, and our own Andy Oram (@praxagora) sat down to discuss how to work with structured and unstructured data, as well as how to keep a system that is crunching that data up and running.

Key highlights include:

  • Misconfigurations account for almost half of the support issues that the team at Cloudera sees [Discussed at 0:22]
  • ZooKeeper, the canary in the Hadoop coal mine [Discussed at 1:10]
  • Leaky clients are a common problem that ZooKeeper detects [Discussed at 2:10]
  • Sqoop is a bulk data transfer tool [Discussed at 2:47]
  • Sqoop helps to bring together structured and unstructured data [Discussed at 3:50]
  • ZooKeeper is not for storage, but for coordination, reliability, and availability [Discussed at 4:44]


