DeepDive — DeepDive helps users extract relations between entities from data and infer facts involving those entities. It can process structured, unstructured, clean, or noisy data, and it writes its results to a database.
Running Kafka at Scale (LinkedIn Engineering) — This tiered infrastructure solves many problems, but it greatly complicates monitoring Kafka and assuring its health. While a single Kafka cluster, when running normally, will not lose messages, the introduction of additional tiers, along with additional components such as mirror makers, creates myriad points of failure where messages can disappear. In addition to monitoring the Kafka clusters and their health, we needed to create a means to assure that all messages produced are present in each of the tiers, and make it to the critical consumers of that data.
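One way to picture the completeness check described above is a count-based audit: bucket messages by time window at the producer and at each tier, then flag any window where a tier saw fewer messages than were produced. The sketch below is only an illustration under that assumption. The function names, the 10-minute bucket size, and the tier labels are hypothetical and are not taken from LinkedIn's tooling or from any Kafka API.

```python
from collections import Counter

BUCKET_MS = 600_000  # hypothetical 10-minute audit window, in milliseconds


def count_by_bucket(timestamps_ms, bucket_ms=BUCKET_MS):
    """Count messages per fixed time window, keyed by the window's start time."""
    return Counter((ts // bucket_ms) * bucket_ms for ts in timestamps_ms)


def find_missing(producer_counts, tier_counts):
    """Return {tier: {bucket_start: shortfall}} for every window in which
    a tier saw fewer messages than the producers reported."""
    gaps = {}
    for tier, counts in tier_counts.items():
        short = {
            bucket: produced - counts.get(bucket, 0)
            for bucket, produced in producer_counts.items()
            if counts.get(bucket, 0) < produced
        }
        if short:
            gaps[tier] = short
    return gaps


# Illustrative data only: four messages produced, one lost before the
# (hypothetical) "aggregate" tier.
produced = count_by_bucket([0, 1_000, 600_000, 600_001])
tiers = {
    "local": count_by_bucket([0, 1_000, 600_000, 600_001]),
    "aggregate": count_by_bucket([0, 1_000, 600_000]),
}
gaps = find_missing(produced, tiers)
```

Comparing counts per window rather than individual messages keeps the audit cheap, at the cost of only reporting how many messages are missing and in which window, not which ones.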