The Lambda Architecture has its merits, but alternatives are worth exploring.
Nathan Marz wrote a popular blog post describing an idea he called the Lambda Architecture (“How to beat the CAP theorem”). The Lambda Architecture is an approach to building stream processing applications on top of MapReduce and Storm or similar systems. This has proven to be a surprisingly popular idea, with a dedicated website and an upcoming book. Since I’ve been involved in building out the real-time data processing infrastructure at LinkedIn using Kafka and Samza, I often get asked about the Lambda Architecture. I thought I would describe my thoughts and experiences.
What is a Lambda Architecture and how do I become one?
The Lambda Architecture looks something like this:
If all companies are software companies, then all companies must learn to manage their online operations.
Two years ago, I wrote What is DevOps. Although that article was good for its time, our understanding of organizational behavior, and its relationship to the operation of complex systems, has grown.
A few themes have become apparent in the two years since that last article. They were latent in that article, I think, but now we’re in a position to call them out explicitly. It’s always easy to think of DevOps (or of any software industry paradigm) in terms of the tools you use; in particular, it’s very easy to think that if you use Chef or Puppet for automated configuration, Jenkins for continuous integration, and some cloud provider for on-demand server power, you’re doing DevOps. But DevOps isn’t about tools; it’s about culture, and it extends far beyond the cubicles of developers and operators. As Jeff Sussna says in Empathy: The Essence of DevOps:
…it’s not about making developers and sysadmins report to the same VP. It’s not about automating all your configuration procedures. It’s not about tipping up a Jenkins server, or running your applications in the cloud, or releasing your code on Github. It’s not even about letting your developers deploy their code to a PaaS. The true essence of DevOps is empathy.
Use teaching stacks to drive growth.
Elliott Hauser is CEO of Trinket, a startup focused on creating open sourced teaching materials. He is also a Python instructor at UNC Chapel Hill.
Well-developed tools for teaching are crucial to the spread of open source software and programming languages. Stacks like those used by the Young Coders Tutorial and Mozilla Software Carpentry are having national and international impact by enabling more people to teach more often.
The spread of tech depends on teaching
Software won’t replace teachers. But teachers need great software for teaching. The success and growth of technical communities are largely dependent on the availability of teaching stacks appropriate to teaching their technologies. Resources like try git or interactivepython.org not only help students on their own but also equip instructors to teach these topics without also having to discover the best tools for doing so. In that way, they play the same function as open source Web stacks: getting us up and running quickly with time-tested and community-backed tools. Thank goodness I don’t need to write a database just to write a website; I can use open source software instead. As an instructor teaching others to code websites, what’s the equivalent tool set? That’s what I mean by Teaching Stack: a collection of open tools that help individual instructors teach technology at scale.
Elements of a great teaching stack
Here are some of the major components of a teaching stack for a hands-on technology course:
Many more companies want to highlight how they're using Apache Spark in production.
One of the trends we’re following closely at Strata is the emergence of vertical applications. As components for creating large-scale data infrastructures enter their early stages of maturation, companies are focusing on solving data problems in specific industries rather than building tools from scratch. Virtually all of these components are open source and have contributors across many companies. Organizations are also sharing best practices for building big data applications, through blog posts, white papers, and presentations at conferences like Strata.
These trends are particularly apparent in a set of technologies that originated from UC Berkeley’s AMPLab: the number of companies that are using (or plan to use) Spark in production has exploded over the last year. The surge in popularity of the Apache Spark ecosystem stems from the maturation of its individual open source components and the growing community of users. The tight integration of high-performance tools that address different problems and workloads, coupled with a simple programming interface (in Python, Java, Scala), makes Spark one of the most popular projects in big data. The charts below show the amount of active development in Spark:
For the second year in a row, I’ve had the privilege of serving on the program committee for the Spark Summit. I’d like to highlight a few areas where Apache Spark is making inroads. I’ll focus on proposals from companies building applications on top of Spark.
A new mantra for your next (programming) meditation session.
You might feel fine.