Behind the Scenes of the First Spark Summit

How it All Started

Spark is a popular open source cluster computing engine for Big Data analytics and a central component of the Berkeley Data Analytics Stack (BDAS). It started as a research project in the UC Berkeley AMPLab and was developed with a focus on attracting production users as well as a diverse community of open source contributors. A community quickly began to grow around Spark, even as a young project in the AMPLab. Before long, this community began gathering at monthly meetups and using mailing lists to discuss development efforts and share their experiences using Spark. More recently the project entered the Apache Incubator.

This year, the core Spark team spun out of the AMPLab to found Databricks, a startup that is using Spark to build next-generation software for analyzing and extracting value from data. At Databricks, we are dedicated to the success of the Spark project and are excited to see the community growing rapidly. This growth demonstrates the need for a larger event that brings the entire community together beyond the meetups. Thus we began planning the first Spark Summit. The Summit will be structured like a super-sized meetup. Meetups typically consist of a single talk, a single sponsor, and dozens of attendees, whereas the Summit will consist of 30 talks, 18 sponsors, hundreds of attendees, and a full day of training exercises.

We understand that an open source project is only as successful as its underlying community. Therefore, we want the Summit to be a community driven event.

Behind the Scenes with the Summit Ops and the Program Committee

The first thing we did was bring in a third-party event producer with a track record of creating high quality open source community events. By separating out the event production we allowed all of the community leaders to share ownership of the technical portion of the event. For example, instead of inviting speakers directly, we hosted an open call for talk submissions. Then we assembled a Program Committee consisting of representatives from 12 of the leading organizations in the Spark community. Finally, the PC members voted on all of the talk submissions to decide the final summit agenda.

The event has been funded by assembling a sponsor network consisting of organizations within the community. With sponsors that are driving the development of the platform, the summit will be an environment that facilitates connections between developers with Spark skills and organizations searching for such developers. We decided that Databricks would participate in the Summit as a peer in this sponsor network.

Also, in the spirit of open source, we felt that it was important for the event to be live streamed and archived online at no cost.

The Summit program will consist of keynote talks by community leaders as well as technical talks selected by the program committee which will cover a wide range of topics, including:

  • deploying and operating Spark at scale
  • building vertical applications and business solutions on top
  • debugging, developing, and extending Spark
  • next generation research
  • the development roadmap

We have been thrilled to see the community take ownership and drive the summit to enormous success. So far, hundreds have registered to attend the Summit, and the day of Spark Training is sold-out.

So, whether you are new to Spark or a seasoned Spark developer, register to attend in person (use the code spark-summit-oreilly for 10% off) or watch via live streaming for free.

We look forward to seeing you there!

Related Resources

Strata SC 2014 Image

O’Reilly Strata Conference — Strata brings together the leading minds in data science and big data — decision makers and practitioners driving the future of their businesses and technologies. Get the skills, tools, and strategies you need to make data work.

Strata in Santa Clara: February 11-13 | Santa Clara, CA

tags: , ,