"training" entries

Validating data models with Kafka-based pipelines

A case for back-end A/B testing.

Start the O’Reilly “Introduction to Apache Kafka” training video for free. In this video, Gwen Shapira shows developers and administrators how to integrate Kafka into a data processing pipeline.

A/B testing is a popular method of using business intelligence data to assess possible changes to websites. In the past, when a business wanted to update its website in an attempt to drive more sales, decisions on the specific changes to make were driven by guesses, intuition, focus groups, and, ultimately, which executive yelled louder. These days, the data-driven solution is to set up multiple copies of the website, direct users randomly to the different variations, and measure which design improves sales the most. There are a lot of details to get right, but this is the gist of things.
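
The gist described above comes down to two steps: randomly assign each visitor to a variant, then compare how often each variant leads to a sale. The following is a minimal Python sketch of those two steps, not anything taken from the original post. The variant names, the experiment label, and the toy event log are all hypothetical. It buckets users by hashing their ID rather than calling a random number generator, so a returning visitor keeps seeing the same version of the site.

```python
import hashlib
from collections import defaultdict

# Hypothetical variant names; a real test would use the site versions under test.
VARIANTS = ["control", "new_checkout"]


def assign_variant(user_id: str, experiment: str = "checkout-test") -> str:
    """Bucket a user pseudo-randomly but stably: the same ID always maps to the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]


def conversion_rates(events):
    """events: iterable of (user_id, purchased) pairs. Returns the purchase rate per variant."""
    visitors = defaultdict(int)
    buyers = defaultdict(int)
    for user_id, purchased in events:
        variant = assign_variant(user_id)
        visitors[variant] += 1
        if purchased:
            buyers[variant] += 1
    return {v: buyers[v] / visitors[v] for v in visitors}


if __name__ == "__main__":
    # Toy log of (user_id, made_a_purchase), made up purely for illustration.
    log = [(f"user{i}", i % 7 == 0) for i in range(1000)]
    for variant, rate in sorted(conversion_rates(log).items()):
        print(f"{variant}: {rate:.3f}")
```

This leaves out several of the "details to get right." The most important one is checking that a difference between rates is statistically significant before declaring a winner.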

When it comes to back-end systems, however, we are still living in the stone age. Suppose your business has grown significantly and you notice that your existing MySQL database is becoming less responsive as the load increases. If you consider moving to a NoSQL system, you need to decide which solution to pick, and there are a lot of options: Cassandra, MongoDB, Couchbase, or even Hadoop. There are also many possible data models: normalized, wide tables, narrow tables, nested data structures, and so on.

A/B testing multiple data stores and data models in parallel

It is surprising how often a company will pick a solution based on intuition or even on which architect yelled louder. Rather than being based on facts and numbers regarding capacity, scale, throughput, and data-processing patterns, back-end architecture decisions are made with fuzzy reasoning. In that scenario, what usually happens is that a data store and a data model are somehow chosen, and the entire development team dives into a six-month project to move the entire back-end system to the new thing. This project will inevitably take 12 months, and about 9 months in, everyone will suspect that this was a bad idea, but by then it’s way too late to do anything about it.

Announcing Cassandra certification

A new partnership between O’Reilly and DataStax offers certification and training in Cassandra.

I am pleased to announce a joint program between O’Reilly and DataStax to certify Cassandra developers. This program complements our developer certification for Apache Spark and, just as in the case of Databricks and Spark, we are excited to be working with the leading commercial company behind Cassandra. DataStax has done a tremendous job growing and nurturing the Cassandra community, user base, and technology.

Once the certification program is ready, developers can take the exam online, in designated test centers, and at select training courses. O’Reilly will also be developing books, training days, and videos targeted at developers and companies interested in the Cassandra distributed storage system.

Cassandra is a popular component used for building big data and real-time analytic platforms. Its ability to comfortably scale to clusters with thousands of nodes makes it a popular option for solutions that need to ingest and make sense of large amounts of time series and event data. As noted in an earlier post, real-time event data are at the heart of one of the trends we’re closely following: the convergence of cheap sensors, fast networks, and distributed computation.

Wrap-up from FLOSS Manuals book sprint at Google

Mixtures of grassroots content generation and unique expertise have existed, and more models will be found. Understanding the points of commonality between the systems will help us develop such models.

FLOSS Manuals books published after three-day sprint

Joining the pilgrimage that all institutions are making toward wider data use, FLOSS Manuals is exposing more and more of the writing process.

Day two of FLOSS Manuals book sprint at Google Summer of Code summit

As a relatively conventional book, the KDE manual was probably a little easier to write (but also probably less fun) than the more high-level approaches taken by some other teams that were trying to demonstrate to potential customers that their projects were worth adopting.

Day one of FLOSS Manuals book sprint at Google Summer of Code summit

Four teams at Google launched into endeavors that will lead, less than 72 hours from now, to complete books on four open source projects.

FLOSS Manuals sprint starts at Google Summer of Code summit

Four free software projects have each sent three to five volunteers to write books about the projects this week. Along the way we'll all learn about the group writing process and the particular use of book sprints to make documentation for free software.

Four short links: 8 October 2010

Training Tricks, Visualizing Code, ASM+XML=ASMXML, and Poetic License

  1. Training Lessons Learned: Interactivity (Selena Marie Deckelmann) — again I see parallels between how the best school teachers and the best trainers work. I was working with a group of people with diverse IT backgrounds, and often, I asked individuals to try to explain in their own words various terms (like “transaction”). This helped engage the students in a way that simply stating definitions can’t. Observing their fellow students struggling with terminology helped them generate their own questions, and I saw the great results the next day – when students were able to define terms immediately, that took five minutes the day before to work through.
  2. Software Evolution Storylines — very pretty visualizations of code development, inspired by an xkcd comic.
  3. asmxml — XML parser written in assembly language. (via donaldsclark on Twitter)
  4. Poetic License — the BSD license, translated into verse. Do tractor workers who love tractors a lot translate tractor manuals into blank verse? Do the best minds in plumbing kid around by translating the California State Code into haikus? Computer people are like other people who love what they do. Computer people just manipulate symbols, whether they’re keywords in Perl or metrical patterns in software licenses. It’s not weird, really. I promise.

Four roles for publishers: staying relevant when you are no longer a gatekeeper

In many areas of publishing, there are enormous resources of free
online material and innumerable forums where individuals can quickly
and conveniently post their own observations. Since we are no longer
gatekeepers, publishers have to focus on how we add quality.
