"strata conference" entries

Announcing Spark Certification

A new partnership between O’Reilly and Databricks offers certification and training in Apache Spark.

Editor’s note: full disclosure — Ben is an advisor to Databricks.

I am pleased to announce a joint program between O’Reilly and Databricks to certify Spark developers. O’Reilly has long been interested in certification, and with this inaugural program, we believe we have the right combination: an ascendant framework and a partnership with the team behind the technology. The founding team of Databricks comprises members of the UC Berkeley AMPLab team that created Spark.

The certification exam will be offered at Strata events, through Databricks’ Spark Summits, and at training workshops run by Databricks and its partner companies. A variety of O’Reilly resources will accompany the certification program, including books, training days, and videos targeted at developers and companies interested in the Apache Spark ecosystem. Read more…

Make us think: a call for Strata keynote videos

Submit your suggestions for videos that make us think about how data, visualizations, and technology are changing us

Each year at Strata, we warm up the crowd in the main keynote sessions with short videos that will make people think. These videos demonstrate the ways that data, technology, and visualization are changing us. Some are funny; some are clever; some are downright disturbing.

For Strata New York + Hadoop World in October, we’re hoping you’ll join in and suggest some videos for us. If you’ve got something you feel captures the zeitgeist of technology at the fringes, then complete this form, and we’ll check it out. We’ll choose some of them as we kick off the event this fall.

Read more…

The 0th Law of Data Mining

Preview of The Laws of Data Mining Session at Strata Santa Clara 2013

Many years ago I was taught the three laws of thermodynamics. When that didn’t stick, I was taught a quick way to remember them, originally identified by C.P. Snow:

  • 1st Law: you can’t win
  • 2nd Law: you can’t draw
  • 3rd Law: you can’t get out of the game

These laws (well, the real ones) were firmly established by the mid-19th century. Yet it wasn’t until the 1930s that the value of the 0th law was identified.

At Strata I’m going to be talking about the 9 Laws of Data Mining – a set of principles identified by Tom Khabaza and very closely related to the CRISP-DM data mining methodology.

They may possibly, just possibly, not be as important as the laws of thermodynamics, but at Strata they will be supported by an equally important 0th Law.

Read more…

Design matters more than math

Design compels. Math is proof. Both sides will defend their domains at Strata's next Great Debate.

At Strata Santa Clara later this month, we’re reprising what has become a tradition: Great Debates. These Oxford-style debates pit two teams against one another to argue a hot topic in the fields of big data, ubiquitous computing, and emerging interfaces.

What matters more? Our teams for the Great Debate.

Part of the fun is the scoring: attendees vote on whether they agree with the proposition before the debaters speak, and after both sides have said their piece, the audience votes again. Whoever moves the needle wins.
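For readers who like to see the arithmetic, here is a minimal sketch of one way "moving the needle" could be scored. It assumes a simple for/against count with no undecided option, and it compares the share of the audience agreeing with the proposition before and after the debate. The function name and the vote counts are hypothetical; the actual tallying method used at Strata is not described here.

```python
def debate_winner(before_for, before_against, after_for, after_against):
    """Return (winning side, size of swing) for an Oxford-style debate.

    Assumption: the winner is the side toward which the share of
    "agree" votes moved between the two audience polls.
    """
    total_before = before_for + before_against
    total_after = after_for + after_against
    if total_before == 0 or total_after == 0:
        raise ValueError("each poll needs at least one vote")

    # Share of the audience agreeing with the proposition in each poll.
    share_before = before_for / total_before
    share_after = after_for / total_after
    swing = share_after - share_before

    if swing > 0:
        return "for", swing
    if swing < 0:
        return "against", -swing
    return "tie", 0.0


# Hypothetical counts: 40% agree before the debate, 55% agree after.
print(debate_winner(40, 60, 55, 45))
```

Note that under this rule a side can win without ever holding a majority: what counts is the change between the two polls, not the final tally.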

This year’s proposition — that design matters more than math — is sure to inspire some vigorous discussion. The argument for math is pretty strong. Math is proof. Given enough data — and today, we have plenty — we can know. “The right information in the right place just changes your life,” said Stewart Brand. Properly harnessed, the power of data analysis and modeling can fix cities, predict epidemics, and revitalize education. Abused, it can invade our lives, undermine economies, and steal elections. Surely the algorithms of big data matter!

But your life won’t change by itself. Bruce Mau defines design as “the human capacity to plan and produce desired outcomes.” Math informs; design compels. Without design, math can’t do its thing. Poorly designed experiments collect the wrong data. And if the data can’t be understood and acted upon, it may as well not have been crunched in the first place.

This is the question we’ll be putting to our debaters: Which matters more? A well-designed collection of flawed information — or an opaque, hard-to-parse, but unerringly accurate model? From mobile handsets to social policy, we need both good math and good design. Which is more critical? Read more…

Deconstructing a Twitter spam attack

Data analysis shows the structure of a network can separate true influencers from fake accounts.

There has been a lot of discussion recently about the effect fake Twitter accounts have on brands trying to keep track of social media engagement. A recent tweet spam attack offers an instructive example.

On the morning of October 1, the delegates attending the Strata Conference in London started to notice that a considerable number of spam tweets were being sent using the #strataconf hashtag. An analysis using a tool developed by Bloom Agency, with data from DataSift, sheds light on the spam attack directed at the conference.

The following diagram shows a snapshot of the Twitter conversation after a few tweets had been received containing the #strataconf hashtag. Each red or blue line represents a connection between two Twitter accounts and shows how information flowed as a result of the tweet being sent. By 11 a.m., individual communities had started to emerge that were talking to each other about the conference, and these can clearly be seen in the diagram.

Strataconf tweeting communities
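To make the idea of communities emerging from connections concrete, here is a rough sketch that groups accounts by who is connected to whom. It is not the Bloom Agency tool, and it uses a deliberate simplification: a "community" is treated as a connected component of the graph, which is cruder than real community detection. The account names and the edge list are invented for illustration.

```python
from collections import defaultdict, deque


def communities(edges):
    """Group accounts into connected components.

    `edges` is an iterable of (account_a, account_b) pairs, one per
    connection (for example a retweet or a mention). Returns a list of
    sets of account names, largest group first.
    """
    # Build an undirected adjacency map from the connection pairs.
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    groups = []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        group = set()
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(group)

    return sorted(groups, key=len, reverse=True)


# Hypothetical connections between made-up accounts.
sample_edges = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]
for group in communities(sample_edges):
    print(sorted(group))
```

In a sketch like this, accounts that only broadcast and are never retweeted or mentioned by anyone would not appear in any sizeable group, which is one structural signal an analyst might look at.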

Read more…

Live from the O’Reilly Strata Conference in London

Catch live keynotes from this week's Strata Conference in London.

Experts from across the data world are coming together at the O’Reilly Strata Conference in London this week. You can watch live keynotes from the event below (full broadcast schedule is available here).

Strata Week: Big data boom and big data gaps

One report says the Hadoop market is booming while another says federal data usage isn't.

In this week's big data news, an IDC report points to the booming market for Hadoop and MapReduce (and if proposals for Strata are any indication, this is indeed a good time for big data).

Visualization of the Week: Visualizing the Strata Conference

The Information Lab visualizes the Strata Conference's attendees.

This week's visualization comes from The Information Lab and shows who was at the Strata Conference, how far they traveled, and the data their companies produce.

Big data in the cloud

How do the cloud offerings from Amazon, Google and Microsoft compare?

Big data and cloud technology go hand in hand, but it's comparatively early days. Strata conference chair Edd Dumbill explains the cloud landscape and compares the offerings of Amazon, Google and Microsoft.

Strata Week: Genome research kicks up a lot of data

Where to store all that genome data? Also, clarifying the work of digital humanities scholars.

We take a look at the big data obstacles and opportunities for genomics, digital humanities scholars respond to Stanley Fish's mischaracterization of what they do with data, and Hadoop World and the Strata Conference merge.