Strata Week: Building data startups

Strata registration opens, making money with data, dolphins and cellphones, data in the dirt

Here’s a look at the latest data news and developments that caught my eye.

Registration open for Strata 2011

You can find out just what has us so excited about data at the O’Reilly Strata Conference, Feb. 1-3, 2011, in Santa Clara, Calif. Early registration rates are available through December 14.

The Strata program features tutorials on data and visualization, an executive-level briefing event on big data, and two days of conference sessions and keynotes. We’ll hear from big business, startups, and the brightest developers and researchers. Watch for further details about the schedule over the coming weeks.

We find ourselves at the beginning of an industrial revolution of data, heralded by unprecedented volumes of data and connectivity, cheap and ubiquitous computing, and advances in interface technology. Strata will be the defining event of this movement, so I very much hope you’ll join us there.

From data to money: Building a startup

Thanks to commodity computing power, it’s possible to build a startup business based around big data and analytics. But what does it take to do this, and how can you make money? These questions were addressed recently in blog posts by Russell Jurney and Pete Warden.

Jurney takes on the question of how many people you need to start a data product team. He draws out the ideal roles for such a team including: customer, market strategist, deal maker, product manager, experience designer, interaction designer, web developer, data hacker and researcher.

Quite the cast, and not really the ideal starting point for a product or business startup, so Jurney condenses these roles into the more succinct definitions of “hustler,” “designer” and “prodineer” — a minimum of three people.

Analytic products are such a multidisciplinary undertaking that in a data startup a founding team is at minimum three people. Ideally all are founders. There are probably exceptions, but that is the minimum number of bodies required to flesh out all these areas with passionate people who share the vision and are deeply invested in the success of the company. Someone needs to be good at and enjoy each of these roles.

Once you start, and have a minimal product, Jurney recommends quickly connecting with real customers, and taking it from there. The next step is making money, of course, which is what Pete Warden has been thinking about.

After running through a “thousand ways not to do it,” Warden reckons finding a way to make money is the most important question for big data startups. He paints the stages of evolution a data product goes through to actually deliver value to customers.

  • Data: You need it, but selling it raw is the lowest level of business. Warden writes, “The data itself, no matter how unique, is low value, since it will take somebody else a lot of effort to turn it into something they can use to make money.”
  • Charts: Simple graphs, which at least help users understand what you have, but “still leaves them staring at a space shuttle control panel, though, and only the most dogged people will invest enough time to understand how to use it.”
  • Reports: Bring a focus to what the customer wants. Many data-driven startups stop here and make good money doing that. But there’s further to go: “It can be very hard to defend this position. Unless you have exclusive access to a data source, the barriers to entry are low and you’ll be competing against a lot of other teams.”
  • Recommendations: Your product now goes from raw data and produces actionable recommendations, a much more defensible business. “To get here you also have to have absorbed a tremendous amount of non-obvious detail about the customer’s requirements, which is a big barrier to anyone copying you,” Warden writes.

Ending his piece, Warden offers this pithy advice: “More actionable means more valuable!”

Data in the dirt

What would you say to a pub full of people about data? That was my challenge as I gave a talk at Ignite Sebastopol 4, held in O’Reilly’s hometown of Sebastopol, Calif. Explaining some of the 200-year history of Strata, I had 20 slides, shown for 15 seconds each, to get my point across.

Dolphins, cellphones and social networks

A couple of recent research reports bring interesting insights from social networks outside of the online worlds of Facebook and Twitter. Writing in Ars Technica, Casey Johnston reports on how the mathematics of text messaging might help mobile phone networks plan capacity. Researchers discovered that text-messaging patterns were generally bimodal.

Text message sets often start off with a burst: the times between messages are short and follow a power-law distribution (that is, there are a lot of text messages with short intervals between them).

Outside of an initial two- to 20-minute window, though, the time between messages falls dramatically. There are fewer, longer intervals between messages, and the tail can extend up to five or six hours past the initial burst, as the intervals continue to grow longer and the texts less frequent.

The researchers took these observations, and developed models to explain what they saw. The model assumed that text exchanges were primarily task-focused, dealing with some issue the conversants had in common, such as deciding what to eat for dinner.
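To get a feel for what a power-law distribution of intervals looks like, here is a minimal Python sketch. It is not the researchers' model: the exponent `alpha`, the minimum gap `x_min`, the two-minute threshold and the function names are all assumptions chosen for illustration. It simply draws gaps from a Pareto distribution and checks how many of them are short.

```python
import random
import statistics


def sample_gaps(n, alpha=1.5, x_min=1.0, seed=0):
    """Draw n gaps between messages (in minutes) from a Pareto power law.

    alpha and x_min are illustrative assumptions, not values from the research.
    """
    rng = random.Random(seed)
    # Inverse-transform sampling: P(gap > x) = (x_min / x) ** alpha for x >= x_min.
    # rng.random() lies in [0, 1), so (1 - u) lies in (0, 1] and is never zero.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def share_below(gaps, threshold):
    """Fraction of gaps that are shorter than the threshold."""
    return sum(g < threshold for g in gaps) / len(gaps)


gaps = sample_gaps(10_000)
short_share = share_below(gaps, 2.0)    # share of gaps under two minutes
median_gap = statistics.median(gaps)    # the typical gap
mean_gap = statistics.fmean(gaps)       # pulled upward by rare long gaps

print(f"share under 2 min: {short_share:.2f}")
print(f"median gap: {median_gap:.2f} min, mean gap: {mean_gap:.2f} min")
```

Most of the simulated gaps are short, while a few very long ones pull the mean well above the median. That is the signature of the heavy tail the quoted passage describes.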

Cliques in a dolphin community noted in a Microsoft research report (PDF).

Karate students and dolphin pods feature in recent research from Microsoft, explained by Christopher Mims in his Technology Review blog. Using a new approach built on game theory, researchers were able to model cliques in communities. Possible applications of the research include urban development, criminal intelligence and marketing. Mims explains the wide applicability of the technique:

Intriguingly, two of the data sets the researchers tested their work on, which are apparently standard for this kind of research, were data gathered by anthropologists about a Karate academy, and data gathered by marine biologists about a pod of 64 dolphins. Applying their game-theoretic approach to both networks, they were able to resolve cliques that other approaches missed entirely.

Resolving cliques also has applications in determining identity, Mims points out. Individuals with non-unique names can be identified instead by the community footprint generated by their clique membership.
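For readers who want something concrete, the sketch below finds cliques in the strict graph-theory sense: groups in which every member is connected to every other. This is the classic Bron-Kerbosch enumeration, not the game-theoretic method from the Microsoft research, and the six-member network is made up for illustration.

```python
def build_adjacency(edges):
    """Turn a list of undirected edges into a dict of neighbor sets."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        # Nothing left to add and nothing excluded: current is maximal.
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            # Extend the clique with v, keeping only v's neighbors in play.
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            # v has been fully explored, so move it to the excluded set.
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical network: two tight groups joined by a single link (c-d).
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("c", "d"),
    ("d", "e"), ("d", "f"), ("e", "f"),
]
cliques = maximal_cliques(build_adjacency(edges))
for clique in cliques:
    print(sorted(clique))
```

On this toy network the two triangles come out as separate cliques, with the bridging pair c and d as a third. Real community detection, including the approach Mims describes, has to cope with much looser groupings than this strict definition allows.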

Gangsta test data

Perhaps one of the best known pieces of test data is the Lorem Ipsum text, used by graphic designers as a substitute for real text during the “greeking” process. This venerable text has now received an update for contemporary culture, courtesy of a couple of Dutch developers.

The Gangsta Lorem Ipsum generator serves up such modern nonsenses as Lorizzle bling bling dolor we gonna chung amizzle, consectetuer adipiscing dizzle.

Send us news

Email us news, tips and interesting tidbits at
