Strata Week: There's money in data sifting

DataSift lands funding, popping the hood on Google Plus, data products for education

Here are a few of the data stories that caught my attention this week.

Big bucks for DataSift and for data from Twitter’s firehose

The social media data mining platform DataSift, one of the two companies that have the rights to re-syndicate the data from Twitter’s firehose, announced this week that it has raised $6 million in a Series A round. (The other company with those rights is Gnip, whose handling of the firehose we recently covered.) DataSift aggregates data from other social media streams as well as Twitter, including Facebook, WordPress, and Digg. While providing the tools to “sift” this content and layering it with other metadata makes DataSift compelling, it’s the company’s connection to Twitter that may have piqued the most interest.

DataSift grew out of the company MediaSift, the same business that created Tweetmeme, a tool that fell into disfavor when Twitter launched its own sharing button. Twitter’s move to take over functions that third-party developers once provided has had negative implications for those in the Twitter ecosystem. At this stage, though, Twitter seems willing to leave some of the big data processing to other companies.

Investor Mark Suster of GRP Partners, whose firm was one of the leaders in this round of DataSift’s investment, made the announcement that he was “doubling down on the Twitter ecosystem.” For its part, DataSift “has a product that will turn the stream into a lake,” says Suster. In other words, “The Twitter stream like most others is ephemeral. If you don’t bottle it as it passes by you it’s gone. DataSift has a product that builds a permanent database for you of just the information you want to capture.”
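To make the “stream into a lake” idea concrete, here is a minimal sketch in Python of filtering a passing stream and keeping only the matching items in a permanent store. It is an illustration of the general pattern, not DataSift’s product or API: the message shape, the `sift` and `bottle` function names, and the `lake` table are all assumptions made for this example, and SQLite stands in for whatever database a real system would use.

```python
import sqlite3
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical message shape: a dict with "id", "source" and "text" keys.
Message = Dict[str, str]


def sift(stream: Iterable[Message], keep: Callable[[Message], bool]) -> Iterator[Message]:
    """Yield only the messages that match the filter as they pass by."""
    for message in stream:
        if keep(message):
            yield message


def bottle(db: sqlite3.Connection, messages: Iterable[Message]) -> None:
    """Persist the given messages so they outlive the stream."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS lake ("
        "id TEXT PRIMARY KEY, source TEXT, text TEXT)"
    )
    with db:  # commits on success, rolls back on error
        for m in messages:
            # OR IGNORE skips a message whose id was already stored.
            db.execute(
                "INSERT OR IGNORE INTO lake (id, source, text) VALUES (?, ?, ?)",
                (m["id"], m["source"], m["text"]),
            )


# Made-up sample data standing in for a live feed.
stream = [
    {"id": "1", "source": "twitter", "text": "Strata is next week"},
    {"id": "2", "source": "digg", "text": "cat pictures"},
    {"id": "3", "source": "wordpress", "text": "notes from Strata"},
]

db = sqlite3.connect(":memory:")
bottle(db, sift(stream, lambda m: "strata" in m["text"].lower()))
rows = db.execute("SELECT id, source FROM lake ORDER BY id").fetchall()
```

Only the two messages that mention Strata end up in `rows`; the third has passed by and is gone, which is the ephemerality Suster describes.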

But Suster’s announcement also reiterates the importance of Twitter, something that seems particularly relevant in light of the new Google Plus. Suster describes Twitter as real-time, open, asymmetric, social, viral, location-aware, a referral network, explicit, and implicit. But as the buzz over Google Plus continues, it’s not clear that Twitter still has a lock on all of these characteristics.

OSCON Data 2011, being held July 25-27 in Portland, Ore., is a gathering for developers who are hands-on, doing the systems work and evolving architectures and tools to manage data. (This event is co-located with OSCON.)

Save 20% on registration with the code OS11RAD

What’s under the Google Plus hood?

Speaking of Google Plus, there’s been lots of commentary and speculation about how successful the launch of the new social network has been, gauged in part by how quickly that network is growing. According to Ancestry.com founder Paul Allen, Google Plus was set to break the 10 million user mark on July 12, just two weeks after its launch. Ubermedia’s Bill Goss went so far as to predict that Google Plus would become the fastest growing social network in history.

So how does Google do it (in terms of the technology)? According to Joseph Smarr, the project’s technical lead and an OSCON speaker:

Our stack is pretty standard fare for Google apps these days: we use Java servlets for our server code and JavaScript for the browser-side of the UI, largely built with the (open-source) Closure framework, including Closure’s JavaScript compiler and template system. A couple nifty tricks we do: we use the HTML5 History API to maintain pretty-looking URLs even though it’s an AJAX app (falling back on hash-fragments for older browsers); and we often render our Closure templates server-side so the page renders before any JavaScript is loaded, then the JavaScript finds the right DOM nodes and hooks up event handlers, etc. to make it responsive (as a result, if you’re on a slow connection and you click on stuff really fast, you may notice a lag before it does anything, but luckily most people don’t run into this in practice). Our backends are built mostly on top of BigTable and Colossus/GFS, and we use a lot of other common Google technologies such as MapReduce (again, like many other Google apps do).
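
One consequence of the hash-fragment fallback Smarr mentions is that the same view can be reached through two URL shapes, and the app has to treat them as equivalent. The sketch below shows that normalization step in Python purely as an illustration of the logic; it is not Google’s code. The `route_for` function and the `#!/` fragment convention are assumptions for this example, since the quote does not say what fragment format Google Plus uses.

```python
from urllib.parse import urlsplit


def route_for(url: str) -> str:
    """Return the application route a URL refers to.

    Assumes a hypothetical "#!/path" fragment convention for browsers
    without the History API; the quoted description does not specify one.
    """
    parts = urlsplit(url)
    if parts.fragment.startswith("!/"):
        # Older browser: the route lives in the fragment, after the "!".
        return parts.fragment[1:]
    # History API browser: the route is the ordinary path.
    return parts.path or "/"


pretty = route_for("https://example.com/stream/photos")
fallback = route_for("https://example.com/#!/stream/photos")
```

Both calls resolve to the same route, so the rest of the application does not need to know which kind of browser produced the URL.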


(Google’s Joseph Smarr, a member of the Google+ team, will discuss the future of the social web at OSCON.)

Data products for education

The charitable giving site DonorsChoose has been running a contest called Hacking Education, and the contest’s finalists have just been announced. DonorsChoose lets people make charitable contributions to public schools, supporting teachers’ projects with a Kickstarter-like site for education. For the contest, DonorsChoose opened up its data to developers; the data covers more than 300,000 classroom projects that have inspired some $80 million in charitable giving.

The finalists were chosen from over 50 apps and analyses and included a visualization of the kinds of projects teachers proposed and the kinds donors supported, a .NET Factbook, and an automatic press release system so that local journalists could be notified about projects. The grand prize winner has yet to be chosen, but that project will receive a trophy — and a big thumbs up — from Stephen Colbert.

Got data news?

Feel free to email me.
