Strata Week: Why ThinkUp matters

ThinkUp and data ownership, DataSift turns on its Twitter firehose, and Google cracks opens the door to BigQuery.

Here are a few of the data stories that caught my attention this week.

ThinkUp hits 1.0

ThinkUpThinkUp, a tool out of Expert Labs, enables users to archive, search and export their Twitter, Facebook and Google+ history — both posts and post replies. It also allows users to see their network activity, including new followers, and to map that information. Originally created by Gina Trapani, ThinkUp is free and open source, and will run on a user’s own web server.

That’s crucial, says Expert Labs’ founder Anil Dash, who describes ThinkUp’s launch as “software that matters.” He writes that “ThinkUp’s launch matters to me because of what it represents: The web we were promised we would have. The web that I fell in love with, and that has given me so much. A web that we can hack, and tweak, and own.” Imagine everything you’ve ever written on Twitter, every status update on Facebook, every message on Google+ and every response you’ve had to those posts — imagine them wiped out by the companies that control those social networks.

Dash writes:

Why would I ascribe such awful behavior to the nice people who run these social networks? Because history shows us that it happens. Over and over and over. The clips uploaded to Google Videos, the sites published to Geocities, the entire relationships that began and ended on Friendster: They’re all gone. Some kind-hearted folks are trying to archive those things for the record, and that’s wonderful. But what about the record for your life, a private version that’s not for sharing with the world, but that preserves the information or ideas or moments that you care about?

It’s in light of this, no doubt, that ReadWriteWeb’s Jon Mitchell calls ThinkUp “the social media management tool that matters most.” Indeed, as we pour more of our lives into these social sites, tools like ThinkUp, along with endeavors like the Locker Project, mark important efforts to help people own, control and utilize their own data.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

DataSift opens up its Twitter firehose

DataSiftDataSift, one of only two companies licensed by Twitter to syndicate its firehose (the other being Gnip), officially opened to the public this week. That means that those using DataSift can in turn mine all the social data that comes from Twitter — data that comes at a rate of some 250 million tweets per day. DataSift’s customers can analyze this data for more than just keyword searches and can apply various filters, including demographic information, sentiment, gender, and even Klout score. The company also offers data from MySpace and plans to add Google+ and Facebook data soon.

DataSift, which was founded by Tweetmeme’s Nick Halstead and raised $6 million earlier this year, is available as a pay-as-you-go subscription model.

Google’s BigQuery service opens to more developers

Google announced this week that it was letting more companies have access to its piloting of BigQuery, its big data analytics service. The tool was initially developed for internal use at Google, and it was opened to a limited number of developers and companies at Google I/O earlier this year. Now, Google is allowing a few more companies into the fold (you can indicate your interest here), offering them the service for free — with the promise to notify them in 30 days if it plans to charge — as well as adding some user interface improvements.

In addition to a GUI for the web-based version, Google has improved the REST API for BigQuery as well. The new API offers granular control over permissions and lets you run multiple jobs in the background.

BigQuery is based on the Google tool formerly known as Dremel, which the company discussed in a research paper published last year:

[Dremel] is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data.

In the blog post announcing the changes to BigQuery, Google cites Michael J. Franklin, Professor of Computer Science at UC Berkeley, who calls BigQuery’s ability to process big data “jaw-dropping.”

Got data news?

Feel free to email me.

Related:

tags: , , , , , , , ,

Get the O’Reilly Data Newsletter

Stay informed. Receive weekly insight from industry insiders.