Strata Week: The social graph that isn’t

Pinboard founder questions the social graph, Cloudera and Kaggle raise money for big data.

Here are a few of the data stories that caught my attention this week:

Not social. Not a graph.

It’s hardly surprising that the founder of a “bookmarking site for introverts” would have something to say about the “social graph.” But what Pinboard’s Maciej Ceglowski has penned in a blog post titled “The Social Graph Is Neither” is arguably the must-read article of the week.

The social graph is neither a graph, nor is it social, Ceglowski posits. He argues that today’s social networks have failed to capture the complexities and intricacies of our social relationships (there’s no graph) and have become something that’s at best contrived and at worst icky (actually, that’s not the “worst,” but it’s the adjective Ceglowski uses).

From his post:

Imagine the U.S. Census as conducted by direct marketers — that’s the social graph. Social networks exist to sell you crap. The icky feeling you get when your friend starts to talk to you about Amway or when you spot someone passing out business cards at a birthday party, is the entire driving force behind a site like Facebook. Because their collection methods are kind of primitive, these sites have to coax you into doing as much of your social interaction as possible while logged in, so they can see it.

But if today’s social networks are troublesome, they’re also doomed, Ceglowski contends, much as the CompuServes and the Prodigys of an earlier era were undone. Those services weren’t so much out-innovated as out-democratized: as the global network spread, mass marketing gave way to grassroots efforts.

“My hope,” Ceglowski writes, “is that whatever replaces Facebook and Google+ will look equally inevitable and that our kids will think we were complete rubes for ever having thrown a sheep or clicked a +1 button. It’s just a matter of waiting things out and leaving ourselves enough freedom to find some interesting, organic, and human ways to bring our social lives online.”


Cloudera raises $40 million

The Hadoop-based startup Cloudera announced this week that it has raised another $40 million in funding, led by Ignition Partners, Greylock, Accel, Meritech Capital Partners, and In-Q-Tel. This brings the total investment in the company to some $76 million, a solid endorsement not just of Cloudera but of Hadoop as a big data solution.

Hadoop is a trend that we’ve covered almost weekly here as part of the Strata Week news roundup. And GigaOm’s Derrick Harris has run some estimates on the numbers of the Hadoop ecosystem at large, finding that: “Hadoop-based startups have raised $104.5 million since May. The same set of companies has raised $159.7 million since 2009 when Cloudera closed its first round.”

While it’s easy to label Hadoop as one of the buzzwords of 2011, the level of investor interest and adoption indicates that many people see it as a cornerstone of a big data strategy, and as a good source of revenue for the coming years.

Kaggle raises $11 million to crowdsource big data

It’s a much smaller round of investment than Cloudera’s, to be sure, but Kaggle’s $11 million Series A round announced this week is still noteworthy. Kaggle provides a platform for running big data competitions. “We’re making data science a sport,” its tagline reads.

But it’s more than that. There remains a gulf between data scientists and those who have data problems to solve. Kaggle helps bridge this gap by letting companies outsource their big data problems to third-party data scientists and software developers, with prizes going to the best solutions. Kaggle claims it has a community of more than 17,000 PhD-level data scientists, ready to take on and resolve companies’ data problems.

Kaggle has thus far enabled several important breakthroughs, including a competition that helped identify new ways to map dark matter in the universe. That’s a project that had been worked on for several decades by traditional methods, but those in the Kaggle community tackled it in a couple of weeks.

The Supreme Court looks at GPS data tracking

The U.S. Supreme Court heard oral arguments this week in United States v. Jones, a case that could have major implications for mobile data, GPS and privacy. At issue is whether police need a warrant in order to attach a tracking device to a car to monitor a suspect’s movements.

Surveillance via technology is clearly much easier and more efficient than traditional surveillance methods. Why follow a suspect around all day, for example, when you can attach a device to his or her car and just watch the data transmission? But the data you get from a GPS device is also far more detailed than what human surveillance yields, so it raises all sorts of questions about what constitutes a reasonable search. And while you needn’t get a warrant to shadow someone’s car, attaching that GPS tracking device might just violate the Fourth Amendment and its protection against unreasonable search and seizure.

But what’s at stake is much larger than just sticking a tracking device to the underbelly of a criminal suspect’s vehicle. After all, every cell phone owner gives off an incredible amount of mobile location data, something that the government could conceivably tap into and monitor.

During oral arguments, Supreme Court justices seemed skeptical about the government’s power to use technology in this way.

Got data news?

Feel free to email me.

Photo: Graph Paper by Calsidyrose, on Flickr

