BuzzData: Come for the data, stay for the community

A Canadian startup aspires to be the GitHub of datasets.

As the data deluge created by the activities of global industries accelerates, the need for decision makers to find a signal in the noise will only grow more important. Therein lies the promise of data science, from data visualization to dashboards to predictive algorithms that filter the exaflood and produce meaning for those who need it most. Data consumers and data producers, however, are both challenged by “dirty data” and limited access to the expertise and insight they need. To put it another way, if you can’t derive value, as Alistair Croll has observed here at Radar, there’s no such thing as big data.

BuzzData, based in Toronto, Canada, is one of several startups looking to help bridge that gap. BuzzData launched this spring with a combination of online community and social networking that is reminiscent of what GitHub provides for code. The thinking here is that every dataset will have a community of interest around the topic it describes, no matter how niche it might be. Once uploaded, each dataset has tabs for tracking versions, visualizations, related articles, attachments and comments. BuzzData users can “follow” datasets, just as they would a user on Twitter or a page on Facebook.

“User experience is key to building a community around data, and that’s what BuzzData seems to be set on doing,” said Marshall Kirkpatrick, lead writer at ReadWriteWeb, in an interview. “Right now it’s a little rough around the edges to use, but it’s very pretty, and that’s going to open a lot of doors. Hopefully a lot of creative minds will walk through those doors and do things with the data they find there that no single person would have thought of or been capable of doing on their own.”

The value proposition that BuzzData offers will depend upon many more users showing up and engaging with one another and, most importantly, the data itself. For now, the site remains in limited beta with hundreds of users, including at least one government entity, the City of Vancouver.

“Right now, people email an Excel spreadsheet around or spend time clobbering a shared file on a network,” said Mark Opauszky, the startup’s CEO, in an interview late this summer. “Our behind-the-scenes energy is focused on interfaces so that you can talk through BuzzData instead. We’re working to bring the same powerful tools that programmers have for source code into the world of data. Ultimately, you’re not adding and removing lines of code — you’re adding and removing columns of data.”
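To make the source-code analogy concrete, here is a minimal sketch of what a column-level "diff" between two versions of a dataset might look like. It is an illustration only, not BuzzData's implementation or API: the function name `column_diff` and the sample tree-inventory data are hypothetical, and the sketch compares only the header rows of two CSV files.

```python
import csv
import io


def column_diff(old_csv: str, new_csv: str) -> dict:
    """Compare the header rows of two CSV versions and report column changes.

    Hypothetical helper for illustration; it ignores row-level changes.
    """
    old_cols = next(csv.reader(io.StringIO(old_csv)))
    new_cols = next(csv.reader(io.StringIO(new_csv)))
    return {
        # Columns present in the new version but not the old one.
        "added": [c for c in new_cols if c not in old_cols],
        # Columns present in the old version but dropped from the new one.
        "removed": [c for c in old_cols if c not in new_cols],
    }


# Made-up sample data: two versions of a small tree inventory.
v1 = "tree_id,species,height_m\n1,maple,12\n"
v2 = "tree_id,species,planted_year\n1,maple,1998\n"

print(column_diff(v1, v2))
```

A real versioning system would also need to track changes to rows and values, but the header comparison captures the idea in the quote: the unit of change is a column of data rather than a line of code.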

Opauszky said that BuzzData is actively talking with data publishers about the potential of the platform: “What BuzzData will ultimately offer when we move beyond a minimum viable product is for organizations to have their own territory in that data. There is a ‘brandability’ to that option. We’ve found it very easy to make this case to corporations, as they’re already spending dollars, usually on social networks, to try to understand this.”

That corporate constituency may well be where BuzzData finds its business model, though the executive team was careful to caution that they’re remaining flexible. It’s “absolutely a freemium model,” said Opauszky. “It’s a fundamentally free system, but people can pay a nominal fee on an individual basis for some enhanced features — primarily the ability to privatize data projects, which by default are open. Once in a while, people will find that they’re on to something and want a smaller context. They may want to share files, commercialize a data product, or want to designate where data is stored geographically.”

Open data communities

“We’re starting to see analysis happen, where people tell ‘data stories’ that are evolving in ways they didn’t necessarily expect when they posted data on BuzzData,” said Opauszky. “Once data is uploaded, we see people use it, fork it, and evolve data stories in all sorts of directions that the original data publishers didn’t perceive.”

For instance, a dataset of open data hubs worldwide has attracted a community that improved the original upload considerably. BuzzData featured the work of James McKinney, a civic hacker from Montreal, Canada, in making it so.

The hope is that communities of developers, policy wonks, media, and designers will self-aggregate around datasets on the site and collectively improve them. Hints of that future are already present, as open government advocate David Eaves highlighted in his post on open source data journalism at BuzzData. As Eaves pointed out, it isn’t just media companies that should be paying attention to the trends around open data journalism:

For years I argued that governments — and especially politicians — interested in open data have an unhealthy appetite for applications. They like the idea of sexy apps on smart phones enabling citizens to do cool things. To be clear, I think apps are cool, too. I hope in cities and jurisdictions with open data we see more of them. But open data isn’t just about apps. It’s about the analysis.

Imagine a city’s budget up on BuzzData. Imagine the flow rates of the water or sewage system. Or the inventory of trees. Think of how a community of interested and engaged “followers” could supplement that data, analyze it, and visualize it. Maybe they would be able to explain it to others better, to find savings or potential problems, or develop new forms of risk assessment.

Open data journalism

“It’s an interesting service that’s cutting down barriers to open data crunching,” said Craig Saila, director of digital products at the Globe and Mail, Canada’s national newspaper, in an interview. He said that the Globe and Mail has started to open up the data that it’s collecting, like forest fire data, through its BuzzData account.

“We’re a traditional paper with a strong digital component that will be a huge driver in the future,” said Saila. “We’re putting data out there and letting our audiences play with it. The licensing provides us with a neutral source that we can use to share data. We’re working with data suppliers to release the data that we have or are collecting, exposing the Globe’s journalism to more people. In a lot of ways, it’s beneficial to the Globe to share census information, press releases and statistics.”

The Globe and Mail is not, however, hosting any sensitive information there. “In terms of confidential information, I’m not sure if we’re ready as a news organization to put that in the cloud,” said Saila. “We’re just starting to explore open data as a thing to share, following the Guardian model.”

Saila said that he’s found the private collaboration model useful. “We’re working on a big data project where we need to combine all of the sources, and we’re trying to munge them all together in a safe place,” he said. “It’s a great space for journalists to connect and normalize public data.”

The BuzzData team emphasized that they’re not trying to be another data marketplace, like Infochimps, or replace Excel. “We made an early decision not to reinvent the wheel,” said Opauszky, “but instead to try to be a water cooler, in the same way that people go to Vimeo to share their work. People don’t go to Flickr to edit photos or YouTube to edit videos. The value is to be the connective tissue of what’s happening.”

If that question about “what’s happening?” sounds familiar to Twitter users, it’s because that kind of stream is part of BuzzData’s vision for the future of open data communities.

“One of the things that will become more apparent is that everything in the interface is real time,” said Opauszky. “We think that topics will ultimately become one of the most popular features on the site. People will come from the Guardian or the Economist for the data and stay for the conversation. Those topics are hives for peers and collaborators. We think that BuzzData can provide an even ‘closer to the feed’ source of information for people’s interests, similar to the way that journalists monitor feeds in Tweetdeck.”
