O'Reilly Radar

Strata Week: Harvard Library releases big data for its books

Harvard offers big data for books, Cloudera's new Hadoop distribution, Splunk goes public.

by Audrey Watters | @audreywatters | +Audrey Watters | April 26, 2012

Here’s what caught my attention in the data space this week.

Harvard Library’s metadata

Harvard University announced this week that it would make more than 12 million catalog records from its 73 libraries publicly available. These records contain bibliographic information about books, manuscripts, maps, videos, and audio recordings. The Harvard Library is making these records available under a Creative Commons 0 (CC0) license, in accordance with its Open Metadata Policy.

The records will be available for download from Harvard and via an API from the Digital Public Library of America (DPLA), an initiative that’s aiming to build an online national public library. The records released from Harvard are in the MARC21 format and include information that describes the various works: author, title, publisher, date, subject headings.
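For readers unfamiliar with MARC21, here is a rough Python sketch of how the attributes listed above map to numbered fields. The tags shown (100, 245, 260, 650) are standard MARC 21 bibliographic tags, but everything else is an assumption made for illustration: the record is reduced to a plain dictionary, the `summarize` helper is hypothetical, and the sample values are invented rather than taken from Harvard's data. Real records also carry indicators and subfields that this sketch ignores.

```python
# Standard MARC 21 bibliographic tags and the attribute each one describes.
MARC_TAGS = {
    "100": "author",       # Main entry, personal name
    "245": "title",        # Title statement
    "260": "publication",  # Publication info (publisher and date)
    "650": "subjects",     # Subject added entry, topical term
}


def summarize(record):
    """Map a simplified record {tag: [values]} to named attributes.

    Hypothetical helper: real MARC records are not plain dictionaries.
    """
    summary = {}
    for tag, name in MARC_TAGS.items():
        values = record.get(tag, [])
        if not values:
            continue
        # Subject headings may repeat; other fields are read as single values.
        summary[name] = values if name == "subjects" else values[0]
    return summary


# Invented sample record for illustration only (not from Harvard's release).
sample = {
    "100": ["Author, Example"],
    "245": ["An example title"],
    "260": ["Example Press, 1999"],
    "650": ["Libraries", "Metadata"],
    "999": ["local field, ignored by this sketch"],
}

print(summarize(sample))
```

A real application would more likely read the downloaded files with a dedicated MARC parsing library than hand-roll a structure like this one.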

“This is Big Data for books,” David Weinberger, co-director of Harvard’s Library Lab, told The New York Times’ Quentin Hardy. “There might be 100 different attributes for a single object.”

The hope is that by making the metadata openly available, other libraries will follow suit and developers will be able to build new applications. “By instituting a policy of open metadata, the Harvard Library has expressed its appreciation for the great potential that library metadata has for innovative uses,” said Stuart Shieber, Library Board Member and Professor of Computer Science at Harvard, in the press release.

CDH4

Cloudera has released the latest beta version of its Hadoop distribution: CDH4. It offers upgrades to Flume, Sqoop, Hue, Oozie and Whirr, and support for new versions of Red Hat, CentOS, SUSE, Ubuntu and Debian.

Cloudera says CDH4 has a great many enhancements over CDH3, including better availability, utilization, extensibility and security. The new version also contains a “significantly redesigned MapReduce.” However, Cloudera says it plans to support both generations of MapReduce for the life of CDH4.

A big data IPO

The “operational intelligence” company Splunk had its IPO this past week. As Forbes writer Josh Bersin noted, the initial offering was hot, coming in with “a valuation at 28X revenue ($3.2 billion). This valuation trumps the hot companies in social networking: Jive trades at 20X revenue, Google trades at 5X revenue, and Facebook, well we’ll see.” Bersin argues that “big data” is “big news” and “big business,” and he draws out several implications of the IPO and the market’s response for HR and talent management, including the observation that “most businesses today have plenty of data with which to make decisions.”

Got data news?

Feel free to email me.

Photo: Harvard College Library bookplate with withdrawal stamp by kladcat, on Flickr

Related:

  • Metadata isn’t a chore, it’s a necessity
  • Hadoop: What it is, how it works, and what it can do
tags: data company, data product, Hadoop, metadata, stratablog, strataweek