After launching just over a year ago with only 47 data sets, the “Raw Data Catalog” on Data.gov now has 2,326 entries that have been collectively downloaded almost three-quarters of a million times. Of course, even these sizable download counts understate the actual impact of this data, which is being embedded in a variety of sites and apps, like those being developed for the Health 2.0 Developer Challenge.
The big Data.gov winner so far? The Department of the Interior’s “Worldwide M1+ Earthquakes, Past 7 Days” data set. My guess is that there is some great app or visualization out there making daily use of this file. If you know what it is, report it in the comments.
Update: In the comments, Mike suggested that earthquake downloads could be driven by a recurring visualization in the Popular Mechanics iPad App. I tracked down the app’s developer, Jonathan Cousins, and he confirmed that “the app grabs data about the most recent seismic activity from USGS feeds via wifi or 3G.” I'm not quite sure of the mechanics of how this is being tallied on Data.gov, but it’s a really great example of how someone is using this data to create new value.
The top 10 data sets by download count are:
- Worldwide M1+ Earthquakes, Past 7 Days. 122,888 downloads. Real-time, worldwide earthquake list for the past 7 days. Department of the Interior.
- Latest Volumes of Foreign Relations of the United States. 10,090 downloads. The feed for the latest ten volumes of the official historical documentary record of U.S. foreign policy in the Foreign Relations of the United States series. Department of State.
- U.S. Overseas Loans and Grants (Greenbook). 6,670 downloads. These data cover U.S. economic and military assistance by country from 1946 to the present. US Agency for International Development.
- Child-Related Product Recalls. 2,784 downloads. Lists recalls from CPSC, the agency charged with protecting the public from unreasonable risks of serious injury or death from thousands of types of consumer products. US Consumer Product Safety Commission.
- Airline On-Time Performance and Causes of Flight Delays. 2,716 downloads. On-time arrival data for non-stop domestic flights by major air carriers, as well as additional items, such as departure and arrival delays, origin and destination airports, flight numbers, scheduled and actual departure and arrival times, cancelled or diverted flights, taxi-out and taxi-in times, air time, and non-stop distance. Department of Transportation.
- 2005 Toxics Release Inventory data for American Samoa. 2,628 downloads. The Toxics Release Inventory (TRI) is a publicly available EPA database that contains information on toxic chemical releases and waste management activities reported annually by certain industries as well as federal facilities. Environmental Protection Agency.
- OSHA Data Initiative – Establishment Specific Injury and Illness Rates. 2,588 downloads. The data used by OSHA to calculate establishment-specific injury and illness incidence rates. Department of Labor.
- 2001 Federal Register in XML. 2,506 downloads. The official daily publication for rules, proposed rules, and notices of Federal agencies and organizations, as well as executive orders and other presidential documents. National Archives and Records Administration.
- 2007 National RCRA Hazardous Waste Biennial Report Data Files. 2,266 downloads. Data on the generation of hazardous waste from large-quantity generators and on waste management practices from treatment, storage, and disposal facilities. Environmental Protection Agency.
- Residential Energy Consumption Survey (RECS) Files, All Data, 2005. 2,000 downloads. Data on the use of energy in residential housing units including physical housing unit types, appliances utilized, demographics, fuels, and other energy-use information from the Residential Energy Consumption Survey (RECS), which is conducted every four years. Department of Energy.
Here’s a breakdown of the contributions by agency:
| Agency | Data sets contributed | Downloads |
|---|---|---|
| Environmental Protection Agency | 474 | 160,716 |
| Department of Defense | 214 | 44,837 |
| Department of the Interior | 197 | 157,273 |
| Department of Commerce | 176 | 37,430 |
| Department of Health and Human Services | 144 | 43,697 |
| Executive Office of the President | 132 | 7,569 |
| Department of the Treasury | 93 | 49,859 |
| Department of Justice | 90 | 16,392 |
| Department of Energy | 86 | 12,965 |
| All remaining agencies | 740 | 209,872 |
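One way to read this table is to divide each agency's downloads by the number of data sets it contributed. Here's a quick Python sketch of that arithmetic using the figures above. The names `AGENCY_STATS` and `downloads_per_data_set` are mine, purely for illustration.

```python
# Figures copied from the agency table above:
# agency -> (data sets contributed, downloads).
AGENCY_STATS = {
    "Environmental Protection Agency": (474, 160_716),
    "Department of Defense": (214, 44_837),
    "Department of the Interior": (197, 157_273),
    "Department of Commerce": (176, 37_430),
    "Department of Health and Human Services": (144, 43_697),
    "Executive Office of the President": (132, 7_569),
    "Department of the Treasury": (93, 49_859),
    "Department of Justice": (90, 16_392),
    "Department of Energy": (86, 12_965),
    "All remaining agencies": (740, 209_872),
}


def downloads_per_data_set(stats):
    """Return {agency: average downloads per data set}, highest average first."""
    averages = {
        agency: downloads / data_sets
        for agency, (data_sets, downloads) in stats.items()
    }
    return dict(sorted(averages.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    for agency, average in downloads_per_data_set(AGENCY_STATS).items():
        print(f"{agency:42s} {average:8.1f}")
```

By this measure the Department of the Interior comes out on top at roughly 798 downloads per data set, though most of that is the earthquake feed, which accounts for 122,888 of its 157,273 downloads.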
Finally, here’s a link to the data.gov catalog that includes the number of times each data set has been downloaded. (If you’re interested in how this was done, check out “Use BeautifulSoup to parse data.gov” over on O’Reilly Answers.)
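For a rough idea of what that kind of scraping involves, here's a minimal sketch. It is not the code from the O’Reilly Answers post: it uses Python's built-in `html.parser` instead of BeautifulSoup so it runs with no extra installs, and the HTML snippet is made up, since the actual markup of the data.gov catalog pages isn't shown here. `DownloadTableParser` and `parse_download_counts` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical markup: a stand-in for "a table of data set names and download
# counts". The real data.gov page layout is an unknown here.
SAMPLE_HTML = """
<table>
  <tr><th>Data set</th><th>Downloads</th></tr>
  <tr><td>Worldwide M1+ Earthquakes, Past 7 Days</td><td>122,888</td></tr>
  <tr><td>Child-Related Product Recalls</td><td>2,784</td></tr>
</table>
"""


class DownloadTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Text can arrive in several chunks, so append to the current cell.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells, so they are empty
                self.rows.append(self._row)
            self._row = None


def parse_download_counts(html):
    """Return {data set name: download count} from a two-column HTML table."""
    parser = DownloadTableParser()
    parser.feed(html)
    parser.close()
    return {name.strip(): int(count.replace(",", "")) for name, count in parser.rows}


if __name__ == "__main__":
    for name, count in parse_download_counts(SAMPLE_HTML).items():
        print(f"{count:>8,}  {name}")
```

With BeautifulSoup the row-and-cell bookkeeping would be handled by the library, which is a big part of why it's a popular choice for this sort of job.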
Congrats to everyone at data.gov for creating this incredible resource for developers-at-large.