Strata Week: What happens when 200,000 hard drives work together?

IBM is building a massive 120-petabyte array and Infochimps releases a unified geo schema.

Here are a few of the data stories that caught my attention this week.

IBM's record-breaking data storage array
Data storage at the 120-petabyte scale creates a number of challenges, including, not surprisingly, cooling such a massive system. Other problems include handling failure, backups and indexing. The new storage array will benefit from other research that IBM has been doing to help boost supercomputers' data access. IBM's General Parallel File System (GPFS) was designed with this massive volume in mind. GPFS spreads files across multiple disks so that many parts of a file can be read or written at once. The file system has already shown that it can perform: it set a new scanning speed record last month by indexing 10 billion files in just 43 minutes.

IBM's new 120-petabyte array was built at the request of an unnamed client that needed a new supercomputer for "detailed simulations of real-world phenomena."

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science, from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively. Save 30% on registration with the code ORM30.

Infochimps' new Geo API
According to Infochimps, the API addresses several pain points that those working with geodata face:
To address these issues, Infochimps has created a new, simple schema to help make data consistent and unified when drawn from multiple sources. The company has also created a "summarizer" to intelligently cluster and better display data. Finally, it has enabled the API to handle queries other than those traditionally associated with geodata, namely latitude and longitude.

As we seek to pull together and analyze all types of data from multiple sources, this move toward a unified schema will become increasingly important.

Hurricane Irene and weather data

The arrival of Hurricane Irene last week underscored the importance not only of emergency preparedness but also of access to real-time data: weather data, transportation data, government data, mobile data, and so on.
As Alex Howard noted here on Radar, crisis data is becoming increasingly social:
Got data news? Feel free to email me.

Hard drive photo: Hard Drive by walknboston, on Flickr
Comments: 1
mark [1 September 2011 03:07 PM]
When I hear "new schema definition" I always get worried about one more partial solution that hinders integration in the market. Was it necessary to create another Geo API? Does it conform to any of the international standards so different tools can plug into it, as one can with data on GeoCommons?