Tue, Oct 9, 2007

Jesse Robbins

Google & IBM giving students a distributed systems lab using Hadoop

Google & IBM have partnered to give university students hands-on experience developing software for large-scale distributed systems. This initiative focuses on parallel processing for large data sets using Hadoop, an open source implementation of Google's MapReduce. (See Tim's earlier post about Yahoo & Hadoop.)

“The goal of this initiative is to improve computer science students’ knowledge of highly parallel computing practices to better address the emerging paradigm of large-scale distributed computing. IBM and Google are teaming up to provide hardware, software and services to augment university curricula and expand research horizons. With their combined resources, the companies hope to lower the financial and logistical barriers for the academic community to explore this emerging model of computing.”

The project currently includes the University of Washington, Carnegie Mellon University, MIT, Stanford, UC Berkeley and the University of Maryland. Students in participating classes will have access to a dedicated cluster of "several hundred computers" running Linux under Xen virtualization. The project is expected to expand to thousands of processors and eventually be open to researchers and students at other institutions.

As part of this effort, Google and the University of Washington have released a Creative Commons licensed curriculum to help teach distributed systems concepts and techniques. IBM is also providing Hadoop plug-ins for Eclipse.

Note: You can also build similar systems using Hadoop with Amazon EC2. Tom White recently posted an excellent guide, and Powerset has been running this setup in production for quite some time.




Comments: 2

Mark Johnson [10.09.07 11:27 PM]

And Powerset is using many components from the Hadoop stack! For example, we've got two full-time developers devoted to HBase, which is intended to be a BigTable clone. The more work done on Hadoop, the less time companies need to devote to infrastructure and the more can be placed on higher-level components.

Jesse Robbins [10.10.07 12:43 PM]

Mark, I couldn't agree more! Are your devs contributing this back to the core? (It sounds like it)
