
Tue

10.09.07

Jesse Robbins

Google & IBM giving students a distributed systems lab using Hadoop

Google & IBM have partnered to give university students hands-on experience developing software for large-scale distributed systems. This initiative focuses on parallel processing for large data sets using Hadoop, an open source implementation of Google's MapReduce. (See Tim's earlier post about Yahoo & Hadoop.)
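
For readers who haven't met MapReduce before, the programming model can be sketched in a few lines of plain Python. This is only a single-process toy, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real Hadoop job would spread the map and reduce work across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key (the step a framework performs
    between the map and reduce phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Assumption: everything runs in one process here. On a cluster,
    # each document (or input split) would be mapped on a different node.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count(["a b a", "b c"])` returns `{"a": 2, "b": 2, "c": 1}`. The point of the split is that the map and reduce functions never share state, which is what lets a framework run them in parallel over large data sets.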

“The goal of this initiative is to improve computer science students’ knowledge of highly parallel computing practices to better address the emerging paradigm of large-scale distributed computing. IBM and Google are teaming up to provide hardware, software and services to augment university curricula and expand research horizons. With their combined resources, the companies hope to lower the financial and logistical barriers for the academic community to explore this emerging model of computing.”

The project currently includes the University of Washington, Carnegie Mellon University, MIT, Stanford, UC Berkeley and the University of Maryland. Students in participating classes will have access to a dedicated cluster of "several hundred computers" running Linux under Xen virtualization. The project is expected to expand to thousands of processors and eventually be open to researchers and students at other institutions.

As part of this effort, Google and the University of Washington have released a Creative Commons licensed curriculum to help teach distributed systems concepts and techniques. IBM is also providing Hadoop plug-ins for Eclipse.

Note: You can also build similar systems using Hadoop with Amazon EC2. Tom White recently posted an excellent guide, and Powerset has been running this setup in production for quite some time.



tags: amazon ec2, cloud computing, distributed systems, ec2, gfs, google, hadoop, ibm, internet scale, mapreduce, operations, scale, web scale, yahoo

Comments: 2

Mark Johnson   [10.09.07 11:27 PM]

And Powerset is using many components from the Hadoop stack! For example, we've got two full-time developers devoted to HBase, which is intended to be a BigTable clone. The more work done on Hadoop, the less time companies need to devote to infrastructure and the more can be placed on higher-level components.

Jesse Robbins   [10.10.07 12:43 PM]

Mark, I couldn't agree more! Are your devs contributing this back to the core? (It sounds like it)

