Strata Gems: Use GPUs to speed up calculation

We’re publishing a new Strata Gem each day all the way through to December 24. Yesterday’s Gem: The emerging marketplace for social data. Early-bird pricing on Strata closes December 14: don’t forget to register!

The November release of Amazon Web Services’ Cluster GPU instances highlights how Graphics Processing Units (GPUs) are moving into the mainstream for general-purpose calculation. Graphical applications require very fast matrix transformations, which GPUs are optimized to perform. Boards such as the NVIDIA Tesla offer hundreds of processor cores, all able to work in parallel.
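Why do matrix transformations suit a GPU so well? The pure-Python sketch below is our own illustration, not code from any GPU library. It rotates a list of 2-D points by one 2x2 rotation matrix. Each output point depends only on its own input point, so every iteration of the loop could in principle be handed to a separate GPU core. The function name `rotate_points` is hypothetical.

```python
import math


def rotate_points(points, angle_radians):
    """Rotate 2-D points about the origin using a 2x2 rotation matrix.

    Each result depends only on its own input point, so the loop body is
    independent per point: the data-parallel shape of work GPUs are built for.
    """
    c, s = math.cos(angle_radians), math.sin(angle_radians)
    rotation = ((c, -s),
                (s, c))
    result = []
    for x, y in points:  # no iteration reads another iteration's output
        result.append((rotation[0][0] * x + rotation[0][1] * y,
                       rotation[1][0] * x + rotation[1][1] * y))
    return result


if __name__ == "__main__":
    square = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    for px, py in rotate_points(square, math.pi / 2):
        print(f"({px:.3f}, {py:.3f})")
```

On a CPU the points are processed one after another. A GPU would instead assign one point to each of many threads and apply the same matrix to all of them at once.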

Debate continues about exactly how large a performance boost GPUs provide, but reports indicate speedups of 2.5x to 15x over CPUs for calculation-heavy applications.
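To make that range concrete, here is a small worked calculation. It assumes, for simplicity, that the speedup applies to the whole job. In practice only the calculation-heavy part is accelerated, so end-to-end gains are often smaller. The 60-minute job and the function name `estimated_gpu_minutes` are hypothetical.

```python
def estimated_gpu_minutes(cpu_minutes, speedup):
    """Wall-clock time after applying an overall speedup factor.

    Assumes the whole job is accelerated uniformly: time = cpu_time / speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return cpu_minutes / speedup


if __name__ == "__main__":
    # A hypothetical 60-minute CPU job at both ends of the reported range
    for factor in (2.5, 15):
        print(f"{factor}x speedup: {estimated_gpu_minutes(60, factor):.1f} minutes")
```

For the 60-minute job, this gives 24 minutes at 2.5x and 4 minutes at 15x.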

NVIDIA has led the trend toward general-purpose computing on GPUs with its Compute Unified Device Architecture (CUDA). Using extensions to the C programming language, developers can write code that executes on the GPU alongside code running on the CPU.
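CUDA kernels themselves are written in C, but the way a kernel launch divides work among threads can be sketched in Python. The function below is a hypothetical serial emulation, not CUDA code. A launch is organized as a grid of blocks, each holding a fixed number of threads. Each thread computes its global index as `blockIdx.x * blockDim.x + threadIdx.x` and handles one array element. On a real GPU the two nested loops are replaced by thousands of threads running concurrently. The name `launch_vector_add` and the default of 256 threads per block are illustrative choices.

```python
def launch_vector_add(a, b, threads_per_block=256):
    """Serially emulate how a CUDA launch maps threads onto array elements.

    Returns the element-wise sum and the number of blocks that were "launched".
    """
    n = len(a)
    c = [0.0] * n
    # Round up so every element gets a thread; the last block may be partly idle
    blocks = (n + threads_per_block - 1) // threads_per_block
    for block_idx in range(blocks):                  # plays the role of blockIdx.x
        for thread_idx in range(threads_per_block):  # plays the role of threadIdx.x
            i = block_idx * threads_per_block + thread_idx  # global thread index
            if i < n:  # bounds check: surplus threads in the last block do nothing
                c[i] = a[i] + b[i]
    return c, blocks


if __name__ == "__main__":
    total, num_blocks = launch_vector_add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0],
                                          threads_per_block=2)
    print(total, num_blocks)
```

The round-up and the `i < n` check are the two details a real CUDA kernel needs when the array length is not a multiple of the block size.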

Image: NVIDIA’s Tesla M2050 GPU Computing Module

While CUDA works only on NVIDIA hardware, OpenCL (Open Computing Language) is a standard for cross-platform, general-purpose parallel programming. Originally developed by Apple and heavily influenced by CUDA, it is now developed with cross-industry participation. ATI and NVIDIA are among the vendors offering OpenCL support for their products.

Now that Amazon supports GPU clusters, it’s easier than ever to put GPUs to work on data analysis. OpenCL and CUDA bindings exist for many popular programming languages, including Java, Python and C++, and the R+GPU project gives the R statistical package access to GPUs.
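A sensible first step on a new GPU machine is to check which OpenCL platforms and devices the Python bindings can see. The sketch below assumes the PyOpenCL package and uses `pyopencl.get_platforms()`, `Platform.get_devices()`, and the `name` and `max_compute_units` device attributes. The helper name `list_opencl_devices` is hypothetical. Note that a "compute unit" is a group of cores (on NVIDIA hardware, a multiprocessor), so the count is much lower than the number of cores.

```python
def list_opencl_devices():
    """Return (platform name, device name, compute units) for each OpenCL device.

    Returns an empty list if PyOpenCL is not installed or no OpenCL platform
    is available, so it is safe to call on any machine.
    """
    try:
        import pyopencl as cl
    except ImportError:
        return []
    try:
        platforms = cl.get_platforms()
    except cl.Error:
        return []  # e.g. no OpenCL driver (ICD) installed
    found = []
    for platform in platforms:
        try:
            devices = platform.get_devices()
        except cl.Error:
            continue  # a platform with no usable devices
        for device in devices:
            # max_compute_units counts groups of cores, not individual cores
            found.append((platform.name, device.name, device.max_compute_units))
    return found


if __name__ == "__main__":
    for platform_name, device_name, units in list_opencl_devices():
        print(f"{platform_name}: {device_name} ({units} compute units)")
```

On a machine with no OpenCL setup, this prints nothing rather than failing, which makes it usable as a quick environment check.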

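To get a quick impression of what GPU code looks like, check out this example from the Python OpenCL bindings (PyOpenCL). The code that executes on the GPU is the OpenCL C kernel inside the triple-quoted string passed to cl.Program; the surrounding Python runs on the CPU and moves data to and from the device.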

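```python
import pyopencl as cl
import numpy
import numpy.linalg as la

# Two vectors of 50,000 random single-precision floats on the host (CPU)
a = numpy.random.rand(50000).astype(numpy.float32)
b = numpy.random.rand(50000).astype(numpy.float32)

# Choose an OpenCL device (may ask interactively) and create a command queue
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# Copy the inputs into device memory and allocate a buffer for the result
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
dest_buf = cl.Buffer(ctx, mf.WRITE_ONLY, b.nbytes)

# The kernel is the code that runs on the GPU: each work-item adds
# the pair of elements at its own global index
prg = cl.Program(ctx, """
    __kernel void sum(__global const float *a,
                      __global const float *b,
                      __global float *c)
    {
      int gid = get_global_id(0);
      c[gid] = a[gid] + b[gid];
    }
    """).build()

# Launch one work-item per element; None lets OpenCL pick the work-group size
prg.sum(queue, a.shape, None, a_buf, b_buf, dest_buf)

# Copy the result back to the host (blocking by default) and compare it
# with NumPy's sum; the printed norm should be 0.0 or very close to it
a_plus_b = numpy.empty_like(a)
cl.enqueue_copy(queue, a_plus_b, dest_buf)
print(la.norm(a_plus_b - (a + b)))
```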

Amazon’s Werner Vogels will be among the keynote speakers at Strata.
