Parallel Computing on Late-Night TV

Jen-Hsun Huang, CEO of Nvidia, appeared on Charlie Rose last week and touched on a wide range of subjects, including his early years at a boarding school in Kentucky, the founding of Nvidia, CPUs, and GPUs. Amazingly, Charlie spent the last 10+ minutes of the show on the CUDA architecture (starting around 29:54 into the broadcast).

GPUs excel at mathematical computations, but until a few years ago there wasn't an easy way to access the compute engine behind these manycore processors. With CUDA, a C programmer uses a few simple language extensions to reach abstractions (thread groups, shared memory, barrier synchronization) suited to fine-grained parallel programming. Nvidia's goal is to make every language now available on the CPU available on the GPU as well. The next languages they are targeting are FORTRAN, Java, and C++.
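To make those abstractions concrete, here is a minimal sketch (not Nvidia's own sample code) of a CUDA kernel that sums an array. Each thread block is a thread group, `__shared__` declares memory shared by the threads in that block, and `__syncthreads()` is the barrier that keeps them in step. The kernel name `block_sum`, the block size of 256, and the use of unified memory (`cudaMallocManaged`, a later CUDA addition used here only to keep the host code short) are illustrative choices. It assumes a CUDA-capable GPU and the `nvcc` compiler.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 256  // threads per block; must be a power of two for this reduction

// Each thread block sums BLOCK consecutive elements of `in` and writes
// one partial sum to out[blockIdx.x].
__global__ void block_sum(const float *in, float *out, int n) {
    __shared__ float buf[BLOCK];             // shared by all threads in this block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;   // global index handled by this thread
    buf[tid] = (i < n) ? in[i] : 0.0f;       // pad past the end of the array with zeros
    __syncthreads();                         // wait until the whole tile is loaded

    // Tree reduction in shared memory: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();                     // finish this step before starting the next
    }
    if (tid == 0) out[blockIdx.x] = buf[0];  // one result per block
}

int main(void) {
    const int n = 1 << 20;
    const int blocks = (n + BLOCK - 1) / BLOCK;
    float *in, *partial;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&partial, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    block_sum<<<blocks, BLOCK>>>(in, partial, n);
    cudaError_t err = cudaGetLastError();              // launch-configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors during execution
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    double total = 0.0;  // finish the reduction on the CPU
    for (int b = 0; b < blocks; ++b) total += partial[b];
    printf("sum = %.0f\n", total);

    cudaFree(in);
    cudaFree(partial);
    return 0;
}
```

The split between GPU and CPU work is deliberate: threads within a block can cooperate through shared memory and barriers, but blocks cannot synchronize with each other inside a kernel, so the per-block partial sums are combined afterwards.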

In the interview Huang acknowledged that feedback from a few users encouraged Nvidia to start working on CUDA. To their credit, they acted quickly: the CUDA web site highlights interesting applications, mostly in scientific computation, energy exploration, and mathematical modeling. Other heavy users include hedge funds and other computational finance outfits.

Coincidentally, we talked to Nvidia late last year as part of our upcoming report on big data. For big data problems, they cited users who accelerated database computations such as sorts and relational joins, and bioinformatics researchers who used CUDA for their pattern-matching algorithms. Their users also report that combining CPUs and GPUs in servers leads to smaller clusters and substantially lower energy costs.

For now, the CUDA architecture is the province of C programmers and my fellow number crunchers. But Nvidia is allocating resources to make its tools even easier to use, and once that happens, surprising applications will follow. Given that Apple and Intel have signaled that they too think GPUs are interesting, I'm fairly confident that simpler programming tools will emerge soon.

  • I read that the PS3 has massive capabilities for this.

  • Dennis Linnell

    I’m surprised you are hyping CUDA, the NVIDIA proprietary architecture, when open alternatives, such as OpenCL, may prove more promising. Consider reading these informative articles for more details.

  • Jon Gelsey

    Fast computation is the easy part of the problem. That's why a 1990s supercomputer-class processor now costs $50 in the form of a GPU. The expensive part is feeding the beast, i.e., having a memory system robust enough to move data in and out of the processor fast enough that the processor isn't spending most of its time idle, waiting on memory or disk. The hard part is software that actually takes advantage of parallelism, understands memory and disk latencies, and can manage cache coherency and techniques like cache blocking. You either write that from scratch (unlikely if you are an ISV with products already in the market running on CPUs) or you rely on sophisticated compiler technology to automatically generate instruction streams appropriate for GPUs. The latter is the only approach that's been successful in the market over the last 30 years: look at Convex, Convey, Alliant, even Cray in its later years.
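Jon Gelsey's comment mentions cache blocking. On a CUDA GPU the hand-written analogue is tiling through shared memory: a block of threads stages a small tile of each input in fast on-chip memory and reuses it many times before going back to slower global memory. The following is a minimal sketch of that idea, the textbook tiled matrix multiply, not a tuned implementation; the kernel name `matmul_tiled`, the 16×16 tile, and the test matrices are illustrative assumptions, and it again uses unified memory (`cudaMallocManaged`) for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 16  // tile edge; each block computes one TILE x TILE piece of C

// C = A * B for square N x N row-major matrices. Each element loaded from
// global memory into a shared tile is reused TILE times by the block.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        // Stage one tile of A and one of B; out-of-range entries become 0.
        As[threadIdx.y][threadIdx.x] = (row < N && a_col < N) ? A[row * N + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < N && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();                  // both tiles fully loaded
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                  // everyone done reading before the next load
    }
    if (row < N && col < N) C[row * N + col] = acc;
}

int main(void) {
    const int N = 100;  // deliberately not a multiple of TILE
    size_t bytes = (size_t)N * N * sizeof(float);
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < N * N; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (N + TILE - 1) / TILE);
    matmul_tiled<<<grid, block>>>(A, B, C, N);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // With all-ones A and all-twos B, every entry of C should equal 2 * N.
    int bad = 0;
    for (int i = 0; i < N * N; ++i)
        if (C[i] != 2.0f * N) ++bad;
    printf("mismatches = %d\n", bad);

    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
    return 0;
}
```

Even this small kernel has to handle tile loading, padding at the matrix edges, and two barriers per step, which is the kind of bookkeeping the comment argues a compiler should ideally take off the programmer's hands.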