Strata Gems: Clojure is a language for data

Built for expressiveness and concurrency, Clojure is a natural fit for data work

We’re publishing a new Strata Gem each day all the way through to December 24. Yesterday’s Gem: Who needs disks anyway?

The Clojure programming language has been rising in popularity in recent months. A Lisp-like language, it brings functional programming to the Java virtual machine (JVM) platform. One distinctive feature of Clojure is that code is written using the same data structures the language uses for data, which makes it well suited to writing powerful and concise domain-specific languages.
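To make the code-as-data idea concrete without assuming any Clojure tooling, here is a toy Java sketch (Java 9 or later, for `List.of`). A Lisp form such as `(+ 1 (* 2 3))` is modelled as nested lists. The names `CodeAsData`, `eval` and `rewrite` are hypothetical, and this is not how Clojure itself is implemented; it only shows why a program that is ordinary data can be inspected and transformed by other code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of "code as data": an expression is a plain nested List,
// in the style of the Lisp form (+ 1 (* 2 3)). Not Clojure's implementation.
public class CodeAsData {

    // An Integer evaluates to itself; a List is (operator arg1 arg2 ...),
    // where the operator is the String "+" or "*".
    static int eval(Object form) {
        if (form instanceof Integer) {
            return (Integer) form;
        }
        List<?> list = (List<?>) form;
        String op = (String) list.get(0);
        boolean plus = op.equals("+");
        if (!plus && !op.equals("*")) {
            throw new IllegalArgumentException("unknown operator: " + op);
        }
        int acc = plus ? 0 : 1;
        for (Object arg : list.subList(1, list.size())) {
            int v = eval(arg);
            acc = plus ? acc + v : acc * v;
        }
        return acc;
    }

    // Because the program is just data, another function can rewrite it
    // before it runs. Here every "*" becomes "+", a crude stand-in for
    // the kind of transformation a Lisp macro performs.
    static Object rewrite(Object form) {
        if (!(form instanceof List)) {
            return form.equals("*") ? "+" : form;
        }
        List<Object> out = new ArrayList<>();
        for (Object part : (List<?>) form) {
            out.add(rewrite(part));
        }
        return out;
    }

    public static void main(String[] args) {
        // The expression (+ 1 (* 2 3)), built as an ordinary data structure
        Object program = List.of("+", 1, List.of("*", 2, 3));
        System.out.println(program);                // [+, 1, [*, 2, 3]]
        System.out.println(eval(program));          // 7
        System.out.println(eval(rewrite(program))); // 6
    }
}
```

In Clojure no such encoding is needed: the reader turns `(+ 1 (* 2 3))` directly into a list, and macros perform this kind of rewriting before the code is compiled.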

Clojure’s inventor, Rich Hickey, has ensured that its integration with the world of Java is as painless as possible. And for those who fear Lisp-like languages, Clojure also bends a little to be friendlier, for example by adding literal syntax for vectors, maps and sets alongside the familiar parentheses. The result is that Clojure unites two previously estranged worlds: powerful functional programming and widespread, mature APIs.

In the world of big data, this means that Clojure can be used with Cascading, an API for programmatically creating Hadoop processing pipelines. Nathan Marz of BackType harnessed Clojure to create Cascalog, an entire query language for Hadoop that sits on top of Cascading.

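Two features in particular make Clojure suitable for parallel data processing: immutable data structures, which can be shared safely between threads, and built-in concurrency constructs such as atoms, refs (backed by software transactional memory) and agents.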
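As a rough illustration of how immutability and a built-in concurrency construct combine, here is a Java sketch of the pattern behind a Clojure atom, assuming Java 10 or later (for `Map.copyOf`). In Clojure one might write `(swap! counts update word (fnil inc 0))`; the Java version below keeps an immutable map in an `AtomicReference` and replaces it wholesale on each update. The class and method names (`SwapSketch`, `inc`, `record`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Rough Java analogue of a Clojure atom: shared state is an immutable Map
// held in an AtomicReference, and every update builds a new map.
public class SwapSketch {

    // Shared state: always an unmodifiable snapshot.
    static final AtomicReference<Map<String, Integer>> counts =
            new AtomicReference<>(Map.of());

    // Pure function: returns a new map with word's count incremented;
    // the input map is never modified.
    static Map<String, Integer> inc(Map<String, Integer> old, String word) {
        Map<String, Integer> next = new HashMap<>(old);
        next.merge(word, 1, Integer::sum);
        return Map.copyOf(next);
    }

    static void record(String word) {
        // updateAndGet retries with a fresh snapshot if another thread won
        // the compare-and-set race, so the update function must be pure.
        counts.updateAndGet(old -> inc(old, word));
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        String[] words = {"clojure", "hadoop", "clojure", "jvm"};
        for (int i = 0; i < 1000; i++) {
            String w = words[i % words.length];
            pool.submit(() -> record(w));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // Readers take a consistent snapshot without any lock.
        Map<String, Integer> snapshot = counts.get();
        System.out.println(snapshot.get("clojure") + " "
                + snapshot.get("hadoop") + " " + snapshot.get("jvm")); // 500 250 250
    }
}
```

Clojure's atoms follow the same compare-and-set-and-retry scheme, which is why functions passed to `swap!` should be free of side effects. Readers never block; they simply see whichever immutable value is current.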

Cloud and big data go hand in hand. For working with the cloud, the jclouds project provides Java with a unified API to multiple cloud vendors, including Azure, Amazon and Rackspace. jclouds is often used from Clojure, as exemplified by the Pallet project: implemented in Clojure, Pallet automates the provisioning and control of cloud machine instances.

If you’re looking to learn a new programming language and expand the way you think about coding, give Clojure a whirl. The excellent Java support means you won’t be cut off from the libraries you already rely on.
