"data tool" entries

Need speed for big data? Think in-memory data management

We're launching an investigation into in-memory data technologies.

By Ben Lorica and Roger Magoulas

In a forthcoming report we will highlight technologies and solutions that take advantage of the decline in prices of RAM, the popularity of distributed and cloud computing systems, and the need for faster queries on large, distributed data stores. Established technology companies have had interesting offerings, but what initially caught our attention were open source projects that started gaining traction last year.

An example we frequently hear about is the demand for tools that support interactive queries. Faster query response times translate to more engaged and productive analysts, and to real-time reports. Over the past two years, several in-memory solutions have emerged that deliver 5X-100X faster response times. A recent paper from Microsoft Research noted that even in this era of big data and Hadoop, many MapReduce jobs fit in the memory of a single server. To scale to extremely large datasets, several new systems use a combination of distributed computing (in-memory grids), compression, and (columnar) storage technologies.

Another interesting aspect of in-memory technologies is that they seem to be everywhere these days. We’re looking at tools aimed at analysts (Tableau, Qlikview, Tibco Spotfire, Platfora), databases that target specific workloads or data types (VoltDB, SAP HANA, Hekaton, Redis, Druid, Kognitio, and Yarcdata), frameworks for analytics (Spark/Shark, GraphLab, GridGain, Asterix/Hyracks), and the data center (RAMCloud, memory locality).

We’ll be talking to companies and hackers to get a sense of how in-memory solutions fit into their planning. Along these lines, we would love to hear what you think about the rise of these technologies, as well as applications, companies and projects we should look at. Feel free to reach out to us on Twitter (Ben is @bigdata and Roger is @rogerm) or leave a comment on this post.

Visualization of the Week: Urban metabolism

Visualizing cities' energy usage, population density, and material intensity.

This week's visualization is an interactive web-mapping tool that lets you explore energy usage, material intensity and the overall "urban metabolism" of major U.S. cities.

Data as seeds of content

A look at lesser-known ways to extract insight from data.

Visualizations are one way to make sense of data, but they aren't the only way. Robbie Allen reveals six additional outputs that help users derive meaningful insights from data.

Top stories: January 30-February 3, 2012

Hadoop deconstructed, the value of unstructured data, and a Moneyball approach to software teams.

This week on O'Reilly: Edd Dumbill examined the components and functions of the Hadoop ecosystem, Pete Warden gave a big thumbs-up to unstructured data, and Jonathan Alexander looked at how a Moneyball approach could help software teams.

What is Apache Hadoop?

A look at the components and functions of the Hadoop ecosystem.

Apache Hadoop has been the driving force behind the growth of the big data industry. But what does it do, and why do you need all its strangely-named friends, such as Oozie, Zookeeper and Flume?

Strata Week: A home for negative and null results

Figshare wants research data, Accel makes a huge data investment, LinkedIn shares its DataFu.

Figshare relaunches with an eye toward making more research data accessible. Elsewhere, Accel invests $52.5 million in Code 42 and LinkedIn open sources DataFu.

There’s a map for that

Can redistricting be opened to the public through open source and the web?

DistrictBuilder is a web-based redistricting tool that lets citizens draw their own maps, publish them online and submit them to redistricting authorities.

BuzzData: Come for the data, stay for the community

A Canadian startup aspires to be the GitHub of datasets.

BuzzData looks to tap the gravitational pull of data, then keep people around through conversation and collaboration.

Visualizing hunger in the Horn of Africa

A map made with open data shows the extent of the humanitarian emergency in the Horn of Africa.

A new map made with open data shows the extent of the humanitarian emergency in the Horn of Africa. The data visualization at the World Food Program website can also be embedded and shared, extending the reach of the request for aid.