"data tool" entries

Need speed for big data? Think in-memory data management

We're launching an investigation into in-memory data technologies.

by Ben Lorica | @bigdata | +Ben Lorica | January 18, 2013

In a forthcoming report we will highlight technologies and solutions that take advantage of the decline in prices of RAM, the popularity of distributed and cloud computing systems, and the need for faster queries on large, distributed data stores. Established technology companies have had interesting offerings, but what initially caught our attention were open source projects that started gaining traction last year.

An example we frequently hear about is the demand for tools that support interactive query performance. Faster query response times translate to more engaged and productive analysts, and real-time reports. Over the past two years several in-memory solutions emerged to deliver 5X-100X faster response times. A recent paper from Microsoft Research noted that even in this era of big data and Hadoop, many MapReduce jobs fit in the memory of a single server. To scale to extremely large datasets several new systems use a combination of distributed computing (in-memory grids), compression, and (columnar) storage technologies.

Another interesting aspect of in-memory technologies is that they seem to be everywhere these days. We’re looking at tools aimed at analysts (Tableau, QlikView, TIBCO Spotfire, Platfora), databases that target specific workloads or data types (VoltDB, SAP HANA, Hekaton, Redis, Druid, Kognitio, and YarcData), frameworks for analytics (Spark/Shark, GraphLab, GridGain, Asterix/Hyracks), and the data center (RAMCloud, memory locality).

We’ll be talking to companies and hackers to get a sense of how in-memory solutions fit into their planning. Along these lines, we would love to hear what you think about the rise of these technologies, as well as applications, companies and projects we should look at. Feel free to reach out to us on Twitter (Ben is @bigdata and Roger is @rogerm) or leave a comment on this post.

Visualization of the Week: Urban metabolism

Visualizing cities' energy usage, population density, and material intensity.

by Audrey Watters | @audreywatters | +Audrey Watters | May 18, 2012

This week's visualization is an interactive web-mapping tool that lets you explore energy usage, material intensity and the overall "urban metabolism" of major U.S. cities.

Data as seeds of content

A look at lesser-known ways to extract insight from data.

by Robbie Allen | @RobbieAllen | +Robbie Allen | April 5, 2012

Visualizations are one way to make sense of data, but they aren't the only way. Robbie Allen reveals six additional outputs that help users derive meaningful insights from data.

What is Apache Hadoop?

A look at the components and functions of the Hadoop ecosystem.

by Edd Dumbill | @edd | +Edd Dumbill | February 2, 2012

Apache Hadoop has been the driving force behind the growth of the big data industry. But what does it do, and why do you need all its strangely named friends, such as Oozie, Zookeeper and Flume?

Strata Week: A home for negative and null results

Figshare wants research data, Accel makes a huge data investment, LinkedIn shares its DataFu.

by Audrey Watters | @audreywatters | +Audrey Watters | January 19, 2012

Figshare relaunches with an eye toward making more research data accessible. Elsewhere, Accel invests $52.5 million in Code 42 and LinkedIn open sources DataFu.

There’s a map for that

Can redistricting be opened to the public through open source and the web?

by Alex Howard | @digiphile | +Alex Howard | December 20, 2011

DistrictBuilder is a web-based redistricting tool that lets citizens draw their own maps, publish them online and submit them to redistricting authorities.

BuzzData: Come for the data, stay for the community

A Canadian startup aspires to be the GitHub of datasets.

by Alex Howard | @digiphile | +Alex Howard | September 20, 2011

BuzzData looks to tap the gravitational pull of data, then keep people around through conversation and collaboration.

Visualizing hunger in the Horn of Africa

A map made with open data shows the extent of the humanitarian emergency in the Horn of Africa.

by Alex Howard | @digiphile | +Alex Howard | August 19, 2011

A new map made with open data shows the extent of the humanitarian emergency in the Horn of Africa. The data visualization at the World Food Program website can also be embedded and shared, extending the reach of the request for aid.

"data tool" entries

Need speed for big data? Think in-memory data management

We're launching an investigation into in-memory data technologies.

Visualization of the Week: Urban metabolism

Visualizing cities' energy usage, population density, and material intensity.

Data as seeds of content

A look at lesser-known ways to extract insight from data.

Top stories: January 30-February 3, 2012

Hadoop deconstructed, the value of unstructured data, and a Moneyball approach to software teams.
