Lies, damn lies, and visualizations

The intersection -- and accompanying questions -- of data science and journalism.

Visualization comes up a lot in the context of data science, conjuring images of people in white lab coats doing dispassionate experiments in pursuit of Higher Truth. While this might occur in some contexts, such as medical or scientific fields, visualization is often used just to tell a good story. In this context, it’s much more useful to think of visualization as a tool of journalism, and of the storyteller as a journalist, rather than a scientist.

Adrian Holovaty’s chicagocrime (now part of EveryBlock) is one of the original, and perhaps still the best, examples of visualization as journalism. In case you’re unfamiliar with it, chicagocrime took data from publicly available sources in the Chicago area and plotted it on a map. While there was nothing there that didn’t already appear in the local paper, seeing the data superimposed on a map made it much more accessible and engaging. The project won a variety of awards and inspired similar data-driven projects elsewhere.

Consequently, I’ve often found it useful to think through the hallmarks of good journalism when looking at a new visualization. First, is it accurate: do the underlying facts (or data) map to reality? Second, is it objective: has the storyteller kept an objective view of the data and presented it dispassionately? Third, since individual pieces often fall into a broader story, how does it fit within the larger informational context?

As an example, consider this recent visualization called Your New Health Care System from the minority members of the U.S. Congress’s Joint Economic Committee, led by Republican U.S. Senator Sam Brownback of Kansas and Rep. Kevin Brady of Texas.

As a pure piece of design, it’s incredibly effective. Every element (shape, color, size, orientation, typography, and layout) conveys a sense of bewildering complexity and carries an unstated message: “See how complex this is? It can’t possibly work.”

As much as I admire this from an information design standpoint, it raises questions from a journalistic standpoint:

  • How accurately are the connections depicted? The accompanying article doesn’t provide any insight into where the connections come from or why they were drawn that way.
  • The design choices are simply too suspicious for this to be a completely neutral schematic. For example, note how the IRS is set in an eye-catching, scary bold font, and how the “Physicians” and “Patients” elements are placed as far apart as possible across the bottom. (I’ve even had several debates with people about whether the 7-pointed yellow star for “Patients” was deliberately chosen to evoke the Holocaust; most people think that’s a reach, but I still wonder.)
  • Finally, considering the larger context of lockstep Republican opposition to health reform and the source of the visualization, it’s hard not to conclude that this is a piece of advocacy journalism intended to guide the viewer toward a preordained conclusion, rather than a neutral schematic that accurately depicts a complex system.

Of course, there’s absolutely nothing wrong with taking a strong position, assuming the underlying data and facts are accurate. (This is a really good reason we need open data.) However, it’s important for the audience to recognize it as advocacy, not as science, even when it comes wrapped in a really cool visualization.

