Sep 26
2007

Nat Torkington

### Chart Junk in the New York Times

Checking out the New York Times's infographic on the housing bubble, I thought, "Wow! Look at how much prices climbed!" Then I read the fine print and realized they've completely distorted the vertical scale to make the increase look enormous.

The Y-axis is the house price. 100=the 1987 price of a house, so if in 1992 the line of the graph is at 92 then a house that sold for \$100,000 in 1987 sells for \$92,000 in 1992. The problem is that it's about an inch from \$0 (the bottom of the graph) to \$100,000. But then each inch on the Y-axis corresponds to about \$10,000 in price gain. In effect, they've zoomed in on the area from 100-150 and magnified the growth in the last 15 years.
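To make the index arithmetic concrete, here is a minimal Python sketch of the reading described above. The function names and the piecewise scale (about one inch for 0 to 100, then one inch per 10 index points) are assumptions taken from this post's rough description of the chart, not measurements of the actual graphic.

```python
def index_to_price(index, base_price=100_000):
    """Convert an index value (1987 = 100) to dollars for a house
    that sold for base_price in 1987."""
    return base_price * index / 100


def apparent_height_inches(index):
    """Height on the page under the reading described in this post
    (an assumption): about 1 inch covers 0-100, then each further
    inch covers 10 index points."""
    if index <= 100:
        return index / 100
    return 1 + (index - 100) / 10


print(index_to_price(92))            # the 1992 example from the text
print(apparent_height_inches(100))   # height at the 1987 baseline
print(apparent_height_inches(150))   # height at index 150
```

Under that reading, a 50% rise in price is drawn several times taller than the baseline, which is the effect being complained about here.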

I'm the first to say the housing market was overinflated and is now crashing—I took a bath when I sold my house in Colorado—but shame on the NYT for using misleading graphics to build its case. Perhaps they should invest in a copy of Darrell Huff's magnificent How to Lie with Statistics—on page 62 this exact issue is illustrated and decried.

[09.26.07 02:38 PM]

Similarly, when their intent last year was to show how much prices had risen, their graph set the "100" benchmark at 1890 and set the minimum Y value at 60.

[09.26.07 02:51 PM]

The graph is not only misleading but rather useless. No one buys a house in some mythical average town... we buy and sell houses in specific markets. Here in Seattle home prices are rising while in some markets they're falling, in others cratering and in some they're essentially flat. By averaging all of this diverse activity together the graphic *loses* information at the same time it distorts the point. Someone get that chart a beer, it's working awfully hard.

Not sure I'd characterize this as "distorting" the Y-axis. If they were "distorting" it the '91 to '99 region would appear nearly flat. I think the problem here is that the baseline is intended to be 100, not 0. But since they needed to show that it dropped below the baseline they raised it up off the bottom of the chart. Probably what they should have done is brought the X-axis labels up to the 100 baseline and then let the period from '91 to '99 overrun below it.

An excellent analysis and critique of the graph in question was just posted at the Calculated Risk blog.

The lowest point is at 92 at the end of '96. After 10 years of rally it climbs to 160 at the beginning of 2007. That comes to a gain of just about 5.7% per year. Not bad, but not spectacular. Besides, this is the best-performing 10 years. The previous 10 years were a bust, with a 0.8% annual loss.
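For anyone who wants to check those per-year figures, a minimal sketch of the compound annual growth calculation follows. The function name is made up for illustration, and the index values (100 in 1987, 92 at the low, 160 at the peak) are the ones read off the chart in this comment.

```python
def annualized_rate(start, end, years):
    """Compound annual growth rate between two index values."""
    return (end / start) ** (1 / years) - 1


# Rally: index 92 up to 160 over roughly 10 years
rally = annualized_rate(92, 160, 10)

# Bust: index 100 (the 1987 baseline) down to 92 over roughly 10 years
bust = annualized_rate(100, 92, 10)

print(f"rally: {rally:.1%} per year")
print(f"bust:  {bust:.1%} per year")
```

Compounding matters here: simply dividing the total gain by 10 would overstate the yearly rate.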

This is nothing new, either in general or for the NYT. I think it's in Edward Tufte's 1st edition of "The Visual Display of Quantitative Information" where he presents a figure from the NYT that similarly distorts the data and proceeds to define the "Lie Factor." As defined by Tufte, the Lie Factor is equal to the size of the effect shown in the graphic divided by the size of the effect in the actual data. He asserts that a Lie Factor of between .95 and 1.05 is acceptable.
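As a rough illustration of that definition, here is a small Python sketch. It measures each "size of effect" as the relative change from the first value to the second, which is how the definition is usually applied. The input heights are assumptions based on the original post's reading of the chart (index 100 drawn about 1 inch tall, index 160 about 7 inches tall), not measurements.

```python
def effect_size(first, second):
    """Relative change from first to second."""
    return (second - first) / first


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


# Assumed heights in inches (from the post's description) vs. index values
factor = lie_factor(1.0, 7.0, 100, 160)
print(f"Lie Factor: {factor:.1f}")
```

With those assumed heights the factor comes out far outside the 0.95 to 1.05 band. If the bottom of the chart is not zero, as other commenters argue, the heights and therefore the factor would need to be recomputed.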

Tufte's book was published in the early 80s (82 or 83) and the original graphic he used was from the late 70s. Glad to see that times have not changed!

Alex: thanks for the pointer, that's a good piece.

You've misinterpreted. The bottom of the graph is not \$0 but rather near \$88,000 - \$90,000. The scale is accurate, though they may have overestimated their readers' ability to recognize it.

The Times often cuts off the bottom portion of charts like this. As well they should: the interesting part is the mountain top, everything below the lowest value is redundant.

Here's a correctly scaled version of the graph:

People with an axe to grind have been messing with the sense of proportion since long before computers were helping them do it.

While I agree in general with the idea that statistics and charts can lie, and that this technique in particular can be used to selectively distort the data, I don't find this graphic all that distortive. When you take a look at the full graphic as supplied by Ken Macnamara, you see just how big the run up has been as a percentage of the total house price. In some ways, the message is stronger than the one in the original graph with the bottom cut off.

Even adjusted, the graph is still (potentially) misleading. Showing one bust then boom is not enough.

To be useful, the chart needs to show what happened prior to 1987, in particular whether 1987 was itself near the top of a long rise. This way you can determine whether to expect a large fall back close to pre-boom levels or a relatively small one (like the one shown in the 1990s).

Another chart, but frankly the conclusion remains the same, and I don't agree at all with the original post about the NYT chart.

[09.27.07 10:51 AM]

You didn't note that the term 'chartjunk' was coined by Ed Tufte, who talks extensively about this phenomenon in 'The Visual Display of Quantitative Information.'

It wouldn't've hurt the original piece to include a small copy of the graphic with the bottom intact, to give a little more context, but it's not all that misleading.

More to the point, it's not chartjunk. It may be a junky chart, but chartjunk it's not.

For whatever reason, Google Finance plots all stock charts this way - which I find very very annoying. I think the (naive and wrong) argument is that you want to capture and zoom in on the part of the chart where there is most variation. Problem is, we lose all context and a share rising by 1% can look to have a better chart than a share that went up by 30%.

So - even the Google geniuses got it wrong. Why blame the NY Times?

Anshu, I think you have this backwards. Google's way of showing stocks might result in a share gaining \$1 looking like it outperformed a share gaining \$30 (for example a \$10 stock going to \$11 vs a \$500 going to \$530), but not 1% vs 30%. And in fact, a \$10 share gaining \$1 is outperforming the \$500 share gaining \$30.
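The arithmetic behind that comparison can be sketched in a few lines of Python. The function names are made up for illustration, and the second function assumes a chart whose Y-axis is cropped to exactly the plotted range, which is only a simplification of how an auto-scaled stock chart behaves.

```python
def pct_gain(start, end):
    """Percentage gain from start to end."""
    return (end - start) / start * 100


def fraction_of_plot_height(start, end, axis_min, axis_max):
    """Share of the plot's height that the move occupies."""
    return (end - start) / (axis_max - axis_min)


# The two stocks from the comment
print(pct_gain(10, 11))     # $10 stock gaining $1
print(pct_gain(500, 530))   # $500 stock gaining $30

# With each axis cropped to its own range, both moves fill the whole plot
print(fraction_of_plot_height(10, 11, 10, 11))
print(fraction_of_plot_height(500, 530, 500, 530))
```

On a cropped axis the two lines look alike, and the percentage gain, not the dollar gain, says which stock did better.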

As to the original post, it seems like you're attributing malice to the NYT undeservedly. As Ken McNamara's chart shows, there's little to be gained from printing blank space (especially in a space limited environment like a newspaper page!).

I agree with those who say this isn't a bad chart. Ken McNamara's remake of the chart demonstrates what's happening in the diagram -- but anyone who knows how to read charts should have been able to do much of this visualization adjustment in their heads just by comparing the distance between the baseline and the 100 mark with the rest of the chart. How anyone could think the baseline was *zero* is beyond me.

Really, Nat, did you think for even a second that the price of a house in America went up 1000% between 1995 and today? I think it's reasonable for the New York Times to expect their audience to be both educated and to use their common sense when interpreting their stories and charts. Ken McNamara's chart is, indeed, slightly more accurate... but to my eyes it's also "dumbed down" and frankly wastes a lot of space.

This sort of chart *cropping* (not at all *distorting*) is a standard practice. It requires a certain degree of chart-reading literacy, I suppose, but this display method is so common I'm surprised that anyone doesn't "get it" right away.

To those that say it's a NYT deliberate distortion because the NYT has some sort of agenda, well, jeez, please take the tin foil hats off. Again, refer to Ken McNamara's chart and it's clear that this bubble is freaking insane. Housing prices doubled in a decade. The NYT should not be accused of deliberate distortion for noting that a 100% increase in one decade is pretty crazy.

Finally, the word "chartjunk" doesn't simply mean that the chart is bad. It means that the chart contains useless and decorative elements that distract from the viewer's ability to read the chart. This chart does not include any significant chartjunk.

Also, to the tinfoil-hat wearers, please look at Sol's link that shows this bubble is anomalous for the last *century*. If anything, I suppose we could just as well accuse the Times of hiding the degree of the bubble's insanity, not exaggerating it.

[09.30.07 06:01 PM]

NYTimes has been distorting facts since it was established. I thought you libertarians in the valley had known about this since college at least.

[09.30.07 06:05 PM]

And strictly speaking, this isn't chart junk. "Chart junk" was coined to describe something else.

Consider what this graph would look like as a sparkline before you're quick to condemn it.

Y'all seemed fine with it when Al Gore did it in his movie.

[10.04.07 07:38 AM]

I'm with Mr. Fahey above. On the one hand, the chart does exaggerate, but on the other, it also shows more detail by omitting the part of the data that doesn’t change — we can see more of the dips and peaks than we could if the entire range were shown. It just expects a little more of the reader.

It reminds me of visiting Italy before the Euro: the number of Lira for modest items was ridiculously high (I think the exchange rate at the time was like 1500 Lira for \$1). They could have divided by a hundred or a thousand and not lost any information because the least significant price digits almost never changed.

If we were to plot the absolute prices of items in Lira, it would tend to obscure differences, but what we really cared about would look essentially like the NYT graph, which omits the superfluous zeros and shows the variations in greater detail.

PS, Don't forget the old adage about "lies, damn lies, and statistics."
