Is Your Survey Data Lying to You?

As the book industry continues to change, we are inundated
with statistics about user behavior:

49% of e-book readers are bought as gifts [Bowker]
28% of U.S. adults are avid (5+ hours/week) readers, or about 64 million people [Verso]
The heart of the U.S. romance novel readership is women aged 31-49 who are currently in a romantic relationship [Romance Writers of America]

These statistical nuggets are great because in
isolation they give us a glimpse into why people do what they do, and how we
can adjust our business to match market needs. But how often do we blindly
accept data because it comes with pretty graphs and sound bites that seem to make
sense? Probably more often than we’d like to admit.

The best way to ensure that we are not led astray is to look at what biases
may have been introduced into a study before using its data to make a
decision. Bias is systematic favoritism in the data collection process that
causes misleading results. Two types of bias are common hazards in studies:
selection bias and measurement bias.

Selection Bias can occur when the group that is
surveyed does not accurately reflect the target of the study, or is simply too
small to matter. For example, if a study claims to describe the behavior of all
readers in the U.S. but only surveys 30 stay-at-home moms in Indiana, it is
hardly representative of every reader in the country.
Measurement Bias occurs when the questions asked
favor a specific outcome. A survey question like “Do you agree that e-books are
replacing print books as the preferred medium?” will deliver very different
results than one that asks readers to choose their preferred medium from among
e-books, purchased p-books or books checked out from the library.
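
To see how selection bias distorts a result, here is a minimal Python sketch. Every number in it is invented for illustration (a hypothetical population of 100,000 readers, 30% of whom shop online), and the names build_population and ebook_share are assumptions made for this example, not part of any study cited here.

```python
import random


def build_population():
    """Hypothetical population of 100,000 readers (all figures invented).

    Each reader is a tuple (shops_online, prefers_ebooks).
    30,000 shop online, and 60% of them prefer e-books.
    70,000 shop in stores, and 20% of them prefer e-books.
    """
    online = [(True, i < 18_000) for i in range(30_000)]
    in_store = [(False, i < 14_000) for i in range(70_000)]
    return online + in_store


def ebook_share(readers):
    """Fraction of the given readers who prefer e-books."""
    return sum(prefers for _, prefers in readers) / len(readers)


rng = random.Random(42)  # fixed seed so the run is repeatable
population = build_population()

# Random sample: every reader has the same chance of being picked.
random_sample = rng.sample(population, 1_500)

# Biased sample: only readers who shop online can be picked.
online_only = [reader for reader in population if reader[0]]
biased_sample = rng.sample(online_only, 1_500)

true_share = ebook_share(population)
print(f"True share:    {true_share:.1%}")
print(f"Random sample: {ebook_share(random_sample):.1%}")
print(f"Online-only:   {ebook_share(biased_sample):.1%}")
```

In this made-up population the random sample should land within a few points of the true 32%, while the online-only sample sits near 60% no matter how many people it includes. A bigger sample does not repair a sample drawn from the wrong group.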

As you read a study, ask yourself the following
questions to determine if the authors tried to mitigate bias. Remember: the
target population is the group that you want to generalize about, and the
sample is the group that you actually survey in order to make those
generalizations.

Is the target population (in practice approximated by the sampling frame, the
list from which the sample is actually drawn) well-defined? If it
isn’t, the study may contain people outside the target, or it may exclude
people who are relevant. In researching e-book reader purchase behavior, a
well-defined population could be American consumers who purchased an e-book
reader either online or in a physical store over the last 2 years. But if a
study only looked at online shoppers at Christmas, the results could be skewed
towards gift givers, and they could not be generalized to consumers who bought
e-book readers in stores.
Is the sample randomly selected from the target population? In a truly random
sample, every member of the target population has the same chance of being
included in the study. When asking this question, be wary of surveys that are
conducted exclusively on the web but draw generalizations about all people.
Participants in these studies are not randomly selected: they are only a slice
of the traffic to a given domain, and at best the results can speak only to
the habits of the users of the particular site conducting the research.
Does the sample represent the target population? Here it is important to look at all
of the characteristics of the target population to see if they are mirrored in
the sample. If you are looking to figure out the book purchase habits of
Americans, make sure the sample has the same diversity of ethnicity, geographic
distribution and age as is reported in the latest U.S. census.
Is the sample large enough? The larger the sample, the more precise the
results. A quick way to estimate whether a sample is large enough to produce a
reasonably small margin of error is to divide 1 by the square root of the
sample size (Margin of Error = 1/√Sample Size). So a 1,500-person survey would
produce a margin of error of about 2.58%. The sample size in this calculation
must be the number of people who responded to the survey, not the number of
survey requests that were sent out.
What is the response rate for the survey? The response rate is the proportion of the people asked to take a survey who actually responded. If the response rate is too low, a study may only reflect people who have a strong opinion about the topic, biasing the results toward their opinions rather than those of the larger and less vociferous target population. A “good” response rate depends on the margin of error that a study is looking to achieve (or that it claims) and on the size of the target population being studied. There are two factors to consider here. The first one is a no-brainer: the higher the response rate, the more accurate the study. The second is a little more subtle: the larger the target population being examined, the lower the response rate required for the same level of accuracy. The linked figure helps explain the correlation graphically. (According to the chart, for a study that is looking to achieve a margin of error of +/- 5% and is studying a population of 2,000 people, the response rate needs to approach 20% to achieve the desired result.) At the end of the day, know the response rate and make sure it is consistent with the margin of error that the study purports to achieve.
Do the questions appear to be leading the respondents toward a particular
answer? If they do, run the other way! Leading questions mean that the
researchers’ agenda is introducing measurement bias, and the results aren’t
worth the paper they are printed on. Also be wary of any study that doesn’t
share its sampling method, sample characteristics and survey questions.
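
The sample-size and response-rate questions above come down to a little arithmetic, sketched here in Python. The first function is the rule of thumb from the text. The other two are standard textbook formulas added for comparison (a 95% confidence level, the worst-case proportion of 50%, and a finite population correction), and they are assumptions on my part: the chart referred to above may have been built differently. All function names are invented for this example.

```python
import math


def quick_margin_of_error(respondents):
    """Rule of thumb from the text: 1 / sqrt(number of respondents)."""
    return 1 / math.sqrt(respondents)


def margin_of_error_95(respondents, proportion=0.5, z=1.96):
    """Textbook margin of error for a proportion at about 95% confidence.

    proportion=0.5 is the worst case; z=1.96 is the 95% normal quantile.
    """
    return z * math.sqrt(proportion * (1 - proportion) / respondents)


def responses_needed(population_size, margin, proportion=0.5, z=1.96):
    """Completed responses needed to reach a target margin of error,
    applying the finite population correction for small populations."""
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population_size))


if __name__ == "__main__":
    print(f"{quick_margin_of_error(1500):.2%}")  # 2.58%
    print(f"{margin_of_error_95(1500):.2%}")     # 2.53%
    print(responses_needed(2000, 0.05))          # 323
```

Under these assumptions, a population of 2,000 needs 323 completed responses for a margin of error of +/- 5%, or about 16% of the population. That is in the same range as the figure of roughly 20% read off the chart. The rule of thumb and the textbook formula also agree closely for 1,500 respondents (2.58% versus 2.53%).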

In the end, the goal of a
survey is to accurately describe a larger population. This can only be done if
great care is taken to 1) ensure that the results wouldn’t change much if
another sample were taken under the same conditions and 2) reduce the biases
that can be introduced into the study.
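
The first of these two conditions, that another sample taken the same way would give much the same answer, can be checked with a small simulation. The Python sketch below uses a hypothetical population in which 32% of people prefer e-books (an invented figure) and compares how much the survey result moves around from one random sample to the next at two sample sizes. The function name spread_of_estimates is an assumption made for this example.

```python
import random
import statistics


def spread_of_estimates(population, sample_size, repeats, rng):
    """Standard deviation of the survey result across repeated random samples."""
    estimates = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)
        estimates.append(sum(sample) / sample_size)
    return statistics.pstdev(estimates)


# Hypothetical population: 1 = prefers e-books, 0 = does not (32% do).
population = [1] * 32_000 + [0] * 68_000
rng = random.Random(7)  # fixed seed so the run is repeatable

small = spread_of_estimates(population, 100, repeats=200, rng=rng)
large = spread_of_estimates(population, 1_500, repeats=200, rng=rng)
print(f"Spread with   100 respondents: {small:.3f}")
print(f"Spread with 1,500 respondents: {large:.3f}")
```

The spread should come out several times smaller with 1,500 respondents than with 100, which is what a small margin of error means in practice. The simulation says nothing about the second condition: a biased sample can give the same wrong answer every time.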

About the Author
Jeevan Padiyar is a technology entrepreneur and product strategist with ten years of experience in e-commerce and product development. He is passionate about using data to validate growth strategies for new market penetration.

A pioneer in the book rental industry, Jeevan is CEO/CFO of BookSwim. Jeevan helped shape podcast monetization as chairman and CFO of RawVoice, Inc., making ad deals with GoDaddy, Citrix and HBO. Prior to that, he studied medicine at Albert Einstein College of Medicine as a Howard Hughes Medical Institute Fellow. Before coming to New York, Jeevan founded arena blimp manufacturer Simply Blimps. He led it to $30M in sales in five years, with clients like NHL, NBA, Yum Brands and Subway.

Jeevan holds degrees in chemistry and biochemistry from Kansas State, graduating Phi Beta Kappa.