
Wednesday, March 9, 2011

Polling 101, Session 3: Handling Bias

This is the third in a series of GUEST POSTS by Matthew Knee, a Ph.D. candidate at Yale University specializing in campaigns and elections, ethnic voting patterns, public opinion, and quantitative and experimental approaches to political science.

Handling Bias

Today I will discuss ways to handle bias when you find it, and then apply some of what has been discussed to the PPP Wisconsin poll that Professor Jacobson discussed last week.

1. Pollsters Are, Generally Speaking, Competent Professionals, & Polls Are Relatively Accurate

Conspiracy theories about polling are usually wrong. Despite its shortcomings, polling is a relatively mature science, and its practitioners do their jobs reasonably well.

This still means that outlier polls put out by campaigns are often wrong - sometimes the pollster’s job is to persuade voters that their client has a shot at winning. Usually this is due to overly optimistic assumptions about who will turn out to vote and similar subjective matters, but sometimes pollsters do something shadier, such as not rotating candidate order (there is an advantage to being listed first), priming respondents with earlier questions, or taking multiple samples and releasing the one in which their candidate does best.

Public polls, on the other hand, are conducted and released by people who are trying to get things right. Issue polling is inherently tricky, as I explained in earlier sessions, but pollsters generally get elections about right on average. I won’t speculate as to why media polls are sometimes liberally biased in their wording; such mistakes can even occur subconsciously or accidentally. But as much as conservatives are naturally skeptical of the media, we should examine polls on their own terms using solid methods, rather than discarding useful information just because we don’t like the source.

2. A Bias Is Not Always A Problem At All – Let Alone A Fatal One

Say you find a flaw that could bias a poll result, but do not know how large an impact it would have. Sometimes you do not need to know. Such a bias is about as likely to improve your argument as weaken it, depending on which way it biases the results. For instance, if you are arguing that a poll on unionization indicates more support for the anti-union side, and you find a question is biased for the unions, then you can say that your theory is probably even more true than the numbers indicate.

3. Compare That Which Is Mostly Similar

Most real, full-data-based polling analysis is based in large part on some kind of multiple regression analysis. Multiple regression finds relationships between the variable you are interested in and the outcome you care about in a hypothetical world in which other factors you think might get in the way (“controls”) are equal. Without a data set, you can’t do that, but you might be able to use similar principles. If a pollster has conducted multiple surveys on the same question over time, you might be able to compare the results without worrying that different question wordings or methodologies will get in the way. For repeated polls, you pretty much just have sampling error and change over time. Similarly, even comparing the same wording by different pollsters over time can be instructive, as long as you keep an eye out for differences between pollsters.
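For repeated polls of the same question by the same pollster, the practical question is whether the observed change is larger than combined sampling error. A minimal sketch of that check in Python (the sample sizes and percentages below are hypothetical, not from any poll discussed in this post):

```python
import math

def change_exceeds_noise(p1, n1, p2, n2, z=1.96):
    """Is the change between two repeated polls larger than combined
    sampling error at roughly 95% confidence? p1, p2 are proportions (0-1),
    n1, n2 are sample sizes."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return abs(p2 - p1) > z * se

# Hypothetical numbers: 47% support in one poll of 800, 45% in the next.
print(change_exceeds_noise(0.47, 800, 0.45, 800))  # False: within noise
```

A two-point shift between two 800-person samples is well inside the combined margin of error, which is why single small movements between repeated polls rarely mean much on their own.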

Less ideally, polls that ask about previous views or votes can be interesting, as long as one considers in what direction and to what degree people might lie about previous views.

4. You Can Estimate The Impact of Some Biases

What if you have some information about the impact of a bias, but not enough for exact results? In that case, you can estimate the maximum impact by testing possible but extreme scenarios.

Yesterday I laid out the general template for something like this. If we know what the over-sampled group thinks, and we know approximately what a representative sample would think, then the result can be adjusted by the difference between those two percentages multiplied by the number of percentage points by which the group is over-represented.
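That template can be written out explicitly. A sketch (the 6/80/40 figures here are made up for illustration, not taken from any poll in this post):

```python
def oversample_adjustment(oversample_pts, pct_group, pct_rest):
    """Shift in a topline number caused by over-representing a group.
    oversample_pts: percentage points by which the group is over-represented.
    pct_group / pct_rest: % of the group vs. everyone else holding the view."""
    return oversample_pts * (pct_group - pct_rest) / 100.0

# Hypothetical: a group oversampled by 6 points, where 80% of the group
# holds a view versus 40% of a representative remainder:
print(oversample_adjustment(6, 80, 40))  # shifts the topline by 2.4 points
```

Note that when the group and the rest of the sample think alike, the adjustment is zero: oversampling only distorts a topline number to the extent the oversampled group actually differs.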

Now let’s try something more complicated, which will also incorporate the previous two items. Some have argued that the PPP “do-over” poll sampled too many union members and Barrett voters.

Say someone wishes to know how many people, in hindsight, abandoned Walker for Barrett, and wants to adjust for this oversampling. This makes sense because the poll asks who previously voted for Walker, so we can compare changes within the same group, and the slight difference in voter composition is unlikely to matter: it is unlikely that a slightly different voter composition would show a significantly different amount of change. In the PPP poll, the group sampled went from claiming they had voted 47-47 to giving Barrett a 52-45 advantage, creating a 7-point deficit for Walker.

We have a 6% surplus of union members. We know that 37% of union members voted for Walker. We can multiply those numbers to find how many potential Walker abandoners are in that union surplus (2.2% of the sample). Thus, if every single union member in that 6% abandoned Walker, it would only account for 2.2 percentage points’ worth of abandoners. Realistically, it is very unlikely that even half of all union members changed their minds. Assuming this half all went over to Barrett (a more negative scenario than the data support), and remembering that each switcher moves the margin by two points, we have still accounted for only 2.2 points of the 7-point deficit.
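The arithmetic above can be checked in a few lines; the two-points-per-switcher step is my reading of how a single vote switch moves the margin (one point off Walker, one point onto Barrett):

```python
union_surplus = 6.0   # points of union over-representation (from the post)
walker_share = 0.37   # share of union members who voted Walker (exit polls)

# Maximum pool of potential Walker abandoners inside the surplus:
max_abandoners = union_surplus * walker_share
print(round(max_abandoners, 1))  # 2.2 points of the sample

# Extreme scenario: half of that pool switches Walker -> Barrett; each
# switcher moves the margin by 2 points (one off Walker, one onto Barrett):
margin_change = (max_abandoners / 2) * 2
print(round(margin_change, 1))   # 2.2 of the 7-point swing
```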

We are assuming that the demographic voting breakdowns are equivalent to the exit polls. This is not ideal, but a pretty good approximation. All of these numbers are subject to sampling error anyway.

If we want to be more specific, as in yesterday’s example, we then need to adjust for plausible numbers of abandoners in an equally-sized non-union group. In other words, what would the people who would otherwise have been sampled have done? We do not know exactly what this number is, but since the overall sample changed only slightly, it is likely not large. It might slightly enhance the effect of oversampling union members if non-union members became more supportive of Walker (it would blunt the effect if non-union members also became less supportive of Walker). All in all, this will probably not make much of a difference, since even if non-union folks became more supportive of Walker, it is unlikely they would be anywhere nearly as skewed as our already-unreasonably-skewed union group. Even if a quarter of them left Barrett for Walker and none moved the other way (which is, again, completely unreasonable), this would net only 1.3 more points.

To make matters worse, the poll relies on asking for whom people voted in November. Since it appears that more people oppose Walker now than then, it is likely that to the degree this number is off, it is because there are people who did vote for Walker but do not want to admit it, rather than the other way around. Thus, this bias, while unmeasurable, militates in favor of Walker losing more support, rather than less.

From what we know, despite the somewhat unrepresentative sample, it looks like Walker has probably lost a bit of support between November and the PPP poll, since even ridiculously optimistic assumptions only account for half of his losses. Is it possible that even after adjusting for union support and people lying about for whom they voted, there is some other factor that contributed to the retrospective tie that can explain away Walker’s losses? In theory, yes, but it is extremely unlikely. Is it possible that this is due to sampling error? Yes, but again, the difference is large enough that that is also unlikely. Hopefully I have shown how little an impact small differences tend to have on final numbers.
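The bounding exercise in this section can be summarized numerically, using the extreme-scenario figures from the preceding paragraphs:

```python
swing = 7.0            # observed margin change: 47-47 before vs. 45-52 after
union_effect = 2.2     # extreme union-oversampling scenario (from the post)
nonunion_effect = 1.3  # extreme non-union counter-movement scenario (from the post)

explained = union_effect + nonunion_effect
print(round(explained, 1))          # 3.5 points explained at most
print(round(swing - explained, 1))  # 3.5 points - half the swing - unexplained
```

Even stacking both deliberately unreasonable scenarios accounts for only half of the 7-point swing, which is the basis for concluding that some genuine loss of support remains.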

*   *   *

Tomorrow, I will examine more polls on the conflict with public employee unions.



  1. Another great addition to the series. One that raises a question that I have: is it possible--given the PC culture, the fact that the left repeatedly attacks anyone who doesn't support Obama as RAAAAACIST, and the American people's not liking to be called RAAAACIST or even perceived as such--that people lie about their approval of Obama to avoid any attacks or even simply the appearance of being racist?

    I simply can't figure out how/why his approval numbers are so freaking high when he and his presidency have been one long disaster for America and her people. Can people really not see that he's the problem, that everything the EPA, DOJ, TSA, DHS, etc. and etc. are doing (that they hate) actually comes from the WH? I just don't buy that. I guess 2012 will tell, because if this "theory" is correct, these people gushing support for BO to pollsters will not vote for him in the privacy of the voting booth.

  2. Great series btw.

    But, dang, my personal belief that polls are inaccurate (especially those whose outcomes I don't like) seems to have been shot down.

    However like most people, I'll just keep on believing that cause that's what I want to be true. HAH!

    (are you really, really, really sure the leftists aren't skewing the polls somehow? Hmmmmm?)

  3. To make matters worse, the poll relies on asking for whom people voted in November. Since it appears that more people oppose Walker now than then, it is likely that to the degree this number is off, it is because there are people who did vote for Walker but do not want to admit it, rather than the other way around. Thus, this bias, while unmeasurable, militates in favor of Walker losing more support, rather than less.

    I've got to say this is exactly the opposite of what I've observed on a right-to-left shift-- when there's something that's hot-button, a LOT of folks suddenly become "former supporters." (Came to mind because of a tweet, actually; it's also known in some circles as the "I was an observant Catholic" group.)

    I do notice a tendency to deny ever supporting someone on a left-to-right shift.
    (Several folks who I KNOW voted for Obama have suddenly "never much cared for him.")

    No idea how one would actually measure that, though, same way I don't know how one would account for the large number of college-age guys I know that enjoy bragging about choosing the most outlandish question on any survey that they won't be graded on.
    (Think along the lines of "Abe Lincoln was one of the founding fathers" type stuff; I made the mistake of asking some friends who didn't go to school with me if they'd heard of it...and the guys uniformly thought it was a wonderfully hilarious idea that they should apply in their lives.)