Monday, October 31, 2016

Political Polls and margin of error


At the time of my writing this, it is about a week until the election. It's Trump vs. Clinton - Duel of the Century. I thought I would look into political polls as a mathematics application this week. Specifically, let's look at what is called the margin of error.

I looked at a couple of what I think are reputable websites. However, they seemed to not quite get this concept. For example, something like this was stated by a couple of sites:

A poll states that candidate A is at 52% with a margin of error of +/- 3%. This means the candidate could actually be polling anywhere from 49% to 55%.

Unless I've been lied to in my past math classes, I believe this is wrong information. This is a common misconception, but I didn't think I would find news agencies writing this.

Here is what I believe is the correct scoop. Polls usually have a confidence level. Part of the confusion is that when CNN, NBC, etc. mention their polls, they don't talk about this. Anyway, for most polls it is 95%. So, in actuality, a much truer statement is that there is a 95% chance that candidate A is between 49% and 55%. She (or he - I'll stick with "she" the rest of the way so I don't have to mention both genders each time. Why "she" rather than "he"? I flipped a coin. Seriously.) is probably in that range, but she can't be certain of that.

You can never be certain of polls. Common sense tells you that you can't have absolute certainty. If there are millions of voters in the U.S., and your survey covers a few thousand, how do you know you didn't just happen to survey only the ones who are against candidate A? Yes, unlikely, but it could happen. So if a poll states A is ahead of B, 57% to 42% with a margin of error of 5%, it's all over, right? No, it isn't. It's not looking good for B, but it's not all over.
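To see "unlikely, but it could happen" in action, here is a minimal simulation sketch in Python. The true support of 52%, the sample size of 1,000, and the names `simulate_poll` and `share_within` are all made up for illustration. It just runs a lot of pretend polls and counts how often the result lands within 3 points of the truth.

```python
import random

def simulate_poll(true_support, sample_size, rng):
    """Return how many of sample_size randomly chosen voters back the candidate."""
    return sum(rng.random() < true_support for _ in range(sample_size))

def share_within(points, true_support=0.52, sample_size=1000, polls=2000, seed=1):
    """Fraction of pretend polls that land within `points` percentage points of the truth."""
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    true_count = true_support * sample_size
    allowed = points / 100 * sample_size  # allowed miss, counted in voters
    hits = 0
    for _ in range(polls):
        count = simulate_poll(true_support, sample_size, rng)
        # the tiny tolerance guards against floating-point fuzz at the boundary
        if abs(count - true_count) <= allowed + 1e-9:
            hits += 1
    return hits / polls

print(share_within(3))
```

Most of the pretend polls land within 3 points, but a few miss, which is why a poll can only ever claim to be probably right.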

We see surveys a lot during election years, but we also see them at other times without knowing it. The government's unemployment reports, bestselling books, and the top TV shows for the week are all based on random sampling of a relatively small group.

Students could figure out the margin of error. It goes like this:

Margin of error = z x squareroot(p(1-p)/n). The z-value is based on how confident you want to be in your poll result. You would have to look that up. The value of p is your polling result written as a decimal (so 52% is .52) and n is the number in your sample. (Oddly, the number in your total group, whether it is the entire U.S., the state of Oregon, or your bowling league, has almost nothing to do with the answer, as long as the group is much bigger than your sample.)
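If students have Python handy, the formula can be sketched in a few lines. The function names `z_value` and `margin_of_error` are my own, and instead of a printed table this sketch assumes the z-value can be taken from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def z_value(confidence):
    """z-value for a two-sided confidence level, e.g. 0.90 gives about 1.645."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def margin_of_error(p, n, confidence):
    """z x squareroot(p(1-p)/n), with p the poll result and n the sample size."""
    return z_value(confidence) * sqrt(p * (1 - p) / n)

# Example: a candidate polling at 52% among 1,000 people, at 95% confidence
print(round(z_value(0.95), 2))
print(margin_of_error(0.52, 1000, 0.95))
```

Notice that the size of the whole population never shows up among the arguments.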

Common sense tells us that there is a trade-off. The more certain you want to be, the wider your interval is going to end up being. I might be able to state, from a recent survey of adult males, that I am 90% certain the average height of all adult males is between 5'7" and 5'11". On the other hand, if I want to be 99.99% certain, I might only be able to state that the average height is between 3' and 8'. You gain in certainty and you lose in precision.
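The same trade-off shows up with poll numbers. As a rough sketch, take a hypothetical poll (52% support among 1,000 people, numbers invented for illustration) and watch the range widen as the confidence level goes up. The z-values here come from Python's `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.52, 1000  # hypothetical poll: 52% support among 1,000 people

widths = []
for confidence in (0.90, 0.95, 0.99, 0.9999):
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-value
    margin = z * sqrt(p * (1 - p) / n)
    widths.append(2 * margin)  # full width of the range
    print(f"{confidence:.2%} confident: {p - margin:.1%} to {p + margin:.1%}")
```

Each step up in certainty buys a wider, less useful range.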

Let's try one out.

We polled 1,000 people. Of those, 560 said they would vote for Candidate A. So, she is polling at 56%. We want to be 90% certain of the range her number would actually land in. Looking up the 90%, we find a z-value of 1.645.

1.645 x squareroot(.56(1-.56)/1,000) = .0258. If we round it to 2.6%, we can be 90% sure of her actual number being between 53.4% and 58.6%.

Just for fun, here are some other possibilities.

Suppose we chose a confidence level of 95%:

95% corresponds to z = 1.96, so
1.96 x squareroot(.56(1-.56)/1,000) = 3.1%, giving a range of 52.9% to 59.1%

Suppose we take our original example and assume we surveyed twice as many people:
1.645 x squareroot(.56(1-.56)/2,000) = 1.8%, giving a range of 54.2% to 57.8%
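For anyone who wants to check the arithmetic, here is a short Python sketch that reruns these scenarios with the z-values looked up above. The last row (4,000 people) is an extra case I added to the list, and the helper name `margin_of_error` is just for illustration.

```python
from math import sqrt

def margin_of_error(z, p, n):
    """Margin of error from a z-value that was already looked up."""
    return z * sqrt(p * (1 - p) / n)

p = 0.56  # Candidate A's polling result
scenarios = [
    ("90% confidence, 1,000 polled", 1.645, 1000),
    ("95% confidence, 1,000 polled", 1.96, 1000),
    ("90% confidence, 2,000 polled", 1.645, 2000),
    ("90% confidence, 4,000 polled", 1.645, 4000),  # extra case, not in the text
]

margins = []
for label, z, n in scenarios:
    m = margin_of_error(z, p, n)
    margins.append(m)
    print(f"{label}: +/- {m:.1%}, range {p - m:.1%} to {p + m:.1%}")
```

Because n sits under a square root, doubling the sample does not cut the margin in half. It takes four times as many people to do that.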

I was right. That was fun.