Monday, March 20, 2017

Rating the Polls

I was going to call this week's post "March Mathness" and talk a little about the NCAA tournament. Let's do that next week. Let me go ahead, though, and apologize for the title now. I'm sure I'm not the only one to use this type of play on "March Madness". That still doesn't make it right.

There is a nice website by the name of fivethirtyeight.com. It presents information regarding polls and polling data. (The 538 part comes from the fact that there are 538 electors in the electoral college.) One interesting part of the website looks at various polls (there are a lot more than I would have imagined; they rate over three hundred polling firms).

The reason I got there is that I was trying to figure out how their site can come up with in-game information like: Arizona is ahead of St. John's 55 to 46 with 3:38 left to play, thus Arizona has an 89% chance of winning. Wow. It's clear Arizona would probably win, but how do they come up with a percentage like that? Anyway, we'll look at that next week.

I got side-tracked by a section that explains how they rate various polls. Some polls are better than others (the Trump/Clinton election, for example, did not come out as most had predicted), and they rate them all. One of the best seems to be the ABC News/Washington Post poll. On the other hand, an organization called Research 2000 does not fare so well. An overview of their methodology is at:

https://fivethirtyeight.com/features/how-fivethirtyeight-calculates-pollster-ratings/

They don't really give enough information to show exactly how they do it. That would probably be beyond me anyway. Let me tell you about something they have used in the past. It is an especially cool math application since it has a square root stuck in there.

Total Error = √(Sampling Error + Temporal Error + Pollster-Induced Error)

Why don't polls come out exactly right?

  1. Sampling Error:  Sampling too few people or not getting a representative sample.
  2. Temporal Error:  The farther away in time a poll is from the event, the more error.
  3. Pollster-Induced Error:  Seems to be a kind of catch-all category for other things that can go wrong, such as assuming too high or too low a voter turnout.
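Taking the formula above at face value (the three components simply summed under the square root), here is a quick sketch in Python. The component values are made up just to show the arithmetic; they are not anything FiveThirtyEight published.

```python
import math

def total_error(sampling, temporal, pollster_induced):
    """Combine the three error components as the formula above states:
    the square root of their sum."""
    return math.sqrt(sampling + temporal + pollster_induced)

# Made-up component values, purely for illustration:
print(total_error(4.0, 3.0, 2.0))  # sqrt(9.0) = 3.0
```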

Something else interesting they talk about is the concept of "herding". The companies that do the polling want to look good, and it does not look good if they've wandered too far away from the rest of the herd. If most every other poll has candidate A getting around 55% of the vote and you predict he'll get 73%, you might make an "adjustment" to your results. Or you might simply choose not to publish those results in which your company seems to be way off.
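One way to picture herding: a pollster with an outlying number "adjusts" it partway toward the average of everyone else's published results. This little sketch is my own illustration of the idea, not anything from FiveThirtyEight's methodology.

```python
def herd(my_result, other_results, pull=0.5):
    """Nudge an outlying result partway toward the herd's average.
    pull=0 means publish honestly; pull=1 means match the herd exactly."""
    herd_avg = sum(other_results) / len(other_results)
    return my_result + pull * (herd_avg - my_result)

# Everyone else has candidate A around 55%; my raw number says 73%.
others = [54, 55, 56, 55]
print(herd(73, others, pull=0.5))  # halfway back toward the herd: 64.0
```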

That and other factors make it pretty complicated. Polling itself is complicated, and ranking the pollsters is even more so.

I hope I've done justice to what they do. If you read what they have to say on their website, you can see the complexity involved.

Next week, March Madness. Don't worry, it will still be going on. In fact, it is actually March Madness slopping over into a little bit of April Madness.