Monday, October 24, 2016

Standard Deviations and Baseball

It's World Series time, and I feel compelled to stick with a baseball theme this week. I've been thinking about this application since I was not much more than a child. I wasn't sure how the math on it would work, and I'm still not certain, but I thought it would be worth exploring.

Batting averages are the ratio of hits to times at bat. So getting one hit in four times up to bat gives a batting average of .250.
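As a quick illustration of that ratio, here is a minimal Python sketch. The function names `batting_average` and `format_average` are my own, made up for this example, and returning 0.0 for a batter with no at-bats is just a convenient assumption.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by times at bat (0.0 when there are no at-bats)."""
    if at_bats == 0:
        return 0.0
    return hits / at_bats


def format_average(avg: float) -> str:
    """Show an average the way box scores do: three decimals, no leading zero."""
    text = f"{avg:.3f}"
    return text[1:] if text.startswith("0") else text


# One hit in four times at bat
print(format_average(batting_average(1, 4)))  # .250
```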

It would make sense that the overall batting average in baseball might vary over the years. Things have changed since the professional game started in 1869. There used to be no night games. That is mostly because the electric light hadn't been invented yet. Night games have made it harder for hitters. On the other hand, they've outlawed spitballs. That has made it easier.

Does it all even out? Apparently not. There used to be quite a few batters who hit .400 or better for a season. No one has done that in the past few decades, though. I've wondered if there is a way to even things out mathematically. I've seen some attempts at this.

I found a person's website that has the major league batting average for each season. Over more than a century's worth of seasons, it comes to .263. The highest year ever was 1894, when it was .309. So maybe a player from that year could have his batting average dropped by .046 (.309 - .263 = .046). Similar adjustments could be made for players of each year.
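The subtraction above can be sketched in a few lines of Python. The .263 and .309 figures come from this post; the function name `shift_adjust` and the .400 hitter are hypothetical, used only to show the arithmetic.

```python
LONG_RUN_AVERAGE = 0.263     # overall major league average quoted above
LEAGUE_AVERAGE_1894 = 0.309  # league average for 1894 quoted above


def shift_adjust(player_avg: float, league_avg: float,
                 baseline: float = LONG_RUN_AVERAGE) -> float:
    """Move a player's average by the gap between his season's league
    average and the long-run baseline."""
    return player_avg - (league_avg - baseline)


# A hypothetical .400 hitter in 1894 loses .046
adjusted = shift_adjust(0.400, LEAGUE_AVERAGE_1894)
print(f"{adjusted:.3f}")  # 0.354
```

In a low-scoring season the same function would raise a player's average instead, since the gap would be negative.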

Not a bad idea. I've seen other similar methods. However, I've thought that some measure of variance should come into play. I've had a theory that the standard deviation of batting averages has been decreasing over the years. So, there were more .400 hitters in the past, far above the league average, but I would guess that back then there were also more hitters far below the league average.

Why might that be? Now there are scouts going to colleges, high schools, Japan, the Dominican Republic, etc. looking for possible talent. In the early days, teams took what they could get. It wasn't necessarily the best baseball talent. Someone might come in from the coal mines, look pretty good, and get signed to a contract. Over the years the process has improved.

To take a shot at proving my theory, I used a website that showed the league average for each year. I then entered roughly twenty years' worth of yearly league averages at a time and found the standard deviation of each group. It's not perfect, but I think it kind of backs me up. Here we go:

1871-1900  Standard deviation = 15.91
1901-1920  Standard deviation = 10.66
1921-1940  Standard deviation = 7.38
1941-1960  Standard deviation = 3.76
1961-1980  Standard deviation = 7.91
1981-2000  Standard deviation = 5.84
2001-2012  Standard deviation = 5.15
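For anyone who wants to repeat the calculation, here is a rough sketch of how figures like these could be produced. It assumes the numbers above are in points (thousandths of batting average) and that the population standard deviation was used; the post doesn't say which kind it was. The function name `era_spread` is hypothetical, and the sample values are made-up placeholders, not real league averages.

```python
from statistics import pstdev


def era_spread(league_avg_by_year: dict[int, float], start: int, end: int) -> float:
    """Standard deviation, in points, of the yearly league averages
    for the seasons from start through end (inclusive)."""
    points = [avg * 1000 for year, avg in league_avg_by_year.items()
              if start <= year <= end]
    if len(points) < 2:
        raise ValueError("need at least two seasons in the range")
    # Assumption: population standard deviation; swap in statistics.stdev
    # for the sample version.
    return pstdev(points)


# Placeholder values only, chosen to make the arithmetic easy to follow
sample = {2001: 0.250, 2002: 0.260, 2003: 0.270, 2004: 0.280}
print(round(era_spread(sample, 2001, 2004), 2))
```

Note that this measures how much the league average moves from year to year within an era, not how spread out the players are within a single season.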

So to really do this right, I probably should find the standard deviation for each individual year using every player's average, rather than treating each year as a single number. However, that seemed like a lot of work, so I settled for this. I bet there is some database that has all the averages and the capability of working out the mean and the standard deviation for each year and adjusting each player's batting average accordingly. It won't be me, but somebody should take that on.
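In case somebody does take it on, here is a minimal sketch of what that adjustment might look like. It converts each player's average to a number of standard deviations above or below his own season's mean, then restates it on a common scale. The function name `standardize_season`, the three players, and the target standard deviation of .030 are all assumptions for illustration; only the .263 baseline comes from this post. It also treats every player equally rather than weighting by at-bats, which a real version would probably want to do.

```python
from statistics import mean, pstdev


def standardize_season(player_avgs: dict[str, float],
                       target_mean: float = 0.263,
                       target_sd: float = 0.030) -> dict[str, float]:
    """Re-express each player's average on a common mean and spread.

    target_sd is an assumed value, not a measured one.
    """
    values = list(player_avgs.values())
    season_mean = mean(values)
    season_sd = pstdev(values)
    if season_sd == 0:
        raise ValueError("all averages are identical; nothing to scale")
    return {
        name: target_mean + target_sd * (avg - season_mean) / season_sd
        for name, avg in player_avgs.items()
    }


# Hypothetical season with three made-up players
season = {"Player A": 0.200, "Player B": 0.300, "Player C": 0.400}
for name, avg in standardize_season(season).items():
    print(name, f"{avg:.3f}")
```

With this approach, a .400 hitter from a season with a wide spread would come out lower than a .400 hitter from a season where everyone was bunched together, which is the point of bringing variance into it.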