Given the number of runs scored and runs allowed by a baseball team, what's a good estimate for that team's win fraction? Bill James famously came up with what he called the "Pythagorean expectation" $w = \frac{R^2}{R^2 + A^2}$, which can also be written as $w = \frac{(R/A)^2}{(R/A)^2 + 1}$. More generally, if team $i$ scores $R_i$ and allows $A_i$ runs, the Pythagorean estimate for the probability of team 1 beating team 2 is $w = \frac{(R_1/A_1)^2}{(R_1/A_1)^2 + (R_2/A_2)^2}$. We can see that the estimate of the team's win fraction is a consequence of this, as an average team would by definition have $R_2 = A_2$.

Now, there's nothing magical about the exponent being 2; it's a coincidence, and in fact is not even the "best" exponent. But what's a good way to estimate the exponent? Note the structural similarity of this win probability estimator and the Bradley-Terry estimator $w = \frac{P_1}{P_1 + P_2}$. Here the $P_i$ are what we could call the "Bradley-Terry power" of the team. This immediately suggests one way to estimate the expectation model's exponent: fit a Bradley-Terry model, then fit the log-linear regression $\log(P_i)$ vs $\log(R_i/A_i)$. The slope of this regression will be one estimate for the expectation exponent.
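To spell out why the slope plays the role of the exponent, here is a short derivation. It assumes the Bradley-Terry power is exactly a power of the run ratio; the exponent $k$ and the scale constant $c$ are symbols introduced here for illustration, not notation from the original model.

$$P_i = c \left(\frac{R_i}{A_i}\right)^k \quad\Longrightarrow\quad \log P_i = \log c + k \log\frac{R_i}{A_i}$$

$$w = \frac{P_1}{P_1 + P_2} = \frac{(R_1/A_1)^k}{(R_1/A_1)^k + (R_2/A_2)^k}$$

The constant $c$ cancels in the win probability, so only the slope $k$ matters, and $k = 2$ recovers the Pythagorean form.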
How well does this work? I get 1.727 for MLB in 2014. The R code and data files for MLB and other sports may be found in my GitHub repo.
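For readers who want to experiment with the idea, here is a minimal Python sketch of the two-step procedure: fit Bradley-Terry powers, then regress their logs on the log run ratios. It is not the R code from the repo. The function names are hypothetical, the Bradley-Terry fit uses a simple iterative (minorization-maximization) update chosen here for brevity, and the run totals and win table are synthetic, built from an assumed exponent of 1.8 rather than taken from real MLB data.

```python
import math


def fit_bradley_terry(wins, n_iter=1000):
    """Estimate Bradley-Terry powers from a pairwise win table.

    wins[i][j] is the number of times team i beat team j.
    Uses the standard iterative (MM) update; the powers are rescaled
    to geometric mean 1 because only their ratios are identified.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom)
        log_mean = sum(math.log(x) for x in new_p) / n
        p = [x / math.exp(log_mean) for x in new_p]
    return p


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pythagorean_exponent(wins, runs_scored, runs_allowed):
    """Slope of log(P_i) against log(R_i / A_i)."""
    powers = fit_bradley_terry(wins)
    log_ratio = [math.log(r / a) for r, a in zip(runs_scored, runs_allowed)]
    log_power = [math.log(p) for p in powers]
    return ols_slope(log_ratio, log_power)


# Synthetic illustration only: made-up run totals and an assumed exponent.
TRUE_K = 1.8
runs_scored = [750, 700, 650, 600]
runs_allowed = [600, 650, 700, 750]
power = [(r / a) ** TRUE_K for r, a in zip(runs_scored, runs_allowed)]

# Expected (fractional) wins if every pair of teams met 20 times.
GAMES = 20
wins = [
    [0.0 if i == j else GAMES * power[i] / (power[i] + power[j]) for j in range(4)]
    for i in range(4)
]

estimate = pythagorean_exponent(wins, runs_scored, runs_allowed)
print(round(estimate, 3))
```

Because the synthetic win table is generated from the assumed exponent, the fit should return a value very close to it. With real game results the points will scatter around the regression line, and the slope is only an estimate.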