## Wednesday, June 8, 2016

### A Simple Estimate for Pythagorean Exponents

Given the number of runs scored and runs allowed by a baseball team, what's a good estimate for that team's win fraction? Bill James famously came up with what he called the "Pythagorean expectation" $w = \frac{R^2}{R^2 + A^2},$ which can also be written as $w = \frac{{(R/A)}^2}{{(R/A)}^2 + 1}.$

More generally, if team $$i$$ scores $$R_i$$ and allows $$A_i$$ runs, the Pythagorean estimate for the probability of team $$1$$ beating team $$2$$ is $w = \frac{{(R_1/A_1)}^2}{{(R_1/A_1)}^2 + {(R_2/A_2)}^2}.$ The estimate of a team's win fraction follows from this as a special case: an average opponent by definition has $$R_2 = A_2$$, so the second term in the denominator is 1.

Now, there's nothing magical about the exponent being 2; it's a coincidence, and in fact 2 is not even the "best" exponent. But what's a good way to estimate the exponent? Note the structural similarity between this win probability estimator and the Bradley-Terry estimator $w = \frac{P_1}{P_1+P_2}.$ Here the $$P_i$$ are what we could call the "Bradley-Terry power" of each team. The two estimators coincide if $$P_i$$ is proportional to $$R_i/A_i$$ raised to the exponent, in which case $$\log(P_i)$$ is a linear function of $$\log(R_i/A_i)$$ with the exponent as its slope. This immediately suggests one way to estimate the expectation model's exponent: fit a Bradley-Terry model, then fit a linear regression of $$\log(P_i)$$ on $$\log(R_i/A_i)$$. The slope of this regression will be one estimate for the expectation exponent.
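As a rough illustration of the procedure, here is a minimal Python sketch (not the R code from the repo). The function names, the `wins[i][j]` matrix layout, and the choice of a simple iterative (MM-style) Bradley-Terry fit are all my assumptions for this example; any Bradley-Terry fitting routine could be swapped in.

```python
import math


def pythagorean_win_prob(r1, a1, r2, a2, k=2.0):
    """Probability that team 1 beats team 2 under a Pythagorean model
    with exponent k (k=2 gives the classic Bill James form)."""
    s1 = (r1 / a1) ** k
    s2 = (r2 / a2) ** k
    return s1 / (s1 + s2)


def fit_bradley_terry(wins, n_iter=500):
    """Fit Bradley-Terry powers P_i by fixed-point (MM-style) iteration.

    wins[i][j] is the number of times team i beat team j (hypothetical
    layout chosen for this sketch). Assumes every team has at least one
    win and one loss, otherwise the powers degenerate to 0 or infinity.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom)
        # Powers are only defined up to scale; fix the geometric mean at 1.
        g = math.exp(sum(math.log(x) for x in new_p) / n)
        p = [x / g for x in new_p]
    return p


def ols_slope(x, y):
    """Slope of the ordinary least-squares line of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def estimate_exponent(wins, runs_scored, runs_allowed):
    """Regress log(P_i) on log(R_i / A_i); the slope estimates the exponent."""
    powers = fit_bradley_terry(wins)
    x = [math.log(r / a) for r, a in zip(runs_scored, runs_allowed)]
    y = [math.log(p) for p in powers]
    return ols_slope(x, y)
```

Because the powers are only determined up to a common scale factor, the normalization inside `fit_bradley_terry` affects the regression intercept but not the slope, which is the quantity of interest here.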

How well does this work? For MLB in 2014, I get an exponent of 1.727. The R code and data files for MLB and other sports may be found in my GitHub repo.