## Saturday, June 18, 2016

### What's the Value of a Win?

In a previous entry I demonstrated one simple way to estimate an exponent for the Pythagorean win expectation. Another nice consequence of a Pythagorean win expectation formula is that it makes it simple to estimate the run value of a win in baseball, the point value of a win in basketball, the goal value of a win in hockey, and so on.

Let our Pythagorean win expectation formula be $w=\frac{P^e}{P^e+1},$ where $$w$$ is the expected win fraction, $$P$$ is the ratio of runs scored to runs allowed (or the analogous ratio in other sports) and $$e$$ is the Pythagorean exponent. How do we get an estimate for the run value of a win? The expected number of games won in a season with $$g$$ games is $W = g\cdot w = g\cdot \frac{P^e}{P^e+1},$ so for one estimate we only need to compute the value of the partial derivative $$\frac{\partial W}{\partial P}$$ at $$P=1$$, which corresponds to a league-average team. Note that $W = g\left( 1-\frac{1}{P^e+1}\right),$ and so $\frac{\partial W}{\partial P} = g\frac{eP^{e-1}}{(P^e+1)^2}$ and it follows that $\left.\frac{\partial W}{\partial P}\right|_{P=1} = \frac{ge}{4}.$

Our estimate for the run value of a win now follows by setting $\frac{\Delta W}{\Delta P} = \frac{ge}{4}$ and asking for one additional win, giving $\Delta W = 1 = \frac{ge}{4} \Delta P.$ What is $$\Delta P$$? Well, $$P = R/A$$, where $$R$$ is runs scored over the season and $$A$$ is runs allowed over the season. We're assuming this is a league-average team and asking how many more runs they'd need to score to win an additional game, so $$A$$ is fixed at $$L$$, the league-average number of runs scored (or allowed) over the season, and $$\Delta P = \Delta R/L$$. This gives us $1 = \frac{ge}{4} \Delta P = \frac{ge\Delta R}{4L}.$ Now $$L/g = l$$, the league-average runs per game, so we arrive at the estimate $\Delta R = \frac{4l}{e}.$

In the specific case of MLB, we have $$e = 1.8$$ and $$l = 4.3$$, giving that a win is approximately $$\Delta R = 9.56$$ runs.
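As a quick sanity check on the derivation, here is a minimal Python sketch that compares a finite-difference slope of $$W$$ at $$P=1$$ against $$ge/4$$ and then evaluates $$\Delta R = 4l/e$$ for the MLB numbers above. The function names are my own for illustration, and the season length of 162 games is an assumption that is not used anywhere in the text (it cancels out of the final estimate).

```python
def expected_wins(run_ratio, exponent, games):
    """Pythagorean expected wins W = g * P^e / (P^e + 1)."""
    p = run_ratio ** exponent
    return games * p / (p + 1.0)


def runs_per_win(league_runs_per_game, exponent):
    """Linearized run value of one win for a league-average team: 4l / e."""
    return 4.0 * league_runs_per_game / exponent


# MLB values from the text; games = 162 is an assumed season length.
games, exponent, runs_per_game = 162, 1.8, 4.3

# Central finite difference of W with respect to P at P = 1.
h = 1e-6
slope = (expected_wins(1 + h, exponent, games)
         - expected_wins(1 - h, exponent, games)) / (2 * h)

print(slope, games * exponent / 4)  # the two values should agree closely
print(round(runs_per_win(runs_per_game, exponent), 2))  # 9.56
```

Because the estimate comes from a first-order approximation around $$P=1$$, it is best read as the marginal value of a run for a roughly average team.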

Bill James originally used the exponent $$e=2$$; in this case the formula simplifies to $$\Delta R = 2l$$, i.e. we get the particularly simple result that a win is equal to approximately twice the average number of runs scored per game.

Applying this estimate to the NBA, a win is approximately $$\Delta R = \frac{4\cdot 101}{16.4} = 24.6$$ points. Similarly, a win is worth approximately 4.5 goals in the NHL and 5.1 goals in the Premier League.

## Wednesday, June 8, 2016

### A Simple Estimate for Pythagorean Exponents

Given the number of runs scored and runs allowed by a baseball team, what's a good estimate for that team's win fraction? Bill James famously came up with what he called the "Pythagorean expectation" $w = \frac{R^2}{R^2 + A^2},$ which can also be written as $w = \frac{{(R/A)}^2}{{(R/A)}^2 + 1}.$ More generally, if team $$i$$ scores $$R_i$$ and allows $$A_i$$ runs, the Pythagorean estimate for the probability of team $$1$$ beating team $$2$$ is $w = \frac{{(R_1/A_1)}^2}{{(R_1/A_1)}^2 + (R_2/A_2)^2}.$ The estimate of a team's win fraction is a consequence of this: taking team $$2$$ to be an average opponent, we have by definition $$R_2 = A_2$$, so the second term in the denominator is $$1$$.

Now, there's nothing magical about the exponent being 2; it's a coincidence, and in fact is not even the "best" exponent. But what's a good way to estimate the exponent? Note the structural similarity of this win probability estimator and the Bradley-Terry estimator $w = \frac{P_1}{P_1+P_2}.$ Here the $$P_i$$ are what we could call the "Bradley-Terry power" of the team. This immediately suggests one way to estimate the expectation model's exponent: fit a Bradley-Terry model, then fit the linear regression of $$\log(P_i)$$ on $$\log(R_i/A_i)$$. If $$P_i$$ is proportional to $$(R_i/A_i)^e$$, then $$\log(P_i)$$ is linear in $$\log(R_i/A_i)$$ with slope $$e$$, so the slope of this regression will be one estimate for the expectation exponent.
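The regression step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the R code from the repo: the run ratios are made up, the "powers" are generated directly from an assumed exponent of 1.8 rather than obtained from an actual Bradley-Terry fit, and `loglog_slope` is a hypothetical helper name. The arbitrary factor of 3 reflects the fact that Bradley-Terry powers are only determined up to a common scale, which shifts the intercept but not the slope.

```python
import math


def loglog_slope(powers, run_ratios):
    """Least-squares slope of log(power) regressed on log(run ratio)."""
    xs = [math.log(r) for r in run_ratios]
    ys = [math.log(p) for p in powers]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Made-up run ratios R_i / A_i for five hypothetical teams.
run_ratios = [0.85, 0.95, 1.00, 1.08, 1.20]

# Stand-in for fitted Bradley-Terry powers: generated from an assumed
# exponent, with an arbitrary common scale factor of 3.
assumed_exponent = 1.8
powers = [3.0 * r ** assumed_exponent for r in run_ratios]

estimate = loglog_slope(powers, run_ratios)
print(round(estimate, 3))  # recovers the assumed exponent, 1.8
```

With real data the powers would come from a Bradley-Terry fit to game results, and the points would scatter around the line instead of lying exactly on it.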

How well does this work? I get 1.727 for MLB in 2014. The R code and data files for MLB and other sports may be found in my GitHub repo.