Saturday, June 18, 2016

What's the Value of a Win?

In a previous entry I demonstrated one simple way to estimate an exponent for the Pythagorean win expectation. Another nice consequence of a Pythagorean win expectation formula is that it makes it simple to estimate the run value of a win in baseball, the point value of a win in basketball, the goal value of a win in hockey, etc.

Let our Pythagorean win expectation formula be \[ w=\frac{P^e}{P^e+1},\] where \(w\) is the win fraction expectation, \(P\) is the ratio of runs scored to runs allowed (or similar) and \(e\) is the Pythagorean exponent. How do we get an estimate for the run value of a win? The expected number of games won in a season with \(g\) games is \[W = g\cdot w = g\cdot \frac{P^e}{P^e+1},\] so for one estimate we only need to compute the value of the partial derivative \(\frac{\partial W}{\partial P}\) at \(P=1\). Note that \[ W = g\left( 1-\frac{1}{P^e+1}\right), \] and so \[ \frac{\partial W}{\partial P} = g\frac{eP^{e-1}}{(P^e+1)^2}\] and it follows that \[ \frac{\partial W}{\partial P}(P=1) = \frac{ge}{4}.\] Our estimate for the run value of a win now follows by approximating the derivative with a finite difference, \[\frac{\Delta W}{\Delta P} = \frac{ge}{4}, \] and asking for one additional win: \[ \Delta W = 1 = \frac{ge}{4} \Delta P.\]

What is \(\Delta P\)? Well \(P = R/A\), where \(R\) is runs scored over the season and \(A\) is runs allowed over the season. We're assuming this is a league average team and asking how many more runs they'd need to score to win an additional game, so \(A\) is fixed at \(L\), the league average number of runs scored (or allowed) per team over the season, and \(\Delta P = \Delta R/L\). This gives us \[1 = \frac{ge}{4} \Delta P = \frac{ge\Delta R}{4L}.\] Now \(L/g = l\), the league average runs per game, so we arrive at the estimate \[\Delta R = \frac{4l}{e}.\] In the specific case of MLB, we have \(e = 1.8\) and \(l = 4.3\), giving that a win is approximately \(\Delta R = 9.56\) runs.
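As a quick sanity check on the linear approximation, here is a minimal Python sketch that computes \(\Delta R = 4l/e\) and then plugs the extra runs back into the full Pythagorean formula. The function names are mine, and the 162-game season length is an assumption that does not appear in the derivation above (the estimate itself does not depend on \(g\)).

```python
def pythagorean_wins(runs_scored, runs_allowed, games, exponent):
    """Expected wins W = g * P^e / (P^e + 1), with P = R / A."""
    p_to_e = (runs_scored / runs_allowed) ** exponent
    return games * p_to_e / (p_to_e + 1.0)


def runs_per_win(league_runs_per_game, exponent):
    """Linearised estimate from the derivation: Delta R = 4 * l / e."""
    return 4.0 * league_runs_per_game / exponent


# MLB values quoted in the text; the season length is an assumption.
games = 162
exponent = 1.8
league_rpg = 4.3

league_runs = league_rpg * games            # L, season total for an average team
delta_r = runs_per_win(league_rpg, exponent)

# Give an otherwise average team delta_r extra runs and see how many
# additional wins the exact formula predicts.
baseline = pythagorean_wins(league_runs, league_runs, games, exponent)
improved = pythagorean_wins(league_runs + delta_r, league_runs, games, exponent)
extra_wins = improved - baseline

print(f"runs per win: {delta_r:.2f}")
print(f"extra wins from the exact formula: {extra_wins:.3f}")
```

The exact formula gives slightly less than one extra win, since \(W\) is concave in \(P\) near \(P=1\), but the linear estimate is very close.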

Bill James originally used the exponent \(e=2\); in this case the formula simplifies to \(\Delta R = 2l\), i.e. we get the particularly simple result that a win is equal to approximately twice the average number of runs scored per game.

Applying this estimate to the NBA, a win is approximately \( \Delta R = \frac{4\cdot 101}{16.4} = 24.6\) points. Similarly, we get the estimates for a win of 4.5 goals for the NHL and 5.1 goals for the Premier League.

Wednesday, June 8, 2016

A Simple Estimate for Pythagorean Exponents

Given the number of runs scored and runs allowed by a baseball team, what's a good estimate for that team's win fraction? Bill James famously came up with what he called the "Pythagorean expectation" \[w = \frac{R^2}{R^2 + A^2},\] which can also be written as \[w = \frac{{(R/A)}^2}{{(R/A)}^2 + 1}.\] More generally, if team \(i\) scores \(R_i\) and allows \(A_i\) runs, the Pythagorean estimate for the probability of team \(1\) beating team \(2\) is \[w = \frac{{(R_1/A_1)}^2}{{(R_1/A_1)}^2 + (R_2/A_2)^2}.\] We can see that the estimate of the team's win fraction is a consequence of this, as an average team would by definition have \(R_2 = A_2\).

Now, there's nothing magical about the exponent being 2; it's a convenient choice, and in fact is not even the "best" exponent. But what's a good way to estimate the exponent? Note the structural similarity of this win probability estimator and the Bradley-Terry estimator \[ w = \frac{P_1}{P_1+P_2}.\] Here the \(P_i\) are what we could call the "Bradley-Terry power" of the team. The two estimators coincide if \(P_i = (R_i/A_i)^e\) for some exponent \(e\), i.e. if \(\log(P_i) = e\log(R_i/A_i)\). This immediately suggests one way to estimate the expectation model's exponent: fit a Bradley-Terry model, then fit the log-linear regression \(\log(P_i)\) vs \(\log(R_i/A_i)\). The slope of this regression will be one estimate for the expectation exponent.
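To make the two-step procedure concrete, here is a minimal, self-contained Python sketch. It is not the R code from the repo: the Bradley-Terry fit uses the standard minorize-maximize (MM) iteration rather than whatever the original analysis used, the function names are mine, and the data are synthetic. The `ratios` values and the five-team round robin are made up, and the "win counts" are expected values generated from a known exponent so that the recovery can be checked.

```python
import math


def fit_bradley_terry(wins, iterations=1000):
    """Fit Bradley-Terry strengths with the MM update
    p_i <- W_i / sum_j n_ij / (p_i + p_j),
    where wins[i][j] is the number of times team i beat team j."""
    n = len(wins)
    p = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            updated.append(total_wins / denom)
        # Strengths are only defined up to scale; fix the geometric mean at 1.
        log_mean = sum(math.log(x) for x in updated) / n
        p = [x / math.exp(log_mean) for x in updated]
    return p


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Synthetic league (all values hypothetical): run ratios R_i / A_i per team.
ratios = [1.20, 1.08, 1.00, 0.93, 0.85]
true_e = 1.8
games_per_pair = 20

# Expected head-to-head wins under the Pythagorean model with exponent true_e.
wins = [[0.0 if i == j else
         games_per_pair * ri ** true_e / (ri ** true_e + rj ** true_e)
         for j, rj in enumerate(ratios)]
        for i, ri in enumerate(ratios)]

strengths = fit_bradley_terry(wins)
slope = ols_slope([math.log(r) for r in ratios],
                  [math.log(s) for s in strengths])
print(f"estimated exponent: {slope:.3f}")
```

Because the synthetic wins are exact expectations, the regression slope should recover the generating exponent almost exactly. With real game results the points scatter around the line and the slope is only an estimate.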

How well does this work? I get 1.727 for MLB in 2014. The R code and data files for MLB and other sports may be found in my GitHub repo.