Of course, there's a mistake in the formula in the movie!

What is the Elo rating system? As originally proposed, it presumes that if two players A and B have ratings \(R_A\) and \(R_B\), then the expected score of player A is \[\frac{1}{1+10^{\frac{R_B-R_A}{400}}}.\] Furthermore, if A has a current rating of \(R_A\) and plays some more games, then the updated rating \({R_A}'\) is given by \({R_A}' = R_A + K(S_A-E_A)\), where \(K\) is an adjustment factor, \(S_A\) is the number of points scored by A and \(E_A\) was the expected number of points scored by A based on the rating \(R_A\).

Now, the expected score formula given above has the same form as a logistic regression model. What's the connection between the two? One answer is that Elo's rating system is a type of online version of a logistic model. An online algorithm is an algorithm that only sees each piece of data once. As applied to a statistical model, it's a model with parameter estimates that are updated as new data comes in, but not refitting on the entire data set. It can also be considered a memoryless model; it has "forgotten" the old data and only knows the current parameter estimates. The appeal of such models is that they're extremely efficient, can operate on enormous data sets and parameter estimates can be updated in real-time.

Okay, let's say we have a "forgetful" logistic model. Can we derive an updating rule, and does it look like Elo's? I'm going to give one possible derivation under the simplifying assumption that games are won or lost, with no ties.

We don't know how many games A and B had previously played, so let's assume they both had previously played \(n\) games and have just played \(m\) additional games between them, with A scoring \(S_A\) points. That means they've both played \(n+m\) games, but we're just going to forget this again, so let's adjust everything so that they end up with \(n\) games. One way to do this is to normalize \(n\) and \(m\) so that they sum to \(n\), thus \(n\) becomes \(n\frac{n}{n+m}\) and \(m\) becomes \(m\frac{n}{n+m}\).

We're now assuming they had each played \(n\frac{n}{n+m}\) games in the past, have just played \(m\frac{n}{n+m}\) additional games and A scored \(S_A \frac{n}{n+m}\) points (it has to be adjusted, too!) in those games.

Again, we're memoryless, so we don't know how strong the opponents were that each had played in the past, so we're going to assume that they had both played opponents that were as strong as themselves and had won half and lost half of those games. After all, people generally prefer to play competitive games.

\begin{align}

\frac{\partial L}{\partial d} &= 2n \frac{e^{-d}}{1+e^{-d}} -n + 2m \frac{e^{-c-2d}}{1+e^{-c-2d}}-2(m-S_A) \\

&= -n\frac{1-e^{-d}}{1+e^{-d}} + 2 S_A - 2m\frac{1}{1+e^{-c-2d}} \\

&= -n\tanh(d/2) + 2 S_A - 2m\frac{1}{1+e^{-c-2d}}.

\end{align} What is \( m\frac{1}{1+e^{-c-2d}} \)? This is actually just \( {E_A}' \), the expected score for A when playing B for \(m\) games, but assuming the updated ratings for both players. Finally, setting \(\frac{\partial L}{\partial d} = 0\), we get \[ n\tanh(d/2) = 2(S_A - {E_A}')\] and hence \[ \tanh(d/2) = \frac{2}{n} (S_A - {E_A}').\] Assuming \(n\) is large relative to \(S_A - {E_A}'\), we have \( \tanh(d/2) \approx d/2\) and \( {E_A}' \approx E_A \). This is Elo's updating rule in the form \[ d = \frac{4}{n} (S_A - E_A ).\] If we rescale with the constant \( \sigma \), the updating rule becomes \[ d = \frac{4\sigma }{n} (S_A - E_A ).\] We also now see that the adjustment factor \( K = \frac{4\sigma }{n}\).

Well done and neat post.

ReplyDeleteAlso, served to remind me just how much I have forgotten.

As soon as "partial derivative of L wrt to d" came up, I realized I would have to relearn derivative / power rules to follow. Ha.

With "A having scored S_A*n/(n+m) points", do you mean A scoring that amount of points out of m*n/(n+m) games after playing n*n/(m+n) games and having their ratings adjusted to R_a+d?

ReplyDelete