
Probability and Cumulative Dice Sums

Elo's Rating System as a Forgetful Logistic Model

Elo's rating system became famous through its use in chess, but it and its variations are now used everywhere from traditional sports like the NFL to eSports like League of Legends. It was also infamously used on various "Hot or Not"-style websites, as shown in this scene from the movie "The Social Network":


Of course, there's a mistake in the formula in the movie!

What is the Elo rating system? As originally proposed, it presumes that if two players A and B have ratings \(R_A\) and \(R_B\), then the expected score of player A is \[\frac{1}{1+10^{\frac{R_B-R_A}{400}}}.\] Furthermore, if A has a current rating of \(R_A\) and plays some more games, then the updated rating \({R_A}'\) is given by \({R_A}' = R_A + K(S_A-E_A)\), where \(K\) is an adjustment factor, \(S_A\) is the number of points scored by A, and \(E_A\) is the expected number of points scored by A based on the rating \(R_A\).
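To make these two formulas concrete, here's a minimal Python sketch of the expected score and the rating update exactly as stated above. The function names, the example ratings, and the choice \(K = 32\) are mine, for illustration only.

    def expected_score(r_a, r_b):
        # Expected score of A against B: 1 / (1 + 10^((R_B - R_A) / 400)).
        return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

    def update_rating(r_a, r_b, score_a, games, k=32):
        # R_A' = R_A + K * (S_A - E_A), where E_A is A's expected score over `games` games.
        e_a = games * expected_score(r_a, r_b)
        return r_a + k * (score_a - e_a)

    # Example: a 1500-rated player beats a 1600-rated player in a single game.
    print(update_rating(1500, 1600, score_a=1, games=1))  # about 1520.5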

Now, the expected score formula given above has the same form as a logistic regression model. What's the connection between the two? One answer is that Elo's rating system is a type of online version of a logistic model. An online algorithm is an algorithm that only sees each piece of data once. As applied to a statistical model, it's a model whose parameter estimates are updated as new data comes in, without refitting on the entire data set. It can also be considered a memoryless model; it has "forgotten" the old data and only knows the current parameter estimates. The appeal of such models is that they're extremely efficient, they can operate on enormous data sets, and parameter estimates can be updated in real time.
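For contrast with a batch fit, here's a generic sketch of what "seeing each data point once" looks like for a logistic model: a single-pass stochastic-gradient update. This is only meant to illustrate the idea of an online model; it is not the derivation that follows, and the learning rate and data stream are placeholders.

    import math

    def online_logistic_step(w, x, y, lr=0.01):
        # One stochastic-gradient update for logistic regression.
        # The observation (x, y) is used once and then discarded ("forgotten").
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))              # current predicted probability
        return [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]

    w = [0.0, 0.0]                                  # intercept and one slope
    stream = [([1.0, 0.5], 1), ([1.0, -1.2], 0), ([1.0, 0.3], 1)]
    for x, y in stream:                             # each observation is seen exactly once
        w = online_logistic_step(w, x, y)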

Okay, let's say we have a "forgetful" logistic model. Can we derive an updating rule, and does it look like Elo's? I'm going to give one possible derivation under the simplifying assumption that games are won or lost, with no ties.

We don't know how many games A and B had previously played, so let's assume they both had previously played \(n\) games and have just played \(m\) additional games between them, with A scoring \(S_A\) points. That means they've both played \(n+m\) games, but we're just going to forget this again, so let's adjust everything so that they end up with \(n\) games. One way to do this is to normalize \(n\) and \(m\) so that they sum to \(n\), thus \(n\) becomes \(n\frac{n}{n+m}\) and \(m\) becomes \(m\frac{n}{n+m}\).

We're now assuming they had each played \(n\frac{n}{n+m}\) games in the past, have just played \(m\frac{n}{n+m}\) additional games and A scored \(S_A \frac{n}{n+m}\) points (it has to be adjusted, too!) in those games.
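As a quick check of the bookkeeping (with numbers chosen purely for illustration): if \(n = 30\) and \(m = 10\), the rescaled counts are \(30 \cdot \frac{30}{40} = 22.5\) past games and \(10 \cdot \frac{30}{40} = 7.5\) new games, which again sum to \(30\); a score of \(S_A = 6\) in the new games would likewise be rescaled to \(6 \cdot \frac{30}{40} = 4.5\).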

Again, we're memoryless, so we don't know how strong the opponents were that each player faced in the past. We'll therefore assume that both had played opponents exactly as strong as themselves and had won half and lost half of those games. After all, people generally prefer to play competitive games.

Define \(d\) by \({R_A}' = R_A + d\) and let \(c = R_A - R_B\); we also require that \({R_B}' = R_B - d\). Evaluated at the updated ratings, the log-likelihood \(L\) of the full record (each player's assumed \(n\frac{n}{n+m}\) past games, half won and half lost against equally strong opponents, plus the \(m\frac{n}{n+m}\) new games in which A scored \(S_A \frac{n}{n+m}\) points) is \[ \frac{-2 n^2}{n+m}\log(1+e^{-d}) -\frac{n^2}{n+m}d-\frac{m n}{n+m}\log(1+e^{-c-2d}) - \frac{(m-S_A)n(c+2d)}{n+m}.\] Factoring out the constant term \(n/(n+m)\) simplifies this to \[ L = -2 n\log(1+e^{-d}) - n d - m \log(1+e^{-c-2d}) - (m-S_A)(c+2d).\] Taking the partial derivative of \(L\) with respect to \(d\) we get
\begin{align}
\frac{\partial L}{\partial d} &= 2n \frac{e^{-d}}{1+e^{-d}} -n + 2m \frac{e^{-c-2d}}{1+e^{-c-2d}}-2(m-S_A) \\
&= -n\frac{1-e^{-d}}{1+e^{-d}} + 2 S_A - 2m\frac{1}{1+e^{-c-2d}} \\
&= -n\tanh(d/2) + 2 S_A - 2m\frac{1}{1+e^{-c-2d}}.
\end{align} What is \( m\frac{1}{1+e^{-c-2d}} \)? This is actually just \( {E_A}' \), the expected score for A when playing B for \(m\) games, but assuming the updated ratings for both players. Finally, setting \(\frac{\partial L}{\partial d} = 0\), we get \[ n\tanh(d/2) = 2(S_A - {E_A}')\] and hence \[ \tanh(d/2) = \frac{2}{n} (S_A - {E_A}').\] Assuming \(n\) is large relative to \(S_A - {E_A}'\), we have \( \tanh(d/2) \approx d/2\) and \( {E_A}' \approx E_A \). This is Elo's updating rule in the form \[ d = \frac{4}{n} (S_A - E_A ).\] If we rescale with the constant \( \sigma \) (for the usual chess convention of base 10 and a 400-point scale, \( \sigma = 400/\ln 10 \approx 173.7 \)), the updating rule becomes \[ d = \frac{4\sigma }{n} (S_A - E_A ).\] We also now see that the adjustment factor is \( K = \frac{4\sigma }{n}\).
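As a sanity check on the approximation, here's a small Python sketch that solves the exact condition \( n\tanh(d/2) = 2(S_A - {E_A}') \) by fixed-point iteration and compares the result with the approximate rule \( d = \frac{4}{n}(S_A - E_A) \). Everything is on the natural (unscaled) rating scale, and the particular values of \(n\), \(m\), \(S_A\) and \(c\) are made up purely for illustration.

    import math

    n, m, S_A = 200.0, 10.0, 7.0        # past games, new games, A's score in the new games
    c = 0.4                             # R_A - R_B on the natural scale

    def expected_updated(d):
        # E_A': A's expected score over m games at the updated ratings (difference c + 2d).
        return m / (1.0 + math.exp(-(c + 2.0 * d)))

    # Solve n * tanh(d/2) = 2 * (S_A - E_A'(d)) by fixed-point iteration.
    d = 0.0
    for _ in range(100):
        d = 2.0 * math.atanh(2.0 * (S_A - expected_updated(d)) / n)

    E_A = m / (1.0 + math.exp(-c))      # expected score at the old ratings
    d_elo = (4.0 / n) * (S_A - E_A)     # the approximate (Elo) update

    print(round(d, 4), round(d_elo, 4)) # ~0.0185 vs ~0.0203 with these numbers

The two values agree to first order; the remaining gap comes from using \(E_A\) in place of \({E_A}'\), and it shrinks as \(n\) grows relative to the new games played.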

Comments

  1. Well done and neat post.
    Also, served to remind me just how much I have forgotten.
    As soon as "partial derivative of L wrt to d" came up, I realized I would have to relearn derivative / power rules to follow. Ha.

  2. With "A having scored S_A*n/(n+m) points", do you mean A scoring that amount of points out of m*n/(n+m) games after playing n*n/(m+n) games and having their ratings adjusted to R_a+d?

