Baseball, Chess, Psychology and Psychometrics: Everyone Uses the Same Damn Rating System

Here's a short summary of the relationship between common models used in baseball, chess, psychology and education. It is the starting point for examining the connections between various extended models in these areas. The next steps include multiple attempts, guessing, ordinal and multinomial outcomes, uncertainty and volatility, multiple categories and interactions. There are also connections to standard optimization algorithms (neural nets, simulated annealing).

Baseball

Common in baseball and other sports, the log5 method provides an estimate for the probability \( p \) of participant 1 beating participant 2 given respective success probabilities \( p_1, p_2 \). Also let \( q_* = 1 -p_* \) in the following. The log5 estimate of the outcome is then:

\begin{align}
p &= \frac{p_1 q_2}{p_1 q_2+q_1 p_2}\\
&= \frac{p_1/q_1}{p_1/q_1+p_2/q_2}\\
\frac{p}{q} &= \frac{p_1}{q_1} \cdot \frac{q_2}{p_2}
\end{align}

The final form is written in terms of the odds, \( \frac{p}{q} \). Additional factors are easily chained in this form to give more complex estimates. For example, if \( p_e \) is an environmental factor, then:

\begin{align}
\frac{p}{q} &= \frac{p_1}{q_1} \cdot \frac{q_2}{p_2} \cdot \frac{q_e}{p_e}
\end{align}
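As a minimal sketch of the formulas above, here is log5 in Python, together with the chained odds form. The function names (`odds`, `prob`, `log5`, `log5_chained`) are my own labels for illustration, and the chained version simply follows the \( q_e/p_e \) convention used in the last equation.

```python
def odds(p):
    """Convert a probability p to odds p / q, where q = 1 - p."""
    return p / (1.0 - p)


def prob(o):
    """Convert odds o back to a probability o / (1 + o)."""
    return o / (1.0 + o)


def log5(p1, p2):
    """log5 estimate of the probability that participant 1 beats participant 2."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return p1 * q2 / (p1 * q2 + q1 * p2)


def log5_chained(p1, p2, *factors):
    """Odds form of log5, with each extra factor p_e entering as q_e / p_e."""
    o = odds(p1) / odds(p2)
    for p_e in factors:
        o /= odds(p_e)  # multiplying by q_e / p_e is dividing by the odds of p_e
    return prob(o)
```

With no extra factors, `log5_chained(p1, p2)` agrees with `log5(p1, p2)`, and a neutral factor of \( p_e = 0.5 \) (odds of 1) leaves the estimate unchanged.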

Chess

The most common rating system in chess is the Elo rating system. It has also been adopted for various other uses, e.g. "hot or not" websites. This system assigns ratings \( R_1, R_2 \) to players 1 and 2 such that the probability of player 1 beating player 2 is approximately:

\begin{align}
p &= \frac{e^{R_1/C}}{e^{R_1/C}+e^{R_2/C}}
\end{align}

Here \( C \) is just a scaling factor (typically \( 400/\ln{10} \) ). The Elo rating connects to log5 by setting \( e^{R/C} = p/q \), from which we recover:

\begin{align}
\frac{p}{q} &= e^{R/C}\\
p &= \frac{e^{R/C}}{1+e^{R/C}}\\
R &= C\cdot \ln(p/q)
\end{align}

Note that \( p \) is also the probability of this player beating another player with Elo rating 0. The Elo system generally includes enhancements accounting for ties, first-move advantage and also an online algorithm for updating ratings. We'll revisit these features later.
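To make the correspondence concrete, here is a small Python sketch of the Elo win probability and the conversion between a rating \( R \) and a probability \( p \). The function names are hypothetical, and the scale is the typical \( C = 400/\ln 10 \) from above; ties, first-move advantage and rating updates are deliberately left out.

```python
import math

C = 400 / math.log(10)  # the typical Elo scaling factor mentioned above


def elo_win_prob(r1, r2, c=C):
    """Probability that the player rated r1 beats the player rated r2.

    Algebraically equal to e^(r1/c) / (e^(r1/c) + e^(r2/c)), but written in
    terms of the rating difference so large ratings do not overflow.
    """
    return 1.0 / (1.0 + math.exp((r2 - r1) / c))


def rating_from_prob(p, c=C):
    """R = C * ln(p / q): the rating matching a success probability p."""
    return c * math.log(p / (1.0 - p))


def prob_from_rating(r, c=C):
    """p = e^(R/C) / (1 + e^(R/C)): the chance of beating a player rated 0."""
    return elo_win_prob(r, 0.0, c)


def log5(p1, p2):
    """log5 estimate, repeated here so the example is self-contained."""
    return p1 * (1 - p2) / (p1 * (1 - p2) + (1 - p1) * p2)


# A 400-point gap corresponds to 10:1 odds on this scale.
print(round(elo_win_prob(400, 0), 4))  # 0.9091
```

Converting two ratings to probabilities with `prob_from_rating` and feeding them to `log5` gives the same answer as `elo_win_prob` on the ratings directly, up to floating-point error.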

Psychology

The Bradley-Terry-Luce (BTL) model is commonly used in psychology. Given two items, the probability \( p \) that item 1 is ranked over item 2 is approximately:

\begin{align}
p &= \frac{Q_1}{Q_1+Q_2}
\end{align}

In this context \( Q_* \) typically reflects the amount of a certain quality. That this model is equivalent to the previous models is immediate:

\begin{align}
Q &= e^{R/C} = p/q\\
R &= C\cdot \ln(Q) = C\cdot \ln(p/q)\\
p &= \frac{Q}{1+Q}
\end{align}
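A quick Python sketch of the same identities, assuming the typical Elo scale \( C = 400/\ln 10 \); the function names are mine, chosen only for illustration.

```python
import math

C = 400 / math.log(10)  # assumed Elo scale, as in the chess section


def btl_prob(q1, q2):
    """BTL probability that item 1 is ranked over item 2."""
    return q1 / (q1 + q2)


def quality_to_rating(q, c=C):
    """R = C * ln(Q)."""
    return c * math.log(q)


def rating_to_quality(r, c=C):
    """Q = e^(R/C)."""
    return math.exp(r / c)


# An item with three times the quality is preferred three times out of four.
print(btl_prob(3, 1))  # 0.75
```

On this scale a quality of 10 maps to a rating of about 400, which matches the 10:1 odds a 400-point Elo gap represents.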

Psychometrics

The dichotomous (two-response) Rasch and item response models are commonly used in psychometrics. For the Rasch model, let \( r_1 \) represent a measurement of ability and \( r_2 \) the difficulty of the test item. The Rasch model estimates the probability of correct response \( p \) as:

\begin{align}
p &= \frac{e^{r_1-r_2}}{1+e^{r_1-r_2}}
\end{align}

The one-parameter item response model estimates:

\begin{align}
p &= \frac{1}{1+e^{r_2-r_1}}
\end{align}

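These are equivalent to each other (multiply the numerator and denominator of the Rasch form by \( e^{-(r_1-r_2)} \)) and to the previous models (set \( r = R/C = \ln(p/q) = \ln(Q) \)).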
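The equivalence is easy to check numerically. The sketch below uses hypothetical function names and compares the two psychometric forms with each other and with the BTL form under \( r = \ln(Q) \).

```python
import math


def rasch(r1, r2):
    """Rasch probability of a correct response: ability r1, item difficulty r2."""
    e = math.exp(r1 - r2)
    return e / (1.0 + e)


def one_pl(r1, r2):
    """One-parameter item response form of the same probability."""
    return 1.0 / (1.0 + math.exp(r2 - r1))


def btl_prob(q1, q2):
    """BTL form, repeated here so the example is self-contained."""
    return q1 / (q1 + q2)


ability, difficulty = 1.2, 0.5
print(abs(rasch(ability, difficulty) - one_pl(ability, difficulty)) < 1e-12)  # True

# With r = ln(Q), the Rasch probability matches the BTL probability.
print(abs(rasch(math.log(3), math.log(1)) - btl_prob(3, 1)) < 1e-12)  # True
```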
