Skip to main content

Probability and Cumulative Dice Sums

Combining Expert Opinions: NaiveBoost

In many situations we're faced with multiple expert opinions. How should we combine them together into one opinion, hopefully better than any single opinion? I'll demonstrate the derivation of a classifier I'll call NaiveBoost.

We'll start with a simple situation, and later gradually introduce more complexity. Let each expert state a yes or no opinion in response to a yes/no question (binary classifiers), each expert be independent of the other experts and assume expert \(i\) is correct with probability \(p_i\). We'll also assume that the prior distribution on whether the correct answer is yes or no to be uniform, i.e. each occurs with probability 0.5.

Label a "yes" as +1, and "no" as -1. We ask our question, which has some unknown +1/-1 answer \(L\), and get back a set of responses (labels) \(S = \{L_i \}\), where \(L_i\) is the response from expert \(i\). Observe we have \[ \Pr(S | L=+1) = \prod_{i} {p_i}^{\frac{L_i+1}{2}} \cdot {(1-p_i)}^\frac{-L_i+1}{2}\] and also \[ \Pr(S | L=-1) = \prod_{i} {(1-p_i)}^{\frac{L_i+1}{2}} \cdot {p_i}^\frac{-L_i+1}{2}. \] As \( \Pr(L=+1 | S) = \frac{\Pr(S | L=+1)\cdot \Pr(L=+1)}{\Pr(S)}\) and \( \Pr(L=-1 | S) = \frac{\Pr(S | L=-1)\cdot \Pr(L=-1)}{\Pr(S)}\), and given our assumption that \( \Pr(L=+1) = \Pr(L=-1) \), we need only compute \( \Pr(S | L=+1) \), \( \Pr(S | L=-1) \) and normalize.

We'll now take logs and derive a form similar to AdaBoost. Note for \( L_{+1} = \log\left( \Pr(S | L=+1) \right) \) this gives us \[ L_{+1} = \sum_i \frac{L_i+1}{2}\log{(p_i)} + \frac{-L_i+1}{2}\log{(1-p_i)}.\] Rearranging, we get \[ L_{+1} = \sum_i \frac{L_i}{2}\log{\left( \frac{p_i}{1-p_i}\right)} + \frac{1}{2}\log{\left( p_i(1-p_i)\right)}.\] Similarly, for \( L_{-1} = \log\left( \Pr(S | L=-1) \right) \) we get \[ L_{-1} = \sum_i -\frac{L_i}{2}\log{\left( \frac{p_i}{1-p_i}\right)} + \frac{1}{2}\log{\left( p_i(1-p_i)\right)}.\] Note that each of these includes the same terms \( \sum_i \frac{1}{2}\log{\left( p_i(1-p_i)\right)}\). Upon exponentiation these would multiply \( \Pr(S | L=+1) \) and \( \Pr(S | L=-1) \) by the same factor, so we can ignore them as to recover the probabilities we'll need to normalize anyway. Thus we end up with a linear classifier with the AdaBoost form \[ C(S) = \sum_i \frac{L_i}{2}\log{\left( \frac{p_i}{1-p_i}\right)}. \] If \( C(S) \) is positive, the classifier's label is +1; if \( C(S) \) is negative, the classifier's label is -1. Furthermore, we may recover the classifier's probabilities by exponentiating and normalizing.

Comments