Combining Expert Opinions: NaiveBoost

In many situations we're faced with multiple expert opinions. How should we combine them into a single opinion that is, hopefully, better than any individual one? I'll demonstrate the derivation of a classifier I'll call NaiveBoost.

We'll start with a simple situation and gradually introduce more complexity later. Suppose each expert gives a yes or no opinion in response to a yes/no question (so each expert is a binary classifier), that the experts are independent of one another given the correct answer, and that expert $i$ is correct with probability $p_i$. We'll also assume that the prior distribution on whether the correct answer is yes or no is uniform, i.e. each occurs with probability 0.5.

Label a "yes" as +1 and a "no" as -1. We ask our question, which has some unknown +1/-1 answer $L$, and get back a set of responses (labels) $S = \{L_i\}$, where $L_i$ is the response from expert $i$. Observe that we have

$\Pr(S | L=+1) = \prod_{i} {p_i}^{\frac{L_i+1}{2}} \cdot {(1-p_i)}^{\frac{-L_i+1}{2}}$

and also

$\Pr(S | L=-1) = \prod_{i} {(1-p_i)}^{\frac{L_i+1}{2}} \cdot {p_i}^{\frac{-L_i+1}{2}}.$

By Bayes' theorem,

$\Pr(L=+1 | S) = \frac{\Pr(S | L=+1)\cdot \Pr(L=+1)}{\Pr(S)}$

and

$\Pr(L=-1 | S) = \frac{\Pr(S | L=-1)\cdot \Pr(L=-1)}{\Pr(S)},$

so, given our assumption that $\Pr(L=+1) = \Pr(L=-1)$, we need only compute $\Pr(S | L=+1)$ and $\Pr(S | L=-1)$ and normalize.
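As a rough illustration of the two likelihoods and the normalization step, here is a minimal Python sketch. The function names `likelihoods` and `posterior_yes` and the example numbers are my own assumptions for illustration, not part of the derivation; the code assumes labels are +1/-1 and that the uniform prior and independence assumptions above hold.

```python
from math import prod

def likelihoods(labels, accuracies):
    """Return (Pr(S | L=+1), Pr(S | L=-1)) for independent experts.

    labels[i] is expert i's response (+1 or -1); accuracies[i] is p_i.
    Both names are hypothetical, chosen for this sketch.
    """
    # If the truth is +1, an expert answering +1 is right (factor p_i),
    # and an expert answering -1 is wrong (factor 1 - p_i).
    like_pos = prod(p if label == 1 else 1 - p
                    for label, p in zip(labels, accuracies))
    # If the truth is -1, the roles of p_i and 1 - p_i swap.
    like_neg = prod(1 - p if label == 1 else p
                    for label, p in zip(labels, accuracies))
    return like_pos, like_neg

def posterior_yes(labels, accuracies):
    """Pr(L=+1 | S) under a uniform prior: normalize the two likelihoods."""
    like_pos, like_neg = likelihoods(labels, accuracies)
    return like_pos / (like_pos + like_neg)

# Made-up example: two weaker experts say yes, one strong expert says no.
labels = [+1, +1, -1]
accuracies = [0.6, 0.7, 0.9]
print(posterior_yes(labels, accuracies))  # roughly 0.28
```

In this made-up example the likelihoods are about 0.042 and 0.108, so the single 90%-accurate expert outweighs the two weaker experts who disagree with it.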

We'll now take logs and derive a form similar to AdaBoost. Writing $L_{+1} = \log\left( \Pr(S | L=+1) \right)$, this gives us

$L_{+1} = \sum_i \left[ \frac{L_i+1}{2}\log{(p_i)} + \frac{-L_i+1}{2}\log{(1-p_i)} \right].$

Rearranging, we get

$L_{+1} = \sum_i \left[ \frac{L_i}{2}\log{\left( \frac{p_i}{1-p_i}\right)} + \frac{1}{2}\log{\left( p_i(1-p_i)\right)} \right].$

Similarly, for $L_{-1} = \log\left( \Pr(S | L=-1) \right)$ we get

$L_{-1} = \sum_i \left[ -\frac{L_i}{2}\log{\left( \frac{p_i}{1-p_i}\right)} + \frac{1}{2}\log{\left( p_i(1-p_i)\right)} \right].$

Note that each of these includes the same term $\sum_i \frac{1}{2}\log{\left( p_i(1-p_i)\right)}$. Upon exponentiation this term multiplies $\Pr(S | L=+1)$ and $\Pr(S | L=-1)$ by the same factor, so we can ignore it: to recover the probabilities we'll need to normalize anyway. Thus we end up with a linear classifier with the AdaBoost form

$C(S) = \sum_i \frac{L_i}{2}\log{\left( \frac{p_i}{1-p_i}\right)}.$

If $C(S)$ is positive, the classifier's label is +1; if $C(S)$ is negative, the classifier's label is -1. Furthermore, we may recover the classifier's probabilities by exponentiating and normalizing.
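To make the last step concrete: since $L_{+1}$ and $L_{-1}$ equal $C(S)$ and $-C(S)$ plus the same constant, exponentiating and normalizing gives $\Pr(L=+1 | S) = \frac{e^{C(S)}}{e^{C(S)} + e^{-C(S)}}$. The following Python sketch computes the score and this probability. The function names and example numbers are hypothetical, and returning 0 when $C(S)$ is exactly zero is my own assumption, since the text only covers the positive and negative cases. It also assumes every $p_i$ lies strictly between 0 and 1.

```python
from math import exp, log

def naiveboost_score(labels, accuracies):
    """C(S) = sum_i (L_i / 2) * log(p_i / (1 - p_i)).

    labels[i] is +1 or -1; accuracies[i] is p_i with 0 < p_i < 1.
    """
    return sum(0.5 * label * log(p / (1 - p))
               for label, p in zip(labels, accuracies))

def naiveboost_predict(labels, accuracies):
    """Return (label, Pr(L=+1 | S)) from the linear score C(S)."""
    score = naiveboost_score(labels, accuracies)
    if score > 0:
        label = 1
    elif score < 0:
        label = -1
    else:
        label = 0  # assumption: report a tie as 0; the text does not cover this case
    # Exponentiate the two log-likelihoods (up to a shared constant) and normalize.
    prob_yes = exp(score) / (exp(score) + exp(-score))
    return label, prob_yes

# Made-up example: two weaker experts say yes, one strong expert says no.
label, prob_yes = naiveboost_predict([+1, +1, -1], [0.6, 0.7, 0.9])
print(label, prob_yes)  # label is -1; probability of "yes" is roughly 0.28
```

Note that an expert with $p_i = 0.5$ gets weight zero, and an expert with $p_i < 0.5$ gets a negative weight, so their vote counts as evidence for the opposite answer. For very large scores a numerically safer form of the probability would be needed; this sketch keeps the literal "exponentiate and normalize" form.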

The Angry Statistician
