In many situations we're faced with multiple expert opinions. How should we combine them into a single opinion, hopefully better than any individual one? I'll demonstrate the derivation of a classifier I'll call NaiveBoost.
We'll start with a simple situation and gradually introduce more complexity later. Each expert states a yes or no opinion in response to a yes/no question (i.e., the experts are binary classifiers), the experts are independent of one another, and expert \(i\) is correct with probability \(p_i\). We'll also assume the prior distribution on whether the correct answer is yes or no is uniform, i.e. each occurs with probability 0.5.
Label a "yes" as +1, and "no" as -1. We ask our question, which has some unknown +1/-1 answer \(L\), and get back a set of responses (labels) \(S = \{L_i \}\), where \(L_i\) is the response from expert \(i\). Observe we have \[ \Pr(S | L=+1) = \prod_{i} {p_i}^{\frac{L_i+1}{2}} \cdot {(1-p_i)}^\frac{-L_i+1}{2}\] and also \[ \Pr(S | L=-1) = \prod_{i} {(1-p_i)}^{\frac{L_i+1}{2}} \cdot {p_i}^\frac{-L_i+1}{2}. \] As \( \Pr(L=+1 | S) = \frac{\Pr(S | L=+1)\cdot \Pr(L=+1)}{\Pr(S)}\) and \( \Pr(L=-1 | S) = \frac{\Pr(S | L=-1)\cdot \Pr(L=-1)}{\Pr(S)}\), and given our assumption that \( \Pr(L=+1) = \Pr(L=-1) \), we need only compute \( \Pr(S | L=+1) \), \( \Pr(S | L=-1) \) and normalize.
We'll now take logs and derive a form similar to AdaBoost. For \( L_{+1} = \log\left( \Pr(S | L=+1) \right) \) this gives us \[ L_{+1} = \sum_i \frac{L_i+1}{2}\log{(p_i)} + \frac{-L_i+1}{2}\log{(1-p_i)}.\] Rearranging, we get \[ L_{+1} = \sum_i \frac{L_i}{2}\log{\left( \frac{p_i}{1-p_i}\right)} + \frac{1}{2}\log{\left( p_i(1-p_i)\right)}.\] Similarly, for \( L_{-1} = \log\left( \Pr(S | L=-1) \right) \) we get \[ L_{-1} = \sum_i -\frac{L_i}{2}\log{\left( \frac{p_i}{1-p_i}\right)} + \frac{1}{2}\log{\left( p_i(1-p_i)\right)}.\] Note that both expressions include the same term \( \sum_i \frac{1}{2}\log{\left( p_i(1-p_i)\right)}\). Upon exponentiation this would multiply \( \Pr(S | L=+1) \) and \( \Pr(S | L=-1) \) by the same factor, so we can ignore it, since to recover the probabilities we'll need to normalize anyway. Thus we end up with a linear classifier of the AdaBoost form \[ C(S) = \sum_i \frac{L_i}{2}\log{\left( \frac{p_i}{1-p_i}\right)}. \] If \( C(S) \) is positive, the classifier's label is +1; if \( C(S) \) is negative, the classifier's label is -1. Furthermore, we can recover the classifier's probabilities by exponentiating and normalizing.
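Here's a matching sketch of the resulting linear classifier: it computes \( C(S) \) using the weights \( \frac{1}{2}\log\left(\frac{p_i}{1-p_i}\right) \), reads off the sign as the label, and recovers the posterior by exponentiating and normalizing. Again, the accuracies and responses are the same hypothetical values as above.

```python
import math

p = [0.8, 0.7, 0.6]      # hypothetical expert accuracies p_i
labels = [+1, +1, -1]    # hypothetical expert responses L_i

# C(S) = sum_i (L_i / 2) * log(p_i / (1 - p_i))
C = sum((Li / 2) * math.log(pi / (1 - pi)) for pi, Li in zip(p, labels))
prediction = +1 if C > 0 else -1

# Dropping the shared terms leaves log-likelihoods of +C and -C, so
# exponentiating and normalizing recovers Pr(L=+1 | S).
pr_pos = math.exp(C) / (math.exp(C) + math.exp(-C))
print(prediction, pr_pos)
```

With these numbers the recovered probability agrees with the normalized posterior from the previous sketch, as the derivation says it should.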