In my previous post I looked at how a group of experts may be combined into a single, more powerful classifier, which I call NaiveBoost after the related AdaBoost. I'll illustrate how it can be used with a few examples.
As before, we're faced with making a binary decision, which we can view as an unknown label \( L \in \{ +1, -1 \}\). Furthermore, the prior distribution on \( L \) is assumed to be uniform. Let our experts' independent probabilities be \( p_1 = 0.8, p_2 = 0.7, p_3 = 0.6\) and \(p_4 = 0.5\). Our combined NaiveBoost classifier is \[ C(S) = \sum_i \frac{L_i}{2}\log{\left( \frac{p_i}{1-p_i}\right)},\] where \( S = \{ L_i \} \). A few things to note are that \( \log{\left( \frac{p_i}{1-p_i}\right)} \) is \( {\rm logit}( p_i )\), and an expert with \( p = 0.5 \) contributes 0 to our classifier. This latter observation is what we'd expect, as \( p = 0.5 \) is random guessing. Also, experts with probabilities \( p_i \) and \( p_j \) such that \( p_i = 1 - p_j \) are equivalent if we replace the stated label \( L_j \) with \( -L_j \).
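To make the formula concrete, here's a minimal Python sketch of the classifier. The function name naive_boost and its arguments are just my illustrative choices, not anything from an existing library.

```python
from math import log

def naive_boost(labels, probs):
    """Weighted log-odds score C(S) for a set of independent experts.

    labels: each expert's stated label, +1 or -1
    probs:  each expert's probability of being correct
    """
    return sum(l / 2 * log(p / (1 - p)) for l, p in zip(labels, probs))
```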
Ignoring the last (uninformative) expert, we end up with the combined classifier \[ C(S) = \frac{L_1}{2} \log\left(4\right) + \frac{L_2}{2} \log\left(\frac{7}{3}\right) + \frac{L_3}{2} \log\left( \frac{3}{2} \right).\] If the overall value is positive, the classifier's label is \( L = +1\); if it's negative, the classifier's label is \( L = -1 \). Note that the base of the logarithm doesn't matter, and we could also drop the factor of \( \frac{1}{2} \), as neither changes the sign of \( C(S) \). However, the factor of \( \frac{1}{2} \) must be kept if we want to properly recover the actual combined probability via normalization.
Now say \( L_1 = -1, L_2 = +1, L_3 = +1 \). What's our decision? Doing the math, we get \( C(S) = \frac{1}{2} \log{ \left( \frac{7}{8} \right) } \), and as \( 7 < 8 \), \( C(S) < 0 \) and our combined classifier says \( L = -1\). If we wanted to recover the probability, note \[ \exp\left( \frac{1}{2} \log \left( \frac{7}{8} \right) \right) = {\left( \frac{7}{8} \right)}^{1/2},\] hence our classifier states \[ {\rm Pr}(L = +1 | S ) = \frac{ {\left( \frac{7}{8} \right)}^{1/2} }{ {\left( \frac{7}{8} \right)}^{1/2} + {\left( \frac{7}{8} \right)}^{-1/2} } = \frac{ \frac{7}{8} }{ \frac{7}{8} + 1 } = \frac{7}{15}, \] and of course \( {\rm Pr}(L = -1 | S ) = \frac{8}{15}\).
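Reusing the naive_boost sketch above (still just an illustration), this first example works out numerically like so:

```python
from math import exp

probs = [0.8, 0.7, 0.6]   # the p = 0.5 expert contributes 0, so it's dropped
labels = [-1, +1, +1]

c = naive_boost(labels, probs)           # = (1/2) * log(7/8) < 0, so label -1
pr_plus = exp(c) / (exp(c) + exp(-c))    # = 7/15, about 0.467
```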
As a second example, consider the @CutTheKnotMath puzzle of two liars on an island. Here we have A and B, each of whom lies with probability 2/3 and tells the truth with probability 1/3. A makes a statement and B confirms that it's true. What's the probability that A's statement is actually truthful? We can solve this in a deliberately complicated way by observing that it's equivalent to an ensemble of experts, where \( L \in \{ +1, -1 \} \), the prior on \( L \) is uniform and \( L_1 = L_2 = +1\). The probability that \( L = +1 \) is precisely the probability that A is telling the truth.
Following the first example, each expert's probability is \( p_1 = p_2 = \frac{1}{3} \), so \( {\rm logit}\left(\frac{1}{3}\right) = \log\left(\frac{1}{2}\right) \) and \[ C(S) = \frac{L_1}{2}\log\left( \frac{1}{2} \right) + \frac{L_2}{2}\log\left( \frac{1}{2} \right) = \log\left( \frac{1}{2} \right).\] Continuing as before, we get \[ {\rm Pr}(L = +1 | S ) = \frac{ \frac{1}{2} }{ \frac{1}{2} + 2 } = \frac{1}{5}.\]
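The same sketch reproduces the puzzle's answer, treating A and B as two experts who are each correct with probability 1/3:

```python
from math import exp

c = naive_boost([+1, +1], [1/3, 1/3])    # = log(1/2)
pr_truth = exp(c) / (exp(c) + exp(-c))   # = (1/2) / (1/2 + 2) = 1/5
```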
In the event tree there are two cases where B says A's statement is true: one with probability 1/9 (A tells the truth and B truthfully confirms it, 1/3 * 1/3) and the other with probability 4/9 (A lies and B also lies by confirming it, 2/3 * 2/3). So the probability that A is really telling the truth, given that B says it is true, is (1/9)/(1/9 + 4/9) = 1/5.
Yes, that's simpler. I used this approach to illustrate how a puzzle such as this is actually related to a more complicated idea such as ensembles in machine learning.
Understood. Just checking the math. I don't know anything about machine learning...