### An Island of Liars is an Ensemble of Experts

In my previous post I looked at how a group of experts may be combined into a single, more powerful classifier, which I call NaiveBoost after the related AdaBoost. I'll illustrate how it can be used with a few examples.

As before, we're faced with making a binary decision, which we can view as an unknown label $$L \in \{ +1, -1 \}$$. Furthermore, the prior distribution on $$L$$ is assumed to be uniform. Let our experts' independent probabilities be $$p_1 = 0.8, p_2 = 0.7, p_3 = 0.6$$ and $$p_4 = 0.5$$. Our combined NaiveBoost classifier is $C(S) = \sum_i \frac{L_i}{2}\log{\left( \frac{p_i}{1-p_i}\right)},$ where $$S = \{ L_i \}$$. A few things to note are that $$\log{\left( \frac{p_i}{1-p_i}\right)}$$ is $${\rm logit}( p_i )$$, and an expert with $$p = 0.5$$ contributes 0 to our classifier. This latter observation is what we'd expect, as $$p = 0.5$$ is random guessing. Also, experts with probabilities $$p_i$$ and $$p_j$$ such that $$p_i = 1 - p_j$$ are equivalent if we replace the stated label $$L_j$$ with $$-L_j$$.
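To make this concrete, here's a minimal Python sketch of the combined classifier. The function name `naive_boost_score` is my own label for the formula above, not anything from the earlier post:

```python
from math import log

def naive_boost_score(probs, labels):
    """Combined NaiveBoost score C(S) = sum_i (L_i / 2) * logit(p_i).

    probs[i] is expert i's probability of being correct; labels[i] is the
    label (+1 or -1) that expert i reports. An expert with p = 0.5 has
    logit(0.5) = log(1) = 0 and so contributes nothing to the sum.
    """
    return sum(l / 2 * log(p / (1 - p)) for p, l in zip(probs, labels))
```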

Ignoring the last (uninformative) expert, we end up with the combined classifier $C(S) = \frac{L_1}{2} \log\left(4\right) + \frac{L_2}{2} \log\left(\frac{7}{3}\right) + \frac{L_3}{2} \log\left( \frac{3}{2} \right).$ If the overall value is positive, the classifier's label is $$L = +1$$; if it's negative, the classifier's label is $$L = -1$$. Note the base of the logarithm doesn't matter and we could also ignore the factor of $$\frac{1}{2}$$, as these don't change the sign of $$C(S)$$. However, the factor of $$\frac{1}{2}$$ must be left in if we want the ability to properly recover the actual combined probability via normalization.

Now say $$L_1 = -1, L_2 = +1, L_3 = +1$$. What's our decision? Doing the math, we get $$C(S) = \frac{1}{2} \log{ \left( \frac{7}{8} \right) }$$, and as $$7 < 8$$, $$C(S) < 0$$ and our combined classifier says $$L = -1$$. If we wanted to recover the probability, note $\exp\left( \frac{1}{2} \log \left( \frac{7}{8} \right) \right) = {\left( \frac{7}{8} \right)}^{1/2},$ hence our classifier states ${\rm Pr}(L = +1 | S ) = \frac{ {\left( \frac{7}{8} \right)}^{1/2} }{ {\left( \frac{7}{8} \right)}^{1/2} + {\left( \frac{7}{8} \right)}^{-1/2} } = \frac{ \frac{7}{8} }{ \frac{7}{8} + 1 } = \frac{7}{15},$ and of course $${\rm Pr}(L = -1 | S ) = \frac{8}{15}$$.
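As a check on the arithmetic, running the sketch from above on this example recovers both the score and the normalized probability:

```python
from math import exp

probs = [0.8, 0.7, 0.6]   # drop the p = 0.5 expert; it contributes 0 anyway
labels = [-1, +1, +1]

c = naive_boost_score(probs, labels)
p_plus = exp(c) / (exp(c) + exp(-c))   # normalize to get Pr(L = +1 | S)
print(c)       # 0.5 * log(7/8), about -0.0668
print(p_plus)  # 7/15, about 0.4667
```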

As a second example, consider the @CutTheKnotMath puzzle of two liars on an island. Here we have A and B, each of whom lies with probability 2/3 and tells the truth with probability 1/3. A makes a statement and B confirms that it's true. What's the probability that A's statement is actually truthful? We can solve this in a complicated way by observing that this is equivalent to an ensemble of experts, where $$L \in \{ +1, -1 \}$$, the prior on $$L$$ is uniform and $$L_1 = L_2 = +1$$. The probability that $$L = +1$$ is precisely the probability that A is telling the truth.

Following the first example, $C(S) = \frac{L_1}{2}\log\left( \frac{1}{2} \right) + \frac{L_2}{2}\log\left( \frac{1}{2} \right) = \log\left( \frac{1}{2} \right).$ Continuing as before, we get ${\rm Pr}(L = +1 | S ) = \frac{ \frac{1}{2} }{ \frac{1}{2} + 2 } = \frac{1}{5}.$
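The same sketch reproduces the puzzle's answer, treating each islander as an expert who is correct with probability 1/3:

```python
c = naive_boost_score([1/3, 1/3], [+1, +1])
print(exp(c) / (exp(c) + exp(-c)))   # 0.2, i.e. 1/5
```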

1. In the event tree there are two cases where B says the statement is true: one with probability 1/9 (A tells the truth and B truthfully confirms it, 1/3 * 1/3), and the other with probability 4/9 (A lies and B also lies by confirming it, 2/3 * 2/3). So the probability that A is really telling the truth given that B says it's true is (1/9)/(1/9 + 4/9), or 1/5.

1. Yes, that's simpler. I used this approach to illustrate how a puzzle such as this is actually related to a more complicated idea such as ensembles in machine learning.

2. Understood. Just checking the math. I don't know anything about machine learning...

### A Bayes' Solution to Monty Hall

For any problem involving conditional probabilities, one of your greatest allies is Bayes' Theorem. Bayes' Theorem says that for two events A and B, the probability of A given B can be computed from the probability of B given A together with the individual probabilities of A and B.

Standard notation:

- the probability of A given B is written $$\Pr(A \mid B)$$
- the probability of B is written $$\Pr(B)$$

Bayes' Theorem:

Using the notation above, Bayes' Theorem can be written:

$\Pr(A \mid B) = \frac{\Pr(B \mid A)\times \Pr(A)}{\Pr(B)}$

Let's apply Bayes' Theorem to the Monty Hall problem. If you recall, we're told that behind three doors there are two goats and one car, all randomly placed. We initially choose a door, and then Monty, who knows what's behind the doors, always shows us a goat behind one of the remaining doors. He can always do this as there are two goats; if we chose the car initially, Monty picks one of the two doors with a goat behind it at random.

Assume we pick Door 1 and then Monty sho…
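A quick way to sanity-check the Bayes' computation is simulation. Here's a minimal sketch under the rules stated above; the helper name `monty_trial` is my own:

```python
import random

def monty_trial():
    """One round of Monty Hall: we always pick Door 1 and then switch."""
    car = random.randrange(3)    # car placed uniformly at random
    pick = 0                     # we initially pick Door 1 (index 0)
    # Monty opens a remaining door with a goat
    # (a random choice between two goat doors if we hold the car).
    opened = random.choice([d for d in range(3) if d != pick and d != car])
    switched = next(d for d in range(3) if d not in (pick, opened))
    return switched == car       # did switching win the car?

trials = 100_000
print(sum(monty_trial() for _ in range(trials)) / trials)  # about 2/3
```

Switching wins roughly 2/3 of the time, and sticking with Door 1 wins the complementary 1/3, matching what Bayes' Theorem gives.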

### Notes on Setting up a Titan V under Ubuntu 17.04

I recently purchased a Titan V GPU to use for machine and deep learning, and in the process of installing the latest Nvidia drivers I hosed my Ubuntu 16.04 install. I was overdue for a fresh install of Linux anyway, so I decided to upgrade some of my drives at the same time. Here are some of my notes on the process I went through to get the Titan V working perfectly with TensorFlow 1.5 under Ubuntu 17.04.

Old install:

- Ubuntu 16.04
- EVGA GeForce GTX Titan SuperClocked 6GB
- 2TB Seagate NAS HDD

New install:

- Ubuntu 17.04
- Titan V 12GB
- / partition on a 250GB Samsung 840 Pro SSD (had an extra around)
- /home partition on a new 1TB Crucial MX500 SSD
- New WD Blue 4TB HDD
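Once everything is installed, one quick way to verify that TensorFlow 1.5 actually sees the card is to list the local devices; this sketch assumes the TensorFlow 1.x `device_lib` API:

```python
# Sanity check: TensorFlow 1.x should report the Titan V as a GPU device.
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)                    # expect 1.5.x
print(device_lib.list_local_devices())   # should include a device of type GPU
```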