Combining Expert Opinions: NaiveBoost

In many situations we're faced with multiple expert opinions. How should we combine them into a single opinion, ideally one better than any individual opinion? I'll walk through the derivation of a classifier I call NaiveBoost.

We'll start with a simple situation and gradually introduce more complexity later. Suppose each expert gives a yes or no answer to a yes/no question (i.e., each expert is a binary classifier), the experts' answers are independent of one another given the true answer, and expert $$i$$ is correct with probability $$p_i$$. We'll also assume a uniform prior on the correct answer, i.e., yes and no each occur with probability 0.5.

Label a "yes" as +1 and a "no" as -1. We ask our question, which has some unknown +1/-1 answer $$L$$, and get back a set of responses (labels) $$S = \{L_i\}$$, where $$L_i$$ is the response from expert $$i$$. Since $$\frac{L_i+1}{2}$$ is 1 when $$L_i = +1$$ and 0 when $$L_i = -1$$ (and $$\frac{-L_i+1}{2}$$ is the reverse), we have

$$\Pr(S \mid L=+1) = \prod_{i} p_i^{\frac{L_i+1}{2}} \cdot (1-p_i)^{\frac{-L_i+1}{2}}$$

and also

$$\Pr(S \mid L=-1) = \prod_{i} (1-p_i)^{\frac{L_i+1}{2}} \cdot p_i^{\frac{-L_i+1}{2}}.$$

By Bayes' Theorem, $$\Pr(L=+1 \mid S) = \frac{\Pr(S \mid L=+1)\cdot \Pr(L=+1)}{\Pr(S)}$$ and $$\Pr(L=-1 \mid S) = \frac{\Pr(S \mid L=-1)\cdot \Pr(L=-1)}{\Pr(S)}$$. Given our assumption that $$\Pr(L=+1) = \Pr(L=-1)$$, both posteriors share the same factor $$\Pr(L)/\Pr(S)$$, so we need only compute $$\Pr(S \mid L=+1)$$ and $$\Pr(S \mid L=-1)$$ and normalize them to sum to one.
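As a minimal sketch of this compute-then-normalize step, the Python below evaluates $$\Pr(L=+1 \mid S)$$ directly from the two likelihoods. The function name `posterior_plus` is hypothetical, and the code assumes the uniform prior and conditionally independent experts described above (it uses `math.prod`, available from Python 3.8).

```python
from math import prod


def posterior_plus(labels, probs):
    """Pr(L=+1 | S) for conditionally independent experts under a uniform prior.

    labels: expert responses L_i, each +1 or -1
    probs:  p_i, the probability that expert i answers correctly
    """
    # Pr(S | L=+1): expert i says +1 with prob p_i and -1 with prob 1 - p_i,
    # which is exactly what p_i^((L_i+1)/2) * (1-p_i)^((-L_i+1)/2) selects.
    like_plus = prod(p if vote == 1 else 1 - p for vote, p in zip(labels, probs))
    # Pr(S | L=-1): the roles of p_i and 1 - p_i are swapped.
    like_minus = prod(1 - p if vote == 1 else p for vote, p in zip(labels, probs))
    # With equal priors, both posteriors share the factor Pr(L)/Pr(S),
    # so normalizing the two likelihoods gives the posterior directly.
    return like_plus / (like_plus + like_minus)
```

For example, with three experts of accuracy 0.9, 0.6 and 0.6 answering +1, -1 and -1, the likelihoods are $$0.9 \cdot 0.4 \cdot 0.4 = 0.144$$ and $$0.1 \cdot 0.6 \cdot 0.6 = 0.036$$, so `posterior_plus([1, -1, -1], [0.9, 0.6, 0.6])` returns approximately 0.8.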

We'll now take logs and derive a form similar to AdaBoost. Writing $$\ell_{+1} = \log \Pr(S \mid L=+1)$$, we get

$$\ell_{+1} = \sum_i \left[ \frac{L_i+1}{2}\log p_i + \frac{-L_i+1}{2}\log(1-p_i) \right].$$

Grouping the terms multiplied by $$L_i$$ separately from the rest, this becomes

$$\ell_{+1} = \sum_i \left[ \frac{L_i}{2}\log\left( \frac{p_i}{1-p_i}\right) + \frac{1}{2}\log\left( p_i(1-p_i)\right) \right].$$

Similarly, for $$\ell_{-1} = \log \Pr(S \mid L=-1)$$ we get

$$\ell_{-1} = \sum_i \left[ -\frac{L_i}{2}\log\left( \frac{p_i}{1-p_i}\right) + \frac{1}{2}\log\left( p_i(1-p_i)\right) \right].$$

Both expressions contain the same term $$\sum_i \frac{1}{2}\log\left( p_i(1-p_i)\right)$$. Upon exponentiation it multiplies $$\Pr(S \mid L=+1)$$ and $$\Pr(S \mid L=-1)$$ by the same factor, and since we must normalize anyway to recover the probabilities, we can drop it. We end up with a linear classifier of the AdaBoost form

$$C(S) = \sum_i \frac{L_i}{2}\log\left( \frac{p_i}{1-p_i}\right).$$

If $$C(S)$$ is positive, the classifier's label is +1; if $$C(S)$$ is negative, the label is -1. We can also recover the classifier's probabilities by exponentiating and normalizing: the unnormalized likelihoods are $$e^{C(S)}$$ and $$e^{-C(S)}$$, so $$\Pr(L=+1 \mid S) = \frac{e^{C(S)}}{e^{C(S)} + e^{-C(S)}} = \frac{1}{1 + e^{-2C(S)}}.$$
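The log-odds form can be sketched in Python as a weighted vote, where each expert's vote is weighted by $$\frac{1}{2}\log\frac{p_i}{1-p_i}$$ and the posterior comes from $$1/(1 + e^{-2C(S)})$$. This is a minimal sketch: the function names are hypothetical, every $$p_i$$ is assumed to lie strictly between 0 and 1 so the logarithm is finite, and a score of exactly zero is reported as a tie (label 0), since the rule above only covers positive and negative scores.

```python
import math


def naiveboost_score(labels, probs):
    """C(S) = sum_i (L_i / 2) * log(p_i / (1 - p_i)).

    labels: expert responses L_i, each +1 or -1
    probs:  expert accuracies p_i, each strictly between 0 and 1
    """
    return sum(vote / 2 * math.log(p / (1 - p)) for vote, p in zip(labels, probs))


def naiveboost_predict(labels, probs):
    """Return (label, Pr(L=+1 | S)); label is 0 when C(S) == 0 (a tie)."""
    c = naiveboost_score(labels, probs)
    # Unnormalized likelihoods are e^{C} and e^{-C}; normalizing gives
    # Pr(L=+1 | S) = e^C / (e^C + e^{-C}) = 1 / (1 + e^{-2C}).
    prob_plus = 1 / (1 + math.exp(-2 * c))
    # The sign of C(S) decides the label; a zero score is reported as a tie.
    label = 1 if c > 0 else (-1 if c < 0 else 0)
    return label, prob_plus
```

For example, with three experts of accuracy 0.9, 0.6 and 0.6 answering +1, -1 and -1, $$C(S) = \frac{1}{2}(\log 9 - 2\log 1.5) = \log 2$$, so `naiveboost_predict` returns label +1 with $$\Pr(L=+1 \mid S) = 1/(1 + e^{-2\log 2}) = 0.8$$. The single 90%-accurate expert outweighs the two 60%-accurate experts, where a simple majority vote would say -1. Note also that an expert with $$p_i = 0.5$$ gets zero weight, and one with $$p_i < 0.5$$ gets a negative weight, so its vote is effectively flipped.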

A Bayes' Solution to Monty Hall

For any problem involving conditional probabilities, one of your greatest allies is Bayes' Theorem. For two events A and B, Bayes' Theorem relates the probability of A given B to the probability of B given A.

Standard notation:

probability of A given B is written $$\Pr(A \mid B)$$
probability of B is written $$\Pr(B)$$

Bayes' Theorem:

Using the notation above, Bayes' Theorem can be written:

$$\Pr(A \mid B) = \frac{\Pr(B \mid A)\times \Pr(A)}{\Pr(B)}$$

Let's apply Bayes' Theorem to the Monty Hall problem. Recall that behind three doors there are two goats and one car, all placed at random. We initially choose a door, and then Monty, who knows what's behind the doors, always shows us a goat behind one of the two remaining doors. He can always do this because there are two goats; if we initially chose the car, Monty picks one of the two goat doors at random.
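To make the recipe concrete, here is a minimal Python sketch that applies Bayes' Theorem to every door, with $$A$$ the event "the car is behind door $$d$$" and $$B$$ the event "Monty opens a given door". The function name and the specific scenario (we pick Door 1, Monty opens Door 3) are illustrative assumptions; exact fractions avoid rounding.

```python
from fractions import Fraction


def monty_posterior(opened, pick=1, doors=(1, 2, 3)):
    """Pr(car behind door d | Monty opens door `opened`) for each door d.

    Assumes the car is placed uniformly at random, Monty never opens our
    door or the car's door, and he chooses uniformly when two goat doors
    are available to him.
    """
    if opened == pick:
        raise ValueError("Monty never opens the door we picked")
    # Pr(A): uniform prior over the car's location
    prior = {d: Fraction(1, len(doors)) for d in doors}

    def likelihood(car):
        # Pr(B | A): probability Monty opens `opened` given the car is at `car`
        if opened == car:
            return Fraction(0)
        choices = [d for d in doors if d not in (pick, car)]
        return Fraction(1, len(choices))

    # Pr(B): total probability that Monty opens this door
    evidence = sum(likelihood(d) * prior[d] for d in doors)
    # Bayes' Theorem: Pr(A | B) = Pr(B | A) * Pr(A) / Pr(B)
    return {d: likelihood(d) * prior[d] / evidence for d in doors}
```

Calling `monty_posterior(opened=3)` gives probability 1/3 for Door 1, 2/3 for Door 2 and 0 for Door 3: Monty opens Door 3 with probability 1/2 if the car is behind Door 1 but with probability 1 if it is behind Door 2, so switching doubles our chance of winning the car.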

Assume we pick Door 1 and then Monty sho…

Notes on Setting up a Titan V under Ubuntu 17.04

I recently purchased a Titan V GPU for machine learning and deep learning, and installing the latest Nvidia drivers for it hosed my Ubuntu 16.04 install. I was overdue for a fresh install of Linux anyway, so I decided to upgrade some of my drives at the same time. Here are my notes on the process I went through to get the Titan V working perfectly with TensorFlow 1.5 under Ubuntu 17.04.

Old install:
Ubuntu 16.04
EVGA GeForce GTX Titan SuperClocked 6GB
2TB Seagate NAS HDD

New install:
Ubuntu 17.04
Titan V 12GB
/ partition on a 250GB Samsung 840 Pro SSD (I had a spare one around)
/home partition on a new 1TB Crucial MX500 SSD
New WD Blue 4TB HDD