Skip to main content

Posts

Showing posts from April, 2018

An Island of Liars is an Ensemble of Experts

In my previous post I looked at how a group of experts may be combined into a single, more powerful, classifier which I call NaiveBoost  after the related AdaBoost . I'll illustrate how it can be used with a few examples. As before, we're faced with making a binary decision, which we can view as an unknown label \( L \in \{ +1, -1 \}\). Furthermore, the prior distribution on \( L \) is assumed to be uniform. Let our experts' independent probabilities be \( p_1 = 0.8, p_2 = 0.7, p_3 = 0.6\) and \(p_4 = 0.5\). Our combined NaiveBoost classifier is \[ C(S) = \sum_i \frac{L_i}{2}\log{\left( \frac{p_i}{1-p_i}\right)},\] where \( S = \{ L_i \} \). A few things to note are that \( \log{\left( \frac{p_i}{1-p_i}\right)} \) is \( {\rm logit}( p_i )\), and an expert with \( p = 0.5 \) contributes 0 to our classifier. This latter observation is what we'd expect, as \( p = 0.5 \) is random guessing. Also, experts with probabilities \( p_i \) and \( p_j \) such that \( p_i = 1 - p_...

Combining Expert Opinions: NaiveBoost

In many situations we're faced with multiple expert opinions. How should we combine them together into one opinion, hopefully better than any single opinion? I'll demonstrate the derivation of a classifier I'll call NaiveBoost. We'll start with a simple situation, and later gradually introduce more complexity. Let each expert state a yes or no opinion in response to a yes/no question (binary classifiers), each expert be independent of the other experts and assume expert \(i\) is correct with probability \(p_i\). We'll also assume that the prior distribution on whether the correct answer is yes or no to be uniform, i.e. each occurs with probability 0.5. Label a "yes" as +1, and "no" as -1. We ask our question, which has some unknown +1/-1 answer \(L\), and get back a set of responses (labels) \(S = \{L_i \}\), where \(L_i\) is the response from expert \(i\). Observe we have \[ \Pr(S | L=+1) = \prod_{i} {p_i}^{\frac{L_i+1}{2}} \cdot {(1-p_i)}^\...

Simplified Multinomial Kelly

Here's a simplified version for optimal Kelly bets when you have multiple outcomes (e.g. horse races). The Smoczynski & Tomkins algorithm, which is explained here (or in the original paper): https://en.wikipedia.org/wiki/Kelly_criterion#Multiple_horses Let's say there's a wager that, for every $1 you bet, will return a profit of $b if you win. Let the probability of winning be \(p\), and losing be \(q=1-p\). The original Kelly criterion says to wager only if \(b\cdot p-q > 0\) (the expected value is positive), and in this case to wager a fraction \( \frac{b\cdot p-q}{b} \) of your bankroll. But in a horse race, how do you decide which set of outcomes are favorable to bet on? It's tricky, because these wagers are mutually exclusive i.e. you can win at most one. It turns out there's a simple and intuitive method to find which bets are favorable: 1) Look at \( b\cdot p-q\) for every horse. 2) Pick any horse for which \( b\cdot p-q > 0\) and mar...