Skip to main content


Showing posts from June, 2012

Why does Kaggle use Log-loss?

A Bayes' Solution to Monty Hall

For any problem involving conditional probabilities one of your greatest allies is Bayes' Theorem. Bayes' Theorem says that for two events A and B, the probability of A given B is related to the probability of B given A in a specific way.

Standard notation:

probability of A given B is written \( \Pr(A \mid B) \)
probability of B is written \( \Pr(B) \)

Bayes' Theorem:

Using the notation above, Bayes' Theorem can be written: \[ \Pr(A \mid B) = \frac{\Pr(B \mid A)\times \Pr(A)}{\Pr(B)} \]Let's apply Bayes' Theorem to the Monty Hall problem. If you recall, we're told that behind three doors there are two goats and one car, all randomly placed. We initially choose a door, and then Monty, who knows what's behind the doors, always shows us a goat behind one of the remaining doors. He can always do this as there are two goats; if we chose the car initially, Monty picks one of the two doors with a goat behind it at random.

Assume we pick Door 1 and then Monty sho…

The Infamous Monty Hall

A (Brief) History:

The Monty Hall problem is arguably the most infamous probability puzzle in recent history. It was originally proposed in its current form in 1975, but only really surged into the public spotlight in 1990 when in appeared in a Parade column written by Marilyn vos Savant. For more details on the history of this problem see Wikipedia: Monty Hall problem.

The Original:

As given in Marilyn's column the problem read:
Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1 [but the door is not opened], and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice? The phrasing here is ambiguous. Does Monty always show you a goat? Does he only show you a goat when switching would lead to a win? Does he only show you a …

Another Nice Application of Difference Equations

Here's a nice problem I encountered in a course on applied stochastic processes.


Show that the set of all pairs of positive integers can be placed into one-one correspondence with the positive integers by giving an explicit one-one mapping between the two sets.


This can be done expediently using the theory of partial difference equations. A standard diagonalization can be characterized by the following relations:
\( f(1,1)=1 \)\( f(x,y) = f(x-1,y)+x+y \)\( f(x,y) = f(x,y-1)+x+y-1 \) These can be written in difference notation as:
\( f(1,1)=1 \)\( \frac{\Delta f}{\Delta x} = x+y \)\( \frac{\Delta f}{\Delta y} = x+y-1 \) Equation 2 gives \( \frac{{\Delta}^2 f}{{\Delta x}{\Delta y}} = 1 \) and equation 3 gives \( \frac{{\Delta}^2 f}{{\Delta y}{\Delta x}} = 1, \) so this is an exact partial difference equation. Summing using equation 2, \[ f(x,y) = x(x+1)/2 + xy + g(y). \] Differencing and setting this equal to equation 3, \[ x+\frac{\Delta g}{\Delta y} = x+y-1 \] and s…

Difference Equations

Difference equations are frequently covered in courses on algorithms due to the relationship to recursion. Believe it or not, I first learned the basics of difference equations from George Boole's book (of Boolean algebra fame). This is freely available either as a download or to read online.

The methods that are the most useful for solving linear difference equations are use of the characteristic equation or applying the Z-transform. These are the discrete analogues of the characteristic equation and Laplace transform in differential equations.


A Treatise on the Calculus of Finite DifferencesWikipedia: Recurrence relationConcrete MathematicsMathematics for the Analysis of Algorithms

A Dicey Problem: Solution

Problem: Waldo rolls one standard (fair) 6-sided die repeatedly, keeping a running total of what he rolls. To three decimal places, what's the probability that he eventually reaches a total of exactly 1,000,000?

Solution: On average Waldo will be adding \( \mu = (1+2+3+4+5+6)/6 = 3.5 \) per roll, so we would "intuitively" or "naively" expect that the probability of hitting a particular total that's large enough would be close to \( 1/3.5 \). This turns out to be correct.
More generally, if we have any finite set of positive integers with no common factor, then the probability of hitting a particular total for large enough numbers is \( 1/\mu \), where \( \mu \) is the expected value for a single roll. Note that the probabilities associated with the positive integers in our set don't need to be the same, either, just greater than zero and summing to \( 1 \).

Proof: I'll show how you can prove this rigorously for the case where the outcomes are \( 1 \) …

Why Is the Statistician Angry?

Because all the good blog names were taken. And all the bad blog names were taken. In fact, all of the blog names were taken.

Attempts that were made:

threequarks (taken)the initial letters of all six quarks in order (taken)amanaplanacanalpanama (taken)hyperreal (taken)octonion (taken)overthruster (taken)oscillationoverthruster (taken)poincare (taken)galois (taken)logistic (taken)seaquarks (taken; really?)monkeyrage (taken; really?)

Dicier and Dicier

The Easy Way: Waldo and Basil are playing a game involving a single normal die. They take turns rolling it, adding the number appearing to their score, and the winner is the first person to total 100 points or more (totals greater than 100 are counted as 100). During the course of a game, which number is the least likely to appear among either Waldo's or Basil's subtotals?

The Hard Way: Waldo and Basil get tired of that game, and instead decide to play a similar game involving throwing two regular dice with the marathon goal of scoring 10,000 points or more. Estimate the probability Waldo, Basil, or both score exactly 9,000 points during the course of the game.

Testing MathJax

Right click on any equation for additional options.

\[ e = mc^2 \]
\[ \frac{r^2}{r^2+s^2} \]
\[ \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{{\pi}^2}{6} \]
\[ \int_{0}^{\infty} e^{-x}\, dx = 1 \]
\[ \int_{x^2 + y^2 \leq R^2} f(x,y)\,dx\,dy \]
\[ P(E) = {n \choose k} p^k (1-p)^{n-k} \]
\[ J_\alpha(x) = \sum\limits_{m=0}^\infty \frac{(-1)^m}{m! \, \Gamma(m + \alpha + 1)}{\left({\frac{x}{2}}\right)}^{2 m + \alpha} \]
\[\mathbf{V}_1 \times \mathbf{V}_2 = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial X}{\partial u} & \frac{\partial Y}{\partial u} & 0 \\ \frac{\partial X}{\partial v} & \frac{\partial Y}{\partial v} & 0 \end{vmatrix} \]

A Dicey Problem

Problem: Waldo rolls one standard (fair) 6-sided die repeatedly, keeping a running total of what he rolls. To three decimal places, what's the probability that he eventually reaches a total of exactly 1,000,000?

Hint: If you use your intuition the answer is a single division.

Bonus: Rigorously prove that what you did is correct.