Skip to main content

Posts

Showing posts from 2014

Why does Kaggle use Log-loss?

A Strange Recursive Relation, Automatic

Hofstadter mentions the following recursive relation in his great book "Gödel, Escher, Bach": \[
\begin{align}
g(0) &= 0;\\
g(n) &= n-g(g(n-1)).
\end{align}
\] I claim that \( g(n) = \left\lfloor \phi\cdot (n+1) \right\rfloor \), where \( \phi = \frac{-1+\sqrt{5}}{2}\), and I'll show this using a technique that makes proving many identities of this type nearly automatic.

Let \( \phi\cdot n = \left\lfloor \phi\cdot n \right\rfloor + e\), where \( 0 < e < 1\) as \( \phi \) is irrational, nor can \(e = 1-\phi\), and note that \(\phi\) satisfies \( {\phi}^2 + \phi - 1 = 0\). Some algebra gives \[
\begin{align}
n-\left\lfloor \left( \left\lfloor \phi\cdot n \right\rfloor + 1 \right) \cdot \phi \right\rfloor
&= n-\left\lfloor \left( n\cdot \phi - e + 1 \right) \cdot \phi \right\rfloor \\
&= n-\left\lfloor n\cdot {\phi}^2 - e\cdot \phi + \phi \right\rfloor \\
&= n-\left\lfloor n\cdot \left(1-\phi\right) - e\cdot \phi + \phi \right\rfloor \\
&= n-n-…

Closed Under Means

Here's a nice little problem from the 13th All Soviet Union Mathematics Olympiad.
Given a set of real numbers \(S\) containing 0 and 1 that's closed under finite means, show that it contains all rational numbers in the interval \(\left[0,1\right]\). This isn't a difficult problem, but there's a particularly nice solution.

First observe that if \(x\in S\) then both \(\frac{x}{4}\) and \(\frac{3x}{4}\) are in \(S\); average \(\{0,x\}\) to get \(\frac{x}{2}\), average \(\{0, \frac{x}{2}\}\) to get \(\frac{x}{4}\), average \(\{\frac{x}{2}, x\}\) to get \(\frac{3x}{4}\).

We could show any rational number \(\frac{m}{n}\) with \(1\leq m < n\) is in \(S\) if we had \(n\) distinct elements from \(S\) that summed to \(m\). Let's exhibit one.

Start with an array with \(m\) 1s on the left, \(n-m\) 0s on the right. Repeatedly replace adjacent \(x,y\) values with \(\frac{3(x+y)}{4}, \frac{(x+y)}{4}\), where either \(x=1,y\neq1\) or \(x\neq 0, y=0\), until there is one 0 and o…

Learn One Weird Trick And Easily Solve The Product-Sum Problem

A tribute to Martin Gardner.

For which sets of positive integers does the sum equal the product? For example, when does \( x_1 + x_2 = x_1\cdot x_2\)? It's easy to see that this is only true when \(x_1 = x_2 = 2\).

In the general case our equality is \(\sum_{i=1}^{n} x_i= \prod_{i=1}^{n} x_i \). We can rearrange terms to give \[x_1+x_2+\sum_{i=3}^{n} x_i= x_1\cdot x_2\cdot \prod_{i=3}^{n} x_i,\] and this in turn factors nicely to give us \[\left( x_1\cdot \prod_{x=3}^{n} x_i - 1\right)\cdot \left( x_2\cdot \prod_{x=3}^{n} x_i - 1\right) = \left( \prod_{x=3}^{n} x_i \right)\cdot \left(\sum_{x=3}^{n} x_i \right) + 1.\] How does this help? Consider the case \(n=5\), and without loss of generality assume \(x_i \ge x_{i+1}\) for all \(i\). If \(x_3=x_4=x_5=1\) our factorized equation becomes \[(x_1-1)\cdot(x_2-1)=4,\] with the obvious solutions \( x_1=5, x_2=2; x_1=3, x_2=3\). The only remaining case to consider is \(x_3=2\), as any other case forces  \( \sum_{i=1}^{n} x_i < \prod_{…

Finding the Best Book Quotes: Power Ranking Goodreads

My friend Jordan Ellenberg wrote an article for the Wall Street Journal recently describing a metric to roughly measure the least read books, which he calls the Hawking Index (HI). As he describes it, take the page numbers of a book's five top highlights, average them, and divide by the number of pages in the whole book.

On a discussion thread on Facebook, this led to a proposal from me for measuring the general quality of a quote. Assume a user has some level of discrimination \(D\); the higher the value of \(D\), the more likely they are to quote a passage. Now assume each passage has some measure of quality \(Q\); the higher the value of \(Q\), the more likely a passage is to be quoted. Let's try a classic Bradley-Terry-Luce model - if a user with discrimination \(D\) quotes at least one passage from a particular work, the probability \(p\) that they'll quote a given passage with quality \(Q\) from that same work is roughly \[ p = \frac{Q}{Q+D}.\]
Nice, but where can w…

Five Free Student Tickets for the SaberSeminar in Boston (August 17-18, 2014)

Meredith Wills, Will Carroll and myself are donating four student two-day tickets, including lunch, for the upcoming baseball analytics Saberseminar run by Dan Brooks. This is a wonderful event, and 100% of the proceeds are donated to the Jimmy Fund. You must be a current student. Meredith and myself will by choosing four students by the end of this week, Sunday April 13, 2014.

Please note:
These tickets are for both days, August 17-18, 2014The event is in Boston, MALunch is included, but no other mealsTransportation and lodging are not included
If you would like to be considered for a donated ticket, please send: Your full name (first and last)If you're outside of the Boston area, how will you be getting to the event?Your school affiliation and whether high school or collegeBest contact email address (if different from reply-to address)A little about your baseball interests, analytical or otherwiseDo you see yourself working in baseball? For a team, as a journalist, or something el…