The Angry Statistician

Posts

Showing posts from January, 2015

A Very Rough Guide to Getting Started in Data Science: Part I, MOOCs

Introduction

Data science is a very hot, perhaps the hottest, field right now. Sports analytics has been my primary area of interest, and it's a field that has seen amazing growth in the last decade. It's no surprise that the most common question I'm asked is about becoming a data scientist. This is a first set of rough notes attempting to answer that question from my own perspective. Keep in mind that this is only my opinion, and that there are many different ways to do data science and to become a data scientist.

Data science is using data to answer a question. This could be something as simple as making a plot by hand, or using Excel to take the average of a set of numbers. The important parts of this process are knowing which questions to ask, deciding what information you'd need to answer them, picking a method that takes this data and produces results relevant to your question and, most importantly, knowing how to properly interpret these results so you ca

How Unfair are the NFL's Overtime Rules?

In 2010 the NFL amended its overtime rules, and in 2012 extended these to all regular season games. Previously, overtime was handled by sudden death: the first team to score won. The team winning a coin flip can elect to kick or receive (they invariably receive, as they should). Assuming the game ends in the first overtime, the team with the first possession wins under the following scenarios:

- scores a touchdown on the first drive
- kicks a field goal on the first drive; other team fails to score on the second drive
- both teams kick a field goal on the first and second drives; win in sudden death
- doesn't score on the first drive; defensive score during second drive
- neither team scores on first or second drives; win in sudden death

Under this overtime procedure, roughly how often should we expect the team winning the coin flip to win the game? For an average team the empirical probabilities of the above events per drive are: \(\mathrm{defensiveTD} = \mathrm{Pr}(\tex
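The five winning scenarios listed above can be turned into a small calculation. What follows is only a rough sketch under simplifying assumptions that are mine, not necessarily the post's: every drive has the same outcome probabilities regardless of team or field position, and each drive ends in exactly one of an offensive touchdown, a field goal, a score by the defense, or no score. The function names and the parameters `td`, `fg` and `defensive` are hypothetical, and no empirical values are supplied.

```python
def sudden_death_win(td, fg, defensive):
    """Probability that the team holding the ball wins once play is sudden death.

    Assumes (hypothetically) identical per-drive probabilities for both teams.
    """
    no_score = 1.0 - td - fg - defensive
    # With s the value returned, s = (td + fg) + no_score * (1 - s):
    # either score on this drive, or hand the ball over and win only if the
    # opponent then fails to win from the same position. Solving for s:
    return (td + fg + no_score) / (1.0 + no_score)


def first_possession_win(td, fg, defensive):
    """Probability that the team with the first overtime possession wins."""
    no_score = 1.0 - td - fg - defensive
    s = sudden_death_win(td, fg, defensive)
    return (
        td                            # touchdown on the first drive
        + fg * (1.0 - td - fg)        # field goal, then opponent fails to score
        + fg * fg * s                 # both kick field goals, then sudden death
        + no_score * defensive        # no score, then a defensive score on drive two
        + no_score * no_score * s     # neither scores, then sudden death
    )
```

A quick sanity check on the model: if no drive ever ends the game on its own terms (`td = fg = defensive = 0`), the two teams are symmetric once sudden death begins and the function returns one half.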

A Short Note on Automatic Algorithm Optimization via Fast Matrix Exponentiation

Alexander Borzunov has written an interesting article about his Python code that uses fast matrix exponentiation to automatically optimize certain algorithms. It's definitely a recommended read. In his article, Alexander mentions that it's difficult to directly derive a matrix exponentiation algorithm for recursively defined sequences such as \[ F_n = \begin{cases} 0, & n = 0\\ 1, & n = 1\\ 1, & n = 2\\ 7(2F_{n-1} + 3F_{n-2})+4F_{n-3}+5n+6, & n \geq 3 \end{cases} \] While it's true that it's not entirely simple, there is a relatively straightforward way to do this that's worth knowing. The only difficulty is due to the term \(5n+6\), but we can eliminate it by setting \(F_n = G_n + an+b\), then solving for appropriate values of \(a, b\). Substituting and grouping terms we have \[ G_n + an+b = 7(2G_{n-1} + 3G_{n-2})+4G_{n-3} + 39an-68a+39b+5n+6, \] and equating powers of \(n\) we need to solve the equations \[ \begin{align*} a &a
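To see the substitution at work, here is a minimal Python sketch. It is my own illustration, not Alexander's code and not necessarily how the rest of the post proceeds. Equating powers of \(n\) in the grouped equation gives \(a = 39a + 5\) and \(b = -68a + 39b + 6\), so \(G_n\) satisfies the homogeneous recurrence \(G_n = 14G_{n-1} + 21G_{n-2} + 4G_{n-3}\), which a 3x3 companion matrix can advance. All names (`A`, `B`, `COMPANION`, `mat_mul`, `mat_pow`, `f_fast`, `f_direct`) are hypothetical, and exact rationals from `fractions` are used because the shift is not an integer.

```python
from fractions import Fraction

# Shift obtained by equating powers of n after substituting F_n = G_n + a*n + b:
#   a = 39a + 5          ->  a = -5/38
#   b = -68a + 39b + 6   ->  b = (68a - 6)/38
A = Fraction(-5, 38)
B = (68 * A - 6) / 38

# Companion matrix of G_n = 14 G_{n-1} + 21 G_{n-2} + 4 G_{n-3}.
COMPANION = [[14, 21, 4],
             [1, 0, 0],
             [0, 1, 0]]


def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(m, exponent):
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    size = len(m)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    while exponent > 0:
        if exponent & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        exponent >>= 1
    return result


def f_fast(n):
    """F_n using O(log n) matrix multiplications on the shifted sequence G."""
    f_init = [0, 1, 1]
    # State vector (G_2, G_1, G_0), where G_k = F_k - a*k - b.
    state = [f_init[k] - A * k - B for k in (2, 1, 0)]
    # COMPANION**n maps (G_2, G_1, G_0) to (G_{n+2}, G_{n+1}, G_n).
    power = mat_pow(COMPANION, n)
    g_n = sum(power[2][j] * state[j] for j in range(3))
    f_n = g_n + A * n + B
    assert f_n.denominator == 1  # the fractional parts cancel exactly
    return f_n.numerator


def f_direct(n):
    """F_n straight from the definition, used only as a cross-check."""
    values = [0, 1, 1]
    for k in range(3, n + 1):
        values.append(7 * (2 * values[k - 1] + 3 * values[k - 2])
                      + 4 * values[k - 3] + 5 * k + 6)
    return values[n]
```

Comparing `f_fast(n)` with `f_direct(n)` for small `n` is an easy way to confirm that the shift was solved correctly.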

Young Alan Turing and the Arctangent

With the release of the new film "The Imitation Game", I decided to read the biography this excellent film was based on, Alan Turing: The Enigma. In it, the author Andrew Hodges relates the story that the 15-year-old Alan Turing derived the Maclaurin series for the \(\arctan\) function, i.e. \[\arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \ldots\] This is trivial using calculus, but it's explicitly stated that young Alan Turing neither knew nor used calculus. How would you derive such a series without calculus? This is a tricky problem, and I'd suggest first tackling the much easier problem of deriving the Maclaurin series for \(\exp(x)\) from the relation \( \exp(2x) = \exp(x)\cdot \exp(x)\). This is an underconstrained relation, so you'll need to assume \(c_0 = 1, c_1 = 1\). Getting back to \(\arctan\), you could start with the double-angle formula for the tangent: \[\tan(2x) = \frac{2\tan(x)}{1-{\tan}^2(x)}.\] Now use the Weierstrass
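The easier warm-up problem mentioned above, recovering the series for \(\exp(x)\) from \(\exp(2x) = \exp(x)\cdot\exp(x)\), can be checked mechanically. The sketch below is one way to do it and is an illustration of mine rather than anything from the biography: write \(\exp(x) = \sum c_n x^n\), match the coefficient of \(x^n\) on both sides, and solve for \(c_n\) in terms of earlier coefficients. The function name `exp_series_coefficients` is hypothetical.

```python
from fractions import Fraction


def exp_series_coefficients(count):
    """Coefficients c_0 .. c_{count-1} of exp(x), derived without calculus.

    Matching the x**n coefficient in exp(2x) = exp(x) * exp(x) gives
        2**n * c_n = sum over k = 0..n of c_k * c_{n-k}.
    With c_0 = 1 the two end terms of the sum are each c_n, so for n >= 2
        (2**n - 2) * c_n = sum over k = 1..n-1 of c_k * c_{n-k}.
    """
    coeffs = [Fraction(1), Fraction(1)]  # the assumed c_0 = 1, c_1 = 1
    for n in range(2, count):
        inner = sum(coeffs[k] * coeffs[n - k] for k in range(1, n))
        coeffs.append(inner / (2**n - 2))
    return coeffs[:count]
```

The recursion only determines \(c_n\) for \(n \geq 2\), which is why the relation is underconstrained and \(c_0\) and \(c_1\) have to be supplied.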