tag:blogger.com,1999:blog-9149402429183581490Thu, 22 Jan 2015 01:24:21 +0000The Angry Statisticianhttp://angrystatistician.blogspot.com/noreply@blogger.com (Christopher Long)Blogger48125tag:blogger.com,1999:blog-9149402429183581490.post-574142958622078177Thu, 22 Jan 2015 01:15:00 +00002015-01-21T17:24:21.668-08:00A Very Rough Guide to Getting Started in Data Science: Part I, MOOCs<h3>Introduction</h3><br />Data science is a very hot, perhaps the hottest, field right now. Sports analytics has been my primary area of interest, and it's a field that has seen amazing growth in the last decade. It's no surprise that the most common question I'm asked is about becoming a data scientist. This will be a first set of rough notes attempting to answer this question from my own personal perspective. Keep in mind that this is only my opinion and there are many different ways to do data science and become a data scientist.<br /><div><br /></div><div>Data science is using data to answer a question. This could be doing something as simple as making a plot by hand, or using Excel to take the average of a set of numbers. The important parts of this process are knowing which questions to ask, deciding what information you'd need to answer it, picking a method that takes this data and produces results relevant to your question and, most importantly, how to properly interpret these results so you can be confident that they actually answer your question.</div><div><br /></div><div>Knowing the questions requires some domain expertise, either yours or someone else's. Unless you're a data science researcher, data science is a tool you apply to another domain.</div><div><br /></div><div>If you have the data you feel should answer your question, you're in luck. Frequently you'll have to go out and collect the data yourself, e.g. scraping from the web. Even if you already have the data, it's common to have to process the data to remove bad data, correct errors and put it into a form better suited for analysis. A popular tool for this phase of data analysis is a scripting language; typically something like Python, Perl or Ruby. These are high-level programming languages that very good at web work as well as manipulating data.</div><div><br /></div><div>If you're dealing with a large amount of data, you'll find that it's convenient to store it in a structured way that makes it easier to access, manipulate and update in the future. This will typically be a relational database of some type, such as PostgreSQL, MySQL or SQL Server. These all use the programming language SQL.</div><div><br /></div><div>Methodology and interpretation are the most difficult, broadest and most important parts of data science. You'll see methodology referenced as statistical learning, machine learning, artificial intelligence and data mining; these can be covered in statistics, computer science, engineering or other classes. Interpretation is traditionally the domain of statistics, but this is always taught together with methodology.</div><div><br /></div><div>You can start learning much of this material freely and easily with MOOCs. Here's an initial list.<br /><br /></div><div><h3>MOOCs</h3></div><h4>Data Science Basics</h4><div><a href="https://www.coursera.org/course/datascitoolbox" target="_blank">Johns Hopkins: The Data Scientist’s Toolbox</a>. Overview of version control, markdown, git, GitHub, R, and RStudio. Started January 5, 2015. Coursera.</div><div><br /></div><div><a href="https://www.coursera.org/course/rprog" target="_blank">Johns Hopkins: R Programming</a>. R-based. Started January 5, 2015. Coursera.</div><div><br /></div><h4>Scripting Languages</h4><div><a href="https://www.udacity.com/course/cs101" target="_blank">Intro to Computer Science</a>. Python-based. Take anytime. Udacity; videos and exercises are free.</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><iframe class="youtube-player" type="text/html" width="640" height="385" src="http://www.youtube.com/embed/Pm_WAWZNbdA" allowfullscreen frameborder="0"></iframe></div><div><br /></div><div><a href="https://www.udacity.com/course/ud036" target="_blank">Programming Foundations with Python</a>. Python-based. Take anytime. Udacity; videos and exercises are free.</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><iframe class="youtube-player" type="text/html" width="640" height="385" src="http://www.youtube.com/embed/bwq2W6XUvyg" allowfullscreen frameborder="0"></iframe></div><div><br /></div><div><br /></div><div><a href="https://www.edx.org/course/introduction-computer-science-mitx-6-00-1x-0" target="_blank">MIT: Introduction to Computer Science and Programming Using Python</a>. Python-based. Class started January 9, 2015. edX.</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><iframe class="youtube-player" type="text/html" width="640" height="385" src="http://www.youtube.com/embed/ww2BdhILIio" allowfullscreen frameborder="0"></iframe></div><div><br /></div><div><br /></div><h4>Databases and SQL</h4><div><a href="https://www.coursera.org/course/db" target="_blank">Stanford: Introduction to Databases</a>. XML, JSON, SQL; uses SQLite for SQL. Self-paced. Coursera.</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><iframe class="youtube-player" type="text/html" width="640" height="385" src="http://www.youtube.com/embed/ShjrtAQmIVg" allowfullscreen frameborder="0"></iframe></div><div><br /></div><h4>Machine Learning</h4><div><a href="https://www.coursera.org/course/ml" target="_blank">Stanford: Machine Learning</a>. Octave-based. Class started January 19, 2015. Coursera.</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><iframe class="youtube-player" type="text/html" width="640" height="385" src="http://www.youtube.com/embed/qeHZOdmJvFU" allowfullscreen frameborder="0"></iframe></div><div><br /></div><div><a href="https://class.stanford.edu/courses/HumanitiesandScience/StatLearning/Winter2015/about" target="_blank">Stanford: Statistical Learning</a>. R-based. Class started January 19, 2015. Stanford OpenEdX.</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><iframe class="youtube-player" type="text/html" width="640" height="385" src="http://www.youtube.com/embed/St2-97n7atk" allowfullscreen frameborder="0"></iframe></div><div><br /></div><div><br /></div>http://angrystatistician.blogspot.com/2015/01/a-very-rough-guide-to-getting-started.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-8851062544262278534Mon, 19 Jan 2015 06:59:00 +00002015-01-19T01:23:42.131-08:00How Unfair are the NFL's Overtime Rules?In 2010 the <a href="http://en.wikipedia.org/wiki/Overtime_%28sports%29#Major_American_professional_leagues" target="_blank">NFL amended its overtime rules</a>, and in 2012 extended these to all regular season games. Previously, overtime was handled by sudden death - the first team to score won. The team winning a coin flip can elect to kick or receive (they invariably receive, as they should).<br /><br />Assuming the game ends in the first overtime, the team with the first possession wins under the following scenarios:<br /><ol><li>scores a touchdown on the first drive</li><li>kicks a field goal on the first drive; other team fails to score on the second drive</li><li>both teams kick a field goal on the first and second drives; win in sudden death</li><li>doesn't score on the first drive; defensive score during second drive</li><li>neither team scores on first or second drives; win in sudden death</li></ol><div>Under this overtime procedure, roughly how often should be expect the team winning the coin flip to win the game?</div><div><br /></div><div>For an average team the empirical probabilities of the above events per drive are:</div><div><ul><li>\(\mathrm{defensiveTD} = \mathrm{Pr}(\text{defensive touchdown}) = 0.02\)</li><li>\(\mathrm{safety} = \mathrm{Pr}(\text{safety}) = 0.001\)</li><li>\(\mathrm{fieldGoal} = \mathrm{Pr}(\text{field goal}) = 0.118\)</li><li>\(\mathrm{offensiveTD} = \mathrm{Pr}(\text{offensive touchdown}) = 0.200\)</li></ul><div>We'll also use the following:</div><div><ul><li>\(\mathrm{defensiveScore} = \mathrm{Pr}(\text{any defensive score}) = \mathrm{defensiveTD} + \mathrm{safety}\)</li><li>\(\mathrm{offensiveScore} = \mathrm{Pr}(\text{any offensive score}) = \mathrm{fieldGoal} + \mathrm{offensiveTD}\)</li><li>\(\mathrm{noOFscore} = \mathrm{Pr}(\text{no offensive score}) = 1 - \mathrm{offensiveScore}\)</li><li>\(\mathrm{noScore} = \mathrm{Pr}(\text{no score}) = 1 - \mathrm{offensiveScore} - \mathrm{defensiveScore}\)</li><li>\(\mathrm{sdWin} = \mathrm{Pr}(\text{driving team winning under sudden death rules})\)</li></ul></div><div>Then the probabilities of the above numbered outcomes is approximately:</div><div><ol><li>\(\mathrm{offensiveTD}\)</li><li>\(\mathrm{fieldGoal}\times \mathrm{noOFscore}\)</li><li>\(\mathrm{fieldGoal}\times \mathrm{fieldGoal}\times \mathrm{sdWin}\)</li><li>\(\mathrm{noScore}\times \mathrm{defensiveScore}\)</li><li>\(\mathrm{noScore}\times \mathrm{noScore}\times \mathrm{sdWin}\)</li></ol><div>The last thing we need to work out is \(\text{sdWin}\). There are three ways for the team with possession to win:</div></div></div><div><ol><li>any offensive score on the first drive</li><li>no offensive score; any defensive score on the second drive</li><li>neither team scores on the first or second possessions; we're back to our original state</li></ol><div>These three scenarios have values:</div><ol><li>\(\mathrm{offensiveScore}\)</li><li>\(\mathrm{noOFscore}\times \mathrm{defensiveScore}\)</li><li>\(\mathrm{noScore}\times \mathrm{noScore}\times \mathrm{sdWin}\)</li></ol><div>Doing the math, we get that \begin{align*}<br />\mathrm{sdWin} &= \mathrm{offensiveScore} + \mathrm{noOFscore}\times \mathrm{defensiveScore} + {\mathrm{noScore}}^2\times\mathrm{sdWin};\\<br />\mathrm{sdWin} &=\frac{(\mathrm{offensiveScore} + \mathrm{noOFscore}\times \mathrm{defensiveScore})}{(1-{\mathrm{noScore}}^2)}.<br />\end{align*}<br />Putting it all together we get \[<br />\text{win} = \mathrm{offensiveTD} + \mathrm{fieldGoal}\times \mathrm{noOFscore} + \mathrm{noScore}\times \mathrm{defensiveScore}\\<br />+ (\mathrm{fieldGoal}^2+ \mathrm{noScore}^2)\times \mathrm{sdWin}.\]<br />Plugging in our empirical values, we finally arrive at \[\mathrm{Pr}(\text{win coin flip, win game}) = 0.560.\] For comparison, under the original sudden death rules, \[\mathrm{Pr}(\text{win coin flip, win game}) = 0.589.\] So the NFL overtime rules are still ridiculously unfair in favor of the winner of the coin flip, but not as ridiculously unfair as they were under the original sudden death rules.<br /><br />How do these numerical results compare to actual outcomes? Under the current overtime rules, there have been 51 overtime games. In 27 of these the team winning the coin toss won the game, in 21 the team losing the coin toss won the game and there have been 3 ties. That puts \(\mathrm{win} = \frac{27}{48} = 0.5625\) for games not ending in ties. Close enough!<br /><br />If you'd like to tweak the probabilities for each event to see how the resulting probability for the winner of the coin flip changes, I have a <a href="https://github.com/octonion/football/tree/master/overtime" target="_blank">simple Python script here</a>.</div></div>http://angrystatistician.blogspot.com/2015/01/how-unfair-is-nfls-current-overtime.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-4012430552273245039Mon, 12 Jan 2015 03:10:00 +00002015-01-12T14:36:16.167-08:00A Short Note on Automatic Algorithm Optimization via Fast Matrix Exponentiation<a href="https://github.com/borzunov" target="_blank">Alexander Borzunov</a> has written an <a href="http://kukuruku.co/hub/algorithms/automatic-algorithms-optimization-via-fast-matrix-exponentiation" target="_blank">interesting article</a> about his <a href="https://github.com/borzunov/cpmoptimize" target="_blank">Python code</a> that uses fast matrix exponentiation to automatically optimize certain algorithms. It's definitely a recommended read.<br /><br />In his article, Alexander mentions that it's difficult to directly derive a matrix exponentiation algorithm for recursively-defined sequences such as \[<br />F_n = \begin{cases}<br />0, & n = 0\\<br />1, & n = 1\\<br />1, & n = 2\\<br />7(2F_{n-1} + 3F_{n-2})+4F_{n-3}+5n+6, & n \geq 3<br />\end{cases}<br />\] While it's true that it's not entirely simple, there is a relatively straightforward way to do this that's worth knowing.<br /><br />The only difficultly is due to the term \(5n+6\), but we can eliminate it by setting<br />\(F_n = G_n + an+b\), then solving for appropriate values of \(a, b\).<br /><br />Substituting and grouping terms we have \[<br />G_n + an+b = 7(2G_{n-1} + 3G_{n-2})+4G_{n-3} + 39an-68a+39b+5n+6,<br />\] and equating powers of \(n\) we need to solve the equations \[<br />\begin{align*}<br />a &= 39a+5,\\<br />b &= -68a+39b+6.<br />\end{align*}<br />\] Setting \(a = -\frac{5}{38}, b = -\frac{142}{361}\) we end up with the homogeneous system \[<br />G_n = \begin{cases}<br />\frac{142}{361}, & n = 0\\<br />\frac{379}{722}, & n = 1\\<br />\frac{237}{361}, & n = 2\\<br />7(2G_{n-1} + 3G_{n-2})+4G_{n-3}, & n \geq 3<br />\end{cases}<br />\] This is now easily cast into matrix exponentiation form using the <a href="http://en.wikipedia.org/wiki/Fibonacci_number#Matrix_form" target="_blank">standard method</a>.http://angrystatistician.blogspot.com/2015/01/a-short-note-on-automatic-algorithm.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-6682721504411884692Wed, 07 Jan 2015 19:08:00 +00002015-01-09T00:31:22.861-08:00Young Alan Turing and the ArctangentWith the release of the new film <a href="http://www.rottentomatoes.com/m/the_imitation_game/" target="_blank">"The Imitation Game"</a>, I decided to read the biography this excellent film was based on - <a href="http://www.amazon.com/gp/product/B00E3URHEK/ref=as_li_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B00E3URHEK&linkCode=as2&tag=octonion-20&linkId=YRGFVTE5D6727REE" target="_blank">Alan M. Turing: Centenary Edition</a>. In it, the author Andrew Hodges relates the story that the 15-year-old Alan Turing derived the <a href="http://mathworld.wolfram.com/MaclaurinSeries.html" target="_blank">Maclaurin series</a> for the \(\arctan\) function, i.e. \[\arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \ldots\] This is trivial using calculus, but it's explicitly stated that young Alan Turing neither knew nor used calculus. How would you derived such a series without calculus?<br /><br />This is a tricky problem, and I'd suggest first tackling the much easier problem of deriving the Maclaurin series for \(\exp(x)\) from the relation \( \exp(2x) = \exp(x)\cdot \exp(x)\). This is an undercontrained relation, so you'll need to assume \(c_0 = 1, c_1 = 1\).<br /><br />Getting back to \(\arctan\), you could start with the <a href="http://en.wikipedia.org/wiki/Tangent_half-angle_formula" target="_blank">half-angle formula for the tangent</a>: \[\tan(2x) = \frac{2\tan(x)}{1-{\tan}^2(x)}.\] Now use the <a href="http://weierstrass-like%20substitution/" target="_blank">Weierstrass-like substitution</a> \(x = \arctan(t)\) to get \[\tan(2\arctan(t)) = \frac{2t}{1-t^2}.\] The right-hand side can be expanded in the usual geometric series fashion to get \[\tan(2\arctan(t)) = 2t\cdot (1+t^2+t^4+\ldots).\]<br />Finally, take the \(\arctan\) of both sides and assume we have the series expansion \(\arctan(x) = c_1 x + c_3 x^3 + c_5 x^5 + \ldots\). Note that we may ignore the terms with even powers of \(x\) as \(\arctan(x)\) is an <a href="http://en.wikipedia.org/wiki/Even_and_odd_functions#Odd_functions" target="_blank">odd function</a>.<br /><br />This gives us the setting \[2\arctan(t) = \arctan(2t\cdot (1+t^2+t^4+\ldots))\] and expanding as a power series \[2(c_1 t + c_3 t^3 + \ldots) = c_1 (2t\cdot (1+t^2+\ldots) + c_3 (2t\cdot (1+t^2+\ldots))^3 + \ldots\]<br />The next step is to line up powers of \(t\) on both sides and set up a system of simultaneous equations. There's some algebra and combinatorics involved, but we end up with the system of equations \[c_{2i+1} = \sum_{j=0}^{i} c_{2j+1}\cdot 2^{2j} \cdot {{i+j} \choose {2j}}.\] Note that this system is underconstrained due to our functional relationship being satisfied by any multiple of the \(\arctan\) function. We'll assume that \(c_1 = 1\), but note that this follows from the <a href="https://math.stackexchange.com/questions/75130/how-to-prove-that-lim-limits-x-to0-frac-sin-xx-1/75151#75151" target="_blank">classical (non-calculus) limit</a> \( \lim_{x\to 0} \frac{\sin(x)}{x} = 1\).<br /><br />The first few relations are \begin{align*}<br />c_1 &= c_1 \\<br />c_3 &= c_1 + 4\cdot c_3 \\<br />c_5 &= c_1 + 12\cdot c_3 + 16\cdot c_5 \\<br />c_7 &= c_1 + 24\cdot c_3 + 80\cdot c_5 + 64\cdot c_7<br />\end{align*}<br />Assuming \(c_1 = 1\) as above we quickly calculate \( c_3 = -\frac{1}{3}, c_5 = \frac{1}{5}, c_7 = -\frac{1}{7}\), with the pattern being obvious.<br /><br />That \(c_{2i+1} = \frac{(-1)^i}{2i+1}\) can be verified by Wolfram Alpha:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://www.wolframalpha.com/input/?i=%5Csum_%7Bj%3D0%7D%5E%7Bi%7D+%28-1%29%5Ej%2F%282j%2B1%29+2%5E%7B2j%7D+%7B%7Bi%2Bj%7D+choose+%7B2j%7D%7D" target="_blank"><img alt="Wolfram Alpha" border="0" src="http://4.bp.blogspot.com/-R3rBbUh6_V4/VK4IDbKpB9I/AAAAAAAABmQ/TW0iRh25w0k/s1600/img.gif" title="" /></a></div><br />An obvious question is whether or not there's a simple demonstration of this; in particular, one that a young Alan Turing may have found. This I don't know (yet).http://angrystatistician.blogspot.com/2015/01/young-alan-turing.htmlnoreply@blogger.com (Christopher Long)1tag:blogger.com,1999:blog-9149402429183581490.post-7402880594498963226Sun, 12 Oct 2014 03:38:00 +00002014-10-11T20:41:45.665-07:00A Strange Recursive Relation, AutomaticHofstadter mentions the following recursive relation in his great book "Gödel, Escher, Bach": \[<br />\begin{align}<br />g(0) &= 0;\\<br />g(n) &= n-g(g(n-1)).<br />\end{align}<br />\] I claim that \( g(n) = \left\lfloor \phi\cdot (n+1) \right\rfloor \), where \( \phi = \frac{-1+\sqrt{5}}{2}\), and I'll show this using a technique that makes proving many identities of this type nearly automatic.<br /><br />Let \( \phi\cdot n = \left\lfloor \phi\cdot n \right\rfloor + e\), where \( 0 < e < 1\) as \( \phi \) is irrational, nor can \(e = 1-\phi\), and note that \(\phi\) satisfies \( {\phi}^2 + \phi - 1 = 0\). Some algebra gives \[<br />\begin{align}<br />n-\left\lfloor \left( \left\lfloor \phi\cdot n \right\rfloor + 1 \right) \cdot \phi \right\rfloor<br />&= n-\left\lfloor \left( n\cdot \phi - e + 1 \right) \cdot \phi \right\rfloor \\<br />&= n-\left\lfloor n\cdot {\phi}^2 - e\cdot \phi + \phi \right\rfloor \\<br />&= n-\left\lfloor n\cdot \left(1-\phi\right) - e\cdot \phi + \phi \right\rfloor \\<br />&= n-n-\left\lfloor -n\cdot \phi - e\cdot \phi + \phi \right\rfloor \\<br />&= -\left\lfloor -n\cdot \phi - e\cdot \phi + \phi \right\rfloor \\<br />&= -\left\lfloor -n \cdot \phi + e - e - e\cdot \phi + \phi \right\rfloor \\<br />&= \left\lfloor \phi\cdot n \right\rfloor -\left\lfloor - e - e\cdot \phi + \phi \right\rfloor.<br />\end{align}<br />\]<br />Now if \[<br />\begin{align}<br />0 < e < 1-\phi &\implies 0 < - e - e\cdot \phi + \phi < \phi;\\<br />1-\phi < e < 1 &\implies -1 < - e - e\cdot \phi + \phi < 0.<br />\end{align}<br />\]<br />This implies \[ n-\left\lfloor \left( \left\lfloor \phi\cdot n \right\rfloor + 1 \right) \cdot \phi \right\rfloor = \left\lfloor \phi\cdot (n+1) \right\rfloor .\] Since \( \left\lfloor \phi\cdot (0+1) \right\rfloor = 0\), we're done.<br /><br />The point of the algebra was to move all terms involving \(n\) out, and then checking to see how the remaining term varied with \( e\). A simple idea, but very useful.http://angrystatistician.blogspot.com/2014/10/a-strange-recursive-relation-automatic.htmlnoreply@blogger.com (Christopher Long)1tag:blogger.com,1999:blog-9149402429183581490.post-562657659715110374Sun, 05 Oct 2014 19:26:00 +00002014-10-05T21:01:29.267-07:00Closed Under MeansHere's a nice little problem from the 13th All Soviet Union Mathematics Olympiad.<br /><blockquote class="tr_bq">Given a set of real numbers \(S\) containing 0 and 1 that's closed under finite means, show that it contains all rational numbers in the interval \(\left[0,1\right]\).</blockquote>This isn't a difficult problem, but there's a particularly nice solution.<br /><br />First observe that if \(x\in S\) then both \(\frac{x}{4}\) and \(\frac{3x}{4}\) are in \(S\); average \(\{0,x\}\) to get \(\frac{x}{2}\), average \(\{0, \frac{x}{2}\}\) to get \(\frac{x}{4}\), average \(\{\frac{x}{2}, x\}\) to get \(\frac{3x}{4}\).<br /><br />We could show any rational number \(\frac{m}{n}\) with \(1\leq m < n\) is in \(S\) if we had \(n\) distinct elements from \(S\) that summed to \(m\). Let's exhibit one.<br /><br />Start with an array with \(m\) 1s on the left, \(n-m\) 0s on the right. Repeatedly replace adjacent \(x,y\) values with \(\frac{3(x+y)}{4}, \frac{(x+y)}{4}\), where either \(x=1,y\neq1\) or \(x\neq 0, y=0\), until there is one 0 and one 1 left. We can do this in exactly \(n-2\) steps, each resulting number is guaranteed to be in \(S\) by the above note, and each number is guaranteed to be distinct by construction!<br /><br />Examples:<br /><br />\(\frac{1}{3}: \left[1,0,0\right] \to \left[\frac{3}{4},\frac{1}{4},0\right] \)<br /><br />\(\frac{2}{5}: \left[1,1,0,0,0\right] \to \left[1,\frac{3}{4},\frac{1}{4},0,0\right] \to \left[1,\frac{3}{4},\frac{3}{16},\frac{1}{16},0\right] \)http://angrystatistician.blogspot.com/2014/10/closed-under-means.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-8609248843020646690Fri, 29 Aug 2014 04:28:00 +00002014-08-30T01:15:37.813-07:00Learn One Weird Trick And Easily Solve The Product-Sum ProblemA tribute to <a href="https://en.wikipedia.org/wiki/Martin_Gardner" target="_blank"><b>Martin Gardner</b></a>.<br /><br />For which sets of positive integers does the sum equal the product? For example, when does \( x_1 + x_2 = x_1\cdot x_2\)? It's easy to see that this is only true when \(x_1 = x_2 = 2\).<br /><br />In the general case our equality is \(\sum_{i=1}^{n} x_i= \prod_{i=1}^{n} x_i \). We can rearrange terms to give \[x_1+x_2+\sum_{i=3}^{n} x_i= x_1\cdot x_2\cdot \prod_{i=3}^{n} x_i,\] and this in turn factors nicely to give us \[\left( x_1\cdot \prod_{x=3}^{n} x_i - 1\right)\cdot \left( x_2\cdot \prod_{x=3}^{n} x_i - 1\right) = \left( \prod_{x=3}^{n} x_i \right)\cdot \left(\sum_{x=3}^{n} x_i \right) + 1.\] How does this help? Consider the case \(n=5\), and without loss of generality assume \(x_i \ge x_{i+1}\) for all \(i\). If \(x_3=x_4=x_5=1\) our factorized equation becomes \[(x_1-1)\cdot(x_2-1)=4,\] with the obvious solutions \( x_1=5, x_2=2; x_1=3, x_2=3\). The only remaining case to consider is \(x_3=2\), as any other case forces \( \sum_{i=1}^{n} x_i < \prod_{i=1}^{n} x_i \). For this case our equality becomes \[\left( 2 x_1 - 1\right)\cdot \left( 2 x_2 - 1\right) = 9.\] This gives us the remaining solution \(x_1 = x_2 = x_3 = 2\) with the other \(x_i = 1\).<br /><br />This is quite efficient. For example, for the case \(n=100\) the only equations we need to consider are \begin{aligned} (x_1-1)\cdot(x_2-1) &= 99\\<br /> (2 x_1-1)\cdot(2 x_2-1) &= 199\\<br /> (3 x_1-1)\cdot(3 x_2-1) &= 301\\<br /> (4 x_1-1)\cdot(4 x_2-1) &= 405\\<br /> (4 x_1-1)\cdot(4 x_2-1) &= 401\\<br /> (6 x_1-1)\cdot(6 x_2-1) &= 607\\<br /> (9 x_1-1)\cdot(9 x_2-1) &= 919\\<br /> (8 x_1-1)\cdot(8 x_2-1) &= 809\\<br /> (12 x_1-1)\cdot(12 x_2-1) &= 1225\\<br /> (16 x_1-1)\cdot(16 x_2-1) &= 1633.<br />\end{aligned} Now 199, 401, 607, 919, 809 are all prime ruling them out immediately, and 301 and 1633 don't have factors of the right form. The other equations yield the five solutions \( (100,2), (34,4), (12,10), (7,4,4), (3,3,2,2,2) \) with the other \(x_i = 1\).<br /><br />For \(n = 1000 \) you'd need to consider 52 cases, but most of these are eliminated immediately. I get the six solutions \( (1000,2), (334,4), (112,10), (38,28), (67,4,4), (16,4,4,4) \).<br /><br />Have fun!<br /><br />http://angrystatistician.blogspot.com/2014/08/learn-this-one-weird-trick-and-easily.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-5713515223902536661Wed, 09 Jul 2014 03:01:00 +00002014-07-08T20:30:01.826-07:00Finding the Best Book Quotes: Power Ranking GoodreadsMy friend Jordan Ellenberg <a href="http://online.wsj.com/articles/the-summers-most-unread-book-is-1404417569" target="_blank">wrote an article for the Wall Street Journal</a> recently describing a metric to roughly measure the least read books, which he calls the Hawking Index (HI). As he describes it, take the page numbers of a book's five top highlights, average them, and divide by the number of pages in the whole book.<br /><br />On a discussion thread on Facebook, this led to a proposal from me for measuring the general quality of a quote. Assume a user has some level of discrimination \(D\); the higher the value of \(D\), the more likely they are to quote a passage. Now assume each passage has some measure of quality \(Q\); the higher the value of \(Q\), the more likely a passage is to be quoted. Let's try a classic <a href="http://en.wikipedia.org/wiki/Pairwise_comparison" target="_blank">Bradley-Terry-Luce</a> model - if a user with discrimination \(D\) quotes at least one passage from a particular work, the probability \(p\) that they'll quote a given passage with quality \(Q\) from that same work is roughly \[ p = \frac{Q}{Q+D}.\] <br />Nice, but where can we find data to actually test this theory? It turns out an excellent source is <a href="https://www.goodreads.com/" target="_blank">Goodreads</a>. Users are numbered sequentially starting with Goodreads founder <a href="https://www.goodreads.com/user/show/1" target="_blank">Otis Chandler's ID of 1</a>; a user's <a href="https://www.goodreads.com/quotes/list/1" target="_blank">associated quote page</a> has a simple URL structure. Furthermore, every user's quote list has an <a href="https://www.goodreads.com/quotes/list_rss/1-otis-chandler" target="_blank">associated RSS/Atom URL</a> that provides an easy to parse XML file.<br /><br />The steps:<br /><ol><li>Write a simple Go scraper to uses a goroutine to quickly and sequentially get user quote pages, grab the count of quotes shown on that page (truncated at 30), then grab the RSS/Atom URL. This is able to process about 2500/users minutes over a slow internet connection.</li><li>Write a simple Ruby program to fetch quote XML files for every user that has at least one quote.</li><li>Write a simple Ruby program to fetch the author URL and work URL for every quote cited by at least one user.</li><li>Load all this data into a PostgreSQL database.</li><li>Write a simple R program to pull the data from the database and fit our model. We assign a value for the outcome "quoted" of 1 if the user quoted a passage; we assign "quoted" the value 0 if the user quote a passage from that work, but not that particular passage. Both user "discrimination" and quote "quality" are treated as factors and modeled as random effects; the R package "lmer" was used to fit the model.</li></ol><div>If there is interest, I can make all of this code publicly available through my <a href="https://github.com/octonion" target="_blank">GitHub account</a>. None of it is proprietary.</div><div><br /></div><div>As a test I grabbed the data from 333760 users, 10724 had at least one quote, there were 83160 total quotes and 32621 unique quotes. The model took about 3.5 minutes to fit using an optimized BLAS library and 4 hyperthreaded CPU cores.</div><div><br /></div><div>The top 10 quotes on Goodreads, ranked by quality:<br /><ol><li><a href="https://www.goodreads.com/quotes/11035-no-one-can-make-you-feel-inferior-without-your-consent" target="_blank">“No one can make you feel inferior without your consent.”</a><br /><b>Eleanor Roosevelt</b>, <i>This is My Story</i></li><li><a href="https://www.goodreads.com/quotes/352-outside-of-a-dog-a-book-is-man-s-best-friend" target="_blank">“Outside of a dog, a book is man's best friend. Inside of a dog it's too dark to read.”</a><br /><b>Groucho Marx</b>, <i>The Essential Groucho: Writings For By And About Groucho Marx</i></li><li><a href="https://www.goodreads.com/quotes/2340-twenty-years-from-now-you-will-be-more-disappointed-by" target="_blank">“Twenty years from now you will be more disappointed by the things that you didn't do than by the ones you did do. So throw off the bowlines. Sail away from the safe harbor. Catch the trade winds in your sails. Explore. Dream. Discover.”</a><br /><b>H. Jackson Brown Jr.</b>, <i>P.S. I Love You</i></li><li><a href="https://www.goodreads.com/quotes/1198-i-am-so-clever-that-sometimes-i-don-t-understand-a" target="_blank">“I am so clever that sometimes I don't understand a single word of what I am saying.”</a><br /><b>Oscar Wilde</b>, <i>The Happy Prince and Other Stories</i></li><li><a href="https://www.goodreads.com/quotes/5543-insanity-is-doing-the-same-thing-over-and-over-again" target="_blank">“Insanity is doing the same thing, over and over again, but expecting different results.”</a><br /><b>Narcotics Anonymous</b>, <i>Narcotics Anonymous</i></li><li><a href="https://www.goodreads.com/quotes/2644-he-has-achieved-success-who-has-lived-well-laughed-often" target="_blank">“He has achieved success who has lived well, laughed often, and loved much;Who has enjoyed the trust of pure women, the respect of intelligent men and the love of little children;Who has filled his niche and accomplished his task;Who has never lacked appreciation of Earth's beauty or failed to express it;Who has left the world better than he found it,Whether an improved poppy, a perfect poem, or a rescued soul;Who has always looked for the best in others and given them the best he had;Whose life was an inspiration;Whose memory a benediction.”</a><br /><b>Bessie Anderson Stanley</b>, <i>More Heart Throbs Volume Two in Prose and Verse Dear to the American People And by them contributed as a Supplement to the original $10,000 Prize Book HEART THROBS</i></li><li><a href="https://www.goodreads.com/quotes/2627-the-trouble-with-having-an-open-mind-of-course-is" target="_blank">“The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.”</a><br /><b>Terry Pratchett</b>, <i>Diggers</i></li><li><a href="https://www.goodreads.com/quotes/2621-love-all-trust-a-few-do-wrong-to-none" target="_blank">“Love all, trust a few, do wrong to none.”</a><br /><b>William Shakespeare</b>, <i>All's Well That Ends Well</i></li><li><a href="https://www.goodreads.com/quotes/6396-the-paradoxical-commandments-people-are-illogical-unreasonable-and-self-centered-love" target="_blank"><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;">“The Paradoxical Commandments</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;"><br />People are illogical, unreasonable, and self-centered.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;">Love them anyway.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;"><br />If you do good, people will accuse you of selfish ulterior motives.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;">Do good anyway.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;"><br />If you are successful, you will win false friends and true enemies.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;">Succeed anyway.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;"><br />The good you do today will be forgotten tomorrow.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;">Do good anyway.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;"><br />Honesty and frankness make you vulnerable.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;">Be honest and frank anyway.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;"><br />The biggest men and women with the biggest ideas can be shot down by the smallest men and women with the smallest minds.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;">Think big anyway.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;"><br />People favor underdogs but follow only top dogs.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;">Fight for a few underdogs anyway.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;"><br />What you spend years building may be destroyed overnight.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;">Build anyway.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;"><br />People really need help but may attack you if you do help them.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;">Help people anyway.</span><span style="background-color: white; color: #181818; font-family: georgia, serif; font-size: 14px; line-height: 18px;"><br />Give the world the best you have and you'll get kicked in the teeth.</span></a><span style="background-color: white;"><span style="color: #181818; font-family: georgia, serif;"><span style="font-size: 14px; line-height: 18px;"><a href="https://www.goodreads.com/quotes/6396-the-paradoxical-commandments-people-are-illogical-unreasonable-and-self-centered-love" target="_blank">Give the world the best you have anyway.”</a></span></span><br /><span style="color: #181818; font-family: georgia, serif;"><span style="font-size: 14px; line-height: 18px;"><b>Kent M. Keith</b>, <i>The Silent Revolution: Dynamic Leadership in the Student Council</i></span></span></span></li><li><span style="background-color: white;"><span style="color: #181818; font-family: georgia, serif;"><span style="font-size: 14px; line-height: 18px;"><a href="https://www.goodreads.com/quotes/2795-she-is-too-fond-of-books-and-it-has-turned" target="_blank">“She is too fond of books, and it has turned her brain.”</a></span><br /><span style="font-size: 14px; line-height: 18px;"><b>Louisa May Alcott</b>, <i>Work: A Story of Experience</i></span></span></span></li></ol></div>http://angrystatistician.blogspot.com/2014/07/finding-best-book-quotes-power-ranking.htmlnoreply@blogger.com (Christopher Long)6tag:blogger.com,1999:blog-9149402429183581490.post-5487884645169400777Tue, 08 Apr 2014 03:04:00 +00002014-04-09T23:39:54.078-07:00Five Free Student Tickets for the SaberSeminar in Boston (August 17-18, 2014)<a href="https://twitter.com/Bbl_Astrophyscs" target="_blank">Meredith Wills</a>, <a href="https://twitter.com/injuryexpert" target="_blank">Will Carroll</a> and <a href="https://twitter.com/octonion" target="_blank">myself</a> are donating four student two-day tickets, including lunch, for the upcoming baseball analytics <a href="http://saberseminar.com/" target="_blank">Saberseminar</a> run by <a href="https://twitter.com/brooksbaseball" target="_blank">Dan Brooks</a>. This is a wonderful event, and 100% of the proceeds are donated to the <a href="https://www.jimmyfund.org/" target="_blank">Jimmy Fund</a>. You <b>must</b> be a current student. Meredith and myself will by choosing four students by the end of this week, Sunday April 13, 2014.<br /><br />Please note:<br /><ul><li>These tickets are for both days, August 17-18, 2014</li><li>The event is in Boston, MA</li><li>Lunch is included, but <b>no</b> other meals</li><li>Transportation and lodging are <b>not</b> included</li></ul><div><br />If you would like to be considered for a donated ticket, please send:</div><div><ul><li>Your full name (first and last)</li><li>If you're outside of the Boston area, how will you be getting to the event?</li><li>Your school affiliation and whether high school or college</li><li>Best contact email address (if different from reply-to address)</li><li>A little about your baseball interests, analytical or otherwise</li><li>Do you see yourself working in baseball? For a team, as a journalist, or something else?</li></ul><div><br />Please email the above information to me at <a href="mailto:sabermetrics@gmail.com" target="_blank">sabermetrics@gmail.com</a>.</div><div><br /></div><div>Again, please do so by the end of the day on Sunday, April 13, 2014. Once the tickets are awarded they're gone.</div></div>http://angrystatistician.blogspot.com/2014/04/four-free-student-tickets-for.htmlnoreply@blogger.com (Christopher Long)2tag:blogger.com,1999:blog-9149402429183581490.post-3718907100550685167Mon, 30 Dec 2013 17:51:00 +00002013-12-30T10:30:04.425-08:00A Stupid and Strange Way of Looking at Sports Power Ratings that could be Smart and UsefulAs I've <a href="http://angrystatistician.blogspot.com/2013/03/baseball-chess-psychology-and.html"><b>mentioned previously</b></a>, a common method used in sports for estimating game outcomes known as <a href="https://en.wikipedia.org/wiki/Log5"><b>log5</b></a> can be written \[p = \frac{p_1 q_2}{p_1 q_2+q_1 p_2}\] where \(p_i\) is the fraction of games won by team \(i\) and \(q_i\) is the fraction of games lost by team \(i\). We're assuming that there are no ties. What's the easiest way to derive this estimate? Here's one argument. Assume team \(i\) has a probability \(p_i\) of beating an average team (a team that wins half its games). Now imagine that this means for any given game the team has some <b>"strength"</b> sampled from [0,1] with median \(p_i\) and that the stronger team always wins. Thus, the probability that team 1 beats team 2 is \[ p = \int_0^1 \int_0^1 \! \mathrm{Pr}(p_1 > p_2) \, \mathrm{d} p_1 \mathrm{d} p_2 .\] This looks complicated, but but with probability \(p_1\) team 1 is stronger than an average team and with probability \(p_2\) team 2 is stronger than an average team. From this perspective the log5 estimate is just the <a href="https://en.wikipedia.org/wiki/Bayes'_theorem" target="_blank"><b>Bayesian probability</b></a> that team 1 will be stronger than an average team while team 2 will be weaker than an average team, conditional on either team 1 being stronger than an average team and team 2 weaker than an average team, or team 1 weaker than an average team and team 2 stronger than an average team. In these cases it's unambiguous which team is stronger. The cases where the strength of both teams is stronger or weaker than an average team (the ambiguous cases) are thus discarded.<br /><br />How could this be useful? Instead of ignoring the ambiguous outcomes when estimating the outcome probabilities under this <b>"latent strength"</b> model, we could instead determine which probability distributions best fit the outcome distributions for a given league! Furthermore, this allows us to cohesively put a power rating system into a Bayesian framework by assigning to each team a <a href="https://en.wikipedia.org/wiki/Prior_probability" target="_blank"><b>Bayesian prior</b></a> strength distribution. These priors could either be <a href="https://en.wikipedia.org/wiki/Prior_probability#Uninformative_priors" target="_blank"><b>uninformative</b></a> or <a href="https://en.wikipedia.org/wiki/Prior_probability#Informative_priors" target="_blank"><b>informative</b></a> using e.g. preseason rankings.http://angrystatistician.blogspot.com/2013/12/a-stupid-and-strange-way-of-looking-at.htmlnoreply@blogger.com (Christopher Long)2tag:blogger.com,1999:blog-9149402429183581490.post-6620747293907708484Thu, 07 Nov 2013 23:28:00 +00002013-11-07T16:01:12.861-08:00Building a Personal SupercomputerIt's time for a workstation upgrade; here's what I've assembled.<br /><br />The massive case has plenty of space for additional drives to store the extreme amount of data that nearly every sport is now collecting together with the associated video footage. The case, power supply and motherboard allow up to 3 additional video cards to use as GPU units as your analytical needs demand (and budget can handle).<br /><ol><li><a href="http://www.amazon.com/gp/product/B00BCXF4JQ/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B00BCXF4JQ&linkCode=as2&tag=octonion-20" target="_blank">Cooler Master Cosmos II - Ultra Full Tower Computer Case</a><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://ecx.images-amazon.com/images/I/91O40bKe-mL._SL1500_.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="" border="0" height="320" src="http://ecx.images-amazon.com/images/I/91O40bKe-mL._SL1500_.jpg" title="Cooler Master Cosmos II" width="259" /></a></div><br />An absolutely massive case - plenty of room for an E-ATX motherboard, large power supply, multiple video cards and hard drives. <br /><br /></li><li><a href="http://www.amazon.com/gp/product/B008U12856/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B008U12856&linkCode=as2&tag=octonion-20" target="_blank">PC Power & Cooling 1200W Silencer MK III Series Modular Power Supply</a><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://ecx.images-amazon.com/images/I/41-vt3pnUJL.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="" border="0" height="214" src="http://ecx.images-amazon.com/images/I/41-vt3pnUJL.jpg" title="PC Power & Cooling 1200W Silencer MK III" width="320" /></a></div><br />An exceptionally large power supply, but platinum rated and plenty of room to spare allows it to operate nearly silenty, plus it'll handle any additional video cards you'll add later for GPU computing. <br /><br /></li><li><a href="http://www.newegg.com/Product/Product.aspx?Item=N82E16813132053" target="_blank">ASUS Rampage IV Black Edition LGA 2011 Extended ATX Intel Motherboard</a><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://images10.newegg.com/NeweggImage/productimage/13-132-053-02.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="" border="0" height="240" src="http://images10.newegg.com/NeweggImage/productimage/13-132-053-02.jpg" title="ASUS Rampage IV Black Edition" width="320" /></a></div><br />Space for 4 video cards, 8 memory sticks (up to 64GB), superb quality and extreme tweakability. <br /><br /></li><li><a href="http://www.amazon.com/gp/product/B00EMHM622/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B00EMHM622&linkCode=as2&tag=octonion-20" target="_blank">Intel i7-4930K LGA 2011 64 Technology Extended Memory CPU</a><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://ecx.images-amazon.com/images/I/91++fq52iVL._SL1500_.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="" border="0" height="320" src="http://ecx.images-amazon.com/images/I/91++fq52iVL._SL1500_.jpg" title="Intel i7-4930K LGA 2011" width="236" /></a></div><br />Ivy Bridge, 6 cores, exceptional ability to overclock. <br /><br /></li><li><a href="http://www.amazon.com/gp/product/B008RAS8AO/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B008RAS8AO&linkCode=as2&tag=octonion-20" target="_blank">(2) Corsair Dominator Platinum 32GB (4x8GB) DDR3 2133 MHz</a><img alt="" border="0" height="1" src="http://ir-na.amazon-adsystem.com/e/ir?t=octonion-20&l=as2&o=1&a=B008RAS8AO" style="border: none !important; margin: 0px !important;" width="1" /><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://ecx.images-amazon.com/images/I/51H33Y5+3zL.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="" border="0" height="320" src="http://ecx.images-amazon.com/images/I/51H33Y5+3zL.jpg" title="Corsair Dominator Platinum 32GB (4x8GB)" width="320" /></a></div><br />Total of 64GB. Top-quality, you'll need your RAM in matched sets of 4 to enable quad-channel.<br /><br /></li><li><a href="http://www.amazon.com/gp/product/B00BL8BX7O/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B00BL8BX7O&linkCode=as2&tag=octonion-20" target="_blank">EVGA GeForce GTX TITAN SuperClocked 6GB GDDR5 384bit</a><img alt="" border="0" height="1" src="http://ir-na.amazon-adsystem.com/e/ir?t=octonion-20&l=as2&o=1&a=B00BL8BX7O" style="border: none !important; margin: 0px !important;" width="1" /><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://ecx.images-amazon.com/images/I/71phIxf42XL._SL1200_.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="" border="0" height="320" src="http://ecx.images-amazon.com/images/I/71phIxf42XL._SL1200_.jpg" title="EVGA GeForce GTX TITAN SuperClocked" width="320" /></a></div><br /> Top-of-the-line NVIDIA Kepler card for GPU computing. 2688 CUDA cores that reach 4800 TFLOPS single-precision and 1600 TFLOPS double-precision.<br /><br /></li><li><a href="http://www.amazon.com/gp/product/B00D1GYNT4/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B00D1GYNT4&linkCode=as2&tag=octonion-20" target="_blank">(2) Seagate NAS HDD 2TB SATA 6GB NCQ 64 MB Cache Bare Drive</a><img alt="" border="0" height="1" src="http://ir-na.amazon-adsystem.com/e/ir?t=octonion-20&l=as2&o=1&a=B00D1GYNT4" style="border: none !important; margin: 0px !important;" width="1" /><br /><br />Solid hard drive; you'll need 2 or more drives for a RAID 10 array.<br /><br /></li><li><a href="http://www.amazon.com/gp/product/B009NB8WRU/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B009NB8WRU&linkCode=as2&tag=octonion-20" target="_blank">Samsung Electronics 840 Pro Series 2.5-Inch 256 GB SATA 6GB/s Solid State Drive</a><img alt="" border="0" height="1" src="http://ir-na.amazon-adsystem.com/e/ir?t=octonion-20&l=as2&o=1&a=B009NB8WRU" style="border: none !important; margin: 0px !important;" width="1" /><br /><br />Small, fast SSD for quick booting and anything bottlenecked by drive read/write times.<br /><br /></li><li><a href="http://www.amazon.com/gp/product/B008O510FW/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B008O510FW&linkCode=as2&tag=octonion-20" target="_blank">Silverstone Tek 3.5-inch to 2 x 2.5-Inch Bay Converter</a><img alt="" border="0" height="1" src="http://ir-na.amazon-adsystem.com/e/ir?t=octonion-20&l=as2&o=1&a=B008O510FW" style="border: none !important; margin: 0px !important;" width="1" /><br /><br />Needed to adapt the SSD to the Cosmos II case.<br /><br /></li><li><a href="http://www.ubuntu.com/" target="_blank">Ubuntu Linux</a><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://design.ubuntu.com/wp-content/uploads/logo-ubuntu_st_no%C2%AE-black_orange-hex.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="" border="0" height="226" src="http://design.ubuntu.com/wp-content/uploads/logo-ubuntu_st_no%C2%AE-black_orange-hex.png" title="Ubuntu Linux" width="320" /></a></div><br /> Ubuntu Linux - what else? </li></ol>http://angrystatistician.blogspot.com/2013/11/building-personal-supercomputer.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-1224011523756248298Mon, 16 Sep 2013 06:58:00 +00002013-09-16T00:10:30.580-07:00The Good, the Bad and the Weird: Duels and the Gentleman's Draw<div>As I mentioned in the previous article <a href="http://angrystatistician.blogspot.com/2013/08/the-good-bad-and-weird-duels-truels-and.html">"The Good, the Bad and the Weird: Duels, Truels and Utility Functions"</a>, a classic probability puzzle involves a 3-way duel (called a <a href="http://en.wikipedia.org/wiki/Truel">"truel"</a>).</div><blockquote class="tr_bq">A, B and C are to fight a three-cornered pistol duel. All know that A's chance of hitting his target is 0.3, C's is 0.5, and B never misses. They are to fire at their choice of target in succession in the order A, B, C, cyclically (but a hit man loses further turns and is no longer shot at) until only one man is left unit. What should A's strategy be?</blockquote><div>There's a subtle issue involved in these types of problems in that we don't know how each participant values each outcome. If we allow duelists to deliberately miss there are \(2^3-1=7\) possible outcomes; each person may or may not be shot and at least one person will not be shot. Even if deliberate missing isn't allowed, there are still 3 possible outcomes. A, for example, could conceivably value B winning more than C winning.<br /><br />The classic solution concludes that with the given hit probabilities the optimal strategy for the first trueler is to deliberately miss. My contention is that this is an incomplete solution; for some sets of utility values this may not be the first trueler's optimal strategy. Let's examine duels using utility values as the first step towards addressing truels.<br /><br /></div><div>Let the two duelers have hit probabilities \( p_i \) for \( i=1,2\), and also let the values each assigns to a dueling win, loss or tie be \( W_i, L_i, T_i\). I'm assuming these values are all known to both duelers and that \( W_i > T_i > L_i\). Note that a strategy that optimizes expected utility is preserved under a <a href="http://www.econ.ucsb.edu/~tedb/Courses/Ec100C/VarianExpectedUtility.pdf%E2%80%8E" target="_blank">positive affine transformation</a>, so under the assumption that all values are finite, we may assume \(W_i=1\) and \(L_i=0\) for all \(i\).<br /><br />We'll declare the duel a "gentleman's draw" if both duelers deliberately miss in a single round. Let the expected utility value for dueler \(i\) be \(U_i\).<br /><br />Trivially, if the optimal strategy for dueler 1 is trying for a hit, the optimal strategy for dueler 2 must be to also try for a hit. Conversely, the optimal strategy for dueler 1 can be to deliberately miss if and only if the optimal strategy for dueler 2 is also to deliberately miss. Assume the draw is taken if there's indifference (they are, after all, gentlemen).<br /><br />If dueler 1 deliberately misses, dueler 2 has two choices. If he deliberately misses, it's a gentleman's draw and \(U_2 = T_2\); if he tries for a hit, both duelers will subsequently try to hit each other and \(U_2 = p_2 + q_2 q_1 U_2 \). Solving, we get \[U_2 = \frac{p_2}{1-q_1 q_2}.\] It's therefore optimal for dueler 2 to take the gentleman's draw if and only if \[T_2 \geq \frac{p_2}{1-q_1 q_2}.\] As a consequence, dueler 1 will not deliberately miss if \[T_2 < \frac{p_2}{1-q_1 q_2}.\] If dueler 1 tries for a hit, both will subsequently try to hit each other and we have \( U_1 = p_1 + q_1 q_2 U_1\). Solving, we get \[ U_1 = \frac{p_1}{1-q_1 q_2}.\] Thus, the optimal strategy for dueler 1 is to deliberately miss if and only if \[ T_1 \geq \frac{p_1}{1-q_1 q_2}\] and \[T_2 \geq \frac{p_2}{1-q_1 q_2}.\] These are, as expected, precisely the conditions under which dueler 2 will deliberately miss.<br /><br />For example, if \(p_1=p_2=1/2\), it'll be a gentleman's draw if and only if \(T_1,T_2 \geq 2/3\). Paradoxically, both will fire under these hit probabilities if \( T_1 = 4/5 \) and \( T_2 = 1/2 \), even though this results in lower expected utility for both duelers than if they had agreed to a draw. This is a type of <a href="http://en.wikipedia.org/wiki/Prisoner's_dilemma" target="_blank">prisoner's dilemma</a>.</div>http://angrystatistician.blogspot.com/2013/09/the-good-bad-and-weird-duels-and.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-5812573797954626341Sun, 11 Aug 2013 09:10:00 +00002013-08-11T02:10:20.672-07:00A Man, a Plan, a Cat, a Canal - Panama!A man, a plan, a cat, a master, a minae, a lunula, a tora, a vela, a cola, a belga, a tola, an uta, a dona, a tiara, an aura, a kata, a mana, a nipa, a cabana, a law, a harem, a taces, a liard, a gnat, a mut, a dawed, a bura, a nim, a levator, an eh, a trams, a sall, a vacates, an up, a garb, a keets, a serin, a sored, a patamar, a gater, a lees, a strep, a lama, a pat, a lame, a haded, a wag, a naan, a kalian, a fer, a yaws, a remit, a keep, a loom, a neep, a ha, an ai, a retailor, a cabals, a subah, a slits, a pated, a callet, a xu, a faena, a tups, a sati, a rapider, a reps, a rapid, a korat, a dens, a sled, a tical, a pipages, a rats, a seeks, a defi, a wasts, a peracid, a nomas, a rosarian, a bran, a nonet, a gama, a sac, a scrota, a lab, a cain, an aid, a statin, a reel, a tiff, a haj, a talas, a nares, a mares, a laics, an allot, a til, an arena, a nits, a palace, a fanum, a yarer, a daub, a basts, a masas, a called, a slay, a galoots, a senega, a ton, a span, a mum, a dracaenas, a coda, a must, a satis, a lion, a yob, a reknit, a dessert, a lex, an aver, a red, a fab, a katas, a deet, a diaper, a nos, a snips, a spot, a fed, a wap, a past, a galax, an apos, a zaps, a lat, a rai, a resale, a tallit, a boob, a hag, a fayed, a sports, a loof, a ragg, a sashay, an agar, a selah, a spit, a brad, a strap, a wolf, a reified, a navar, a cases, a bedaub, a tassels, a gaes, a runaway, an amahs, a burg, a sorter, a del, a basely, a kae, an evades, a lader, a waved, a spirts, a pas, a slam, a hasps, a nalas, a snog, a cedar, an ulama, a lasagnas, a leud, a poh, a larum, a si, a retained, a pay, a pit, a raps, a repots, a sell, a datums, a yard, a hey, a terces, a prod, a skips, a gip, a lin, a sib, a lac, a jaded, a later, a beef, a sallet, a panicum, a gnar, a pall, a gas, a rotator, a tamis, a gel, a snawed, a decaf, a pol, a jawans, a dual, a gateman, a tins, a ran, a riff, a za, a minim, a get, a bard, a spud, a sum, a stem, a rases, a tsar, a lobate, a hasp, a hwan, a lotahs, an agas, a pad, a reed, a slim, an attar, a steels, a medina, a gar, a tub, an asarum, a pows, an ava, a mib, a map, a stab, a rasp, a tala, a nadir, a seton, a stoped, a repoll, a waste, a hades, a ragi, a tsades, a bap, a rain, a ratels, a hasten, a gid, a snot, a basal, a vinas, an upas, a nonacid, a naval, a lahar, a sonar, a lovat, a wares, a bal, a ripsaw, a jatos, a sup, a sh, a haed, a hailed, a moor, a vaned, a delf, a revel, a tan, a meter, a sob, an evil, a deific, a peracids, an ass, a bait, a buran, a far, a lamas, a robalo, a gaum, a sha, an arak, a macer, a macs, a snub, a stirps, a deffer, a perp, a naw, a jar, a vain, a ban, a miladi, a rasps, a wallop, a rex, a waney, a rase, a caird, a patrons, a tun, a stob, a japans, a maws, an awa, a con, a spoons, a fir, a stroma, a mil, a keen, a lace, a canon, an allod, a spay, a ten, a kop, a yah, a regal, a modal, a dinar, a madame, a hallahs, a dor, a repel, a reflet, a noma, a gen, a fast, a waggon, a dewan, a recap, a wort, a pin, a sung, a bag, a sos, a tepal, an abash, an ami, a manias, a pees, a diols, a repp, a taxed, a deb, a spans, a reman, a reward, a poons, a smug, a now, a teff, a hakeem, a seem, a rami, a simaruba, a ness, a cadets, a teraohm, a knob, a sewan, a seven, an oot, a skeets, a demur, a lanai, an unassuaged, a stared, a seam, a meed, a naos, a bus, a wad, a xis, a hon, a dan, a snail, a kayak, a looter, a gob, a men, a sorb, an inner, an oxen, an emes, a seder, a reknits, a loid, a saw, a pint, a castes, a stop, a debut, a tows, a tog, a kips, a rep, a cask, a yam, a minareted, a reflow, a laced, a bun, a keels, an ort, an assent, a faun, a reign, a manat, a rakus, a lade, a dater, a minarets, a nay, a fanos, a matt, an arhat, a tram, a tel, a vat, a ba, a pal, a pager, a tarps, a plug, a nod, a rash, a rotas, a ramal, a butane, a galiot, a paws, a pals, a stressed, a derat, an um, a mac, a sri, a mas, a lagan, an ajar, a laid, a big, a lass, a vara, a mail, a sikas, a yawl, an add, a simas, a romano, a kalam, a tannin, a less, a tater, a carer, a pasts, a caseic, a papayas, a salal, a haffits, a snivel, an atlas, a bases, a macaber, a tress, an alerts, a water, an imaginer, a pec, a golf, a pupils, a gnaws, a mos, a la, a baklawa, a gan, a stums, a step, an ikat, a keel, a dot, a loll, a tares, a renamed, a nanas, a sematic, a tap, a zareeba, a mob, an ales, a lox, a tass, a matsah, a ports, a names, a batik, a secs, a fane, a haleru, a lasts, a fats, a valley, a spam, a spools, a saner, an aspis, a hun, a stops, a tots, a tils, a gamely, a put, a las, a vaw, a ranid, a ham, an ataps, a barca, a gig, a strow, a struts, a speed, a draw, an axon, a gaga, a birr, an agnail, a pow, a tuber, a teel, a rip, a tail, a tatar, a hay, a ray, a lall, a barege, a janitor, a caff, an auk, a hah, an areic, a raster, a casks, a basil, a bubal, a donas, a bananas, a lobar, a lades, a cadi, a caw, a tad, an ogam, a janes, a sirup, a cram, a lets, a passus, a nuts, a muton, a cyma, a mix, a madam, a let, a corban, a partan, a melanin, a top, a wot, a bad, a bows, a resod, a nae, a parol, a vastness, an anime, a hares, a rax, a lam, a tabbis, a bins, a liar, a straw, a pant, a cares, a papa, a res, a sass, a mattes, a nopal, a daff, a yapon, a toped, an ivy, a maven, a ganef, a sovran, a gorp, a divan, a dexes, a moot, a prat, a nips, a redips, a mood, a sabir, a zarebas, a shales, a nett, a papal, a japan, a swots, a seral, a sip, a tarots, a path, a baseness, a callan, a drib, a nairu, a salep, a last, a babuls, a kiths, an orc, a daps, a galoot, a stink, a sail, an ameer, a haslets, a pats, a medal, a ganof, a dim, an atoll, a hall, a feral, a robalos, a sin, a radioed, a cycases, a kuna, an as, an arepa, a ki, a haps, an asks, a mair, a saloops, a vug, a stang, a tophs, a dak, a yay, a serac, a vav, a lap, a gad, a laden, a manioc, a fin, a dub, a loons, a teemer, a sris, a soar, a matador, a saros, a vavassor, a kayos, a setal, a maid, a waw, a palet, a rayon, a solon, a dine, a rumaki, a hae, a gaff, a lipases, a ladder, a stub, an akees, a seller, a manats, a barytas, a lied, a nona, a mars, an ait, a suber, a cimex, a tau, a vail, a grana, a pan, a sums, a ramadas, a rood, a teels, a muts, a torr, a paeon, a sow, an alfa, a rob, a mures, a toper, a rerun, a mall, a canoe, an apart, a seer, a saps, a wait, a laxer, a cats, a manor, a baton, a tag, a strown, a fax, a wab, a jauk, a racial, a lev, a gasser, a careens, a swat, a rein, a wanigan, a tenner, a flora, a rolf, a rennet, an agin, a wanier, a taws, a sneer, a caress, a gavel, a laic, a raku, a jab, a wax, a fanworts, a gat, a not, a baron, a mast, a carex, a lati, a wasp, a sarees, a trap, an aeon, a call, a manurer, a repot, a serum, a bora, a flan, a wos, an oe, a parrot, a stum, a sleet, a door, a sad, a marasmus, a napa, an argali, a vau, a taxemic, a rebus, a ti, an asrama, an on, a deil, a satyr, a bast, an amarelles, a seek, an abuts, a redd, a lases, a pilaff, a gae, a haik, a muraenid, a nolos, an oy, a ratel, a paw, a wadi, a malates, a soy, a kaross, a vavasor, a sarod, a tamaraos, a sirs, a remeet, a snool, a bud, an if, a coin, a maned, a lad, a gap, a lav, a vac, a resay, a yak, a dashpot, a gnats, a guv, a spool, a sari, a masks, an asp, a haika, a per, an asana, an ukases, a cycadeoid, a ranis, a sol, a boral, a refall, a hallot, an amid, a fon, a gal, a demast, a pastels, a hareem, an alias, a knits, a tool, a gasp, a dacron, a shtik, a slub, a bats, a lapel, a saurian, a bird, an all, a cassenes, a baht, a pastor, a tapis, a lares, a stows, a nap, a jalap, a patten, a selahs, a saber, a zaribas, a doom, a spider, a spin, a tarp, a toom, a sexed, an avid, a prog, an arvos, a fen, a ganev, a mayvin, a depot, an op, a yaff, a dalapon, a sett, a massas, a sera, a pap, a ser, a catnap, a warts, a rail, a snib, a sibb, a tam, a lax, a raser, a haemin, an assents, a valor, a paean, a doser, a swob, a dab, a tow, a pot, an in, a leman, a trapan, a brocatel, a mad, a maxima, a myc, a notum, a stun, a suss, a pastel, a marc, a puris, a sen, a jam, a gonad, a taw, a caid, a cased, a lar, a bolas, an anabas, a nodal, a bubalis, a basks, a carets, a racier, a nah, a haku, a naff, a carotin, a jaeger, a ball, a lay, a rayah, a rat, a tali, a tapir, a leet, a rebut, a wop, a liang, an arriba, a gag, an ox, an award, a deeps, a sturts, a worts, a giga, a crab, a spat, an amah, a din, a raw, a vasal, a tup, a ylem, a gaslit, a stot, a spots, an uh, a sips, an arenas, a sloops, a maps, a yell, a vast, a fasts, a laurel, a haen, a fasces, a kit, a baseman, a strop, a hast, a mass, a taxol, a sel, an aboma, a beer, a zap, a tacit, a mesas, an anadem, an eraser, a tallol, a tod, a leek, a takin, a pets, a smuts, an aga, a walk, a baal, a som, a swang, a slipup, a flog, a cep, a renig, a minaret, a wastrel, an assert, a reb, a camases, a basalt, an alevins, a stiff, a halalas, a say, a papacies, a casts, a parer, a caret, a tassel, an inn, a tamal, a kaon, a moras, a misadd, an alway, a sakis, a li, a maar, a vassal, a gib, a dial, a raj, a nan, a galas, a mairs, a cam, a mun, a tared, a desserts, a slap, a swap, a toil, a gaen, a tubal, a maras, a torahs, a radon, a gulp, a sprat, a reg, a palapa, a bat, a valet, a mart, a tahr, an att, a mason, a fay, an aster, an imaret, a daedal, a suk, a ratan, a mangier, a nu, a fatness, a natron, a sleek, a nub, a decal, a wolfer, a deter, an imam, a yaks, a caper, a spik, a got, a swot, a tubed, a pots, a sets, a catnip, a was, a diol, a stinker, a redes, a semen, an exon, a rennin, a bros, an em, a bog, a retool, a kay, a kalians, an ad, a noh, a six, a daw, a sub, a so, an adeem, a maes, a derats, a degauss, an unai, an alarumed, a steeks, a toon, a neves, an awes, a bonk, a mho, a retasted, a cassena, a bur, a misaim, a ramees, a meek, a haffet, a won, a gums, a snoop, a drawer, a namer, a snaps, a bed, a dex, a tapper, a sloid, a seep, a sain, a maim, an ahs, a banal, a petasos, a gab, a gnus, a nip, a trow, a pacer, an awed, a nogg, a wats, a fanega, a mon, a telfer, a leper, a rod, a shall, a haem, a damar, a nidal, a domal, a gerah, a yapok, a net, a yaps, a doll, an anon, a caecal, an eek, a lima, a morts, a rif, a snoops, an oca, a wan, a swam, a snap, a jabots, a nut, a snort, a padri, a caesar, a yen, a waxer, a poll, a wasps, a raid, a liman, a bani, a var, a jawan, a prep, a reffed, a sprits, a buns, a scam, a rec, a makar, an aahs, a mu, a gaol, a boras, a malar, a fan, a rubati, a bass, an asdic, a repacified, a liven, a bos, a retem, an at, a lever, a fled, a den, a varoom, a deli, a hade, a hahs, a pus, a sot, a jaw, a spiral, a baser, a wat, a volar, an osar, a halal, a vanadic, a nonas, a punas, a nival, a sabatons, a dig, a nets, a haslet, a rani, a rap, a based, a staig, a rased, a haets, a walloper, a depots, a notes, a rid, an aal, a taps, a rabats, a pam, a bima, a van, a swop, a muras, an abut, a raga, an idem, a sleets, a rattan, a mils, a deer, a dap, a sag, an ash, a tolan, a whaps, a haet, a bolar, a stases, a ramets, a mus, a dups, a drab, a teg, a minima, a zaffir, an ar, a snit, a nametag, a laud, a snaw, a jalop, a faced, a dewans, a leg, a sim, a tarot, a toras, a gall, a parang, a mucin, a patellas, a feeb, a ret, a laded, a jacal, a bis, a nil, a pig, a spiks, a dorp, a secret, a yeh, a dray, a smut, a dalles, a stoper, a spar, a tip, a yap, a deni, a terais, a mural, a hop, a duel, a sang, a salaam, a lunar, a decagons, a sal, an asps, a hamals, a sap, a strips, a dev, a wared, a lased, a venae, a kayles, a baled, a retros, a grub, a shaman, a yaw, an urase, a gasless, a tabu, a debases, a caravan, a deifier, a flow, a parts, a darb, a tips, a hales, a rag, an ayahs, a saggar, a fool, a strops, a dey, a fag, a haboob, a till, a tael, a serai, a ratal, a spaz, a sop, an axal, a gats, a papaw, a def, a tops, a spins, a son, a repaid, a teed, a sat, a kab, a fader, a rev, an axel, a tressed, a tinker, a boy, a noil, a sit, a satsuma, a doc, a sane, a cardamum, a naps, a nota, a genes, a stool, a gayals, a dell, a casas, a masts, a babu, a darer, a yamun, a faecal, a pastina, an er, an alit, a toll, an asci, a laser, a maser, a nasal, a taj, a haffit, a leer, a nit, a tsadi, an ani, a cabala, a torcs, a casa, a mag, a tenon, an arb, an air, a soras, a monadic, a repasts, a waifed, a skees, a star, a seg, a pipal, a citadels, a sned, a tarok, a dip, a rasper, a redip, a raitas, a sputa, a ne, a faux, a tell, a cadet, a pastils, a habus, a slab, a caroli, a terai, an aah, a peen, a mool, a peek, a timer, a sway, a ref, a nail, a kana, a nag, a waded, a haemal, a tapa, a malaperts, a seel, a retag, a ram, a tapaderos, an ires, a steek, a brag, a pun, a set, a cavallas, a smart, a hen, a rot, a velamina, a rub, a dew, a datum, a tang, a drail, a sec, a tamer, a hawala, an abaca, a pina, an ama, a taka, a ruana, a raita, an oda, a tuna, a lota, a gleba, a loca, a leva, a rota, a lunulae, an imarets, a mat, a canal - Panama!http://angrystatistician.blogspot.com/2013/08/a-man-plan-cat-canal-panama.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-3974033514939034397Sun, 11 Aug 2013 06:03:00 +00002013-08-11T15:34:21.222-07:00The Good, the Bad and the Weird: Duels, Truels and Utility FunctionsIn the excellent (and highly recommended) book <a href="http://www.amazon.com/Challenging-Problems-Probability-Solutions-Mathematics/dp/0486653552" target="_blank">"Fifty Challenging Problems in Probability with Solution"</a>, Frederick Mosteller poses "The Three-Cornered Duel":<br /><blockquote class="tr_bq">A, B and C are to fight a three-cornered pistol duel. All know that A's chance of hitting his target is 0.3, C's is 0.5, and B never misses. They are to fire at their choice of target in succession in the order A, B, C, cyclically (but a hit man loses further turns and is no longer shot at) until only one man is left unit. What should A's strategy be?</blockquote>This is problem 20 in Mosteller's book, and it also appears (with an almost identical solution) in Larsen & Marx <a href="http://www.amazon.com/Introduction-Mathematical-Statistics-Its-Applications/dp/0321693949" target="_blank">"An Introduction to Probability and Its Applications"</a>.<br /><br />Mosteller's solution:<br /><div><blockquote class="tr_bq">A is naturally not feeling cheery about this enterprise. Having the first shot he sees that, if he hits C, B will then surely hit him, and so he is not going to shoot at C. If he shoots at B and misses him, then B clearly shoots the more dangerous C first, and A gets one shot at B with probability 0.3 of succeeding. If he misses this time, the less said the better. On the other hand, suppose A hits B. Then C and A shoot alternately until one hits. A's chance of winning is \( (.5)(.3) + (.5)^2(.7)(.3) + (.5)^3(.7)^2(.3) + \ldots\) . Each term corresponds to a sequence of misses by both C and A ending with a final hit by A. Summing the geometric series we get ... \(3/13 < 3/10\). Thus hitting B and finishing off with C has less probability of winning for A than just missing the first shot. So A fires his first shot into the ground and then tries to hit B with his next shot. C is out of luck.</blockquote>Is this right? What if B were to follow A's example and fire into the ground? What if all three were to keep firing into the ground? That this type of an outcome isn't unreasonable for certain sets of shot accuracy probabilities can be illustrated by considering the case where A's accuracy is 0.98, B's accuracy is 1.0 and C's accuracy is 0.99. Mosteller's argument is equally applicable in this case, but if B shoots C after A deliberately misses he'll be shot by A with probability 0.98. Is that reasonable?<br /><br />Under the assumption that deliberately missing is allowed, there are \(2^3-1=7\) possible outcomes - each participant can be shot or not shot, and there must be at least one participant not shot. The lack of clarity for what the ideal strategies are for A, B and C in the general case arises from the utility of 2- or 3-way ties to each of the participants being undefined.<br /><br />In the next article I'll analyze 2-way duels where deliberate missing is allowed by using such fully-defined utility functions. These results will be used in a third article on 3-way duels (truels); in particular, I'll re-examine Mosteller's solution.</div>http://angrystatistician.blogspot.com/2013/08/the-good-bad-and-weird-duels-truels-and.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-2191741530849988465Fri, 09 Aug 2013 08:23:00 +00002013-08-10T01:21:01.004-07:00Solving a Bar Bet Using at Most Three Different OperationsA bar bet as presented in the YouTube video <a href="http://www.youtube.com/watch?v=DoRB7FL02t4" target="_blank">The HARDEST Puzzle Yet!</a> involves starting with three of the same number from 0 through 9, then adding mathematical operations that result in an evaluation of 6. For example, if we start with three of the number 6, one solution could be \[ 6+6-6=6 .\] I'll demonstrate a method for solving this bar bet puzzle starting with three of any number, say \( N \), which involves using at most three different mathematical operations (although some of these may be used many, many times).<br /><br />If \( 0 \leq N \leq 2 \) we have the solutions<br />\begin{eqnarray}<br />(0! + 0! + 0!)! &= 6 \\<br />(1! + 1! + 1!)! &= 6 \\<br />2+2+2 &= 6.<br />\end{eqnarray} If \( N\geq 3\), concatenate the three numbers. Repeatedly applying the square-root operation we'll eventually end up with a result \( x \) with \( 3 \leq x < 9\). If we now take the greatest integer \(\lfloor x \rfloor\) we have an integer \( n \) with \( 3 \leq n \leq 8 \). If we can exhibit solutions for each of these cases that use only square-roots, greatest integers and one other operation, we'll be done. Using factorial for the third operation, some possibilities are<br />\begin{eqnarray}<br />\href{http://www.wolframalpha.com/input/?i=3%21}{3!} &= 6\\<br />\href{http://www.wolframalpha.com/input/?i=%5Cleft%5Clfloor+%5Csqrt%7B%5Csqrt%7B%5Cleft%5Clfloor+%5Csqrt%7B%5Csqrt%7B%5Csqrt%7B%5Csqrt%7B%5Csqrt%7B%284%21%29%21%7D%7D%7D%7D%7D+%5Cright%5Crfloor%21%7D%7D+%5Cright%5Crfloor+%21}{\left\lfloor \sqrt{\sqrt{\left\lfloor \sqrt{\sqrt{\sqrt{\sqrt{\sqrt{(4!)!}}}}} \right\rfloor!}} \right\rfloor !} &= 6\\<br />\href{http://www.wolframalpha.com/input/?i=%5Cleft%5Clfloor+%5Csqrt%7B%5Csqrt%7B5%21%7D%7D+%5Cright%5Crfloor+%21}{\left\lfloor \sqrt{\sqrt{5!}} \right\rfloor !} &= 6\\<br />\href{http://www.wolframalpha.com/input/?i=6}{6} &= 6 \\<br />\href{http://www.wolframalpha.com/input/?i=%5Cleft%5Clfloor+%5Csqrt%7B%5Csqrt%7B%5Csqrt%7B%5Cleft%5Clfloor+%5Csqrt%7B%5Csqrt%7B7%21%7D%7D%5Cright%5Crfloor+%21%7D%7D%7D+%5Cright%5Crfloor+%21}{\left\lfloor \sqrt{\sqrt{\sqrt{\left\lfloor \sqrt{\sqrt{7!}}\right\rfloor !}}} \right\rfloor !} &= 6\\<br />\href{http://www.wolframalpha.com/input/?i=%5Cleft%5Clfloor+%5Csqrt%7B%5Csqrt%7B%5Csqrt%7B8%21%7D%7D%7D+%5Cright%5Crfloor+%21}{\left\lfloor \sqrt{\sqrt{\sqrt{8!}}} \right\rfloor !} &= 6.<br />\end{eqnarray}<br />I've added Wolfram Alpha links so you can verify that these do indeed evaluate to 6.<br /><br />As an illustrative example, when \(N=1337\) we have the solution \[<br />\href{http://www.wolframalpha.com/input/?i=%5Cleft%5Clfloor+%5Csqrt%7B%5Csqrt%7B%5Cleft%5Clfloor+%5Csqrt%7B%5Csqrt%7B%5Csqrt%7B%5Csqrt%7B%5Csqrt%7B%28++%5Cleft%5Clfloor+%5Csqrt%7B%5Csqrt%7B%5Csqrt%7B%5Csqrt%7B133713371337%7D%7D%7D%7D+%5Cright%5Crfloor+%21%29%21%7D%7D%7D%7D%7D+%5Cright%5Crfloor%21%7D%7D+%5Cright%5Crfloor+%21}{\left\lfloor \sqrt{\sqrt{\left\lfloor \sqrt{\sqrt{\sqrt{\sqrt{\sqrt{\left( \left\lfloor \sqrt{\sqrt{\sqrt{\sqrt{133713371337}}}} \right\rfloor !\right)!}}}}} \right\rfloor!}} \right\rfloor !} = 6.\]http://angrystatistician.blogspot.com/2013/08/solving-bar-bet-using-at-most-three.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-848226778682792744Sat, 27 Jul 2013 20:25:00 +00002014-10-09T00:40:45.830-07:00Solving Sir Arthur Eddington's Zoo Puzzle using Dirac Matrices<a href="http://en.wikipedia.org/wiki/Arthur_Eddington" target="_blank">Sir Arthur Eddington</a> posed this difficult logic puzzle to the readers of Caliban's (<a href="http://en.wikipedia.org/wiki/Hubert_Phillips" target="_blank">Hubert Phillips</a>) newspaper puzzle columns, and famously (or infamously) provided a solution using <a href="http://mathworld.wolfram.com/DiracMatrices.html" target="_blank">Dirac matrices</a>.<br /><br /><b>Sir Eddington's zoo puzzle: </b><br /><div><br /></div>I took some nephews and nieces to the Zoo, and we halted at a cage marked<br /><ol><li>Tovus Slithius, male and female. </li><li>Beregovus Mimsius, male and female. </li><li>Rathus Momus, male and female. </li><li>Jabberwockius Vulgaris, male and female. </li></ol>The eight animals were asleep in a row, and the children began to guess which was which. "That one at the end is Mr Tove." "No, no! It's Mrs Jabberwock," and so on. I suggested that they should each write down the names in order from left to right, and offered a prize to the one who got most names right.<br /><br />As the four species were easily distinguished, no mistake would arise in pairing the animals; naturally a child who identified one animal as Mr Tove identified the other animal of the same species as Mrs Tove.<br /><br />The keeper, who consented to judge the lists, scrutinised them carefully. "Here's a queer thing. I take two of the lists, say, John's and Mary's. The animal which John supposes to be the animal which Mary supposes to be Mr Tove is the animal which Mary supposes to be the animal which John supposes to be Mrs Tove. It is just the same for every pair of lists, and for all four species.<br /><br />"Curiouser and curiouser! Each boy supposes Mr Tove to be the animal which he supposes to be Mr Tove; but each girl supposes Mr Tove to be the animal which she supposes to be Mrs Tove. And similarly for the other animals. I mean, for instance, that the animal Mary calls Mr Tove is really Mrs Rathe, but the animal she calls Mrs Rathe is really Mrs Tove."<br /><br />"It seems a little involved," I said, "but I suppose it is a remarkable coincidence."<br /><br />"Very remarkable," replied Mr Dodgson (whom I had supposed to be the keeper) "and it could not have happened if you had brought any more children."<br /><br />How many nephews and nieces were there? Was the winner a boy or a girl? And how many names did the winner get right?<br /><div><br /><b>Solution:</b><br /><br /> Note that every child permutes the species, and we may indicate these as \(4\times 4\) permutation matrices, where the row number indicates the actual species numbers and the column number indicates the species number guessed by the child. We can use a little trick to include gender. Consider the species permutation matrix for a given child, but use a -1 in the entry if the child additionally switches gender, too.<br /><br />Here's an illustrative example. Assume a child transposes species 1 (Tovus Slithius) and species 3 (Rathus Momus), but doesn't switch genders; and transposes species 2 (Beregovus Mimsius) and species 4 (Jabberwockius Vulgaris), but in this case also switches genders. We'd represent this child's guesses with the <a href="http://en.wikipedia.org/wiki/Generalized_permutation_matrix#Signed_permutation_group" target="_blank">signed permutation matrix</a> \[ A = \left( \begin{array}{cccc} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \\ 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{array} \right).\] Let \( A_i \) be the guess matrix for child \(i\). The keeper's first observation implies that these matrices satisfy \( A_i A_j = - A_j A_i \) for all \( i\neq j \). If the child is a boy, the keeper's second observation implies that \( A_i A_i = I\), where \( I\) is the \( 4\times 4\) identity matrix; if the child is a girl, the keeper's second observation implies that \( A_i A_i = -I\). Also note that this observation implies children either get species right, or transpose two species. Thus, the species permutations can be written as the product of disjoint transpositions.<br /><br />From the second observation, any child that gets the species right for an animal must be a boy. Furthermore, because the species guesses are the product of disjoint transpositions, every child must get an even number of species correct.<br /><br />If a boy had gotten all the species and genders right, his matrix would commute with the matrix of any other child, but we know there is at least one boy and at least one girl. Thus no child could have gotten all of the animals correct. Furthermore, if any boy gets a species right, no other child could also get that species right, otherwise this would violate the keeper's first observation.<br /><br />We could continue along these lines and work out the answer, but let's use Eddington's sledgehammer - the keeper's observations imply that these matrices will form a set of anticommuting <a href="http://mathworld.wolfram.com/DiracMatrices.html" target="_blank">Dirac matrices</a> if we multiply the girl's matrices by the imaginary unit \( i\). Such sets have size at most five; sets of size five have three boys and two girls, and the unique winner is a boy with four animals correct.<br /><br />Here's one possible set. Note that I've multiplied two of the matrices given on the Wolfram site by the imaginary unit \( i \) to convert them to signed permutation matrices. As I've mentioned, this doesn't affect the anticommutativity, but it does change the square of the matrix from \( I\) to \( -I\); thus, Dirac matrices with imaginary entries correspond to girls and Dirac matrices with real entries correspond to boys. \begin{eqnarray}<br />A_1 &= \left( \begin{array}{cccc} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{array} \right)\\<br />A_2 &= \left( \begin{array}{cccc} 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0<br />\end{array} \right)\\<br />A_3 &= \left( \begin{array}{cccc} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \\ 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0<br />\end{array} \right)\\<br />A_4 &= \left( \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1<br />\end{array} \right)\\<br />A_5 &= \left( \begin{array}{cccc} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0<br />\end{array} \right)\\<br /> \end{eqnarray} The boys are \( A_1, A_3, A_4\) and the girls are \( A_2, A_5\). The unique winner was \(A_4\) with four animals correct (two correct species with correct gender, so four animals).</div>http://angrystatistician.blogspot.com/2013/07/solving-sir-arthur-eddingtons-zoo.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-4237471736186941582Fri, 26 Jul 2013 00:09:00 +00002013-07-25T19:28:23.450-07:00Solving the TopSpin Puzzle using GAPTopSpin is an oval-track permutation puzzle that was made by Binary Arts; similar puzzles are made and sold by other manufacturers. Here's the <a href="http://www.amazon.com/Binary-Arts-Top-Spin-Top-Spin/dp/B001SDX244/" target="_blank">Binary Arts TopSpin</a>.<br /><br />It's not a difficult puzzle to solve if you play around with it for a few hours and figure out how to generate various permutations. It's more interesting (and difficult) if you observe that the turntable has a distinguishable top and bottom. This suggests an interesting question - can you invert the turntable while keeping the numbers in the track in the same order?<br /><br />The answer is, perhaps surprisingly, yes. Here's one way to find a sequence of operations that produces precisely this outcome.<br /><br /><a href="http://en.wikipedia.org/wiki/GAP_(computer_algebra_system)" target="_blank">GAP (Groups, Algorithms and Programming)</a> is a freely available programming language that specializes in computational group theory, and it's perfect for solving permutation puzzles. Here's my GAP code for TopSpin. Label the top of the turntable with 21 and the bottom with 22. Flipping the turntable generates the permutation \[(1,4)(2,3)(21,22);\] rotating the oval to the left generates the permutation \[ (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1). \] Denoting these two permutations as \(x\) and \(y\), we can now simply ask GAP to find a sequence of operations on the free group generated by \(x,y\) that results in the pre-images \((1,2)\), flipping two adjacent numbers on the track, leaving everything else the same (including turntable parity); and \((21,22)\), flipping turntable parity, leaving everything else the same. This corresponds to the operation of flipping the turntable while keeping the order of the numbers in the track the same.<br /><br /><script src="https://gist.github.com/octonion/6084770.js"></script><br />Running the code, we get these lovely results:<br /><blockquote class="tr_bq">(1,2) = y*x^-1*y^-1*x^-1*y*x^-1*y*x^-1*y^-1*x^-1*y^-1*x^-1*y^2*x^-1*y^-1*x^-1*y^-1*x^-1*y^4*x^-1*y^-1*x^-1*y^2*x^-1*y^-2*x^-1*y*x^-1*y^2*x^-1*y^-3*x^-1*y^-3*x*y^-4*x*y*x*y^-1*x*y^4*x^-1*y^-5*x^-1*y^-1*x^-1*y^5*x^-1*y^-6*x^-1*y^2*x^-1*y^-1*x^-1*y^5*x*y*x*y*x*y*x*y*x*y^5*x*y*x*y^-1*x*y^-5*x^-1*y^-1*x^-1*y^-1*x^-1*y^-2*x*y*x*y^-1*x*y^4*x*y^-1*x*y^3*x^-1*y^-1*x^-1*y^6*x^-1*y^-1*x^-2*y^-4*x^-3*y^-3*x^-2*y^-1*x^-1*y*x^-1*y^2*x^-1*y^-2*x^-1*y^-1*x^-1*y*x*y*x^-1*y*x^-1*y^-1*x^-1*y^-2 </blockquote><blockquote class="tr_bq">(21,22) = y*x^-1*y^2*x*y^-1*x*y*x*y^-1*x*y^-2*x*y^2*x*y^-1*x*y*x*y*x*y^-2*x^-1*y*x^-1*y^-2*x^-1</blockquote>Since there are sequences of operations that allow us to flip any two adjacent numbers or the parity of the turntable, it follows that all possible configurations are both solvable and achievable.<br />http://angrystatistician.blogspot.com/2013/07/solving-topspin-puzzle-using-gap.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-5335926023005868274Wed, 10 Jul 2013 07:13:00 +00002013-07-10T00:16:09.399-07:00A Slightly Less Pointless Solution to Le Monde puzzle #824Here's <b><a href="http://xianblog.wordpress.com/2013/06/13/le-monde-puzzle-824/" target="_blank">Le Monde puzzle #824</a>:</b><br /><br />Show that, for any integer \(y\), \[(\sqrt{3}-1)^{2y}+(\sqrt{3}+1)^{2y}\] is an integer multiple of a power of two.<br /><br /><b>Solution:</b><br /><b><br /></b>Consider \[f(n) = (-1+\sqrt{3})^{n}+(-1-\sqrt{3})^{n}\] and observe that the two bases are the roots of the quadratic \(x^2 + 2x - 2\), hence \( f(n) \) obeys the recursion \( x_{n+2} = -2 x_{n+1} + 2 x_n \) with \( x_0=2\) and \(x_1=-2 \). It follows that \( f(n) \) is always an even integer.http://angrystatistician.blogspot.com/2013/07/a-slightly-less-pointless-solution-to.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-2326554146127308818Sat, 29 Jun 2013 22:05:00 +00002013-06-29T15:05:59.179-07:00Asteroids Algebra<br /><b>Problem:</b><br /><br />Waldo is playing Asteroids. This game starts with 3 ships and you earn an extra ship for every 10000 points you score. At the end of his first game, Waldo noticed that he had the lowest score possible while averaging exactly 9000 points per ship. At the end of his second game, Waldo noticed that he had the highest score possible while averaging exactly 9000 points per ship. What were his two scores?<br /><br /><b>Solution:</b><br /><br />Let the number of starting ships be \(S\), points for an extra ship be \(X\) and the player's average scoring rate be \(A\). Furthermore, let \(R = A/X\) and the number of bonus ships earned by the player during the game be \(B\). The number of bonus ships earned by the player during his game is the floor of his final score over the bonus value \(X\). Algebraically, we have \[ B = \lfloor (S+B)\cdot R \rfloor.\] Now let \[ (S+B)\cdot R = \lfloor (S+B)\cdot R \rfloor + f,\] where \( 0 \leq f < 1 \). This gives us<br />\begin{aligned}<br />(S+B)\cdot R &= B + f ,\\<br />SR + B(R-1) &= f,\\<br />B &= (SR-f)/(1-R).<br />\end{aligned}<br />Together with \(0 \leq f < 1\) we get that \[\frac{SR-1}{1-R} < B \leq \frac{SR}{1-R}.\] Thus, the smallest possible number of bonus ships is \[ \left\lfloor \frac{SR-1}{1-R} + 1 \right\rfloor = \left\lfloor \frac{SR-R}{1-R}\right\rfloor \] and the greatest possible number of bonus ships is \[ \left\lfloor \frac{SR}{1-R}\right\rfloor. \] It follows that the lowest possible score while averaging \(A\) is \[ A\cdot \left\lfloor \frac{SR-R}{1-R} + S \right\rfloor = A\cdot \left\lfloor \frac{S-R}{1-R}\right\rfloor \] and the highest possible score while averaging \(A\) is \[ A\cdot \left\lfloor \frac{SR}{1-R} + S\right\rfloor = A\cdot \left\lfloor \frac{S}{1-R}\right\rfloor. \]<br />For Waldo, \(S=3\), \(A=9000\) and \(R=9000/10000 = 0.9\). Waldo therefore scored<br />\[ 9000\cdot \left\lfloor \frac{3-0.9}{1-0.9} \right\rfloor = 189000\] points in his first game, \[ 9000\cdot \left\lfloor \frac{3}{1-0.9} \right\rfloor = 270000\] points in his second game.<br /><br />http://angrystatistician.blogspot.com/2013/06/asteroids-algebra.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-5111765361407049258Wed, 26 Jun 2013 06:31:00 +00002013-06-25T23:35:16.273-07:00Interview Doubler<br /><b>Problem:</b><br /><br />Find the smallest positive number that doubles when you move the last digit to the front.<br /><br /><b>Solution:</b><br /><br />The answer is \( 105263157894736842 \), and the solution is similar to <a href="http://angrystatistician.blogspot.com/2013/06/twisty-temperature.html" target="_blank">Twisty Temperature</a>. Let this number be \[ x = x_{n}\cdot 10^{n-1} + ... + x_{1}\cdot 10^1 + x_{0} \] with \( 0 < x_{n} < 10 \), then after moving the last digit to the front we get \[ y = x_{0}\cdot 10^{n-1} + (x-x_{0})/10. \] We also have that \( y = 2 x \), so our equation becomes \[ 19\cdot x = x_0 \cdot (10^{n}-1). \] Now 10 is a <a href="http://en.wikipedia.org/wiki/Primitive_root_modulo_n" target="_blank">primitive root</a> modulo 19, and so it follows that \( x \) is integral if and only if \( n \) is of the form \( 18\cdot m \). Also note that we need \( 2 \le x_0 \le 9 \) since we require \( 0< x_{n} < 10\). When \( m=1 \) and \( x_0 = 2\) we get \( x =105263157894736842 \); when \( m=2\) and \(x_0 = 2\) we get \[ 2 (10^{36}-1)/19 = 105263157894736842105263157894736842.\] That this number indeed doubles when you move the last digit to the front can be verified by <a href="http://www.wolframalpha.com/input/?i=105263157894736842105263157894736842*2" target="_blank">WolframAlpha</a>.http://angrystatistician.blogspot.com/2013/06/interview-doubler.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-9183522418970664436Tue, 25 Jun 2013 06:51:00 +00002013-06-24T23:51:11.169-07:00Twisty Temperature<br /><b>Problem:</b><br /><br />When Waldo recently did a conversion of a positive integral Celsius temperature \( c = 275 \) to its Fahrenheit equivalent \( f \) (which turned out to be \( 527\) ), he noticed to his amazement that he could have simply moved the last digit of \( c \) to the front to obtain \( f \). Doing some intense calculations he failed to discover the next largest such example. Does one exist, and if so, what is it?<br /><br /><b>Solution:</b><br /><br />Let \( c = x_{n}\cdot 10^{n-1} + ... + x_{1}\cdot 10^1 + x_{0} \) with \( x_{n} > 0 \), then \[ f = x_{0}\cdot 10^{n-1} + (c-x_{0})/10. \] We also have that \[ f = (9/5)\cdot c + 32. \] Notice that in order for \( f \) to be integral \( c \) must be divisible by 5; this implies that \( x_0=5 \) since it cannot equal 0 (since as a number \( f>c \)). Our equation then becomes \[ (9/5)\cdot c + 32 = 5\cdot 10^{n-1} + (c-5)/10 \] implies \[ c = 5\cdot (10^n - 65)/17. \] Now it turns out that 10 is a <a href="http://en.wikipedia.org/wiki/Primitive_root_modulo_n" target="_blank"><b>primitive root</b></a> modulo 17, and so it follows that \( c \) is integral if and only if \( n \) is of the form \( 16\cdot m + 3 \). When \( m=0\) we get \( c=275\); when \( m=1\) we get the next highest such temperature, which is \[ 5\cdot (10^{19}-65)/17 = 2941176470588235275.\]<br />http://angrystatistician.blogspot.com/2013/06/twisty-temperature.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-7078318996494822863Wed, 10 Apr 2013 20:31:00 +00002013-04-10T13:48:24.313-07:00Lunchtime Sports Science: Introducing tanh5As I mentioned in a <a href="http://angrystatistician.blogspot.com/2013/03/baseball-chess-psychology-and.html">previous article on ratings systems</a>, the <a href="http://en.wikipedia.org/wiki/Log5"><b>log5</b></a> estimate for participant 1 beating participant 2 given respective success probabilities \( p_1, p_2 \) is<br /><div>\begin{align}<br />p &= \frac{p_1 q_2}{p_1 q_2+q_1 p_2}\\<br />&= \frac{p_1/q_1}{p_1/q_1+p_2/q_2}\\<br />\frac{p}{q} &= \frac{p_1}{q_1} \cdot \frac{q_2}{p_2}<br />\end{align} where \( q_1=1-p_1, q_2=1-p_2, q=1-p \).<br /><br /><div>Where does this come from? Assume that both participants each played average opposition. In a <a href="http://en.wikipedia.org/wiki/Pairwise_comparison"><b>Bradley-Terry</b></a> setting, this means</div>\begin{align}<br />p_1 &= \frac{R_1}{R_1 + 1}\\<br />p_2 &= \frac{R_2}{R_2 + 1},<br />\end{align} where \( R_1 \) and \( R_2 \) are the (latent) Bradley-Terry ratings; the \( 1 \) in the denominators is an estimate for the average rating of the participants they've played en route to achieving their respective success probabilities.</div><div><div><br /></div><div>In a Bradley-Terry setting, it's true that the product of the ratings in the entire pool is taken to equal 1. But participants don't play themselves! Thus, if participant 1 played every participant but itself, the average opponent would have rating \( R \), where \( R_1 \cdot R^{n-1} = 1 \). Here \( n \) is the number of participants in the pool.<br /><br />Our strength estimate for the average opponent faced is then</div><div>\begin{align}<br />R &= {R_1}^{-\frac{1}{n-1}}.<br />\end{align} </div><div>There are two extreme cases. If \( n=2 \), then \( R = \frac{1}{R_1} \); as \( n \to +\infty \), \( R \to 1 \).</div><div><br />The limit extreme case is <b>log5</b>; the \(n=2\) extreme case I call <b>tanh5</b>. We compute</div><div>\begin{align}<br />p_1 &= \frac{R_1}{R_1 + 1/R_1}\\<br />p_2 &= \frac{R_2}{R_2 + 1/R_2}\\<br />\textrm{tanh5} = p &= \frac{ \sqrt{p_1 q_2} }{ \sqrt{p_1 q_2} + \sqrt{q_1 p_2} }.<br />\end{align} </div><div>Why <b>tanh5</b>? We can think of <b>log5</b> as derived from the <a href="http://en.wikipedia.org/wiki/Logistic_function" target="_blank"><b>logistic function</b></a> by setting \( \log(R)=0 \) for the opponent's rating; analogously, <b>tanh5</b> is derived from the <a href="http://en.wikipedia.org/wiki/Hyperbolic_function" target="_blank"><b>hyperbolic tangent function</b></a> by setting \( \log(R)=0 \) for the opponent's rating.</div><div><br /></div><div>Note that we have a spectrum of estimates corresponding to each value for \(n\), but these are the two extremes. This also gives a new spectrum of <a href="http://en.wikipedia.org/wiki/Activation_function">activation functions</a> for <a href="http://en.wikipedia.org/wiki/Neural_network">neural networks</a>, but I'll explore this application later.</div></div>http://angrystatistician.blogspot.com/2013/04/lunchtime-sports-science-introducing.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-7002778674416622507Wed, 03 Apr 2013 22:06:00 +00002013-04-03T15:18:27.757-07:00Lunchtime Sports Science: Fitting a Bradley-Terry ModelPower rankings are game rankings that also allow you to estimate the likely outcome if two opponents were to face each other. One of the simplest of these models is known as the <a href="http://en.wikipedia.org/wiki/Pairwise_comparison" target="_blank"><b>Bradley-Terry-Luce</b></a> model (or commonly, Bradley-Terry). The idea is that each player \( i \) is assumed to have an unknown rating \( R_i \). If players \( i \) and \( j \) compete, the probability that \( i \) wins under this model is expected to be about \[ \frac{R_i}{R_i + R_j}. \] This model is very popular for hockey and other games; one commonly seen version is called <a href="http://www.collegehockeynews.com/info/?d=krach" target="_blank"><b>KRACH</b></a>.<br /><br />Let's fit a Bradley-Terry model to the current season of NCAA D1 men's hockey. The <a href="http://www.ncaa.com/sports/icehockey-men/d1" target="_blank"><b>Frozen Four</b></a> starts on Thursday, April 11, so you'll get to see how well your predictions do.<br /><br />You'll need to have R installed. Once R is installed, install the "BradleyTerry2" package that's freely available for R (thanks to Heather Turner and David Firth). To do this, start R and run the following command; you'll have to pick a source.<br /><pre class="code_r">install.packages("BradleyTerry2")</pre>Next, download two files from my hockey GitHub - R code that fits a basic Bradley-Terry model and a data file containing the NCAA D1 men's hockey game results going back to 1998.<br /><br /><a href="https://github.com/octonion/hockey/blob/master/lunchtime/uscho_btl.R">https://github.com/octonion/hockey/blob/master/lunchtime/uscho_btl.R</a><br /><a href="https://github.com/octonion/hockey/blob/master/lunchtime/uscho_games.csv">https://github.com/octonion/hockey/blob/master/lunchtime/uscho_games.csv</a><br /><br />Make sure both files are in the same directory and run the R code. That's it, you've built a power ranking using a Bradley-Terry model. You should get output that looks like this:<br /><br /><span style="font-family: Courier New, Courier, monospace;"> ability s.e.</span><br /><span style="font-family: Courier New, Courier, monospace;">Quinnipiac 1.687042594 0.5678939</span><br /><span style="font-family: Courier New, Courier, monospace;">Massachusetts-Lowell 1.480098569 0.5872701</span><br /><span style="font-family: Courier New, Courier, monospace;">Minnesota 1.428503522 0.5638946</span><br /><span style="font-family: Courier New, Courier, monospace;">Yale 1.115338226 0.5641414</span><br /><span style="font-family: Courier New, Courier, monospace;">Miami 1.114307264 0.5479346</span><br /><span style="font-family: 'Courier New', Courier, monospace;">Notre Dame 1.109912670 0.5523657</span><br /><span style="font-family: 'Courier New', Courier, monospace;">Boston College 1.091836391 0.5815233</span><br /><span style="font-family: 'Courier New', Courier, monospace;">St. Cloud State 1.079314965 0.5573018</span><br /><span style="font-family: Courier New, Courier, monospace;"><br /></span>The order should be the same as <a href="http://www.uscho.com/rankings/krach/d-i-men/" target="_blank"><b>USCHO's KRACH rankings</b></a>.<br /><span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">How do we use these ability estimates to predict game outcomes? These values are the logarithms of the ratings I've mentioned above, so first apply the exponential to get the rating, then the estimated winning probability is the team's rating divided by the sum of the team and opponent ratings. For the teams in the Frozen Four we get a power rating of \( e^{1.687} = 5.40 \) for Quinnipiac and </span>\( e^{1.079} = 2.94 \) for St. Cloud State, so we estimate the probability of Quinnipiac beating St. Cloud State to be about \[ \frac{5.40}{5.40+2.94} = 0.65. \] What's your estimate for Massachusetts-Lowell beating Yale?http://angrystatistician.blogspot.com/2013/04/lunchtime-sports-science-fitting.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-8956268123135262261Mon, 01 Apr 2013 23:03:00 +00002013-04-01T16:03:44.806-07:00Lunchtime Sports Science: Cracking a New SportThis is the first and what will be a series of relatively short pieces on sports analytics. I'll be using a variety of sports for examples, including both team sports and single-player sports, and I'll also make my code available through my <a href="https://github.com/octonion" target="_blank">GitHub account</a>.<br /><br />Here are my recommended tools. If you're unfamiliar with some of these, don't worry. You'll pick them up as you go along, and they form a powerful suite that will keep you on the cutting edge even as a professional data scientist.<br /><ol><li><b>Hardware</b> - Ideally you want at least 4GB of RAM for larger data sets, but you'll be able to do high-level analysis with almost any modern computing hardware.</li><li><b>Linux operating system</b> - You can certainly do top-notch data analysis using any operating system, but Linux is an excellent (and free) working environment. There are a variety of ways to install and use Linux, but I'd recommend <a href="http://www.ubuntu.com/download/desktop/windows-installer" target="_blank"><b>Ubuntu's Windows installer</b></a>. This will allow you to easily install Ubuntu alongside Windows, and it also makes it easy to uninstall Ubuntu later (if you choose). Ubuntu is just one of the (many) Linux distributions available, but it's very popular and well supported.</li><li><b>R programming language</b> - R is a powerful statistical programming language, and it has thousands of available packages available. If you're using Ubuntu, installing R is simple - <a href="http://cran.r-project.org/bin/linux/ubuntu/README" target="_blank"><b>sudo apt-get install r-base</b></a>. That's it!</li><li><b>Python programming language</b> -Python is a powerful and relatively easy to use programming language. One of the most common tasks for sports analytics is <a href="http://en.wikipedia.org/wiki/Web_scraping" target="_blank">web scraping</a>, and <a href="http://www.slideshare.net/maikroeder/overview-of-python-web-scraping-tools" target="_blank">Python is an excellent choice</a> thanks to libraries such as <a href="http://wwwsearch.sourceforge.net/mechanize/" target="_blank">Mechanize</a>, <a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/" target="_blank">Beautiful Soup</a> and <a href="http://lxml.de/" target="_blank">lxml</a>. It's also a great language for <a href="http://en.wikipedia.org/wiki/Data_cleansing" target="_blank">data cleansing</a>. Installing Python under Ubuntu - <b>sudo apt-get install python</b>.</li><li><b>PostgreSQL database server</b> - There are many ways to store and analyze data sets, but a dedicated <a href="https://en.wikipedia.org/wiki/Relational_database" target="_blank">relational database</a> server is necessary tool for high-level analytics. <a href="http://www.postgresql.org/" target="_blank">PostgreSQL</a> is my personal recommendation, but there are other good choices (such as <a href="http://www.mysql.com/" target="_blank">MySQL</a>). PostgreSQL is free, fast, powerful and has a <a href="http://en.wikipedia.org/wiki/PostgreSQL#Procedural_languages" target="_blank">huge variety of procedural languages available</a> (including R and Python). Installing PostgreSQL under Ubuntu - <b>sudo apt-get install postgresql-9.2</b>.</li><li><b>GitHub account</b> - Setting up a <a href="https://github.com/" target="_blank"><b>GitHub account</b></a> is free and will allow you to automatically following any changes to various sports analytics GitHub projects (<a href="https://github.com/octonion" target="_blank">such as mine</a>). Later, you can set up your own repositories if you'd like to share your own work with other people. Don't forget to install git under Ubuntu - <b>sudo apt-get install git</b>.</li><li>We'll start with analyzing hockey. If you'd like to take a look at some of my hockey code and data that I've scraped, you can find them in <a href="https://github.com/octonion/hockey" target="_blank">my hockey GitHub repository</a>. If you've set up Ubuntu and have installed git you can execute the command <b>git clone https://github.com/octonion/hockey.git</b> to make a local copy of my repository.</li></ol><div>Here's a basic outline for tackling a new sport.</div><div><ol><li><b>Understand how teams win</b> - Build a model to project the likely outcome for a team when facing a particular opponent. Example - <a href="http://www.collegehockeynews.com/ratings/krach.php" target="_blank">Krach for hockey</a> (which is based on the Bradley-Terry model).</li><li><b>Understand how teams score</b> - Build a model to project how many goals/points/runs teams are likely to score or allow when facing a particular opponent. Exampe - <a href="http://hockeyanalytics.com/2004/09/poisson-toolbox/" target="_blank">Poisson distribution and hockey</a>.</li><li><b>Relate the two</b> - Characterize the relationship between scoring and winning or losing. Example - <a href="http://en.wikipedia.org/wiki/Pythagorean_expectation" target="_blank">Pythagorean win expectation</a>.</li><li><b>Understand how players score/prevent scoring</b> - Determine which aspects of player performance impact team offense and defense and by how much.</li><li><b>Understand player contribution to winning/losing</b> - This is nearly automatic once you understand the relationship between team offense/defense and team winning/losing.</li></ol><div>In my next article we'll build a basic <a href="http://en.wikipedia.org/wiki/Sports_rating_system" target="_blank">power ranking</a> model for hockey to predict likely game outcomes.</div></div>http://angrystatistician.blogspot.com/2013/04/lunchtime-sports-science-cracking-new.htmlnoreply@blogger.com (Christopher Long)0tag:blogger.com,1999:blog-9149402429183581490.post-3725092235959553735Wed, 27 Mar 2013 05:05:00 +00002013-03-26T22:05:11.366-07:00Solving Recurrences with Difference EquationsHere's an example from Robert Sedgewick's <a href="https://class.coursera.org/introACpartI-001/class" target="_blank">course on analytic combinatorics</a>.<br /><br />Solve the recurrence \[ a_n = 3 a_{n−1} − 3 a_{n−2} + a_{n−3} \] for \( n>2 \) with \( a_0=a_1=0 \) and \( a_2=1 \).<br /><br />Let \( f(n) = a_n \); in the language of difference equations the above becomes simply<br />\[ \frac{{\Delta^3} f}{{\Delta n}^3 } = 0 . \] Immediately,<br />\[ f(n) = c_2 n^2 + c_1 n + c_0 . \] Applying the initial conditions we get<br />\[ c_0 = 0, c_2 + c_1 = 0, 4 c_2 + 2 c_1 = 1, \] and so the solution is \( a_n = \frac{1}{2} n^2 - \frac{1}{2} n \).<br /><br />Now what if the initial conditions are changed so \( a_1 = 1 \)?<br /><br />http://angrystatistician.blogspot.com/2013/03/solving-recurrences-with-difference.htmlnoreply@blogger.com (Christopher Long)0