Probability and Cumulative Dice Sums

Five Free Student Tickets for the SaberSeminar in Boston (August 17-18, 2014)

Meredith Wills, Will Carroll and I are donating four two-day student tickets, including lunch, for the upcoming baseball analytics Saberseminar run by Dan Brooks. This is a wonderful event, and 100% of the proceeds are donated to the Jimmy Fund. You must be a current student. Meredith and I will be choosing four students by the end of this week, Sunday, April 13, 2014.

Please note:
  • These tickets are for both days, August 17-18, 2014
  • The event is in Boston, MA
  • Lunch is included, but no other meals
  • Transportation and lodging are not included

If you would like to be considered for a donated ticket, please send:
  • Your full name (first and last)
  • If you're outside of the Boston area, how will you be getting to the event?
  • Your school affiliation and whether high school or college
  • Best contact email address (if different from reply-to address)
  • A little about your baseball interests, analytical or otherwise
  • Do you see yourself working in baseball? For a team, as a journalist, or something else?

Please email the above information to me at sabermetrics@gmail.com.

Again, please do so by the end of the day on Sunday, April 13, 2014. Once the tickets are awarded they're gone.

Comments

  1. Greetings Christopher am interested to chat with you about your work the horse racing industry using predictive analytics/machine learning methods -i am based in Sydney Australia

Popular posts from this blog

Probability and Cumulative Dice Sums

Let a die be labeled with increasing positive integers \(a_1,\ldots , a_n\), and let the probability of getting \(a_i\) be \(p_i>0\). Let \(X\) denote the result of a single roll. We start at 0 and roll the die, adding whatever number we get to the current total. If \({\rm Pr}(N)\) is the probability that at some point we achieve the sum \(N\), then \(\lim_{N \to \infty} {\rm Pr}(N)\) exists and equals \(1/{\rm E}(X)\) iff \((a_1, \ldots, a_n) = 1\), i.e. the \(a_i\) have greatest common divisor 1. The direction \(\implies\) is obvious. Now, if the limit exists it must equal \(1/{\rm E}(X)\) by Chebyshev's inequality, so we only need to show that the limit exists assuming that \((a_1, \ldots, a_n) = 1\). We have the recursive relationship \[{\rm Pr}(N) = p_1 {\rm Pr}(N-a_1) + \ldots + p_n {\rm Pr}(N-a_n);\] the characteristic polynomial is therefore \[f(x) = x^{a_n} - \left(p_1 x^{(a_n-a_1)} + \ldots + p_n\right).\] This clearly has the root \(x=1\). Next note \[ f'(1) = a_n - \sum_{i=1}^{n} p_i a_n + \sum_{i=1}^{n} p_i a_i = {\rm E}(X) > 0 ,\] hence this root is als...
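As a quick numerical sanity check of the limit claimed above, here's a minimal Python sketch that iterates the recursion for \({\rm Pr}(N)\) directly. The function name `hit_probabilities` and the choice of a fair six-sided die are my own illustrative assumptions, not part of the proof; the only modeling input is \({\rm Pr}(0) = 1\), since the total starts at 0.

```python
def hit_probabilities(faces, probs, n_max):
    """Return [Pr(0), ..., Pr(n_max)], where Pr(N) is the probability
    that the running total ever equals N."""
    pr = [0.0] * (n_max + 1)
    pr[0] = 1.0  # the total starts at 0
    for n in range(1, n_max + 1):
        # Pr(N) = p_1 Pr(N - a_1) + ... + p_n Pr(N - a_n), skipping negative totals
        pr[n] = sum(p * pr[n - a] for a, p in zip(faces, probs) if n - a >= 0)
    return pr

# Illustrative case: a fair six-sided die, so gcd = 1 and E(X) = 3.5
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
pr = hit_probabilities(faces, probs, 200)
expected = sum(a * p for a, p in zip(faces, probs))
print(pr[200], 1 / expected)  # both are approximately 0.2857

# A die with gcd 2 never hits an odd total, so the limit cannot exist
pr_even = hit_probabilities([2, 4], [0.5, 0.5], 51)
print(pr_even[50], pr_even[51])
```

The second case shows why the gcd condition matters: with faces 2 and 4, \({\rm Pr}(N)\) is 0 for every odd \(N\), so the sequence oscillates instead of converging.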

Simplified Multinomial Kelly

Here's a simplified version of the Smoczynski & Tomkins algorithm for optimal Kelly bets when you have multiple outcomes (e.g. horse races). The algorithm is explained here (or in the original paper): https://en.wikipedia.org/wiki/Kelly_criterion#Multiple_horses Let's say there's a wager that, for every $1 you bet, will return a profit of $b if you win. Let the probability of winning be \(p\), and the probability of losing be \(q=1-p\). The original Kelly criterion says to wager only if \(b\cdot p-q > 0\) (the expected value is positive), and in this case to wager a fraction \( \frac{b\cdot p-q}{b} \) of your bankroll. But in a horse race, how do you decide which set of outcomes is favorable to bet on? It's tricky, because these wagers are mutually exclusive, i.e. you can win at most one. It turns out there's a simple and intuitive method to find which bets are favorable: 1) Look at \( b\cdot p-q\) for every horse. 2) Pick any horse for which \( b\cdot p-q > 0\) and mar...
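For reference, the single-wager rule stated above (bet only if \(b\cdot p-q > 0\), and then bet the fraction \(\frac{b\cdot p-q}{b}\)) can be sketched in a few lines of Python. This covers only the original one-bet Kelly criterion, not the multiple-horse procedure, and the function name `kelly_fraction` is just an illustrative choice.

```python
def kelly_fraction(b, p):
    """Fraction of bankroll to wager on a single bet that returns a
    profit of b per $1 staked and wins with probability p."""
    q = 1.0 - p
    edge = b * p - q  # expected profit per $1 wagered
    if edge <= 0:
        return 0.0  # no positive expected value, so don't bet
    return edge / b

print(kelly_fraction(2.0, 0.5))  # 0.25
print(kelly_fraction(1.0, 0.4))  # 0.0 (negative expected value)
```

For example, at 2-to-1 odds with a 50% chance of winning, the edge is \(2 \cdot 0.5 - 0.5 = 0.5\), so the rule says to wager a quarter of the bankroll.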

Mixed Models in R - Bigger, Faster, Stronger

When you start doing more advanced sports analytics you'll eventually start working with what are known as hierarchical, nested or mixed effects models. These are models that contain both fixed and random effects. There are multiple ways of defining fixed vs random effects, but one way I find particularly useful is that random effects are being "predicted" rather than "estimated", and this in turn involves some "shrinkage" towards the mean. Here's some R code for NCAA ice hockey power rankings using a nested Poisson model (which can be found in my hockey GitHub repository): model The fixed effects are year, field (home/away/neutral), d_div (NCAA division of the defense), o_div (NCAA division of the offense) and game_length (number of overtime periods); offense (strength of offense), defense (strength of defense) and game_id are all random effects. The reason for modeling team offenses and defenses as random vs fixed effec...