Estimating the Cost of Education

I saw a post circulating on Facebook that sardonically responded to some ‘real’ Facebook comments about teacher salaries. That post, “Are you sick of highly-paid teachers?”, comes to the conclusion:

$1.42 per hour per student — a very inexpensive baby-sitter and they even EDUCATE your kids!

That figure doesn’t seem realistic at first glance, and it doesn’t account for the other costs that go into employing a person. Let’s be a bit more rigorous and look at the problem from the numbers side.
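To show the kind of arithmetic involved, here is a minimal sketch. The function name, the single `overhead_rate` multiplier standing in for benefits and other employment costs, and the example inputs are all hypothetical placeholders. They are not figures from the Facebook post or from the analysis that follows.

```python
def cost_per_student_hour(salary, overhead_rate, students, hours_per_day, days_per_year):
    """Employer cost per student per hour of instruction.

    ASSUMPTION: all non-salary employment costs (benefits, payroll taxes,
    etc.) are folded into a single multiplier, overhead_rate.
    """
    total_cost = salary * (1.0 + overhead_rate)                # what the employer actually pays
    student_hours = students * hours_per_day * days_per_year   # total student-hours delivered
    return total_cost / student_hours


# Hypothetical inputs, chosen only to illustrate the calculation.
salary_only = cost_per_student_hour(50_000, 0.0, 25, 6, 180)
with_overhead = cost_per_student_hour(50_000, 0.30, 25, 6, 180)
print(f"salary only: ${salary_only:.2f}, with overhead: ${with_overhead:.2f}")
```

Every input moves the result proportionally, so a per-student-hour figure is only as meaningful as the salary, class size, and schedule assumptions behind it.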



Falling Through the Earth (and Mars)

Recently, on Reddit (of course), someone posted a video explaining how long it would take to fall straight through the center of the Earth to the other side. One of the nice parts of the video is that it shows what the time would be when we account for the actual, non-uniform density of the Earth. The video only showed an Excel spreadsheet for the calculation, though.

I recommend you watch the linked video so you have a good visual of the problem. Once you’ve done that, come on back! In this post I’m going to show you how to solve the problem using Python and a numerical integrator from SciPy.
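As a baseline, here is a minimal sketch of the simplest case, assuming a uniform-density Earth (radius 6,371 km, surface gravity 9.81 m/s², no air resistance or rotation). Inside a uniform sphere, gravity falls off linearly toward the center, so the fall is simple harmonic motion and the result can be checked against the closed form π√(R/g), roughly 2,532 s or about 42 minutes. The helper names are hypothetical, and this is not the non-uniform model from the video: that would mean replacing `g_uniform` with a g(r) derived from a density profile. The code integrates to the center and doubles the time, which relies on the motion being symmetric about the center.

```python
import numpy as np
from scipy.integrate import solve_ivp

R_EARTH = 6.371e6   # mean radius of the Earth, m
G_SURFACE = 9.81    # surface gravity, m/s^2


def g_uniform(r):
    """Gravity magnitude at distance r from the center, ASSUMING uniform density."""
    return G_SURFACE * r / R_EARTH


def fall_time(g_of_r, radius):
    """Time to fall from the surface through the center to the other side."""
    def rhs(t, y):
        x, v = y                                   # signed position along the tunnel, velocity
        return [v, -np.sign(x) * g_of_r(abs(x))]   # gravity always points toward the center

    def at_center(t, y):
        return y[0]                                # zero when the faller passes the center
    at_center.terminal = True                      # stop integrating at the first crossing
    at_center.direction = -1                       # only count crossings from x > 0 to x < 0

    sol = solve_ivp(rhs, (0.0, 1e5), [radius, 0.0], events=at_center,
                    rtol=1e-10, atol=1e-6)
    t_center = sol.t_events[0][0]
    return 2.0 * t_center                          # symmetric trip: center time doubled


t = fall_time(g_uniform, R_EARTH)
print(f"{t:.0f} s = {t / 60:.1f} min")
```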


Powerball Nears Best EV (Updated)

This is a quick post that follows the same approach as my Mega Millions post: I break down the Powerball odds, use some sales data, and look at the expected value (EV) over a range of jackpot sizes.

The quick takeaway, with the Powerball jackpot up to $450 million for Wednesday, is that either this drawing or the next one (if no one wins) is about as close as this lottery gets to its best possible expected value. The other takeaway? Don’t buy lottery tickets!

UPDATE: Going back over this, I found an error in my code. I’ll fix it at some point, but for now, take the results with a grain of salt (except the historical data section, which was fine).
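For orientation, here is a minimal sketch of the simplest version of the EV calculation; it is not the post’s code. It ignores taxes, the cash-versus-annuity choice, and jackpot splitting. The jackpot odds are computed for a 5-of-69 plus 1-of-26 format with a $2 ticket, which may not match the game format in effect for this drawing, and `OTHER_PRIZES_EV` is a hypothetical placeholder for the expected value of all non-jackpot prizes.

```python
from math import comb

# Illustrative inputs: match 5 of 69 white balls plus 1 of 26 Powerballs, $2 ticket.
P_JACKPOT = 1 / (comb(69, 5) * 26)
PRICE = 2.0
OTHER_PRIZES_EV = 0.32  # HYPOTHETICAL placeholder for all non-jackpot prizes


def naive_ev(jackpot, p_jackpot=P_JACKPOT, other_prizes_ev=OTHER_PRIZES_EV, price=PRICE):
    """Net expected value of one ticket, ignoring taxes, cash/annuity, and splits."""
    return jackpot * p_jackpot + other_prizes_ev - price


def break_even_jackpot(p_jackpot=P_JACKPOT, other_prizes_ev=OTHER_PRIZES_EV, price=PRICE):
    """Jackpot at which naive_ev crosses zero."""
    return (price - other_prizes_ev) / p_jackpot


for jackpot in (100e6, 300e6, 450e6, 600e6):
    print(f"${jackpot / 1e6:,.0f}M jackpot -> EV per ticket {naive_ev(jackpot):+.2f} dollars")
print(f"naive break-even jackpot: ${break_even_jackpot() / 1e6:,.0f}M")
```

Taxes, the smaller cash option, and the chance of sharing the jackpot all reduce what a winner actually receives, so the real break-even jackpot is higher than this naive estimate.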


Mega Millions, Multiple Winners, and Expectations

The Mega Millions lottery is a popular number-picking lottery game in the US. It is offered in 45 states (including D.C.) and is played by millions of people every week. Lotteries are well known for having negative expected values (EV), meaning that, on average, players lose more than they win. This is to be expected, given that lotteries (and gambling in general) are profit-seeking enterprises.

The potentially large jackpots of Mega Millions can, however, push the game into positive-EV territory. This is counterbalanced by the jackpot being pari-mutuel: if several tickets match the winning numbers, the jackpot is split equally among them, which drives the available EV back down. This post explores how big an impact that has on the game.
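One common way to model the splitting effect, sketched here as an assumption rather than as the post’s method: if every ticket picks its numbers independently and uniformly at random, the number K of other winning tickets is approximately Poisson with mean λ = (tickets sold) × p, and the expected fraction of the jackpot a winner keeps is E[1/(1+K)] = (1 − e^(−λ))/λ. Real players tend to pick non-random numbers, which would make shared jackpots more likely than this model suggests. The odds and ticket counts below are hypothetical round numbers.

```python
import math


def expected_share(tickets_sold, p_jackpot):
    """Expected fraction of the jackpot kept by one winning ticket.

    ASSUMPTION: other tickets pick numbers independently and uniformly, so the
    number of other winners K is ~ Poisson(lam) and E[1 / (1 + K)] = (1 - e^-lam) / lam.
    """
    lam = tickets_sold * p_jackpot
    if lam == 0:
        return 1.0
    return -math.expm1(-lam) / lam  # (1 - e^-lam) / lam, numerically stable for small lam


def ev_with_splits(jackpot, tickets_sold, p_jackpot, other_prizes_ev, price):
    """Net EV of one ticket when the jackpot is shared among all winners."""
    return (jackpot * p_jackpot * expected_share(tickets_sold, p_jackpot)
            + other_prizes_ev - price)


P = 1 / 300_000_000  # HYPOTHETICAL round-number jackpot odds
for sold in (50e6, 150e6, 300e6):
    print(f"{sold / 1e6:.0f}M tickets sold: expected share {expected_share(sold, P):.3f}")
```

Because ticket sales climb as the jackpot grows, λ grows with it, so the expected share shrinks just as the headline jackpot becomes most attractive.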

Non-Uniform Coupon Collector’s Problem

The Coupon Collector’s Problem is a neat little problem in probability, and I first heard about it recently on the statistics subreddit. You, like me, might be familiar with it if you’ve ever tried to work out the expected number of cereal boxes you need to buy to collect all the toys. Not that I have that problem right now, but it shows up on probability quizzes and the like.

The classic solution hinges on two assumptions. First, sampling is with replacement (drawing from an effectively infinite population of items in fixed proportions). Second, all items are equally likely. What happens when they aren’t equally likely? We turn back to Absorbing Markov Chains (AMC), because apparently that has to be 50% of what I talk about on here!
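Here is a minimal sketch of one way to set up the absorbing Markov chain, assuming the state is the set of items collected so far, encoded as a bitmask. This may not be how the post builds its chain, and with 2^n states it is only practical for a modest number of items. The function name and the non-uniform proportions are hypothetical. With equal probabilities it should reproduce the classic answer n(1 + 1/2 + … + 1/n), for example 5.5 draws for three items.

```python
import numpy as np


def expected_draws(probs):
    """Expected number of draws to collect every item, via an absorbing Markov chain.

    States are bitmasks of the items collected so far. Every mask except the
    full set is transient; the full set is the single absorbing state.
    """
    probs = np.asarray(probs, dtype=float)
    n = len(probs)
    full = (1 << n) - 1              # bitmask with all n items collected
    Q = np.zeros((full, full))       # transient-to-transient transition probabilities
    for state in range(full):
        for item, p in enumerate(probs):
            nxt = state | (1 << item)  # a repeat item leaves the state unchanged
            if nxt != full:
                Q[state, nxt] += p
    # Expected steps to absorption t solves (I - Q) t = 1,
    # i.e. the row sums of the fundamental matrix N = (I - Q)^-1.
    t = np.linalg.solve(np.eye(full) - Q, np.ones(full))
    return t[0]                      # start from the empty collection


print(expected_draws([1/3, 1/3, 1/3]))  # uniform case
print(expected_draws([0.5, 0.3, 0.2]))  # non-uniform, hypothetical proportions
```

Solving the linear system directly avoids forming the full inverse, which is cheaper and numerically steadier than computing N explicitly when only the expected absorption time is needed.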


2015 AJC Peachtree Road Race

Every 4th of July in Atlanta, Georgia, the Atlanta Journal-Constitution holds the Peachtree Road Race (AJCPRR, or just PRR). The PRR is a 10K run down one of the many Peachtree streets, from Lenox Square to Piedmont Park. It’s been going on for quite a while; find out more about its history and traditions here.

One of the things that makes the race such a spectacular event is the number of people! Up to 60,000 participants run each year, making it the largest 10K in the world. There are 26+ waves, with professional (really fast) runners at the start, people who walk it, and everyone in between. People dress up in fun costumes, too, and it’s a great atmosphere.

It’s also a great chance to plot some medium(ish)-sized data and see some pretty pictures!


Bayesian Modeling with PyMC: Dirichlet and a Custom Stochastic

User nomm_ asked a question on Reddit’s r/statistics in this post. It sounded like the perfect problem for some Bayesian modeling, so I dusted off the PyMC Python library to tackle it. I hope this will also serve as a guide for others who are trying to write custom stochastics in PyMC that are also observed values.

The question involves estimating the probabilities of the outcomes when a particular action is performed several times in a row (here, upgrading items in a video game). An item has nodes that need to be linked, and each action determines how many of them get linked. Several actions can take place for each result, and the likelihood of each action depends on the previous actions. Let’s see how to tackle this!
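The post builds a custom stochastic in PyMC; the sketch below does not reproduce that model. It only illustrates the Dirichlet piece in the simplest setting, assuming the outcomes are independent draws from one fixed categorical distribution (that is, ignoring the dependence on previous actions). Under that assumption the Dirichlet prior is conjugate, so the posterior is simply Dirichlet(α + counts) and no sampler is needed. The outcome counts and the function name are hypothetical.

```python
import numpy as np


def dirichlet_posterior(prior_alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior + categorical counts -> Dirichlet(alpha + counts)."""
    return np.asarray(prior_alpha, dtype=float) + np.asarray(counts, dtype=float)


prior = np.ones(3)                   # flat Dirichlet(1, 1, 1) prior over 3 outcomes
counts = np.array([12, 5, 3])        # HYPOTHETICAL observed outcome counts
alpha_post = dirichlet_posterior(prior, counts)

post_mean = alpha_post / alpha_post.sum()        # analytic posterior mean
rng = np.random.default_rng(0)
draws = rng.dirichlet(alpha_post, size=10_000)   # Monte Carlo draws from the posterior
lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)

print("posterior mean:", post_mean.round(3))
print("95% interval:", list(zip(lo.round(3), hi.round(3))))
```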
