Powerball Nears Best EV (Updated)

This is a quick post that is the exact same as the Mega Millions post I did. I break down the Powerball odds, use some sales data, and look at the EV over various jackpots.

The quick takeaway, with the Powerball up to $450 million for Wednesday, is that this drawing (or the next, if no one wins) is about as near to the best possible expected value as this lottery gets. The other takeaway? Don’t buy lottery tickets!

UPDATE: Going back, I found an error in my code. I’ll update it at some point, but for now, take the results with a grain of salt (except the historical data part; that part was fine).

Continue reading →


Mega Millions, Multiple Winners, and Expectations

The Mega Millions lottery is a popular number-picking lottery game in the US. It exists in 45 states (including D.C.), and is played by millions of people every week. Lotteries are well known for having negative expected values, meaning that players lose (on average) more than they win. This should be expected, given that lotteries (and gambling in general) are profit-seeking enterprises.

The potentially large jackpots of Mega Millions (the jackpot is pari-mutuel) can push the game into a region of positive EV, though. This is counter-balanced by the fact that duplicate tickets will result in splitting the winnings equally among the winners, driving down the available EV. This post explores what kind of impact that has on the game. Continue reading →
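As a rough illustration of the splitting effect described in the Mega Millions teaser above, here is a minimal sketch, not the code from the post. It assumes the number of other winning tickets follows a Poisson distribution with mean `tickets_sold / odds`, and it ignores taxes, annuity discounting, and lower-tier prizes. The function names and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
import math


def expected_jackpot_share(expected_other_winners: float) -> float:
    """Expected fraction of the jackpot kept by a winning ticket.

    Assumption: the number of OTHER winning tickets K is Poisson with
    mean `expected_other_winners`, so E[1 / (1 + K)] = (1 - exp(-lam)) / lam.
    """
    lam = expected_other_winners
    if lam == 0:
        return 1.0  # no competing tickets, so the winner keeps everything
    return -math.expm1(-lam) / lam


def jackpot_ev_per_ticket(jackpot: float, odds: float, tickets_sold: float) -> float:
    """Jackpot-only expected value of one ticket, before any ticket cost."""
    p_win = 1.0 / odds
    lam = tickets_sold * p_win  # expected number of other winning tickets
    return p_win * jackpot * expected_jackpot_share(lam)


if __name__ == "__main__":
    # Hypothetical inputs: a 1-in-250-million game with a $500M jackpot.
    for sold in (0, 50e6, 250e6, 500e6):
        ev = jackpot_ev_per_ticket(500e6, 250e6, sold)
        print(f"tickets sold: {sold:>13,.0f}  jackpot EV per ticket: ${ev:.2f}")
```

Under these assumptions the jackpot EV per ticket shrinks as sales grow, which is the counter-balancing effect the teaser describes: bigger jackpots attract more tickets, and more tickets mean a larger chance of sharing the prize.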

2015 AJC Peachtree Road Race

Every 4th of July in Atlanta, Georgia, the Atlanta Journal-Constitution holds the Peachtree Road Race (AJCPRR, or just PRR). The PRR is a 10K run down one of the many Peachtree streets, from Lenox Mall to Piedmont Park. It’s been going on for quite a while; find out more about the cool history and tradition here.

One of the things that makes the race a pretty spectacular event is the number of people! Up to 60 thousand participants run each year, making it pretty massive (the largest 10K in the world). There are 26+ waves, with professional (really fast) runners at the start, people who walk it, and everyone in between. People dress up in fun costumes, too, and it’s a great atmosphere.

It’s also a great chance to plot some medium(ish)-sized data and see some pretty pictures!

Continue reading →

MLB Call Challenges: Who Wins the Reviews? (Updated)

Back at the start of the 2014 Major League Baseball season, new rules were implemented for making plays reviewable. Managers are now allowed to have certain plays reviewed and potentially overturned. There’s only one full season and a couple of months of data, but let’s dig in and see what we can learn.

I’m interested in finding out several things: Which teams ask for reviews the most? Which teams are the most successful? Are there any umpires who find their calls being reviewed and overturned more than others? Also, how long do reviews take, and does the length of the review time hint at the ruling for the review?

Continue reading →

Running the Numbers: Analyzing Pace Structure for a 15K

Races (the running kind) are a great resource for anyone looking to play with a moderately sized data set. It’s not hard to make some descriptive and pretty charts, and you can do some simple “who ran fastest?” breakdowns by various factors.

In this post, I’ll look at a race that took place recently here in Atlanta: the Hot Chocolate 15K/5K. The best part of the results they posted is that they include split times for each third of the 15K. That means that we can do some analyses on runner pace consistency and time trends!

We’ll start with some quick visualization of the participants and the results to orient ourselves to the data, and then get into analyzing (spoiler: with clustering) runner pacing structures.

Continue reading →

Academy Awards: Nominations and Win Relationships

You can’t win if you don’t play, as the saying goes. For the Oscars, you can’t win if you aren’t nominated. I’ve been wondering if there are any relationships or trends within the Oscar nomination data (and that data alone! No outside budget or other information) that can help us determine who is going to win. My main goals are to find out the following:

  1. Do more nominations guarantee more wins? If so, in what way?
  2. Which categories win more for movies with a single nomination?
  3. Which categories win more together (for the same movie)?

Those questions, and maybe some interesting tangents, are the focus of this post. To learn about where I got the data, and what assumptions are in it, see the previous post. Once you’ve read that, let’s get into the data. Continue reading →

The Academy Awards: Building a Data Set & Category Viz

The Academy Awards are nearing, and all the trailers are now reminding us of how many Oscars this or that movie has been nominated for. After enough of these ads, it’s hard not to wonder what each movie’s chances of winning are. The amateur data junkie in me wanted to find out, and the Oscars are a manageable enough problem that some visualization and analyses should be feasible.

In the next post, we’ll go over the various findings. For now, there’s actually a chunk of data conditioning to go over. I’ll show you where I got the data from, how I parsed it out, and what choices were made to get an apples-to-apples Oscar win analysis underway!

Continue reading →