Running the Numbers: Analyzing Pace Structure for a 15K

Races (the running kind) are a great resource for anyone looking to play with a moderately sized data set. It’s not hard to make some descriptive and pretty charts, and you can do some simple “who ran fastest?” breakdowns by various factors.

In this post, I’ll look at a race that took place recently here in Atlanta: the Hot Chocolate 15K/5K. The best part of the posted results is that they include split times for each third of the 15K. That means we can analyze runner pace consistency and time trends!
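To make the split idea concrete, here is a minimal sketch of how three split times could be turned into per-split paces plus a couple of simple consistency numbers. It assumes each split covers 5 km and that times are in seconds; the function names and the example runner are hypothetical and are not taken from the race results.

```python
from statistics import mean, pstdev


def split_paces(split_seconds, split_km=5.0):
    """Return the pace (seconds per km) for each split.

    Assumes every split covers the same distance (split_km).
    """
    return [s / split_km for s in split_seconds]


def pace_summary(split_seconds, split_km=5.0):
    """Summarize how evenly a runner paced the race."""
    paces = split_paces(split_seconds, split_km)
    avg = mean(paces)
    return {
        "paces": paces,
        "mean_pace": avg,
        # Coefficient of variation: spread of the paces relative to their mean.
        "cv": pstdev(paces) / avg,
        # Positive trend means the last split was slower than the first.
        "trend": paces[-1] - paces[0],
    }


# Hypothetical runner: 25:00, 25:30, and 26:30 for the three 5 km splits.
runner = [1500.0, 1530.0, 1590.0]
summary = pace_summary(runner)
print(summary)
```

A low `cv` suggests even pacing, while the sign of `trend` hints at whether the runner slowed down or sped up over the course.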

We’ll start with some quick visualization of the participants and the results to orient ourselves to the data, and then get into analyzing (spoiler: with clustering) runner pacing structures.

Continue reading →


Academy Awards: Nominations and Win Relationships

You can’t win if you don’t play, as the saying goes. For the Oscars, you can’t win if you aren’t nominated. I’ve been wondering if there are any relationships or trends within the Oscar nomination data (and that data alone! No outside budget or other information) that can help us determine who is going to win. My main goals are to find out the following:

  1. Do more nominations guarantee more wins? If so, in what way?
  2. Which categories win more for movies with a single nomination?
  3. Which categories win more together (for the same movie)?

Those questions, and maybe some interesting tangents, are the focus of this post. To learn about where I got the data, and what assumptions are in it, see the previous post. Once you’ve read that, let’s get into the data. Continue reading →

The Academy Awards: Building a Data Set & Category Viz

The Academy Awards are nearing, and all the trailers are now reminding us of how many Oscars this or that movie has been nominated for. After enough of these ads, it’s hard not to wonder what each movie’s chances of winning are. The amateur data junkie in me wanted to find out, and the Oscars are a manageable enough problem that some visualization and analyses should be feasible.

In the next post, we’ll go over the various findings. For now, there’s actually a chunk of data conditioning to go over. I’ll show you where I got the data from, how I parsed it out, and what choices were made to get an apples-to-apples Oscar win analysis underway!

Continue reading →

The Information in Wheel of Fortune Letters

In the previous post, we took a look at some properties of letters in Wheel of Fortune (WoF) puzzles. We saw that contestants tend to pick CDMA the most, but that DGHO were much more likely to reveal more letters. The thought we ended with was how we could value the information gained by each letter, rather than just counting the revealed letters themselves.

This post explores how we can measure the information gained by letters in a couple of ways, and we’ll see if the information perspective brings any new advice to light.
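As a rough illustration of what "information gained" could mean, here is a minimal sketch using Shannon entropy. This is my assumption about one possible measure, not necessarily the one used in the full post. The idea: guessing a letter splits a list of candidate puzzles into groups by where that letter would show up, and the entropy of that split (in bits) is the expected information from the guess. The tiny candidate list and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def reveal_pattern(puzzle, letter):
    """Positions at which `letter` would be revealed in `puzzle`."""
    return tuple(i for i, ch in enumerate(puzzle) if ch == letter)


def expected_information(puzzles, letter):
    """Entropy (bits) of the reveal patterns `letter` produces.

    Assumes every candidate puzzle is equally likely.
    """
    counts = Counter(reveal_pattern(p, letter) for p in puzzles)
    total = len(puzzles)
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Hypothetical candidate puzzles, purely for illustration.
candidates = ["CAT", "COT", "DOG", "DIG"]

for letter in "CAZ":
    print(letter, round(expected_information(candidates, letter), 3))
```

In this toy list, "C" splits the candidates evenly and so is worth a full bit, "A" singles out only one candidate and is worth less, and "Z" appears nowhere and tells us nothing.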

Continue reading →

How to (Maybe) Win at Wheel of Fortune

The other night, while watching Wheel of Fortune, I got curious whether the WoF staff made RSTLNE less common during the final round of the game. I found some data, worked on it for too short a period of time, and wound up with a decent-sized post on the DataIsBeautiful subreddit. There were lots of good comments and criticisms (I had made some dumb errors). These were well addressed (graphically and otherwise) in Mr. Ingraham’s WaPo article, which I commend to you.

I wanted to re-do my work here as a way to correct my mistakes and also ask some questions not yet asked of the data.

Continue reading →