# Progressive Betting Strategies Analysis with Markov Chains

If you are a fan of statistics and probability, then you might have a certain affinity for various games of chance. It can be quite fun, for example, to work out card-counting strategies for Blackjack with simulation. It might also be interesting to apply some machine learning to the basic strategy tables to figure out smaller, easier-to-learn subsets (something that I want to try at some point).

Hopefully you are also aware of the Gambler’s Fallacy. If you have (or think you have) a problem with gambling, don’t be afraid to seek help! Also, never EVER gamble with money that you can’t afford to lose. Always set aside a given amount of money that you are 100% fine with losing all of (because that will happen).

There are excellent sites out there (like Wizard Of Odds) that provide detailed information on probabilities and house edges (notice how none are in our favor!). There are also plenty of discussions of betting strategies (usually about how bad they are). What I was curious about was whether some betting strategies could increase the probability of making a set profit (and then stopping). Most people don’t go to the casino very often (if at all), so I wanted to find out about the short-term behavior of these strategies, rather than their obvious long-term failure.

Using Markov Chain analysis and Monte Carlo simulation (in the next post), I’m going to examine some betting strategies. The obvious conclusion is “you will still lose everything in the long run”, but there are some interesting twists along the way. I’ve included some code so you can set up your own analyses, too!
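As a taste of the Markov chain approach, here’s a minimal sketch (my own illustration, not the post’s actual analysis) of the simplest case: flat even-money betting modeled as an absorbing Markov chain, i.e. the classic gambler’s ruin. The 18/38 win probability is that of an even-money bet on an American roulette wheel; the bankroll sizes are arbitrary choices for the example.

```python
import numpy as np

# Gambler's ruin as an absorbing Markov chain: start with 5 units,
# bet 1 unit per round at even money, quit at 0 (ruin) or 10 (profit target).
p = 18 / 38          # win probability of an even-money American roulette bet
n_states = 11        # bankrolls 0..10; 0 and 10 are absorbing

P = np.zeros((n_states, n_states))
P[0, 0] = 1.0        # broke: stay broke
P[10, 10] = 1.0      # hit the target: stop
for i in range(1, 10):
    P[i, i + 1] = p       # win: bankroll up 1
    P[i, i - 1] = 1 - p   # lose: bankroll down 1

# Run the chain long enough that all probability mass is absorbed,
# then read off the distribution for a starting bankroll of 5
dist = np.linalg.matrix_power(P, 10_000)[5]
print(f"P(reach 10 before 0) = {dist[10]:.4f}")
print(f"P(ruin)              = {dist[0]:.4f}")
```

Even with a modest profit target, the house edge already shows: you reach the target well under half the time.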

# MLB Call Challenges: Who Wins the Reviews? (Updated)

Back at the start of the 2014 Major League Baseball season, new rules were implemented for making plays reviewable. Managers are now allowed to have certain plays reviewed and potentially overturned. There is only one full season and a couple of months of data, but let’s dig in and see what we can learn.

I’m interested in finding out several things: Which teams ask for reviews the most? Which teams are the most successful? Are there any umpires who find their calls being reviewed and overturned more than others? Also, how long do reviews take, and does the length of the review time hint at the ruling for the review?
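To make those questions concrete, here’s a sketch of the kind of per-team summary table the analysis builds toward. The column names and the handful of rows below are made up for illustration; they are not the real review data.

```python
import pandas as pd

# Hypothetical review log -- teams, outcomes, and durations are illustrative only
reviews = pd.DataFrame({
    "team":       ["ATL", "ATL", "NYY", "NYY", "NYY", "BOS"],
    "overturned": [True, False, True, True, False, False],
    "seconds":    [105, 88, 132, 94, 150, 76],
})

# One row per team: how often they challenge, how often it works,
# and how long their reviews take on average
summary = (reviews.groupby("team")
           .agg(n_reviews=("overturned", "size"),
                success_rate=("overturned", "mean"),
                avg_seconds=("seconds", "mean"))
           .sort_values("n_reviews", ascending=False))
print(summary)
```

The same groupby pattern, swapped to umpires instead of teams, answers the overturned-calls question too.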

# Computational Methods for Games 2: Markov Chains

In many games, tabletop or otherwise, there are a series of positions, board states, or other features that occur in some kind of order. In Monopoly, for example, you travel in a circle. Each property is a ‘state’ that your piece (battleship!) can be in. In games like Candyland, Chutes and Ladders, or Mr. Bacon’s Big Adventure, there is a goal state to reach, and you do things (roll dice, draw cards, etc.) to try to get there.

What makes these more complicated than, say, figuring out the combined probability of rolling a certain sum with many dice is that the game states branch. Branching just means that you can reach more than one state from your current one. Markov Chains are a powerful tool for analyzing a game’s progress through its states, and this post will show you an example of that, using the game Betrayal at House on the Hill.
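As a toy illustration of branching states (not the actual Betrayal analysis), here’s a tiny Chutes-and-Ladders-style track solved with an absorbing Markov chain. The board, moves, and chute are all invented for the example; the technique — build the transient-state matrix Q and use the fundamental matrix (I − Q)⁻¹ to get expected turns to finish — is the standard one.

```python
import numpy as np

# Tiny track: squares 0..4, square 4 is the goal.
# Each turn you move forward 1 or 2 with equal probability (capped at 4),
# and landing on square 2 is a chute that drops you back to square 0.
def land(pos):
    pos = min(pos, 4)
    return 0 if pos == 2 else pos

transient = [0, 1, 3]                 # resting squares that are not the goal
idx = {s: i for i, s in enumerate(transient)}

Q = np.zeros((3, 3))                  # transition probabilities among transient states
for s in transient:
    for move in (1, 2):
        nxt = land(s + move)
        if nxt != 4:
            Q[idx[s], idx[nxt]] += 0.5

# Fundamental matrix N = (I - Q)^-1; its row sums are the expected
# number of turns to reach the goal from each transient square
N = np.linalg.inv(np.eye(3) - Q)
expected_turns = N.sum(axis=1)
print(dict(zip(transient, expected_turns)))  # e.g. 7 turns on average from the start
```

That one chute nearly doubles the game length from the start compared with square 1 — exactly the kind of effect branching creates.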

# Zombie Dice Strategy Evaluation

In the last post, we took a quick look at the basics of Monte Carlo simulation, and used a simple simulation to get the probabilities of various outcomes in the first roll of Zombie Dice. In this post, we’ll extend our simulation to play turns for us, based on a strategy that we can define. We’ll try several different strategies of varying complexities and see how well we do!
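To preview the idea of a strategy as a plug-in function, here’s a simplified sketch of a strategy-driven turn simulator. It deliberately ignores the real game’s die colors and footstep re-rolls — every die here behaves like a yellow die (2 brain, 2 shotgun, 2 footstep faces) — so its numbers won’t match the full analysis; only the structure carries over.

```python
import random

# Simplified Zombie Dice die: yellow-die odds for every die
FACES = ["brain"] * 2 + ["shotgun"] * 2 + ["foot"] * 2

def play_turn(strategy, rng):
    """Roll 3 dice at a time until the strategy says stop, or we bust."""
    brains = shotguns = 0
    while strategy(brains, shotguns):
        for face in (rng.choice(FACES) for _ in range(3)):
            if face == "brain":
                brains += 1
            elif face == "shotgun":
                shotguns += 1
        if shotguns >= 3:
            return 0            # busted: lose every brain from this turn
    return brains

def stop_at_two(brains, shotguns):
    # keep rolling until we've banked 2 brains or taken 2 shotguns
    return brains < 2 and shotguns < 2

rng = random.Random(0)
scores = [play_turn(stop_at_two, rng) for _ in range(100_000)]
print(sum(scores) / len(scores))    # average brains per turn under this strategy
```

Swapping in a different `strategy` function and re-running is all it takes to compare approaches.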

# Computational Methods for Tabletop Games 1: Zombie Dice and Monte Carlo Simulation

Many board and tabletop games rely on the randomness of dice rolls (or card order) to create uncertainty in the game. That uncertainty leads many frustrated players to ask, “What are the odds of that happening?”. Catan is a good example: a long run without any of your numbers coming up quickly leads to frustration! Catan’s odds are easy to calculate, though. The dice rolls are independent from one turn to the next, and the state space of outcomes never changes.

When the games get more complex, the odds may not be possible to compute analytically, or they may just be complicated to compute by hand. Zombie Dice is an example of just such a game. Zombie Dice has many stages within a turn, and while each stage is analytically calculable, the overall odds of getting some number of brains depend on the player’s strategy. Using a general method called Monte Carlo Simulation, we can easily play thousands of turns and calculate odds, all without needing more than a simple random number generator.
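Here’s the method in miniature, applied to a case where we already know the answer — estimating the chance that two dice sum to 7, which is exactly 6/36 = 1/6:

```python
import random

# Monte Carlo estimate of a probability we can also compute exactly:
# roll two dice many times and count how often they sum to 7
rng = random.Random(42)
trials = 200_000
hits = sum(rng.randint(1, 6) + rng.randint(1, 6) == 7 for _ in range(trials))
print(f"estimated: {hits / trials:.4f}   exact: {1/6:.4f}")
```

The estimate lands within a fraction of a percent of the true value — and the exact same loop works for games like Zombie Dice, where no closed-form answer is handy.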

# Running the Numbers: Analyzing Pace Structure for a 15K

Races (the running kind) are a great resource for anyone looking to play with a moderately sized data set. It’s not hard to make some descriptive and pretty charts, and you can do some simple “who ran fastest?” breakdowns by various factors.

In this post, I’ll look at a race that took place recently here in Atlanta: The Hot Chocolate 15K/5K. The best part of the results they posted is that they include split times for each third of the 15K. That means that we can do some analyses on runner pace consistency and time trends!

We’ll start with some quick visualization of the participants and the results to orient ourselves to the data, and then get into analyzing (spoiler: with clustering) runner pacing structures.
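As a preview of what “pacing structure” means, here’s a small sketch using made-up split paces. It normalizes out each runner’s overall speed so only the shape of their race remains, then labels that shape as a positive, negative, or even split (the clustering in the post works on the same normalized representation; the data and threshold below are illustrative):

```python
import numpy as np

# Hypothetical split paces (minutes per mile) for each third of the 15K,
# one row per runner -- not the real race data
splits = np.array([
    [8.0, 8.1, 8.3],   # slows down -> positive split
    [9.5, 9.4, 9.3],   # speeds up  -> negative split
    [7.2, 7.2, 7.2],   # even pacing
])

# Divide each runner by their own average pace, so shape (not speed) matters
shape = splits / splits.mean(axis=1, keepdims=True)

labels = []
for row, rel in zip(splits, shape):
    drift = rel[-1] - rel[0]          # relative change, first third to last third
    label = "even" if abs(drift) < 0.01 else ("positive" if drift > 0 else "negative")
    labels.append(label)
    print(row, "->", label, "split")
```

Clustering on `shape` instead of hand-picking a threshold lets the data decide how many pacing patterns there really are.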

# Academy Awards: Nominations and Win Relationships

You can’t win if you don’t play, as the saying goes. For the Oscars, you can’t win if you aren’t nominated. I’ve been wondering if there are any relationships or trends within the Oscar nomination data (and that data alone! No outside budget or other information) that can help us determine who is going to win. My main goals are to find out the following:

1. Do more nominations guarantee more wins? If so, in what way?
2. Which categories win more for movies with a single nomination?
3. Which categories win more together (for the same movie)?

Those questions, and maybe some interesting tangents, are the focus of this post. To learn about where I got the data, and what assumptions are in it, see the previous post. Once you’ve read that, let’s get into the data.