You can’t win if you don’t play, as the saying goes. For the Oscars, you can’t win if you aren’t nominated. I’ve been wondering if there are any relationships or trends within the Oscar nomination data (and that data alone! No outside budget or other information) that can help us determine who is going to win. My main goals are to find out the following:

- Do more nominations guarantee more wins? If so, in what way?
- Which categories win more for movies with a single nomination?
- Which categories win more together (for the same movie)?

Those questions, and maybe some interesting tangents, are the focus of this post. To learn about where I got the data, and what assumptions are in it, see the previous post. Once you’ve read that, let’s get into the data.

## Initial Notes

I only used nominations and wins from 1960 onward. Before 1960 there were many more single-nomination movies; after 1960, the split between films with one nomination and films with more than one stayed fairly steady, which I thought would give more stable, consistent results. Starting much later than 1960 would shrink the data set too much to be useful.

## Nominations vs. Wins

The first thing to look at is whether there is any relationship between being nominated and winning. Yes, besides the obvious. We will first look at the probability of a movie winning some number of Oscars, conditioned on how many nominations it received. This is shown as a heatmap of probabilities for each combination of the number of nominations and the number of wins.

The results are close to what we might expect. The probability of winning more than one Oscar grows with the number of nominations. For movies with a small number of nominations, getting any Oscars is difficult. If you’re only nominated for a single Oscar, you have about a 90% chance of NOT winning. It does seem odd for single-nomination movies to do so poorly. We’ll cover the makeup of those movies in the next section and see if we can figure out why they don’t do well.
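As a rough sketch of how the numbers behind such a heatmap could be computed (this is not the code behind the chart, and the `(nominations, wins)` pairs below are made-up toy data), each row of the table is just the distribution of win counts among films that share a nomination count:

```python
from collections import Counter, defaultdict

def win_probabilities(films):
    """Return {noms: {wins: P(wins | noms)}} from one (noms, wins) pair per film."""
    counts = defaultdict(Counter)
    for noms, wins in films:
        counts[noms][wins] += 1

    table = {}
    for noms, win_counts in counts.items():
        total = sum(win_counts.values())
        # Normalise within each nomination count so every row sums to 1.
        table[noms] = {wins: c / total for wins, c in sorted(win_counts.items())}
    return table

# Hypothetical toy data, NOT the real Oscar counts.
toy_films = [(1, 0), (1, 0), (1, 0), (1, 1), (2, 0), (2, 1), (6, 2), (6, 0)]
table = win_probabilities(toy_films)
for noms in sorted(table):
    print(noms, table[noms])
```

Each inner dictionary is one row of the heatmap; normalising per row is what makes the values conditional on the nomination count.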

The outlier with 14 nominations and 11 wins is TITANIC. Another observation that stands out is that, for movies with fewer nominations, getting more than 1 win is difficult. For 6 nominations, the probability of going home with 2 or more Oscars is 40%, and there is only a 13% chance of getting 3 or more. For 5 nominations, getting 2 or more only happens 24% of the time.

Let’s simplify things a bit and just look at the probability of winning at least one Oscar. We have a lot more data for the lower nomination counts (because those are more common). In the data set I’m using, there are 723 single-nomination movies and 887 movies with 2 or more nominations (see the actual counts farther down).

First, let’s look at the Jeffreys intervals for estimating the binomial probability of winning at least 1 Oscar. These intervals are nice and Bayesian (no normal approximations for bounded variables!), and they also account for the number of samples. It appears that any movie with 5 or more nominations has better than a 50/50 shot at winning at least 1 Oscar.

The basic conclusion of this section appears to be that more nominations = more wins, and that fewer than 6 nominations means you’re probably only getting 1 Oscar for the film. Let’s now try to dig deeper into what’s going on with the winning categories.
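Before moving on, here is a minimal sketch of how a Jeffreys interval like the ones above can be computed. For `k` successes in `n` trials, the interval is taken from the quantiles of a Beta(k + 1/2, n - k + 1/2) distribution. The helper names (`beta_cdf`, `beta_ppf`, `jeffreys_interval`) are my own, and the quantile is found with a standard-library-only numerical integration plus bisection; in practice SciPy’s `scipy.stats.beta.ppf` would be the usual way to get the same quantiles.

```python
import math

def beta_cdf(x, a, b, steps=2000):
    """Regularized incomplete beta function I_x(a, b), assuming a, b >= 0.5.

    The substitution t = sin(theta)**2 turns the integrand into
    2 * sin(theta)**(2a-1) * cos(theta)**(2b-1) / B(a, b), which stays
    finite at the endpoints. Integrated with Simpson's rule (steps even).
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    upper = math.asin(math.sqrt(x))
    h = upper / steps

    def density(theta):
        log_value = log_norm
        for base, power in ((math.sin(theta), 2 * a - 1),
                            (math.cos(theta), 2 * b - 1)):
            if power != 0:
                if base <= 0.0:
                    return 0.0  # 0 raised to a positive power
                log_value += power * math.log(base)
        return 2.0 * math.exp(log_value)

    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3.0

def beta_ppf(q, a, b, tol=1e-8):
    """Quantile of Beta(a, b) by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def jeffreys_interval(successes, n, level=0.95):
    """Equal-tailed Jeffreys interval for a binomial proportion."""
    alpha = 1.0 - level
    a, b = successes + 0.5, n - successes + 0.5
    # Usual convention: pin the bound at 0 or 1 when no or all trials succeed.
    lower = 0.0 if successes == 0 else beta_ppf(alpha / 2.0, a, b)
    upper = 1.0 if successes == n else beta_ppf(1.0 - alpha / 2.0, a, b)
    return lower, upper

# Toy numbers for illustration only, not counts from the Oscar data.
print(jeffreys_interval(5, 10))
print(jeffreys_interval(50, 100))
```

The second call has the same observed proportion but ten times the sample size, so its interval is narrower. That is the sense in which these intervals account for the number of samples.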

## Single vs. Multiple Nomination Categories

I said just a bit ago that there are 723 single-nomination movies and 887 movies with 2 or more nominations. The exact breakdown of film counts by number of nominations is:

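| Noms | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Count | 723 | 287 | 152 | 124 | 91 | 46 | 59 | 40 | 25 | 28 | 18 | 9 | 7 | 1 |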

We also saw that the single-nomination films really didn’t do well at all. Let’s look at the categories of these movies and see if there are any trends to pick up on. The following chart shows two proportions in the data, each split between single and multiple-nomination films. The left plot shows the probability of winning in a category for each film type. The right plot shows the ratio of nominations for each film type. Each ratio or probability is based on the actual number of nominations and wins (there are duplicates for each year due to the category condensing that had to happen). Single-nomination movies are nominated and win the most (as a group) in the categories whose films don’t usually compete in the major categories (such as shorts, documentaries, and foreign films).
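A minimal sketch of the left-hand calculation, under some assumptions of mine: each row is a hypothetical `(film, category, won)` record, film names are unique, and a film counts as “single” when it appears in exactly one row. The films below are invented for illustration.

```python
from collections import defaultdict

def category_win_rates(nominations):
    """Return {category: {"single": rate, "multi": rate}} from (film, category, won) rows."""
    rows = list(nominations)
    noms_per_film = defaultdict(int)
    for film, _category, _won in rows:
        noms_per_film[film] += 1

    # For each category and film type, keep [wins, nominations].
    tallies = defaultdict(lambda: {"single": [0, 0], "multi": [0, 0]})
    for film, category, won in rows:
        kind = "single" if noms_per_film[film] == 1 else "multi"
        tallies[category][kind][0] += int(won)
        tallies[category][kind][1] += 1

    # A rate is None when that film type was never nominated in the category.
    return {
        category: {kind: (w / n if n else None) for kind, (w, n) in kinds.items()}
        for category, kinds in tallies.items()
    }

# Invented rows, not real nominees.
toy_rows = [
    ("Short X", "SHORT", True),
    ("Short Y", "SHORT", False),
    ("Epic Z", "VISUAL EFFECTS", True),
    ("Epic Z", "BEST PICTURE", True),
    ("Blockbuster W", "VISUAL EFFECTS", False),
]
rates = category_win_rates(toy_rows)
for category, by_kind in sorted(rates.items()):
    print(category, by_kind)
```

The right-hand plot would use the same tallies, comparing the nomination counts of the two film types within each category instead of dividing wins by nominations.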

It’s interesting to see the ‘blips’ in the ratio of nominations. Visual effects, screenplay, and costume design stand out, for example, as being a film’s only nomination around 20% of the time. They almost never win, though. The ORIGINAL SCREENPLAY category is really telling for this. The blips in VISUAL EFFECTS, for example, probably exist for special-effects-driven blockbusters that are more ‘movie’ and less ‘film’. The takeaway here seems to be that, unless your movie has been singly nominated for a short, documentary, or foreign slot, don’t get your hopes up for winning.

## Multiple Nomination Relationships

Now that we’ve seen what categories are more likely to produce winners for single and multi-nomination films, let’s try to find out if there are any trends or relationships for wins in multiply nominated films. What we want to find out is if some categories win more together, and if there are any interesting conclusions to draw from that.

This section will start with a graph-based approach to help detect strong connections and clusters, if possible. I used NetworkX in Python to build a graph of which categories won together for multi-nominated films. I exported that to Gephi to make a pretty picture and do some analyses. The results of that are shown below, in a graph where each edge connects categories that won together, and the thickness of each edge reflects the number of paired wins. The node size is proportional to the number of paired wins that the node is in. The colors are based on Gephi’s community detection algorithm. We’ll talk more about the communities later.
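To make the graph construction concrete, here is a small standard-library sketch of the edge and node weights described above. It is an illustration, not the code behind the picture: the `wins_by_film` mapping and the film names are hypothetical, and the function names are mine.

```python
from collections import Counter
from itertools import combinations

def cowin_edges(wins_by_film):
    """Count, for each pair of categories, how many films won both."""
    edges = Counter()
    for categories in wins_by_film.values():
        # Sort so each pair has one canonical (a, b) ordering.
        for pair in combinations(sorted(categories), 2):
            edges[pair] += 1
    return edges

def node_sizes(edges):
    """Total number of paired wins each category takes part in."""
    sizes = Counter()
    for (a, b), weight in edges.items():
        sizes[a] += weight
        sizes[b] += weight
    return sizes

# Invented films and wins, for illustration only.
wins_by_film = {
    "Film A": {"BEST PICTURE", "DIRECTING", "FILM EDITING"},
    "Film B": {"BEST PICTURE", "DIRECTING"},
    "Film C": {"VISUAL EFFECTS"},  # a lone win creates no edge
}
edges = cowin_edges(wins_by_film)
sizes = node_sizes(edges)
print(edges.most_common())
print(sizes.most_common())
```

With NetworkX installed, triples of `(a, b, weight)` built from `edges` could be passed to `Graph.add_weighted_edges_from`, and `networkx.write_gexf` writes a file that Gephi can open.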

### Analyzing Category Pairs

For starters, let’s take a look at the pairs of wins. DIRECTING and BEST PICTURE are paired together the most, and those two categories also pair the most with all the other categories. That makes sense, considering that awarding the overall film or director an Oscar means that the components of the film are probably winners as well.

Let’s look at the probabilities of the pairs winning in any given year, along with their confidence intervals. Keep in mind that these probabilities are calculated by dividing the number of actual winning pairs by the number of years that the pair could have won. There are several edges that have near or greater than a 50% likelihood of showing up during the Oscars. The most probable one is the DIRECTING:BEST PICTURE pair (since these probabilities are directly proportional to the edge weights in the graph). The top 20 pairs are shown below, in descending order of probability.
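The ratio described above can be sketched as follows. One assumption here is mine and may not match the original calculation: a year counts as one in which a pair “could have won” if at least one film was nominated in both categories that year. The `(year, film, category, won)` rows are invented.

```python
from itertools import combinations

def pair_probabilities(results):
    """Return {pair: years won together / years the pair could have won}."""
    nominated, won = {}, {}
    for year, film, category, is_win in results:
        nominated.setdefault((year, film), set()).add(category)
        if is_win:
            won.setdefault((year, film), set()).add(category)

    def years_by_pair(categories_by_film):
        # Map each category pair to the set of years in which one film held both.
        years = {}
        for (year, _film), categories in categories_by_film.items():
            for pair in combinations(sorted(categories), 2):
                years.setdefault(pair, set()).add(year)
        return years

    eligible = years_by_pair(nominated)   # assumption: "could have won" years
    winning = years_by_pair(won)
    return {
        pair: len(winning.get(pair, ())) / len(years)
        for pair, years in eligible.items()
    }

# Invented rows for illustration.
toy_results = [
    (2001, "Film A", "BEST PICTURE", True),
    (2001, "Film A", "DIRECTING", True),
    (2002, "Film B", "BEST PICTURE", True),
    (2002, "Film B", "DIRECTING", False),
    (2002, "Film C", "DIRECTING", True),
]
probs = pair_probabilities(toy_results)
print(probs)
```

In the toy data the pair is eligible in both years but only wins together in 2001, so its estimated probability is one half.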

## Category Communities and Clusters

Let’s see if we can pick up on any higher-order relationships or groups among the categories. The first step in doing that was running Gephi’s community detection algorithm (a graph clustering approach). That algorithm found 3 communities (the three colors in the earlier graph). One contained just one category: ANIMATED SHORT. The other two are split between what I will broadly term “CRAFT” (in red) and “TECHNICAL” (dark blue). The CRAFT community contains DIRECTING, BEST PICTURE, all the acting categories, and the writing categories. These categories signify the ability to craft a great story and act it out. The TECHNICAL community has many of the various post-filming activities, as well as the musical and visual categories. The community detection suggests that films might be more likely to do well either in the acting/writing arena or on the technical side, but not always a lot of both at the same time.

In the interest of finding smaller communities to explore that idea, I turned to Non-Negative Matrix Factorization (NMF). I like NMF because it is, in some senses, both a regression and a clustering algorithm. I used the Python library nimfa and its Binary Matrix Factorization algorithm (since we only have 0/1s in our matrix) for this part. I tried many component numbers, and I’ll show you the results for 10 below (the last column holds the categories that are isolated or don’t fit into a component):
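To illustrate the general idea, here is a plain NMF sketch using the classic multiplicative updates, written with only the standard library. It is a stand-in for illustration, not nimfa’s Binary Matrix Factorization, and I am assuming a films-by-categories 0/1 win matrix; the toy matrix below is made up. Each category is assigned to the component where its coefficient is largest.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H

# Made-up win matrix: rows are films, columns are categories.
categories = ["BEST PICTURE", "DIRECTING", "SOUND MIXING", "SOUND EDITING"]
V = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
W, H = nmf(V, rank=2)
for j, name in enumerate(categories):
    component = max(range(len(H)), key=lambda k: H[k][j])
    print(name, "-> component", component)
```

The number of components (`rank`) has to be chosen by hand, which is why trying several values and comparing the resulting groups is a reasonable way to explore.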

| A | B | C | D | E | ISOLATED |
|---|---|---|---|---|---|
| BEST PICTURE | VISUAL EFFECTS | COSTUME DESIGN | ORIGINAL SCORE | ORIGINAL SCREENPLAY | ANIMATED SHORT |
| DIRECTING | SOUND MIXING | PRODUCTION DESIGN | ORIGINAL SONG | ACTRESS | DOCUMENTARY |
| ADAPTED SCREENPLAY | CINEMATOGRAPHY | MAKEUP AND HAIR | ANIMATED FEATURE FILM | | SHORT |
| ACTOR | SOUND EDITING | | | | SHORT DOCU |
| FILM EDITING | | | | | SUPP ACTRESS |
| | | | | | SUPP ACTOR |
| | | | | | FOREIGN FILM |

This breakdown makes sense, because it’s just split up the bigger communities from earlier while maintaining some obvious themes in each group. ANIMATED FEATURE FILM would belong on its own, except it had some song nominations that keep it linked to the music side of things. The supporting actor/actress categories landing in the isolated column indicates that those awards are somewhat separate from the others (in that they aren’t more frequent or tied to any other categories).

I also used SciPy’s hierarchical clustering to see how the categories relate to one another at coarser or finer levels of grouping. I did that because I wanted to compare that method with the BMF (and because of the easy dendrogram creation in SciPy). Here is the dendrogram that resulted:

Read the dendrogram by moving from left to right. As we go from left to right, we are joining categories based on their distances from each other. The sooner two categories are joined, the closer they were to each other. If you look at DIRECTING and BEST PICTURE, they form a cluster before any other categories, because those two are paired together the most. The sooner clusters are formed, the more confident we can be that those categories tend to win together. Eventually everything is clustered into one group, but the intermediate clusters can tell us something about what categories win together more often.
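The merge order that a dendrogram draws can be illustrated with a tiny average-linkage clustering written from scratch. This is a standard-library stand-in rather than SciPy’s implementation, and both the linkage method and the distances below are assumptions of mine (invented numbers where smaller means “wins together more often”).

```python
def average_linkage(labels, distance):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance).

    `distance` maps frozenset({a, b}) to the distance between labels a and b.
    """
    clusters = [(label,) for label in labels]
    merges = []

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between the two clusters.
        total = sum(distance[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of current clusters and join them.
        d, i, j = min(
            (cluster_distance(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Invented distances for four categories.
labels = ["BEST PICTURE", "DIRECTING", "SOUND MIXING", "SOUND EDITING"]
distance = {
    frozenset(("BEST PICTURE", "DIRECTING")): 0.1,
    frozenset(("SOUND MIXING", "SOUND EDITING")): 0.3,
    frozenset(("BEST PICTURE", "SOUND MIXING")): 0.9,
    frozenset(("BEST PICTURE", "SOUND EDITING")): 0.8,
    frozenset(("DIRECTING", "SOUND MIXING")): 0.9,
    frozenset(("DIRECTING", "SOUND EDITING")): 0.7,
}
merges = average_linkage(labels, distance)
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

The closest pair is joined first and the final merge happens at the largest distance, which is the left-to-right order described above. With SciPy, `scipy.cluster.hierarchy.linkage` and `dendrogram` produce the equivalent structure and plot.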

## Wrap Up

This has been a long post, and they’re about to play me off the stage to get to another commercial. Overall, I answered the questions I set out to answer. Let’s go through them real quick and summarize.

### Do more nominations guarantee more wins? If so, in what way?

Yes, but not really until you start getting around 6 nominations. Below that and you’re probably just going to win 1 (or no) Oscar. Even with many nominations, it’s hard to get more than a couple of Oscars. The higher nominated films do get more Oscars, but the number of data points up there is very low. Winning more than 3 or 4 Oscars is unlikely, but you have a much better shot of winning anything if you’ve got two or more nominations.

### Which categories win more for movies with a single nomination?

The specialty categories won the most (short, animated, or foreign). Of the categories that multi-nomination films dominate, MAKEUP AND HAIR was the winningest category for the single-nomination films.

### Which categories win more together (for the same movie)?

BEST PICTURE and DIRECTING are at the top of that list. They’ve won together almost every year since 1960. The charts above show which ones are more likely to win together. As we move away from pairs, the clustering/community algorithms detected a split between what I called CRAFT (acting, directing, writing) and TECHNICAL (effects, mixing, production). These were split further by BMF and hierarchical clustering. An interesting split is that ACTRESS, ORIGINAL SCREENPLAY, and SUPP ACTRESS seem to favor each other more, while ACTOR and ADAPTED SCREENPLAY are separate (but favor each other).

### The End.

I’ll leave any broader conclusions (about all of this) as an exercise to the reader! Based on this data, what movies do you think will win big this year? Does Grand Budapest (with 9 nominations!) stand to win 2, 3, or more Oscars? Is Birdman going to win for ORIGINAL SCREENPLAY and SUPP ACTRESS since it has both of those commonly winning pairs together, or will it be Boyhood (the only other film with that pair)? Will American Sniper grab ACTOR and ADAPTED SCREENPLAY but not get BEST PICTURE because it isn’t nominated for DIRECTING? Is all this just blowing smoke because the Oscars are political? Time will tell.