In the previous post, we took a look at some properties of letters in Wheel of Fortune (WoF) puzzles. We saw that contestants tend to pick CDMA the most, but that DGHO tended to reveal more letters. We ended with a question: how could we value the information gained by each letter, rather than just counting the revealed letters themselves?
This post explores a couple of ways to measure the information gained by letters, and whether the information perspective brings any new advice to light.
As I alluded to in the last post, information can be seen as a ‘no’ to some of the possible states of things. With WoF puzzles, whenever a letter is revealed, we say ‘no’ to every word that could have been there but no longer fits. We can also think of it in reverse, and treat information in terms of the things we can still say ‘yes’ to. The more things we can say no to, and the fewer we can say yes to, the more information we have about the true state of the puzzle.
There is one primary way to measure this: count the number of words that a letter excludes (the number still included is just the difference between that and the number of all possible words). Let’s do a simple example to illustrate. Say you’re up at the final puzzle, the category is ‘THING’, and this is what you see after RSTLNE:
_ E _ _ R _ _ S E R
Using a dictionary (more on this later), there are 38 possible first words and 6 possible second words. The options for the second word are:
['BRUISER', 'AROUSER', 'PRAISER', 'BROWSER', 'GROUSER', 'CRUISER']
Since you were fortunate enough to know that O is a pretty good vowel to pick, you pick it and now see:
_ E _ _ R O _ S E R
You now have only 3 options left for the second word. You’re pretty sure that “BED AROUSER” isn’t going to be on WoF, so you have two choices left. For the first word, there are still 37 options (“GEO” was eliminated, since we know there isn’t an O in that word). Thinking about “BROWSER” leads your tech-savvy mind to guess “WEB BROWSER”, and Pat shows you what prize you’ve won.
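Here’s a rough sketch of the full-information filter behind those counts. It is not the code used for this post. It assumes a puzzle word is shown as a string with ‘_’ for blanks and that the called letters are kept in a set. The word list is a tiny hypothetical stand-in for the real dictionary. The key rule is that a blank can’t hold a letter that has already been called, because that letter would have been revealed.

```python
def matches_full(pattern, word, called):
    """True if `word` fits `pattern` using everything we know.

    pattern: the visible puzzle word, '_' for blanks (e.g. "_R__SER").
    called: set of every letter called so far, revealed or not.
    """
    if len(pattern) != len(word):
        return False
    for p, w in zip(pattern, word):
        if p == "_":
            # A called letter would have been revealed, so a blank can't hold one.
            if w in called:
                return False
        elif p != w:
            return False
    return True


def candidates(pattern, called, dictionary):
    """All dictionary words still consistent with the board."""
    return [w for w in dictionary if matches_full(pattern, w, called)]


# Hypothetical stand-in for the real dictionary.
words = ["BRUISER", "AROUSER", "PRAISER", "BROWSER", "GROUSER", "CRUISER", "TROUSER"]

print(candidates("_R__SER", set("RSTLNE"), words))
# ['BRUISER', 'AROUSER', 'PRAISER', 'BROWSER', 'GROUSER', 'CRUISER']
print(candidates("_RO_SER", set("RSTLNEO"), words))
# ['AROUSER', 'BROWSER', 'GROUSER']
```

TROUSER never makes the list, because T was called with RSTLNE and isn’t showing.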
If you had one letter to pick for the above clue, “W” would have been the best:
W E _ _ R _ W S E R
After picking that, only 3 possibilities remain for the first word (WED, WEB, WEI), and 1 for the second word. Of the 44 original candidates (38 + 6), only 4 remain, which makes its value 1 - 4/44 ≈ 0.91. In the code I used, I also subtracted the number of words in the puzzle from the remaining words, so that a solved puzzle would have a value of 1. Another option is to count the number of whole-puzzle options left: originally, 38 × 6 words means 228 combinations, while 3 × 1 means 3. I opted not to use that measure, since it doesn’t seem to approximate how people think during the game.
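Here is the arithmetic above as a small sketch. `exclusion_value` uses the plain 1 - remaining/original form from the worked example. The post doesn’t spell out exactly how the solved-puzzle adjustment enters the formula, so this sketch leaves it out. `combination_count` is the alternative whole-puzzle measure mentioned above and set aside. Both function names are made up for illustration.

```python
from math import prod


def exclusion_value(before, after):
    """Share of candidate words ruled out by a pick.

    before, after: candidate counts for each puzzle word, before and after
    the letter is called. Value = 1 - remaining / original.
    """
    return 1 - sum(after) / sum(before)


def combination_count(counts):
    """Alternative measure: whole-puzzle combinations still possible."""
    return prod(counts)


before = [38, 6]   # "_E_" and "_R__SER" after RSTLNE
after_w = [3, 1]   # after calling W
print(round(exclusion_value(before, after_w), 2))              # 0.91
print(combination_count(before), combination_count(after_w))  # 228 3
```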
People Aren’t Computers
The one drawback I can think of to the above method is that it takes full advantage of a computer’s memory. Under pressure, a person’s memory and reasoning don’t always perform at their best. One way this shows up is that people guess words containing letters that have already been eliminated. They don’t use all the information they have (and who can blame them?).
To model this, I use what I call the ‘minimal information’ approach, where only the visible arrangement of letters is used to find the possible words. This means that for:
_ E _ _ R _ _ S E R
we would consider “TROUSER” as a possible second word, even though T was already called and isn’t showing, so it can’t be in the puzzle.
Now that we have two extremes (full information usage and minimal information usage), we can get to some data and visualizations.
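The difference between the two approaches comes down to what a blank is allowed to hold. This is a minimal sketch that uses the same ‘_’-for-blank representation as before, and the function names are hypothetical. The minimal-information check compares only the visible letters. The full-information check also rejects any word that puts an already-called letter in a blank.

```python
def matches_minimal(pattern, word):
    """Minimal information: only the visible letters constrain the word."""
    return len(pattern) == len(word) and all(
        p == "_" or p == w for p, w in zip(pattern, word)
    )


def matches_full(pattern, word, called):
    """Full information: blanks also can't hold a letter that was already called."""
    return len(pattern) == len(word) and all(
        (w not in called) if p == "_" else (p == w) for p, w in zip(pattern, word)
    )


called = set("RSTLNE")
print(matches_minimal("_R__SER", "TROUSER"))       # True
print(matches_full("_R__SER", "TROUSER", called))  # False: a called T would be showing
```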
A Quick Word about Dictionaries
First, I want to explain where I got the dictionary used to look up words. Obviously, these results depend heavily on the available set of words. There’s no escaping that! To be as fair as possible, I wanted a large dictionary with as many common words, person names, place names, and contractions as possible.
The one I ended up using was SCOWL, the Spell Checker Oriented Word List. Following the site’s advice, I joined all the English and American word lists at level 60 or below, without any spelling variants. I also had to add two words, “JACKMAN” and “HOMEBUYER”, which were puzzle answers. All the other puzzle words were in the dictionary.
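Here’s a sketch of how the word lists might be joined. It is not the script used for this post. It assumes a local copy of SCOWL’s `final/` directory, where each file is named `<list>.<level>` (for example `english-words.35`), holds one entry per line, and is assumed to be Latin-1 encoded. The `english-`/`american-` prefix filter is my reading of “all the English and American word lists”. Variant lists are skipped because their names don’t start with those prefixes. `load_scowl` and `EXTRA_WORDS` are hypothetical names.

```python
from pathlib import Path

# Puzzle answers that weren't in the joined lists (from the post).
EXTRA_WORDS = {"JACKMAN", "HOMEBUYER"}


def load_scowl(final_dir, max_level=60, prefixes=("english-", "american-")):
    """Join SCOWL word lists at or below `max_level` into one uppercase set.

    Assumes files named '<list>.<level>' (e.g. 'english-words.35'), one entry
    per line, Latin-1 encoded. Variant lists such as 'variant_1-words.35'
    don't start with the given prefixes, so they are skipped.
    """
    words = set(EXTRA_WORDS)
    for path in Path(final_dir).iterdir():
        name, _, level = path.name.rpartition(".")
        if not path.is_file() or not level.isdigit() or int(level) > max_level:
            continue
        if not name.startswith(prefixes):
            continue
        for line in path.read_text(encoding="latin-1").splitlines():
            entry = line.strip().upper()
            if entry:
                words.add(entry)
    return words


# Usage (the path is hypothetical): dictionary = load_scowl("scowl/final")
```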
Single Letter Information Value
Using the full and minimal information concepts, the following charts show the values of each letter:
In the full-knowledge measurements, the vowels are the most valuable (as in the last post), with O and A on top. The letters CDMGH are all very close in value, which suggests that any strategy built from some set of those letters will do well. With O and A nearly tied, CDMA might be as good a choice as DGHO.
In the minimal-knowledge measurements, we see more variation in the value of each letter. The most pronounced variation is in the vowels, where O is a much better choice than A. C and M are also much worse than G and H in this case. This lends some support to the previous post: when you use only minimal information, what you need most is revealed letters to help you reduce the number of possibilities.
Let’s be more rigorous about evaluating the strategies. We’ll do it the same way as last time, by showing the distribution of each strategy’s value over all the puzzles. Remember that a value of 1 means the puzzle is solved, while a value of 0 means no new information was gained. We’ll evaluate CDMA and GHPO (from last time), and we’ll also add BCDA, made up of the most valuable consonants and vowel in the full-information case.
The full-knowledge strategies all perform quite well, and about the same as one another.
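To build those distributions, each strategy gets scored on every puzzle. Below is a minimal sketch using the full-information rule. It is not the code behind the charts. The normalization 1 - (after - n)/(before - n), where n is the number of words in the puzzle, is an assumption. It is one form that gives 1 for a solved puzzle and 0 when nothing new is learned, matching the description above, but the post doesn’t give the exact formula. The ten equal-width buckets and the tiny word list are also illustrative choices.

```python
from collections import Counter

BASE = set("RSTLNE")


def reveal(word, called):
    """Show a puzzle word with uncalled letters as '_' (punctuation stays visible)."""
    return "".join(c if c in called or not c.isalpha() else "_" for c in word)


def matches_full(pattern, word, called):
    return len(pattern) == len(word) and all(
        (w not in called) if p == "_" else (p == w) for p, w in zip(pattern, word)
    )


def count_candidates(answer_word, called, dictionary):
    pattern = reveal(answer_word, called)
    return sum(matches_full(pattern, w, called) for w in dictionary)


def strategy_value(answer, letters, dictionary):
    """Score picking `letters` after RSTLNE: 1 = solved, 0 = nothing new learned."""
    words = answer.split()
    missing = [w for w in words if w not in dictionary]
    if missing:
        raise ValueError(f"answer words missing from dictionary: {missing}")
    after_called = BASE | set(letters)
    before = sum(count_candidates(w, BASE, dictionary) for w in words)
    after = sum(count_candidates(w, after_called, dictionary) for w in words)
    n = len(words)
    if before == n:  # RSTLNE alone already pins down every word
        return 1.0
    return 1 - (after - n) / (before - n)


def value_histogram(puzzles, letters, dictionary, bins=10):
    """Count puzzles per equal-width value bucket; a value of 1.0 lands in the top bucket."""
    counts = Counter()
    for answer in puzzles:
        v = strategy_value(answer, letters, dictionary)
        counts[min(int(v * bins), bins - 1)] += 1
    return [counts[i] for i in range(bins)]


# Hypothetical toy dictionary and a one-puzzle "archive".
dictionary = ["WEB", "WED", "WEI", "BED", "GEO",
              "BROWSER", "AROUSER", "GROUSER", "BRUISER", "PRAISER", "CRUISER"]
for strategy in ["CDMA", "GHPO", "BCDA"]:
    print(strategy, value_histogram(["WEB BROWSER"], strategy, dictionary))
```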
The minimal-knowledge strategies also show very little difference from one another. The most interesting difference between the full and minimal evaluations is the bimodal shape of the minimal distributions. The minimal-knowledge evaluation doesn’t let us rule out words containing letters that were called but not revealed, so for puzzles that contain none of the chosen letters, the picks provide no information beyond what RSTLNE provides.
To make sure the strategies weren’t coming out the same because of a coding error, I plotted the value of each strategy for each clue. Those charts are below, and they show that the strategies don’t all have the same value for the same puzzles.
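The bimodal shape follows directly from that rule. This is a minimal sketch that reuses the same assumed normalization (1 for solved, 0 for nothing new) with a tiny hypothetical word list. Under minimal information, a strategy whose letters don’t appear in the answer leaves the visible pattern unchanged. The candidate count therefore can’t move, and the value is exactly 0.

```python
def reveal(word, called):
    """Show a puzzle word with uncalled letters as '_'."""
    return "".join(c if c in called or not c.isalpha() else "_" for c in word)


def minimal_count(pattern, dictionary):
    # Only the visible arrangement counts; called-but-absent letters are ignored.
    return sum(
        len(w) == len(pattern) and all(p == "_" or p == c for p, c in zip(pattern, w))
        for w in dictionary
    )


def minimal_value(answer, letters, dictionary, base="RSTLNE"):
    """Assumes every answer word is in `dictionary`."""
    words = answer.split()
    before_called = set(base)
    after_called = before_called | set(letters)
    before = sum(minimal_count(reveal(w, before_called), dictionary) for w in words)
    after = sum(minimal_count(reveal(w, after_called), dictionary) for w in words)
    n = len(words)
    if before == n:  # already solved by the base letters
        return 1.0
    return 1 - (after - n) / (before - n)


dictionary = ["WEB", "WED", "BED", "GEO", "BROWSER", "TROUSER", "BRUISER"]
print(minimal_value("WEB BROWSER", "CDMA", dictionary))  # 0.0: no C, D, M, or A in the answer
print(minimal_value("WEB BROWSER", "W", dictionary))
```

Under the full-information rule, the same CDMA pick would still rule out WED and BED, since D was called and isn’t showing.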
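Here is a numeric version of that sanity check, as a sketch. Given each strategy’s value on every puzzle, in the same puzzle order, it counts how often each pair of strategies disagrees. A pair that agrees on every single puzzle would hint that something upstream is collapsing the strategies together. The input format and function name are hypothetical, and the values in the example are made up just to show the shape of the output.

```python
from itertools import combinations


def disagreement(per_puzzle, tol=1e-12):
    """Fraction of puzzles on which each pair of strategies gets a different value.

    per_puzzle: dict of strategy name -> list of values, one per puzzle,
    with every list in the same puzzle order.
    """
    result = {}
    for a, b in combinations(sorted(per_puzzle), 2):
        va, vb = per_puzzle[a], per_puzzle[b]
        if len(va) != len(vb):
            raise ValueError(f"{a} and {b} were scored on different puzzle sets")
        differ = sum(abs(x - y) > tol for x, y in zip(va, vb))
        result[(a, b)] = differ / len(va) if va else 0.0
    return result


# Made-up values for three puzzles, only to illustrate the output format.
print(disagreement({"CDMA": [0.5, 0.9, 1.0],
                    "GHPO": [0.5, 0.7, 1.0],
                    "BCDA": [0.6, 0.9, 1.0]}))
```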
What’s It All Mean?
I was surprised to see that, in both cases, the strategies didn’t vary much in quality. I wonder if this says more about the value of computational power and dictionary selection than about the letters themselves. It’s possible that the dictionary has too many words that would never be picked (and are also easily ruled out), which makes letters appear more valuable. Since we saw that individual letters do differ in value, perhaps being allowed to pick 4 letters is enough of an information boost to make all the strategies look similar. We can check this by running our strategies with the three best letters (2 consonants and a vowel):
This gives us some confidence that the reason the strategies have equal value is the number of letter choices, since the values separate more with fewer letters. I didn’t show the plot for the minimal information strategy, because it didn’t change much (the buckets got more extreme, but the strategies didn’t separate). I’ll leave the conclusion of which strategy is better, based on these plots, as an exercise for the reader. Which is the better metric: full information value, minimal information value, or the number of revealed letters?
It’s worth mentioning that some of the solved puzzles are probably still hard to guess for those of us who aren’t computers. Here’s one (a THING; the answer is at the end of the post, so scroll slowly):
_ _ _ L _ _ _
All of the strategies solved that one. Here’s what it looked like for the contestant who used MHPA (the revealed letters are the same as you would get for CDMA):
M A _ L _ _ _
The contestant did not win the $30K, unfortunately. There’s only one word in the dictionary that fits that pattern, but it’s not that easy to think of. I think the best takeaway is that it pays to stop thinking about words containing letters you’ve eliminated, more than picking any particular set of letters. If you guessed A and it doesn’t show up, start thinking with Os! If you use CDMA, once you see the revealed letters, start thinking GHPO. Although we probably didn’t need data and plots to tell us that, it’s more fun that way.
The answer is: