Now the lines for these over-under bets are made by Vegas oddsmakers and we get them from the Internet. They don't necessarily represent 50/50 odds for the over/under events. The goal for the oddsmakers is to get 50/50 money on the bets to minimize their risk. So the line represents the average

*perceived*50/50 odds point of the people who place bets. For my purposes today I'll ignore that and assume the odds are actually 50/50.

At the last Superbowl I brought some family to the party and the five of us filled out our forms. If my goal is not to win but to have someone in my group of five win, what strategy could we use in filing out our forms to maximize the probability that one of us wins? Again, I'll assume every bet on the form has 50/50 odds. We, as a group, also all fill out our forms together. (Or, equivalently, I could enter the contest multiple times.)

This year there were 12 bets on the form. The winner ended up getting 10 correct. The majority of participants got 3 or 4 right. I knew most people going were football fans and would have strong biases when they placed their bets. I figured I probably would have biases, too. So I decided to just use random.org to make each betting decision for me. I ended up getting 6 right.

Suppose I tell my wife that I'll fill out her form. How should I fill it out? The exact opposite of my first form is what I would intuitively decide to do. Her score will be 12-MyScore. We'll come back to this strategy, later. For a group of five, however, it's not so simple.

Consider the outcome space. There are 2^12 or 4096 different ways the results could turn out. We would like the average "distance" from any of these outcomes to one of our five choices to be minimized. "Distance" would be defined as the hamming distance. So the optimal five choices will be placed in this outcome space to jointly minimize their average distance to all possible outcomes. (There's probably some prior research on this topic but I couldn't seem to find the right search term to find it.)

If we have only one choice, it turns out that any choice will do, i.e. any way you fill out the form will, on average, have the same average hamming distance to all possible outcomes. Since all outcomes are equally likely, this makes sense. The expected value of the number of correct bets will be, you guessed it, 6.

I wrote a simple simulator in C++ to find groups of choices (bit strings) that, as a group, minimize the average distance to all outcomes. I can't exhaustively go through all groups of five. That's because 4096 choose 5 is an enormous number (on the order of 10^15). Not so big that it couldn't be done by a super computer but it's too big for the computer and time that I have. So we'll sample the space randomly and find the best ones we can.

The first thing I found was that there are multiple groups-of-five that achieve optimality. This makes sense when you go back to the results of having just a single choice.

To introduce my next result, I would just like to remind you that the number of bits set (or number of bets that are over) follows a binomial distribution. It much more likely that between 5 and 7 bets will be over than that 2 or fewer will be. So when I looked at the number of bits set in each choice over many optimal groups, I found that the number of bits set for my optimal choices also followed a binomial distribution. This makes sense, too. If I can have multiple entries, I want the distribution of my entries to follow the distribution of the outcome space. (For five entries I have a pretty small sample size and perhaps just using the simulator to find an optimal group of five entries would be simpler than coming up with one on my own.)

For the future I'll have to take a look at the other extreme. What betting strategy results in the worst performance? What distribution does it follow? (The inverse of the binomial distribution?)

Also, how does the expected number of correct bets change with number of entries? Obviously this goes up with the number of entries but quantitatively how does it change? All for another day.