Don't answer that question! Give me a minute to explain myself before you blurt out "yes!" ... So would you believe me if I said my wife took my phone, lost a bunch of games and ruined my stats? No? I've never been good at lying so let's think about this. My ego is at stake here.

**Shoulder devil:** You stink at Gin Rummy. You're losing to some freaking electrons zipping around in your phone! Pathetic.

**Shoulder angel:** You don't stink. 46.5% is good! And this is a game with a lot of random variation. Even a skilled player will lose to an amateur sometimes. And you're getting better each time you play!

(We will ignore the comment on skill and assume for today that my skill was the same over all games played.)

**Self:** Hmmm... random variation. So maybe it was just a string of bad luck?

**Shoulder angel:** Yes... I mean maybe... Yes to maybe.

I have played 86 games so far. If I flipped a coin 86 times, how many times would it come out heads? The most likely outcome is 43. But even *more* likely is *not* getting this outcome. I'll say that again. You're more likely *not* to get 43 heads in 86 flips but some other number. How much more likely?

The probability mass function (pmf) for the binomial distribution is

\begin{equation}
\Pr(K=k) = \binom{n}{k} p^{k} (1-p)^{n-k}
\end{equation}

where $n$ is the number of trials, $k$ is the number of successes and $p$ is the probability of success. For our problem we get

\begin{equation}
\Pr(K=43) = \binom{86}{43} (0.5)^{43} (1-0.5)^{86-43} = 0.086.
\end{equation}

So if you flipped a fair coin 86 times every day and did this for thousands of days, you'd get 43 heads on 8.6% of those days. The other 91.4% of the days would result in some other number of heads. (The fact that our rounded probability of 0.086 is 86/1000 and 86 is the number of trials is a coincidence. Don't get hung up on that.)
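
If you'd like to check that number yourself, here is a minimal sketch in Python using only the standard library. The function name `binomial_pmf` is my own choice for illustration; it is just the formula above typed out.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# 86 flips of a fair coin: chance of exactly 43 heads, and of anything else.
p_43 = binomial_pmf(43, 86, 0.5)
p_not_43 = 1 - p_43

print(f"P(K = 43)  = {p_43:.3f}")
print(f"P(K != 43) = {p_not_43:.3f}")
```

Swapping in a different `k` gives the probability of any other number of heads.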

**Shoulder devil:** And...

**Self:** So if you figure out the probability for 40 successes out of 86, you get 0.070. Also small, but not so different from 0.086 that you might say that the losses were entirely due to skill.

**Shoulder devil:** That sounds kind of subjective to me. I think 0.070 and 0.086 are *very* different. Loser.

**Self:** Let's quantify it!

(This is when "QUANTIFY" would zoom in and zoom out with a spinning background à la the Batman symbol.)

**Shoulder angel:** So what exactly are we quantifying?

What we'd ultimately like to know is the *true* probability $p$ of success. If it's greater than 0.5 then I am a better player. If it's less, then Jane is better. But we really can't *know* what that value is so we'll find a confidence interval that makes us comfortable.

At this point we'll try to find the "binomial proportion confidence interval." (I know Wikipedia isn't the best reference. But it is sometimes a good source of references.) The binomial proportion is the probability $p$. The confidence interval gives a range of values of $p$ that are consistent with the data. It comes from a procedure that captures the true value about 95% of the time over many repeated experiments. I won't go into the many methods for calculating this; that is left as an exercise to the reader (I hate reading that in books). I will give the Clopper-Pearson interval for today. A 95% confidence interval using this method for 40 successes in 86 trials gives $0.35678 \le p \le 0.57592$.
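
For the curious, here is one way the Clopper-Pearson interval could be computed. This is a sketch under my own assumptions, not necessarily how the numbers above were produced: it uses only the standard library and finds each bound by bisection on the binomial tail probability. The helper names (`binom_cdf`, `bisect_root`, `clopper_pearson`) are made up for this example.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo=0.0, hi=1.0, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    sign_lo = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, conf=0.95):
    """Exact (Clopper-Pearson) interval for k successes in n trials."""
    alpha = 1 - conf
    # Lower bound: the p at which "k or more successes" has probability alpha/2.
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which "k or fewer successes" has probability alpha/2.
    upper = 1.0 if k == n else bisect_root(
        lambda p: binom_cdf(k, n, p) - alpha / 2
    )
    return lower, upper


lower, upper = clopper_pearson(40, 86)
print(f"{lower:.5f} <= p <= {upper:.5f}")
```

The output should land very close to the interval quoted above. Bisection is slow but easy to follow; statistics libraries typically get the same bounds from quantiles of the beta distribution.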

**Shoulder angel:** Look at that! It could be as high as 0.576! Maybe you *are* better than Jane.

**Shoulder devil:** Or as low as 0.357. Don't celebrate too much, chump.

So am I better than Jane? Maybe... Yes to maybe.

(My gut tells me that right now I am not better. Recently, however, I've thought of some new strategies that may help. So, again, maybe.)

**Kevin McCallister:** You guys give up? Or ya thirsty for more?

A while ago I read this article at Grantland by Bill Barnwell. In the section titled "The Best of the Best" he gives a table of win/loss records of NFL quarterbacks in games decided by one touchdown or less (in footnote 2 of the article he indicates that this means games ending with a point differential of 7 or less). His table includes ties but for my purposes I'll leave those out. His table is sorted, logically, by win percentage. But the number of games ending in this situation for each quarterback ranges from 50 to 117. Is Terry Bradshaw's 0.593 in 59 games really better than Brett Favre's 0.581 in 117 games?

**Shoulder devil:** Yes. 0.593 is greater than 0.581. Greater is better. Yet you ask the question as if there's some reason for it not to be better. And if you have something in mind, why are you asking a question and not just telling me what you have in mind?

There's a lot of random variation in football games and the quarterback doesn't always control the outcome of a game, even if he is on the field. I submit that we should sort this list of quarterbacks by the lower bound of a 95% confidence interval on their win percentage, i.e., their binomial proportion.

**Shoulder angel:** Don't forget to be completely honest about the assumptions you're making.

I didn't say this when talking about Gin Rummy games, but a binomial process assumes that all trials are iid (independent and identically distributed). Since I assumed that my skill level remained the same over all games and that the AI's skill doesn't change either, this assumption seems fair. In football games this assumption is weaker (but we're going to make it anyway). For example, Brett Favre has 117 games here. It would take at least 8 seasons to play that many games (ignoring the playoffs). A quarterback's skill can vary over time. I don't have the raw data, but how the game plays out matters, too. Was the team ahead and then gave up a meaningless end-game touchdown? Were they behind near the end and taking more risks to win? Were they playing at home? Who were they playing? The point is that the iid assumption is harder to justify here than it was in Gin Rummy. But, like I said, we're going to make it anyway.

So if we add some new columns to the table and sort it by the lower bound of our 95% confidence interval we get the following.

| Player | GP | W | L | Win% | Win% LB | Win% UB | Win% Rank | LB Rank | Change |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Tom Brady | 76 | 54 | 22 | 0.711 | 0.595 | 0.809 | 1 | 1 | 0 |
| Peyton Manning | 103 | 66 | 37 | 0.641 | 0.540 | 0.733 | 3 | 2 | 1 |
| Jim Kelly | 69 | 44 | 25 | 0.638 | 0.513 | 0.750 | 4 | 3 | 1 |
| Jay Schroeder | 52 | 34 | 18 | 0.654 | 0.509 | 0.780 | 2 | 4 | -2 |
| Dan Marino | 107 | 64 | 43 | 0.598 | 0.499 | 0.692 | 8 | 5 | 3 |
| Brett Favre | 117 | 68 | 49 | 0.581 | 0.486 | 0.672 | 11 | 6 | 5 |
| Matt Hasselbeck | 65 | 40 | 25 | 0.615 | 0.486 | 0.733 | 5 | 7 | -2 |
| Ken Stabler | 67 | 41 | 26 | 0.612 | 0.485 | 0.729 | 6 | 8 | -2 |
| John Elway | 112 | 64 | 48 | 0.571 | 0.474 | 0.665 | 14 | 9 | 5 |
| Jake Plummer | 53 | 32 | 21 | 0.604 | 0.460 | 0.735 | 7 | 10 | -3 |
| Brian Sipe | 57 | 34 | 23 | 0.596 | 0.458 | 0.724 | 9 | 11 | -2 |
| Terry Bradshaw | 59 | 35 | 24 | 0.593 | 0.457 | 0.719 | 10 | 12 | -2 |
| Joe Montana | 69 | 40 | 29 | 0.580 | 0.455 | 0.698 | 13 | 13 | 0 |
| Phil Simms | 71 | 40 | 31 | 0.563 | 0.440 | 0.681 | 17 | 14 | 3 |
| Dave Krieg | 80 | 44 | 36 | 0.550 | 0.435 | 0.662 | 20 | 15 | 5 |
| Ben Roethlisberger | 64 | 36 | 28 | 0.563 | 0.433 | 0.686 | 18 | 16 | 2 |
| Eli Manning | 60 | 34 | 26 | 0.567 | 0.432 | 0.694 | 16 | 17 | -1 |
| Dan Pastorini | 58 | 33 | 25 | 0.569 | 0.432 | 0.698 | 15 | 18 | -3 |
| Joe Theismann | 50 | 29 | 21 | 0.580 | 0.432 | 0.718 | 12 | 19 | -7 |
| Fran Tarkenton | 55 | 31 | 24 | 0.564 | 0.423 | 0.697 | 19 | 20 | -1 |

Congratulations to Brett Favre and John Elway. Sorry, Joe Theismann. It is also interesting that only four quarterbacks have lower bound numbers higher than 0.500. That's not to say the rest of the quarterbacks weren't any better than some average replacement, but it does go to show just how hard it is to be a really dominant quarterback in close games over a long period of time. Again, that comes back to just how much other factors come into play in the outcome of a football game.
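
To make the re-ranking idea concrete, here is a small sketch that sorts four of the quarterbacks both ways. The win/loss records are copied from the table. Everything else (the names `binom_cdf`, `lower_bound`, and `records`, and the bisection approach to the Clopper-Pearson lower bound) is my own illustration and an assumption about how one might do it, not a record of how the table was built.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def lower_bound(wins, games, conf=0.95):
    """Clopper-Pearson lower bound on the win probability, by bisection."""
    alpha = 1 - conf
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        # Probability of winning at least `wins` games if the true rate were `mid`.
        tail = 1 - binom_cdf(wins - 1, games, mid)
        if tail < alpha / 2:
            lo = mid  # `mid` is too small to explain this many wins
        else:
            hi = mid
    return (lo + hi) / 2


# (wins, losses) in close games, taken from the table above.
records = {
    "Tom Brady": (54, 22),
    "Terry Bradshaw": (35, 24),
    "Brett Favre": (68, 49),
    "Joe Theismann": (29, 21),
}


def win_pct(name):
    wins, losses = records[name]
    return wins / (wins + losses)


def win_lb(name):
    wins, losses = records[name]
    return lower_bound(wins, wins + losses)


by_win_pct = sorted(records, key=win_pct, reverse=True)
by_lower_bound = sorted(records, key=win_lb, reverse=True)

print("By win %:      ", by_win_pct)
print("By lower bound:", by_lower_bound)
```

Sorting by the lower bound rewards a long track record: a quarterback with many games gets a narrower interval, so his lower bound sits closer to his raw win percentage.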