## Tuesday, December 30, 2014

### Number of New Pairs from N Old Pairs

I mentioned in my previous post how I had offered a \$5 cash prize to anyone who could solve a counting problem. This was offered to people who talked to me during a poster presentation on some research I had been working on. The problem was relevant to the research I was working on as a PhD student at the University of Utah. Here I am standing in front of my poster. Nothing says "SCIENCE!" like a bow-tie and a white lab coat, right? The problem is this: A room is filled with people who are all at this moment shaking hands with someone else in the room (there are an even number of people), making N pairs of people. If everyone stops shaking hands with their current partner and then starts shaking hands with someone else, how many ways can we make N new pairs where no pair is the same as before? To those who were offered this \$5 problem, I think they had to solve this for 6 pairs of people, if I remember right. Faculty had to solve for the general solution. It also had to be solved by midnight that night and submitted via email to qualify (about 8-10 hours).

This is not an easy problem (well, it wasn't for me). I don't remember how long it took me but it took quite a bit of head-scratching and took at least three or four pages of notes and simple examples in my notebook to come up with a general solution. That's why I felt confident that I wouldn't have to give the \$5 prize away, which I did not have to do. I probably should have offered more to pull more people into listening me talk about my research but as a student I felt enough risk offering just \$5.

First, let's solve the simpler problem. How many pairs can we make from $2N$ people in general? Answer: $N$. Oh, you wanted to know how many ways can we make $N$ pairs from $2N$ people? Sorry. (Many of you are probably not laughing at that. It sounded funnier in my head before I wrote it all out.)

The number of ways we can make $N$ pairs from $2N$ people (or things or whatever)

\frac{\prod_{i=1}^{N} \binom{2i}{2}}{N!}.

We start $i$ equaling $1$ but it may make more sense to start from $N$ and work our way down to $1$ on that product. First we find $\binom{2N}{2}$, making our first pair of people from the $2N$ available to us. Then $\binom{2(N-1)}{2}$ to make the second pair from the $2(N-1)$ left and so on until we only have two left for $\binom{2}{2}$, which of course is just equal to one. So we could start $i$ at $2$ but I feel starting at $1$ is more instructive and complete. Then we divide by $N!$ because we don't care about the order of our $N$ pairs.

The answer for the \$5 problem in its general form is g(N) = \frac{\prod_{i=1}^{N} \binom{2i}{2}}{N!} - \sum_{i=1}^{N} \binom{N}{i} g(N-i) where$g(0)=1$and$n$is the number of pairs. (This forms integer sequence A053871 from The Online Encyclopedia of Integer Sequences. That link gives some additional, interesting formulas for calculating the same sequence.) The product you'll recognize as the equation for forming$N$pairs from$2N$people. But then it gets smaller, which you'd expect, by subtracting off a sum of$N$terms. Let's talk about that sum. The function$g(N)$is the number of ways we can form$N$new, unique pairs from$N$old pairs. The number of ways we can do that is a subset of all the ways we can form$N$pairs. The terms we are subtracting off are all of the ways we form$N$pairs where$i$pairs were the same as before and$N-i$are new. There are$\binom{N}{i}$ways to select$i$old pairs and$g(N-i)$ways to create$N-i$new pairs. Subtract those terms away and we're left with all the ways we can make$N$new pairs. It's recursively defined but using memory to store previously computed values would avoid long computations. Seems simple enough. In my notebook I had to start from one pair and work my way up to, I think, four pairs to figure out the pattern. It was fun. When I was looking through some old pictures I found this one of your's truly the day I did my dissertation defense: My lab-mates wrote the "Good Try Job, Merrick!" message on the board while I was taking questions from my graduate committee without the public present. They were a pretty fun group. My lab-mates, that is. Unless you were on my graduate committee and are reading this right now. You were also fun. Oh, wait. You don't hold any power over me anymore. You were not fun. Your questions were hard and... Oh, sorry. I kinda forgot what I was talking about. Ah, the picture! There's my solution for$g(N)$! Fun! And to my graduate committee, you may not have been fun but I was stretched and I passed. What more could I ask for? ## Monday, December 22, 2014 ### More on Observing All Outcomes in M Events I wrote about the probability of observing all N, equally likely outcomes over M events here. I was chatting with Neal, who served as my PhD advisor, about this problem and he reminded me that he had given me a similar problem in the random processes class I took from him 5 years ago (which I had forgotten). The problem was a "gotta collect them all" problem. If you are trying to collect all the toys of the available set offered in, for example, McDonald's Happy Meals and assume each toy you receive can be modeled as an iid event, how long will it take to collect them all? In the class we modeled the process as a Markov chain. In the first state you have a probability of 1 of getting a toy that you don't already have (because you start with having none). In the second state you have a probability of$\frac{N-1}{N}$of getting a new toy (moving on to the next state) and a probability of$\frac{1}{N}$that you get a toy you already have (staying in same state). This continues all the way to the final state where you have collected them all. After finally publishing my previous post I began to think more about this. Could I get a distribution from that as well? Short answer: yes. Longer answer: yes and let's talk about it more! The reason you can (easily) model the process as a Markov chain is that the probability of observing events is uniformly distributed. That's important because it means the state transition probabilities are independent of the distribution of outcomes. If you had non-uniformly distributed outcomes, a Markov chain's state transition probabilities would change depending on which particular event had occurred. It's not impossible to model but not simple, either. So the Markov chain we use to model this process is a relatively simple one. For a process with$N$possible outcomes, we have a Markov chain with$N+1$states. The transition probabilities for each state are as I described previously, progressively getting harder (less likely) to move from the current state to the next. A really nice property of Markov chains is that you can take the state transition probability matrix$\mathbf{P}$and use it to quickly calculate the probability of being in any given state after$T$steps by taking the matrix to the$T$th power as$\mathbf{P}^T$. We'll also assume we always start in the first state of the Markov chain. The probability of being in any given state after$T$steps... that's the CDF! It's the CDF for our final state, anyway. That's because that probability goes to 1 as$T$goes to infinity for our process. A discrete derivative, again, gives the pmf. Even better is that this way of calculating the distribution is way, way, way, faster than doing using the integer counting method I described in my previous post. (This is probably due to the use of the BigInt type in Julia, which is an arbitrary-precision integer type that is necessary for the large numbers involved.) But they both work and give the same answers. Counting huge integers like these also requires using arbitrary-precision data types whereas using a Markov chain can give good results with native 32 and 64 bit floating point numbers. To show the difference in speed I'll give an example. To calculate the distribution for$N=12$out to$M=75$using my outcome counting method, it took my Julia code about 200 seconds. Doing the same thing with a Markov chain state transition probability matrix took around 380 micro seconds. It was also a heck of a lot easier to think through and code up. But boy do I love a good counting problem. (As part of a poster presentation I made before graduation I offered a \$5 cash prize to anyone that could solve a difficult counting problem related to the research I was presenting. I'll have to give that problem its own post sometime.)

Calculations that are orders of magnitude faster mean I can give distributions for outcome spaces that are orders of magnitude bigger. Here's the distribution for $N=512$ (Hey! You think twelve is big? How about five hundred and twelve?!) :
And again but with a log scale on the x-axis:

Hmm... That actually looks right tailed, even after scaling the x-axis. I guess it's not a log-normal. What is this distribution? Maybe skew-normal? or log-skew-normal?

Let me give another example of where you could apply this problem. If you have a communication system using $N$ symbols to represent data transmitted across your medium, you could measure how often each symbol occurs to make sure it's close to uniform. But you could also count how many symbol transmissions you need before you observe all $N$ symbols at least once. The distribution of this transmission count would follow this log-normal-like distribution I've described.

Similarly, you could use it as one test in a suite of tests for uniformity of a random number generator. You would use $B$ bits to form $2^B$ possible outcomes. The number of $B$ bit sequences needed to observe all $2^B$ possible outcomes would follow this distribution, whatever it is. It should also be true for all overlapping sequences.

But the question still is: what is this distribution?

## Tuesday, December 2, 2014

### Random Assignment Success Probabilities in Sunday School

A few Sundays ago I was sitting in Sunday School during church. The teacher put 9 scripture references on the chalkboard that would be discussed during the class hour (from the Old Testament, Lesson 37). She then said something to the effect of, "Will everyone please look up one of these scriptures."

Hmm.... what are the odds that every scripture will be chosen by someone in the room? Assumptions: everyone in the room actually looks up a scripture, no one in the room collaborates with anyone else to decide which scripture to choose, and all scriptures are equally likely to be chosen. Under those assumptions we can model each person's choice as an iid event.

Oh, I'm supposed to be listening to a lesson about.... what's it about? Okay, we're reading from Isaiah. Let's see...

There's nine scriptures, (about) thirty people, how do I figure this out?

We can think of a 30 digit number where only digits 1 through 9 are used. As long as each number between 1 and 9 shows up at least once in the 30 digit number, we're good! Dang, that's complicated. Let's simplify, first.

We can say we have $M$ events (30 people) and $N$ outcomes (9 scriptures). To simplify, let's consider 2 events and 2 outcomes. The first event will randomly end up in one of the two outcomes with probability 1 (it doesn't matter which outcome). The second event has a probability of 0.5 of ending in the outcome opposite the first. The product of 1 and 0.5 gives 0.5 (iid). So we end with a probability 0.5 that with two events (people) and two possible outcomes (scriptures) that both outcomes will occur.

So, intuitively, if we have more events, we should have a greater probability of having both outcomes occur. The more times we flip a coin, the more likely we are to see both a head and a tail at least once. How much more likely? For three events, let's just list out the possible outcomes
000
001
010
011
100
101
110
111
where 0 and 1 are the two different outcomes. (That's a 3 digit number with 2 possible numbers for each digit.) For three events and two possible outcomes, we have 8 ways things can shake out. Of those, 6 are successful (at least one 1 and at least one 0). Therefore, $p=\frac{6}{8}=0.75$. The probability went up like we expected.

We can easily calculate 8 as $2^3$ or $N^M$. How do we calculate 6? I'll say right now it's NOT $8-2$*.

(*You most certainly can calculate it as $(N^M)-X$. But X is not necessarily 2 as a general rule but calculated as the complement of what will follow.)

Rather than think about long numbers using odd bases, perhaps it's better to think of buckets and balls, where $N$ is the number of buckets and $M$ is the number of balls. Each ball randomly lands into one of the $N$ buckets. What's the likelihood of each bucket having at least one ball after all is said and one?

So when $N=9$ and $M=30$, there are over $4 \times 10^{28}$ possible outcomes, that is, we can't just list out all the outcomes and count the ones that work. We have to count smarter than that, which we can do.

For the example above, we calculate 6 as $\binom{3}{2} + \binom{3}{1} = 6$. If we had four events, we'd calculate the number of successful outcomes as $\binom{4}{1} + \binom{4}{2} + \binom{4}{3} = 14$. Using $n$ choose $k$ notation is correct but perhaps not the most instructive as it is written. Writing the binomial coefficient out using factorials will help you understand going forward.
$\binom{4}{1} = \frac{4!}{1!3!}$
The denominator is $1!$ times $3!$, where 1 and 3 are the number of events resulting in different outcomes, one time for outcome 0 or A and three times for outcome 1 or B (or whatever). In the buckets with balls example, that would be one ball in bucket A and three in bucket B.

With multiple possible outcomes (multiple scriptures to choose from), we have the product of $N$ factorials in the denominator, which is a multinomial coefficient. The sum of all $N$ numbers (not their factorials) is $M$. An example: $M=10$, $N=3$ (where the outcomes are A, B, and C), and we ended up seeing A twice, B five times, and C three times. How many ways can we have this outcome? Because order doesn't matter (it's a combination and not a permutation), we can calculate it as $\frac{10!}{2!5!3!}$. Or if we saw A five times, B zero times, and C five times, we would have $\frac{10!}{5!0!5!}$.

So to guarantee all $N$ outcomes are seen (all nine scriptures are chosen or all nine buckets each have at least one ball) after $M$ events, we must have all numbers in the denominator be greater than or equal to 1. So now we must also count all the ways we can add $N$ numbers between $0$ and $M$ to $M$. Then use only those where all $N$ numbers are greater than 1. Note here that ways to add to $M$ are permutations. For example, consider again $\frac{10!}{5!0!5!}$. Is that the same as $\frac{10!}{0!5!5!}$ or $\frac{10!}{5!5!0!}$? No. They give the same count but are not the same outcomes for our $M$ events. But if we only have one of the three, we can quickly calculate that we need three as $\binom{3}{2}$ or $\binom{3}{1}$ (2 fives and 1 zero).

Funny enough, I had already written a python script to enumerate all the ways you can add $N$ numbers between $0$ and $M$ to $M$ just a few months ago. It gets really slow as the numbers get big (there's lots of room for optimization in my algorithm, I'm sure). Lucky for me this is not a new problem. The different ways you can add positive integers to $N$ are called partitions. And an efficient algorithm for generating partitions are given by Donald Knuth in Volume 4A of his "The Art of Computer Programming" book series. And this algorithm is already implemented in Julia's standard library of functions. Is Julia's implementation of Knuth's algorithm better than my Python implementation of my own algorithm? Yes. Absolutely, yes. Not only is it faster but I can call either partitions(m), which generates all partitions that sum to m, or partitions(m,n), which generates all partitions of size n that sum to m. How convenient! (I've reversed the Julia documentation's n,m notation to match ours.)

Now, for each partition given by partitions(m,n), we'll calculate its multinomial coefficient. Each of those is multiplied by the multinomial coefficient of the counts of the digits in the partition. An example: partitions(10,5) will give seven different partitions but one of those will be [4,3,1,1,1]. The multinomial coefficient for this is $\binom{10}{4,3,1,1,1}$ or $\frac{10!}{4!3!1!1!1!}$. This is multiplied by $\binom{5}{1,1,3}$ or $\frac{5!}{1!1!3!}$, which comes from the fact that there are 5 digits in the partition where 1 is a four, 1 is a three, and 3 are ones. This is the number of ways we can rearrange our partition uniquely. This product of multinomials is calculated for each partition and these products are summed. This sum is the number of ways we can successfully observe all $N$ outcomes over $M$ trials. Divide this sum by $N^M$ to get the probability of observing all $N$ outcomes over $M$ or fewer trials. That's critical. Or fewer. We mindlessly perform $M$ trials and don't care when we succeeded in observing all $N$ outcomes. Maybe it happened right at trial $M$ but maybe it happened before that (if $M>N$). In other words, we are calculating values of the CDF.

So for our Sunday School teacher, the probability that all 9 of the scriptures will be chosen by the class of 30 is about 75.6% (again, under the assumptions we made earlier). So most weeks she won't have to worry about having to wait for someone to look up a scripture that no one else chose to look up ahead of time if she employs this method of scripture assignment.

A discrete derivative of this CDF will give us the pmf. But what does the value from the pmf mean? It is the probability of requiring exactly $M$ trials to observe the $N$ outcomes. So intuitively we would expect the probability to be low when $M$ is close to $N$. You would need to be very lucky to roll each of the six values on a standard, six-sided die in just six rolls (about a 1.5% chance). But you would also expect very large values of $M$ to be unlikely, too. You probably won't need to roll that die 100 times to see all six sides at least once. For that to happen you would have needed five of the outcomes to occur in the first 99 rolls and the sixth to occur on the 100th roll. That seems like too many rolls. So somewhere in the middle is our most likely outcome.

Let's look at some examples. The following is a plot of the pmf when $N=6$. Again, these are the probabilities that we will complete our observation of all $N$ outcomes in $M$ events exactly. The maximum probability (or mode) occurs at $M=11$. The median is approximately $M=13$. The expected value is approximately $14.7$ (using values calculated out to $M=50$).

(And these plots were made in Julia 0.3.2 using Gadfly 0.3.9 and are zoomable. Neat!)

That's an interesting shape. Let's look at some more for $N=8$ and $N=12$.

Hmm... that reminds me of a log-normal distribution. But if a Poisson distribution can approximate a normal distribution, is there a similar discrete distribution that approximates a log-normal (for large $N$)?

The following is a plot for $N=12$ again but plotting versus the natural log of the number events.

I like it. Maybe this discrete distribution could approximate a log-normal?

Voice of my ego: Poisson gets a cool distribution and process named after himself. This is your chance to have something named after you!
Voice of reason: No one has ever thought of this before? I doubt it. There's no way I'm the first. (I'll hedge by saying the probability is close to 0 that I'm the first.)

Strangely enough, I'm having a hard time finding anything about a discrete distribution that approximates a log-normal distribution out on the sea of Google and Bing the way a Poisson distribution approximates a normal distribution. Anyone know something I don't and wishes to share?