Neat Little Combinatorics Problem

2 minute read

Published:

I’ll cut right to it. Consider the set $S = (49, 8, 48, 15, 47, 4, 16, 23, 43, 44, 42, 45, 46 )$. What is the expected value for the minimum of 6 samples from this set?

We could always just sample form the set to estimate the expected value. Here is a python script to do just that.

import numpy as np
x = np.array([49, 8, 48, 15, 47, 4, 16, 23, 43, 44, 42, 45, 46])

mins = []
for _ in range(1000):
    mins.append(np.random.choice(x,size = 6, replace = False).min())

print(np.mean(mins))

But that is estimating the mean. We can do better and directly compute it. Here is some python code to create all subsets from $S$ of size 6. Then, we simply take out the minimum from each subset and compute the mean.

import numpy as np
from itertools import combinations, groupby

x = np.array([49, 8, 48, 15, 47, 4, 16, 23, 43, 44, 42, 45, 46])
x = np.sort(x)

c = list(combinations(x,6))

mins = list(map(lambda x: x[0], c))

s = 0
for k, g in groupby(sorted(mins)):
    s+=k*(len(list(g))/len(mins))

print( s )

The script returns 8.18 repeating. Great, but we can do even better! If we can compute the probability density function, we can compute the mean analytically. Let’s consider a smaller problem to outline the solution.

Let our set in question be $(1,2,3,4,5)$. Let the minimum of a sample of 3 numbers from this set be the random variable $z$. Now, note there are $\binom{5}{3} = 10$ ways to choose 3 elements from a set of 5.

How many subsets exist where the minimum is 1? Well, if I sampled 1, then I would still have to pick 2 numbers from a possible 4 numbers larger than 1. There are $\binom{4}{2}$ ways to do this. So $p(z=1) = \binom{4}{2} / \binom{5}{3}$.

In a similar fashion, there are $\binom{3}{2}$ subsets where 2 is the minimum, and $\binom{2}{2}$ subsets where 3 is the minimum. There are no subsets where 4 or 5 are the minimum (why?). So that means the expected minimum value for this set would be

\[\operatorname{E}(z) = \dfrac{ \sum_{k = 1}^{3} k\binom{5-k}{2} }{\binom{5}{3}}\]

Whatever that sum happens to be. Here is how you could code up the analytic solution to our problem.

import numpy as np
from scipy.special import binom

x = np.array([ 4, 8, 15, 16, 23, 42, 43, 44, 45, 46, 47, 48, 49])
x = np.sort(x)

sample_size =6
sample_space = x[:-(sample_size-1)]
E = 0
for i,s in enumerate(sample_space,start = 1):

    E+= s*binom(x.size-i,sample_size-1)

print(E/binom(x.size, sample_size))

Full disclosure, this was on a job application (literally, on the job application), so sorry KiK for putting the answer out there, but the question was too fun not to write up!