Neat Little Combinatorics Problem

2 minute read

Published:

I’ll cut right to it. Consider the set $S = (49, 8, 48, 15, 47, 4, 16, 23, 43, 44, 42, 45, 46 )$. What is the expected value for the minimum of 6 samples from this set?

We could always just sample form the set to estimate the expected value. Here is a python script to do just that.

import numpy as np
x = np.array([49, 8, 48, 15, 47, 4, 16, 23, 43, 44, 42, 45, 46])

mins = []
for _ in range(1000):
    mins.append(np.random.choice(x,size = 6, replace = False).min())

print(np.mean(mins))

But that is estimating the mean. We can do better and directly compute it. Here is some python code to create all subsets from $S$ of size 6. Then, we simply take out the minimum from each subset and compute the mean.

import numpy as np
from itertools import combinations, groupby

x = np.array([49, 8, 48, 15, 47, 4, 16, 23, 43, 44, 42, 45, 46])
x = np.sort(x)

c = list(combinations(x,6))

mins = list(map(lambda x: x[0], c))

s = 0
for k, g in groupby(sorted(mins)):
    s+=k*(len(list(g))/len(mins))

print( s )

The script returns 8.18 repeating. Great, but we can do even better! If we can compute the probability density function, we can compute the mean analytically. Let’s consider a smaller problem to outline the solution.

Let our set in question be $(1,2,3,4,5)$. Let the minimum of a sample of 3 numbers from this set be the random variable $z$. Now, note there are $\binom{5}{3} = 10$ ways to choose 3 elements from a set of 5.

How many subsets exist where the minimum is 1? Well, if I sampled 1, then I would still have to pick 2 numbers from a possible 4 numbers larger than 1. There are $\binom{4}{2}$ ways to do this. So $p(z=1) = \binom{4}{2} / \binom{5}{3}$.

In a similar fashion, there are $\binom{3}{2}$ subsets where 2 is the minimum, and $\binom{2}{2}$ subsets where 3 is the minimum. There are no subsets where 4 or 5 are the minimum (why?). So that means the expected minimum value for this set would be

Whatever that sum happens to be. Here is how you could code up the analytic solution to our problem.

import numpy as np
from scipy.special import binom

x = np.array([ 4, 8, 15, 16, 23, 42, 43, 44, 45, 46, 47, 48, 49])
x = np.sort(x)

sample_size =6
sample_space = x[:-(sample_size-1)]
E = 0
for i,s in enumerate(sample_space,start = 1):

    E+= s*binom(x.size-i,sample_size-1)

print(E/binom(x.size, sample_size))

Full disclosure, this was on a job application (literally, on the job application), so sorry KiK for putting the answer out there, but the question was too fun not to write up!