Today I came across an algorithmic problem.
I wanted to choose a random sub array of size $$k$$ from an array of size $$n$$, while maintaining its order.
For example, if we have the array $$\{0,1,2,3,4,5,6,7,8,9\}$$ a possible result for $$k=4$$ is the array $$\{1,4,7,9\}$$.

I came up with an algorithm with average case time complexity $$O(n)$$.

## The Algorithm

Let $$A$$ be the original array, with elements of type $$T$$.
Let $$k$$ be the number of elements we want in the sub array.
Let $$n=length(A)$$.

1. Let $$B=T[k]$$ (an array of type $$T$$ with $$k$$ elements)
2. Let $$keys=bool[n]$$ (an array with $$n$$ boolean elements)
3. Let $$flag=(k\le n / 2)$$
4. Let $$stop=(flag ? k : n-k)$$
5. $$for(i=0;i\lt stop;i=i+1)$$
1. $$while(true)$$
1. Let $$key=random(0,n-1)$$
2. $$if(keys[key]=false)$$
1. $$keys[key]=true$$
2. $$break$$
6. Let $$p=0$$
7. $$for(i=0;i\lt k;i=i+1)$$
1. $$while(keys[p] \oplus flag)$$ (where $$\oplus$$ is xor)
1. $$p=p+1$$
2. $$B[i]=array[p]$$
3. $$p=p+1$$
8. $$return\space B$$

## Average Case Time Analysis

Stages 1 and 2 depends on the language’s implementation for array declaration.
However they are always $$O(n)$$.

Stages 3, 4 and 6 are clearly $$O(1)$$.

Let’s take a look at stage 7.
Since $$p$$ can never be bigger than $$n$$, and $$i$$ can never be bigger than $$k$$, and each iteration is $$O(1)$$, these stages are $$O(n+k)=O(n)$$.

Now, let’s consider stage 5.
Within each step of the loop we are trying to find a number between $$0$$ and $$n-1$$ which wasn’t picked earlier.
The distribution of the number of guesses it takes to find such a number, is a geometric distribution with probability $$\frac{n-i}{n}$$, hence the mean is $$\frac{n}{n-i}$$.
We know that $$i$$ goes from $$0$$ to $$min(k,n-k)-1$$ which is less than $$n/2$$.
Each step in the while loop is $$O(1)$$.
Therefore the worst case complexity, on average, is $$O(\sum_{i=0}^{n/2-1}\frac{n}{n-i})=O(\sum_{i=0}^{n/2-1}\frac{n}{n/2})=O((n/2)\frac{n}{n/2})=O(n)$$.

Summing all of the stages together, we can see that the algorithm has an average case time complexity of $$O(n)$$.

## C# implementation

I wrote a simple implementation of the algorithm in C#:

public static T[] RandomSubArray<T>(T[] array, int k)
{
int n = array.Length;

T[] ret = new T[k];
bool[] keys = new bool[n];
Random rand = new Random();

bool flag = k <= n / 2;

for (int i = 0, stop = flag ? k : n - k; i < stop; i++)
{
while (true)
{
int key = rand.Next(0, n);
if (!keys[key])
{
keys[key] = true;
break;
}
}
}

int p = 0;
for (int i = 0; i < k; i++)
{
while (keys[p] ^ flag)
{
p++;
}
ret[i] = array[p++];
}

return ret;
}
Tagged with: