# Generating Random Sub Array

*By Shay Ben Moshe On October 9, 2011 · Leave a Comment · In Algorithms*

Today I came across an algorithmic problem.

I wanted to choose a random sub array of size \(k\) from an array of size \(n\), while maintaining its order.

For example, if we have the array \(\{0,1,2,3,4,5,6,7,8,9\}\) a possible result for \(k=4\) is the array \(\{1,4,7,9\}\).

I came up with an algorithm with average case time complexity \(O(n)\).

## The Algorithm

Let \(A\) be the original array, with elements of type \(T\).

Let \(k\) be the number of elements we want in the sub array.

Let \(n=length(A)\).

- Let \(B=T[k]\) (an array of type \(T\) with \(k\) elements)
- Let \(keys=bool[n]\) (an array with \(n\) boolean elements)
- Let \(flag=(k\le n / 2)\)
- Let \(stop=(flag ? k : n-k)\)
- \(for(i=0;i\lt stop;i=i+1)\)
- \(while(true)\)
- Let \(key=random(0,n-1)\)
- \(if(keys[key]=false)\)
- \(keys[key]=true\)
- \(break\)

- \(while(true)\)
- Let \(p=0\)
- \(for(i=0;i\lt k;i=i+1)\)
- \(while(keys[p] \oplus flag)\) (where \(\oplus\) is xor)
- \(p=p+1\)

- \(B[i]=array[p]\)
- \(p=p+1\)

- \(while(keys[p] \oplus flag)\) (where \(\oplus\) is xor)
- \(return\space B\)

## Average Case Time Analysis

Stages 1 and 2 depends on the language’s implementation for array declaration.

However they are always \(O(n)\).

Stages 3, 4 and 6 are clearly \(O(1)\).

Let’s take a look at stage 7.

Since \(p\) can never be bigger than \(n\), and \(i\) can never be bigger than \(k\), and each iteration is \(O(1)\), these stages are \(O(n+k)=O(n)\).

Now, let’s consider stage 5.

Within each step of the loop we are trying to find a number between \(0\) and \(n-1\) which wasn’t picked earlier.

The distribution of the number of guesses it takes to find such a number, is a geometric distribution with probability \(\frac{n-i}{n}\), hence the mean is \(\frac{n}{n-i}\).

We know that \(i\) goes from \(0\) to \(min(k,n-k)-1\) which is less than \(n/2\).

Each step in the while loop is \(O(1)\).

Therefore the worst case complexity, on average, is \(O(\sum_{i=0}^{n/2-1}\frac{n}{n-i})=O(\sum_{i=0}^{n/2-1}\frac{n}{n/2})=O((n/2)\frac{n}{n/2})=O(n)\).

Summing all of the stages together, we can see that the algorithm has an average case time complexity of \(O(n)\).

## C# implementation

I wrote a simple implementation of the algorithm in C#:

public static T[] RandomSubArray<T>(T[] array, int k) { int n = array.Length; T[] ret = new T[k]; bool[] keys = new bool[n]; Random rand = new Random(); bool flag = k <= n / 2; for (int i = 0, stop = flag ? k : n - k; i < stop; i++) { while (true) { int key = rand.Next(0, n); if (!keys[key]) { keys[key] = true; break; } } } int p = 0; for (int i = 0; i < k; i++) { while (keys[p] ^ flag) { p++; } ret[i] = array[p++]; } return ret; }

### Recent Comments

- liggywoth on Password Entropy
- liggywoth on Password Entropy
- Stephen Long on Password Entropy
- PHP Native Array vs. SPL SplFixedArray vs. ‘SPL Arraylist’ | Anthony's Blog on PHP’s native array vs SplFixedArray performance
- Shay Ben Moshe on Password Entropy

### Categories