Today I came across an algorithmic problem.
I wanted to choose a random sub array of size \(k\) from an array of size \(n\), while maintaining its order.
For example, if we have the array \(\{0,1,2,3,4,5,6,7,8,9\}\) a possible result for \(k=4\) is the array \(\{1,4,7,9\}\).

I came up with an algorithm with average case time complexity \(O(n)\).

The Algorithm

Let \(A\) be the original array, with elements of type \(T\).
Let \(k\) be the number of elements we want in the sub array.
Let \(n=length(A)\).

  1. Let \(B=T[k]\) (an array of type \(T\) with \(k\) elements)
  2. Let \(keys=bool[n]\) (an array with \(n\) boolean elements)
  3. Let \(flag=(k\le n / 2)\)
  4. Let \(stop=(flag ? k : n-k)\)
  5. \(for(i=0;i\lt stop;i=i+1)\)
    1. \(while(true)\)
      1. Let \(key=random(0,n-1)\)
      2. \(if(keys[key]=false)\)
        1. \(keys[key]=true\)
        2. \(break\)
  6. Let \(p=0\)
  7. \(for(i=0;i\lt k;i=i+1)\)
    1. \(while(keys[p] \oplus flag)\) (where \(\oplus\) is xor)
      1. \(p=p+1\)
    2. \(B[i]=array[p]\)
    3. \(p=p+1\)
  8. \(return\space B\)

Average Case Time Analysis

Stages 1 and 2 depends on the language’s implementation for array declaration.
However they are always \(O(n)\).

Stages 3, 4 and 6 are clearly \(O(1)\).

Let’s take a look at stage 7.
Since \(p\) can never be bigger than \(n\), and \(i\) can never be bigger than \(k\), and each iteration is \(O(1)\), these stages are \(O(n+k)=O(n)\).

Now, let’s consider stage 5.
Within each step of the loop we are trying to find a number between \(0\) and \(n-1\) which wasn’t picked earlier.
The distribution of the number of guesses it takes to find such a number, is a geometric distribution with probability \(\frac{n-i}{n}\), hence the mean is \(\frac{n}{n-i}\).
We know that \(i\) goes from \(0\) to \(min(k,n-k)-1\) which is less than \(n/2\).
Each step in the while loop is \(O(1)\).
Therefore the worst case complexity, on average, is \(O(\sum_{i=0}^{n/2-1}\frac{n}{n-i})=O(\sum_{i=0}^{n/2-1}\frac{n}{n/2})=O((n/2)\frac{n}{n/2})=O(n)\).

Summing all of the stages together, we can see that the algorithm has an average case time complexity of \(O(n)\).

C# implementation

I wrote a simple implementation of the algorithm in C#:

public static T[] RandomSubArray<T>(T[] array, int k)
{
	int n = array.Length;

	T[] ret = new T[k];
	bool[] keys = new bool[n];
	Random rand = new Random();

	bool flag = k <= n / 2;

	for (int i = 0, stop = flag ? k : n - k; i < stop; i++)
	{
		while (true)
		{
			int key = rand.Next(0, n);
			if (!keys[key])
			{
				keys[key] = true;
				break;
			}
		}
	}

	int p = 0;
	for (int i = 0; i < k; i++)
	{
		while (keys[p] ^ flag)
		{
			p++;
		}
		ret[i] = array[p++];
	}

	return ret;
}
Tagged with:
 

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Set your Twitter account name in your settings to use the TwitterBar Section.