# Generalized Random Sub Array Algorithm

*By Shay Ben Moshe On November 2, 2011 · Leave a Comment · In Algorithms*

After discussing with some friends on StartupSeeds, I came up with a generalized random sub array algorithm.

This algorithm can be implemented with different data-structures, and it’s complexity is \(O(n)\) or \(O(k\log\space k)\) for 2 specific implementations.

We need a data-structure that can contain integers.

It must have 2 methods which are add (adds an integer) and contains (whether an integer is in or not).

We also need to be able to iterate it, I will use foreach loop for that.

## The Algorithm

Let \(A\) be the original array, with elements of type \(T\).

Let \(k\) be the number of elements we want in the sub array.

Let \(n=length(A)\).

Let \(D\) be the data-structure.

- Let \(B=T[k]\) (an array of type \(T\) with \(k\) elements)
- Let \(flag=(k\le n / 2)\)
- Let \(stop=(flag ? k : n-k)\)
- \(for(i=0;i\lt stop;i=i+1)\)
- \(while(true)\)
- Let \(key=random(0,n-1)\)
- \(if(!D.contains(key))\)
- \(D.add(key)\)
- \(break\)

- \(while(true)\)
- \(if(flag)\)
- Let \(p=0\)
- \(foreach(key\space in\space D)\)
- \(B[p]=A[key]\)
- \(p=p+1\)

- \(else\)
- \(keys=bool[n]\)
- \(foreach(key\space in\space D)\)
- \(keys[key]=true\)

- Let \(p=0\)
- \(for(i=0;i\lt k;i=i+1)\)
- \(while(keys[p])\)
- \(p=p+1\)

- \(B[i]=array[p]\)
- \(p=p+1\)

- \(while(keys[p])\)

- \(return\space B\)

## Complexity Analysis

I will calculate each stage’s complexity separately, and then sum them up.

But first:

Let \(T_c(j)\) be the complexity of \(D.contains\) when there are \(j\) elements in it.

Let \(T_a(j)\) be the complexity of \(D.add\) when there are \(j\) elements in it.

Let \(T_i(j)\) be the complexity of iterating the data-structure’s elements when there are \(j\) elements in it.

Stage 1 is \(O(k)\) or \(O(1)\) depending on the language’s implementation for array declarations, we will use \(O(k)\).

Stage 2 and 3 are clearly \(O(1)\).

Stage 4 is more complex to calculate and we can only calculate it for the average case.

In each of the iterations in the for-loop we are looking for a number between \(0\) and \(n-1\) that is not already in \(D\).

Because there are already \(i\) elements in the data-structure, the distribution of the number of while-loop iterations is a geometric distribution with probability \(\frac{n-i}{n}\), therefore the mean number of iterations is \(\frac{n}{n-i}\).

So this stage’s complexity is:

\[\displaylines{O(\sum_{i=0}^{min(k,n-k)-1}(T_c(i)\frac{n}{n-i}+T_a(i)))

\\

=O(\sum_{i=0}^{min(k,n-k)-1}(T_c(i)\frac{n}{n/2}+T_a(i)))

\\

=O(\sum_{i=0}^{min(k,n-k)-1}(T_c(i)+T_a(i)))}\]

Stage 5 is simply \(O(T_i(k))\).

Stage 6 is also quite simple.

There is an array deceleration (\(O(n)\) as mentioned before), iteration over the data-structure \(O(T_i(n-k))\) and an iteration on the arrays (\(O(n)\).

Summing up to \(O(T_i(n-k)+n)\).

But we know that \(k\ge n/2\), and therefore it is also \(O(T_i(n-k)+k)\).

Bear in mind that only one of stages 5 and 6 occurs, hence there total complexity of these stages is \(O(T_i(min(k,n-k))+k)\).

Stage 7 is also \(O(1)\).

Summing stages 1 to 7 gives us:

\[O(\sum_{i=0}^{min(k,n-k)-1}(T_c(i)+T_a(i))+T_i(min(k,n-k))+k)\]

## Implementation #1 – Boolean Array

This implementation was introduced in my previous article, Generating Random Sub Array.

In this implementation the data-structure acts the same as the array \(keys\) that is used in stage 6.1.

In this case \(T_c(j)=1\), \(T_a(j)=1\) and \(T_i(j)=n\).

With that in mind the algorithms complexity can be found easily: \(O(\sum_{i=0}^{min(k,n-k)-1}(1+1)+n+k)=O(n)\)

## Implementation #2 – Self-Balancing Binary Search Tree

In this implementation our data-structure is a binary search tree.

This implementation is much better than the one mentioned above when \(k\ll n\).

In this case \(T_c(j)=\log\space j\), \(T_a(j)=\log\space j\) and \(T_i(j)=j\).

We can again easily find the total time complexity.

For simplicity I will assume that \(k\le n/2\).

\(O(\sum_{i=0}^{k-1}(\log\space j+\log\space j)+k+k)=O(k\log\space k)\)

### Recent Comments

- liggywoth on Password Entropy
- liggywoth on Password Entropy
- Stephen Long on Password Entropy
- PHP Native Array vs. SPL SplFixedArray vs. ‘SPL Arraylist’ | Anthony's Blog on PHP’s native array vs SplFixedArray performance
- Shay Ben Moshe on Password Entropy

### Categories