In this article I would like to analyze the performance of one of the best sorting algorithms I know: Quicksort.
When I first saw the algorithm, I was told that it has $$O(n\log n)$$ average-case time complexity, but I never saw a proof until today.
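To make the counting concrete, here is a minimal Quicksort sketch (my own illustration, not from the original proof; function and variable names are assumptions). It takes the first element as the pivot, is not in-place, and counts every pivot comparison:

```python
import random

def quicksort(arr, counter):
    # Sketch: first element as pivot, not in-place.
    # counter[0] accumulates the number of pivot comparisons.
    if len(arr) <= 1:
        return arr
    pivot, rest = arr[0], arr[1:]
    counter[0] += len(rest)  # the pivot is compared with each other element
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left, counter) + [pivot] + quicksort(right, counter)

counter = [0]
data = random.sample(range(1000), 100)
assert quicksort(data, counter) == sorted(data)
print("comparisons for n=100:", counter[0])
```

On a random 100-element array this typically reports several hundred comparisons, in line with the $$n\log n$$ growth derived below.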

Let $$a_n$$ be the number of comparisons used on average in Quicksort, on an array with $$n$$ elements. ($$a_0=a_1=0$$)
The pivot has to be compared with each of the other elements, which takes $$n-1$$ comparisons.
Then, suppose the left part (the elements smaller than the pivot) has $$k$$ elements, thus the right part (the elements bigger than the pivot) has $$n-1-k$$ elements.
We have to sort both parts recursively, which takes another $$a_k+a_{n-1-k}$$ comparisons.
We are looking for the average over the $$n$$ equally likely pivot positions $$0\le k\le n-1$$, which we can express as $$\frac{\sum_{k=0}^{n-1}(a_k+a_{n-1-k})}{n}$$.
Therefore:
$$a_n=n-1+\frac{\sum_{k=0}^{n-1}(a_k+a_{n-1-k})}{n}$$
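This recurrence is easy to evaluate exactly for small $$n$$. A short sketch (my own code, using exact rational arithmetic; the function name is an assumption):

```python
from fractions import Fraction

def avg_comparisons(n):
    # a_m = m - 1 + (1/m) * sum_{k=0}^{m-1} (a_k + a_{m-1-k}),
    # which by symmetry equals m - 1 + (2/m) * sum_{k=0}^{m-1} a_k.
    a = [Fraction(0), Fraction(0)]  # a_0 = a_1 = 0
    for m in range(2, n + 1):
        a.append(m - 1 + Fraction(2, m) * sum(a[:m]))
    return a[n]

print(avg_comparisons(3))  # 8/3
```

For example, three elements need $$8/3\approx 2.67$$ comparisons on average.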

All we have to do now is solve this recurrence:
$$a_n=n-1+\frac{\sum_{k=0}^{n-1}(a_k+a_{n-1-k})}{n} \\=n-1+\frac{\sum_{k=0}^{n-1}a_k+\sum_{k=0}^{n-1}a_{n-1-k}}{n} \\=n-1+\frac{2\sum_{k=0}^{n-1}a_k}{n} \\\Rightarrow a_{n-1}=n-2+\frac{2\sum_{k=0}^{n-2}a_k}{n-1}$$
From here we get:
$$na_n=n(n-1)+2\sum_{k=0}^{n-1}a_k \\(n-1)a_{n-1}=(n-1)(n-2)+2\sum_{k=0}^{n-2}a_k$$
Subtracting the two gives us:
$$na_n-(n-1)a_{n-1}=n(n-1)-(n-1)(n-2)+2a_{n-1} \\\Rightarrow na_n=(n-(n-2))(n-1)+2a_{n-1}+(n-1)a_{n-1} \\\Rightarrow na_n=2(n-1)+(n+1)a_{n-1} \\\Rightarrow \frac{a_n}{n+1}=2\frac{n-1}{n(n+1)}+\frac{a_{n-1}}{n} \\=2\frac{2n-(n+1)}{n(n+1)}+\frac{a_{n-1}}{n} \\=2(\frac{2}{n+1}-\frac{1}{n})+\frac{a_{n-1}}{n}$$
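As a sanity check (my own code, not part of the original proof), the telescoped identity can be verified against the original recurrence with exact arithmetic:

```python
from fractions import Fraction

# a_n from the averaged recurrence, computed exactly
a = [Fraction(0), Fraction(0)]
for m in range(2, 51):
    a.append(m - 1 + Fraction(2, m) * sum(a[:m]))

# verify a_n/(n+1) = 2*(2/(n+1) - 1/n) + a_{n-1}/n for n = 2..50
for n in range(2, 51):
    assert a[n] / (n + 1) == 2 * (Fraction(2, n + 1) - Fraction(1, n)) + a[n - 1] / n
print("telescoped recurrence holds for n = 2..50")
```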
Let $$b_n=\frac{a_n}{n+1}$$, hence:
$$b_n=2(\frac{2}{n+1}-\frac{1}{n})+b_{n-1} \\=2(\frac{2}{n+1}-\frac{1}{n})+2(\frac{2}{n}-\frac{1}{n-1})+b_{n-2} \\=2(\frac{2}{n+1}-\frac{1}{n})+2(\frac{2}{n}-\frac{1}{n-1})+2(\frac{2}{n-1}-\frac{1}{n-2})+b_{n-3} \\=\ldots \\=2\sum_{k=1}^{n}(\frac{2}{k+1}-\frac{1}{k}) \\=4\sum_{k=1}^{n}\frac{1}{k+1}-2\sum_{k=1}^{n}\frac{1}{k}$$

Up to this point my proof is the same as the original, but now I take a different direction.
I will try to simplify the proof as much as I can.
$$b_n=4\sum_{k=1}^{n}\frac{1}{k+1}-2\sum_{k=1}^{n}\frac{1}{k} \\=4\sum_{k=2}^{n+1}\frac{1}{k}-2\sum_{k=1}^{n}\frac{1}{k} \\=-4+\frac{4}{n+1}+4\sum_{k=1}^{n}\frac{1}{k}-2\sum_{k=1}^{n}\frac{1}{k} \\=\frac{4}{n+1}-4+2\sum_{k=1}^{n}\frac{1}{k} \\\approx2\sum_{k=1}^{n}\frac{1}{k} \\\approx2\int_1^n\frac{1}{k}\,\mathrm{d}k \\=2(\log n-\log 1) \\=2\log n$$
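Note that the line $$b_n=\frac{4}{n+1}-4+2\sum_{k=1}^{n}\frac{1}{k}$$ is still exact; only the last two steps approximate. A quick numerical sketch (my own) compares it with $$2\log n$$: the gap settles near the constant $$2\gamma-4\approx-2.85$$ ($$\gamma$$ being the Euler–Mascheroni constant), so the relative error vanishes as $$n$$ grows:

```python
from math import log

def b_exact(n):
    # b_n = 4/(n+1) - 4 + 2*H_n, the exact line just before the approximation
    h = sum(1.0 / k for k in range(1, n + 1))
    return 4.0 / (n + 1) - 4.0 + 2.0 * h

for n in (10, 1000, 100000):
    print(n, round(b_exact(n), 3), round(2 * log(n), 3))
```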

Now all we have to do is find $$a_n$$:
$$b_n=\frac{a_n}{n+1} \\\Rightarrow a_n=(n+1)b_n \\\approx2(n+1)\log n$$
Or:
$$a_n=O(n\log n)$$
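Multiplying the exact form of $$b_n$$ by $$n+1$$ gives the exact average $$a_n=2(n+1)\sum_{k=1}^{n}\frac{1}{k}-4n$$. A short sketch (my own code) shows its ratio to $$2(n+1)\log n$$ creeping toward 1 as $$n$$ grows, as expected when the error is a lower-order term:

```python
from math import log

def a_exact(n):
    # a_n = 2*(n+1)*H_n - 4*n, obtained by multiplying the exact b_n by n+1
    h = sum(1.0 / k for k in range(1, n + 1))
    return 2 * (n + 1) * h - 4 * n

for n in (100, 10000, 1000000):
    print(n, round(a_exact(n) / (2 * (n + 1) * log(n)), 3))
```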

We haven’t found the exact value, but our approximation is close enough to the real one to prove that Quicksort is an $$O(n\log n)$$ algorithm.


### 5 Responses to Quicksort Average Case Time Complexity

1. Nilay says:

thanks man

2. priyank Chheda says:

Can you please explain the 3rd step:

How do you write a_{n-1-k} = a_k?
It means that
n-1-k = k
2k = n-1
k = (n-1)/2
As n-1 is the total number of elements, k is the middle of the array.
And is that the best-case condition?

3. nitish712 says:

Very nice proof… thanx alot!!

4. sharpstill says:

The last several steps have a small flaw. Because 2∫1/k dk = 2(ln n - ln 1) = 2 ln n, not 2 log n (log's base is 2, but ln's base is e),
so the last result's coefficient is 1.3863 = 2 / log e.

• When I write log I mean base e; this is the convention among mathematicians, and CS people usually write lg for log base 2.