In this article I would like to analyze the performance of one of the best sorting algorithms I know: Quicksort.
When I first saw the algorithm, I was told that on average it has \(O(n\log n)\) time complexity, but I never saw a proof until today.
Today I read this article and found a way to simplify its proof.

Let \(a_n\) be the number of comparisons used on average in Quicksort, on an array with \(n\) elements. (\(a_0=a_1=0\))
The pivot has to be compared with each of the other \(n-1\) elements.
Then, suppose the left part (the elements smaller than the pivot) has \(k\) elements; thus the right part (the elements bigger than the pivot) has \(n-1-k\) elements.
We have to recursively sort them, which means another \(a_k+a_{n-1-k}\) comparisons.
We are looking for the average of the different possibilities, where \(0\le k\le n-1\), which we can express by \(\frac{\sum_{k=0}^{n-1}(a_k+a_{n-1-k})}{n}\).
Therefore:
\(a_n=n-1+\frac{\sum_{k=0}^{n-1}(a_k+a_{n-1-k})}{n}\)
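As a sanity check, this recurrence can be compared against a brute-force average over all \(n!\) input orders. Here is a minimal Python sketch (the function names are my own; it assumes distinct elements and always picks the first element as the pivot):

```python
from itertools import permutations

def comparisons(arr):
    """Comparisons made by Quicksort when the first element is the pivot."""
    if len(arr) <= 1:
        return 0
    pivot, rest = arr[0], arr[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x > pivot]
    return len(arr) - 1 + comparisons(left) + comparisons(right)

def a(n):
    """a_n from the recurrence a_n = n-1 + (1/n) * sum_k (a_k + a_{n-1-k})."""
    vals = [0.0] * (n + 1)          # a_0 = a_1 = 0
    for m in range(2, n + 1):
        vals[m] = m - 1 + sum(vals[k] + vals[m - 1 - k] for k in range(m)) / m
    return vals[n]

for n in range(2, 7):
    perms = list(permutations(range(n)))
    avg = sum(comparisons(list(p)) for p in perms) / len(perms)
    print(n, avg, a(n))             # the two averages agree
```

Averaging over every permutation is only feasible for small \(n\), but it confirms the recurrence exactly on those cases.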

All we have to do is solve this equation:
\(a_n=n-1+\frac{\sum_{k=0}^{n-1}(a_k+a_{n-1-k})}{n}
\\=n-1+\frac{\sum_{k=0}^{n-1}a_k+\sum_{k=0}^{n-1}a_{n-1-k}}{n}
\\=n-1+\frac{2\sum_{k=0}^{n-1}a_k}{n}
\\\Rightarrow a_{n-1}=n-2+\frac{2\sum_{k=0}^{n-2}a_k}{n-1}\)
From here we get:
\(na_n=n(n-1)+2\sum_{k=0}^{n-1}a_k
\\(n-1)a_{n-1}=(n-1)(n-2)+2\sum_{k=0}^{n-2}a_k\)
Subtracting the two gives us:
\(na_n-(n-1)a_{n-1}=n(n-1)-(n-1)(n-2)+2a_{n-1}
\\\Rightarrow na_n=(n-(n-2))(n-1)+2a_{n-1}+(n-1)a_{n-1}
\\\Rightarrow na_n=2(n-1)+(n+1)a_{n-1}
\\\Rightarrow \frac{a_n}{n+1}=2\frac{n-1}{n(n+1)}+\frac{a_{n-1}}{n}
\\=2\frac{2n-(n+1)}{n(n+1)}+\frac{a_{n-1}}{n}
\\=2(\frac{2}{n+1}-\frac{1}{n})+\frac{a_{n-1}}{n}\)
Let \(b_n=\frac{a_n}{n+1}\), hence:
\(b_n=2(\frac{2}{n+1}-\frac{1}{n})+b_{n-1}
\\=2(\frac{2}{n+1}-\frac{1}{n})+2(\frac{2}{n}-\frac{1}{n-1})+b_{n-2}
\\=2(\frac{2}{n+1}-\frac{1}{n})+2(\frac{2}{n}-\frac{1}{n-1})+2(\frac{2}{n-1}-\frac{1}{n-2})+b_{n-3}
\\=\ldots
\\=2\sum_{k=1}^{n}(\frac{2}{k+1}-\frac{1}{k})
\\=4\sum_{k=1}^{n}\frac{1}{k+1}-2\sum_{k=1}^{n}\frac{1}{k}\)
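The telescoped sum can be checked numerically against the original recurrence; a small Python sketch (helper names are my own):

```python
def b_recurrence(n):
    """b_n = a_n/(n+1), with a_n computed from a_n = n-1 + (2/n) * sum a_k."""
    a, s = 0.0, 0.0                 # s accumulates a_0 + ... + a_{m-1}
    for m in range(2, n + 1):
        a = m - 1 + 2 * s / m
        s += a
    return a / (n + 1)

def b_closed(n):
    """The telescoped form: 4 * sum 1/(k+1) - 2 * sum 1/k."""
    return 4 * sum(1 / (k + 1) for k in range(1, n + 1)) \
         - 2 * sum(1 / k for k in range(1, n + 1))

print(b_recurrence(100), b_closed(100))   # the two values agree
```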

Until now my proof is the same as the original, but from here I take a different direction.
I will try to simplify the proof as much as I can.
\(b_n=4\sum_{k=1}^{n}\frac{1}{k+1}-2\sum_{k=1}^{n}\frac{1}{k}
\\=4\sum_{k=2}^{n+1}\frac{1}{k}-2\sum_{k=1}^{n}\frac{1}{k}
\\=-4+4\times\frac{1}{n+1}+4\sum_{k=1}^{n}\frac{1}{k}-2\sum_{k=1}^{n}\frac{1}{k}
\\=\frac{4}{n+1}-4+2\sum_{k=1}^{n}\frac{1}{k}
\\\approx2\sum_{k=1}^{n}\frac{1}{k}
\\\approx2\int_1^n\frac{1}{k}\,\mathrm{d}k
\\=2(\ln n-\ln 1)
\\=2\ln n\)

Now all we have to do is find \(a_n\):
\(b_n=\frac{a_n}{n+1}
\\\Rightarrow a_n=(n+1)b_n
\\\approx2(n+1)\ln n\)
Or:
\(a_n=O(n\log n)\)

We haven’t found the exact result, but the approximation is close enough to prove that Quicksort is an \(O(n\log n)\) algorithm on average.
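In fact, plugging the unapproximated \(b_n=\frac{4}{n+1}-4+2\sum_{k=1}^{n}\frac{1}{k}\) back in gives the exact count \(a_n=2(n+1)H_n-4n\), where \(H_n\) is the \(n\)-th harmonic number. A small Python sketch comparing the recurrence, this exact formula, and the \(2(n+1)\ln n\) estimate (variable names are my own):

```python
import math

def a(n):
    """Average comparison count from a_n = n-1 + (2/n) * (a_0 + ... + a_{n-1})."""
    val, s = 0.0, 0.0               # s accumulates a_0 + ... + a_{m-1}
    for m in range(2, n + 1):
        val = m - 1 + 2 * s / m
        s += val
    return val

n = 10_000
h = sum(1.0 / k for k in range(1, n + 1))    # harmonic number H_n
exact = 2 * (n + 1) * h - 4 * n              # (n+1) * b_n with no approximation
estimate = 2 * (n + 1) * math.log(n)         # the asymptotic 2(n+1) ln n estimate
print(a(n), exact, a(n) / estimate)          # recurrence and exact formula agree
```

Because \(H_n=\ln n+\gamma+o(1)\) while the exact formula also carries a \(-4n\) term, the ratio to the estimate approaches 1 only slowly as \(n\) grows.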

5 Responses to Quicksort Average Case Time Complexity

  1. Nilay says:

    thanks man

  2. priyank Chheda says:

    Can you please explain the 3rd step:

    How you write a_{n-1-k} = a_k
    It means that
    n-1-k=k
    2k=n-1
    K=(n-1)/2
    As n-1 is the total no. of element, k is the mid of array
    And that is the best case condition????

  3. nitish712 says:

    Very nice proof… :) thanks a lot!!

  4. sharpstill says:

    The last step has a small flaw. Because 2∫1/k dk = 2(ln n – ln 1) = 2 ln n, not 2 log n (log has base 2, but ln has base e),
    the last result’s coefficient in terms of log₂ n is 2 / log₂ e ≈ 1.3863
