Quicksort Average Case Time Complexity
In this article I would like to analyze the performance of one of the best sorting algorithms I know: Quicksort.
When I first saw the algorithm I was also told that on average it has \(O(n\log n)\) time complexity, however I never saw a proof until today.
Today I read this article and found a way to simplify their proof.
Let \(a_n\) be the number of comparisons used on average in Quicksort, on an array with \(n\) elements. (\(a_0=a_1=0\))
The pivot has to be compared \(n-1\) times, with each one of the elements.
Then, suppose the left part (the elements smaller than the pivot) has \(k\) elements, thus the right part (the elements bigger than the pivot) has \(n-1-k\) elements.
We have to recursively sort them, which means another \(a_k+a_{n-1-k}\) comparison.
We are looking for the average of the different possibilities, where \(0\le k\le n-1\), which we can express by \(\frac{\sum_{k=0}^{n-1}(a_k+a_{n-1-k})}{n}\).
Therefore:
\(a_n=n-1+\frac{\sum_{k=0}^{n-1}a_k}{n}\)
All we have to do is solve this equation:
\(a_n=n-1+\frac{\sum_{k=0}^{n-1}(a_k+a_{n-1-k})}{n}
\\=n-1+\frac{\sum_{k=0}^{n-1}a_k+\sum_{k=0}^{n-1}a_{n-1-k}}{n}
\\=n-1+\frac{2\sum_{k=0}^{n-1}a_k}{n}
\\\Rightarrow a_{n-1}=n-2+\frac{2\sum_{k=0}^{n-2}a_k}{n-1}\)
From here we get:
\(na_n=n(n-1)+2\sum_{k=0}^{n-1}a_k
\\(n-1)a_{n-1}=(n-1)(n-2)+2\sum_{k=0}^{n-2}a_k\)
Subtracting the two gives us:
\(na_n-(n-1)a_{n-1}=n(n-1)-(n-1)(n-2)+2a_{n-1}
\\\Rightarrow na_n=(n-(n-2))(n-1)+2a_{n-1}+(n-1)a_{n-1}
\\\Rightarrow na_n=2(n-1)+(n+1)a_{n-1}
\\\Rightarrow \frac{a_n}{n+1}=2\frac{n-1}{n(n+1)}+\frac{a_{n-1}}{n}
\\=2\frac{2n-(n+1)}{n(n+1)}+\frac{a_{n-1}}{n}
\\=2(\frac{2}{n+1}-\frac{1}{n})+\frac{a_{n-1}}{n}\)
Let \(b_n=\frac{a_n}{n+1}\), hence:
\(b_n=2(\frac{2}{n+1}-\frac{1}{n})+b_{n-1}
\\=2(\frac{2}{n+1}-\frac{1}{n})+2(\frac{2}{n}-\frac{1}{n-1})+b_{n-2}
\\=2(\frac{2}{n+1}-\frac{1}{n})+2(\frac{2}{n}-\frac{1}{n-1})+2(\frac{2}{n-1}-\frac{1}{n-2})+b_{n-3}
\\=\ldots
\\=2\sum_{k=1}^{n}(\frac{2}{k+1}-\frac{1}{k})
\\=4\sum_{k=1}^{n}\frac{1}{k+1}-2\sum_{k=1}^{n}\frac{1}{k}\)
Until now, my proof is the same as the original, but now I choose a different direction.
I will try so simplify the proof as much as I can.
\(b_n=4\sum_{k=1}^{n}\frac{1}{k+1}-2\sum_{k=1}^{n}\frac{1}{k}
\\=4-4+4\sum_{k=2}^{n+1}\frac{1}{k}-2\sum_{k=1}^{n}\frac{1}{k}
\\=-4+4\times\frac{1}{n+1}+4\sum_{k=1}^{n}\frac{1}{k}-2\sum_{k=1}^{n}\frac{1}{k}
\\=\frac{4}{n+1}-4+2\sum_{k=1}^{n}\frac{1}{k}
\\\approx2\sum_{k=1}^{n}\frac{1}{k}
\\\approx2\int_1^n\frac{1}{k}\,\mathrm{d}k
\\=2(\log n-\log 1)
\\=2\log n\)
Now all we have to do is find \(a_n\):
\(b_n=\frac{a_n}{n+1}
\\\Rightarrow a_n=(n+1)b_n
\\=2(n+1)\log n\)
Or:
\(a_n=O(n\log n)\)
We haven’t found the exact result, but it is very close to the real value, to prove that Quicksort is \(O(n\log n)\) algorithm.
2 Responses to Quicksort Average Case Time Complexity
Leave a Reply Cancel reply
Categories





thanks man
Can you plz explain the 3 step:
How you write a¬n-1-k=ak
It means that
n-1-k=k
2k=n-1
K=(n-1)/2
As n-1 is the total no. of element, k is the mid of array
And that is the best case condition????