
The rules are simple:

- Copy-paste the rules.
- List your favorite tools, libraries, and services for web development.
- List obvious ones as well, they may not be obvious to others.
- Tag other web dev bloggers and let them know.
- Link back to the post that tagged you.

Below are my favorite development tools:

- **H5BP** (HTML5 Boilerplate) – An awesome front-end base template; provides the tools to build a web application with modern technologies.
- **Twitter Bootstrap** – An amazing set of HTML, CSS and JavaScript for rich UI, easily customizable for your project.
- **Initializr** – Allows you to easily create your favorite mixture of H5BP, Bootstrap and some other libraries and services.
- **jQuery** – This one is pretty obvious; dramatically simplifies HTML and CSS manipulation, Ajax interactions, event handling, animations, etc.
- **LESS** – An extension of CSS which adds variables, mixins, operators and some more valuable tools. It compiles to normal CSS using a JavaScript compiler.
- **Lithium** – My favorite PHP framework, based on PHP 5.3; extensive yet lightweight and well tested.
- **MDN** (Mozilla Developer Network) – The best documentation available for web technologies; I mainly use it for JavaScript.
- **Node.js** – JavaScript on the server side, extremely fast and scalable, just check it out!
- **R2** – A CSS LTR <=> RTL converter written in JavaScript. It actually works!
- **Humans.txt** – This isn’t really a development tool, but a nice way to leave your mark on your recent project.

Yehuda Katz, Chris Cornutt, Gonzalo Ayuso, Stoyan Stefanov, Phil Sturgeon

Over to you!


In this article I would like to find some approximations concerning collisions.

Let \(\mathbf{M}\) be the size of the range (i.e. for md5 it is \(2^{128}\) since md5 returns 128 bits).

Let \(\mathbf{n}\) be the number of random values in this range.

We shall calculate \(p\), the probability for at least one collision.

First, we calculate \(\overline{p}\), the probability for no collisions:

The first value can be any one of the \(\mathbf{M}\) values, the second value can be any one of \(\mathbf{M}-1\) different values, and so on.

Therefore, \(\overline{p}=\frac{\mathbf{M}}{\mathbf{M}}\frac{\mathbf{M}-1}{\mathbf{M}}…\frac{\mathbf{M}-(\mathbf{n}-1)}{\mathbf{M}}=\prod_{k=0}^{\mathbf{n}-1}\frac{\mathbf{M}-k}{\mathbf{M}}=\prod_{k=0}^{\mathbf{n}-1}(1-\frac{k}{\mathbf{M}})\).

Since \(p=1-\overline{p}\) it is clear that:

\[(1) p=1-\prod_{k=0}^{\mathbf{n}-1}(1-\frac{k}{\mathbf{M}})\]

Recall that \(e^x=\sum_{k=0}^{\infty}\frac{x^k}{k!}\approx \frac{x^0}{0!}+\frac{x^1}{1!}=1+x\), substituting \(x=-\frac{k}{\mathbf{M}}\) gives \(1-\frac{k}{\mathbf{M}}\approx e^{-\frac{k}{\mathbf{M}}}\).

Now, from \((1)\):

\(p\approx 1-\prod_{k=0}^{\mathbf{n}-1}e^{-\frac{k}{\mathbf{M}}}=1-e^{\sum_{k=0}^{\mathbf{n}-1}{-\frac{k}{\mathbf{M}}}}=1-e^{-\frac{1}{\mathbf{M}} \sum_{k=0}^{\mathbf{n}-1}k}=1-e^{-\frac{\mathbf{n}(\mathbf{n}-1)}{2\mathbf{M}}}\)

Thus:

\[(2) p\approx 1-e^{-\frac{\mathbf{n}(\mathbf{n}-1)}{2\mathbf{M}}}\]
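Formulas \((1)\) and \((2)\) are easy to compare numerically; here is a quick Python sketch (mine, not part of the original post):

```python
import math

def p_exact(n, M):
    """Formula (1): exact probability of at least one collision."""
    prod = 1.0
    for k in range(n):
        prod *= 1 - k / M
    return 1 - prod

def p_approx(n, M):
    """Formula (2): exponential approximation."""
    return 1 - math.exp(-n * (n - 1) / (2 * M))

print(p_exact(23, 365))   # ≈ 0.507297
print(p_approx(23, 365))  # ≈ 0.500002
```

Even for a range as small as \(\mathbf{M}=365\) the approximation is already quite close to the exact value.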

Under the condition \(1\ll \mathbf{n}^2\ll 2\mathbf{M}\) an even simpler formula emerges.

Again, \(e^x\approx 1+x\); substituting \(x=-\frac{\mathbf{n}^2}{2\mathbf{M}}\) gives \(1-\frac{\mathbf{n}^2}{2\mathbf{M}}\approx e^{-\frac{\mathbf{n}^2}{2\mathbf{M}}}\).

From \((2)\):

\(p\approx 1-e^{-\frac{\mathbf{n}(\mathbf{n}-1)}{2\mathbf{M}}}\approx 1-e^{-\frac{\mathbf{n}^2}{2\mathbf{M}}}\approx 1-(1-\frac{\mathbf{n}^2}{2\mathbf{M}})=\frac{\mathbf{n}^2}{2\mathbf{M}}\)

Therefore:

\[(3) p\approx \frac{\mathbf{n}^2}{2\mathbf{M}}\]

Let \(\mathbf{M}\) be the size of the range (i.e. for md5 it is \(2^{128}\) since md5 returns 128 bits).

Let \(\mathbf{p}\) be the desired probability for at least one collision.

We shall calculate \(n\), the number of values such that the probability for a collision is \(p\).

From \((2)\):

\[\displaylines{\mathbf{p}\approx 1-e^{-\frac{n(n-1)}{2\mathbf{M}}}
\\
1-\mathbf{p}\approx e^{-\frac{n(n-1)}{2\mathbf{M}}}
\\
\log(1-\mathbf{p})\approx-\frac{n(n-1)}{2\mathbf{M}}
\\
n(n-1)\approx-2\mathbf{M}\log(1-\mathbf{p})=2\mathbf{M}\log \frac{1}{1-\mathbf{p}}
\\
n^2-n-2\mathbf{M}\log \frac{1}{1-\mathbf{p}}\approx0
\\
n\approx\frac{1+\sqrt{1+8\mathbf{M}\log \frac{1}{1-\mathbf{p}}}}{2}=\frac{1}{2}+\sqrt{\frac{1}{4}+2\mathbf{M}\log \frac{1}{1-\mathbf{p}}}}\]

Hence:

\[(4) n\approx \frac{1}{2}+\sqrt{\frac{1}{4}+2\mathbf{M}\log \frac{1}{1-\mathbf{p}}}\]

If \(\mathbf{M}\mathbf{p}\gg 1\) the constants are negligible:

\[(5) n\approx\sqrt{2\mathbf{M}\log \frac{1}{1-\mathbf{p}}}\]

If in addition \(\mathbf{p}\ll 1\), formula \((3)\) implies (can be demonstrated using \((5)\) too):

\[(6) n\approx \sqrt{2\mathbf{M}\mathbf{p}}\]
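A quick Python check (illustrative, not from the original post) that formulas \((4)\), \((5)\) and \((6)\) agree when their conditions hold:

```python
import math

def n4(p, M):
    """Formula (4): full quadratic solution."""
    return 0.5 + math.sqrt(0.25 + 2 * M * math.log(1 / (1 - p)))

def n5(p, M):
    """Formula (5), valid when M*p >> 1."""
    return math.sqrt(2 * M * math.log(1 / (1 - p)))

def n6(p, M):
    """Formula (6), valid when M*p >> 1 and p << 1."""
    return math.sqrt(2 * M * p)

M = 2 ** 64
print(n4(0.01, M), n5(0.01, M), n6(0.01, M))  # all three agree closely
```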

The birthday problem (often referred to as the birthday paradox) states that if \(23\) people are randomly selected, the probability that two of them share a birthday is higher than \(\frac{1}{2}\).

In this case \(\mathbf{M}=365\), \(\mathbf{n}=23\) and \(\mathbf{p}=\frac{1}{2}\).

Using formula \((1)\):

\(p=1-\prod_{k=0}^{22}(1-\frac{k}{365})=0.507297\)

Luckily, formula \( (2)\) produces a similar result:

\(p\approx 1-e^{-\frac{23\times 22}{2\times 365}}=1-e^{-\frac{253}{365}}=0.500002\)

The inverse can be calculated using formula \( (4)\):

\(n\approx \frac{1}{2}+\sqrt{\frac{1}{4}+2\times 365\log \frac{1}{1-\frac{1}{2}}}=\frac{1}{2}+\sqrt{\frac{1}{4}+730\log 2}=22.9999\)

Formulas \( (3)\), \( (5)\) and \( (6)\) are inappropriate in this case (however their results are not too far from the actual results).

In this case \(\mathbf{M}=2^{128}\) as I have mentioned in the beginning.

Using formula \((5)\) (\(\mathbf{M}\) is very large) we can find \(n\) given \(\mathbf{p}\):

\(\mathbf{p}=\frac{1}{1000}\): \(n\approx\sqrt{2\times 2^{128} \log \frac{1}{1-\frac{1}{1000}}}=8.25170\times 10^{17}\)

\(\mathbf{p}=\frac{1}{100}\): \(n\approx\sqrt{2\times 2^{128} \log \frac{1}{1-\frac{1}{100}}}=2.61532\times 10^{18}\)

\(\mathbf{p}=\frac{1}{10}\): \(n\approx\sqrt{2\times 2^{128} \log \frac{1}{1-\frac{1}{10}}}=8.4678\times 10^{18}\)

\(\mathbf{p}=\frac{1}{2}\): \(n\approx\sqrt{2\times 2^{128} \log \frac{1}{1-\frac{1}{2}}}=2.17194\times 10^{19}\)

\(\mathbf{p}=\frac{3}{4}\): \(n\approx\sqrt{2\times 2^{128} \log \frac{1}{1-\frac{3}{4}}}=3.07158\times 10^{19}\)

\(\mathbf{p}=\frac{99}{100}\): \(n\approx\sqrt{2\times 2^{128} \log \frac{1}{1-\frac{99}{100}}}=5.59832\times 10^{19}\)

For \(\mathbf{p}=\frac{1}{1000}, \frac{1}{100}, \frac{1}{10}\) formula \((6)\) can be used (higher values give bad results):

\(\mathbf{p}=\frac{1}{1000}\): \(n\approx\sqrt{2\times 2^{128} \frac{1}{1000}}=8.24963\times 10^{17}\)

\(\mathbf{p}=\frac{1}{100}\): \(n\approx\sqrt{2\times 2^{128} \frac{1}{100}}=2.60876\times 10^{18}\)

\(\mathbf{p}=\frac{1}{10}\): \(n\approx\sqrt{2\times 2^{128} \frac{1}{10}}=8.24963\times 10^{18}\)

These results are consistent with the previous results.
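These md5 numbers are straightforward to reproduce; a minimal Python sketch (mine, not the original post's):

```python
import math

M = 2 ** 128  # size of the md5 output space

def n_for_p(p):
    """Formula (5): number of values needed for collision probability p."""
    return math.sqrt(2 * M * math.log(1 / (1 - p)))

print(n_for_p(0.5))    # ≈ 2.17194e19
print(n_for_p(0.001))  # ≈ 8.25170e17
```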

We have obtained some formulas that relate \(\mathbf{M}\), \(\mathbf{n}\) and \(\mathbf{p}\):

\[\displaylines{(1)\ p=1-\prod_{k=0}^{\mathbf{n}-1}(1-\frac{k}{\mathbf{M}})
\\
(2)\ p\approx 1-e^{-\frac{\mathbf{n}(\mathbf{n}-1)}{2\mathbf{M}}}
\\
(3)\ p\approx \frac{\mathbf{n}^2}{2\mathbf{M}}\quad(1\ll \mathbf{n}^2\ll 2\mathbf{M})
\\
(4)\ n\approx \frac{1}{2}+\sqrt{\frac{1}{4}+2\mathbf{M}\log \frac{1}{1-\mathbf{p}}}
\\
(5)\ n\approx\sqrt{2\mathbf{M}\log \frac{1}{1-\mathbf{p}}}\quad(\mathbf{M}\mathbf{p}\gg 1)
\\
(6)\ n\approx \sqrt{2\mathbf{M}\mathbf{p}}\quad(\mathbf{M}\mathbf{p}\gg 1,\ \mathbf{p}\ll 1)}\]


This algorithm can be implemented with different data-structures, and its complexity is \(O(n)\) or \(O(k\log k)\) for two specific implementations.

We need a data-structure that can contain integers.

It must have two methods: add (adds an integer) and contains (checks whether an integer is present or not).

We also need to be able to iterate over it; I will use a foreach loop for that.

Let \(A\) be the original array, with elements of type \(T\).

Let \(k\) be the number of elements we want in the sub array.

Let \(n=length(A)\).

Let \(D\) be the data-structure.

- Let \(B=T[k]\) (an array of type \(T\) with \(k\) elements)
- Let \(flag=(k\le n / 2)\)
- Let \(stop=(flag ? k : n-k)\)
- \(for(i=0;i\lt stop;i=i+1)\)
  - \(while(true)\)
    - Let \(key=random(0,n-1)\)
    - \(if(!D.contains(key))\)
      - \(D.add(key)\)
      - \(break\)
- \(if(flag)\)
  - Let \(p=0\)
  - \(foreach(key\space in\space D)\)
    - \(B[p]=A[key]\)
    - \(p=p+1\)
- \(else\)
  - Let \(keys=bool[n]\)
  - \(foreach(key\space in\space D)\)
    - \(keys[key]=true\)
  - Let \(p=0\)
  - \(for(i=0;i\lt k;i=i+1)\)
    - \(while(keys[p])\)
      - \(p=p+1\)
    - \(B[i]=A[p]\)
    - \(p=p+1\)
- \(return\space B\)

I will calculate each stage’s complexity separately, and then sum them up.

But first:

Let \(T_c(j)\) be the complexity of \(D.contains\) when there are \(j\) elements in it.

Let \(T_a(j)\) be the complexity of \(D.add\) when there are \(j\) elements in it.

Let \(T_i(j)\) be the complexity of iterating the data-structure’s elements when there are \(j\) elements in it.

Stage 1 is \(O(k)\) or \(O(1)\) depending on the language's implementation of array declarations; we will use \(O(k)\).

Stage 2 and 3 are clearly \(O(1)\).

Stage 4 is more complex to calculate and we can only calculate it for the average case.

In each of the iterations in the for-loop we are looking for a number between \(0\) and \(n-1\) that is not already in \(D\).

Because there are already \(i\) elements in the data-structure, the distribution of the number of while-loop iterations is a geometric distribution with probability \(\frac{n-i}{n}\), therefore the mean number of iterations is \(\frac{n}{n-i}\).

So this stage’s complexity is:

\[\displaylines{O(\sum_{i=0}^{min(k,n-k)-1}(T_c(i)\frac{n}{n-i}+T_a(i)))
\\=O(\sum_{i=0}^{min(k,n-k)-1}(T_c(i)\frac{n}{n/2}+T_a(i)))
\\=O(\sum_{i=0}^{min(k,n-k)-1}(T_c(i)+T_a(i)))}\]

Stage 5 is simply \(O(T_i(k))\).

Stage 6 is also quite simple.

There is an array declaration (\(O(n)\) as mentioned before), an iteration over the data-structure (\(O(T_i(n-k))\)) and an iteration over the arrays (\(O(n)\)).

Summing up to \(O(T_i(n-k)+n)\).

But we know that \(k\ge n/2\), and therefore it is also \(O(T_i(n-k)+k)\).

Bear in mind that only one of stages 5 and 6 occurs, hence the total complexity of these stages is \(O(T_i(min(k,n-k))+k)\).

Stage 7 is also \(O(1)\).

Summing stages 1 to 7 gives us:

\[O(\sum_{i=0}^{min(k,n-k)-1}(T_c(i)+T_a(i))+T_i(min(k,n-k))+k)\]

This implementation was introduced in my previous article, Generating Random Sub Array.

In this implementation the data-structure acts the same as the array \(keys\) that is used in stage 6.1.

In this case \(T_c(j)=1\), \(T_a(j)=1\) and \(T_i(j)=n\).

With that in mind the algorithm's complexity can be found easily: \(O(\sum_{i=0}^{min(k,n-k)-1}(1+1)+n+k)=O(n)\)

In this implementation our data-structure is a binary search tree.

This implementation is much better than the one mentioned above when \(k\ll n\).

In this case \(T_c(j)=\log j\), \(T_a(j)=\log j\) and \(T_i(j)=j\).

We can again easily find the total time complexity.

For simplicity I will assume that \(k\le n/2\).

\(O(\sum_{i=0}^{k-1}(\log i+\log i)+k+k)=O(k\log k)\)
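A Python sketch of this variant (illustrative only: Python's built-in set gives expected \(O(1)\) add/contains rather than the BST's \(O(\log k)\), and `sorted(D)` stands in for the tree's in-order traversal, but the structure of the algorithm is the same):

```python
import random

def random_sub_array(A, k):
    """Order-preserving random k-element sub-array of A.

    A set plays the role of the data-structure D; set.add is idempotent,
    so the add-until-new rejection loop becomes a single while condition.
    """
    n = len(A)
    flag = k <= n / 2
    stop = k if flag else n - k
    D = set()
    while len(D) < stop:
        D.add(random.randrange(n))   # rejection sampling of distinct indices
    if flag:
        picked = sorted(D)           # in-order traversal of the chosen keys
    else:
        picked = [i for i in range(n) if i not in D]  # complement of D
    return [A[i] for i in picked]

print(random_sub_array(list(range(10)), 4))
```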


I wanted to choose a random sub array of size \(k\) from an array of size \(n\), while maintaining its order.

For example, if we have the array \(\{0,1,2,3,4,5,6,7,8,9\}\) a possible result for \(k=4\) is the array \(\{1,4,7,9\}\).

I came up with an algorithm with average case time complexity \(O(n)\).

Let \(A\) be the original array, with elements of type \(T\).

Let \(k\) be the number of elements we want in the sub array.

Let \(n=length(A)\).

- Let \(B=T[k]\) (an array of type \(T\) with \(k\) elements)
- Let \(keys=bool[n]\) (an array with \(n\) boolean elements)
- Let \(flag=(k\le n / 2)\)
- Let \(stop=(flag ? k : n-k)\)
- \(for(i=0;i\lt stop;i=i+1)\)
  - \(while(true)\)
    - Let \(key=random(0,n-1)\)
    - \(if(keys[key]=false)\)
      - \(keys[key]=true\)
      - \(break\)
- Let \(p=0\)
- \(for(i=0;i\lt k;i=i+1)\)
  - \(while(keys[p] \oplus flag)\) (where \(\oplus\) is xor)
    - \(p=p+1\)
  - \(B[i]=A[p]\)
  - \(p=p+1\)
- \(return\space B\)

Stages 1 and 2 depend on the language's implementation of array declaration.

However, they are at most \(O(n)\).

Stages 3, 4 and 6 are clearly \(O(1)\).

Let’s take a look at stage 7.

Since \(p\) can never be bigger than \(n\), \(i\) can never be bigger than \(k\), and each iteration is \(O(1)\), this stage is \(O(n+k)=O(n)\).

Now, let’s consider stage 5.

Within each step of the loop we are trying to find a number between \(0\) and \(n-1\) which wasn’t picked earlier.

The distribution of the number of guesses it takes to find such a number, is a geometric distribution with probability \(\frac{n-i}{n}\), hence the mean is \(\frac{n}{n-i}\).

We know that \(i\) goes from \(0\) to \(min(k,n-k)-1\) which is less than \(n/2\).

Each step in the while loop is \(O(1)\).

Therefore the worst case complexity, on average, is \(O(\sum_{i=0}^{n/2-1}\frac{n}{n-i})=O(\sum_{i=0}^{n/2-1}\frac{n}{n/2})=O((n/2)\frac{n}{n/2})=O(n)\).

Summing all of the stages together, we can see that the algorithm has an average case time complexity of \(O(n)\).

I wrote a simple implementation of the algorithm in C#:

```csharp
public static T[] RandomSubArray<T>(T[] array, int k)
{
    int n = array.Length;
    T[] ret = new T[k];
    bool[] keys = new bool[n];
    Random rand = new Random();
    bool flag = k <= n / 2;
    for (int i = 0, stop = flag ? k : n - k; i < stop; i++)
    {
        while (true)
        {
            int key = rand.Next(0, n);
            if (!keys[key])
            {
                keys[key] = true;
                break;
            }
        }
    }
    int p = 0;
    for (int i = 0; i < k; i++)
    {
        while (keys[p] ^ flag)
        {
            p++;
        }
        ret[i] = array[p++];
    }
    return ret;
}
```


Today I would like to discuss the entropy of a phrase based password.

A phrase based password is a password assembled from several easy to remember and spell words, delimited by spaces.

As a result, such passwords are very easily remembered.

To produce such a password, one must have a dictionary of words.

Each time a user asks for a password, the system randomly chooses a few words to generate the password.

Let \(N\) be the size of our dictionary, and let \(L\) be the number of words in the password, therefore there are \(T = {N \choose L}L!\) different possible passwords.

Assuming \(N \gg L\) we can approximate this number by \(T \approx N^L\).

As we know the entropy \(H\) is given by \(H = \log_2{T}\), thus \(H \approx \log_2{N^L} = L \log_2{N}\), which is identical to the entropy of a password of length \(L\) under an alphabet with \(N\) different symbols.

Let’s assume that \(N = 10,000\) (i.e. we have 10,000 unique words in our dictionary) and \(L = 5\); the entropy of a password under these conditions is \(H \approx 5 \log_2{10,000} \approx 66.44\text{ bits}\).

Unfortunately, when \(N = 3,000\) and \(L = 4\) the entropy is much lower: \(H \approx 4 \log_2{3,000} \approx 46.2\text{ bits}\)
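The calculation is a one-liner; a Python illustration of the two examples above (mine, not part of the original post):

```python
import math

def phrase_entropy(N, L):
    """Entropy in bits of an L-word phrase from an N-word dictionary (N >> L)."""
    return L * math.log2(N)

print(phrase_entropy(10_000, 5))  # ≈ 66.44
print(phrase_entropy(3_000, 4))   # ≈ 46.20
```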

Phrase based passwords are easy to remember; hence they are great in terms of ease of use.

On the other hand, in order for this method to be reliable, the dictionary has to be quite large, and each password must contain at least 4 or 5 words.


The entropy is given by \(H = L \log_2{N}\) where \(L\) is the length of the password and \(N\) is the size of the alphabet, and it is usually measured in bits.

The entropy measures the number of bits it would take to represent every password of length \(L\) under an alphabet with \(N\) different symbols.

For example, a password of 7 lower-case characters (such as: *example*, *polmnni*, etc.) has an entropy of \(H = 7 \log_2{26} \approx 32.9\text{ bits}\).

A password of 10 alpha-numeric characters (such as: *P4ssw0Rd97*, *K5lb42eQa2*) has an entropy of \(H = 10 \log_2{62} \approx 59.54\text{ bits}\).
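The examples above can be reproduced with a small Python sketch (an illustration, not the post's JavaScript calculator):

```python
import math

def password_entropy(length, alphabet):
    """H = L * log2(N): L-character password over an N-symbol alphabet."""
    return length * math.log2(alphabet)

print(password_entropy(7, 26))   # ≈ 32.9  (lower-case letters)
print(password_entropy(10, 62))  # ≈ 59.54 (alpha-numeric)
```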

Entropy makes it easy to compare password strengths: higher entropy means a stronger password (in terms of resistance to brute force attacks).

An interesting fact is that a password that is usually considered strong, such as *f#Mo1e)*TjC8* (entropy \(H = 12 \log_2{72} \approx 74.04\text{ bits}\)), usually has lower entropy than a password assembled from several words delimited by spaces, such as *carrot ways base split* (entropy \(H = 22 \log_2{27} \approx 104.61\text{ bits}\)).

This fact was demonstrated wonderfully by Randall Munroe in the following picture (although I believe his entropy calculation was different than mine):

I wrote a simple entropy calculator in JavaScript, you can use it online here:


Calculator source: http://blog.shay.co/files/entropy.js.


When I first saw the algorithm I was also told that on average it has \(O(n\log n)\) time complexity, however I never saw a proof until today.

Today I read this article and found a way to simplify their proof.

Let \(a_n\) be the number of comparisons used on average in Quicksort, on an array with \(n\) elements. (\(a_0=a_1=0\))

The pivot has to be compared \(n-1\) times, with each one of the elements.

Then, suppose the left part (the elements smaller than the pivot) has \(k\) elements, thus the right part (the elements bigger than the pivot) has \(n-1-k\) elements.

We have to recursively sort them, which means another \(a_k+a_{n-1-k}\) comparisons.

We are looking for the average of the different possibilities, where \(0\le k\le n-1\), which we can express by \(\frac{\sum_{k=0}^{n-1}(a_k+a_{n-1-k})}{n}\).

Therefore:

\(a_n=n-1+\frac{\sum_{k=0}^{n-1}(a_k+a_{n-1-k})}{n}\)

All we have to do is solve this equation:

\(a_n=n-1+\frac{\sum_{k=0}^{n-1}(a_k+a_{n-1-k})}{n}
\\=n-1+\frac{\sum_{k=0}^{n-1}a_k+\sum_{k=0}^{n-1}a_{n-1-k}}{n}
\\=n-1+\frac{2\sum_{k=0}^{n-1}a_k}{n}
\\\Rightarrow a_{n-1}=n-2+\frac{2\sum_{k=0}^{n-2}a_k}{n-1}\)

From here we get:

\(na_n=n(n-1)+2\sum_{k=0}^{n-1}a_k
\\(n-1)a_{n-1}=(n-1)(n-2)+2\sum_{k=0}^{n-2}a_k\)

Subtracting the two gives us:

\(na_n-(n-1)a_{n-1}=n(n-1)-(n-1)(n-2)+2a_{n-1}
\\\Rightarrow na_n=(n-(n-2))(n-1)+2a_{n-1}+(n-1)a_{n-1}
\\\Rightarrow na_n=2(n-1)+(n+1)a_{n-1}
\\\Rightarrow \frac{a_n}{n+1}=2\frac{n-1}{n(n+1)}+\frac{a_{n-1}}{n}
\\=2\frac{2n-(n+1)}{n(n+1)}+\frac{a_{n-1}}{n}
\\=2(\frac{2}{n+1}-\frac{1}{n})+\frac{a_{n-1}}{n}\)

Let \(b_n=\frac{a_n}{n+1}\), hence:

\(b_n=2(\frac{2}{n+1}-\frac{1}{n})+b_{n-1}
\\=2(\frac{2}{n+1}-\frac{1}{n})+2(\frac{2}{n}-\frac{1}{n-1})+b_{n-2}
\\=2(\frac{2}{n+1}-\frac{1}{n})+2(\frac{2}{n}-\frac{1}{n-1})+2(\frac{2}{n-1}-\frac{1}{n-2})+b_{n-3}
\\=\ldots
\\=2\sum_{k=1}^{n}(\frac{2}{k+1}-\frac{1}{k})
\\=4\sum_{k=1}^{n}\frac{1}{k+1}-2\sum_{k=1}^{n}\frac{1}{k}\)

Until now, my proof is the same as the original, but now I choose a different direction.

I will try to simplify the proof as much as I can.

\(b_n=4\sum_{k=1}^{n}\frac{1}{k+1}-2\sum_{k=1}^{n}\frac{1}{k}
\\=4-4+4\sum_{k=2}^{n+1}\frac{1}{k}-2\sum_{k=1}^{n}\frac{1}{k}
\\=-4+4\times\frac{1}{n+1}+4\sum_{k=1}^{n}\frac{1}{k}-2\sum_{k=1}^{n}\frac{1}{k}
\\=\frac{4}{n+1}-4+2\sum_{k=1}^{n}\frac{1}{k}
\\\approx2\sum_{k=1}^{n}\frac{1}{k}
\\\approx2\int_1^n\frac{1}{k}\,\mathrm{d}k
\\=2(\log n-\log 1)
\\=2\log n\)

Now all we have to do is find \(a_n\):

\(b_n=\frac{a_n}{n+1}
\\\Rightarrow a_n=(n+1)b_n
\\=2(n+1)\log n\)

Or:

\(a_n=O(n\log n)\)

We haven’t found the exact result, but it is close enough to the real value to prove that Quicksort is an \(O(n\log n)\) algorithm.
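As a sanity check (not part of the original article), the recurrence can be compared against an empirical comparison count; a Python sketch:

```python
import random

def quicksort_comparisons(a):
    """Comparisons made by randomized Quicksort on list a.

    The pivot is compared once with each of the other len(a)-1 elements,
    matching the analysis above (the two list comprehensions together
    realize that single three-way comparison per element)."""
    if len(a) <= 1:
        return 0
    i = random.randrange(len(a))
    pivot, rest = a[i], a[:i] + a[i + 1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

def a_exact(n):
    """a_n computed directly from the recurrence n*a_n = 2(n-1) + (n+1)*a_{n-1}."""
    a = 0.0
    for m in range(2, n + 1):
        a = (2 * (m - 1) + (m + 1) * a) / m
    return a

n, trials = 200, 2000
avg = sum(quicksort_comparisons(list(range(n))) for _ in range(trials)) / trials
print(avg, a_exact(n))  # the empirical mean lands within a few percent of a_n
```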


The method itself is very simple: you give an initial guess of the root, \(x_0\), and calculate further values using the following equation: \(x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}\).

Generally, each term in the sequence is more accurate in comparison to the previous terms.

Let \(f(x)\) be the function whose root we are looking for.

Let \(x_0\) be our initial guess for the root.

Let \(x_n\) be our latest guess for the root.

In each step we take our best guess, \(x_n\), and try to make it more accurate.

To do so, we draw the tangent in the point \((x_n, f(x_n))\) and find its root, which we then call \(x_{n+1}\).

\(x_{n+1}\) will be more accurate (usually).


The math behind this is not very difficult.

The equation of the tangent line is: \(y-f(x_n)=f'(x_n)(x-x_n)\).

We look for the point where \(y=0\), that is \(x=x_{n+1}\), hence

\(0-f(x_n)=f'(x_n)(x_{n+1}-x_n)
\\\Rightarrow -f(x_n)=x_{n+1}f'(x_n)-x_nf'(x_n)
\\\Rightarrow x_{n+1}f'(x_n)=x_nf'(x_n)-f(x_n)
\\\Rightarrow x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}\)

Say we want to calculate the square root of \(a\).

We need to find a value of \(x\) such that \(x=\sqrt{a}\Rightarrow x^2=a\Rightarrow x^2-a=0\).

So, we are looking for the non-negative root (\(x=\sqrt{a}\ge0\)) of the function \(f(x)=x^2-a\).

The derivative of this function is \(f'(x)=2x\).

And therefore:

\(x_{n+1}=x_n-\frac{x_n^2-a}{2x_n}=\frac{2x_n^2-x_n^2+a}{2x_n}=\frac{x_n^2+a}{2x_n}=\frac{x_n}{2}+\frac{a}{2x_n}\)

Let’s try and find an approximation for \(\sqrt{2}\) with initial guess \(x_0=1\).

\(x_1=\frac{1}{2}+\frac{2}{2\times1}=\frac{3}{2}
\\ x_2=\frac{3}{2\times2}+\frac{2\times2}{2\times3}=\frac{17}{12}
\\ x_3=\frac{17}{2\times12}+\frac{2\times12}{2\times17}=\frac{577}{408}
\\ x_4=\frac{577}{2\times408}+\frac{2\times408}{2\times577}=\frac{665857}{470832}\approx1.414213562\)

The error from the real value is lower than \(0.0000000000016\)!

We have got this amazing result with only four steps!

Calculating further will produce very accurate massive fractions:

\(|x_5-\sqrt{2}|<9\times10^{-25}
\\ |x_6-\sqrt{2}|<2.9\times10^{-49}
\\ |x_7-\sqrt{2}|<2.9\times10^{-98}\)
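These fractions can be reproduced exactly with rational arithmetic; a quick Python check (mine, not part of the original post):

```python
from fractions import Fraction

# Exact iteration of x_{n+1} = x_n/2 + a/(2*x_n) for a = 2, starting at x_0 = 1.
a = 2
x = Fraction(1)
steps = [x]
for _ in range(4):
    x = x / 2 + a / (2 * x)
    steps.append(x)

print(steps[1], steps[2], steps[3])  # 3/2 17/12 577/408
print(steps[4])                      # 665857/470832
```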

The C# implementation is super easy.

The parameters are \(a\), which is the number whose square root we are looking for, and \(times\), which determines the number of calculations.

```csharp
public double Sqrt(double a, int times)
{
    if (a < 0)
        throw new Exception("Can not sqrt a negative number");
    double x = 1;
    while (times-- > 0)
        x = x / 2 + a / (2 * x);
    return x;
}
```

Newton’s method is a very efficient way to find roots of functions, including square roots of numbers.

It is much more efficient than the method I introduced in my article Square Root Algorithm.

You can read more about Newton’s method in Wikipedia.

There is more great information at MathWorld.


This article will only cover connections to a MySQL database.

PDO is an abstraction layer for database connections in PHP, and it became increasingly popular in the past few years.

PDO gives us the option to use a persistent connection.

If we don’t use this option, a new connection is created for each request.

If we do use this option, the connection is not closed at the end of the script, and it is then re-used by other script requests.

Connection to the database using this option is not very different from a regular connection:

```php
// Regular connection
$pdo = new PDO('mysql:dbname=database;host=127.0.0.1', 'username', 'password');

// Persistent connection
$pdo = new PDO('mysql:dbname=database;host=127.0.0.1', 'username', 'password',
    array(PDO::ATTR_PERSISTENT => true));
```

All of the tests ran on my PC using Zend Server.

CPU: Intel E8400

RAM: 4GB DDR2 800MHz

I used a MySQL database with root permissions, with an empty database named pdotest.

In order to test the memory usage I wrote two simple scripts, which establish a connection and print the memory difference between the time before the connection was created to the time after it.

The results indicate that the persistent connection consumed 18.5 times less memory than the non-persistent connection.

The non-persistent connection used 6232 bytes while the persistent connection used 336 bytes.

In order to test the speed of the connection I used Apache’s ab tool.

I tested both methods with 1, 2, 3, 5, 10, 50 and 100 concurrent users, for 10 seconds every time.

Test results show that the persistent connection handled 1.5 to 6.3 times more requests per second than the non-persistent version.

The results are clear: the persistent connection uses less memory and runs faster.

You can download the test files and the full results here.


I should mention that this algorithm probably already exists.

First of all, my algorithm can only find roots between a specific range (for example \(0\) to \(1\)), thus we need to decrease the number whose root we are looking for to our range.

To do so we use a simple math identity: \(\sqrt{p^2x}=p\sqrt{x}\).

All we have to do is pick a number \(p\), divide \(x\) by \(p^2\) until its square root is in our range, and remember the number of times we repeated it – \(q\). When we have the result we need to multiply it by \(p^q\).

For example, if the range is \(0\) to \(1\), \(x=289\) and \(p=2\), we have to divide \(289\) by \(p^2=4\) \(q=5\) times to get \(0.2822265625\). Then, we use the algorithm I will explain later to find the root which is \(0.53125\). Now we can return \(0.53125*2^5=17\).

And indeed \(\sqrt{289}=17\).
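A quick numeric check of this worked example (a Python illustration, not part of the original post):

```python
# Reduce 289 into [0, 1] by repeated division by p^2 = 4, tracking q.
x, p, q = 289, 2, 0
while x > 1:
    x /= p * p
    q += 1

print(x, q)              # 0.2822265625 5
print(0.53125 ** 2)      # 0.2822265625  (0.53125 is the root found later)
print(0.53125 * p ** q)  # 17.0
```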

The algorithm is very simple and uses the same concept of binary search; it works by minimizing the possible range of the square root.

I use the algorithm in the range \(0\) to \(1\) but it can be used with any range.

We know that \(x\) is between \(0\) and \(1\), therefore its square root’s lower bound is \(a=0\) and upper bound is \(b=1\).

The next step is calculating the average of the bounds \(t=(a+b)/2\).

If \(t^2=x\) we return \(t\), if \(t^2<x\) all of the numbers between \(a\) and \(t\) are not the square root, hence \(a=t\). Similarly, if \(t^2>x\) our upper bound, \(b=t\).

We can repeat this step as many times as we want. Each iteration doubles the precision.

If we didn’t find the specific square root when we finish, we should return \((a+b)/2\), as it is the closest we can get to the actual square root.

I will demonstrate the whole algorithm in C#. In the following implementation \(p\) and the number of iterations are parameters.

```csharp
public double Sqrt(double x, int p, int iterations)
{
    if (x < 0)
        throw new Exception("Can not sqrt a negative number");
    long multiplier = 1;
    int p2 = p * p;
    while (x > 1)
    {
        multiplier *= p;
        x /= p2;
    }
    if (x == 1 || x == 0)
        return multiplier * x;
    double a = 0;
    double b = 1;
    for (int i = 0; i < iterations; i++)
    {
        double t = (a + b) / 2;
        if (t * t == x)
            return multiplier * t;
        else if (t * t < x)
            a = t;
        else
            b = t;
    }
    return multiplier * (a + b) / 2;
}
```

Rather than calculating the complexity of \(k\) iterations, I will calculate the complexity for precision of \(1\) to \(n\).

The first part consists of dividing \(x\) by \(p^2\) and multiplying the multiplier by \(p\), which is \(O(1)\) for each iteration.

\(x\) is divided \(\log_{p^2} x\) times, hence this part’s complexity is \(O(\log_{p^2} x)\).

Multiplying the result by the multiplier decreases the precision, so we will take it into consideration later. Finding the value of the multiplier is simple: \(p^{\log_{p^2} x}=p^{(\log_p x)/(\log_p p^2)}=\sqrt{p^{\log_p x}}=\sqrt{x}\)

In the main part each iteration has a complexity of \(O(1)\) and it doubles the precision, as I explained before.

Let \(i\) be the minimal number of iterations to get a precision of \(1\) to \(n\).

That means that \(1/n=\sqrt{x}/2^i \Rightarrow 2^i=n \sqrt{x} \Rightarrow i=\log_2 (n \sqrt{x})\).

Therefore the total complexity of the main part is \(O(\log_2 (n \sqrt{x}))\).

Summing these two, the total complexity is

\(O(\log_{p^2} x + \log_2 (n \sqrt{x}))
\\=O((\log_p x)/(\log_p p^2) + \log_2 n + \log_2 \sqrt{x})
\\=O(0.5 \log_p x + \log_2 n + 0.5 \log_2 x)
\\=O(\log_2 x + \log_p x + \log_2 n)\)

We know that \(p \ge 2 \Rightarrow \log_p x \le \log_2 x\), therefore the worst-case complexity is \(O(\log_2 x + \log_2 x + \log_2 n)=O(\log_2 x + \log_2 n)=O(\log_2 nx)\).

I don’t know many square root algorithms, which means that I don’t know other algorithms’ complexities. That said, my algorithm is still very efficient.

Moreover, this method seems pretty intuitive to me; it is easy to understand and easy to implement.

It is also very easy to implement an algorithm that calculates square roots with a constant maximum error.

I hope someone will find this article useful.

**Update 24/05/2011:**

I read about Newton’s method, which is a method used to find the roots of functions. I found out that it is far easier to implement and much more efficient in comparison to my method.