Friday, 28 June 2013

Big-O Notation Part-2

A Plain English Explanation of the Need for Big-O Notation:
When we program, we are trying to solve a problem. What we code is called an algorithm. Big O notation allows us to compare the worse case performance of our algorithms in a standardized way. Hardware specs vary over time and improvements in hardware can reduce the time it takes an algorithms to run. But replacing the hardware does not mean our algorithm is any better or improved over time, as our algorithm is still the same. So in order to allow us to compare different algorithms, to determine if one is better or not, we use Big O notation.
A Plain English Explanation of What Big O Notation is:
Not all algorithms run in the same amount of time, and can vary based on the number of items in the input, which we'll call n. Based on this, we consider the worse case analysis, or an upper-bound of the run-time as n get larger and larger. We must be aware of what n is, because many of the Big O notations reference it.
A Plain English Explanation of How to find the Big O Notation, the General Idea:
Algorithms can do one thing or multiple things. If your algorithm does multiple things, then one particular part may determine the overall Big O notation of the algorithm. The question is when you do multiple things, how much times does it take to do the individual parts, and do they depend on other parts of the algorithm, or are they independent? If they are independent, then you add the Big O notation together and the worse performing notation wins. If the separate parts are dependent upon each other, then they may not be added, and you may have a multiplication relations ship (like nested loops for bubble sort), or some other relationship (e.g. Powers, logs, etc.). Below I show some very simple analysis of some very simple algorithms.
Big-O Analysis of Some Very Simple Algorithms:
Algorithm 1) If you have an algorithm that prints a static statement (e.g. "Hello World"), then this will always run in the same amount of time, no matter how large the input is (because we aren't doing anything with any input). An algorithm that does something like this takes a constant amount of time, which is denoted as O(1) in Big O notation. O(1) is classified as constant time.
Algorithm 2) If you have an algorithm that gets a list of names, then prints hello to each of those people, then your algorithm will take longer. How long it takes is based on how many people you have in the list to say hello to. This type of algorithm takes O(n) in Big O notation. O(n) is classified as linear time.
Note: Linear time algorithms perform worse than constant time algorithms because as n becomes larger and larger, it takes a lot more time to do a linear time algorithm than it takes to do a constant time algorithm.
We are now going to makes some slight modifications to the above algorithms and see how those changes affect the Big O notation.
Algorithm 3) If we modified Algorithm 1, to say both "Hello World" then "Goodbye World", then this algorithm can be seen as doing "multiple" things. The question is how do these things multiple things relate to each other and how do they affect the overall Big O notation. Printing "Hello World" does not depend on the other print statement, and printing "Goodbye World" does not depend on any computation from the first part. So each of these tasks can be seen as independent of each other. From our analysis of Algorithm 1, we know the "hello" part takes O(1) and the goodbye part takes O(1). Both of these parts are constant and since they are independent of each other, we add their separate times together together (e.g. 1+1). Note that our result is also constant (e.g. 2). Since the result is constant, we know Algorithm 3 is O(1), the constant time classification.
Algorithm 4) If we modified Algorithm 2 to say hello to everyone then goodbye to everyone. Then this is similar to Algorithm 3, where the hellos are independent of the goodbyes. So the first part takes O(n)time and the 2nd part takes O(n) time. n+n is 2n, which is linear. So the result of Algorithm 4 is O(n).
Algorithm 5) We create an algorithm that says "hello world" (Algorithm 1) then says hello to everyone in the world list (Algorithm 2). The first part takes O(1) time and the 2nd part takes O(n) time. These 2 parts are independent of each other, so it would be O(1) + O(n) time. Since Big O notation looks at the upper-bound, its a question of which of O(1) and O(n) has the higher upper-bound (or worse performance). From above, we know O(n) takes more time, so O(1)+O(n)=O(n), meaning Algorithm 5 is O(n).

Big O
f(x) = O(g(x)) when x goes to a (for example, a = +∞) means that there is a function k such that:
  1. f(x) = k(x)g(x)
  2. k is bounded in some neighborhood of a (if a = +∞, this means that there are numbers N and M such that for every x > N, |k(x)| < M).
In other words, in plain English: f(x) = O(g(x)), x → a, means that in a neighborhood of a, f decomposes into the product of g and some bounded function.
Small o
By the way, here is for comparison the definition of small o.
f(x) = o(g(x)) when x goes to a means that there is a function k such that:
  1. f(x) = k(x)g(x)
  2. k(x) goes to 0 when x goes to a.
  • sin x = O(x) when x → 0.
  • sin x = O(1) when x → +∞,
  • x^2 + x = O(x) when x → 0,
  • x^2 + x = O(x^2) when x → +∞,
  • ln(x) = o(x) = O(x) when x → +∞.
Attention! The notation with the equal sign "=" uses a "fake equality": it is true that o(g(x)) = O(g(x)), but false that O(g(x)) = o(g(x)). Similarly, it is ok to write "ln(x) = o(x) when x → +∞", but the formula "o(x) = ln(x)" would make no sense.
It is very difficult to measure the speed of software programs, and when we try, the answers can be very complex and filled with exceptions and special cases. This is a big problem, because all those exceptions and special cases are distracting and unhelpful when we want to compare two different programs with one another to find out which is "fastest".
As a result of all this unhelpful complexity, people try to describe the speed of software programs using the smallest and least complex (mathematical) expressions possible. These expressions are very very crude approximations: Although, with a bit of luck, they will capture the "essence" of whether a piece of software is fast or slow.
Because they are approximations, we use the letter "O" (Big Oh) in the expression, as a convention to signal to the reader that we are making a gross oversimplification. (And to make sure that nobody mistakenly thinks that the expression is in any way accurate).
If you read the "Oh" as meaning "on the order of" or "approximately" you will not go too far wrong. (I think the choice of the Big-Oh might have been an attempt at humour).
The only thing that these "Big-Oh" expressions try to do is to describe how much the software slows down as we increase the amount of data that the software has to process. If we double the amount of data that needs to be processed, does the software need twice as long to finish it's work? Ten times as long? In practice, there are a very limited number of big-Oh expressions that you will encounter and need to worry about:
The good:
  • O(1) Constant: The program takes the same time to run no matter how big the input is.
  • O(log n) Logarithmic: The program run-time increases only slowly, even with big increases in the size of the input.
The bad:
  • O(n) Linear: The program run-time increases proportionally to the size of the input.
  • O(n^k) Polynomial: - Processing time grows faster and faster - as a polynomial function - as the size of the input increases.
... and the ugly:
  • O(k^n) Exponential The program run-time increases very quickly with even moderate increases in the size of the problem - it is only practical to process small data sets with exponential algorithms.
  • O(n!) Factorial The program run-time will be longer than you can afford to wait for anything but the very smallest and most trivial-seeming datasets.