Sunday, 30 June 2013

UGC JRF+NET JUNE 2013 Paper-III

Share your views about any question, or a better explanation if you find one anywhere, to help others.

Best of luck.

Q-1. The Software Maturity Index (SMI). Answer is A.

Ref. Software Maturity Index (a metric in IEEE 982.1-1988, see http://www.standards.ieee.org/reading/ieee/std_public/description/se/982.1-1988_desc.html):
M = number of modules in the current version
A = number of modules added in the current version
C = number of modules changed in the current version
D = number of modules deleted in the current version compared to the previous version
SMI = (M - (A + C + D)) / M
More perspicuously, SMI = 1 - N/M, where M is the total number of modules in the current version of the system and N is the number of modules added, changed or deleted between the previous version and this one. SMI is a measure of product stability: as SMI approaches 1.0, the product becomes stable. When correlated with the time it takes to complete a version of the software, it gives an indication of the maintenance effort needed.
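For reference, here is the formula as a small C function (my own illustrative sketch, not part of the question; the sample numbers are made up):

#include <stdio.h>

/* Illustrative only: computing the Software Maturity Index from the
   module counts defined above. */
double smi(int m, int a, int c, int d) {
    /* SMI = (M - (A + C + D)) / M */
    return (double)(m - (a + c + d)) / m;
}

int main(void) {
    /* Example: 100 modules in the current version, 5 added, 10 changed, 3 deleted */
    printf("SMI = %.2f\n", smi(100, 5, 10, 3));  /* prints SMI = 0.82 */
    return 0;
}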

Q-2. 

Answer is D
    The Walston and Felix model provides a relationship between delivered source lines of code L and effort E (in person-months), given by the equation
    E = 5.2 L^0.91
    Similarly, the duration of development D in months is given by
    D = 4.1 L^0.36
    The Quick Fix model is an ad hoc approach to maintaining software. It is a fire-fighting approach: waiting for a problem to occur and then trying to fix it.
    The Putnam model describes the time and effort required to finish a software project of a specified size.
    The Logarithmic Poisson model is a software reliability model that predicts expected failures (and hence related reliability quantities).
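As a rough illustration of the Walston and Felix estimates (my own sketch; it assumes L is measured in thousands of delivered source lines, KLOC, as the model is usually stated, and the 50 KLOC project size is made up):

#include <stdio.h>
#include <math.h>

/* Illustrative only: effort and duration estimates for an assumed
   project size L (in KLOC). Compile with -lm on most systems. */
int main(void) {
    double L = 50.0;                       /* assumed size: 50 KLOC */
    double E = 5.2 * pow(L, 0.91);         /* effort in person-months */
    double D = 4.1 * pow(L, 0.36);         /* duration in months */
    printf("Effort   E = %.1f person-months\n", E);
    printf("Duration D = %.1f months\n", D);
    return 0;
}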

      Q-3.
      _____________ is a process model that removes defects before they can precipitate serious hazards.

      Answer is C (Cleanroom Software Engineering)

      Q-4.

Equivalence partitioning is a ____________ method that divides the input domain of a program into classes of data from which test cases can be derived.
      a. White-Box testing
      b. Black box testing
      c. Orthogonal- array testing
      d. Stress Testing

      Q-5.
The following three golden rules:
(i) Place the user in control
(ii) Reduce the user's memory load
(iii) Make the interface consistent
are for:

      a. User satisfaction
      b. Good interface design
      c. Saving system's resources
      d. None of these  

      Ref. http://theomandel.com/resources/golden-rules-of-user-interface-design/

      Q-6. Software Safety is a Software Quality Assurance activity.

      Answer is B

Q-7. The PROJECT operator of relational algebra creates a new table that always has the same number of rows as the original table.

      Answer is C

      Q-8. D

      Q-9. A

Q-10. Wrong question; no choice is correct. Check:

      http://books.google.co.in/books?id=9m382yDgxRsC&pg=PA109&lpg=PA109&dq=RAID+3+has+non+redundant+stripping&source=bl&ots=7IE6xRHe5c&sig=zLYEmpjwciEXTAlkOwZH5Kq_BU0&hl=en&sa=X&ei=IDHRUam9HMPprQeW6YGwAQ&ved=0CC4Q6AEwAA#v=onepage&q=RAID%203%20has%20non%20redundant%20stripping&f=false

      Q-11. C

      Q-12. D

      Q-13. A

      Q-14. D

      Q-15. C

      Q-16. A

      Q-17. A

      Q-18. B

      Q-19. C (1024*800*24/10000000)

      Q-20. D

      Q-21.

Q-22. What is the baud rate of standard 10 Mbps Ethernet?
         A) 10 megabaud
         B) 20 megabaud
         C) 30 megabaud
         D) 40 megabaud

      Q-23. C

      Q-24. D

      Q-25

      Q-26. A

      Q-27

      Q-28.

      Q-29.

      Q-30. A

      Q-31

      Q-32. A

      Q-33. D

      Q-34.

      Q-35. Which of the following is/are the fundamental semantic model(s) of parameter passing?
             a) in mode
             b) out mode
             c) in-out mode
             d) all

      Q-36. A

      Q-37. C

      Q-38. B

      Q-39. C

      Q-40. B

      Q-41 D

      Q-42 D

      Q-43 C

      Q-44 B

      Q-45.

      Q-46. B

      Q-47. B

      Q-48. A

Q-49. Suppose you want to delete the name that occurs before "Vivek" in an alphabetical listing. Which data structure is most efficient?
      a) Circular Linked List
      b) Double Linked List
      c) Linked List
      d) Dequeue
      B

Q-50. Output of the following code sequence:

      main()
      {
      char *s="hello world";
      int i=7;
      printf("%, *s",i,s);
      }
       
It prints "%, *s" (the format string contains no valid conversion specification, so strictly the behaviour is undefined), which appears in none of the options.

      Q-51. C

      Q-52. A

Q-53. Binary symmetric channel uses
              (a) half duplex protocol 
              (b) full duplex protocol 
              (c) bit oriented protocol 
              (d) none of the above

      Q-54.Hamming Distance between 100101000110 and 110111101101 is
      A. 3
      B. 4
      C. 5
      D. 6

      Q-55. A
      Q-56. D
      Q-57. A
      Q-58. B
      Q-59. C
      Q-60. D
      Q-61. A
      Q-62. A
      Q-63. B
      Q-64. C
      Q-65. B
      Q-66. C (BUT CONFUSING)
      Q-67
Q-68. Which one of the following is not an informed search technique?
      (a) Hill Climbing search
      (b) best first search
      (c) A* search
      (d) Depth first search
      Q-69. C
      Q-70. B OR C
      Q-71. B
      Q-72. B OR C
      Q-73. C
      Q-74. C
      Q-75.

      Friday, 28 June 2013

      FD and Normal Form Calculation

      Given R(A,B,C,D,E) with the set of FDs,
F = {AB -> CD, ABC -> E, C -> A}
      (i) Find any two candidate keys of R
      (ii) What is the normal form of R? Justify.

(i) To find two candidate keys of R, we have to find the closure of the set of attributes under consideration; if all the attributes of R are in the closure, then that set is a candidate key. Now, from the set of FDs we can make out that B does not occur on the RHS of any FD; therefore, it must be part of any candidate key being considered, otherwise it would not be in the closure of any attribute set. So let us consider the following sets: AB and BC.

Now (AB)+ = ABCDE; C and D are included in the closure because of the FD AB -> CD, and E is included in the closure because of the FD ABC -> E.
Now (BC)+ = BCAED; A is included in the closure because of the FD C -> A, then E is included in the closure because of the FD ABC -> E, and lastly D is included in the closure because of the FD AB -> CD.

Therefore, two candidate keys are AB and BC.
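For anyone who wants to check such closures mechanically, here is a small C sketch (mine, not part of the answer): it encodes the attributes A..E as bits and computes X+ by repeatedly applying the FDs until nothing more can be added.

#include <stdio.h>

/* Illustrative only: attribute-closure computation for R(A,B,C,D,E). */
#define A 1u
#define B 2u
#define C 4u
#define D 8u
#define E 16u

struct fd { unsigned lhs, rhs; };

unsigned closure(unsigned attrs, const struct fd *f, int n) {
    unsigned changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            if ((attrs & f[i].lhs) == f[i].lhs && (attrs | f[i].rhs) != attrs) {
                attrs |= f[i].rhs;   /* LHS contained in the set: add RHS to the closure */
                changed = 1;
            }
        }
    }
    return attrs;
}

int main(void) {
    struct fd f[] = { {A|B, C|D}, {A|B|C, E}, {C, A} };  /* F = {AB->CD, ABC->E, C->A} */
    unsigned all = A|B|C|D|E;
    printf("(AB)+ covers all attributes: %s\n",
           closure(A|B, f, 3) == all ? "yes" : "no");   /* yes, so AB is a candidate key */
    printf("(BC)+ covers all attributes: %s\n",
           closure(B|C, f, 3) == all ? "yes" : "no");   /* yes, so BC is a candidate key */
    return 0;
}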

      (ii) The prime attributes are A, B and C and non-prime attributes are D and E.

      A relation scheme is in 2NF, if all the non-prime attributes are fully functionally
      dependent on the relation key(s). From the set of FDs we can see that the non-prime
      attributes (D,E) are fully functionally dependent on the prime attributes, therefore, the relation is in 2NF.
A relation scheme is in 3NF if, for every non-trivial FD X -> A in F+, either X is a superkey or A is prime. From the set of FDs we see that this is satisfied for all the FDs; therefore, the relation is in 3NF.
A relation scheme is in BCNF if, for every non-trivial FD X -> A in F+, X is a superkey. From the set of FDs we can see that for the FD C -> A this is not satisfied, as the LHS is not a superkey; therefore, the relation is not in BCNF.

      Hence, the given relation scheme is in 3NF.


      Given R {ABCD} and a set F of functional dependencies on R given as
      F = {AB->C,AB->D,C->A, D->B}. Find any two candidate keys of R. Show each
      step. In what normal form is R? Justify.

With the given set of FDs, it is not possible to derive all the other attributes from a single attribute (e.g., {A}+ = {A}, {B}+ = {B}, {C}+ = {CA}, {D}+ = {DB}).
Therefore, we have to choose composite keys:
      1. If we assume X = {AB} then X+ = {ABCD} because we can determine the
      attributes C and D from AB as per F.
      2. If we assume X = {CD} then X+ = {CDAB} because we can determine the
      attributes A and B from C and D respectively as per F.

      The other possible candidate keys are: {CB} and {AD}.

If we choose {CD} as the primary key of the relation R, then the FDs C -> A and D -> B are partial functional dependencies, as CD is the key of the relation. Therefore, the given relation R is not in 2NF and is only in first normal form (1NF).

      Find 3NF decomposition of the relation scheme,
      R = {Faculty, Dean, Dept, Chairperson, Professor, Rank, Student} with the set of
      functional dependencies,
      F = {Faculty->Dean 

      Department->Chairperson
      Professor->Rank, Chairperson

      Department->Faculty
      Student ->Department, Faculty,Dean 

      Dean ->Faculty
      Professor, Rank ->Department, Faculty}


      To find out 3NF decomposition of a relation we use canonical cover (Fc).
      Requirements for canonical cover (Fc):

      1. Each FD in Fc must be simple.
      F’ = {Faculty ->Dean, Department->Chairperson, Professor-> Rank,
      Professor ->Chairperson, Department ->Faculty, Student->Department, Student->Faculty, Student->Dean, Dean->Faculty, ProfessorRank->Department,
      ProfessorRank->Faculty}

      2. Each FD in Fc must be left reduced.
      As given in F, the attribute Rank is functionally dependent on the Professor. Therefore the FDs ProfessorRank->Department and ProfessorRank->Faculty can be left reduced to Professor->Department and Professor->Faculty respectively based on the Augmentation axiom.
      F’’ = {Faculty->Dean, Department->Chairperson, Professor->Rank, Professor->Chairperson, Department->Faculty, Student->Department, Student->Faculty, Student->Dean, Dean->Faculty, Professor->Department, Professor->Faculty}

      3. Each FD in Fc must be non-redundant.
      The FDs Professor->Chairperson, Student->Faculty, Student->Dean and
      Professor->Faculty are redundant and have to be removed based on the Transitivity
      axiom.
      Fc = {Faculty->Dean, Department->Chairperson, Professor->Rank, Department->Faculty, Student->Department, Dean->Faculty, Professor->Department}

      To make lossless join decompositions, it is required to preserve the key of the original relation. The key of the relation R is {Professor, Student}. The 3NF decompositions of the given relation are:

R1 = {Faculty, Dean}, F1 = {Faculty -> Dean, Dean -> Faculty}
R2 = {Department, Chairperson, Faculty}, F2 = {Department -> Chairperson, Department -> Faculty}
R3 = {Professor, Rank, Department}, F3 = {Professor -> Rank, Professor -> Department}
R4 = {Student, Department}, F4 = {Student -> Department}
R5 = {Professor, Student}, F5 = { } (this relation is added only to preserve the key {Professor, Student})


      Big-O Notation Part-2

      A Plain English Explanation of the Need for Big-O Notation:
When we program, we are trying to solve a problem. What we code is called an algorithm. Big O notation allows us to compare the worst-case performance of our algorithms in a standardized way. Hardware specs vary over time, and improvements in hardware can reduce the time it takes an algorithm to run. But replacing the hardware does not mean our algorithm is any better or has improved over time, as our algorithm is still the same. So in order to compare different algorithms, to determine if one is better than another, we use Big O notation.
      A Plain English Explanation of What Big O Notation is:
Not all algorithms run in the same amount of time; the run time can vary based on the number of items in the input, which we'll call n. Based on this, we consider the worst-case analysis, or an upper bound on the run time as n gets larger and larger. We must be aware of what n is, because many of the Big O notations reference it.
      A Plain English Explanation of How to find the Big O Notation, the General Idea:
Algorithms can do one thing or multiple things. If your algorithm does multiple things, then one particular part may determine the overall Big O notation of the algorithm. The question is: when you do multiple things, how much time does it take to do the individual parts, and do they depend on other parts of the algorithm, or are they independent? If they are independent, then you add the Big O notations together and the worst-performing notation wins. If the separate parts are dependent upon each other, then they may not simply be added, and you may have a multiplicative relationship (like the nested loops of bubble sort), or some other relationship (e.g. powers, logs, etc.). Below I show some very simple analysis of some very simple algorithms.
      Big-O Analysis of Some Very Simple Algorithms:
      Algorithm 1) If you have an algorithm that prints a static statement (e.g. "Hello World"), then this will always run in the same amount of time, no matter how large the input is (because we aren't doing anything with any input). An algorithm that does something like this takes a constant amount of time, which is denoted as O(1) in Big O notation. O(1) is classified as constant time.
      Algorithm 2) If you have an algorithm that gets a list of names, then prints hello to each of those people, then your algorithm will take longer. How long it takes is based on how many people you have in the list to say hello to. This type of algorithm takes O(n) in Big O notation. O(n) is classified as linear time.
      Note: Linear time algorithms perform worse than constant time algorithms because as n becomes larger and larger, it takes a lot more time to do a linear time algorithm than it takes to do a constant time algorithm.
We are now going to make some slight modifications to the above algorithms and see how those changes affect the Big O notation.
Algorithm 3) If we modified Algorithm 1 to say both "Hello World" and then "Goodbye World", then this algorithm can be seen as doing "multiple" things. The question is how these multiple things relate to each other and how they affect the overall Big O notation. Printing "Hello World" does not depend on the other print statement, and printing "Goodbye World" does not depend on any computation from the first part. So each of these tasks can be seen as independent of the other. From our analysis of Algorithm 1, we know the "hello" part takes O(1) and the goodbye part takes O(1). Both of these parts are constant, and since they are independent of each other, we add their separate times together (e.g. 1+1). Note that the result is also constant (e.g. 2). Since the result is constant, we know Algorithm 3 is O(1), the constant time classification.
Algorithm 4) If we modified Algorithm 2 to say hello to everyone and then goodbye to everyone, then this is similar to Algorithm 3, where the hellos are independent of the goodbyes. So the first part takes O(n) time and the second part takes O(n) time. n+n is 2n, which is still linear. So the result for Algorithm 4 is O(n).
Algorithm 5) We create an algorithm that says "hello world" (Algorithm 1) and then says hello to everyone in the world list (Algorithm 2). The first part takes O(1) time and the second part takes O(n) time. These two parts are independent of each other, so it would be O(1) + O(n) time. Since Big O notation looks at the upper bound, it is a question of which of O(1) and O(n) has the higher upper bound (or worse performance). From above, we know O(n) takes more time, so O(1) + O(n) = O(n), meaning Algorithm 5 is O(n).
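Here is a minimal C sketch of Algorithms 1 and 2 above (my own illustration; the names list is made up):

#include <stdio.h>

/* Algorithm 1: one fixed print, O(1) - the work does not depend on any input. */
void hello_world(void) {
    printf("Hello World\n");
}

/* Algorithm 2: one print per name, O(n) - the work grows linearly with n. */
void hello_everyone(const char *names[], int n) {
    for (int i = 0; i < n; i++)
        printf("Hello, %s\n", names[i]);
}

int main(void) {
    const char *names[] = { "Asha", "Bharat", "Chitra" };  /* example data, n = 3 */
    hello_world();                 /* O(1) */
    hello_everyone(names, 3);      /* O(n) */
    /* Running both in sequence is O(1) + O(n) = O(n), as in Algorithm 5. */
    return 0;
}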

      Big O
      f(x) = O(g(x)) when x goes to a (for example, a = +∞) means that there is a function k such that:
      1. f(x) = k(x)g(x)
      2. k is bounded in some neighborhood of a (if a = +∞, this means that there are numbers N and M such that for every x > N, |k(x)| < M).
      In other words, in plain English: f(x) = O(g(x)), x → a, means that in a neighborhood of a, f decomposes into the product of g and some bounded function.
      Small o
      By the way, here is for comparison the definition of small o.
      f(x) = o(g(x)) when x goes to a means that there is a function k such that:
      1. f(x) = k(x)g(x)
      2. k(x) goes to 0 when x goes to a.
      Examples
      • sin x = O(x) when x → 0.
      • sin x = O(1) when x → +∞,
      • x^2 + x = O(x) when x → 0,
      • x^2 + x = O(x^2) when x → +∞,
      • ln(x) = o(x) = O(x) when x → +∞.
      Attention! The notation with the equal sign "=" uses a "fake equality": it is true that o(g(x)) = O(g(x)), but false that O(g(x)) = o(g(x)). Similarly, it is ok to write "ln(x) = o(x) when x → +∞", but the formula "o(x) = ln(x)" would make no sense.
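As a worked check of one of the examples above (my own, using the k(x) formulation of the definition), take x^2 + x = O(x^2) when x → +∞:
x^2 + x = (1 + 1/x) · x^2, so here k(x) = 1 + 1/x.
For every x > 1 we have |k(x)| < 2, so k is bounded in a neighborhood of +∞, and therefore x^2 + x = O(x^2).
By contrast, k(x) → 1 ≠ 0 as x → +∞, so x^2 + x is not o(x^2).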
      It is very difficult to measure the speed of software programs, and when we try, the answers can be very complex and filled with exceptions and special cases. This is a big problem, because all those exceptions and special cases are distracting and unhelpful when we want to compare two different programs with one another to find out which is "fastest".
      As a result of all this unhelpful complexity, people try to describe the speed of software programs using the smallest and least complex (mathematical) expressions possible. These expressions are very very crude approximations: Although, with a bit of luck, they will capture the "essence" of whether a piece of software is fast or slow.
      Because they are approximations, we use the letter "O" (Big Oh) in the expression, as a convention to signal to the reader that we are making a gross oversimplification. (And to make sure that nobody mistakenly thinks that the expression is in any way accurate).
      If you read the "Oh" as meaning "on the order of" or "approximately" you will not go too far wrong. (I think the choice of the Big-Oh might have been an attempt at humour).
The only thing that these "Big-Oh" expressions try to do is to describe how much the software slows down as we increase the amount of data that the software has to process. If we double the amount of data that needs to be processed, does the software need twice as long to finish its work? Ten times as long? In practice, there are a very limited number of big-Oh expressions that you will encounter and need to worry about:
      The good:
      • O(1) Constant: The program takes the same time to run no matter how big the input is.
      • O(log n) Logarithmic: The program run-time increases only slowly, even with big increases in the size of the input.
      The bad:
      • O(n) Linear: The program run-time increases proportionally to the size of the input.
• O(n^k) Polynomial: processing time grows faster and faster, as a polynomial function, as the size of the input increases.
... and the ugly:
• O(k^n) Exponential: the program run-time increases very quickly with even moderate increases in the size of the problem; it is only practical to process small data sets with exponential algorithms.
• O(n!) Factorial: the program run-time will be longer than you can afford to wait for anything but the very smallest and most trivial-seeming datasets.

      Big-O Notation a small and simple explanation

      Big-O notation is a relative representation of the complexity of an algorithm.
      There are some important and deliberately chosen words in that sentence:
      • relative: you can only compare apples to apples. You can't compare an algorithm to do arithmetic multiplication to an algorithm that sorts a list of integers. But two algorithms that do arithmetic operations (one multiplication, one addition) will tell you something meaningful;
      • representation: Big-O (in its simplest form) reduces the comparison between algorithms to a single variable. That variable is chosen based on observations or assumptions. For example, sorting algorithms are typically compared based on comparison operations (comparing two nodes to determine their relative ordering). This assumes that comparison is expensive. But what if comparison is cheap but swapping is expensive? It changes the comparison; and
      • complexity: if it takes me one second to sort 10,000 elements how long will it take me to sort one million? Complexity in this instance is a relative measure to something else.
      Come back and reread the above when you've read the rest.
      The best example of Big-O I can think of is doing arithmetic. Take two numbers (123456 and 789012). The basic arithmetic operations we learnt in school were:
      • addition;
      • subtraction;
      • multiplication; and
      • division.
      Each of these is an operation or a problem. A method of solving these is called an algorithm.
Addition is the simplest. You line the numbers up (to the right) and add the digits in a column, writing the last digit of that addition in the result. The 'tens' part of that number is carried over to the next column.
      Let's assume that the addition of these numbers is the most expensive operation in this algorithm. It stands to reason that to add these two numbers together we have to add together 6 digits (and possibly carry a 7th). If we add two 100 digit numbers together we have to do 100 additions. If we add two 10,000 digit numbers we have to do 10,000 additions.
      See the pattern? The complexity (being the number of operations) is directly proportional to the number of digits n in the larger number. We call this O(n) or linear complexity.
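A rough C sketch of the schoolbook addition just described (mine, not from the original): it works digit by digit from the right, so the work grows linearly with the number of digits.

#include <stdio.h>
#include <string.h>

/* Illustrative only: one pass over the digits, so the cost is O(n). */
void add_decimal(const char *a, const char *b, char *out) {
    int la = strlen(a), lb = strlen(b);
    int n = (la > lb ? la : lb);
    int carry = 0, pos = n + 1;
    out[pos--] = '\0';
    for (int i = 0; i < n; i++) {                     /* n column additions */
        int da = (i < la) ? a[la - 1 - i] - '0' : 0;  /* digits from the right */
        int db = (i < lb) ? b[lb - 1 - i] - '0' : 0;
        int s = da + db + carry;
        out[pos--] = '0' + (s % 10);                  /* write last digit of the column sum */
        carry = s / 10;                               /* carry the 'tens' part */
    }
    out[pos] = '0' + carry;                           /* possible extra leading digit */
}

int main(void) {
    char result[16];
    add_decimal("123456", "789012", result);
    printf("%s\n", result[0] == '0' ? result + 1 : result);   /* prints 912468 */
    return 0;
}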
      Subtraction is similar (except you may need to borrow instead of carry).
      Multiplication is different. You line the numbers up, take the first digit in the bottom number and multiply it in turn against each digit in the top number and so on through each digit. So to multiply our two 6 digit numbers we must do 36 multiplications. We may need to do as many as 10 or 11 column adds to get the end result too.
If we have two 100-digit numbers we need to do 10,000 multiplications and 200 adds. For two one million digit numbers we need to do one trillion (10^12) multiplications and two million adds.
As the algorithm scales with n-squared, this is O(n^2) or quadratic complexity. This is a good time to introduce another important concept:
      We only care about the most significant portion of complexity.
The astute may have realized that we could express the number of operations as: n^2 + 2n. But as you saw from our example with two numbers of a million digits apiece, the second term (2n) becomes insignificant (accounting for 0.0002% of the total operations by that stage).
One can notice that we've assumed the worst-case scenario here. While multiplying 6-digit numbers, if one of them has 4 digits and the other 6 digits, then we only have 24 multiplications. Still, we calculate the worst-case scenario for that n, i.e. when both are 6-digit numbers. Hence Big-O notation is about the worst-case scenario of an algorithm.
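And a matching C sketch of schoolbook long multiplication (again mine): the nested loops make the n × n single-digit multiplications explicit, with a final carry pass supplying the extra 'adds' mentioned above.

#include <stdio.h>
#include <string.h>

/* Illustrative only: every digit of b is multiplied against every digit of a,
   so two n-digit numbers need n*n single-digit multiplications -- O(n^2). */
void multiply_decimal(const char *a, const char *b, char *out) {
    int la = strlen(a), lb = strlen(b), n = la + lb;
    int digits[64] = {0};                       /* enough for this small demo */
    for (int i = 0; i < la; i++)                /* for each digit of a ...    */
        for (int j = 0; j < lb; j++)            /* ... times each digit of b  */
            digits[i + j] += (a[la - 1 - i] - '0') * (b[lb - 1 - j] - '0');
    for (int k = 0; k < n; k++) {               /* single carry pass: the "adds" */
        digits[k + 1] += digits[k] / 10;
        digits[k] %= 10;
    }
    int p = 0, start = n - 1;
    while (start > 0 && digits[start] == 0) start--;   /* skip leading zeros */
    for (int k = start; k >= 0; k--) out[p++] = '0' + digits[k];
    out[p] = '\0';
}

int main(void) {
    char result[32];
    multiply_decimal("123456", "789012", result);
    printf("%s\n", result);    /* prints 97408265472: 6*6 = 36 digit multiplications */
    return 0;
}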

      The Telephone Book

      The next best example I can think of is the telephone book, normally called the White Pages or similar but it'll vary from country to country. But I'm talking about the one that lists people by surname and then initials or first name, possibly address and then telephone numbers.
      Now if you were instructing a computer to look up the phone number for "John Smith" in a telephone book that contains 1,000,000 names, what would you do? Ignoring the fact that you could guess how far in the S's started (let's assume you can't), what would you do?
A typical implementation might be to open up to the middle, take the 500,000th entry and compare it to "Smith". If it happens to be "Smith, John", we just got real lucky. Far more likely is that "John Smith" will be before or after that name. If it's after, we then divide the last half of the phone book in half and repeat. If it's before, then we divide the first half of the phone book in half and repeat. And so on.
      This is called a binary search and is used every day in programming whether you realize it or not.
      So if you want to find a name in a phone book of a million names you can actually find any name by doing this at most 20 times. In comparing search algorithms we decide that this comparison is our 'n'.
      • For a phone book of 3 names it takes 2 comparisons (at most).
      • For 7 it takes at most 3.
      • For 15 it takes 4.
      • For 1,000,000 it takes 20.
      That is staggeringly good isn't it?
In Big-O terms this is O(log n) or logarithmic complexity. Now the logarithm in question could be ln (base e), log10, log2 or some other base. It doesn't matter: it's still O(log n), just like O(2n²) and O(100n²) are still both O(n²).
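Here is a small C sketch of that phone-book lookup (my own illustration with a toy list of surnames): each step halves the remaining range, so at most about log2(n) comparisons are needed, roughly 20 for a million entries.

#include <string.h>
#include <stdio.h>

/* Illustrative only: binary search over a sorted array of names. */
int binary_search(const char *names[], int n, const char *target) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;            /* open the book in the middle */
        int cmp = strcmp(target, names[mid]);
        if (cmp == 0) return mid;                /* found the name */
        if (cmp < 0) hi = mid - 1;               /* target is in the first half */
        else         lo = mid + 1;               /* target is in the second half */
    }
    return -1;                                   /* not listed */
}

int main(void) {
    /* tiny sorted "phone book" stand-in */
    const char *book[] = { "Jones", "Khan", "Patel", "Singh", "Smith", "Verma" };
    int idx = binary_search(book, 6, "Smith");
    printf("Smith found at index %d\n", idx);    /* prints 4 */
    return 0;
}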
      It's worthwhile at this point to explain that Big O can be used to determine three cases with an algorithm:
      • Best Case: In the telephone book search, the best case is that we find the name in one comparison. This is O(1) or constant complexity;
      • Expected Case: As discussed above this is O(log n); and
      • Worst Case: This is also O(log n).
      Normally we don't care about the best case. We're interested in the expected and worst case. Sometimes one or the other of these will be more important.
      Back to the telephone book.
      What if you have a phone number and want to find a name? The police have a reverse phone book but such lookups are denied to the general public. Or are they? Technically you can reverse lookup a number in an ordinary phone book. How?
      You start at the first name and compare the number. If it's a match, great, if not, you move on to the next. You have to do it this way because the phone book is unordered (by phone number anyway).
      So to find a name:
      • Best Case: O(1);
      • Expected Case: O(n) (for 500,000); and
      • Worst Case: O(n) (for 1,000,000).

      The Travelling Salesman

      This is quite a famous problem in computer science and deserves a mention. In this problem you have N towns. Each of those towns is linked to 1 or more other towns by a road of a certain distance. The Travelling Salesman problem is to find the shortest tour that visits every town.
      Sounds simple? Think again.
      If you have 3 towns A, B and C with roads between all pairs then you could go:
      • A → B → C
      • A → C → B
      • B → C → A
      • B → A → C
      • C → A → B
      • C → B → A
Well, actually there are fewer than that because some of these are equivalent (A → B → C and C → B → A are equivalent, for example, because they use the same roads, just in reverse).
      In actuality there are 3 possibilities.
      • Take this to 4 towns and you have (iirc) 12 possibilities.
      • With 5 it's 60.
      • 6 becomes 360.
      This is a function of a mathematical operation called a factorial. Basically:
      • 5! = 5 × 4 × 3 × 2 × 1 = 120
      • 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720
      • 7! = 7 × 6 × 5 × 4 × 3 × 2 × 1 = 5040
      • 25! = 25 × 24 × … × 2 × 1 = 15,511,210,043,330,985,984,000,000
• 50! = 50 × 49 × … × 2 × 1 = 3.04140932 × 10^64
      So the Big-O of the Travelling Salesman problem is O(n!) or factorial or combinatorial complexity.
      By the time you get to 200 towns there isn't enough time left in the universe to solve the problem with traditional computers.
      Something to think about.
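A quick C sketch (mine) of how fast the raw count of tour orderings grows; it just multiplies up n! in a double, since the values overflow 64-bit integers almost immediately.

#include <stdio.h>

/* Illustrative only: n! candidate orderings for n towns. */
int main(void) {
    double tours = 1.0;
    for (int n = 1; n <= 25; n++) {
        tours *= n;                          /* n! = n * (n-1)! */
        if (n == 5 || n == 10 || n == 25)
            printf("%2d towns -> about %.3e orderings\n", n, tours);
    }
    return 0;
}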

      Polynomial Time

Another point I wanted to make quick mention of is that any algorithm that has a complexity of O(n^a) is said to have polynomial complexity or is solvable in polynomial time.
      Traditional computers can solve polynomial-time problems. Certain things are used in the world because of this. Public Key Cryptography is a prime example. It is computationally hard to find two prime factors of a very large number. If it wasn't, we couldn't use the public key systems we use.

Big-O notation shows how an algorithm scales.
      O(n^2):
      • 1 item: 1 second
      • 10 items: 100 seconds
      • 100 items: 10000 seconds
      Notice that the number of items increases by a factor of 10, but the time increases by a factor of 10^2. Basically, n=10 and so O(n^2) gives us the scaling factor n^2 which is 10^2.
      O(n):
      • 1 item: 1 second
      • 10 items: 10 seconds
      • 100 items: 100 seconds
      This time the number of items increases by a factor of 10, and so does the time. n=10 and so O(n)'s scaling factor is 10.
      O(1):
      • 1 item: 1 second
      • 10 items: 1 second
      • 100 items: 1 second
      The number of items is still increasing by a factor of 10, but the scaling factor of O(1) is always 1.
      That's the gist of it. They reduce the maths down so it might not be exactly n^2 or whatever they say it is, but that'll be the dominating factor in the scaling.

      Big-O notation (also called "asymptotic growth" notation) is what functions "look like" when you ignore constant factors and stuff near the origin. We use it to talk about how thing scale.

      Basics
      for "sufficiently" large inputs...
      • f(x) ∈ O(upperbound) means f "grows no faster than" upperbound
• f(x) ∈ Ɵ(justlikethis) means f "grows exactly like" justlikethis
• f(x) ∈ Ω(lowerbound) means f "grows no slower than" lowerbound
big-O notation doesn't care about constant factors: the function 9x² is said to "grow exactly like" 10x². Neither does big-O asymptotic notation care about non-asymptotic stuff ("stuff near the origin" or "what happens when the problem size is small"): the function 10x² is said to "grow exactly like" 10x² - x + 2.
      Why would you want to ignore the smaller parts of the equation? Because they become completely dwarfed by the big parts of the equation as you consider larger and larger scales; their contribution becomes dwarfed and irrelevant. (See example section.)
      Put another way, it's all about the ratio. If you divide the actual time it takes by the O(...), you will get a constant factor in the limit of large inputs. Intuitively this makes sense: functions "scale like" one another if you can multiply one to get the other. That is, when we say...
      actualAlgorithmTime(N) ∈ O(bound(N))
                                             e.g. "time to mergesort N elements 
                                                   is O(N log(N))"
      
      ... this means that for "large enough" problem sizes N (if we ignore stuff near the origin), there exists some constant (e.g. 2.5, completely made up) such that:
      actualAlgorithmTime(N)                 e.g. "mergesort_duration(N)       "
      ────────────────────── < constant            ───────────────────── < 2.5 
             bound(N)                                    N log(N)         
      
      There are many choices of constant; often the "best" choice is known as the "constant factor" of the algorithm... but we often ignore it like we ignore non-largest terms (see Constant Factors section for why they don't usually matter). You can also think of the above equation as a bound, saying "In the worst-case scenario, the time it takes will never be worse than roughly N*log(N), within a factor of 2.5 (a constant factor we don't care much about)".
In general, O(...) is the most useful one because we often care about worst-case behavior. If f(x) represents something "bad" like processor or memory usage, then "f(x) ∈ O(upperbound)" means "upperbound is the worst-case scenario of processor/memory usage".

      Intuition
      This lets us make statements like...
      "For large enough inputsize=N, and a constant 
       factor of 1, if I double the input size...
        ... I double the time it takes."        ( O(N) )
        ... I quadruple the time it takes."     ( O(N²) )
        ... I add 1 to the time it takes."      ( O(log(N)) )
        ... I don't change the time it takes."  ( O(1) )
      

      Applications
      As a purely mathematical construct, big-O notation is not limited to talking about processing time and memory. You can use it to discuss the asymptotics of anything where scaling is meaningful, such as:
• the number of possible handshakes among N people at a party (Ɵ(N²), specifically N(N-1)/2, but what matters is that it "scales like" N²)
      • probabilistic expected number of people who have seen some viral marketing as a function of time
      • how website latency scales with the number of processing units in a CPU or GPU or computer cluster
      • how heat output scales on CPU dies as a function of transistor count, voltage, etc.

      Example
      For the handshake example, #handshakes ∈ Ɵ(N²). The number of handshakes is exactly n-choose-2 or (N²-N)/2 (each of N people shakes the hands of N-1 other people, but this double-counts handshakes so divide by 2). However, for very large numbers of people, the linear term N is dwarfed and effectively contributes 0 to the ratio. Therefore the scaling behavior is order N², or the number of handshakes "grows like N²".
      #handshakes(N)
      ────────────── ≈ 1/2
           N²
      
      If you wanted to prove this to yourself, you could perform some simple algebra on the ratio to split it up into multiple terms (lim means "considered in the limit of", you can ignore it if it makes you feel better):
          N²/2 - N/2         (N²)/2   N/2         1/2
      lim ────────── = lim ( ────── - ─── ) = lim ─── = 1/2
      N→∞     N²       N→∞     N²     N²      N→∞  1
                                     ┕━━━┙
                   this is 0 in the limit of N→∞:
                   graph it, or plug in a really large number for N
      

      Constant factors
Usually we don't care what the specific constant factors are, because they don't affect the way the function grows. For example, two algorithms may both take O(N) time to complete, but one may be twice as slow as the other. We usually don't care too much unless the factor is very large, since optimizing is tricky business (When is optimisation premature?); also, the mere act of picking an algorithm with a better big-O will often improve performance by orders of magnitude.
      Some asymptotically superior algorithms (e.g. a non-comparison O(N log(log(N))) sort) can have so large a constant factor (e.g. 100000*N log(log(N))), or overhead that is relatively large like O(N log(log(N))) with a hidden + 100*N, that they are rarely worth using even on "big data".

      Why O(N) is sometimes the best you can do, i.e. why we need datastructures
O(N) algorithms are in some sense the "best" algorithms if you need to read all your data. The very act of reading a bunch of data is an O(N) operation. Loading it into memory is usually O(N) (or faster if you have hardware support, or no time at all if you've already read the data). However, if you touch or even look at every piece of data (or even every other piece of data), your algorithm will take O(N) time to perform this looking. No matter how long your actual algorithm takes, it will be at least O(N) because it spent that time looking at all the data.
      The same can be said for the very act of writing. For example, all algorithms which print out all permutations of a number N are O(N!) because the output is at least that long.
This motivates the use of data structures: a data structure requires reading the data only once (usually O(N) time), plus some arbitrary amount of preprocessing (e.g. O(N) or O(N log(N)) or O(N²)) which we try to keep small. Thereafter, modifying the data structure (insertions / deletions / etc.) and making queries on the data take very little time, such as O(1) or O(log(N)). You then proceed to make a large number of queries! In general, the more work you're willing to do ahead of time, the less work you'll have to do later on.
For example, say you had the latitude and longitude coordinates of millions of road segments, and wanted to find all street intersections.
      • Naive method: If you had the coordinates of a street intersection, and wanted to examine nearby streets, you would have to go through the millions of segments each time, and check each one for adjacency.
• If you only needed to do this once, it would not be a problem to have to do the naive method of O(N) work only once, but if you want to do it many times (in this case, N times, once for each segment), we'd have to do O(N²) work, or 1000000² = 1,000,000,000,000 operations. Not good (a modern computer can perform about a billion operations per second).
      • If we use a simple structure called a hash table (an instant-speed lookup table, also known as a hashmap or dictionary), we pay a small cost by preprocessing everything in O(N) time. Thereafter, it only takes constant time on average to look up something by its key (in this case, our key is the latitude and longitude coordinates, rounded into a grid; we search the adjacent gridspaces of which there are only 9, which is a constant).
      • Our task went from an infeasible O(N²) to a manageable O(N), and all we had to do was pay a minor cost to make a hash table.
      The moral of the story: a data structure lets us speed up operations. Even more advanced data structures can let you combine, delay, or even ignore operations in incredibly clever ways, like leaving the equivalent of "to-do" notes at junctions in a tree.

      Amortized / average-case complexity
There is also the concept of "amortized" or "average case" complexity. This is no more than using big-O notation for the expected value of a function, rather than the function itself. For example, some data structures may have a worst-case complexity of O(N) for a single operation, but guarantee that if you do many of these operations, the average-case complexity will be O(1).
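A classic concrete case (my own C sketch, not from the original text) is appending to a dynamic array: a single append is O(N) in the worst case, when the buffer is full and must be reallocated and copied, but doubling the capacity each time makes the amortized cost per append O(1).

#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: a growable int array with capacity doubling. */
typedef struct {
    int *data;
    size_t size, capacity;
} vec;

void vec_push(vec *v, int x) {
    if (v->size == v->capacity) {                       /* occasional O(N) step */
        v->capacity = v->capacity ? v->capacity * 2 : 4;
        v->data = realloc(v->data, v->capacity * sizeof *v->data);
    }
    v->data[v->size++] = x;                             /* usual O(1) step */
}

int main(void) {
    vec v = {0};
    for (int i = 0; i < 1000; i++)
        vec_push(&v, i);        /* 1000 appends, only about 9 reallocations */
    printf("size=%zu capacity=%zu\n", v.size, v.capacity);  /* size=1000 capacity=1024 */
    free(v.data);
    return 0;
}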

      Multidimensional big-O
      Most of the time, people don't realize that there's more than one variable at work. For example, in a string-search algorithm, your algorithm may take time O([length of text] + [length of query]), i.e. it is linear in two variables like O(N+M). Other more naive algorithms may be O([length of text]*[length of query]) or O(N*M). Ignoring multiple variables is one of the most common oversights I see in algorithm analysis, and can handicap you when designing an algorithm.
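For instance, here is a naive C string search (my own sketch) whose cost really does depend on two variables, N = length of the text and M = length of the query, giving the O(N*M) behaviour mentioned above.

#include <stdio.h>
#include <string.h>

/* Illustrative only: naive substring search, O(N*M) in the worst case. */
int naive_search(const char *text, const char *query) {
    int n = strlen(text), m = strlen(query);
    for (int i = 0; i + m <= n; i++) {            /* up to N starting positions  */
        int j = 0;
        while (j < m && text[i + j] == query[j])  /* up to M comparisons each    */
            j++;
        if (j == m) return i;                     /* full match found at index i */
    }
    return -1;                                    /* no match */
}

int main(void) {
    printf("%d\n", naive_search("the quick brown fox", "brown"));  /* prints 10 */
    return 0;
}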

      The whole story
      Keep in mind that big-O is not the whole story. You can drastically speed up some algorithms by using caching, making them cache-oblivious, avoiding bottlenecks by working with RAM instead of disk, using parallelization, or doing work ahead of time -- these techniques are often independent of the order-of-growth "big-O" notation, though you will often see the number of cores in the big-O notation of parallel algorithms.
      Also keep in mind that due to hidden constraints of your program, you might not really care about asymptotic behavior. You may be working with a bounded number of values, for example:
• If you're sorting something like 5 elements, you don't want to use the speedy O(N log(N)) quicksort; you want to use insertion sort, which happens to perform well on small inputs. These situations often come up in divide-and-conquer algorithms, where you split up the problem into smaller and smaller subproblems, such as recursive sorting, fast Fourier transforms, or matrix multiplication.
      • If some values are effectively bounded due to some hidden fact (e.g. the average human name is softly bounded at perhaps 40 letters, and human age is softly bounded at around 150). You can also impose bounds on your input to effectively make terms constant.
      In practice, even among algorithms which have the same or similar asymptotic performance, their relative merit may actually be driven by other things, such as: other performance factors (quicksort and mergesort are both O(N log(N)), but quicksort takes advantage of CPU caches); non-performance considerations, like ease of implementation; whether a library is available, and how reputable and maintained the library is.
Many things can implicitly contribute to the running time's constant factor, such as whether you run your algorithm on a 500MHz computer vs a 2GHz computer, whether your programming language is interpreted or using a JIT compiler, whether you are doing a constant amount of extra work in a critical section of code, etc. The effect may be small (e.g. 0.9x speed) or large (e.g. 0.01x speed) compared to a different implementation and/or environment. Do you switch languages to eke out that little extra constant factor of work? That literally depends on a hundred other reasons (necessity, skills, coworkers, programmer productivity, the monetary value of your time, familiarity, workarounds, why not assembly or GPU, etc...), which may be more important than performance.
      The above issues, like programming language, are almost never considered as part of the constant factor (nor should they be); yet one should be aware of them, because sometimes (though rarely) they may not be constant. For example in cpython, the native priority queue implementation is asymptotically non-optimal (O(log(N)) rather than O(1) for your choice of insertion or find-min); do you use another implementation? Probably not, since the C implementation is probably faster, and there are probably other similar issues elsewhere. There are tradeoffs; sometimes they matter and sometimes they don't.

      Math addenda
For completeness, the precise definition of big-O notation is as follows: f(x) ∈ O(g(x)) means that "f is asymptotically upper-bounded by const*g": ignoring everything below some finite value of x, there exists a constant such that |f(x)| ≤ const * |g(x)|. (The other symbols are as follows: O means ≤, Ω means ≥. There are lowercase variants: o means <, and ω means >.) f(x) ∈ Ɵ(g(x)) means both f(x) ∈ O(g(x)) and f(x) ∈ Ω(g(x)) (upper- and lower-bounded by g): there exist constants such that f will always lie in the "band" between const1*g(x) and const2*g(x). It is the strongest asymptotic statement you can make and roughly equivalent to ==. (Sorry, I elected to delay the mention of the absolute-value symbols until now, for clarity's sake; especially because I have never seen negative values come up in a computer science context.)
People will often use = O(...). It is technically more correct to use ∈ O(...); ∈ means "is an element of". O(N²) is actually an equivalence class; that is, it is a set of things which we consider to be the same. In this particular case, O(N²) contains elements like {2N², 3N², 1/2 N², 2N² + log(N), -N² + N^1.9, ...} and is infinitely large, but it's still a set. People will know what you mean if you use = however. Additionally, it is often the case that in a casual setting people will say O(...) when they mean Ɵ(...); this is technically true, since the set of things Ɵ(exactlyThis) is a subset of O(noGreaterThanThis)... and it's easier to type. ;-)

      Big O describes an upper limit on the growth behaviour of a function, for example the runtime of a program, when inputs become large.
      Examples:
• O(n): If I double the input size the runtime doubles
• O(n²): If the input size doubles the runtime quadruples
• O(log n): If the input size doubles the runtime increases by one
• O(2^n): If the input size increases by one, the runtime doubles
      The input size is usually the space in bits needed to represent the input.

      Big O notation is most commonly used by programmers as an approximate measure of how long a computation (algorithm) will take to complete expressed as a function of the size of the input set.
      Big O is useful to compare how well two algorithms will scale up as the number of inputs is increased.
      More precisely Big O notation is used to express the asymptotic behavior of a function. That means how the function behaves as it approaches infinity.
      In many cases the "O" of an algorithm will fall into one of the following cases:
      • O(1) - Time to complete is the same regardless of the size of input set. An example is accessing an array element by index.
      • O(Log N) - Time to complete increases roughly in line with the log2(n). For example 1024 items takes roughly twice as long as 32 items, because Log2(1024) = 10 and Log2(32) = 5. An example is finding an item in a binary search tree (BST).
      • O(N) - Time to complete that scales linearly with the size of the input set. In other words if you double the number of items in the input set, the algorithm takes roughly twice as long. An example is counting the number of items in a linked list.
      • O(N Log N) - Time to complete increases by the number of items times the result of Log2(N). An example of this is heap sort and quick sort.
      • O(N^2) - Time to complete is roughly equal to the square of the number of items. An example of this is bubble sort.
      • O(N!) - Time to complete is the factorial of the input set. An example of this is the traveling salesman problem brute-force solution.
      Big O ignores factors that do not contribute in a meaningful way to the growth curve of a function as the input size increases towards infinity. This means that constants that are added to or multiplied by the function are simply ignored.

      Big O is just a way to "Express" yourself in a common way, "How much time / space does it take to run my code?".
You may often see O(n), O(n^2), O(n log n) and so forth; all these are just ways to show how an algorithm scales.
O(n) means Big O is n, and now you might think, "What is n?!" Well, "n" is the number of elements. Imagine you want to search for an item in an array. You would have to look at each element and ask, "Are you the correct element/item?" In the worst case, the item is at the last index, which means it took as much time as there are items in the list; so, to be generic, we say "n is a fair measure of the amount of values".
So then you might understand what "n^2" means, but to be even more specific, consider the simplest of the sorting algorithms: bubble sort. This algorithm needs to look through the whole list, for each item.
      My list
      1. 1
      2. 6
      3. 3
      The flow here would be:
      • Compare 1 and 6, which is biggest? Ok 6 is in the right position, moving forward!
• Compare 6 and 3. Oh, 3 is less! Let's move that. OK, the list changed, we need to start from the beginning now!
This is O(n^2) because you need to look at all items in the list, and there are "n" items. For each item, you look at all items once more for comparing; this is also "n". So for every item you look "n" times, meaning n*n = n^2.
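For completeness, here is a short C sketch of bubble sort (mine; it uses the standard nested-loop form rather than the "restart from the beginning" description above, but the worst-case comparison count is still roughly n × n):

#include <stdio.h>

/* Illustrative only: for each of the n items we scan the list again,
   so the comparisons grow as n * n = n^2 in the worst case. */
void bubble_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++)             /* outer pass: n times            */
        for (int j = 0; j < n - 1 - i; j++)     /* inner scan: compare neighbours */
            if (a[j] > a[j + 1]) {              /* out of order? swap them        */
                int tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
            }
}

int main(void) {
    int list[] = { 1, 6, 3 };                   /* the example list from above    */
    bubble_sort(list, 3);
    printf("%d %d %d\n", list[0], list[1], list[2]);   /* prints 1 3 6 */
    return 0;
}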
      I hope this is as simple as you want it.
But remember, Big O is just a way to express yourself in the manner of time and space.
      Big O describes the fundamental scaling nature of an algorithm.
      There is a lot of information that Big O does not tell you about a given algorithm. It cuts to the bone and gives only information about the scaling nature of an algorithm, specifically how the resource use (think time or memory) of an algorithm scales in response to the "input size".
      Consider the difference between a steam engine and a rocket. They are not merely different varieties of the same thing (as, say, a Prius engine vs. a Lamborghini engine) but they are dramatically different kinds of propulsion systems, at their core. A steam engine may be faster than a toy rocket, but no steam piston engine will be able to achieve the speeds of an orbital launch vehicle. This is because these systems have different scaling characteristics with regards to the relation of fuel required ("resource usage") to reach a given speed ("input size").
Why is this so important? Because software deals with problems that may differ in size by factors up to a trillion. Consider that for a moment. The ratio between the speed necessary to travel to the Moon and human walking speed is less than 10,000:1, and that is absolutely tiny compared to the range in input sizes software may face. And because software may face an astronomical range of input sizes, there is the potential for the Big O complexity of an algorithm, its fundamental scaling nature, to trump any implementation details.
      Consider the canonical sorting example. Bubble-sort is O(n^2) while merge-sort is O(n log n). Let's say you have two sorting applications, application A which uses bubble-sort and application B which uses merge-sort, and let's say that for input sizes of around 30 elements application A is 1,000x faster than application B at sorting. If you never have to sort much more than 30 elements then it's obvious that you should prefer application A, as it is much faster at these input sizes. However, if you find that you may have to sort ten million items then what you'd expect is that application B actually ends up being thousands of times faster than application A in this case, entirely due to the way each algorithm scales.