Monday, 24 June 2013

Three O(n^2) Sorting Algorithms

17.1 Bubble Sort
    Bubble Sort is the simplest to code, but one of the worst 
performers.

void BubbleSort(Etype A[], const unsigned int N)
{
  // Data occupy A[1..N]; A[0] is unused by this routine.
  for (unsigned int top = N; top > 1; top--)
  {
    // One pass bubbles the largest remaining element up to A[top].
    for (unsigned int i = 1; i < top; i++)
      if (A[i+1] < A[i])
        swap(A, i+1, i);
  }
}
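
The swap helper used here (and in SelectionSort below) is not shown in
the notes; a minimal version, assuming it takes the array and the two
positions to exchange, would be:

// Assumed helper (not defined in the notes): exchange A[i] and A[j].
void swap(Etype A[], unsigned int i, unsigned int j)
{
  Etype tmp = A[i];
  A[i] = A[j];
  A[j] = tmp;
}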

Sort the array A[] = {'\0', 'E', 'C', 'F', 'D', 'A', 'B'} by
calling BubbleSort(A, 6). (The code is 1-based, so A[0] holds a
placeholder.) The elements of A at the beginning of each iteration of
the inner loop are:

ECFDAB
CEFDAB
CEFDAB    5 compares
CEDFAB    4 swaps
CEDAFB
------------------------------
CEDABF
CEDABF    4 compares
CDEABF    3 swaps
CDAEBF
------------------------------
CDABEF    3 compares
CDABEF    2 swaps
CADBEF
------------------------------
CABDEF    2 compares
ACBDEF    2 swaps
------------------------------
ABCDEF    1 compare, 0 swaps
------------------------------
The number of comparisons for bubble sort, C, is given by: 

C = (n - 1) + (n - 2) + ... + 2 + 1

Writing the terms in reverse order, we also have

C = 1 + 2 + ... + (n - 2) + (n - 1)

Adding the two equations:

2C = n + n + ... + n        NOTE that there are (n - 1) terms

Dividing by 2:

C = n (n - 1) / 2

The number of swaps depends on the data. In the WORST CASE, when the array is in
reverse order to begin with, the number of swaps is also n (n - 1) / 2.

The nifty proof that the sum of the integers from 1 to n - 1 is n (n - 1) / 2 is due
to Gauss, who is said to have found it while in elementary school.
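
As a sanity check on the formula, here is a small instrumented version
of BubbleSort (a sketch, not part of the original notes; the counters
and driver are additions) that tallies compares and swaps on the
example array:

#include <iostream>

typedef char Etype;

// Count compares and swaps so the n(n-1)/2 bound can be checked.
void CountingBubbleSort(Etype A[], const unsigned int N,
                        unsigned int &compares, unsigned int &swaps)
{
  compares = swaps = 0;
  for (unsigned int top = N; top > 1; top--)
    for (unsigned int i = 1; i < top; i++)
    {
      compares++;
      if (A[i+1] < A[i])
      {
        Etype tmp = A[i+1]; A[i+1] = A[i]; A[i] = tmp;
        swaps++;
      }
    }
}

int main()
{
  Etype A[] = {'\0', 'E', 'C', 'F', 'D', 'A', 'B'};
  unsigned int compares, swaps;
  CountingBubbleSort(A, 6, compares, swaps);
  // Prints "15 compares, 11 swaps": 15 = 6 * 5 / 2, and 11 matches
  // the swap counts in the trace above (4 + 3 + 2 + 2 + 0).
  std::cout << compares << " compares, " << swaps << " swaps\n";
  return 0;
}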


17.2 Selection Sort

Selection sort is very much like bubble sort, but swaps at most once
per outer-loop iteration.

void SelectionSort(Etype A[], const unsigned int N)
{
  unsigned int largeposition; // position of the largest element in A[1..top]
  for (unsigned int top = N; top > 1; top--)
  {
    // Find the largest element in A[1..top] ...
    largeposition = 1;
    for (unsigned int j = 2; j <= top; j++)
    {
      if (A[largeposition] < A[j])
        largeposition = j;
    }
    // ... and move it into place with (at most) one swap.
    if (top != largeposition)
      swap(A, top, largeposition);
  }
}


Sort the array A[] = {'\0', 'E', 'C', 'F', 'D', 'A', 'B'} by
calling SelectionSort(A, 6). The elements of A at the beginning of
each iteration of the inner loop are:

ECFDAB
ECFDAB
ECFDAB     5 compares
ECFDAB     1 swap
ECFDAB
ECBDAF  (swap)
------------------------------
ECBDAF
ECBDAF     4 compares
ECBDAF     1 swap
ECBDAF
ACBDEF  (swap)
------------------------------
ACBDEF     3 compares
ACBDEF     0 swaps
ACBDEF
ACBDEF  (no swap)
------------------------------
ACBDEF     2 compares
ACBDEF     1 swap
ABCDEF  (swap)
------------------------------
ABCDEF     1 compare; 0 swaps
ABCDEF  (no swap)
The number of comparisons, C, is the same as for BubbleSort, namely

       C = n (n - 1) / 2   compares.

The number of swaps depends on the data, but there is never more than one per outer-loop iteration.
Therefore, in the worst case, there are (n - 1) swaps made.

Since swaps are typically more expensive than compares, 
SelectionSort is generally a better algorithm than BubbleSort. 
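For example, with n = 6 and a reverse-ordered array, both algorithms
make 15 compares, but SelectionSort makes at most 5 swaps while
BubbleSort makes 15.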


17.3 Insertion Sort
Suppose the array is known to be sorted for the first k elements, k < N
(i.e. the ``left-hand side'' of the array is sorted). We can then ``insert'' the (k + 1)-st
element into the appropriate place in the already sorted left-hand side,
making room by moving elements as needed. This increases the size of the
sorted left-hand side. Continue doing this until the entire array is sorted.

An array of one element is already sorted. 

void InsertionSort(Etype A[], const unsigned int N)
{
  Etype tmp;
  unsigned int j;

  A[0] = MinVal();                   // sentinel: smallest value of Etype
  for (unsigned int P = 2; P <= N; P++)
  {
    tmp = A[P];                      // the element to insert

    // Shift larger elements right; the sentinel at A[0] stops the loop.
    for (j = P; tmp < A[j - 1]; j--)
      A[j] = A[j - 1];

    A[j] = tmp;
  }
}
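
MinVal is used but never defined in the notes; for the character data
in the example below, a minimal version might be:

// Assumed helper: returns a sentinel smaller than any value that can
// appear in the data (the example below uses '\0' in A[0]).
Etype MinVal()
{
  return '\0';
}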
  
Sort the array A[] = {'\0', 'E', 'C', 'F', 'D', 'A', 'B'} by
calling InsertionSort(A, 6). The elements of A at the beginning of
each iteration of the inner loop are (the sorted left-hand side grows
by one element each time):

ECFDAB
CEFDAB   1 compare; 1 swap
CEFDAB   1 compare; 0 swaps
CDEFAB   3 compares; 2 swaps
ACDEFB   4 compares; 4 swaps
ABCDEF   5 compares; 4 swaps
The number of comparisons and swaps, in the worst case (array
reverse-ordered), is n (n - 1) / 2.

The number of comparisons, in the best case (array already sorted), is
N - 1 (the outer loop goes from 2 through N, and the inner loop test
always fails).

The number of swaps, in the best case, is zero. 
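
Putting it together, a minimal driver (an addition to the notes,
assuming the InsertionSort and MinVal definitions above are in scope)
shows the 1-based convention with the sentinel slot at A[0]:

#include <iostream>

typedef char Etype;

int main()
{
  // A[0] is the sentinel slot; the data occupy A[1..6].
  Etype A[] = {'\0', 'E', 'C', 'F', 'D', 'A', 'B'};

  InsertionSort(A, 6);

  for (unsigned int i = 1; i <= 6; i++)
    std::cout << A[i];                  // prints ABCDEF
  std::cout << '\n';
  return 0;
}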

18 Proof by Induction
1. Prove the claim for a base case.
2. Assume the claim holds for all cases up to some limit n (the inductive hypothesis).
3. Using the assumption in 2, prove the claim for n + 1.

EXAMPLE: Prove the sum of the integers from 1 to n is n (n + 1) / 2.

1. Base Case: True for n = 1, since 1 * (1 + 1) / 2 = 1.
2. Inductive Hypothesis: Assume true for n.
3. Prove true for n + 1.

Substituting (n + 1) for n in our formula, we must show that
the sum of the integers from 1 to (n + 1) = (n + 1)((n + 1) + 1) / 2.

The sum of the integers from 1 to (n + 1)
    = (sum of the integers from 1 to n) + (n + 1)
    = 1/2[n(n + 1)] + (n + 1)            (by the inductive hypothesis)
    = 1/2[n(n + 1) + 2(n + 1)]
    = 1/2[n^2 + 3n + 2]
    = (n + 1)(n + 2) / 2
    = (n + 1)((n + 1) + 1) / 2


19 Asymptotic Analysis

DEFINITION: T(n) = O(f(n)) iff there are constants c and n0
such that T(n) <= cf(n) when n >= n0.
This is pronounced T(n) is "Big-Oh" of f(n). It means that the function
f(n) is an upper bound on the function T(n).
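
EXAMPLE: T(n) = 3n^2 + 5 is O(n^2): take c = 4 and n0 = 3; then
3n^2 + 5 <= 4n^2 whenever n >= 3.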


DEFINITION: T(n) = Omega(f(n)) iff there are constants c and n0 such that T(n) >= cf(n) when n >= n0.
This is pronounced T(n) is "Omega" of f(n). It means that the function
f(n) is a lower bound on the function T(n).

DEFINITION: T(n) = Theta(f(n)) iff T(n) = O(f(n)) and T(n) = Omega(f(n)).
This is pronounced T(n) is "Theta" of f(n). It means that the function
f(n) is a tight bound on the function T(n).
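
EXAMPLE: T(n) = n(n - 1)/2 is Theta(n^2): n(n - 1)/2 <= (1/2)n^2 for
all n >= 1, so it is O(n^2); and n(n - 1)/2 >= (1/4)n^2 for n >= 2, so
it is also Omega(n^2).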


DEFINITION: T(n) = o(f(n)) iff for every constant c > 0 there is an n0 such that T(n) < cf(n) when n >= n0.
This is pronounced T(n) is ``little-oh'' of f(n). It means that the function
f(n) is a loose upper bound on the function T(n). An alternative way of
defining little-oh is that T(n) = o(f(n)) iff T(n) = O(f(n)) and
T(n) is not Theta(f(n)).
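
EXAMPLE: n = o(n^2): for any constant c > 0, n < cn^2 once n > 1/c.
But n^2 is not o(n^2), since n^2 < cn^2 fails for every c <= 1.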


Asymptotic analysis is concerned with relative rates of growth.
EXAMPLE: 1000n > n^2 for small values of n, but n^2 grows faster and will
eventually become (and remain) greater than 1000n. It does so for
n > 1000.

In the analysis of BubbleSort, SelectionSort, and InsertionSort, we
found that the worst case number of comparisons was n (n - 1) / 2.
What is O(n (n - 1) / 2)?


THEOREM: O(cf(x)) = O(f(x)) (constants don't matter)
PROOF:
T(x) = O(cf(x)) implies that there are constants c0 and n0 such that
T(x) <= c0 (cf(x)) when x >= n0.
Therefore, T(x) <= c1 f(x) when x >= n0, where c1 = c0 c.
Therefore, T(x) = O(f(x)).


THEOREM: Let T1(n) = O(f(n)) and T2(n) = O(g(n)). Then T1(n) + T2(n) = max(O(f(n)), O(g(n))) (the sum rule)
PROOF:
T1(n) <= c1 f(n) for n >= n1
T2(n) <= c2 g(n) for n >= n2
Let n0 = max(n1, n2)
Then, for n >= n0, T1(n) + T2(n) <= c1 f(n) + c2 g(n)
Let c3 = max(c1, c2)
Then
       T1(n) + T2(n) <= c3 f(n) + c3 g(n)
                     <= 2 c3 max(f(n), g(n))
                     <= c max(f(n), g(n)),   where c = 2 c3

THEOREM: If T(n) is a polynomial of degree x, then T(n) = O(n^x)
PROOF:
T(n) = a_x n^x + a_(x-1) n^(x-1) + . . . + a_0
By the sum rule above, the largest term dominates.
Therefore, T(n) = O(n^x).
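
Applying this to the question above: n(n - 1)/2 = (1/2)n^2 - (1/2)n is
a polynomial of degree 2, so the worst case number of comparisons for
all three sorts is O(n^2).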

THEOREM: If T1(n) = O(f(n)) and T2(n) = O(g(n)), then T1(n) * T2(n) = O(f(n)g(n)) (the product rule)
PROOF:
T1(n) * T2(n) <= c1 c2 f(n)g(n) <= c f(n)g(n) when n >= n0, where c = c1 c2
Therefore, T1(n) * T2(n) = O(f(n)g(n)).

THEOREM: log^k n = O(n) for any positive constant k. (Note that log^k n means (log n)^k.)
PROOF:
We must show log^k n <= cn for n >= n0.
Equivalently, we can show log n <= c n^(1/k).
Letting a = 1/k,
we will show that log n = O(n^a) for any positive constant a.
Use L'Hospital's rule (all limits are as n -> infinity):

lim (log n) / (c n^a) = lim ((log e) / n) / (c a n^(a-1)) = lim c2 / n^a = 0,   where c2 = (log e) / (c a)

Therefore, log n <= c n^a for n >= n0.

THEOREM: n^k = O(a^n) for a > 1
(no polynomial is bigger than an exponential)
PROOF:
Use L'Hospital's rule repeatedly (all limits are as n -> infinity):

lim n^k / a^n = lim k n^(k-1) / (a^n ln a)

              = lim k(k-1) n^(k-2) / (a^n ln^2 a)

              = . . .

              = lim k(k-1)(k-2) . . . 1 / (a^n ln^k a)

              = 0
 


19.1 Some Common Orders of Growth

Here are some commonly encountered orders of growth, in increasing order:
  • any constant is O(1)
  • log^k n for k < 1
  • log n
  • log^k n for k > 1
  • n^k for k < 1
  • n (linear)
  • n log n
  • n^(1+k) for k > 0 (polynomials)
  • 2^n (exponential)
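
To make the ordering concrete, here is a quick sketch (an addition to
the notes) that tabulates a few of these functions side by side:

#include <cmath>
#include <cstdio>

// Print selected growth functions for a few values of n, illustrating
// the increasing order of growth listed above.
int main()
{
  printf("%8s %10s %12s %8s %16s\n", "n", "log n", "n log n", "n^2", "2^n");
  for (int n = 10; n <= 40; n += 10)
  {
    double lg = std::log2(static_cast<double>(n));
    printf("%8d %10.2f %12.2f %8d %16.0f\n",
           n, lg, n * lg, n * n, std::pow(2.0, n));
  }
  return 0;
}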
