Monday 24 June 2013

Three O(n^2) Sorting Algorithms

17.1 Bubble Sort
    Bubble Sort is the simplest to code, but one of the worst 
performers.

void BubbleSort(Etype A[], const unsigned int N) 
{ 
  for (unsigned int top = N; top > 1; top--)   // unsorted region is A[1..top]
  {
    for (unsigned int i = 1; i < top; i++)     // bubble the largest up to A[top]
      if (A[i+1] < A[i]) 
        swap(A, i+1, i); 
  }
}
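
The swap routine is not shown in the listing; a minimal sketch consistent with
how it is called above (exchanging the elements at the two given positions
through a temporary) might be:

void swap(Etype A[], unsigned int i, unsigned int j) 
{ 
  Etype tmp = A[i];   // exchange A[i] and A[j] 
  A[i] = A[j]; 
  A[j] = tmp; 
}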

Sort the array A[] = {'\0', 'E', 'C', 'F', 'D', 'A', 'B'} by 
calling BubbleSort(A, 6). The elements of A at the beginning of each 
iteration of the inner loop are (comparisons are shown in boldface): 

ECFDAB
CEFDAB
CEFDAB    5 compares
CEDFAB    4 swaps
CEDAFB
------------------------------
CEDABF
CEDABF    4 compares
CDEABF    3 swaps
CDAEBF
------------------------------
CDABEF    3 compares
CDABEF    2 swaps
CADBEF
------------------------------
CABDEF    2 compares
ACBDEF    2 swaps
------------------------------
ABCDEF    1 compare, 0 swaps
------------------------------
The number of comparisons for bubble sort, C, is given by: 

C = (n - 1) + (n - 2) + ... + 2 + 1

Writing the terms in reverse order, we also have

C =    1    +    2    + ... + (n - 2) + (n - 1)

Adding the two equations:

2C = (n + n + n + ... + n)    NOTE that there are (n - 1) terms

Dividing by 2:

C = n (n - 1) / 2

The number of swaps depends on the data.  In the WORST CASE, when the array is in
reverse order to begin with, the number of swaps is also  n ( n - 1) / 2.
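For the six-element trace above, C = 6 * 5 / 2 = 15 = 5 + 4 + 3 + 2 + 1, matching the per-pass compare counts shown in the trace.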

The nifty proof that the sum of the integers from 1 to n - 1 is n (n - 1) / 2 is due to Gauss.
He is said to have done the proof while in elementary school.


17.2 Selection Sort
         
Selection sort is very much like bubble sort, but swaps only 
once per outer loop iteration. 

void SelectionSort(Etype A[], const unsigned int N) 
{ 
  unsigned int largeposition;                // position of the largest element 
  for (unsigned int top = N; top > 1; top--) 
  {
    largeposition = 1; 
    for (unsigned int j = 2; j <= top; j++)  // find the largest in A[1..top]
    {
      if (A[largeposition] < A[j]) 
        largeposition = j; 
    }
    if (top != largeposition)                // at most one swap per pass
      swap(A, top, largeposition); 
  }
}


Sort the array A[] = {'\0', 'E', 'C', 'F', 'D', 'A', 'B'} by 
calling SelectionSort(A, 6). The elements of A at the beginning of 
each iteration of the inner loop are (comparisons are shown in boldface): 

ECFDAB
ECFDAB
ECFDAB     5 compares
ECFDAB     1 swap
ECFDAB
ECBDAF  (swap)
------------------------------
ECBDAF
ECBDAF     4 compares
ECBDAF     1 swap
ECBDAF
ACBDEF  (swap)
------------------------------
ACBDEF     3 compares
ACBDEF     0 swaps
ACBDEF
ACBDEF  (no swap)
------------------------------
ACBDEF     2 compares
ACBDEF     1 swap
ABCDEF  (swap)
------------------------------
ABCDEF     1 compare; 0 swaps
ABCDEF  (no swap)
The number of comparisons, C, is the same as for BubbleSort, namely 

       C = n (n - 1) / 2   compares.

The number of swaps depends on the data, but there is never more than one per outer-loop iteration.
Therefore, in the worst case, there are (n - 1) swaps made.

Since swaps are typically more expensive than compares, 
SelectionSort is generally a better algorithm than BubbleSort. 


17.3 Insertion Sort
Suppose the array is known to be sorted for the first k elements, k < N
(i.e. the ``left-hand side'' of the array is sorted). We can then ``insert'' the (k + 1)-st 
element into the appropriate place in the already sorted left-hand side, 
making room by moving elements as needed. This increases the size of the 
sorted left-hand side. Continue doing this until the entire array is sorted. 

An array of one element is already sorted. 

void InsertionSort(Etype A[], const unsigned int N) 
{ 
  Etype tmp; 
  unsigned int j;                    // declared here so it survives the inner loop 

  A[0] = MinVal();                   // sentinel: smallest value of Etype 
  for (unsigned int P = 2; P <= N; P++) 
  { 
    tmp = A[P];                      // next element to insert 

    for (j = P; tmp < A[j - 1]; j--) // shift larger elements one slot right 
      A[j] = A[j - 1]; 

    A[j] = tmp; 
  }
}
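
The A[0] = MinVal() sentinel lets the inner loop omit a bounds check on the
left edge. For comparison, here is a minimal sketch of a sentinel-free
variant (an illustration, not the version used in these notes) that guards
the left edge explicitly:

void InsertionSortNoSentinel(Etype A[], const unsigned int N) 
{ 
  for (unsigned int P = 2; P <= N; P++) 
  { 
    Etype tmp = A[P]; 
    unsigned int j = P; 
    while (j > 1 && tmp < A[j - 1])  // explicit j > 1 replaces the sentinel 
    { 
      A[j] = A[j - 1];               // shift the larger element right 
      j--; 
    } 
    A[j] = tmp;                      // drop tmp into its hole 
  }
}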
  
Sort the array A[] = {'\0', 'E', 'C', 'F', 'D', 'A', 'B'} by 
calling InsertionSort(A, 6). The elements of A at the beginning of 
each iteration of the outer loop are (the sorted left-hand side is shown in 
boldface): 

ECFDAB
CEFDAB   1 compare; 1 swap
CEFDAB   1 compare; 0 swaps
CDEFAB   3 compares; 2 swaps
ACDEFB   4 compares; 4 swaps
ABCDEF   5 compares; 4 swaps
The number of comparisons and swaps, in the worst case (array 
reverse-ordered), is n (n - 1) / 2.

The number of comparisons, in the best case (array already sorted), is 
N - 1 (the outer loop goes from 2 through N, and the inner loop test 
always fails). 

The number of swaps, in the best case, is zero. 
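
To tie the three traces together, here is a minimal test driver. It is a
sketch: it assumes Etype is char, that MinVal() returns '\0', and that the
three sort routines above are pasted in alongside it.

#include <stdio.h>

typedef char Etype;              // assumption: the examples sort chars 
Etype MinVal() { return '\0'; }  // sentinel value for InsertionSort 

/* ... swap, BubbleSort, SelectionSort, InsertionSort as defined above ... */

int main() 
{ 
  Etype A[] = {'\0', 'E', 'C', 'F', 'D', 'A', 'B'};  // A[0] is the sentinel slot 
  InsertionSort(A, 6);                   // or BubbleSort / SelectionSort 
  for (unsigned int i = 1; i <= 6; i++)  // print A[1..6]; expect ABCDEF 
    printf("%c", A[i]); 
  printf("\n"); 
  return 0; 
}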

18 Proof by Induction
1. Prove true for a base case. 
2. Assume true for all cases up to some limit (the inductive hypothesis). 
3. Using the assumption in 2, prove true for the limit. 

EXAMPLE: Prove that the sum of the integers from 1 to n is n (n + 1) / 2.

1. Base Case: True for n = 1, since 1 * (1 + 1) / 2 = 1.
2. Inductive Hypothesis: Assume the formula is true for n.
3. Prove it is true for n + 1.

 Substituting (n + 1) for n in our formula, we must show that
the sum of the integers from 1 to (n + 1) = (n + 1)(n + 1 + 1) / 2

The sum of the integers from 1 to (n + 1) 
    = (sum of integers from 1 to n) + (n + 1)
    = 1/2[n(n + 1)] + (n + 1) 
    = 1/2[n(n + 1) + 2(n + 1)] 
    = 1/2[n^2 + 3n + 2] 
    = (n + 1)(n + 2) / 2
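
For readers who prefer it typeset, the same induction step in LaTeX (a
restatement of the algebra above, nothing new):

\sum_{i=1}^{n+1} i = \frac{n(n+1)}{2} + (n+1)
                   = \frac{n(n+1) + 2(n+1)}{2}
                   = \frac{(n+1)(n+2)}{2}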


19 Asymptotic Analysis

DEFINITION: T(n) = O(f(n)) iff there are constants c and n0
such that T(n) <= cf(n) when n >= n0.
This is pronounced T(n) is "Big-Oh" of f(n). It means that the function
f(n) is an upper bound on the function T(n).


DEFINITION: T(n) = Ω(f(n)) iff there are constants c and n0 such that T(n) >= cf(n) when n >= n0.
This is pronounced T(n) is "Omega" of f(n). It means that the function
f(n) is a lower bound on the function T(n).

DEFINITION: T(n) = Θ(f(n)) iff T(n) = O(f(n)) and T(n) = Ω(f(n)).
This is pronounced T(n) is "Theta" of f(n). It means that the function
f(n) is a tight bound on the function T(n).


DEFINITION: T(n) = o(f(n)) iff for every constant c > 0 there is an n0 such that T(n) < cf(n) when n >= n0.
This is pronounced T(n) is ``little-oh'' of f(n). It means that the function
f(n) is a loose upper bound on the function T(n). An alternative way of
defining little-oh is that T(n) = o(f(n)) iff T(n) = O(f(n)) and
T(n) ≠ Θ(f(n)).


Asymptotic analysis is concerned with relative rates of growth.
EXAMPLE: 1000n > n^2 for small values of n, but n^2 grows faster and will
eventually become (and remain) greater than 1000n. It will do so for
n > 1000.

In the analysis of BubbleSort, SelectionSort, and InsertionSort, we
found that the worst-case number of comparisons was n (n - 1) / 2.
What is O(n (n - 1) / 2)?


THEOREM: O(cf(x)) = O(f(x)) (constants don't matter)
PROOF:
T(x) = O(cf(x)) implies that there are constants c0 and n0 such that
T(x) <= c0 (cf(x)) when x >= n0
Therefore, T(x) <= c1 f(x) when x >= n0, where c1 = c0 c
Therefore, T(x) = O(f(x))


THEOREM: Let T1(n) = O(f(n)) and T2(n) = O(g(n)). Then T1(n) + T2(n) = max(O(f(n)), O(g(n))) (the sum rule)
PROOF:
T1(n) <= c1 f(n) for n >= n1
T2(n) <= c2 g(n) for n >= n2
Let n0 = max(n1, n2)
Then, for n >= n0, T1(n) + T2(n) <= c1 f(n) + c2 g(n)
Let c3 = max(c1, c2)
Then
       T1(n) + T2(n) <= c3 f(n) + c3 g(n)
                     <= 2 c3 max(f(n), g(n))
                      = c max(f(n), g(n)),   where c = 2 c3

THEOREM: If T(n) is a polynomial of degree x, then T(n) = O(n^x).
PROOF:
T(n) = n^x + n^(x-1) + . . . + k
By the sum rule above, the largest term dominates.
Therefore, T(n) = O(n^x).
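
In particular, n (n - 1) / 2 = (1/2)n^2 - (1/2)n is a polynomial of degree 2, so the comparison count for all three sorts is O(n^2), answering the question posed above.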

THEOREM: If T1(n) = O(f(n)) and T2(n) = O(g(n)), then T1(n) * T2(n) = O(f(n) g(n)) (the product rule)
PROOF:
T1(n) * T2(n) <= c1 c2 f(n) g(n) <= c f(n) g(n) when n >= n0, where c = c1 c2
Therefore, T1(n) T2(n) = O(f(n) g(n))

THEOREM: log^k n = O(n) for any positive constant k. (Note that log^k n means (log n)^k.)
PROOF:
We must show log^k n <= cn for n >= n0.
Equivalently, we can show log n <= c n^(1/k).
Letting a = 1/k, we will show that log n = O(n^a) for any positive constant a.
Use L'Hospital's rule:

   lim   log n / (c n^a)  =  lim   (log e / n) / (a c n^(a-1))  =  lim   c2 / n^a  =  0,   where c2 = (log e) / (a c)
   n→∞                       n→∞                                   n→∞

Therefore, log n <= c n^a for n >= n0.

THEOREM: n^k = O(a^n) for a > 1
(no polynomial grows faster than an exponential)
PROOF:
Use L'Hospital's rule repeatedly:

   lim   n^k / a^n  =  lim   k n^(k-1) / (a^n ln a)
   n→∞                 n→∞
                    =  lim   k(k-1) n^(k-2) / (a^n ln^2 a)
                       n→∞
                    =  . . .
                    =  lim   k(k-1)(k-2) . . . 1 / (a^n ln^k a)
                       n→∞
                    =  0
 


19.1 Some Common Orders of Growth

Here are some commonly encountered orders of growth, in increasing order:
  • any constant is O(1)
  • log^k n for 0 < k < 1
  • log n
  • log^k n for k > 1
  • n^k for 0 < k < 1
  • n (linear)
  • n log n
  • n^(1+k) for k > 0 (polynomials)
  • 2^n (exponential)
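
To make the ranking concrete, this small sketch tabulates a few of the
orders above for doubling values of n (the choice of columns and n values is
illustrative; 2^n is omitted because it overflows almost immediately):

#include <stdio.h>
#include <math.h>

int main() 
{ 
  // Print log n, n, n log n, and n^2 side by side to show how 
  // quickly the higher orders pull away. 
  printf("%8s %10s %10s %12s %14s\n", "n", "log n", "n", "n log n", "n^2"); 
  for (double n = 16; n <= 4096; n *= 2) 
    printf("%8.0f %10.2f %10.0f %12.1f %14.0f\n", 
           n, log2(n), n, n * log2(n), n * n); 
  return 0; 
}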
