Tuesday, 2 July 2013

Sorting

Review of Sorting Algorithms
  • Selection Sort uses a priority queue P implemented with an unsorted sequence:
    • Phase 1: the insertion of an item into P takes O(1) time; overall O(n)
    • Phase 2: removing an item takes time proportional to the number of elements k in P, Q(k); overall Q(n2)
    • Total time complexity: Q(n2)
  • Insertion Sort is performed on a priority queue implemented as a sorted sequence:
    • Phase 1: the first insertItem takes O(1), the second O(2), until the last insertItem takes O(n); overall O(n2).
    • Phase 2: removing an item takes O(1) time; overall O(n).
    • Total time complexity: O(n2)
  • Heap Sort uses a priority queue implemented as a heap.
    • insertItem and removeMinElement each take O(log k) where k is the number of elements in the heap at a given time.
    • Phase 1: n elements inserted: O(n log n) time.
    • Phase 2: n elements removed: O(n log n) time.
    • Total time complexity: O(n log n)

Divide and Conquer
  • Divide and conquer is more than just a military strategy, it is also a method of algorithm design that has led to efficient algorithms such as merge-sort and quick-sort.
  • This algorithmic approach has three distinct steps:
    • Divide: If the input size is too large to deal with in a trivial (or nearly trivial) manner, divide the data into two or more disjoint subsets.
    • Recurse: Use the same divide-and-conquer algorithm to solve the subproblems associated with the data subsets.
    • Conquer: Take the solutions to these now-solved subproblems and "merge" them into the solution for the original problems.

Merge-Sort
  • Algorithm:
    • Divide: If S has at least two elements (nothing needs to be done if S has zero or one elements), remove all the elements from S and put them into two subsequences, S1 and S2, each containing about half of the elements of S.
      • S1 contains the first én/2ù elements,
      • S2 contains the remaining ën/2û elements.
    • Recurse: Recursively sort subsequences S1 and S2.
    • Conquer: Put back the elements into S by merging the now-sorted subsequences S1 and S2 into a sorted final sequence S.

Merge-Sort Tree
  • A conceptual tool to visualize the execution of merge sort.
  • Not a data structure.
    • Take a binary tree T.
    • Each node of T represents a recursive call of the merge-sort algorithm.
    • We associate with each node v of T the set of input passed to the invocation that v represents.
    • The children of node v are associated with the recursive calls on the subsequences of that node.
    • The external nodes represent base cases and are associated with individual elements of S, on which no further recursion is required.

Merge-Sort Example



Merge-Sort Example (cont.)



Merge-Sort Example (cont.)



Merge-Sort Example (cont.)



Merge-Sort Example (cont.)



Merge-Sort Example (cont.)



Merge-Sort Example (cont.)



Merge-Sort Example (cont.)



Merge-Sort Example (cont.)



Merge-Sort Example (cont.)



Merge-Sort Example (cont.)



Merging Two Sequences
  • Pseudo-Code for the "conquer" step

  • Algorithmmerge(S1, S2, S)
      Input: Sequences S1 and S2, on whose elements a total order relation is defined and which is sorted in nonedecreasing order, and an empty sequence S.Output: Sequence S containing the union of the elements from S1 and S2 sorted in nondecreasing order; sequences S1 and S2 become empty as a result of the execution of this algorithm.
      while
      S1 is not empty and S2 is not empty do
          if S1.first().element()£S2.first().element() then
              // move the first element of S1 to the end of S        S.insertLast(S1.remove(S1.first()))
          else
            // move the last element of S2 to the end of S        S.insertLast(S2.remove(S2.first()))
       // move the remaining elements (if any) of S1 to Swhile S1 is not empty do
          S.insertLast(S1.remove(S1.first()))
      // move the remaining elements (if any) of S2 to S
      while S2 is not empty do
          S.insertLast(S2.remove(S2.first()))

Merge Example
a)
b)


Merge Example (cont.)
c)
d)


Merge Example (cont.)
e)
f)


Merge Example (cont.)
g)
h)


Merge Example (cont.)





i)


Analysis of Merge-Sort
  • Assume that n is a power of 2. The same result can be shown for general n.
  • Proposition 1: The merge-sort tree associated with the execution of merge-sort on a sequence of n elements has a height of .
    • Informal justification: If n is a power of 2, log n equals the number of times that a sequence of size n can be divided in half.

Analysis of Merge-Sort (cont).
  • Proposition 2: A merge-sort algorithm sorts a sequence of size n in O(n log n) time.
  • Justification:
    • Assume that access, insertion, and deletion operations on the first elements of S, S1, and S2 are O(1).
    • When considering the time spent at node v in the recursive tree, only consider the time spent at that node exclusive of the time spent at node v’s children. Then the time spent at node v is proportional to the size of its sequence.
    • Let i be the depth of node v. The size of node v’s sequence is n/2i Þ The cost of node v is O(n/2i).
    • T has exactly 2i nodes at depth i. The time spent in all nodes at depth i is then O(2in/2i) = O(n).
    • Since the tree has O(log n) height, the total cost of merge-sort is O(n log n).

Interface SortObject
public interface SortObject {
// sort sequence S in nondecreasing order using comparator c
publicvoid sort(Sequence S, Comparator c);
}

Class ListMergeSort
public class ListMergeSort implements SortObject {
    publicvoid sort(Sequence S, Comparator c) {
        int n = S.size();
        // a sequence with 0 or 1 element is already sorted
        if (n < 2) return;
        // divide
        Sequence S1 = (Sequence)S.newContainer();
        for (int i=1; i <= (n+1)/2; i++) {
            S1.insertLast(S.remove(S.first()));
        }
        Sequence S2 = (Sequence)S.newContainer();
        for (int i=1; i <= n/2; i++) {
            S2.insertLast(S.remove(S.first()));
        }
        // recur
        sort(S1,c);
        sort(S2,c);
        //conquer
        merge(S1,S2,c,S);
    }

Class ListMergeSort (cont.)
    public void merge(Sequence S1, Sequence S2, Comparator c, Sequence S) {
        while(!S1.isEmpty() && !S2.isEmpty()) {
            if(c.isLessThanOrEqualTo(S1.first().element(),S2.first().element())) {
                S.insertLast(S1.remove(S1.first()));
            }
            else {
                S.insertLast(S2.remove(S2.first()));
            }
        }
        if(S1.isEmpty()) {
            while(!S2.isEmpty()) {
                S.insertLast(S2.remove(S2.first()));
            }
        }
        if(S2.isEmpty()) {
            while(!S1.isEmpty()) {
                S.insertLast(S1.remove(S1.first()));
            }
        }
    }
}

Quick-Sort
  • Quick-sort
    • is another divide-and-conquer sorting algorithm,
    • can be performed "in-place" meaning that only a constant amount of memory is needed in addition to the memory required for the objects being sorted, and
    • is probably the most commonly used sorting algorithm.
  • High-level description:
1) Divide: If the sequence S has two or more elements, select an element x from S to be the pivot. Any arbitrary element such as the last element, will do. Subdivide S into 3 subsequences:
    • L, holds the elements of S £x.
    • E, the pivot element (single-element sequence)
    • G, holds the elements of S ³x.
2) Recurse: Recursively execute the quick-sort algorithm on L and G. 3) Conquer: A simple concatenation of L + E + G.  
Quick-Sort Tree



Quick-Sort Tree (cont.)

 
 



Quick-Sort Tree (cont.)

 
 



Quick-Sort Tree (cont.)

 
 



Quick-Sort Tree (cont.)

 
 



Quick-Sort Tree (cont.)



Quick-Sort Tree (cont.)

 
 



Quick-Sort Tree (cont.)



Quick-Sort Tree (cont.)

 
 



Quick-Sort Divide Step
  • In-place quick-sort: The in-place algorithm uses array indices l and r to partitions the array-based sequence into subsequences L, E, G.
    • l scans the sequence from the left, and r from the right.
    • A swap is performed when l is at an element larger than the pivot and r is at one smaller than the pivot.

Quick-Sort Partition Step (cont.)
    • A final swap with the pivot completes the divide (i.e., partition) step.

Analysis of Quick-Sort
  • Consider a quick-sort tree T:
    • Let si(n) denote the sum of the input sizes of the nodes at depth i in T.
  • We know that s0(n) = n since the root of T is associated with the entire input set.
  • Also, s1(n) = n-1, since the pivot is not propagated.
  • Thus, either s2(n) = n – 3, or, if one of the subsequences at level 1 has zero input size, s2(n) = n - 2 in the worst case.
  • In general, si(n) = ni in the worst case.
  • The worst-case running time of quick-sort is then:
  • The worst-case running time of quick-sort is O(n2). This occurs when the input sequence is sorted. Nearly worst-case performance occurs on nearly sorted sequences.

Analysis of Quick-Sort (cont.)
  • Now to look at the best-case running time of quick-sort:
  • We can see that quick-sort behaves optimally if, whenever a sequence S is divided into subsequences L and G, the subsequences are of equal size.
  • More precisely,
  • The height of the tree is the maximum value of i for which
  • That is to say, the height of the quick-sort tree in the best case is h = log n, and the best-case performance of quick-sort is O(n log n).

Randomized Quick-Sort
  • The main drawback to quick-sort is that it achieves its worst-case time complexity on data sets that are common in practice, namely sequences that are already sorted (or mostly sorted).
  • To avoid this, modifications to quick-sort are available that are more careful about choosing the pivot.
  • One such modification chooses a random element of the sequence. This modified version is known as randomized quick-sort.
  • The expected time of a randomized quick-sort on a sequence of size n is O(n log n).
  • Justification: see next page.

Average-Case Analysis of Quick-Sort
Cost of quick-sort on n items = partition step + cost of 2 recursive calls
T(n) = (an+b)+ TL(n) + TG(n)
Assume that the size of any subsequence from 0 to n-1 is equally likely.


which implies that
Subtract the above two equations
Rearrange and drop insignificant constant terms


Average-Case Analysis of Quick-Sort (cont.)
Divide by n(n+1)
Solve the recursive relationship
where Hn is the nth Harmonic number which can be bounded above and below by .
Thus .

Selection and Order Statistics
Order Statistic - Identify a single element relative to its rank in the collection.
example: median, element that would be at rank
     in a sorted collection.
Selection problem – select the kth smallest element from a collection of n similar elements.  
 
Approaches to the selection problem
  • Sort the collection: 
using comparison-based sorting algorithm.
  • Randomized quick-select algorithm
    • O(n2) worst-case performance, but
    • O(n) expected performance
    • Similar to randomized quick-sort. Chooses an arbitrary element from the collection as the pivot and recursively calls itself.

Randomized Quick-Select
Algorithm quickSelect(S,k):
Input: Sequence S of n comparable elements Input: integer 
Output: The kth smallest element of S if n = 1 then
    return the (first) element of S
Pick a random integer r in the range [0,n-1]
Let x be the element of S at rank r
Remove all the elements from S and put them into three sequences:
  • L, storing the elements in S less than x
  • E, storing the elements in S equal to x
  • G, storing the elements in S greater than x
if k £L.size()then
    quickSelect(L,k)
else if k £L.size() + E.size() then
    // each element in E is equal to x
    return x
else

    // note the new selection parameter
    quickSelect(G,k-L.size()-E.size())