Review of Sorting Algorithms
- Selection Sort uses a priority queue P implemented with an unsorted sequence:
- Phase 1: the insertion of an item into P takes O(1) time; overall O(n)
- Phase 2: removing an item takes time proportional to the number of elements k currently in P, i.e., Θ(k); overall Θ(n^{2})
- Total time complexity: Θ(n^{2})
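The two phases can be sketched in Java. This is a minimal illustration (the class and method names are our own), using an ArrayList as the unsorted sequence backing P:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SelectionSortPQ {
    public static int[] sort(int[] input) {
        // Phase 1: each insertion into the unsorted sequence is O(1); O(n) overall.
        List<Integer> pq = new ArrayList<>();
        for (int x : input) pq.add(x);
        // Phase 2: each removal scans all k remaining elements for the minimum,
        // Θ(k) per removal; Θ(n^2) overall.
        int[] out = new int[input.length];
        for (int i = 0; i < out.length; i++) {
            int minIdx = 0;
            for (int j = 1; j < pq.size(); j++) {
                if (pq.get(j) < pq.get(minIdx)) minIdx = j;
            }
            out[i] = pq.remove(minIdx);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sort(new int[]{5, 2, 4, 1, 3})));
        // prints [1, 2, 3, 4, 5]
    }
}
```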
- Insertion Sort is performed on a priority queue implemented as a sorted sequence:
- Phase 1: the first insertItem takes O(1), the second O(2), and so on, until the last insertItem takes O(n); overall O(n^{2}).
- Phase 2: removing an item takes O(1) time; overall O(n).
- Total time complexity: O(n^{2})
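Here is a corresponding sketch (names are our own), with a LinkedList standing in for the sorted sequence so that front removal is O(1):

```java
import java.util.LinkedList;

public class InsertionSortPQ {
    public static int[] sort(int[] input) {
        // Phase 1: each item is inserted at its sorted position; the kth
        // insertion may scan up to k elements, so the phase is O(n^2) overall.
        LinkedList<Integer> pq = new LinkedList<>();
        for (int x : input) {
            int pos = 0;
            for (int y : pq) {
                if (y > x) break;
                pos++;
            }
            pq.add(pos, x);
        }
        // Phase 2: removing the first (minimum) element is O(1); O(n) overall.
        int[] out = new int[input.length];
        for (int i = 0; i < out.length; i++) out[i] = pq.removeFirst();
        return out;
    }
}
```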
- Heap Sort uses a priority queue implemented as a heap.
- insertItem and removeMinElement each take O(log k) where k is the number of elements in the heap at a given time.
- Phase 1: n elements inserted: O(n log n) time.
- Phase 2: n elements removed: O(n log n) time.
- Total time complexity: O(n log n)
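Using the standard library's binary heap, the two phases look like this (a sketch; java.util.PriorityQueue plays the role of the heap-based priority queue):

```java
import java.util.PriorityQueue;

public class HeapSortPQ {
    public static int[] sort(int[] input) {
        // Phase 1: n insertions into a binary heap, O(log k) each: O(n log n).
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int x : input) heap.offer(x);
        // Phase 2: n removals of the minimum, O(log k) each: O(n log n).
        int[] out = new int[input.length];
        for (int i = 0; i < out.length; i++) out[i] = heap.poll();
        return out;
    }
}
```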
- Divide and conquer is more than just a military strategy; it is also a method of algorithm design that has led to efficient algorithms such as merge-sort and quick-sort.
- This algorithmic approach has three distinct steps:
- Divide: If the input size is too large to deal with in a trivial (or nearly trivial) manner, divide the data into two or more disjoint subsets.
- Recurse: Use the same divide-and-conquer algorithm to solve the subproblems associated with the data subsets.
- Conquer: Take the solutions to these now-solved subproblems and "merge" them into the solution for the original problem.
- Algorithm:
- Divide: If S has at least two elements (nothing needs to be done if S has zero or one elements), remove all the elements from S and put them into two subsequences, S_{1} and S_{2}, each containing about half of the elements of S.
- S_{1} contains the first ⌈n/2⌉ elements,
- S_{2} contains the remaining ⌊n/2⌋ elements.
- Recurse: Recursively sort subsequences S_{1} and S_{2}.
- Conquer: Put back the elements into S by merging the now-sorted subsequences S_{1} and S_{2} into a sorted final sequence S.
Merge-Sort Tree
- A conceptual tool to visualize the execution of merge sort.
- Not a data structure.
- Take a binary tree T.
- Each node of T represents a recursive call of the merge-sort algorithm.
- We associate with each node v of T the set of input passed to the invocation that v represents.
- The children of node v are associated with the recursive calls on the subsequences of that node.
- The external nodes represent base cases and are associated with individual elements of S, on which no further recursion is required.
Merge-Sort Example
Merging Two Sequences
- Pseudo-Code for the "conquer" step
Algorithm merge(S_{1}, S_{2}, S)
Input: Sequences S_{1} and S_{2}, on whose elements a total order relation is defined and which are sorted in nondecreasing order, and an empty sequence S.
Output: Sequence S containing the union of the elements from S_{1} and S_{2} sorted in nondecreasing order; sequences S_{1} and S_{2} become empty as a result of the execution of this algorithm.

while S_{1} is not empty and S_{2} is not empty do
    if S_{1}.first().element() ≤ S_{2}.first().element() then
        // move the first element of S_{1} to the end of S
        S.insertLast(S_{1}.remove(S_{1}.first()))
    else
        // move the first element of S_{2} to the end of S
        S.insertLast(S_{2}.remove(S_{2}.first()))
// move the remaining elements (if any) of S_{1} to S
while S_{1} is not empty do
    S.insertLast(S_{1}.remove(S_{1}.first()))
// move the remaining elements (if any) of S_{2} to S
while S_{2} is not empty do
    S.insertLast(S_{2}.remove(S_{2}.first()))
Merge Example
Analysis of Merge-Sort
- Assume that n is a power of 2. The same result can be shown for general n.
- Proposition 1: The merge-sort tree associated with the execution of merge-sort on a sequence of n elements has a height of log n.
- Informal justification: If n is a power of 2, log n equals the number of times that a sequence of size n can be divided in half.
- Proposition 2: A merge-sort algorithm sorts a sequence of size n in O(n log n) time.
- Justification:
- Assume that access, insertion, and deletion operations on the first elements of S, S1, and S2 are O(1).
- When considering the time spent at node v in the recursive tree, only consider the time spent at that node exclusive of the time spent at node v’s children. Then the time spent at node v is proportional to the size of its sequence.
- Let i be the depth of node v. The size of node v's sequence is n/2^{i} ⇒ the cost of node v is O(n/2^{i}).
- T has exactly 2^{i} nodes at depth i. The time spent in all nodes at depth i is then O(2^{i}n/2^{i}) = O(n).
- Since the tree has O(log n) height, the total cost of merge-sort is O(n log n).
Interface SortObject
public interface SortObject {
  // sort sequence S in nondecreasing order using comparator c
  public void sort(Sequence S, Comparator c);
}
public class ListMergeSort implements SortObject {
  public void sort(Sequence S, Comparator c) {
    int n = S.size();
    // a sequence with 0 or 1 elements is already sorted
    if (n < 2) return;
    // divide: S1 gets the first (n+1)/2 elements, S2 the rest
    Sequence S1 = (Sequence) S.newContainer();
    for (int i = 1; i <= (n + 1) / 2; i++) {
      S1.insertLast(S.remove(S.first()));
    }
    Sequence S2 = (Sequence) S.newContainer();
    for (int i = 1; i <= n / 2; i++) {
      S2.insertLast(S.remove(S.first()));
    }
    // recur
    sort(S1, c);
    sort(S2, c);
    // conquer
    merge(S1, S2, c, S);
  }
Class ListMergeSort (cont.)
  public void merge(Sequence S1, Sequence S2, Comparator c, Sequence S) {
    while (!S1.isEmpty() && !S2.isEmpty()) {
      if (c.isLessThanOrEqualTo(S1.first().element(), S2.first().element())) {
        S.insertLast(S1.remove(S1.first()));
      }
      else {
        S.insertLast(S2.remove(S2.first()));
      }
    }
    if (S1.isEmpty()) {
      while (!S2.isEmpty()) {
        S.insertLast(S2.remove(S2.first()));
      }
    }
    if (S2.isEmpty()) {
      while (!S1.isEmpty()) {
        S.insertLast(S1.remove(S1.first()));
      }
    }
  }
}
- Quick-sort
- is another divide-and-conquer sorting algorithm,
- can be performed "in-place" meaning that only a constant amount of memory is needed in addition to the memory required for the objects being sorted, and
- is probably the most commonly used sorting algorithm.
- High-level description: pick a pivot element x from S, partition S into the three sequences below, recursively sort L and G, and concatenate L, E, and G back into S.
- L, holds the elements of S ≤ x.
- E, the pivot element itself (a single-element sequence).
- G, holds the elements of S ≥ x.
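This L/E/G scheme can be sketched in Java as follows (class name and the choice of last element as pivot are ours; java.util.List stands in for the Sequence ADT, and with strict comparisons E collects every element equal to the pivot, which also handles duplicates gracefully):

```java
import java.util.ArrayList;
import java.util.List;

public class ThreeWayQuickSort {
    public static List<Integer> sort(List<Integer> s) {
        if (s.size() < 2) return s;  // base case: already sorted
        int x = s.get(s.size() - 1); // pivot (the choice is arbitrary here)
        // Divide: split S into L (< x), E (= x), G (> x).
        List<Integer> l = new ArrayList<>(), e = new ArrayList<>(), g = new ArrayList<>();
        for (int v : s) {
            if (v < x) l.add(v);
            else if (v == x) e.add(v);
            else g.add(v);
        }
        // Recurse on L and G, then conquer by concatenating L, E, G.
        List<Integer> out = new ArrayList<>(sort(l));
        out.addAll(e);
        out.addAll(sort(g));
        return out;
    }
}
```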
Quick-Sort Tree
Quick-Sort Divide Step
- In-place quick-sort: The in-place algorithm uses array indices l and r to partition the array-based sequence into subsequences L, E, G.
- l scans the sequence from the left, and r from the right.
- A swap is performed when l is at an element larger than the pivot and r is at one smaller than the pivot.
Quick-Sort Partition Step (cont.)
- A final swap with the pivot completes the divide (i.e., partition) step.
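The scan-and-swap scheme above can be sketched as follows. This is a simplified version under our own index conventions (the exact handling of keys equal to the pivot varies across presentations), with the pivot kept at the right end until the final swap:

```java
public class InPlacePartition {
    // Partitions a[lo..hi] around the pivot a[hi]; returns the pivot's final index.
    // l scans right for an element larger than the pivot, r scans left for a
    // smaller one; they swap until the indices cross, then the pivot is swapped in.
    public static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int l = lo, r = hi - 1;
        while (l <= r) {
            while (l <= r && a[l] <= pivot) l++;
            while (l <= r && a[r] >= pivot) r--;
            if (l < r) { int t = a[l]; a[l] = a[r]; a[r] = t; }
        }
        int t = a[l]; a[l] = a[hi]; a[hi] = t;  // final swap with the pivot
        return l;
    }

    public static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1);
        quickSort(a, p + 1, hi);
    }
}
```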
Analysis of Quick-Sort
- Consider a quick-sort tree T:
- Let s_{i}(n) denote the sum of the input sizes of the nodes at depth i in T.
- We know that s_{0}(n) = n since the root of T is associated with the entire input set.
- Also, s_{1}(n) = n-1, since the pivot is not propagated.
- Thus, either s_{2}(n) = n – 3, or, if one of the subsequences at level 1 has zero input size, s_{2}(n) = n - 2 in the worst case.
- In general, s_{i}(n) = n – i in the worst case.
- The worst-case running time of quick-sort is then O(n + (n-1) + (n-2) + ... + 1) = O(n^{2}). This occurs when the input sequence is already sorted, and nearly worst-case performance occurs on nearly sorted sequences.
- Now to look at the best-case running time of quick-sort:
- We can see that quick-sort behaves optimally if, whenever a sequence S is divided into subsequences L and G, the subsequences are of equal size.
- More precisely, in the best case each node at depth i has an input of size at most n/2^{i}, so each level of the tree costs O(n) in total.
- The height of the tree is the maximum value of i for which n/2^{i} ≥ 1, namely i = log n.
- That is to say, the height of the quick-sort tree in the best case is h = log n, and the best-case performance of quick-sort is O(n log n).
- The main drawback to quick-sort is that it achieves its worst-case time complexity on data sets that are common in practice, namely sequences that are already sorted (or mostly sorted).
- To avoid this, modifications to quick-sort are available that are more careful about choosing the pivot.
- One such modification chooses a random element of the sequence. This modified version is known as randomized quick-sort.
- The expected time of a randomized quick-sort on a sequence of size n is O(n log n).
- Justification: see the derivation below.
Cost of quick-sort on n items = cost of the partition step + cost of the 2 recursive calls.
Assume that every subsequence size from 0 to n-1 is equally likely:

T(n) = n + (2/n) Σ_{k=0}^{n-1} T(k)

which implies that

n T(n) = n^{2} + 2 Σ_{k=0}^{n-1} T(k)
(n-1) T(n-1) = (n-1)^{2} + 2 Σ_{k=0}^{n-2} T(k)

Subtract the above two equations:

n T(n) - (n-1) T(n-1) = 2n - 1 + 2 T(n-1)

Rearrange and drop insignificant constant terms:

n T(n) = (n+1) T(n-1) + 2n

Average-Case Analysis of Quick-Sort (cont.)
Divide by n(n+1):

T(n)/(n+1) = T(n-1)/n + 2/(n+1)

Solve the recursive relationship by telescoping:

T(n)/(n+1) = T(1)/2 + 2 Σ_{k=3}^{n+1} 1/k ≤ 2 H_{n+1}

where H_{n} is the n^{th} Harmonic number, which can be bounded above and below by ln n (ln n ≤ H_{n} ≤ 1 + ln n).
Thus T(n) = O(n log n).
Selection and Order Statistics
- Order statistic: identify a single element relative to its rank in the collection. Example: the median, the element that would be at rank ⌈n/2⌉ in sorted order.
- Selection problem: select the kth smallest element from a collection of n comparable elements.
- One approach: sort the collection, then pick the element at rank k; this takes O(n log n) time.
- Randomized quick-select algorithm
- O(n^{2}) worst-case performance, but
- O(n) expected performance
- Similar to randomized quick-sort: it chooses a random element of the collection as the pivot and recursively calls itself on only one of the resulting subsequences.
Algorithm quickSelect(S, k):
Input: Sequence S of n comparable elements; integer k in the range [1, n]
Output: The kth smallest element of S

if n = 1 then
    return the (first) element of S
Pick a random integer r in the range [0, n-1]
Let x be the element of S at rank r
Remove all the elements from S and put them into three sequences:
- L, storing the elements in S less than x
- E, storing the elements in S equal to x
- G, storing the elements in S greater than x
if k ≤ L.size() then
    return quickSelect(L, k)
else if k ≤ L.size() + E.size() then
    // each element in E is equal to x
    return x
else
    // note the new selection parameter
    return quickSelect(G, k - L.size() - E.size())
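The pseudocode translates directly to Java. This sketch (names are our own) partitions by reading rather than removing, so the input list is left intact:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class QuickSelect {
    private static final Random RAND = new Random();

    // Returns the kth smallest element of s (k is 1-based, 1 <= k <= s.size()).
    public static int quickSelect(List<Integer> s, int k) {
        if (s.size() == 1) return s.get(0);
        int x = s.get(RAND.nextInt(s.size()));  // random pivot
        List<Integer> l = new ArrayList<>(), e = new ArrayList<>(), g = new ArrayList<>();
        for (int v : s) {
            if (v < x) l.add(v);
            else if (v == x) e.add(v);
            else g.add(v);
        }
        if (k <= l.size()) return quickSelect(l, k);
        else if (k <= l.size() + e.size()) return x;  // each element in E equals x
        else return quickSelect(g, k - l.size() - e.size());
    }
}
```

Only one of the three subsequences is recursed on, which is why the expected cost drops from O(n log n) to O(n).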