Sunday 23 June 2013

Sorting Revisited

Quicksort
Divide-and-conquer algorithm for sorting A[p .. r]
runs in Th(n lg n) on average, with a very small hidden constant factor
in-place sorting algorithm
not stable
widely used and often the best choice in practice

Divide: choose a pivot element and rearrange the array so that all elements to its right are greater and all elements to its left are no greater
Conquer: sort the left and right subarrays recursively
Combine: nothing to do; once the subarrays are sorted in place, the whole array is sorted.

Divide
the main job in the divide step is to place the pivot element so that the elements to its right are greater and the elements to its left are no greater
Partition(A,p,r)
Pivot choice: we choose last element as pivot x=A[r]
we maintain 4 partitions
first partition keeps elements no greater than x
second keeps elements greater than x
third partition holds the unprocessed elements
fourth partition is the pivot x itself

j is used to process each element from p to r-1
we maintain these three invariants:
first partition (index p to i) has all elements <= x
second partition (index i+1 to j-1) has all elements > x
and A[r] holds the pivot x

Partition(A,p,r) Algorithm
take the last element as pivot x = A[r]
set i to p-1
for each index j from p to r-1
  if current element A[j] <= pivot x
    //then it belongs to the first partition: swap it with the first element of partition 2
    i=i+1     //now points to the first element of the second partition
    exchange A[i] with A[j]  // exchange with the current element
    // if the element is already in place, this exchanges it with itself, which is harmless
At last we exchange A[i+1] (the first index of the second partition) with the pivot A[r], placing the pivot between the two subarrays, and return i+1
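A minimal sketch of the Partition procedure and quicksort described above, using 0-based Python indexing instead of the notes' 1-based A[p .. r]:

```python
def partition(A, p, r):
    x = A[r]                 # last element as pivot
    i = p - 1                # boundary of the "<= x" partition
    for j in range(p, r):
        if A[j] <= x:        # A[j] belongs to the first partition
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]   # place pivot between the partitions
    return i + 1             # final index of the pivot


def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)   # sort left subarray
        quicksort(A, q + 1, r)   # sort right subarray
```

No combine step is needed: once both recursive calls return, A[p .. r] is sorted in place.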

Running time of the partition algorithm is Th(n), because we process each of the n-1 remaining elements and perform only constant-time operations on each

Performance of Quicksort
running time depends on whether partitioning is balanced or unbalanced
with balanced partitions it runs as fast as merge sort
with unbalanced partitions it is as slow as insertion sort (O(n^2))

Worst case partitioning
when the problem is divided into subproblems of size n-1 and 0
assuming the same unbalanced partition at each level
T(n) = T(n-1) + T(0) + Th(n)  //Th(n) for partition
T(n) = Th(n^2)

This case occurs when the list is already sorted, a case in which insertion sort runs in only O(n) time.

Best-case Partitioning
when the partition divides the list as evenly as possible, into sizes floor(n/2) and ceil(n/2)-1
T(n) <= 2T(n/2) + Th(n)
T(n) =O(nlogn)

Average case
whenever the partition yields any constant proportionality (say a 9-to-1 split) we get a running time of O(n lg n)

Space complexity (O(lg n) in the worst case can be achieved)
Partitioning is in place, using only O(1) extra space
After partitioning, the subarrays are sorted by recursive calls; by always recursing into the smaller subarray first and iterating on the larger one, the recursion stack can be kept to O(lg n) space

Selection based pivoting
selection algorithms can be used to choose the best pivot (the median) to partition the list
the worst case of the median-of-medians selection algorithm is O(n), so we can use it to find the median at each stage and get a worst case of O(n lg n), but this variant is slower than normal quicksort in the average case

Problem Show that the minimum depth of a leaf in the recursion tree of quicksort with splits of a and 1-a (0 < a <= 1/2) is
-lg n/lg a and the maximum depth is -lg n/lg(1-a)
Solution we first look for the path with the largest (maximum) depth
the larger side of the split takes more further splits to reduce to the base case, so the 1-a side has more depth, because a <= 1-a
at each level the number of elements is reduced by a factor of 1-a, that is, (1-a)^i * n elements at level i
for a leaf (1-a)^i * n = 1
(1-a)^i = 1/n
i lg(1-a) = -lg n
i = -lg n / lg(1-a)

similarly for the minimum depth we follow the a-side splits
a^i * n = 1
i = -lg n / lg a

Problem What happens in case of all duplicates in list for above algorithm
Solution we look at the partition algorithm and ask what happens when the pivot x equals the current element A[j]

if A[j] <= x we swap it with the first element of the second partition (index i+1)
since every element equals the pivot, this happens for all elements, and each swap exchanges an element with itself; at the end the list is unchanged, i points to the index just before r, and the pivot stays at position r
so the partition divides the list into sizes n-1 and 0, the worst-case scenario of quicksort
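This can be checked directly against the partition pseudocode above (0-based Python sketch): on an all-equal array, every comparison A[j] <= x succeeds, every swap is a self-swap, and the returned pivot index is the last index, giving the (n-1, 0) split.

```python
def partition(A, p, r):
    # same Partition as in the notes, 0-based
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1


A = [7] * 8
q = partition(A, 0, len(A) - 1)
# every A[j] <= x, so each element swaps with itself and the
# pivot ends up at the last index: the split is (n-1, 0)
assert A == [7] * 8 and q == len(A) - 1
```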

Problem Show that the running time of quicksort can be improved by taking advantage of insertion sort's fast running time on nearly sorted lists, giving running time O(nk + n lg(n/k)) when insertion sort takes over once the subarray size is <= k
Solution
the recursion stops when n/2^i = k, that is, at depth i = lg(n/k)
so the quicksort phase takes O(n lg(n/k)) in total
n is the cost of each level and lg(n/k) is the number of levels
The resulting array is composed of n/k subarrays of size k, and the elements in each subarray are less than the elements in the following subarray. An insertion sort call on a sublist of size k takes Th(k^2) in the worst case, so sorting n/k such lists takes (n/k) * Th(k^2) = Th(nk)
Total: O(nk + n lg(n/k))
k should be chosen no bigger than Th(lg n), so that the Th(nk) term does not exceed Th(n lg n)
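A sketch of this hybrid under one common formulation: the quicksort phase simply stops recursing on subarrays of size <= k, and a single insertion sort pass over the whole (now nearly sorted) array finishes the job. The function names and the default k are illustrative choices, not from the notes.

```python
def partition(A, p, r):
    # Lomuto partition with A[r] as pivot (0-based)
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1


def quicksort_to_size_k(A, p, r, k):
    # leave subarrays of size <= k untouched; they are already in
    # their final block of k positions relative to each other
    if r - p + 1 > k:
        q = partition(A, p, r)
        quicksort_to_size_k(A, p, q - 1, k)
        quicksort_to_size_k(A, q + 1, r, k)


def insertion_sort(A):
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key


def hybrid_sort(A, k=16):
    quicksort_to_size_k(A, 0, len(A) - 1, k)
    insertion_sort(A)  # O(nk): each element is within k slots of home
```

Since no element is more than k positions from its final place after the quicksort phase, the final insertion sort does at most k shifts per element, matching the Th(nk) term above.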

Expected Running time

Merge Sort
also example of divide and conquer algorithms
Divide: list of elements is divided equally in two sublists of size n/2
Conquer: Sort the two sub list recursively
Combine: Merge the two sorted sublist to get the final sorted list

In quicksort the real work happens in the Divide step, finding the pivot with the partition procedure; in merge sort most of the work happens in the Combine step, merging the two sorted lists

Merge(A,p,q,r)
merge sorted subarrays A[p ..q] and A[q+1 ..r] into a single sorted sequence
In the merge procedure we compare the first elements of the two lists, take the smaller of the two, and place it in the output list; we repeat this until one of the lists becomes empty, at which point we just append the remaining elements of the other list to the output
when list1 has length L1 and list2 has length L2, the number of comparisons is at most L1+L2-1
so the merge procedure's time complexity is Th(n), where L1 ~= L2 ~= n/2

In the Merge procedure we can include a sentinel element at the end of each list, which makes the comparisons uniform without needing to check for an empty list separately
The number of comparisons in that case is exactly L1+L2, that is, n


Merge(A,p,q,r) Algorithm
let n1 = q-p+1 and n2 = r-q
copy A[p ..q] into L[1 ..n1] and A[q+1 ..r] into R[1 ..n2]
//each with one extra slot for the sentinel element (a very large number, say infinity)
//we merge the two lists L and R back into our original list A
//parse the two lists L and R with indices i and j respectively
for each position k in list A[p ..r]
  // in total we place r-p+1 elements; at each step we insert one element into list A
  if L[i] <= R[j]  //compare the current element of each list
    A[k] = L[i]  // at each step A gets the smaller of the two list heads
    i=i+1  //compare the next element of L with the current one of R
  else
    A[k] = R[j]
    j=j+1


Loop invariant: at the start of the iteration for index k, A[p .. k-1] contains the k-p smallest elements of L and R, in sorted order

Running time of Merge Procedure
populating the lists L and R takes time Th(n1+n2) = Th(n)
the loop over each index of A takes time Th(n)
so the total running time is Th(n)

Merge sort Algorithm
Merge-sort(A,p,r)
if p < r // when p >= r there is at most a single element and it is already sorted
find the middle index q = floor((p+r)/2)
Merge-sort(A,p,q)
Merge-sort(A,q+1,r)
Merge(A,p,q,r)

Running time of Merge sort
Divide step takes Th(1) time, finding the middle index
Conquer step: if T(n) is the time to completely sort n elements, then the conquer step takes 2T(n/2)
Combine step is Merge procedure which takes Th(n)
T(n) = 2T(n/2) + Th(n)
T(1) = Th(1)

T(n)= Th(n lg n)
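The Merge procedure (with sentinels) and Merge-sort above can be sketched together, again using 0-based Python indexing with inclusive bounds:

```python
INF = float('inf')   # sentinel: compares greater than any real key


def merge(A, p, q, r):
    # merge sorted runs A[p..q] and A[q+1..r] (inclusive) in place
    L = A[p:q + 1] + [INF]       # left run plus sentinel
    R = A[q + 1:r + 1] + [INF]   # right run plus sentinel
    i = j = 0
    for k in range(p, r + 1):
        if L[i] <= R[j]:         # sentinels make empty-list checks unnecessary
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1


def merge_sort(A, p, r):
    if p < r:                    # one element is already sorted
        q = (p + r) // 2         # Divide: find the middle index
        merge_sort(A, p, q)      # Conquer: sort each half
        merge_sort(A, q + 1, r)
        merge(A, p, q, r)        # Combine: merge the sorted halves
```

Note that the sentinel trick relies on every real key being strictly less than INF, which is why it is chosen as infinity.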

Insertion sort

in place sorting algorithm
stable sorting
efficient algorithm for sorting small number of elements
Insertion-sort(A)
take each element of array A and insert it into its correct position among the elements before it
the array is maintained as two partitions
the first partition contains a sorted list of the elements processed so far
the second partition contains the elements not yet processed

Algorithm
for each index j from 2 to length[A]
  key = A[j]  //call the current element the key
  //now insert this element into its correct position within A[1] to A[j-1]
  i = j-1
  while i > 0 and A[i] > key  // shift each element greater than key one position to the right
    A[i+1] = A[i]
    i = i-1
  //the loop stops at an i with A[i] <= key (or at i = 0)
  //we insert our key in the position after it
  A[i+1] = key
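The same algorithm as a 0-based Python sketch:

```python
def insertion_sort(A):
    for j in range(1, len(A)):     # A[0..j-1] is the sorted partition
        key = A[j]                 # current element to insert
        i = j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]        # shift larger elements one slot right
            i -= 1
        A[i + 1] = key             # insert key just after the last A[i] <= key
```

Sorting is in place and stable: equal elements are never shifted past each other, because the shift condition is strictly A[i] > key.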

Running time of Insertion Sort
Worst case: when we need to shift all elements to the left of A[j] by one position; this occurs when A[j] is less than every element in the first partition, i.e. when the list is in decreasing order
O(n^2)

Average case
Average case time is also O(n^2)
but among the fastest algorithms for sorting small arrays

Best Case
array that is already sorted
during each iteration A[j] is compared only with the element to its left, which is no greater than the key, so the inner loop runs only once for each j
Th(n)

Selection sort
Selection-sort(A, p, r) // p: starting index; r: ending index
if p < r
  largest = p  // call the first element largest
  for i = p+1 to r   // linear search to find the largest
    if A[i] >= A[largest]
      largest = i
  exchange A[r] and A[largest]
  Selection-sort(A, p, r-1)
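The same recursive formulation as a 0-based Python sketch (an iterative loop would avoid the O(n) recursion depth, but this mirrors the pseudocode):

```python
def selection_sort(A, p, r):
    # find the largest element in A[p..r], swap it into the last
    # slot, then recurse on the remaining prefix A[p..r-1]
    if p < r:
        largest = p
        for i in range(p + 1, r + 1):   # linear search for the largest
            if A[i] >= A[largest]:
                largest = i
        A[r], A[largest] = A[largest], A[r]
        selection_sort(A, p, r - 1)
```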

Divide: Select the largest element in the array, and then swap the largest element
and the last element. The selection process uses a linear search. Thus, the divide step
takes linear time.

Conquer: Sort the first n − 1 elements recursively. This step takes T(n − 1) time.

Combine: Do nothing.

Recurrence: T(n) = T(n − 1) + n, which gives T(n) = Th(n^2)

Median and order statistics
