- A set is a data structure modeled after the mathematical notion of a set. The fundamental set operations are union, intersection, and subtraction.
- A brief aside on mathematical sets:
- Elements of a set may not be repeated.
- The operations of a Set ADT do not require a total order on the set's elements. One of the simplest and most versatile ways to implement a set, however, is via an ordered sequence, and the elements of a set will almost always lend themselves to a total order.
- Methods of the Set ADT (invoked on Set A):
- size():
Input: none; Output: int
- isEmpty():
- Returns true if the set is empty, false if not.
Input: none; Output: boolean
- insertElement(e):
- Insert the element e into the set A, unless e is already in A.
Input: Object; Output: none
- elements():
- Return an iterator of the elements in set A.
Input: None; Output: Iterator
- isMember(e):
- Return true if e is in set A, false if not.
Input: Object; Output: boolean
The Set ADT (Cont.)
- union(B):
- Return A ∪ B. Nondestructive.
Input: Set; Output: Set.
- intersect(B):
- Return A ∩ B. Nondestructive.
Input: Set; Output: Set
- subtract(B):
- Return A – B. Nondestructive.
Input: Set; Output: Set
- isEqual(B):
- Return true iff set A = set B.
Input: Set; Output: boolean.
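A minimal Java sketch of this ADT as an interface (raw Object elements and java.util.Iterator follow the pre-generics style of the Merger code below):

public interface Set
{
    int size();
    boolean isEmpty();
    void insertElement(Object e);
    java.util.Iterator elements();
    boolean isMember(Object e);
    Set union(Set B);     // nondestructive
    Set intersect(Set B); // nondestructive
    Set subtract(Set B);  // nondestructive
    boolean isEqual(Set B);
}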
Template Method Pattern
- We can implement set operations such as union, intersection, and subtraction by specializing the generic merge algorithm.
- Recall that in the template method pattern, an abstract class defines a principal method that calls a number of auxiliary methods. In the abstract class, the auxiliary methods are abstract (i.e., empty). The concrete subclasses override the auxiliary methods but not the principal method.
- In this application of the template method pattern, the abstract class is Merger and the concrete subclasses are UnionMerger, IntersectMerger, and SubtractMerger.
- The methods union, intersect, and subtract of the Set ADT can be implemented via the corresponding concrete subclass of Merger.
- For example, the union(B) method would instantiate an object of class UnionMerger, call merge(this, B, C, comp) on that object, and return C.
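A minimal sketch of union(B) under this pattern; SequenceSet, NodeSequence, sequence(), and comp are assumed names for the sequence-backed set class, its underlying Sequence, and its Comparator, not identifiers from the lecture:

public Set union(Set B)
{
    Sequence C = new NodeSequence(); // hypothetical empty Sequence for the result
    Merger m = new UnionMerger();
    m.merge(this.sequence(), ((SequenceSet) B).sequence(), C, comp);
    return new SequenceSet(C); // wrap the merged sequence as a Set
}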
Abstract Class Merger
public abstract class Merger
{
// abstract methods to be overridden
protected abstract void firstIsLess(Object a, Sequence C);
protected abstract void bothAreEqual(Object a, Object b, Sequence C);
protected abstract void firstIsGreater(Object a, Sequence C);
private Object a, b; // store current elements for comparison
public void merge(Sequence A, Sequence B, Sequence C, Comparator comp)
{
// use iterators to meet the non-destructive criterion
Iterator iterA = A.elements(), iterB = B.elements();
boolean aExists = advanceA(iterA),
bExists = advanceB(iterB);
while (aExists && bExists)
{
if(comp.isLessThan(a,b))
{
firstIsLess(a,C);
aExists = advanceA(iterA);
}
else if (comp.isEqualTo(a,b))
{
bothAreEqual(a,b,C);
aExists = advanceA(iterA);
bExists = advanceB(iterB);
}
else
{
firstIsGreater(b,C);
bExists = advanceB(iterB);
}
}
while(aExists)
{
firstIsLess(a,C);
aExists = advanceA(iterA);
}
while(bExists)
{
firstIsGreater(b,C);
bExists = advanceB(iterB);
}
}
Abstract Class Merger (cont.)
private boolean advanceA(Iterator iterA)
{
if(iterA.hasNext())
{
a = iterA.next();
return true;
}
else
{
a = null; return false;
}
}
private boolean advanceB(Iterator iterB)
{
if(iterB.hasNext())
{
b = iterB.next();
return true;
}
else
{
b = null; return false;
}
}
}
Class UnionMerger
public class UnionMerger extends Merger
{
protected void firstIsLess(Object a, Sequence C)
{
C.insertLast(a);
}
protected void bothAreEqual(Object a, Object b, Sequence C)
{
C.insertLast(a);
}
protected void firstIsGreater(Object b, Sequence C)
{
C.insertLast(b);
}
}
Class IntersectMerger
public class IntersectMerger extends Merger
{
protected void firstIsLess(Object a, Sequence C)
{} // empty method
protected void bothAreEqual(Object a, Object b, Sequence C)
{
C.insertLast(a);
}
protected void firstIsGreater(Object b, Sequence C)
{} // empty method
}
Class SubtractMerger
public class SubtractMerger extends Merger
{
protected void firstIsLess(Object a, Sequence C)
{
C.insertLast(a);
}
protected void bothAreEqual(Object a, Object b, Sequence C)
{} // empty method
protected void firstIsGreater(Object b, Sequence C)
{} // empty method
}
Example of Set Operations
A={1,3,4,6,8}, B={2,3,4,5,6}
The table shows the current elements a and b at each step of the merge, and the resultant set C for each operation:
a | b | union | intersection | subtraction
1 | 2 | {1} | {} | {1}
3 | 2 | {1,2} | {} | {1}
3 | 3 | {1,2,3} | {3} | {1}
4 | 4 | {1,2,3,4} | {3,4} | {1}
6 | 5 | {1,2,3,4,5} | {3,4} | {1}
6 | 6 | {1,2,3,4,5,6} | {3,4,6} | {1}
8 | N/A | {1,2,3,4,5,6,8} | {3,4,6} | {1,8}
Complexity of Set Methods
method | running time (sorted-sequence implementation)
size, isEmpty | O(1)
insertElement | O(n)
elements, isMember | O(n)
union, intersect, subtract | O(nA + nB)
isEqual | O(nA + nB)
Radix Sort
- Unlike comparison-based sorting algorithms, radix sort considers the structure of the keys.
- Assume that keys are represented in a base-N numbering system, where N is known as the radix. For example, if N = 2, the keys are represented by binary numbers.
- Let b denote the number of digits in a key. For Java's primitive int type, b = 32.
- Sorting is done by looking at bits in the same position. No comparisons are needed.
- This idea can be extended to decimal integers with a fixed number of digits or to keys that are ASCII or Unicode strings.
- Sort array with respect to the leftmost bit.
- Partition array
- Recurse
- recursively sort the top subarray, ignoring leftmost bit.
- recursively sort the bottom subarray, ignoring the leftmost bit.
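A minimal Java sketch of this recursion on non-negative int keys (method and parameter names are illustrative):

public static void radixExchangeSort(int[] a, int lo, int hi, int bit)
{
    if (lo >= hi || bit < 0) return;
    int i = lo, j = hi;
    int mask = 1 << bit; // isolate the bit under consideration
    while (i <= j)
    {
        while (i <= j && (a[i] & mask) == 0) i++; // scan top-down for a key whose bit is 1
        while (i <= j && (a[j] & mask) != 0) j--; // scan bottom-up for a key whose bit is 0
        if (i < j) { int t = a[i]; a[i] = a[j]; a[j] = t; } // exchange keys
    }
    radixExchangeSort(a, lo, j, bit - 1); // top subarray: keys whose bit is 0
    radixExchangeSort(a, i, hi, bit - 1); // bottom subarray: keys whose bit is 1
}

The initial call sorts on the leftmost bit, e.g. radixExchangeSort(a, 0, a.length - 1, b - 1).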
Radix Exchange Sort
Partition step: scan top-down to find the next key starting with 1, scan bottom-up to find the next key starting with 0, then exchange the two keys. Repeat until the scans cross.
Radix Exchange Sort
(Figure: the array before the sort, after the sort on the leftmost bit, and after the recursive sort on the second-from-leftmost bit.)
- Similarities between radix-exchange sort and quicksort:
- Both partition the array
- Both recursively sort on sub-arrays.
- Differences:
- Method of partitioning
- Radix-exchange divides the array based on whether keys are greater than or less than 2^(i-1), where i is the bit position being examined. No key-to-key comparisons are performed.
- Quick-sort partitions based on greater than or less than some element of the array. Requires comparisons.
- Time complexity
- Radix-exchange O(bn)
- Quicksort, average case O(n log n)
- Quicksort, worst case O(n^2)
Straight Radix Sort
- Straight radix sort applies a stable sort to one bit position at a time, from the rightmost bit to the leftmost. For example, observe the first step of the sort (i.e., the sort on the rightmost bit):
- We show that any two keys are in the correct relative order at the end of the algorithm.
- Given two keys, let k be the leftmost bit-position where they differ.
- At step k, the two keys are put in the correct relative order.
- Because of stability, successive steps do not change the relative order of the two keys.
- It makes no difference what order the two keys are in when the sort begins.
- When the sort visits bit k, the keys are put in the correct relative order.
- Because the sort is stable, the order of the two keys will not be changed when bits greater than k are sorted.
- Radix sorting can be applied to decimal numbers that have a fixed or maximum number of digits; the stable sort is then applied digit by digit rather than bit by bit.
Analysis of Straight Radix Sort
- Suppose that we can perform the stable sort on a single digit in O(n) time. Then the total time complexity would be O(bn).
- Indeed, we can perform a stable sort on the key’s kth digit in O(n) time.
- The embedded algorithm is known as bucket sort.
- n numbers to be sorted
- each number's kth digit lies in the range 0 to N - 1
- the sort is stable
- time complexity: O(n + N)
- As an example, consider the case where N = 3 and let our array be as shown (kth digit only); note that there are two "0"s and two "1"s.
- First, we create N "buckets"
- Each element of the array is put into one of the N "buckets."
Bucket Sort (cont.)
- Now, pull the elements from the buckets into the array.
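A minimal Java sketch of one bucket-sort pass on the kth digit; the digit helper is an assumption, not from the lecture:

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public static void bucketSortByDigit(int[] a, int k, int N)
{
    List<List<Integer>> buckets = new ArrayList<>(); // create N "buckets"
    for (int d = 0; d < N; d++) buckets.add(new LinkedList<>());
    for (int x : a) // distribute; a left-to-right scan keeps the sort stable
        buckets.get(digit(x, k, N)).add(x);
    int i = 0;
    for (List<Integer> bucket : buckets) // pull the elements from the buckets into the array
        for (int x : bucket) a[i++] = x;
}

private static int digit(int x, int k, int N) // kth base-N digit; k = 0 is the rightmost
{
    for (int j = 0; j < k; j++) x /= N;
    return x % N;
}

Straight radix sort then makes one such pass per digit, from k = 0 up to k = b - 1.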
Cost of Straight Radix Sort
O(b(n + N)): one O(n + N) bucket-sort pass for each of the b digits.
representation | N |
binary | 2 |
decimal | 10 |
hexadecimal | 16
ASCII | 128 |
Hashing
What is it?
A review of familiar material?
A side order for your eggs?
A combination of the two?
Adapted from the Goodrich and Tamassia lecture on Hashing
Dictionary Problem
- A phone company wants to provide caller ID capability:
- Given a phone number, return the caller’s name.
- Phone numbers are in the range 0 ≤ r ≤ R, where R = 10^7 - 1.
- They want to do this as efficiently as possible.
- A few suboptimal ways to design this dictionary:
- an array indexed by key: takes O(1) time, O(n + R) space – a huge amount of wasted space.
- a linked list: Takes O(n) time, O(n) space
- A balanced binary tree: O(log n) time, O(n) space.
- We can do better with a hashtable – O(1) expected time, O(n + N) space, where N is the size of the hash table.
- We need a function to map the large range of keys into a smaller range of table indices.
- e.g., h(K) = K mod N
- Insert {402-3045, "CAJ (w)"} into a hashed array with, say, N = 5 slots.
- 4023045 mod 5 = 0, so {402-3045, "CAJ (w)"} goes in slot 0 of the hash table.
- A lookup uses the same process: hash the query key, then check the array at that slot.
- Insert {428-7971, "CAJ (h)"}
- How do we deal with two keys that hash to the same index in the array?
- One policy is to use separate chaining
- Set up an array (i.e., a table) of links, indexed by the hashed keys, to sequences of items whose keys hash to the same index.
- Separate chaining is the most time-efficient collision-resolution policy; other policies are more space-efficient.
Pseudo-code for Dictionary ADT Methods
- Constructor
Create a table of N sequences.
- insertItem(k, e)
index ← h(k)
insert (k, e) into sequence at table[index]
- findElement(k)
index ← h(k)
step through sequence at table[index], looking for a match.
if search was successful, return found element
else, return NO_SUCH_KEY
- remove(k)
index ← h(k)
step through sequence at table[index], looking for a match.
if search was successful,
remove found item from sequence
return found element
else, return NO_SUCH_KEY
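A minimal Java sketch of these methods for the phone-book example (class and field names are illustrative, and null stands in for NO_SUCH_KEY):

import java.util.LinkedList;

public class ChainedHashTable
{
    private static class Entry
    {
        int key; Object element;
        Entry(int k, Object e) { key = k; element = e; }
    }

    private LinkedList<Entry>[] table;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int N)
    {
        table = new LinkedList[N]; // create a table of N sequences
        for (int i = 0; i < N; i++) table[i] = new LinkedList<>();
    }

    private int h(int k) { return k % table.length; } // h(k) = k mod N

    public void insertItem(int k, Object e)
    {
        table[h(k)].add(new Entry(k, e)); // insert into the sequence at table[index]
    }

    public Object findElement(int k)
    {
        for (Entry entry : table[h(k)]) // step through the sequence, looking for a match
            if (entry.key == k) return entry.element;
        return null; // NO_SUCH_KEY
    }
}

For example, new ChainedHashTable(5).insertItem(4023045, "CAJ (w)") places the entry in slot 0, matching the earlier example.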
Hash Functions
- A good hash function:
- is quick to compute, and
- distributes the keys uniformly throughout the table.
- Dealing with non-integer keys:
- find some way of converting the keys into integers.
- If the key is a string, we could add the ASCII values of the characters.
- Then use a standard hash function for integers.
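A minimal Java sketch of that conversion (summing character values, as described above):

private static int stringToInt(String s)
{
    int sum = 0;
    for (int i = 0; i < s.length(); i++)
        sum += s.charAt(i); // add the character's ASCII/Unicode value
    return sum; // then hash with, e.g., h(k) = k mod N
}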
- A common choice for the hash function is h(k) = k mod N.
- Choosing N, the size of the table:
- N = b^e (bad)
- If N is a power of 2 (b = 2), for example, h(k) will give the e least-significant bits of k.
- All keys with the same ending will go to the same place.
- Prime number N (good)
- helps ensure uniform distribution of keys.
- Mid-square
- h(k) = middle digits of k^2.
- sometimes it is impractical to use a table whose size is a prime number.
- This hash function can be used to reduce the probability of collisions when N = b^e.
- e.g., table size is a power of 10:
h(1150130) = 9901, since 1150130^2 = 13227 9901 6900 (the middle four digits)
- e.g., table size is a power of 2:
h(1101) = 010, since 1101^2 = 10101001 = 101 010 01 (the middle three bits)
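A sketch of a mid-square hash in Java for a table of size 2^e, keeping e middle bits of k^2 (exactly which middle bits to keep is an implementation choice; this split matches the grouping above):

private static int midSquare(int k, int e)
{
    long sq = (long) k * k; // square in a long to avoid int overflow
    int totalBits = 64 - Long.numberOfLeadingZeros(sq);
    int shift = (totalBits - e) / 2; // drop roughly half of the excess low-order bits
    return (int) ((sq >> shift) & ((1L << e) - 1)); // keep the e middle bits
}

With k = 0b1101 and e = 3, this returns 0b010, matching the binary example above.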
Open Addressing Collision Resolution
- Open addressing uses only an array. The array slots store the items directly.
- Uses less memory than separate chaining.
- Don’t need to store links, sentinels, etc.
- Fewer empty array slots. Clusters spread through the array.
- To look up a key, start probing at the hashed index, then step through the table until either the key is found or an empty slot is found.
- Slower than separate chaining. The lookup may have to step through many table indices before the key is found.
- Removing an item poses difficulties. Either mark the deleted slot with a special "deleted" marker (so later probes continue past it) or fill in the slot by shifting some of the following elements backward.
- Specific open-addressing resolution policies include:
- Linear probing
- Quadratic probing
- Double hashing
Linear Probing
- Linear probing is the simplest of the open addressing policies.
- If the current slot is already being used, just try the next slot.
Input: Key k, element e
if(table is full)
error
probe ← h(k)
while table[probe] is occupied
probe ← (probe + 1) mod N
table[probe] ← (k,e)
Linear Probing Example
- Table size N = 13.
- h(k) = k mod 13
- Insert the keys
Lafore Linear Probing Applet
Double Hashing
- Uses two hash functions, h1(k) and h2(k).
- Typically h1(k) = k mod N.
- Typically, h2(k) = q – (k mod q), where q is a prime less than N.
- If N is prime, all slots in the table will eventually be examined.
- Many of the same (dis)advantages as linear probing. Space-efficient but slow compared with separate chaining.
- Distributes keys more uniformly than linear probing.
Input: key k, element e
if (table is full)
error
probe ← h1(k)
offset ← h2(k)
while (table[probe] is occupied)
probe ← (probe + offset) mod N
table[probe] ← (k,e)
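A minimal Java sketch of this insertion routine (names are illustrative; setting offset to 1 instead reduces it to linear probing):

public class DoubleHashTable
{
    private Integer[] keys; // null marks an empty slot
    private final int q;    // a prime smaller than the table length

    public DoubleHashTable(int N, int q) // N itself should be prime
    {
        keys = new Integer[N];
        this.q = q;
    }

    private int h1(int k) { return k % keys.length; }
    private int h2(int k) { return q - (k % q); } // always in 1..q, so the probe always advances

    public void insert(int k)
    {
        int probe = h1(k);
        int offset = h2(k);
        int examined = 0;
        while (keys[probe] != null) // step until an empty slot is found
        {
            if (++examined == keys.length)
                throw new IllegalStateException("table is full");
            probe = (probe + offset) % keys.length;
        }
        keys[probe] = k;
    }
}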
Double Hashing Example
- Table size N = 13
- h1(k) = k mod 13
- h2(k) = 7 – k mod 7
- Keys to be inserted:
Lafore Double/Quadratic Probing Applet
Theoretical Results
- The load factor α is the average number of keys per array index: α = n/N.
- The analysis is probabilistic rather than worst-case.
policy | expected probes, successful search | expected probes, unsuccessful search
chaining | 1 + α/2 | α
linear probing | (1/2)(1 + 1/(1 − α)) | (1/2)(1 + 1/(1 − α)^2)
double hashing | (1/α) ln(1/(1 − α)) | 1/(1 − α)
Expected Number of Probes in a Search vs. Load Factor