For nodes other than root
Minimum number of keys
Every node other than root must have atleast t-1 keys
so every internal node must have at least t childrens
Maximum number of keys
every node can contain at most 2t-1 keys
therefore internal node can have at most 2t children
For root
For Non empty tree root should have at least 1 children
Height of B Tree
Height of Btree is function of total number of keys in Btree
number of nodes at depth 1 has to be >=2 So we have atleast 2 nodes at depth 1
next level would have t nodes for each node in level 1 so t*2
depth 3 t^2*2
for depth h it has at least t^h-1 * 2
Each node has t-1 number of keys
so the tree has at least 1+ (t-1) * {sum i to h}2t^i-1 keys
n >= 1+ (t-1) * {sum i to h}2t^i-1
n>=2t^h -1
h <= log(t, (n+1)/2)
Operations on Btree
n(x)+1 way branching decision since there are n+1 different choices to follow
Searching
Search (x,k)
find the smallest index such that k < key[xi] //from left find the position which is just greater than k
if its the key k then return the index and node
else
if its leaf then return NIL //failed search
else
Disk-Read(ci[x]) // bring the new children in memory and
Search(ci[x],k) //search child
Performance
Children is brought in memory if key not found in present node
number of disk pages accessed is height of tree Th(h) = Th(log(t,n))
searching the index at each node takes O(t)
so total time = O(ht) =O(t log(t,n)) that is for each disk read we have to search that nodes keys
Create Empty Btree
Create Root
B-Tree -Create(T)
allocate a disk page x //O(1)
set it as leaf and number of keys n[x]= 0
write it back to disk
assign it as root root(T)=x
O(1) disk operations and O(1) CPU time
Insert Key
New key has to be inserted such that it maintains the B tree property of min and max nodes
If leaf y is full then it contains all 2t-1 key, that is, maximum allowed and we
split the node y around its median key key[y] into two nodes having t-1 keys each
move y to parent to identify dividing point
If y's parent is full then it must be splited too
We can insert the key on pass from root to leaf node
Travel down the tree searching for the position where the new key belongs and split each full node we come along the way including the leaf
Insert(T,k)
If root is full that is it has 2t-1 keys then
s= allocate-node() //make new node s as root
set r as child of s
split child (s,1,r) //
Insert-Nonfull(s,k) //insert key k in non full node s
else
Insert Nonfull(r,k)
Performance : O(th) time and disk access =O(h)
B tree increases in height at top not at bottom unlike binary tree
Insert-Nonfull(x,k)
i =n[x]
If x is leaf node then // that is we have only root and its not full
while k < keyi[x] //traversing from right key
key(i+1)=key(i) // move the key to right
i=i-1
key(i+1)[x] =k //store the key at position i+1
Disk Write(x)
Else //when not leaf
loop to find right index in x to search which children
Disk-Read(ci[x]) //read that children in memory
If the children is full
Split-Tree(x,i,c[x])
if k > keyi[x]
traverse i+1
Insert-Nonfull(ci[x],k)
Performance
There are O(1) disk read and write operations on each recursion so total O(h)
CPU time O(th)
tail recursion so can be converted into while loop and allowing number of block in memory O(1)
Split Node
inputs : x non full node. child of x, y, is full
output : will split child y and adjusts x to have additional child
Height of tree increases by one on split of root
Its the only operation which increases the height
Split(x,i,y)
y is ith child of x
divides child y into y and z,
z = allocate-node() // to have last t-1 keys that is, larger t-1 keys
for index j to t-1
change key position // what was j+t key in y will be now jth key in z
if node y is not leaf then
change child positions for z //what is j+t child in y is jth child of z
similarly move the index position and key for x to make space for
median[y]
key[xi] = key[yt]
and n[x] = n[x] + 1
Disk-Write(y,z,x)
Performance
O(t) time to change the children and keys of x and z
and O(1) disk operation
Deletion
Deletion from an internal node requires that nodes be rearranged that is when interal node has minimum number of keys
Btree-Delete deletes the key k from the subtree rooted at x.
guarantees that whenever procedure is called recursively on node x, the number of keys is at least t that is 1 more than required to satisfy B tree property
allows to delete a key from the tree in one pass without back up
Procedure
preceding child is the left child
predecessor of key is the largest key in its left child
1)If key k is in node which is leaf then delete key k from x
2)If key k is in internal node x then
i) if child y that precedes x has at least t keys then find the predecessor k' of k in the subtree rooted at y.
Recursively delete k' and replace k by k' in x
ii)If child z that follows k in the node x has atleast t keys then find successor k' of k in subtree rooted at z
iii)If both y and z have only t-1 keys then merge k and all of z in to y so that now
B+ tree
order of m : internal node can contain m-1 keys
Deletion
We always delete key from leaf
If key is in index also we delete from there also
Redistribution of leaf node:
middle key is copied up and sibling is borrowed
if sibling has atleast t/2 keys than we can redistribute the sibling (same parent as L)
Key is deleted from leaf and a key is borrowed from sibling inserted into L and a common parent is changed to middle key that is if
k | l | m |
/ \ \ \
/ \ \ \
a b c d e
delete a
key a is in node L(a,b)
and if we delete a and t -1 = 2 that is fill factor is 2 then
1)we can borrow a key c from sibling
2) we have to replace k also with the middle key after sorting these two sibling that is d
so we move d to parent
d | l | m |
/ \ \ \
/ \ \ \
a c d e
Merging
toss parent index entry and pull down index's parent if index deficient
If the sibling do not have sufficient node then we merge L with its sibling
q
/ \
o p d | l |
/ | | / \ \
/ \ \
a c d e
delete c
key c is in L (a,c)
since siblings do not have sufficient keys we merge L and sibling (d e)
and delete the parent d but that would make parent node l deficient so we need to merge index page also
we pull down parent of l, q down that is
q
/ \
o p l
/ | | / \
/ \
a d e
o p q l
/ | | / \
/ \
a d e
Redistribution in Non leaf node
q
/ \
r o p l
/ | | / \
t / \
a d e
here we can move p to position of q and q with l
right child of p will become left child of new position of q
p
/ \
r o q l
/ | / \
/ \
t a d e
1)If index page and leaf page are not below fill factor
delete record from leaf page Arrange keys in ascending order to fill void
If key appears in the index page use next key to replace it
First check left then right for merging
2)If leaf page is below fill and index page not then
combine leaf page and its siblings
Change index page to reflect change
3)If both index and leaf are below fill then
Combine leaf and siblings
Adjust index page
Combine index with its siblings
Problem
If we delete 60 in above case from node 60 65 we will have less than 2(fill factor) in the node so we must merge this node with left node 50 55 and also we have to delete its index 75 But deleting index
Merging of leaf pages reduces the index entry by onewill result in less than fill factor for parent 85 so index pages need to be merged Since 60 is the only key and merging reduces the key so we need to delete the root
Problem
Deleting 25 from 25 28 30 leaves number of keys above fill factor so we delete 25 from leaf and we need to delete corresponding key also and fill that position in parent with key next to 25 that is 28
If it had been 25 alone in that node we would have deleted that node and also 25 from parent and set left of parent 25 to left of 50
Problem
Delete 6 from above tree
Deleting 6 we need to delete from index also which would leave index below fill factor so we need to merge index pages
avid.cs.umass.edu/courses/445/f2008/17-tree-indexes-2.pdf
No comments:
Post a Comment