Saturday 5 January 2013

BTree Algorithms

If t is minimum degree  of B tree

For nodes other than root
Minimum number of keys
Every node other than root must have atleast t-1 keys 
so every internal node must have at  least t childrens



 Maximum number of keys
every node can contain at most 2t-1 keys
therefore internal node can have at most 2t children

For root

For Non empty tree root should have at least 1 children

Height of B Tree
Height of Btree is function of total number of keys in Btree

number of nodes at depth 1 has to be >=2 So we have atleast 2 nodes at depth 1
next level would have t nodes for each node in level 1 so t*2
depth 3 t^2*2
for depth h  it has at least t^h-1 * 2
Each node has t-1 number of keys 
so the tree has at least 1+ (t-1) * {sum i to h}2t^i-1    keys
n >= 1+ (t-1) * {sum i to h}2t^i-1
n>=2t^h -1
h <= log(t, (n+1)/2)

Operations on Btree
n(x)+1 way branching decision since there are n+1 different choices to follow
Searching
Search (x,k)
find the smallest index such that k < key[xi]  //from left find the position which is just greater than k
if its the key k then return the index and node
else
if its leaf then return NIL         //failed search
 else
        Disk-Read(ci[x])           // bring the new children in memory and 
      Search(ci[x],k)                  //search child

Performance
Children is brought in memory  if key not found in present node
number of disk pages accessed is height of tree  Th(h) = Th(log(t,n))
searching the index at each node takes O(t)
so total time = O(ht) =O(t log(t,n)) that is for each disk read we have to search that nodes keys

Create Empty Btree
Create Root
B-Tree -Create(T)
allocate a disk page  x //O(1)
set it as leaf and number of keys n[x]= 0
write it back to disk
assign it as root root(T)=x
O(1) disk operations and O(1) CPU time

Insert Key
New key has to be inserted such that it maintains the B tree property of min and max nodes

If leaf y is full then it contains all 2t-1 key, that is, maximum allowed and we
split the node y around its median key key[y] into two nodes having t-1 keys each
move y to parent to identify dividing point
If y's parent is full then it must be splited too

We can insert the key on pass from root to leaf node
Travel down the tree searching for the position where the new key belongs and split each full node we come along the way including the leaf 

Insert(T,k)
If root is full that is it has 2t-1 keys then
  s= allocate-node() //make new node s as root
  set r as child of s
  split child (s,1,r)    //
  Insert-Nonfull(s,k)  //insert key k in non full node s
else
 Insert Nonfull(r,k)

Performance : O(th) time and disk access =O(h)


B tree increases in height at top not at bottom unlike binary tree

Insert-Nonfull(x,k)
i =n[x]
If x is leaf node then  // that is we have only root and its not full

  while k < keyi[x]       //traversing  from right key
          key(i+1)=key(i)         // move the key to right
             i=i-1
   key(i+1)[x] =k           //store the key at position i+1
   Disk Write(x)
Else  //when not leaf
  loop to find right index in x to search which children
  Disk-Read(ci[x]) //read that children in memory
  If the children is full
   Split-Tree(x,i,c[x])
   if k > keyi[x]
     traverse i+1
Insert-Nonfull(ci[x],k)

Performance
There are O(1) disk read and write operations on each recursion so total O(h)
CPU time O(th)
tail recursion so can be converted into while loop and allowing number of block in memory O(1)




Split Node
inputs :   x non full node. child of x, y, is full
output : will split child y and adjusts x to have additional child

Height of tree increases by one on split of root
Its the only operation which increases the height

Split(x,i,y)
y is ith child of x
divides child y into y and z,

z = allocate-node()            // to have last t-1 keys that is,  larger t-1 keys
for index j to t-1
 change key position       // what was j+t key in y will be now jth key in z
if node y is not leaf then
 change child positions for z //what is j+t child in y is jth child of z
similarly move the index position and key for x to make space for
median[y]
key[xi] = key[yt]
and n[x] = n[x] + 1

Disk-Write(y,z,x)

Performance
O(t) time  to change the children and keys of x and z
and O(1) disk operation

Deletion
Deletion from an internal node requires that nodes be rearranged that is when  interal node has minimum number of keys

Btree-Delete deletes the key k from the subtree rooted at x.
guarantees that whenever procedure is called recursively on node x, the number of keys is at least t that is 1 more than required to satisfy B tree property
allows to delete a key from the tree in one pass without back up

Procedure
preceding child is the left child
predecessor of key is the largest key in its left child
1)If  key k is in node which is leaf then delete key k from x
2)If key k is in internal node x then
 i) if child y that precedes x has at least t keys then find the predecessor k' of k in the subtree rooted at y.
   Recursively delete k' and replace k by k' in x
 ii)If child z that follows k in the node x has atleast t keys then find successor k' of k in subtree rooted at z
iii)If both y and z have only t-1 keys then merge k and all of z in to y so that now


B+ tree
order of m : internal node can contain m-1 keys

Deletion
We always delete key from leaf
If key is in index also we delete from there also

Redistribution of leaf node:
middle key is copied up and sibling is borrowed
if sibling has atleast t/2 keys than we can redistribute the sibling (same parent as L)
Key is deleted from leaf and a key is borrowed from sibling inserted into L and a common parent is changed to middle key that is if
             k |  l  |  m |   
            /   \     \      \
          /       \     \      \
      a  b     c d e
delete a 
key a is in node L(a,b)
and if we delete a and t -1 = 2 that is fill factor is 2 then
1)we can borrow a key c from sibling
2) we have to replace k also with the middle key after sorting these two sibling that is d
so we move d to parent

             d |  l  |  m |   

            /   \     \      \

          /       \     \      \

      a  c      d e

Merging 
toss parent index entry and pull down index's parent if index deficient
If the sibling do not have sufficient node then we merge L with its sibling

          q
        /    \
   o  p    d |  l  |   

 /  |  |     /   \     \     
          /       \     \   
      a  c      d e

delete c
key c is in L (a,c)
since siblings do not have sufficient keys we merge L and sibling (d e)
and delete the parent d but that would make parent node l deficient so we need to merge index page also
we pull down  parent of l, q down that is
          q

        /    \

   o  p      l     

 /  |  |     /   \          
          /       \      
      a d e

               

   o  p q  l     

 /  |  |     /   \          
          /       \      
      a d e



Redistribution in Non leaf node

         q

        /    \

r  o  p      l     

 /  |  |     /   \          
       t   /       \      
      a d e


here we can move p to position of q and q with l
right child of  p will become left child of new position of q
          p

        /    \

r  o       q  l     

/  |        /   \          
          /       \      
         t        a d e



1)If index page and leaf page are not below fill factor
delete record from leaf page Arrange keys in ascending order to fill void
If key appears in the index page use next key to replace it
First check left then right for merging

2)If leaf page is below fill and index page not then
combine leaf page and its siblings
Change index page to reflect change
3)If both index and leaf are below fill then
Combine leaf and siblings
Adjust index page
Combine index with its siblings

Problem 


If we delete 60 in above  case from node 60 65 we will have less than 2(fill factor) in the node so we must merge this node with left node 50 55 and also we have to delete its index 75 But deleting index
Merging of leaf pages reduces the index entry by one
will result in less than fill factor for parent 85 so index pages need to be merged Since 60 is the only key and merging reduces the key so we need to delete the root

 Problem


 Deleting 25 from 25 28 30 leaves number of keys above fill factor so we delete 25 from leaf and we need to delete corresponding key also and fill that position in parent with key next to 25 that is 28

 If it had been 25 alone in that node we would have deleted that node and also 25 from parent and set left of parent 25 to left of 50
Problem
Delete 6 from above tree
Deleting 6 we need to delete from index also which would leave index below fill factor so we need to merge index pages


avid.cs.umass.edu/courses/445/f2008/17-tree-indexes-2.pdf

No comments:

Post a Comment