Monday 24 June 2013

Binary Search Trees



DEFINITION: A binary search tree is a binary tree with the further property that, for every node,

  1. the value of the element in the left child is less than the value of the element at the node, and
  2. the value of the element in the right child is greater than the value of the element at the node.

Here's a definition of a class for binary search trees:
template <class Etype>
class Binary_Search_Tree
{
  protected:
    void Make_Empty(Tree_Node<Etype> * &);
    Tree_Node<Etype> * Find(const Etype &, Tree_Node<Etype> * ) const;
    Tree_Node<Etype> * Find_Max(Tree_Node<Etype> *) const;
    Tree_Node<Etype> * Find_Min(Tree_Node<Etype> *) const;
    void Insert(const Etype &, Tree_Node<Etype> * &);
    void Remove(const Etype &, Tree_Node<Etype> * &);

    Tree_Node<Etype> *Root;
    Tree_Node<Etype> *Last_Find;

  public:
    Binary_Search_Tree( ) : Root( NULL ), Last_Find( NULL ) { }
    Binary_Search_Tree( Binary_Search_Tree & Value );
    virtual ~Binary_Search_Tree( )
      { Make_Empty( Root ); Root = Last_Find = NULL; }
    const Etype & operator ( ) ( ) const
       { return Last_Find != NULL ? Last_Find­>Element : NULL; }
     virtual int Find( const Etype & X )
       { return int( Last_Find = Find( X, Root ) ); }
    virtual int Find_Max()
      { return int (Last_Find = Find_Max(Root)); }
    virtual int Find_Min()
       { return int (Last_Find = Find_Min(Root)); }
    virtual void Insert( const Etype & X )
      { Insert( X, Root ); }
    void Make_Empty( )
       { Make_Empty( Root ); Root = Last_Find = NULL; }
    virtual void Remove( const Etype & X )
      { Remove( X, Root ); }
};


  • The Root of the tree is kept as a protected member of the class.

  • The constructor Binary_Search_Tree( ) simply sets Root and Last_Find to NULL.

  • As with the List class, a pointer Last_Find is kept to the last node accessed in the Find, Find_Max, and Find_Min operations. The Find operations return Boolean, not the node found.

  • A number of member functions are overloaded. For example, Insert is
  • overloaded with a public definition and a protected definition. The public function simply calls the protected function on Root.

33.1 Insert
The Insert function is overloaded. The Insert in the public section passes the element to be inserted and Root to the protected section Insert function.

Insert traverses the tree rooted at T to find the insertion point. If X is already in the tree, it is not inserted. When the insertion point is found, a new Tree_Node is created and "attached" at the insertion point.

template <class Etype>
void Binary_Search_Tree<Etype>::
Insert( const Etype & X, Tree_Node<Etype> * & T )
{
  if( T == NULL )
  {
    T = new Tree_Node<Etype>( X );
    if( T == NULL )
        Error("Out of space");
  }

  else if( X < T­>Element )
    Insert( X, T­>Left );

  else if( X > T­>Element )
    Insert( X, T­>Right );
}

QUESTION: Why is T passed as a reference to pointer?
ANSWER:


33.2 Make Empty
This is a recursive definition. It deletes nodes bottom­up
(i.e. from the leaves towards the root)
The figure below shows the recursive calls to Make_Empty for a binary search tree.

template <class Etype>
void Binary_Search_Tree<Etype>::
Make_Empty( Tree_Node<Etype> * & T )
{
  if( T != NULL )
  {
    Make_Empty( T­>Left );
    Make_Empty( T­>Right );
    delete T;
    T = NULL;
  }
}



33.3 Remove
  • The Remove operation must do more than simply delete a node. That could result in the tree becoming disconnected. Consider, for example, what would happen if the Root were removed.

  • When a node is removed, another node must replace it to keep the tree connected. Moreover, the binary search tree property must be maintained.

  • When the node to be removed has been found, there are three cases to consider:


           1. The node to be removed is a leaf node.
                  Since it has no children, just delete it
           2. The node to be removed has only one child.
                  Replace it by that child.
           3. The node to be removed has two children
                  Replace it by the smallest node which is larger.
                  This is the leftmost right descendent.
The figure below shows a binary search tree (BST) before and after removal of Root.
Note that the smallest node larger than Root replaced Root.


33.4 Find, Find_Min, and Find_Max The various public Find operations return a Boolean which indicates if the node has been found. If it has been found, Last_Find points at it. The contents of the node can then be retrieved via operator(). Each of the public functions calls the appropriate protected function, passing it the Root of the tree.

The following are the protected functions:
Find traverses the tree rooted at T, guiding the traversal by the size of the element at each node
compared to the size of the element sought
template <class Etype>
Tree_Node<Etype> *
Binary_Search_Tree<Etype>::
Find( const Etype & X, Tree_Node<Etype> * T ) const
{
  if( T == NULL )
    return NULL;
  else if( X < T­>Element )
    return Find( X, T­>Left );
  else if( X > T­>Element )
     return Find( X, T­>Right );
  else
    return T;
}



Find_Min traverses the tree rooted at node T, but only searches along the leftmost edges

template <class Etype>
Tree_Node<Etype>*
Binary_Search_Tree<Etype>::
Find_Min( Tree_Node<Etype> *T ) const
{
  if( T == NULL )
    return NULL;
  else if( T­>Left == NULL )
    return T;
  else
  return Find_Min( T­>Left );
}



Find_Max traverses the tree rooted at node T, but only searches along the rightmost edges

template <class Etype>
Tree_Node<Etype>*
Binary_Search_Tree<Etype>::
Find_Max( Tree_Node<Etype> *T ) const
{
  if( T != NULL )
    while( T­>Right != NULL )
      T = T­>Right;
  return T;
}


33.5 Asymptotic Analysis of Binary Search Trees

  • BST operations on any single node are O(d), where d is the

  • depth of the node.
  • The worst case depth of a node in a BST is O(n)

  • The average case depth of a node in a BST is O(lg n)

33.5.1 Enumeration of Binary Trees
In order to understand what is meant by the "average" binary tree, we must look at all possible binary trees. The figure below shows all the binary trees of three nodes. Since these trees are binary trees, not binary search trees, they are differentiated strictly on structure, not on values at the nodes.The figure below also shows the binary search trees constructed from all the permutations of {1, 2, 3}. The six permutations are {1,2,3}, {1,3,2}, {2,1,3}, {2,3,1}, {3,1,2}, and {3,2,1}. These trees are differentiated on the basis of the values at the nodes.


The number of three­node binary trees is not the same as the number of hree­node binary search trees.

33.5.2 Average­case Analysis
The average depth over all nodes in a binary search tree is O(lg n) (assuming all BST's of n nodes are equally likely.

Average depth over all nodes means 1/N * SUMi=1N Depth(ti), where Depth(t) is
the depth of node t
DEFINITION: The internal path length (IPL) of a tree is the sum of the depths of all the nodes in the tree.
Average depth of a tree can therefore be given as IPL / N
Theorem 31.1 The internal path length of a binary tree T which has n nodes is
  D(n) = D(i) + D(n - i - 1) + n - 1
where i is the number of nodes in the left subtree of T
Proof:
          D( n ) = D( i )               // IPL of left subtree
                   + D( n - i - 1 )     // IPL of right subtree
                   + i                     // 1 for each node of left subtree
                   + (n - i - 1)         // 1 for each node of right subtree

                   = D( i )  + D( n - i - 1) + n - 1


  • Note that D(1) = 0.

  • For binary search trees, all subtree sizes are equally likely. This is not true for binary trees.
    We demonstrate by example. Consider binary trees of size 3.
    Let P ( i , j ) be the probability of a left subtree of size i and a right subtree of size j.
Examining the figure above, the probabilities for binary trees are
    P(1, 1) = 1/5 , P(2, 0) = 2/5, and P(0, 2) = 2/5

the probabilities for binary search trees are
   P(1, 1) = 1/3, P(2, 0) = 1/3, and P (0, 2) = 1/3

Theorem 33.2 In binary search trees, the average value of D(n) is O(n lg n)
Proof: Theorem 33.1 gives a recurrence for D(n) based on the sizes of left­ and right­subtrees.
Since all subtrees are equally likely for binary search trees,
the average value of both D( i ) and D( n - i - 1 ) is
    1/n * SUMj=0n-1 D(j)
therefore the average value of D(n) is
2 * 1/n SUMj=0n-1 D(j) + n - 1
It can be shown by this recurrence D(n) = O(n lg n)

Theorem 33.3 D( n ) = 2/n SUM j=0n-1 D( j ) + n - 1 = O(n lg n)
Proof: Multiply by n, then evaluate for (n - 1) to get 2 equations
    nD(n) = 2 SUM j=0n-1 D(j) + n2 - n
  and
(n - 1)D(n) = 2 SUM j=0n-2 D(j) + (n - 1)2 - (n - 1)

Subtract the second equation from the first to get
nD(n) - (n - 1)D(n - 1) = 2D(n - 1) + 2n + 2
Rearrange, dropping insignificant constant 2 and dividing by n(n - 1)
             D(n)      =   D(n - 1)  +      2
            --------     -----------    --------
             n + 1           n            n + 1
We can now write the following sequence of equations by repetitively
substituting n - 1 for n

             D(n)      =   D(n - 1)  +      2
            --------     -----------    --------
             n + 1           n            n + 1

             D(n-1)      =   D(n-2)  +      2
            --------        --------    --------
               n             n -1            n 

             D(n-2)      =   D(n - 3)  +      2
            --------       -----------    --------
             n - 1             n-2          n - 1

               D( 2 )     =     D( 1 )    +     2
               --------        -------        ------
                   3               2             3
Adding these (remembering that D(1) = 0 yields

D(n) = 2 SUMi=3n+1 1/i = ln(n + 1)
n + 1
Therefore, D(n) = O(n lg n)

It is NOT true that the average depth of the nodes in ANY binary search tree is O(lg n). Consider the binary search tree formed by inserting keys in sorted order. The average depth in this tree is O(n).
It is true that, averaged over all binary search trees, the expected depth of any node is O(lg n)


34 Attempts to Guarantee O(lg n) Behavior

There are a number of variations on the binary search tree which attempt to guarantee that operations on the tree can be done in O(lg n) time.

Tight Balance Rearrange the binary search tree so it has no extraordinarily long paths. The AVL tree uses this approach and guarantees that the depth of the tree is O(lg n).
Loose Balance Rearrange the binary search tree so it is pretty well balanced. Red­black trees do this and guarantee O(lg n) tree height. The constant of proportionality is not as good as for AVL.
Splay No explicit balance conditions are maintained. Rather, the tree is reorganized on each operation in such a way that the amortized cost of each operation is O(lg n). This means that n operations will cost O(n lg n).

No comments:

Post a Comment