Wednesday, January 14, 2009

trees



CSC 112 Data Structure








Binary Trees



Definition: A binary tree is either empty or consists
of a node called the root together with two binary trees
called the left subtree and the right subtree.




  • If h = height of a binary tree,





max # of leaves = 2h



max # of nodes = 2h + 1 - 1




  • A binary tree with height h and 2h + 1 - 1
    nodes (or 2h leaves) is called a full binary tree

     













    Figure 4.4: Examples of binary trees








  •  
     













    Figure 4.5: Level by level numbering of a binary tree







  • Figure 4.4
    shows several examples of binary trees.



  • The nodes of a binary tree can be numbered in a natural way, level
    by level, left to right. For example, see Figure 4.5.



  • A binary tree with n nodes is said to be complete if
    it contains all the first n nodes of the above numbering
    scheme. Figure 4.6
    shows examples of complete and incomplete binary trees.



  • Total number of binary trees having n nodes





= number of stack-realizable permutations of length n



= number of well-formed parentheses (with n left parentheses and n
right parentheses)















Figure 4.6: Examples of complete, incomplete binary trees







Binary Tree Traversal:



1.



Preorder




Visit root, visit left subtree in preorder, visit right subtree
in preorder



2.



Postorder




Visit left subtree in postorder, right subtree in postorder, then
the root



3.



Inorder




Visit left subtree in inorder, then the root, then the right
subtree in inorder




  • Figure 4.7
    shows several binary trees and the traversal sequences in preorder,
    inorder, and postorder.



  • We can construct a binary tree uniquely from preorder and inorder or
    postorder and inorder but not preorder and postorder.














Figure 4.7: Binary tree traversals







Data Structures for Binary Trees:



1.



Arrays, especially suited for complete and full binary trees



2.



Pointer-based



Pointer-based Implementation:






typedef struct node-tag {



           
item-type info ;



         struct
node-tag * left ;



         struct
node-tag * right ;



    } node-type ;



node-type * root ; /* pointer to root */



node-type * p ; /* temporary pointer */



void preorder(node-type * root)



    {



            if
(root) {



           
visit (root) ;



        preorder (root 
left) ;



        preorder (root 
right) ;



    }



}



void inorder (node-type * root)



{



            if
(root) {



           
inorder (root 
left) ;



               
visit (root) ;



           
inorder (root 
right) ;



       }
}



void postorder (node-type * root)



{



        if (root) {



               
postorder (root 
left) ;



               
postorder (root 
right) ;



                       
visit (root) ;



        }



}



binary tree representation of trees



(data structure)



Definition: A way to represent a multiway
tree
as a binary
tree
. The leftmost
child,
c, of a node,
n, in the multiway tree is the left child, c', of the corresponding
node, n', in the binary tree. The immediately right sibling
of c is the right child of c'.



Formal Definition: A multiway tree T can be
represented by a corresponding binary tree B. Let {n1,...,
nk} be nodes of the multiway tree, T. Let {n'1,...,
n'k} be nodes of the corresponding binary tree B. Node nk
corresponds to n'k. In particular, nodes nk and
n'k have the same labels and if nk is the root
of T, n'k is the root of B. Connections correspond as
follows:




  • If nl is the leftmost child of nk,
    n'l is the left child of n'k. (If nk
    has no children, n'k has no left child.)



  • If ns is the next (immediately right)
    sibling of nk, n's is the right child of n'k.




Also known as first child-next sibling
binary tree, doubly-chained tree, filial-heir chain.



Aggregate parent (I am a part of or used in
...)
multiway
tree
, k-ary
tree
, Schorr-Waite
graph marking algorithm
.



Note: See [Knuth97,
1:333, Sect. 2.3.2]
.



The binary
tree
representation
of a
multiway
tree
or k-ary
tree
is based on
first child-next sibling representation of the tree. In this
representation every node is linked with its leftmost child and its
next (right nearest) sibling.



Let us see one example. Consider the following multiway tree



1
/|\

/ | \
/ | \

2 3 4
/ \ |

5 6 7
/ \

8 9



This tree can be represented in first child-next sibling manner as
follows



1
/

/
/

2---3---4
/ /

5---6 7
/

8---9



Now, if we look at the first child-next sibling representation of
the tree closely, we will see that it forms a binary tree. To see
this better, we can rotate every next-sibling edge 45 degrees
clockwise. After that we get the following binary tree:



1
/

2
/ \

5 3
\ \

6 4
/

7
/

8
\

9


This
is binary tree representation of the given (multiway) tree.








Binary
Trees
















Frame1



Trees
are natural structures for representing certain kinds of
hierarchical data. A (rooted) tree consists of a set of
nodes
(or vertices) and a set of
arcs
(or edges). Each arc links a parent node to one of the parent's
children. A special
root
node has no parent. Every other node has exactly one parent. It is
possible to reach any node by following a unique path of arcs from
the root. If arcs are considered bidirectional, there is a unique
path between any two nodes. The simplest kind of tree is a binary
tree where each parent has at most two children.
(We consider
unrooted
trees elsewhere, e.g. in [
minimum
spanning trees
].)



A
way to think of a
binary
tree

is that it is either empty (
emptyTree)
or it is a
fork
which contains an element and two subtrees which are themselves
binary trees.



Types:








©
L
.
A
l
l
i
s
o
n












type
tree e = emptyTree | fork e × (tree e) × (tree e)







Operations:



empty:
tree e -> boolean



leaf
: tree e -> boolean



fork
: e × tree e × tree e -> tree e



left
: tree e -> tree e



right:
tree e -> tree e



contents:
tree e -> e







Rules:



empty(emptyTree)
= true



empty(fork(x,
T, T'))= false







leaf(fork(x,
emptyTree, emptyTree)) = true



leaf(fork(x,
T, T'))



=
false, if not empty(T) or not empty(T')



leaf(emptyTree)
= error







left(fork(x,
T, T')) = T



left(emptyTree)
= error







right(fork(x,
T, T')) = T'



right(emptyTree)
= error







contents(fork(x,
T, T')) = x



contents(emptyTree)
= error











The
Tree Abstract Data Type.



(It
is also common to use curried versions of constructors and
functions, such as
fork,
especially in functional languages.)



Many
operations can be defined on trees. For example:



height(emptyTree)
= 0 |



height(fork(e,T,T'))
= 1+max(height(T), height(T'))







weight(emptyTree)
= 0 |



weight(fork(e,T,T'))
= 1+weight(T)+weight(T')







Note
that these functions are binary-recursive as is the definition of
the tree type. (Also note that a tree consisting of a single
leaf
is sometimes defined to be of height zero, rather than 1, as
here.
)



Trees
have many uses in computing. For example a
parse-tree
can represent the structure of an expression:



input:
a+b*c







---------->
+



.
.



.
.



a
*



.
.



.
.



b
c







Multiplication
has a higher
priority
then addition and binds more tightly. This tree shows that a+b*c
is interpreted as a+(b*c) rather than as (a+b)*c. Such trees are
used in compilers and other programs.


Implementation
of Binary Trees by Pointers and Records



A
tree data type can be implemented as a collection of records and
pointers.



[C/Tree/Tree.h]



[C/Tree/TreeElement.h]



.so
tree/tree.type.t







The
basic operations can create new records and manipulate pointers.



[C/Tree/Ops.c]



.so
tree/tree.cons.t







Not
surprisingly this tree implementation is similar to the
implementation of
hierarchical
lists
.



An
output routine for trees is a virtual necessity. A tree can be
printed in a style like that used for lists but a graphical
two-dimensional layout is more informative. It is difficult to
print trees down the page, because they quickly grow too wide, but
it is relatively easy to print them across the page so that the
root is at the left and the tree grows to the right.



[C/Tree/Write.c]



.so
tree/tree.put.t







For
an example of tree output, see later sections.


Example:
Parse Trees



Parsing
an input language according to a grammar is frequently necessary
in computing. Here we consider a language of simple arithmetic
expressions. A grammar written in Backus Naur form (BNF) is given
below.



<exp>
::= <exp> + <term> | <term>



<term
::= <term> * <operand> | <operand>



<operand>
::= <identifier> | ( <exp> )



<identifier>
::= <letter> <identifier> | <letter>



<letter>
::= "a" | "b" | ... | "z"







Grammar
for Simple Expression.







The
grammar can be read as a definition of the syntactic structure of
expressions, <exp>. The symbol `::=' can be read as `is' or
`can be replaced by'. The symbol `|' can be read as `or'. The
names in angle brackets, such as '<exp>', are variables for
parts of the language. The string <exp>+1 is not a legal
expression but it stands for many legal expressions such as x+1,
y+1 and (x+y*z)+1. The first line of the grammar can be read as:
an <exp> is an <exp> plus <term> or a <term>.
An expression is either an expression plus a term or just a term.
Note that the syntax rules are recursive. A little thought shows
that an expression is a sequence of terms separated by plus signs.
Similarly, a term is a sequence of operands separated by
multiplication signs. An operand is either an identifier or a
bracketed subexpression. An identifier is a sequence of letters.



A
parser can be written using the
recursive-descent
technique. A procedure is written to recognise each class of
object in the grammar - exp, term, operand. These routines are
recursive, as is the grammar, and call each other to implement the
grammar rules. For example, the routine for expression calls the
routine for term. The repetitive nature of expressions and terms
is coded by the use of a loops. A bracketed subexpression is
inherently recursive and so is the parser at that point. The
complete parser is given below. It is moderately long but not
complex, especially if read with the grammar in mind. A
lexical
routine, insymbol, skips white space and packages input letters
into identifiers and other kinds of symbol. It is followed by the
parser proper consisting of the routines Operand, Term and Exp.



[C/Tree/Parser.h]



[C/Tree/Parser.c]



.so
tree/tree.parse.t







Note
the use of mutual recursion. Various errors may be detected during
parsing. If the expression is syntactically correct, the tree
representing its structure is finally printed.



input:
a + b*(c+d)







Parse
Tree:







|
| |d---------|



|
|+---------|



|
| |c---------|



|*---------|



|
|b---------|



+---------|



|a---------|






Recursive
Tree Traversal



There
are three classic ways of recursively
traversing
a tree or of visiting every one of its nodes once. In each of
these, the left and right subtrees are visited recursively and the
distinguishing feature is when the element in the root is visited
or processed.



In
a
preorder
or
prefix
traversal the root is visited first (pre) and then the left and
right subtrees are traversed.



[C/Tree/Prefix.c]



.so
tree/tree.prefix.t







In
an
infix
traversal, the left subtree is traversed and then the root is
visited and finally the right subtree is traversed.



[C/Tree/Infix.c]



.so
tree/tree.infix.t







In
a
postorder
or
postfix
traversal the left and right subtrees are traversed and then the
root is visited afterwards (post).



[C/Tree/Postfix.c]



.so
tree/tree.postfix.t







This
method can be used to generate postfix or
reverse
Polish

code for a stack machine.



Given
the following tree,



|type------|



the-------|



|
|rover-----|



|
| |of--------|



|
| | |motor-----|



|land------|



|
|is--------|



|
| | |car-------|



|
| |a---------|







Example
Tree.







the
three traversals give the following results. The results of yet
another method,
breadth-first
traversal, are included for comparison; see the next section.



infix:
a car is land motor of rover the type



prefix:
the land is a car rover of motor type



postfix:
car a is motor of rover land type the



b-first:
the land type is rover a of car motor







Traversals.







Note
that the method given for printing a tree is a reversed infix
traversal.


Breadth-First
Traversal



A
breadth-first
traversal of a tree starts at the root of the tree. It next visits
the children, then the grand-children and so on.



1



.
.



.
.



2
3



.
. . .



.
. . .



4
5 6 7



.
. . . .



.
. . . .



8
9 10 11 12



.
. .



.
. .



13
14 15







Example:
Breadth-First Order.







The
numbers indicate the order in which the nodes are visited, not the
contents of the nodes. Because children are only accessible from a
parent, they must be stored while the parent's siblings and
cousins are visited. A [
queue]
is used to do this.



[C/Queue/Queue.h]



[C/Tree/QueueElement.h]



[C/Tree/BFirst.c]



.so
tree/tree.bf.t







Note
that the queue is a queue of pointer to node or a queue of tree
type. Initially the queue contains just the root of the tree. At
each iteration of the algorithm, the first element is removed from
the queue. Its children, if any, are pushed onto the end of the
queue and the element is processed. The algorithm terminates when
the queue is empty. See also the chapter on queues.


Recursion



Most
routines on trees are recursive. This is natural because the tree
is a recursive data type. It is possible to write iterative
versions of these operations but it is harder to do so than is the
case for flat lists because the tree type is
binary
recursive
.
The flat list and hence most of its operations are
linear
recursive

and a linear recursive routine usually has a simple iterative
version. It is often necessary to introduce an
explicit
stack into a program when writing a non-recursive tree routine.
This is often not worth the effort as the language implementors
can usually do a better job with the system stack.



The
object-oriented programming language
Java
has the notion of an
Enumeration,
i.e. an iterator, which steps through the elements of a set of
values. An enumerator must implement the predicate
hasMoreElements
and the function
nextElement.
To program an enumerator which will emit the contents of a tree in
one of the standard orders it is useful to a employ a
Stack:



[Java/Tree/egPrefixEnum.txt]






Side-Effects



The
tree operations described so far have no side-effects except for
input, output and manipulation of dynamic storage; they are
pure
tree operations. As is the case with lists, it is often necessary
or desirable to use operations having side-effects on efficiency
grounds. This is particularly natural if a program uses a single
tree as a dictionary or database structure. As before, should
multiple trees share components, changing one tree may change
another and if this is not anticipated it will cause program bugs.



Search Trees



A
binary
search tree

can be created so that the elements in it satisfy an ordering
property. This allows elements to be searched for quickly. All of
the elements in the left subtree are less than the element at the
root which is less than all of the elements in the right subtree
and this property applies recursively to all the subtrees. The
great advantage of this is that when searching for an element, a
comparison with the root will either find the element or indicate
which
one
subtree to search. The ordering is an invariant property of the
search tree. All routines that operate on the tree can make use of
it provided that they also keep it holding true.








Demonstration: There is a demonstration of Binary Search
(& AVL) Trees [here].












.so
tree/tree.search.t







It
takes O(h) time to search a search tree of height h. Since a tree
of height `h' can hold n=2h-1 elements, the search
takes O(log(n)) time under favourable circumstances. The
search-tree and its search routine should be compared with the use
of the binary search algorithm on sorted arrays in the chapter on
tables.



The
use of the tree speeds up the insertion and deletion operations at
the price of the space needed to hold the pointers. The tree has
the speed advantage when the data in the structure changes
rapidly.



The
routine given here to insert an element does so as a side-effect
by changing the tree.



[C/Tree/Insert.c]



.so
tree/tree.insert.t







If
elements are inserted in the order d, b, a, c, e, f, g the
following tree is created:



----------->d



.
.



.
.



.
.



b
e



.
. .



.
. .



a
c f



.



.



g







Note
that an insertion takes O(h) time. Under favourable circumstances,
a
balanced
tree is created, as for b, a, c, giving O(log(n)) search time. If
the input is sorted however an unbalanced tree approximating a
list is created, as for e, f, g, and the search degenerates to an
O(n) linear search. This problem is addressed later.



input:
the land rover is a type of motor car







Search
Tree:







|type------|



the-------|



|
|rover-----|



|
| |of--------|



|
| | |motor-----|



|land------|



|
|is--------|



|
| | |car-------|



|
| |a---------|







Example
Search Tree.







A
new element is added to the tree as a new peripheral, leaf node.
However if an element can also be deleted it is possible for it to
be
internal.
This makes deletion rather more difficult than insertion.



A
leaf element is easily deleted by setting the pointer to it to
emptyTree.



T:-----|
T:emptyTree



|



|



v



x







before
after











Deletion
of a Leaf.



The
node becomes garbage and can be freed.



An
element with one child can be deleted by by-passing it.



T:------|
T:|



|
|



|
|



v
|



x
|



.
emptyTree |



.
v



e
e



.
. . .



.
. . .







before
after











Deletion
of a Single-Child Parent.



An
internal element x with two children cannot easily be bypassed
without loosing one of its subtrees. The solution is to overwrite
x with some other element y of the tree and then to delete the
original copy of y. There are two obvious elements to choose from
- either the largest element `A' less than x or the smallest
element `B' greater than x. Each of these has at most one child!
The sortedness of the tree is maintained if x is overwritten with
either of these.



T:----------|
T:-----|



|
|



|
|



v
v



x
B



.
. . .



.
. . .



L
R L R



.
. . . . . . .



.
. . . . . . .



A
B A C



.
. . . .



.
. . . .



C



.
.



.
.







before
after











Deletion
of a Two-Child Parent.



Both
of A and B fall into one of the cases previously dealt with. A can
be found from x by taking one step left and then as many steps
right as possible. B can be found by taking one step right and
then as many steps left as possible.



.so
tree/tree.delete.t







Deletion takes O(h) time.


Notes




  • [Tries]
    and [PATRICIA
    trees
    ] are an alternatives to binary search trees.



  • The [Suffix
    Tree
    ] is a data-structure for solving the string
    searching problem.



Height-Balanced
Trees



As
mentioned above, if elements are inserted into a search tree in
sorted order, a tree is created that is equivalent to a list. This
will lead to inefficient searches. A
height-balanced
tree or an AVL-tree (after G. M. Adel'son-Velskii and E. M.
Landis) is a search tree in which the height of the right subtree
minus the height of the left subtree equals 1, 0, or -1. This
property also applies recursively to all subtrees. It can be shown
that a height-balanced tree of `n' elements has height O(log(n))
and this guarantees efficient search. Fortunately fast O(log(n))
insertion and deletion is still possible.








Demonstration: There is a demonstration of AVL (&
bin. s.) Trees [here].












.so
tree/tree.AVLtype.t







A
flag indicating the balance of each subtree is added to the node
record.



There
are four crucial cases during insertion. In the first case, the
left subtree L grows too tall on its left:



T
L



.
. . .



.
. . .



L
. -----> . T



.
. | | . .



.
. | | . .



.
. | | . .



|
| * | |



|
| * | |



|
| * | |



*
new elt' rebalanced



*
disturbs



*
balance







Left-2
Rotation.







By
rotating about L and T the height balance is restored and the
ordering in the tree is maintained. In the second case, L grows
too tall on its right:



T
LR



.
. . .



.
. . .



L
. L T



.
. | . . . .



.
. | ----> . . . .



.
LR | . . . .



|
.. | | | | |



|
. . | | | | |



|
. . | | | | |



|
| | | | | | |



|
| | | | | | |



|
| | | | | | |



|
| | | * |



|
| | | * |



|
| | | * |



*



*



*







Left-3
Rotation.







The
rotation restores the balance while maintaining the tree ordering.
In the above example left(LR) grew; an alternative is that
right(LR) grew but the same operations still restore the balance.
There are mirror-image right-2 and right-3 rotations.



Maintaining
the balance significantly complicates the tree insertion routine.
However a fixed amount of work is done at each node that is
visited and so it takes O(h) time.



.so
tree/tree.AVLinsert.t







Note
that if a tree has just grown in height, it can only be perfectly
balanced if it is a single (new) leaf. Otherwise one subtree must
be higher than the other.



input:
the land rover is a type of motor car







|
|type------|



|the-------|



rover-----|



|
| |of--------|



|
|motor-----|



|
| |land------|



|is--------|



|
| |car-------|



|
|a---------|







Height-Balanced
Tree.







Compare
this with the tree given earlier and created by the non-balancing
insert.


Notes




  • G. M. Adelson-Velskii and E. M. Landis [AVL]. An algorithm for
    the organization of information. Soviet Math. Dokl. 3,
    p1259-1263, 1962.



  • N. Wirth. Algorithms and Data Structures. p218,
    Prentice-Hall, 1986.
    AVL-tree insertion and deletion.



  • M. A. Weiss. Data Structures and Algorithm Analysis in C.
    s4.4, pp110. Addison Wesley 1997.



  • [2-3
    trees
    ], [2-3-4
    trees
    ] and [B-trees]



Implementation
of Binary Trees by Arrays



A
binary tree can be implemented as an array of records.



type
TreeT :0..nmax







var
Tree: array 1..nmax of



record
elt :Element_Type



left,
right :TreeT



end
record







The
empty tree is represented by zero. Left and right are indexes to
left and right subtrees.



This
implementation has no advantage in a language supporting dynamic
storage unless random access to nodes is also needed. It is useful
in a language without dynamic storage.


Full
Trees by Arrays



It
is possible to define a
full
or
weight
balanced

tree in contrast to a height balanced tree. The empty tree is a
full tree of zero levels. If T and T' are full binary trees of n
levels then fork(e,T,T') is a full binary tree of n+1 levels.



1



.
.



.
.



.
.



.
.



2
3



.
. . .



.
. . .



4
5 6 7







Full
Binary Tree of 3 Levels.







The
numbering of the nodes corresponds to a breadth-first traversal.
This suggests that such a tree can be stored in an array:



const
n = 2**Levels - 1







array
1..n of Element_Type







type
tree = 0..n







empty(T)
= T=0



left(T)
= 2*T



right(T)
= 2*T+1



parent(T)
= T div 2



leaf(T)
= T >= 2**(Levels-1)







Such
an implementation is very efficient indeed, if the tree is full,
because no space at all is used for pointers. See the chapter on
sorting for an application of this technique in [
heap-sort].


Testing
and Debugging



Programming
techniques for trees share a great deal in common with those for
lists. Pre and post conditions and assertions should be included
where possible. Minimise the use of side-effects in general and
the manipulation of global variables in particular.



Note
that there are really just two kinds of tree - emptyTree and
non-emptyTree. Most operations on a tree follow this pattern:



f(emptyTree)
= ... |



f(fork(e,T,T'))
= ...often f(T)...f(T')...







If
a case is missing it probably indicates an error. The empty case
is usually very simple, often returning a constant. The main case
often operates on one or both subtrees recursively.



When
testing or debugging a tree routine, test cases should cover the
above options. For non-emptyTree trees, it may be appropriate to
try a tree of a "few" nodes and a tree of a single node.
As usual it is invaluable to write the output routine first. The
most common problems are: . unassigned pointers, particularly
those that should be emptyTree, . lost pointer values through a
wrong ordering of operations, . knotted pointers maybe introducing
a cycle, . side-effects through shared pointers, intentional or
otherwise.



The
coding of reasonably complex routines such as those on AVL trees
is prone to small typographical errors. Nothing can beat careful
proof reading but one good testing trick is to use a set of
ascending data and then its reversal in which case the mirror
image tree should be produced. This test is not sufficient on its
own! It is important to exercise all the paths through a complex
routine and this requires a great deal of thought.


Exercises




  1. Classify tree operations (height, weight, min_element,
    max_element, flatten, insert, lookup, delete, pre,- in- and
    post-order traversal) on unordered binary trees and on search
    trees as either linear or binary recursive. Note that a
    linear-recursive operation may contain two or more calls on
    itself but only ever execute at most one of them under the
    control of an if statement. Write iterative
    versions of one or more of the linear operations.



  2. Add a full set of operators +, -, *, / to the expression parser.



  3. Write a routine to traverse the expression parse tree to generate
    reverse Polish code. eg. a*b+c*d ---> a b * c d * +



  4. Write a routine to do symbolic differentiation. Input is the
    parse tree of an arithmetic expression which may contain
    numerals, variables {x, y, ...}, + and *. Output is the tree of
    the expression that is the result of differentiation with respect
    to a given variable.




Use the rules:



diff(x, x) = 1



diff(y, x) = 0 if y is a constant or a variable different from x



diff(A+B, x) = diff(A,x)+diff(B,x)



diff(A*B, x) = A*diff(B,x)+diff(A,x)*B



eg. diff(x*x+3*x+4, x) = x*1+1*x+3*1+0*x+0 = 2*x+3




  1. Write a routine to free all of the nodes in a
    binary tree. Count the number of new and free
    operations and check that they are equal.



  2. Write a routine to collect only the elements in all the leaf
    nodes of a binary tree into a list.



  3. The path of a node in a tree is the sequence of elements
    contained in nodes from the root, down to and including the node
    itself. Give a procedure to print the path of every leaf node in
    a tree.



  4. A height-balanced tree is not necessarily full or
    weight-balanced. How un-weight-balanced can one be? ie. What is
    the minimum possible number `n' of nodes in a height-balanced
    tree for height h=1, 2, ..., 8. Is there a pattern? How does n
    compare with the value 2h-1 for a full tree?



  5. (Hard) Implement a delete operation for height-balanced
    trees.