CS-IT Lecture Notes: January 2009

CSC 112 Data Structure

Binary Trees

Definition: A binary tree is either empty or consists
of a node called the root together with two binary trees
called the left subtree and the right subtree.

If h = height of a binary tree,

max # of leaves = 2^h

max # of nodes = 2^{h + 1} - 1

A binary tree with height h and 2^{h + 1} - 1
nodes (or 2^h leaves) is called a full binary tree

Figure 4.4: Examples of binary trees
Figure 4.5: Level by level numbering of a binary tree
Figure 4.4
shows several examples of binary trees.
The nodes of a binary tree can be numbered in a natural way, level
by level, left to right. For example, see Figure 4.5.
A binary tree with n nodes is said to be complete if
it contains all the first n nodes of the above numbering
scheme. Figure 4.6
shows examples of complete and incomplete binary trees.
Total number of binary trees having n nodes

= number of stack-realizable permutations of length n

= number of well-formed parentheses (with n left parentheses and n
right parentheses)

Figure 4.6: Examples of complete, incomplete binary trees

Binary Tree Traversal:

Preorder

Visit root, visit left subtree in preorder, visit right subtree
in preorder

Postorder

Visit left subtree in postorder, right subtree in postorder, then
the root

Inorder

Visit left subtree in inorder, then the root, then the right
subtree in inorder

Figure 4.7
shows several binary trees and the traversal sequences in preorder,
inorder, and postorder.
We can construct a binary tree uniquely from preorder and inorder or
postorder and inorder but not preorder and postorder.

Figure 4.7: Binary tree traversals

Data Structures for Binary Trees:

Arrays, especially suited for complete and full binary trees

Pointer-based

Pointer-based Implementation:

typedef struct node-tag {

item-type info ;

struct
node-tag * left ;

struct
node-tag * right ;

} node-type ;

node-type * root ; /* pointer to root */

node-type * p ; /* temporary pointer */

void preorder(node-type * root)

{

if
(root) {

visit (root) ;

preorder (root
left) ;

preorder (root
right) ;

}

void inorder (node-type * root)

{

if
(root) {

inorder (root
left) ;

visit (root) ;

inorder (root
right) ;

}
}

void postorder (node-type * root)

{

if (root) {

postorder (root
left) ;

postorder (root
right) ;

visit (root) ;

}

binary tree representation of trees

(data structure)

Definition: A way to represent a multiway
tree as a binary
tree. The leftmost
child,
c, of a node,
n, in the multiway tree is the left child, c', of the corresponding
node, n', in the binary tree. The immediately right sibling
of c is the right child of c'.

Formal Definition: A multiway tree T can be
represented by a corresponding binary tree B. Let {n₁,...,
n_k} be nodes of the multiway tree, T. Let {n'₁,...,
n'_k} be nodes of the corresponding binary tree B. Node n_k
corresponds to n'_k. In particular, nodes n_k and
n'_k have the same labels and if n_k is the root
of T, n'_k is the root of B. Connections correspond as
follows:

If n_l is the leftmost child of n_k,
n'_l is the left child of n'_k. (If n_k
has no children, n'_k has no left child.)
If n_s is the next (immediately right)
sibling of n_k, n'_s is the right child of n'_k.

Also known as first child-next sibling
binary tree, doubly-chained tree, filial-heir chain.

Aggregate parent (I am a part of or used in
...)
multiway
tree, k-ary
tree, Schorr-Waite
graph marking algorithm.

Note: See [Knuth97,
1:333, Sect. 2.3.2].

The binary
tree representation
of a multiway
tree or k-ary
tree is based on
first child-next sibling representation of the tree. In this
representation every node is linked with its leftmost child and its
next (right nearest) sibling.

Let us see one example. Consider the following multiway tree

1
/|\

/ | \
/ | \

2 3 4
/ \ |

5 6 7
/ \

8 9

This tree can be represented in first child-next sibling manner as
follows

1
/

/
/

2---3---4
/ /

5---6 7
/

8---9

Now, if we look at the first child-next sibling representation of
the tree closely, we will see that it forms a binary tree. To see
this better, we can rotate every next-sibling edge 45 degrees
clockwise. After that we get the following binary tree:

1
/

2
/ \

5 3
\ \

6 4
/

7
/

8
\

9

This
is binary tree representation of the given (multiway) tree.

Binary
Trees

Frame1

Trees
are natural structures for representing certain kinds of
hierarchical data. A (rooted) tree consists of a set of nodes
(or vertices) and a set of arcs
(or edges). Each arc links a parent node to one of the parent's
children. A special root
node has no parent. Every other node has exactly one parent. It is
possible to reach any node by following a unique path of arcs from
the root. If arcs are considered bidirectional, there is a unique
path between any two nodes. The simplest kind of tree is a binary
tree where each parent has at most two children.
(We consider
unrooted
trees elsewhere, e.g. in [minimum
spanning trees].)

A
way to think of a binary
tree
is that it is either empty (emptyTree)
or it is a fork
which contains an element and two subtrees which are themselves
binary trees.

Types:

type
tree e = emptyTree | fork e × (tree e) × (tree e)

Operations:

empty:
tree e -> boolean

leaf
: tree e -> boolean

fork
: e × tree e × tree e -> tree e

left
: tree e -> tree e

right:
tree e -> tree e

contents:
tree e -> e

Rules:

empty(emptyTree)
= true

empty(fork(x,
T, T'))= false

leaf(fork(x,
emptyTree, emptyTree)) = true

leaf(fork(x,
T, T'))

=
false, if not empty(T) or not empty(T')

leaf(emptyTree)
= error

left(fork(x,
T, T')) = T

left(emptyTree)
= error

right(fork(x,
T, T')) = T'

right(emptyTree)
= error

contents(fork(x,
T, T')) = x

contents(emptyTree)
= error

The
Tree Abstract Data Type.

(It
is also common to use curried versions of constructors and
functions, such as fork,
especially in functional languages.)

Many
operations can be defined on trees. For example:

height(emptyTree)
= 0 |

height(fork(e,T,T'))
= 1+max(height(T), height(T'))

weight(emptyTree)
= 0 |

weight(fork(e,T,T'))
= 1+weight(T)+weight(T')

Note
that these functions are binary-recursive as is the definition of
the tree type. (Also note that a tree consisting of a single
leaf
is sometimes defined to be of height zero, rather than 1, as
here.)

Trees
have many uses in computing. For example a parse-tree
can represent the structure of an expression:

input:
a+b*c

---------->
+

.
.

a
*

.
.

b
c

Multiplication
has a higher priority
then addition and binds more tightly. This tree shows that a+b*c
is interpreted as a+(b*c) rather than as (a+b)*c. Such trees are
used in compilers and other programs.

Implementation
of Binary Trees by Pointers and Records

A
tree data type can be implemented as a collection of records and
pointers.

[C/Tree/Tree.h]

[C/Tree/TreeElement.h]

.so
tree/tree.type.t

The
basic operations can create new records and manipulate pointers.

[C/Tree/Ops.c]

.so
tree/tree.cons.t

Not
surprisingly this tree implementation is similar to the
implementation of hierarchical
lists.

An
output routine for trees is a virtual necessity. A tree can be
printed in a style like that used for lists but a graphical
two-dimensional layout is more informative. It is difficult to
print trees down the page, because they quickly grow too wide, but
it is relatively easy to print them across the page so that the
root is at the left and the tree grows to the right.

[C/Tree/Write.c]

.so
tree/tree.put.t

For
an example of tree output, see later sections.

Example:
Parse Trees

Parsing
an input language according to a grammar is frequently necessary
in computing. Here we consider a language of simple arithmetic
expressions. A grammar written in Backus Naur form (BNF) is given
below.

<operand>
::= <identifier> | ( <exp> )

<letter>
::= "a" | "b" | ... | "z"

Grammar
for Simple Expression.

The
grammar can be read as a definition of the syntactic structure of
expressions, <exp>. The symbol `::=' can be read as `is' or
`can be replaced by'. The symbol `|' can be read as `or'. The
names in angle brackets, such as '<exp>', are variables for
parts of the language. The string <exp>+1 is not a legal
expression but it stands for many legal expressions such as x+1,
y+1 and (x+y*z)+1. The first line of the grammar can be read as:
an <exp> is an <exp> plus <term> or a <term>.
An expression is either an expression plus a term or just a term.
Note that the syntax rules are recursive. A little thought shows
that an expression is a sequence of terms separated by plus signs.
Similarly, a term is a sequence of operands separated by
multiplication signs. An operand is either an identifier or a
bracketed subexpression. An identifier is a sequence of letters.

A
parser can be written using the recursive-descent
technique. A procedure is written to recognise each class of
object in the grammar - exp, term, operand. These routines are
recursive, as is the grammar, and call each other to implement the
grammar rules. For example, the routine for expression calls the
routine for term. The repetitive nature of expressions and terms
is coded by the use of a loops. A bracketed subexpression is
inherently recursive and so is the parser at that point. The
complete parser is given below. It is moderately long but not
complex, especially if read with the grammar in mind. A lexical
routine, insymbol, skips white space and packages input letters
into identifiers and other kinds of symbol. It is followed by the
parser proper consisting of the routines Operand, Term and Exp.

[C/Tree/Parser.h]

[C/Tree/Parser.c]

.so
tree/tree.parse.t

Note
the use of mutual recursion. Various errors may be detected during
parsing. If the expression is syntactically correct, the tree
representing its structure is finally printed.

input:
a + b*(c+d)

Parse
Tree:

|
| |d---------|

|
|+---------|

|
| |c---------|

|*---------|

|
|b---------|

+---------|

|a---------|

Recursive
Tree Traversal

There
are three classic ways of recursively traversing
a tree or of visiting every one of its nodes once. In each of
these, the left and right subtrees are visited recursively and the
distinguishing feature is when the element in the root is visited
or processed.

In
a preorder
or prefix
traversal the root is visited first (pre) and then the left and
right subtrees are traversed.

[C/Tree/Prefix.c]

.so
tree/tree.prefix.t

In
an infix
traversal, the left subtree is traversed and then the root is
visited and finally the right subtree is traversed.

[C/Tree/Infix.c]

.so
tree/tree.infix.t

In
a postorder
or postfix
traversal the left and right subtrees are traversed and then the
root is visited afterwards (post).

[C/Tree/Postfix.c]

.so
tree/tree.postfix.t

This
method can be used to generate postfix or reverse
Polish
code for a stack machine.

Given
the following tree,

|type------|

the-------|

|
|rover-----|

|
| |of--------|

|
| | |motor-----|

|land------|

|
|is--------|

|
| | |car-------|

|
| |a---------|

Example
Tree.

the
three traversals give the following results. The results of yet
another method, breadth-first
traversal, are included for comparison; see the next section.

infix:
a car is land motor of rover the type

prefix:
the land is a car rover of motor type

postfix:
car a is motor of rover land type the

b-first:
the land type is rover a of car motor

Traversals.

Note
that the method given for printing a tree is a reversed infix
traversal.

Breadth-First
Traversal

A
breadth-first
traversal of a tree starts at the root of the tree. It next visits
the children, then the grand-children and so on.

.
.

2
3

.
. . .

4
5 6 7

.
. . . .

8
9 10 11 12

.
. .

13
14 15

Example:
Breadth-First Order.

The
numbers indicate the order in which the nodes are visited, not the
contents of the nodes. Because children are only accessible from a
parent, they must be stored while the parent's siblings and
cousins are visited. A [queue]
is used to do this.

[C/Queue/Queue.h]

[C/Tree/QueueElement.h]

[C/Tree/BFirst.c]

.so
tree/tree.bf.t

Note
that the queue is a queue of pointer to node or a queue of tree
type. Initially the queue contains just the root of the tree. At
each iteration of the algorithm, the first element is removed from
the queue. Its children, if any, are pushed onto the end of the
queue and the element is processed. The algorithm terminates when
the queue is empty. See also the chapter on queues.

Recursion

Most
routines on trees are recursive. This is natural because the tree
is a recursive data type. It is possible to write iterative
versions of these operations but it is harder to do so than is the
case for flat lists because the tree type is binary
recursive.
The flat list and hence most of its operations are linear
recursive
and a linear recursive routine usually has a simple iterative
version. It is often necessary to introduce an explicit
stack into a program when writing a non-recursive tree routine.
This is often not worth the effort as the language implementors
can usually do a better job with the system stack.

The
object-oriented programming language Java
has the notion of an Enumeration,
i.e. an iterator, which steps through the elements of a set of
values. An enumerator must implement the predicate hasMoreElements
and the function nextElement.
To program an enumerator which will emit the contents of a tree in
one of the standard orders it is useful to a employ a Stack:

[Java/Tree/egPrefixEnum.txt]

Side-Effects

The
tree operations described so far have no side-effects except for
input, output and manipulation of dynamic storage; they are pure
tree operations. As is the case with lists, it is often necessary
or desirable to use operations having side-effects on efficiency
grounds. This is particularly natural if a program uses a single
tree as a dictionary or database structure. As before, should
multiple trees share components, changing one tree may change
another and if this is not anticipated it will cause program bugs.

Search Trees

A
binary
search tree
can be created so that the elements in it satisfy an ordering
property. This allows elements to be searched for quickly. All of
the elements in the left subtree are less than the element at the
root which is less than all of the elements in the right subtree
and this property applies recursively to all the subtrees. The
great advantage of this is that when searching for an element, a
comparison with the root will either find the element or indicate
which one
subtree to search. The ordering is an invariant property of the
search tree. All routines that operate on the tree can make use of
it provided that they also keep it holding true.

Demonstration: There is a demonstration of Binary Search
(& AVL) Trees [here].

.so
tree/tree.search.t

It
takes O(h) time to search a search tree of height h. Since a tree
of height `h' can hold n=2^h-1 elements, the search
takes O(log(n)) time under favourable circumstances. The
search-tree and its search routine should be compared with the use
of the binary search algorithm on sorted arrays in the chapter on
tables.

The
use of the tree speeds up the insertion and deletion operations at
the price of the space needed to hold the pointers. The tree has
the speed advantage when the data in the structure changes
rapidly.

The
routine given here to insert an element does so as a side-effect
by changing the tree.

[C/Tree/Insert.c]

.so
tree/tree.insert.t

If
elements are inserted in the order d, b, a, c, e, f, g the
following tree is created:

----------->d

.
.

b
e

.
. .

a
c f

Note
that an insertion takes O(h) time. Under favourable circumstances,
a balanced
tree is created, as for b, a, c, giving O(log(n)) search time. If
the input is sorted however an unbalanced tree approximating a
list is created, as for e, f, g, and the search degenerates to an
O(n) linear search. This problem is addressed later.

input:
the land rover is a type of motor car

Search
Tree:

|type------|

the-------|

|
|rover-----|

|
| |of--------|

|
| | |motor-----|

|land------|

|
|is--------|

|
| | |car-------|

|
| |a---------|

Example
Search Tree.

A
new element is added to the tree as a new peripheral, leaf node.
However if an element can also be deleted it is possible for it to
be internal.
This makes deletion rather more difficult than insertion.

A
leaf element is easily deleted by setting the pointer to it to
emptyTree.

T:-----|
T:emptyTree

before
after

Deletion
of a Leaf.

The
node becomes garbage and can be freed.

An
element with one child can be deleted by by-passing it.

T:------|
T:|

|
|

v
|

x
|

.
emptyTree |

.
v

e
e

.
. . .

before
after

Deletion
of a Single-Child Parent.

An
internal element x with two children cannot easily be bypassed
without loosing one of its subtrees. The solution is to overwrite
x with some other element y of the tree and then to delete the
original copy of y. There are two obvious elements to choose from
- either the largest element `A' less than x or the smallest
element `B' greater than x. Each of these has at most one child!
The sortedness of the tree is maintained if x is overwritten with
either of these.

T:----------|
T:-----|

|
|

v
v

x
B

.
. . .

L
R L R

.
. . . . . . .

A
B A C

.
. . . .

.
.

before
after

Deletion
of a Two-Child Parent.

Both
of A and B fall into one of the cases previously dealt with. A can
be found from x by taking one step left and then as many steps
right as possible. B can be found by taking one step right and
then as many steps left as possible.

.so
tree/tree.delete.t

Deletion takes O(h) time.

Notes

[Tries]
and [PATRICIA
trees] are an alternatives to binary search trees.
The [Suffix
Tree] is a data-structure for solving the string
searching problem.

Height-Balanced
Trees

As
mentioned above, if elements are inserted into a search tree in
sorted order, a tree is created that is equivalent to a list. This
will lead to inefficient searches. A height-balanced
tree or an AVL-tree (after G. M. Adel'son-Velskii and E. M.
Landis) is a search tree in which the height of the right subtree
minus the height of the left subtree equals 1, 0, or -1. This
property also applies recursively to all subtrees. It can be shown
that a height-balanced tree of `n' elements has height O(log(n))
and this guarantees efficient search. Fortunately fast O(log(n))
insertion and deletion is still possible.

Demonstration: There is a demonstration of AVL (&
bin. s.) Trees [here].

.so
tree/tree.AVLtype.t

A
flag indicating the balance of each subtree is added to the node
record.

There
are four crucial cases during insertion. In the first case, the
left subtree L grows too tall on its left:

T
L

.
. . .

L
. -----> . T

.
. | | . .

|
| * | |

*
new elt' rebalanced

*
disturbs

*
balance

Left-2
Rotation.

By
rotating about L and T the height balance is restored and the
ordering in the tree is maintained. In the second case, L grows
too tall on its right:

T
LR

.
. . .

L
. L T

.
. | . . . .

.
. | ----> . . . .

.
LR | . . . .

|
.. | | | | |

|
. . | | | | |

|
| | | | | | |

|
| | | * |

Left-3
Rotation.

The
rotation restores the balance while maintaining the tree ordering.
In the above example left(LR) grew; an alternative is that
right(LR) grew but the same operations still restore the balance.
There are mirror-image right-2 and right-3 rotations.

Maintaining
the balance significantly complicates the tree insertion routine.
However a fixed amount of work is done at each node that is
visited and so it takes O(h) time.

.so
tree/tree.AVLinsert.t

Note
that if a tree has just grown in height, it can only be perfectly
balanced if it is a single (new) leaf. Otherwise one subtree must
be higher than the other.

input:
the land rover is a type of motor car

|
|type------|

|the-------|

rover-----|

|
| |of--------|

|
|motor-----|

|
| |land------|

|is--------|

|
| |car-------|

|
|a---------|

Height-Balanced
Tree.

Compare
this with the tree given earlier and created by the non-balancing
insert.

Notes

G. M. Adelson-Velskii and E. M. Landis [AVL]. An algorithm for
the organization of information. Soviet Math. Dokl. 3,
p1259-1263, 1962.
N. Wirth. Algorithms and Data Structures. p218,
Prentice-Hall, 1986.
AVL-tree insertion and deletion.
M. A. Weiss. Data Structures and Algorithm Analysis in C.
s4.4, pp110. Addison Wesley 1997.
[2-3
trees], [2-3-4
trees] and [B-trees]

Implementation
of Binary Trees by Arrays

A
binary tree can be implemented as an array of records.

type
TreeT :0..nmax

var
Tree: array 1..nmax of

record
elt :Element_Type

left,
right :TreeT

end
record

The
empty tree is represented by zero. Left and right are indexes to
left and right subtrees.

This
implementation has no advantage in a language supporting dynamic
storage unless random access to nodes is also needed. It is useful
in a language without dynamic storage.

Full
Trees by Arrays

It
is possible to define a full
or weight
balanced
tree in contrast to a height balanced tree. The empty tree is a
full tree of zero levels. If T and T' are full binary trees of n
levels then fork(e,T,T') is a full binary tree of n+1 levels.

.
.

2
3

.
. . .

4
5 6 7

Full
Binary Tree of 3 Levels.

The
numbering of the nodes corresponds to a breadth-first traversal.
This suggests that such a tree can be stored in an array:

const
n = 2**Levels - 1

array
1..n of Element_Type

type
tree = 0..n

empty(T)
= T=0

left(T)
= 2*T

right(T)
= 2*T+1

parent(T)
= T div 2

leaf(T)
= T >= 2**(Levels-1)

Such
an implementation is very efficient indeed, if the tree is full,
because no space at all is used for pointers. See the chapter on
sorting for an application of this technique in [heap-sort].

Testing
and Debugging

Programming
techniques for trees share a great deal in common with those for
lists. Pre and post conditions and assertions should be included
where possible. Minimise the use of side-effects in general and
the manipulation of global variables in particular.

Note
that there are really just two kinds of tree - emptyTree and
non-emptyTree. Most operations on a tree follow this pattern:

f(emptyTree)
= ... |

f(fork(e,T,T'))
= ...often f(T)...f(T')...

If
a case is missing it probably indicates an error. The empty case
is usually very simple, often returning a constant. The main case
often operates on one or both subtrees recursively.

When
testing or debugging a tree routine, test cases should cover the
above options. For non-emptyTree trees, it may be appropriate to
try a tree of a "few" nodes and a tree of a single node.
As usual it is invaluable to write the output routine first. The
most common problems are: . unassigned pointers, particularly
those that should be emptyTree, . lost pointer values through a
wrong ordering of operations, . knotted pointers maybe introducing
a cycle, . side-effects through shared pointers, intentional or
otherwise.

The
coding of reasonably complex routines such as those on AVL trees
is prone to small typographical errors. Nothing can beat careful
proof reading but one good testing trick is to use a set of
ascending data and then its reversal in which case the mirror
image tree should be produced. This test is not sufficient on its
own! It is important to exercise all the paths through a complex
routine and this requires a great deal of thought.

Exercises

Classify tree operations (height, weight, min_element,
max_element, flatten, insert, lookup, delete, pre,- in- and
post-order traversal) on unordered binary trees and on search
trees as either linear or binary recursive. Note that a
linear-recursive operation may contain two or more calls on
itself but only ever execute at most one of them under the
control of an if statement. Write iterative
versions of one or more of the linear operations.
Add a full set of operators +, -, *, / to the expression parser.
Write a routine to traverse the expression parse tree to generate
reverse Polish code. eg. a*b+c*d ---> a b * c d * +
Write a routine to do symbolic differentiation. Input is the
parse tree of an arithmetic expression which may contain
numerals, variables {x, y, ...}, + and *. Output is the tree of
the expression that is the result of differentiation with respect
to a given variable.

Use the rules:

diff(x, x) = 1

diff(y, x) = 0 if y is a constant or a variable different from x

diff(A+B, x) = diff(A,x)+diff(B,x)

diff(A*B, x) = A*diff(B,x)+diff(A,x)*B

eg. diff(x*x+3*x+4, x) = x*1+1*x+3*1+0*x+0 = 2*x+3

Write a routine to free all of the nodes in a
binary tree. Count the number of new and free
operations and check that they are equal.
Write a routine to collect only the elements in all the leaf
nodes of a binary tree into a list.
The path of a node in a tree is the sequence of elements
contained in nodes from the root, down to and including the node
itself. Give a procedure to print the path of every leaf node in
a tree.
A height-balanced tree is not necessarily full or
weight-balanced. How un-weight-balanced can one be? ie. What is
the minimum possible number `n' of nodes in a height-balanced
tree for height h=1, 2, ..., 8. Is there a pattern? How does n
compare with the value 2^h-1 for a full tree?
(Hard) Implement a delete operation for height-balanced
trees.

CS-IT Lecture Notes

Wednesday, January 14, 2009

trees

Binary
Trees

Implementation
of Binary Trees by Pointers and Records

Example:
Parse Trees

Recursive
Tree Traversal

Breadth-First
Traversal

Recursion

Side-Effects

Search Trees

Notes

Height-Balanced
Trees

Notes

Implementation
of Binary Trees by Arrays

Full
Trees by Arrays

Testing
and Debugging

Exercises

Followers

Blog Archive

About Me

CS-IT Lecture Notes

Wednesday, January 14, 2009

trees

Binary Trees

Implementation of Binary Trees by Pointers and Records

Example: Parse Trees

Recursive Tree Traversal

Breadth-First Traversal

Recursion

Side-Effects

Search Trees

Notes

Height-Balanced Trees

Notes

Implementation of Binary Trees by Arrays

Full Trees by Arrays

Testing and Debugging

Exercises

Followers

Blog Archive

About Me

Binary
Trees

Implementation
of Binary Trees by Pointers and Records

Example:
Parse Trees

Recursive
Tree Traversal

Breadth-First
Traversal

Height-Balanced
Trees

Implementation
of Binary Trees by Arrays

Full
Trees by Arrays

Testing
and Debugging