CSC 112 Data Structure
Binary Trees
Definition: A binary tree is either empty or consists
of a node called the root together with two binary trees
called the left subtree and the right subtree.
If h = height of a binary tree,
max # of leaves = 2h
max # of nodes = 2h + 1 - 1
A binary tree with height h and 2h + 1 - 1
nodes (or 2h leaves) is called a full binary tree
Figure 4.4: Examples of binary trees
Figure 4.5: Level by level numbering of a binary tree
Figure 4.4
shows several examples of binary trees.
The nodes of a binary tree can be numbered in a natural way, level
by level, left to right. For example, see Figure 4.5.
A binary tree with n nodes is said to be complete if
it contains all the first n nodes of the above numbering
scheme. Figure 4.6
shows examples of complete and incomplete binary trees.
Total number of binary trees having n nodes
= number of stack-realizable permutations of length n
= number of well-formed parentheses (with n left parentheses and n
right parentheses)
=
|
Binary Tree Traversal:
1.
Preorder
Visit root, visit left subtree in preorder, visit right subtree
in preorder
2.
Postorder
Visit left subtree in postorder, right subtree in postorder, then
the root
3.
Inorder
Visit left subtree in inorder, then the root, then the right
subtree in inorder
Figure 4.7
shows several binary trees and the traversal sequences in preorder,
inorder, and postorder.
We can construct a binary tree uniquely from preorder and inorder or
postorder and inorder but not preorder and postorder.
|
Data Structures for Binary Trees:
1.
Arrays, especially suited for complete and full binary trees
2.
Pointer-based
Pointer-based Implementation:
typedef struct node-tag {
item-type info ;
struct
node-tag * left ;
struct
node-tag * right ;
} node-type ;
node-type * root ; /* pointer to root */
node-type * p ; /* temporary pointer */
void preorder(node-type * root)
{
if
(root) {
visit (root) ;
preorder (root
left) ;
preorder (root
right) ;
}
}
void inorder (node-type * root)
{
if
(root) {
inorder (root
left) ;
visit (root) ;
inorder (root
right) ;
}
}
void postorder (node-type * root)
{
if (root) {
postorder (root
left) ;
postorder (root
right) ;
visit (root) ;
}
}
binary tree representation of trees
(data structure)
Definition: A way to represent a multiway
tree as a binary
tree. The leftmost
child,
c, of a node,
n, in the multiway tree is the left child, c', of the corresponding
node, n', in the binary tree. The immediately right sibling
of c is the right child of c'.
Formal Definition: A multiway tree T can be
represented by a corresponding binary tree B. Let {n1,...,
nk} be nodes of the multiway tree, T. Let {n'1,...,
n'k} be nodes of the corresponding binary tree B. Node nk
corresponds to n'k. In particular, nodes nk and
n'k have the same labels and if nk is the root
of T, n'k is the root of B. Connections correspond as
follows:
If nl is the leftmost child of nk,
n'l is the left child of n'k. (If nk
has no children, n'k has no left child.)
If ns is the next (immediately right)
sibling of nk, n's is the right child of n'k.
Also known as first child-next sibling
binary tree, doubly-chained tree, filial-heir chain.
Aggregate parent (I am a part of or used in
...)
multiway
tree, k-ary
tree, Schorr-Waite
graph marking algorithm.
Note: See [Knuth97,
1:333, Sect. 2.3.2].
The binary
tree representation
of a multiway
tree or k-ary
tree is based on
first child-next sibling representation of the tree. In this
representation every node is linked with its leftmost child and its
next (right nearest) sibling.
Let us see one example. Consider the following multiway tree
1
/|\
/ | \
/ | \
2 3 4
/ \ |
5 6 7
/ \
8 9
This tree can be represented in first child-next sibling manner as
follows
1
/
/
/
2---3---4
/ /
5---6 7
/
8---9
Now, if we look at the first child-next sibling representation of
the tree closely, we will see that it forms a binary tree. To see
this better, we can rotate every next-sibling edge 45 degrees
clockwise. After that we get the following binary tree:
1
/
2
/ \
5 3
\ \
6 4
/
7
/
8
\
9
This
is binary tree representation of the given (multiway) tree.
Binary |
Trees
Implementation |
|
.so
tree/tree.search.t
It
takes O(h) time to search a search tree of height h. Since a tree
of height `h' can hold n=2h-1 elements, the search
takes O(log(n)) time under favourable circumstances. The
search-tree and its search routine should be compared with the use
of the binary search algorithm on sorted arrays in the chapter on
tables.
The
use of the tree speeds up the insertion and deletion operations at
the price of the space needed to hold the pointers. The tree has
the speed advantage when the data in the structure changes
rapidly.
The
routine given here to insert an element does so as a side-effect
by changing the tree.
.so
tree/tree.insert.t
If
elements are inserted in the order d, b, a, c, e, f, g the
following tree is created:
----------->d
.
.
.
.
.
.
b
e
.
. .
.
. .
a
c f
.
.
g
Note
that an insertion takes O(h) time. Under favourable circumstances,
a balanced
tree is created, as for b, a, c, giving O(log(n)) search time. If
the input is sorted however an unbalanced tree approximating a
list is created, as for e, f, g, and the search degenerates to an
O(n) linear search. This problem is addressed later.
input:
the land rover is a type of motor car
Search
Tree:
|type------|
the-------|
|
|rover-----|
|
| |of--------|
|
| | |motor-----|
|land------|
|
|is--------|
|
| | |car-------|
|
| |a---------|
Example
Search Tree.
A
new element is added to the tree as a new peripheral, leaf node.
However if an element can also be deleted it is possible for it to
be internal.
This makes deletion rather more difficult than insertion.
A
leaf element is easily deleted by setting the pointer to it to
emptyTree.
T:-----|
T:emptyTree
|
|
v
x
before
after
Deletion
of a Leaf.
The
node becomes garbage and can be freed.
An
element with one child can be deleted by by-passing it.
T:------|
T:|
|
|
|
|
v
|
x
|
.
emptyTree |
.
v
e
e
.
. . .
.
. . .
before
after
Deletion
of a Single-Child Parent.
An
internal element x with two children cannot easily be bypassed
without loosing one of its subtrees. The solution is to overwrite
x with some other element y of the tree and then to delete the
original copy of y. There are two obvious elements to choose from
- either the largest element `A' less than x or the smallest
element `B' greater than x. Each of these has at most one child!
The sortedness of the tree is maintained if x is overwritten with
either of these.
T:----------|
T:-----|
|
|
|
|
v
v
x
B
.
. . .
.
. . .
L
R L R
.
. . . . . . .
.
. . . . . . .
A
B A C
.
. . . .
.
. . . .
C
.
.
.
.
before
after
Deletion
of a Two-Child Parent.
Both
of A and B fall into one of the cases previously dealt with. A can
be found from x by taking one step left and then as many steps
right as possible. B can be found by taking one step right and
then as many steps left as possible.
.so
tree/tree.delete.t
Deletion takes O(h) time.
Notes
[Tries]
and [PATRICIA
trees] are an alternatives to binary search trees.
The [Suffix
Tree] is a data-structure for solving the string
searching problem.
Height-Balanced
Trees
As
mentioned above, if elements are inserted into a search tree in
sorted order, a tree is created that is equivalent to a list. This
will lead to inefficient searches. A height-balanced
tree or an AVL-tree (after G. M. Adel'son-Velskii and E. M.
Landis) is a search tree in which the height of the right subtree
minus the height of the left subtree equals 1, 0, or -1. This
property also applies recursively to all subtrees. It can be shown
that a height-balanced tree of `n' elements has height O(log(n))
and this guarantees efficient search. Fortunately fast O(log(n))
insertion and deletion is still possible.
|
.so
tree/tree.AVLtype.t
A
flag indicating the balance of each subtree is added to the node
record.
There
are four crucial cases during insertion. In the first case, the
left subtree L grows too tall on its left:
T
L
.
. . .
.
. . .
L
. -----> . T
.
. | | . .
.
. | | . .
.
. | | . .
|
| * | |
|
| * | |
|
| * | |
*
new elt' rebalanced
*
disturbs
*
balance
Left-2
Rotation.
By
rotating about L and T the height balance is restored and the
ordering in the tree is maintained. In the second case, L grows
too tall on its right:
T
LR
.
. . .
.
. . .
L
. L T
.
. | . . . .
.
. | ----> . . . .
.
LR | . . . .
|
.. | | | | |
|
. . | | | | |
|
. . | | | | |
|
| | | | | | |
|
| | | | | | |
|
| | | | | | |
|
| | | * |
|
| | | * |
|
| | | * |
*
*
*
Left-3
Rotation.
The
rotation restores the balance while maintaining the tree ordering.
In the above example left(LR) grew; an alternative is that
right(LR) grew but the same operations still restore the balance.
There are mirror-image right-2 and right-3 rotations.
Maintaining
the balance significantly complicates the tree insertion routine.
However a fixed amount of work is done at each node that is
visited and so it takes O(h) time.
.so
tree/tree.AVLinsert.t
Note
that if a tree has just grown in height, it can only be perfectly
balanced if it is a single (new) leaf. Otherwise one subtree must
be higher than the other.
input:
the land rover is a type of motor car
|
|type------|
|the-------|
rover-----|
|
| |of--------|
|
|motor-----|
|
| |land------|
|is--------|
|
| |car-------|
|
|a---------|
Height-Balanced
Tree.
Compare
this with the tree given earlier and created by the non-balancing
insert.
Notes
G. M. Adelson-Velskii and E. M. Landis [AVL]. An algorithm for
the organization of information. Soviet Math. Dokl. 3,
p1259-1263, 1962.
N. Wirth. Algorithms and Data Structures. p218,
Prentice-Hall, 1986.
AVL-tree insertion and deletion.
M. A. Weiss. Data Structures and Algorithm Analysis in C.
s4.4, pp110. Addison Wesley 1997.
[2-3
trees], [2-3-4
trees] and [B-trees]
Implementation
of Binary Trees by Arrays
A
binary tree can be implemented as an array of records.
type
TreeT :0..nmax
var
Tree: array 1..nmax of
record
elt :Element_Type
left,
right :TreeT
end
record
The
empty tree is represented by zero. Left and right are indexes to
left and right subtrees.
This
implementation has no advantage in a language supporting dynamic
storage unless random access to nodes is also needed. It is useful
in a language without dynamic storage.
Full
Trees by Arrays
It
is possible to define a full
or weight
balanced
tree in contrast to a height balanced tree. The empty tree is a
full tree of zero levels. If T and T' are full binary trees of n
levels then fork(e,T,T') is a full binary tree of n+1 levels.
1
.
.
.
.
.
.
.
.
2
3
.
. . .
.
. . .
4
5 6 7
Full
Binary Tree of 3 Levels.
The
numbering of the nodes corresponds to a breadth-first traversal.
This suggests that such a tree can be stored in an array:
const
n = 2**Levels - 1
array
1..n of Element_Type
type
tree = 0..n
empty(T)
= T=0
left(T)
= 2*T
right(T)
= 2*T+1
parent(T)
= T div 2
leaf(T)
= T >= 2**(Levels-1)
Such
an implementation is very efficient indeed, if the tree is full,
because no space at all is used for pointers. See the chapter on
sorting for an application of this technique in [heap-sort].
Testing
and Debugging
Programming
techniques for trees share a great deal in common with those for
lists. Pre and post conditions and assertions should be included
where possible. Minimise the use of side-effects in general and
the manipulation of global variables in particular.
Note
that there are really just two kinds of tree - emptyTree and
non-emptyTree. Most operations on a tree follow this pattern:
f(emptyTree)
= ... |
f(fork(e,T,T'))
= ...often f(T)...f(T')...
If
a case is missing it probably indicates an error. The empty case
is usually very simple, often returning a constant. The main case
often operates on one or both subtrees recursively.
When
testing or debugging a tree routine, test cases should cover the
above options. For non-emptyTree trees, it may be appropriate to
try a tree of a "few" nodes and a tree of a single node.
As usual it is invaluable to write the output routine first. The
most common problems are: . unassigned pointers, particularly
those that should be emptyTree, . lost pointer values through a
wrong ordering of operations, . knotted pointers maybe introducing
a cycle, . side-effects through shared pointers, intentional or
otherwise.
The
coding of reasonably complex routines such as those on AVL trees
is prone to small typographical errors. Nothing can beat careful
proof reading but one good testing trick is to use a set of
ascending data and then its reversal in which case the mirror
image tree should be produced. This test is not sufficient on its
own! It is important to exercise all the paths through a complex
routine and this requires a great deal of thought.
Exercises
Classify tree operations (height, weight, min_element,
max_element, flatten, insert, lookup, delete, pre,- in- and
post-order traversal) on unordered binary trees and on search
trees as either linear or binary recursive. Note that a
linear-recursive operation may contain two or more calls on
itself but only ever execute at most one of them under the
control of an if statement. Write iterative
versions of one or more of the linear operations.
Add a full set of operators +, -, *, / to the expression parser.
Write a routine to traverse the expression parse tree to generate
reverse Polish code. eg. a*b+c*d ---> a b * c d * +
Write a routine to do symbolic differentiation. Input is the
parse tree of an arithmetic expression which may contain
numerals, variables {x, y, ...}, + and *. Output is the tree of
the expression that is the result of differentiation with respect
to a given variable.
Use the rules:
diff(x, x) = 1
diff(y, x) = 0 if y is a constant or a variable different from x
diff(A+B, x) = diff(A,x)+diff(B,x)
diff(A*B, x) = A*diff(B,x)+diff(A,x)*B
eg. diff(x*x+3*x+4, x) = x*1+1*x+3*1+0*x+0 = 2*x+3
Write a routine to free all of the nodes in a
binary tree. Count the number of new and free
operations and check that they are equal.
Write a routine to collect only the elements in all the leaf
nodes of a binary tree into a list.
The path of a node in a tree is the sequence of elements
contained in nodes from the root, down to and including the node
itself. Give a procedure to print the path of every leaf node in
a tree.
A height-balanced tree is not necessarily full or
weight-balanced. How un-weight-balanced can one be? ie. What is
the minimum possible number `n' of nodes in a height-balanced
tree for height h=1, 2, ..., 8. Is there a pattern? How does n
compare with the value 2h-1 for a full tree?
(Hard) Implement a delete operation for height-balanced
trees.