Project 2, Lazy Binary Search Trees

Due: Tuesday, October 22, 11:59pm

Links: [Project Submission] [Late Submissions] [Project Grading] [Grading Guidelines] [Academic Conduct]

Objectives

The objective of this programming assignment is to have you practice using recursion in your programs and to familiarize you with the binary search tree data structure.

Introduction

In real life, laziness if often disdained. (See Wikipedia article on sloth.) In computer science, however, laziness is sometimes a viable strategy. Why do today what you can put off until tomorrow? especially if there is a chance that you won't actually have to do it tomorrow either?

In contrast to a red-black tree, where we diligently maintain a balance condition to guarantee that the tree has O( log n) height, in a lazy BST, we don't worry about the balancing until things get really out of whack. Insert and delete proceed in the same manner as an unbalanced binary search tree until we notice that at some node of the BST, the left subtree is twice as large as the right subtree (or vice versa). When this happens we rebalance the subtree of the lazy BST rooted at this node.

When a subtree of a lazy BST is rebalanced, we convert the entire subtree into a sorted array. Then we convert the array back into a perfectly balanced BST. Rebuilding is easy because the array is sorted. We can find the middle element of the array in constant time and make it the root of the new subtree. Then, we recursively build the left subtree and the right subtree using, respectively, the portion of the array that has keys smaller than the root and the portion of the array that has keys larger than the root. The result is a binary search tree that is as balanced as possible. (See Project 2 Examples.) The rebalance procedure takes O( t ) on a BST subtree with t elements. However, we don't have to rebalance very often — amortized analysis shows that the insert and delete procedures take O( log n ) amortized time on a lazy BST with n elements.

Since rebalancing is expensive, we add another provision: we won't rebalance a subtree that has height ≤ 3. An unbalanced subtree that has height 3 will not add very much to the height of the overall tree and hence will not contribute very much to the running time of the BST procedures. (We adopt the convention used in the textbook and define the height of a leaf to be 0.) By ignoring, small unbalanced subtrees, we can avoid excessive rebalancing.

One note about the rebalance procedure: it is possible for a lazy BST to have two nodes x and y where rebalancing is needed where x is an ancestor of y. In this situation, we want to do the rebalancing at x since rebalancing the subtree rooted at x will also rebalance the subtree rooted at y. If we rebalanced at y first, the time spent rebalancing at y is completely wasted since all that work is undone when we rebalance at x. (See Project 2 Examples.)


Assignment

Note: Running time is one of the most important considerations in the implementation of a data structure. Programs that produce the desired output but exceed the required running times are considered wrong implementations and will receive substantial deductions during grading.

Your assignment is to implement a lazy BST. You may start with the BST class from the textbook or design your own. Each option has advantages and disadvantages. A primary objective of this programming assignment is to have you use recursion. So, one component of grading will evaluate how elegantly you employ recursion to implement this data structure. (Yes, you are being graded on aesthetics!)

In order to implement a lazy BST efficiently, your data structure must be able to determine the size and height of a subtree in constant time. You must have data members for the height and size of a subtree in the class representing the root of a subtree of a lazy BST. The height and size data members must be updated whenever the height or size of that subtree changes. The update must not affect the asymptotic running time of insert, delete and search. These must still run in time proportional to the height of the tree.

Furthermore, your lazy BST implementation must be generic. In particular, it must work with any class that implements the Comparable interface. The declaration of your LazyBST class must be:

package lazybst; public class LazyBST < T extends Comparable<? super T> > { ... }

This declaration is similar to the one used in the textbook. You must not change the package name because doing so will make your program incompatible with the test programs used for grading and will result in deductions.

Here are the methods you must implement in your LazyBST class. (You will need to implement others for your own coding needs.)

  1. An insert() method that adds an item to the LazyBST that has the following signature:

       public void insert (T x) ;
    

    The insert() method must run in time proportional to the height of the lazy BST. Your LazyBST must not allow duplicates. If the insert() method is invoked with a key value that already appears in the lazy BST, your insert() method should do nothing.

  2. A remove method that finds and removes an item with the given key value. The remove method should return a reference to the item removed. If no such item is in the lazy BST, your remove method should return null. (Do not throw an exception.) The remove method must have the following signature:

       public T remove (T x) ;
    

    For full credit, your remove() method must run in time proportional to the height of the tree.

    Note that the textbook's implementation of the remove method is quite inefficient because in the case where the node to be removed has two children, it first finds the value of the smallest item in the right subtree and then, in a separate step, removes that smallest item from the right subtree. This is inefficient because it uses two traversals of the BST. Your implementation must find and remove the smallest item in the right subtree without traversing the BST a second time.

  3. A find() method that returns the item that has the given key. The signature of the find() method should be:

       public T find (T x) ;
    

    If no such item exists, the find() method should return null. For full credit, your find() method must run in time proportional to the height of the tree.

  4. A method named span() that returns the number of items in the lazy BST with keys within a specified range. I.e., span(x,y) is the number of items with key ≥ x and ≤ y. The signature of the span() method should be:

       public int span (T x, T y) ;
    

    For full credit, your span() method must run in time proportional to the height of the tree. Try to make good use of recursion in the computation of span(x,y).
  5. A method named rebalance() that rebalances a subtree of the lazy BST as described above. The running time of rebalance() must be proportional to the subtree being rebalanced. Note that a proper implementation would require you the keep track of the size and height of the subtree. Read the description above.

  6. A dump() method that prints out each item in the LazyBST in increasing order. Next to each item, the dump() method must also print out the item's height and the size of the subtree rooted at the item.

       public void dump () ;
    

Finally, your lazy BST class must be called LazyBST and must be in a package called lazybst ( NOT proj2 or anything else). Your LazyBST class must be accessible to a main program in a different package. You can check that your code compiles correctly with this sample main program: Proj2Test.java. This test program must be placed in a separate directory named driver (since it belongs to the driver package). The output of Proj2Test.java should look something like this: Proj2Test.txt. Your code must compile with Proj2Test.java without alteration.

Note: "without alteration" means you cannot change even one character of the file — so DO NOT change the package name and DO NOT change the import directives. If your program does not compile with Proj2Test.java on GL using ANT, you must change your program to make it compile. This is the whole point of providing a sample test program.

Addendum: Here are two simple test programs to check that your LazyBST class works with different data types.

Java 6: Here are versions of the test programs for people still using Java 6.


Implementation Notes

Here we list some recommendations and point out some traps and pitfalls.

What to Submit

Read the course project submission procedures. Submission closes by script immediately after midnight. Submit well before the 11:59pm deadline, because 12:00:01 might already be late (the script takes only a few seconds to run).

You should copy over all of your Java source code and have your .java files in their own directories which are in turn under the src directory. You must also supply an ANT build file.

Make sure that your code is in the ~/cs341proj/proj2/ directory and not in a subdirectory of ~/cs341proj/proj2/. In particular, the following Unix commands should work. (Check this.)

cd ~/cs341proj/proj2
ant compile
ant run
ant clean