Project 8: Baby Names, UMBC CMSC313 Spring 2013

CMSC313, Computer Organization & Assembly Language Programming, Spring 2013

Project 8: Baby Names

Due: Tuesday April 23, 2013, 11:59pm

Objective

The objective of this programming exercise is to practice writing C programs for data structures and to gain some experience with the valgrind utility.

Note: binary search trees were discussed in class. If you missed this explanation, please look up binary search trees in a data structures textbook or online.

Note: You can download all of the files referenced on this page as a tar file (proj8.tar) or copy them from the GL file system:

/afs/umbc.edu/users/c/h/chang/pub/cs313/proj8.tar
/afs/umbc.edu/users/c/h/chang/pub/cs313/proj8/*

Assignment

For this assignment, you are provided with a basic implementation of binary search trees: bst.c, bst.h. You will modify and add features to this binary search tree ADT.

These demo main programs show that the basic implementation only works with int data:

First demo: demo8a.c, demo8a.txt.
Second demo: demo8b.c, demo8b.txt.

Step 1: Warm-up exercise

Implement a function bst_walk_depth() with this prototype:

void bst_walk_depth (tnode *ptr, int depth) ;

This function should perform an inorder walk of the binary search tree (BST) and indent each line of output according to the depth of the node in the tree. Your function must be recursive. The first call to bst_walk_depth() should have depth 0. See these main programs and sample runs:

main8a.c, main8a.txt.
main8b.c, main8b.txt.

Keep a copy of this implementation for submission.
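As a rough illustration of the recursion described in Step 1, here is a minimal sketch. The tnode layout below is an assumption made for this example (the real definition is in bst.h, and the field name data is hypothetical), and the choice of two spaces per level is also assumed; match the exact format shown in the sample runs.

```c
#include <assert.h>
#include <stdio.h>

/* ASSUMED layout for illustration only; use the definition in bst.h. */
typedef struct tnode {
    int data;               /* hypothetical name for the int payload */
    int size;               /* number of nodes in this subtree */
    struct tnode *left;
    struct tnode *right;
} tnode;

/* Inorder walk: left subtree, current node, right subtree.
   Each recursive call passes depth + 1 so children print
   one indentation level deeper than their parent. */
void bst_walk_depth(tnode *ptr, int depth) {
    assert(depth >= 0);
    if (ptr == NULL) return;            /* empty subtree: nothing to print */
    bst_walk_depth(ptr->left, depth + 1);
    /* "%*s" with an empty string prints (2 * depth) spaces (assumed width) */
    printf("%*s%d\n", 2 * depth, "", ptr->data);
    bst_walk_depth(ptr->right, depth + 1);
}
```

The depth parameter carries the only state the walk needs, which is why the first call passes 0 for the root.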

Step 2: expand tnode

In this step you will modify the type definition of tnode to include a string (char pointer) and an int field. The purpose of these fields is to store baby names in the string and the frequency with which these names occur in the US population in the int field. This data structure will help new parents pick baby names.

In the new header file, bst2.h, the data fields are deliberately renamed to name and frequency so that if you forget to modify a function in bst.c, your program will not compile.

When you make a new tnode the string in the name field MUST have its own dynamically allocated memory. It cannot simply point to the string given to bst_insert(). (Hint: use strdup().)

Modify the existing BST functions to work with the new tnode definition. The nodes in this new BST ADT should be sorted by the frequency field (NOT by the name field!!!).
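The two requirements above (a private copy of the name, and ordering by frequency) might be sketched as follows. This is only an illustration under assumptions: the struct layout is guessed from the field names mentioned in this page, and make_node() and sketch_insert() are hypothetical names, not the prototypes in bst2.h. Sending equal frequencies to the right is also an arbitrary choice made here.

```c
#define _POSIX_C_SOURCE 200809L   /* strdup() is POSIX, not ISO C */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMED layout for illustration only; use the definition in bst2.h. */
typedef struct tnode {
    char *name;
    int frequency;
    int size;               /* number of nodes in this subtree */
    struct tnode *left;
    struct tnode *right;
} tnode;

static int subtree_size(const tnode *p) { return p ? p->size : 0; }

/* Hypothetical helper: allocate a node that owns its own copy of name. */
static tnode *make_node(const char *name, int frequency) {
    assert(name != NULL);
    tnode *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->name = strdup(name);          /* private copy, freed with the node */
    if (n->name == NULL) { free(n); return NULL; }
    n->frequency = frequency;
    n->size = 1;
    n->left = n->right = NULL;
    return n;
}

/* Hypothetical insert: keys are compared by frequency, never by name.
   The size field is recomputed from the children on the way back up. */
tnode *sketch_insert(tnode *ptr, const char *name, int frequency) {
    if (ptr == NULL) return make_node(name, frequency);
    if (frequency < ptr->frequency)
        ptr->left = sketch_insert(ptr->left, name, frequency);
    else
        ptr->right = sketch_insert(ptr->right, name, frequency);
    ptr->size = 1 + subtree_size(ptr->left) + subtree_size(ptr->right);
    return ptr;
}
```

Because the node holds its own copy, the caller is free to reuse or overwrite the buffer it passed in, which is typical when names are read from a file one line at a time.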

You can use the following main program to test your implementation (comment out the call to bst_find_by_name()):

main8c.c, main8c.txt

Step 3: implement two new functions

In this step you will implement two new functions:

tnode *bst_find_by_rank(tnode *ptr, int rank) ;
tnode *bst_find_by_name(tnode *ptr, const char *nom) ;

The function bst_find_by_rank() finds a name by popularity. The most popular name has rank 1. The least popular name has the biggest rank. Your implementation of this function must make efficient use of the size field of tnode, which holds the number of nodes in the subtree rooted at the current tnode. For example, if you are asked to find a name with rank 15, you can first check the size of the right subtree (if it exists). Suppose the size of the right subtree is 25; then you can just recursively find the node with rank 15 in the right subtree. If the size of the right subtree is 14, then the current node is the node with rank 15. Finally, if the right subtree is small, say it has size 5, then you should find a node with rank 9 (= 15 − 5 − 1) in the left subtree.
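The three cases in the rank 15 example can be written down directly. The sketch below assumes a tnode layout guessed from the field names on this page (the real one is in bst2.h); subtree_size() is a hypothetical helper that treats a missing subtree as size 0.

```c
#include <stddef.h>

/* ASSUMED layout for illustration only; use the definition in bst2.h. */
typedef struct tnode {
    char *name;
    int frequency;
    int size;               /* number of nodes in this subtree */
    struct tnode *left;
    struct tnode *right;
} tnode;

/* Hypothetical helper: an absent subtree has size 0. */
static int subtree_size(const tnode *p) { return p ? p->size : 0; }

/* Rank 1 is the highest frequency, so higher ranks live further left. */
tnode *bst_find_by_rank(tnode *ptr, int rank) {
    if (ptr == NULL || rank < 1 || rank > ptr->size)
        return NULL;                        /* no node has this rank */
    int right = subtree_size(ptr->right);
    if (rank <= right)                      /* e.g. rank 15, right size 25 */
        return bst_find_by_rank(ptr->right, rank);
    if (rank == right + 1)                  /* e.g. rank 15, right size 14 */
        return ptr;
    /* e.g. rank 15, right size 5: look for rank 15 - 5 - 1 = 9 on the left */
    return bst_find_by_rank(ptr->left, rank - right - 1);
}
```

Each call descends one level and does constant work, so the search takes time proportional to the height of the tree rather than to its size.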

In contrast, the bst_find_by_name() function cannot be implemented efficiently, since the data structure is sorted by frequency rather than by name. To find a tnode by name, you simply traverse the BST and check whether there is a tnode that matches the given name. (This should be a pre-order walk: visit the current node first, then the left subtree, then the right subtree. You can shortcut and return as soon as a match is found.)

For both bst_find_by_rank() and bst_find_by_name(), the return value should be NULL if no matching node is found.
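A pre-order search with an early return could look like the sketch below. As before, the tnode layout is an assumption based on the field names mentioned on this page; the real definition is in bst2.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASSUMED layout for illustration only; use the definition in bst2.h. */
typedef struct tnode {
    char *name;
    int frequency;
    int size;
    struct tnode *left;
    struct tnode *right;
} tnode;

/* Pre-order walk: test the current node, then the left subtree,
   then the right subtree. Returns NULL when nothing matches. */
tnode *bst_find_by_name(tnode *ptr, const char *nom) {
    assert(nom != NULL);
    if (ptr == NULL) return NULL;
    if (strcmp(ptr->name, nom) == 0)    /* strings need strcmp(), not == */
        return ptr;
    tnode *found = bst_find_by_name(ptr->left, nom);
    if (found != NULL) return found;    /* shortcut: skip the right subtree */
    return bst_find_by_name(ptr->right, nom);
}
```

In the worst case (no match) every node is visited, which is the inefficiency the text above refers to.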

Use the following main programs to check your implementation:

main8c.c, main8c.txt. (Keep the call to bst_find_by_name() this time.)
main8d.c, main8d.txt.
main8e.c, main8e.txt.

The last two programs require a data file like this one: names1990s.txt. This data was obtained from the Social Security Administration and contains the frequency of girls' names on social security card applications from the 1990s.

Step 4: run with valgrind

Test your programs using the valgrind utility. You may not have valgrind installed on your own machine, so this step may require you to run your programs on GL. The syntax for invoking valgrind is very simple:

valgrind a.out

The valgrind utility will simulate your program and try to detect out-of-bounds memory references and memory leakage. The sample runs linked above include examples of running programs in valgrind. A clean report from valgrind should look something like this:

==2303==
==2303== HEAP SUMMARY:
==2303==     in use at exit: 0 bytes in 0 blocks
==2303==   total heap usage: 401 allocs, 401 frees, 5,829 bytes allocated
==2303==
==2303== All heap blocks were freed -- no leaks are possible
==2303==
==2303== For counts of detected and suppressed errors, rerun with: -v
==2303== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 12 from 8)

If valgrind detects an error, you definitely have a bug. On the other hand, a clean report from valgrind does not necessarily mean your program is bug-free since a bug might show up using different data for input.

Implementation Notes

Practice incremental development! Implement one feature at a time. Test the newest feature and debug it before proceeding to the next step.
Your programs must compile and run with the original bst.h and bst2.h. So, do not modify these header files. Otherwise, your programs will not compile with the programs we use for grading.
Remember that you have to use strcmp() for string comparison.
For the new tnode, you must call free() to free up the name field whenever you discard a node.
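To illustrate the last note, here is a hedged sketch of discarding a whole tree so that valgrind sees every allocation freed. The name sketch_destroy() is hypothetical (use whatever function bst2.h declares for this purpose), the tnode layout is assumed, and the returned node count is only a convenience added here for checking.

```c
#include <stdlib.h>

/* ASSUMED layout for illustration only; use the definition in bst2.h. */
typedef struct tnode {
    char *name;             /* dynamically allocated, owned by the node */
    int frequency;
    int size;
    struct tnode *left;
    struct tnode *right;
} tnode;

/* Hypothetical helper: free a subtree in post-order and return the
   number of nodes released. Children are freed before the parent so
   that no pointer is read after its node has been freed. */
int sketch_destroy(tnode *ptr) {
    if (ptr == NULL) return 0;
    int freed = sketch_destroy(ptr->left) + sketch_destroy(ptr->right);
    free(ptr->name);        /* the string first ... */
    free(ptr);              /* ... then the node that pointed to it */
    return freed + 1;
}
```

Freeing the node without freeing name would leave one leaked block per node, which valgrind would list under "in use at exit".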

Extra Credit

The code in main8e.c to print out the top 20 baby names is not very efficient:

printf("Top 20 names:\n") ;
int i ;
for(i=1 ;i <= 20 ; i++) {
   print_rank(tree, i) ;
}

Each time through the for loop is a call to bst_find_by_rank(). We should be able to use the size field to walk the binary search tree from the rank 1 node to the rank 20 node.

Similarly, when we wanted to print out all the names from rank 25 to rank 30, we made separate calls to bst_find_by_rank(). This is inefficient and can be replaced by a single walk in the binary search tree from the rank 25 node to the rank 30 node.

For 10 points extra credit, implement a function bst_print_ranks() with the following prototype:

void bst_print_ranks(tnode *ptr, int rank1, int rank2) ;

This function should print out the names and frequencies in the nodes with rank between rank1 and rank2 inclusive. For full credit, your implementation must be "efficient". For example, traversing the entire BST and printing out the nodes with appropriate rank is very inefficient because the size of the tree could be much larger than the number of nodes printed. An efficient implementation would run in time proportional to the height of the BST plus the number of nodes with rank between rank1 and rank2. (Hint: figure out where you have to stop before you begin your walk.)
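One possible way to meet the efficiency requirement is sketched below. It does not follow the hint literally; instead it prunes any subtree whose range of ranks cannot overlap [rank1, rank2], using the size field. The tnode layout, the helper print_ranks_from(), its returned count, and the output format are all assumptions made for this example.

```c
#include <stdio.h>

/* ASSUMED layout for illustration only; use the definition in bst2.h. */
typedef struct tnode {
    char *name;
    int frequency;
    int size;               /* number of nodes in this subtree */
    struct tnode *left;
    struct tnode *right;
} tnode;

static int subtree_size(const tnode *p) { return p ? p->size : 0; }

/* Hypothetical helper. offset is the number of nodes in the whole tree
   that outrank every node in this subtree, so this subtree holds ranks
   offset + 1 .. offset + size. Returns how many nodes were printed. */
static int print_ranks_from(const tnode *ptr, int offset,
                            int rank1, int rank2) {
    if (ptr == NULL) return 0;
    if (offset + ptr->size < rank1 || offset + 1 > rank2)
        return 0;                       /* no overlap: prune in O(1) */
    int my_rank = offset + subtree_size(ptr->right) + 1;
    /* Right subtree holds the more popular names, so it goes first. */
    int printed = print_ranks_from(ptr->right, offset, rank1, rank2);
    if (rank1 <= my_rank && my_rank <= rank2) {
        printf("%3d  %-12s %d\n", my_rank, ptr->name, ptr->frequency);
        printed++;
    }
    /* Everything on the left is outranked by my_rank nodes. */
    printed += print_ranks_from(ptr->left, my_rank, rank1, rank2);
    return printed;
}

void bst_print_ranks(tnode *ptr, int rank1, int rank2) {
    (void) print_ranks_from(ptr, 0, rank1, rank2);
}
```

Subtrees that lie entirely outside the requested range are rejected without being entered, so the work done is bounded by the two root-to-boundary paths plus the nodes actually printed.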

Write and submit a test program that demonstrates your extra credit implementation.

Turning in your program

Use the UNIX submit command on the GL system to turn in your project.

You should submit a separate file for the basic implementation in Step 1 and another file for the implementation of the BST functions for the new tnode. Call these bst.c and bst2.c.

Record yourself running the sample main programs using the script utility. It will be convenient to have separate typescript files, so number these sequentially as: typescript1, typescript2, typescript3, ...

As usual, grading will be done using different main programs, so you will want to create your own test programs. You may submit these if you wish. Please call them: test1.c, test2.c, test3.c, ...

Last Modified: 16 Apr 2013 09:02:10 EDT by Richard Chang
