Old version

This is the CS 112 site as it appeared on May 8, 2019.

Problem Set 8

due by 11:59 p.m. on Wednesday, May 1, 2018
No late submissions will be accepted after 11:59 p.m. on Thursday, May 2.

Preliminaries
Part I
Part II
Submitting Your Work

Preliminaries

In your work on this assignment, make sure to abide by the collaboration policies of the course.

If you have questions while working on this assignment, please come to office hours, post them on Piazza, or email class instructor.

Make sure to submit your work following the instructions found at the end of the assignment.

Part I

40 points total

Problem 1: Complete trees and arrays

12 pts. total; individual-only

Put all of your work for this problem in a plain-text file named ps8pr1.txt.

Assume that you have a complete tree with 200 nodes, and that you represent it in array form.

Node A of the tree is in position 50 of the array. What are the indices of A’s left child, right child, and parent? Explain how you got your answers.
What is the height of the tree? Explain your answer briefly.
The bottom level of the tree contains some number of leaf nodes. Is the rightmost leaf node in the bottom level the left child of its parent or the right child of its parent? Explain your answer briefly.
In Problem 6, you will implement a method that determines if the complete tree corresponding to an array of values is a heap. What is the efficiency of determining if an array of length n represents a heap in the best case? In the worst case? Use big-O notation, and explain your answers briefly.

Problem 2: Heap basics

6 pts. total; individual-only

Put all of your work for this problem in a PDF file named ps8pr2.pdf. See below for details on how to create the PDF file.

Consider the following heap of integers:

Show the heap as it will appear:
- after one removal (i.e., after one call to the remove() method discussed in lecture)
- after a second removal.
You should include two separate diagrams, one for how the heap will look after each removal. To do so, you should:
- Open the template that we have created for this problem in Google Docs.
- Select File -> Make a copy..., and save the copy to your Google Drive using the name ps8pr2.
- Edit the diagram for part 1 by clicking on it and then clicking the Edit link that appears below the diagram.
- Make the necessary changes to the tree to show the results of the first removal.
- Click the Save & Close button.
- Make a copy of your edited diagram within the Google Doc and edit it to show the results of the second removal. Click the Save & Close button for that new diagram.
Suppose we have the original heap and that 40 is inserted followed by 25. Show the final heap by editing the diagram that we have provided for part 2 in your copy of the Google Doc.

Once you have made all of the necessary edits to your copy of the Google Doc, choose File -> Download as -> PDF Document, and save the PDF file on your machine.

Problem 3: Heaps and heapsort

7 pts. total; individual-only

Put all of your work for this problem in a PDF file named ps8pr3.pdf. See below for details on how to create the PDF file.

Consider the following complete tree of integers:

Turn this tree into a max-at-top heap using the procedure outlined in lecture, and show the heap that is produced.
- Open the template that we have created for this problem in Google Docs.
- Select File -> Make a copy..., and save the copy to your Google Drive using the name ps8pr3.
- Edit the diagram for part 1 by clicking on it and then clicking the Edit link that appears below the diagram.
- Make the necessary changes to the tree to show the final heap.
- Click the Save & Close button.
What is the array representation of the max-at-top heap that you obtained in part 1?
Heapsort begins by turning the array to be sorted into a heap. Assume that your answer to part 2 is the result of this process of turning the original array into a heap.

What will the array look like after one element is put into its final position by heapsort – i.e., at the end of the first iteration of the loop in heapSort()?

What will the array look like after the second element is put into its final position?

Once you have made all of the necessary edits to your copy of the Google Doc, choose File -> Download as -> PDF Document, and save the PDF file on your machine.

Problem 4: Hash tables

9 points total; individual-only

Put all of your work for this problem in a plain-text file named ps8pr4.txt.

The following sequence of keys is to be inserted into an initially empty hash table of size 8 that employs open addressing:

cat, goat, dog, bird, bison, ant, flea, bat, duck

The hash function assigns to each key the number of characters in the key. For example, h("cat") is 3, because "cat" has 3 characters.

Assume that linear probing is used to insert the keys. Determine which key causes overflow, and show the table at that point by copying the following template into your text file and adding the keys at the appropriate positions:
```
0: 
1: 
2: 
3: 
4: 
5: 
6: 
7:
```
Now assume that quadratic probing is used. Determine which key causes overflow, and show the table at that point.
Finally, assume that double hashing is used, with the value of the second hash function based on the first character in the key: a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, etc. Determine which key causes the table to overflow, and show the table at the point at which it does so.

Problem 5: Huffman encoding

6 points; individual-only

We will cover the material needed for this problem during the coming week.

Put all of your work for this problem in a PDF file named ps8pr5.pdf. See below for details on how to create the PDF file.

Consider the following table of character frequencies:

Character	Frequency
`o`	`6`
`f`	`9`
`l`	`13`
`c`	`22`
`a`	`30`

(4 points) Show the Huffman tree that would be constructed from these character frequencies.
- Open the template that we have created for this problem in Google Docs.
- Select File -> Make a copy..., and save the copy to your Google Drive using the name ps8pr5.
- Edit the diagram by clicking on it and then clicking the Edit link that appears below the diagram. It includes the some of the necessary nodes and edges, but you should create more of them as needed, move them into the appropriate positions, and connect them.
- Click the Save & Close button.
(2 points) Using the Huffman tree from part 1, what will be the encoding of the string focal?

Once you have made all of the necessary edits to your copy of the Google Doc, choose File -> Download as -> PDF Document, and save the PDF file on your machine.

Part II

60 points total

Preparing for Part II

Begin by downloading the following zip file: ps8.zip

Unzip this archive, and you should find a folder named ps8, and within it the files you will need for Part II. Make sure that you keep all of the files in the same folder.

Problem 6: Determining if an array is a heap

20 points; individual-only

Make sure to begin by following the instructions given above under Preparing for Part II.

In Problem6.java, we have given you the following pair of methods, one of which still needs to be implemented:

public static <T extends Comparable<T>> boolean isHeap(T[] arr) {
    if (arr == null || arr.length == 0) {
        return false;
    }

    return isHeapTree(arr, 0);
}

private static <T extends Comparable<T>> boolean isHeapTree(T[] arr, int i) {
    // Implement this method.

}

Together, these methods are meant to determine if an array of objects corresponds to a heap. The methods are generic, and they should work with an array of any type T that implement Comparable<T>.

Implement the second method so that it uses recursion to process the complete tree/subtree whose root is at position i in the array arr. The method should return true if that tree/subtree is a heap, and false if it is not a heap. A leaf node is the root of a heap with only one node.

Note that the first method – which is a public wrapper method for the method that you will write – makes the initial call to your method with a value of 0 for i, which means that it will determine if the complete tree represented by the full array is a heap. You should not modify this wrapper method.

In implementing your recursive method, you may find it helpful to review:

the arithmetic rules for navigating through a complete tree that is stored in an array
how the compareTo() method can be used to compare two objects.

We have included a main() method with one unit test. You should add at least two others.

Problem 7: Implementing a hash table using separate chaining

40 points; individual-only

In lecture, we discussed the following interface for a hash table ADT:

public interface HashTable {
    boolean insert(Object key, Object value);
    Queue<Object> search(Object key);
    Queue<Object> remove(Object key);
}

We also examined one implementation of this interface that uses open addressing (the OpenHashTable class that we’ve included in the folder for Part II).

In this problem, you will write your own implementation of this interface that uses separate chaining. Call your class ChainedHashTable, and put it in an appropriately named Java file. Make sure that the class header indicates that it implements the HashTable interface.

Maintaining the state of the hash table
Each ChainedHashTable object should include at least two private fields:

one called numKeys for the number of keys in the hash table
one called table that holds a reference to the array that you will use for the hash table. However, rather than having each element of the array store a single entry as we do in open addressing, each element of the array should serve as a bucket or chain of entries.

More precisely:

Each element of the array should hold a reference to the first node in a linked list of nodes. You may not use our LLList class or any other linked-list class for this purpose. Rather, you should define a private inner class for the nodes, and you should manipulate these nodes yourself within your hash table methods.
Each node should store a single key and the collection of values associated with that key, as well as a reference to the next node in the linked list (if any).
You should use an LLQueue object for the collection of values associated with a given key, just as we do in the OpenHashTable class.

Other tasks

Copy the h1() method from our OpenHashTable class into your new class, and use it as the hash function.
Because separate chaining allows all entries whose keys are hashed to a given position in the table to stay in that position, you will not need to perform any probing. As a result, you won’t need the second hash function h2() because it is only necessary for double hashing.
Include a constructor that takes the size of the hash table as its only parameter and initializes the hash table accordingly. It should throw an IllegalArgumentException if the size is not positive.
Implement each method from the HashTable interface as efficiently as possible. Make sure that each of these methods has the same basic functionality as the corresponding method from the OpenHashTable class discussed in lecture. In particular, they should throw exceptions for the same reasons that the OpenHashTable methods do.

As you add or remove items from the table, make sure that you update numKeys accordingly.

Because a given chain can grow to an arbitrary length, the hash table will never overflow, and thus your insert method can always return true.
Our OpenHashTable class doesn’t include a toString() method, but you should include one in your ChainedHashTable class. It should return a string of the form
```
{table[0], table[1], ..., table[size - 1]}
```
where each position of the table is represented by either:
- the key or keys in that position’s chain of entries, separated by commas and surrounded by square brackets.
- the word null if there are no keys in that position
For example:
```
> ChainedHashTable table = new ChainedHashTable(5);
> table.insert("howdy", 15);
> table.insert("goodbye", 10);
> table.insert("apple", 5);
> table
{[apple, howdy], null, null, [goodbye], null}
```
Note that:
- Position 0 has a chain with two keys, because both "howdy" and "apple" are assigned a hash code of 0 by h1() when the table size is 5.
- Position 3 has a chain with one key, because "goodbye" is assigned a hash code of 3 by h1() when the table size is 5.

Define an accessor method for the number of keys. Call this method getNumKeys(). For example:

> ChainedHashTable table = new ChainedHashTable(5);
> table.insert("howdy", 15);
> table.insert("goodbye", 10);
> table.insert("apple", 5);
> table.getNumKeys()
3
> table.insert("howdy", 25);  # insert a duplicate
> table.getNumKeys()   # duplicates don't change the number of keys
3

Although a hash table that uses separate chaining won’t overflow, it can become too full to offer decent performance. To allow clients to measure the degree of fullness, add a method called load() that takes no parameters and that returns a value of type double that represents the load factor of the table: the number of keys in the table divided by the size of the table.

For example:
```
> ChainedHashTable table = new ChainedHashTable(5);
> table.insert("howdy", 15);
> table.insert("goodbye", 10);
> table.insert("apple", 5);
> table.load()
0.6
> table.insert("pear", 6);
> table.load()
0.8
```

To allow clients to obtain all of the keys in the hash table, add a method called getAllKeys() that takes no parameters and that returns an array of type Object containing all of the keys in the hash table.

For example:

> ChainedHashTable table = new ChainedHashTable(5);
> table.insert("howdy", 15);
> table.insert("goodbye", 10);
> table.insert("apple", 5);
> table.insert("howdy", 25);  # insert a duplicate
> Object[] keys = table.getAllKeys();
> keys
{ apple, howdy, goodbye }

To deal with situations in which the table becomes too full, add a method called resize() that takes an integer representing the new size, and that grows the table to have that new size. It should not return a value.

As discussed in lecture, it is not possible to simply copy every element of the current table into a new, larger table. This is because a given key may belong in a different position in the larger table. As a result, you will need to rehash the current keys in the hash table to ensure that they end up in the correct position in the resized table.

For example:
```
> ChainedHashTable table = new ChainedHashTable(5);
> table.insert("howdy", 15);
> table.insert("goodbye", 10);
> table.insert("apple", 5);
> table
{[apple, howdy], null, null, [goodbye], null}
> table.resize(7);
> table
{null, [apple], null, null, null, [howdy], [goodbye]}
```
Note that in this case, resizing the table causes all three keys to end up in different positions!

Special cases:
- The method should throw an IllegalArgumentException if the specified new size is less than the table’s current size.
- If the specified new size is the same as the table’s current size, the method should return without doing anything.
Add a main() method that performs at least two unit tests for each of the methods in the class. Use the same format as the sample unit test in Problem6.java.

Optional Extra Credit

Problem 8: Analyzing a text document

20 points; individual-only

Overview
Hash tables can be useful when analyzing textual data. In the file TextInfo.java, we’ve given you the beginnings of a class that will serve two purposes:

It will provide a blueprint for TextInfo objects that can be used to analyze a text file and to store information about the words that appear in that file.
It will also serve as program that users can run to analyze a text, and to ask for information about specific words in that text.

Here is the menu of options that the program will provide:

1) Find out where a word appears
2) Lookup a word's frequency
3) List the most common words
4) Quit

Each TextInfo object will maintain two separate hash tables:

one that keeps track of the line numbers on which each word in the text file appears
one that keeps track of each word’s frequency (i.e., how often it appears in the text).

In addition, the TextInfo object will need another data structure that allows you to efficiently implement option 3: listing the k most common words in the text for some integer k. You may use any of the data structures and algorithms that we have studied this semester. We have included the most recent Java classes from lecture in the ps8 folder, but feel free to use code that we gave you for the other problem sets as well.

In general, you should try to implement all of the necessary functionality as efficiently as possible.

Your tasks
Here are the steps you should take:

Define private fields for the two hash tables and for the additional data structure for option 3. Ideally, you should use your ChainedHashTable class from the previous problem for the hash tables. However, if you weren’t able to get that class to work, you may use our OpenHashTable class instead, although you will need to give it some of the additional functionality that we asked you to include in the ChainedHashTable class.
We have given you the beginnings of a constructor for TextInfo objects that takes a string specifying the name of a text file. Complete it so that it initializes the fields of the new object based on an analysis on the text file. In particular, it should add the necessary entries to the hash tables, and it should initialize the additional data structure that you are using for option 3.

Because we haven’t covered file processing in Java, we’ve given you some skeleton code for reading the file line-by-line. The code that we have provided:
- creates a File object for the file whose name is passed in
- determines the number of bytes in the file, and computes a size for the hash tables that is based on that value
- creates a Scanner object to read from the file
- uses a while loop to obtain one line at a time from the file.
You should add whatever code is needed to process each line. You should also add whatever code is needed before and after the while loop.

Notes:
- We have included a special clause in the header of the constructor
  (throws FileNotFoundException) that Java requires because the process of creating a Scanner for a file can produce this exception.
- Make sure that you use the size variable that we have provided when constructing the two hash tables.
- Don’t forget that you can use the split() method to split a String object into an array of strings.
- You do not need to attempt to remove punctuation from the words.
Define a method called getNumWords() that returns the number of unique words in the text file represented by the TextInfo object. You should not need to use a separate field for this value. Rather, you should make use of functionality that is already present in the data structures that you created in the constructor.

Here’s one example:
```
> TextInfo text = new TextInfo("romeo.txt");
> text.getNumWords()
2470
```
Define a method called linesForWord() that takes a string representing a word and that returns a string specifying the line numbers on which that word appears in the text file represented by the TextInfo object. If the word does not appear in the file, the method should return the string "none".

For example:
```
> TextInfo text = new TextInfo("romeo.txt");
> text.linesForWord("thy")
"{102, 109, 152, 161, 163, 279, 325, 463, 520, 596, 605, 612, 630}"
> text.linesForWord("Romeo")
"{924}"
> text.linesForWord("Romeo,")
"{343, 472, 701, 1035}"
> text.linesForWord("Java")
"none"
```
Hint: When you retrieve the values associated with a key in one of our hash table implementations, you get back a queue of values. Our LLQueue class includes a toString() method that you can use to see all of the values in the queue, so you can – and should! – take advantage of that method!
Define a method called frequency() that takes a string representing a word and that returns an integer specifying the number of times that the word appears in the text file represented by the called TextInfo object.

For example:
```
> TextInfo text = new TextInfo("romeo.txt");
> text.frequency("thy")
13
> text.frequency("Romeo")
1
> text.frequency("Romeo,")
4
> text.frequency("Java")
0
```
Define a method called kMostCommon() that takes an integer k as its only parameter, and that returns an array of strings containing the k most common words in the text file represented by the called TextInfo object – beginning with the most common word, then the second most common word, etc.

For example:
```
> TextInfo text = new TextInfo("romeo.txt");
> String[] words = text.kMostCommon(10);
> words
{ the, of, I, and, a, to, in, my, is, you }
> text.kMostCommon(20)
{ the, of, I, and, a, to, in, my, is, you, with, that, ROMEO:, his, not, And, be, BENVOLIO:, it, as }
```
Complete the main() method so that it makes the appropriate method calls and displays the appropriate information. We have given you some of the code for this method. You should fill in the rest. A sample run of the program can be found here.

Submitting Your Work

Submission Checklist:

You have read the Java Style Guide (linked on Course page)and followed all guidelines regarding format, layout, spaces, blank lines, and comments;
You have removed or changed all comments inherited from my templates which do not relate to your solution;
ensured that the signature of the methods specified in the assignment have not been changed.
You have verified that your programs satisfy all the performance tests in the templates;

Follow the instructions and submit the following files:

your ps8pr1.txt file for Problem 1
your ps8pr2.pdf file for Problem 2
your ps8pr3.pdf file for Problem 3
your ps8pr4.txt file for Problem 4
your ps8pr5.pdf file for Problem 5
your modified Problem6.java file containing your work on Problem 6
your ChainedHashTable.java file containing your work on Problem 7;
your modified TextInfo.java file containing your work on Problem 8 (optional)

Warnings

Make sure to name the folder ps8.
Make sure to use these exact file names as specified or we will not be able to find your files and grade them.
Make sure that you do not try to submit a .class file or a file with a ~ character at the end of its name.
If you make any last-minute changes to one of your Java files (e.g., adding additional comments), you should compile and run the file after you make the changes to ensure that it still runs correctly. Even seemingly minor changes can cause your code to become unrunnable.
Please rememner to NOT create a TAR file.