Lab 4
Get started now

Overview

For this assignment you must implement a simple Dictionary class which will hold a collection of correctly spelled words. You are given three text files. You must make predictions about the relative speed at which various tasks will be performed and write a program to collect a number of benchmarking statistics using three text files.

Assignment

Dictionary Class

For this assignment you must implement a simple Dictionary class which will hold a collection of correctly spelled words. The words should be stored in a Collection. It should be easy to change the actual collection used from ArrayList to LinkedList or some other class that implements the Collection interface.

The dictionary has four methods:

  • Constructor — The constructor should accept a Collection<String>. If the argument passed is null, the method should throw a NullPointerException. Otherwise, it should ensure that the collection is empty (emptying it if it is not empty).
  • load(String filename) — This method accepts a filename and adds each word found in the file to the dictionary. The method returns the number of nanoseconds to add all of the words in the file to the dictionary (as a long). This method should add to the words already in the dictionary (if any). It is optional for this method to only add a word to the dictionary if it is not in the dictionary to begin with.
  • contains(String target) – returns true if and only if target is in the dictionary. You should make use of the underlying collection's contains() implementation in order to take advantage of any performance enhancements it may have.
  • clear() — Removes all words from the dictionary. (returns nothing)

Main Program

You will be working with the following three text files:

  • story.txt — A short story containing a bit over one thousand words.
  • words.txt — A list of a bit over one hundred thousand words.
  • kjv10.txt — The contents of the King James Bible containing a bit under one million words.

You will be performing a number of passes with the data. In each pass you will select three things: 1) the dictionary file, 2) the document file, and 3) the collection to be used. For example, with 1) words.txt, 2) story.txt, and 3) ArrayList, you need to load all of the words in words.txt into your dictionary that uses an ArrayList to store words and then read each word out of story.txt and check to see if it is in the dictionary. You may ignore complications like punctuation.

For each pass you should determine the number of misspelled words and benchmark two things: the amount of time to add all the words into the dictionary and the amount of time to check all of the words in the document to see if they are in the dictionary. You should not include file access time in your timings.

You must write a program that can be used to generate the data needed to fill in the following table:

Dictionary Document Collection Number
Misspelled
Time To
Add
Time To
Spell-Check
words.txt story.txt ArrayList      
words.txt story.txt LinkedList      
words.txt story.txt SortedArrayList      
story.txt words.txt ArrayList      
story.txt words.txt LinkedList      
story.txt words.txt SortedArrayList      
Run your tests using the SortedArrayList.java from lab 1.

Your program output must be formatted neatly (consider using printf (see here) or DecimalFormat). For example, your program output may look something like this:

 | Dictionary |  Document |      Collection |   Misspelled |                Time to add |         Time to spell-check |
 |  words.txt | story.txt |       ArrayList |        XXXXX |  X mins XX.XXXXXXXXXX secs |   X mins XX.XXXXXXXXXX secs |
 |  words.txt | story.txt |      LinkedList |        XXXXX |  X mins XX.XXXXXXXXXX secs |   X mins XX.XXXXXXXXXX secs |
 |  words.txt | story.txt | SortedArrayList |        XXXXX |  X mins XX.XXXXXXXXXX secs |   X mins XX.XXXXXXXXXX secs |
 |  story.txt | words.txt |       ArrayList |        XXXXX |  X mins XX.XXXXXXXXXX secs |   X mins XX.XXXXXXXXXX secs |
 |  story.txt | words.txt |      LinkedList |        XXXXX |  X mins XX.XXXXXXXXXX secs |   X mins XX.XXXXXXXXXX secs |
 |  story.txt | words.txt | SortedArrayList |        XXXXX |  X mins XX.XXXXXXXXXX secs |   X mins XX.XXXXXXXXXX secs |

Once you have run all of you tests for the above table, you should use your findings to predict which of following will run fastest/slowest. Place a 1 the location that will run the fastest, a 2 in the next fastest, ..., and 12 in the location that you believe will run the slowest.

Dictionary Document Collection Time To
Add
Time To
Spell-Check
words.txt kjv10.txt ArrayList    
words.txt kjv10.txt LinkedList    
words.txt kjv10.txt SortedArrayList    
kjv10.txt words.txt ArrayList    
kjv10.txt words.txt LinkedList    
kjv10.txt words.txt SortedArrayList    

Instructor Specific Deliverable

Your instructor may require that you run the following tests also. Keep in mind that these tests will require more time to run. Therefore, make sure you budget sufficient time so that you are not waiting on results while your submission deadline passes.

Dictionary Document Collection Number
Misspelled
Time To
Add
Time To
Spell-Check
words.txt kjv10.txt ArrayList      
words.txt kjv10.txt LinkedList      
words.txt kjv10.txt SortedArrayList      
kjv10.txt words.txt ArrayList      
kjv10.txt words.txt LinkedList      
kjv10.txt words.txt SortedArrayList      

Acknowledgements

This assignment was written by Dr. Chris Taylor.

Lab Deliverables

See your professor's instructions for details on submission guidelines and due dates.
Dr. Dennis' students: See Blackboard
See Prof. Hopkins for instructions
Dr. Taylor's class: See below
See Prof. Ung for instructions
If you have any questions, consult your instructor.

Tuesday, 10-Jan-2017 00:48:19 EST