Overview
For this assignment you must implement a simple Dictionary
class which will hold a collection of correctly spelled words. You are given
three text files. You must make predictions about the relative speed at
which various tasks will be performed and write a program to collect a
number of benchmarking statistics using three text files.
Assignment
Dictionary Class
For this assignment you must implement a simple Dictionary
class which will hold a collection of correctly spelled words. The words
should be stored in a Collection
. It should be easy to
change the actual collection used from ArrayList
to
LinkedList
or some other class that implements the
Collection
interface.
The dictionary has four methods:
- Constructor The constructor should accept a
Collection<String>
. If the argument passed isnull
, the method should throw aNullPointerException
. Otherwise, it should ensure that the collection is empty (emptying it if it is not empty). load(File file)
This method accepts aFile
reference and adds each word in the file to the dictionary. The method returns the number of nanoseconds to add all of the words in the file to the dictionary (as along
). This method should add to the words already in the dictionary (if any). It is optional for this method to only add a word to the dictionary if it is not in the dictionary to begin with.contains(String target)
returnstrue
if and only iftarget
is in the dictionary. You should make use of the underlying collection'scontains()
implementation in order to take advantage of any performance enhancements it may have.clear()
Removes all words from the dictionary. (returns nothing)
Main Program
You will be working with the following three text files:
- story.txt A short story containing a bit over one thousand words.
- words.txt A list of a bit over one hundred thousand words.
- kjv10.txt The contents of the King James Bible containing a bit under one million words.
You will be performing a number of passes with the data. In each pass you
will select three things: 1) the dictionary file, 2) the document file, and
3) the collection to be used. For example, with 1) words.txt
,
2) story.txt
, and 3) ArrayList
, you need to load
all of the words in words.txt
into your dictionary that uses an
ArrayList
to store words and then read each word out of
story.txt
and check to see if it is in the dictionary. You may
ignore complications like punctuation.
For each pass you should determine the number of misspelled words and benchmark two things: the amount of time to add all the words into the dictionary and the amount of time to check all of the words in the document to see if they are in the dictionary. You should not include file access time in your timings.
You must write a program that can be used to generate the data needed to fill in the following table:
Dictionary | Document | Collection | Number Misspelled | Time To Add | Time To Spell-Check |
---|---|---|---|---|---|
words.txt | story.txt | ArrayList | |||
words.txt | story.txt | LinkedList | |||
words.txt | story.txt | SortedArrayList | |||
story.txt | words.txt | ArrayList | |||
story.txt | words.txt | LinkedList | |||
story.txt | words.txt | SortedArrayList |
Your program output must be formatted neatly (consider using printf
(see here) or DecimalFormat
). For example, your program output may look something like this:
| Dictionary | Document | Collection | Misspelled | Time to add | Time to spell-check | | words.txt | story.txt | ArrayList | XXXXX | X mins XX.XXXXXXXXXX secs | X mins XX.XXXXXXXXXX secs | | words.txt | story.txt | LinkedList | XXXXX | X mins XX.XXXXXXXXXX secs | X mins XX.XXXXXXXXXX secs | | words.txt | story.txt | SortedArrayList | XXXXX | X mins XX.XXXXXXXXXX secs | X mins XX.XXXXXXXXXX secs | | story.txt | words.txt | ArrayList | XXXXX | X mins XX.XXXXXXXXXX secs | X mins XX.XXXXXXXXXX secs | | story.txt | words.txt | LinkedList | XXXXX | X mins XX.XXXXXXXXXX secs | X mins XX.XXXXXXXXXX secs | | story.txt | words.txt | SortedArrayList | XXXXX | X mins XX.XXXXXXXXXX secs | X mins XX.XXXXXXXXXX secs |
Once you have run all of you tests for the above table, you should use your findings to predict which of following will run fastest/slowest. Place a 1 the location that will run the fastest, a 2 in the next fastest, ..., and 12 in the location that you believe will run the slowest.
Dictionary | Document | Collection | Time To Add | Time To Spell-Check |
---|---|---|---|---|
words.txt | kjv10.txt | ArrayList | ||
words.txt | kjv10.txt | LinkedList | ||
words.txt | kjv10.txt | SortedArrayList | ||
kjv10.txt | words.txt | ArrayList | ||
kjv10.txt | words.txt | LinkedList | ||
kjv10.txt | words.txt | SortedArrayList |
Instructor Specific Deliverable
Your instructor may require that you run the following tests also. Keep in mind that these tests will require more time to run. Therefore, make sure you budget sufficient time so that you are not waiting on results while your submission deadline passes.
Dictionary | Document | Collection | Number Misspelled | Time To Add | Time To Spell-Check |
---|---|---|---|---|---|
words.txt | kjv10.txt | ArrayList | |||
words.txt | kjv10.txt | LinkedList | |||
words.txt | kjv10.txt | SortedArrayList | |||
kjv10.txt | words.txt | ArrayList | |||
kjv10.txt | words.txt | LinkedList | |||
kjv10.txt | words.txt | SortedArrayList |
Acknowledgements
This assignment was written by Dr. Chris Taylor.
Lab Deliverables
Dr. Taylor's class: See below
Dr. Yoder's submission instructions