Reference no: EM132105100
Problem Statement:
In this project, you are asked to write a Java application that applies your knowledge of several data structures we have discussed over the course of this semester.
The main task of this application is to automatically generate a book index for a given arbitrary text file.
As you know, a traditional book index lists on which page each important/key word occurs. In the application that you will develop, you are required to generate an index for ALL words in the given file.
To standardize testing of all students' submissions, all of you are required to use the given text file posted online next to this project statement.
The file name is "alice30.txt" and it contains the famous Alice in Wonderland book that is freely available via Project Gutenberg. The simple given test file that you MUST use does NOT have page numbers and thus you will use chapters instead in your indexing.
While considering which data structure best fits this application, please remember that your index will look similar to the following:
"keyword" → {2, 3, 7, 9}
Where "keyword" is the word that you are trying to index and {2, 3, 7, 9} is the set of chapters that "keyword" occurs in. In other words, "keyword" occurs in chapters 2, 3, 7, and 9. If a keyword occurs multiple times in the same chapter, your index will ONLY list the chapter one time and thus maintain the set property.
The structure of such an index can be implemented using a Map whose keys are the Strings representing the words that you are indexing, and whose value for each key is a set of integer values denoting the chapters in which that particular keyword occurs.
Hint: your main data structure can take the following form. The choice of TreeMap and TreeSet will ensure that the data stored in these structures are sorted.
TreeMap<String, TreeSet<Integer>>
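The hinted structure, a sorted map from each word to a sorted set of chapter numbers, might be sketched as follows. This is only an illustration under stated assumptions: the class name `ChapterIndexSketch`, the methods `add`, `chaptersOf`, and `format`, and the ` -> ` separator in the output are all hypothetical and are not prescribed by the assignment.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: class and method names are illustrative only.
class ChapterIndexSketch {
    // Keys (words) stay sorted alphabetically; each value is a sorted set of chapters.
    private final TreeMap<String, TreeSet<Integer>> index = new TreeMap<>();

    // Records that `word` occurs in `chapter`. Because the value is a set,
    // adding the same chapter twice leaves the set unchanged.
    void add(String word, int chapter) {
        index.computeIfAbsent(word, k -> new TreeSet<>()).add(chapter);
    }

    // Returns a copy of the chapters recorded for `word` (empty if the word is unknown).
    TreeSet<Integer> chaptersOf(String word) {
        TreeSet<Integer> chapters = index.get(word);
        return chapters == null ? new TreeSet<>() : new TreeSet<>(chapters);
    }

    // Renders one line per word, e.g. "rabbit -> {1, 2}" (separator is an assumption).
    String format() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, TreeSet<Integer>> entry : index.entrySet()) {
            StringJoiner chapters = new StringJoiner(", ", "{", "}");
            for (int chapter : entry.getValue()) {
                chapters.add(Integer.toString(chapter));
            }
            out.append(entry.getKey()).append(" -> ").append(chapters).append('\n');
        }
        return out.toString();
    }
}
```

Using `computeIfAbsent` creates the chapter set the first time a word is seen, so callers never have to check whether the key already exists.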
Your code is expected to have two files with the following functionalities:
A Driver program that will create a Scanner object to open the given input file and make sure that ALL non-alphabetical characters are skipped. To do that, you need to use the appropriate regular expression with the .useDelimiter() method of the Scanner class. Then, all data from the input file will be read and converted to lower case. The driver program will then invoke the appropriate methods of the MainIndexingClass to generate the desired index and display the generated index on the monitor.
A MainIndexingClass file that defines the selected data structure and provides an appropriate constructor to initialize that TreeMap. This class will also provide all the functionality needed to generate and maintain the required index structure.
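The reading loop described above could look roughly like the sketch below. It rests on several assumptions that the assignment does not state: that each chapter heading starts with the upper-case token `CHAPTER` followed by a numeral, that text before the first heading is not indexed, and that a single hypothetical class `DriverSketch` with a `buildIndex` method is acceptable for illustration (the actual submission is expected to split the work between the Driver and MainIndexingClass files). An in-memory sample string stands in for the input file so the example runs on its own.

```java
import java.util.Scanner;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: names and the chapter-detection rule are assumptions.
class DriverSketch {

    static TreeMap<String, TreeSet<Integer>> buildIndex(Scanner in) {
        // Any run of non-alphabetical characters separates two tokens.
        in.useDelimiter("[^A-Za-z]+");
        TreeMap<String, TreeSet<Integer>> index = new TreeMap<>();
        int chapter = 0;
        while (in.hasNext()) {
            String token = in.next();
            if (token.equals("CHAPTER")) {   // assumed heading marker
                chapter++;
                if (in.hasNext()) {
                    in.next();               // skip the chapter numeral, e.g. "II"
                }
                continue;
            }
            if (chapter == 0) {
                continue;                    // assumed: ignore text before chapter 1
            }
            index.computeIfAbsent(token.toLowerCase(), k -> new TreeSet<>()).add(chapter);
        }
        return index;
    }

    public static void main(String[] args) {
        String sample = "CHAPTER I\nAlice saw the Rabbit.\n"
                      + "CHAPTER II\nThe Rabbit ran; Alice didn't.";
        try (Scanner in = new Scanner(sample)) {
            buildIndex(in).forEach((word, chapters) ->
                    System.out.println(word + " -> " + chapters));
        }
    }
}
```

To run this against the real input, the sample string could be replaced with `new Scanner(new File("alice30.txt"))`, which requires importing `java.io.File` and handling `FileNotFoundException`. Note that with this delimiter a contraction such as "didn't" is split into the two tokens "didn" and "t".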