Reference no: EM133697712
Programming
Assignment Brief
This part of the assignment is an extension of part 1. Whereas for part 1 we considered the correctness only, we are concerned with the efficiency for this part.
The brief for Assignment Part 1 is copied below for your convenience:
In many situations, we may need to find a word based on one or more letters in it. For example, when making or completing a crossword you may want to find a word that has 4 letters, starts with J and ends with A. In this assignment, the program you will write will create a lexicon of words from various text files.
A word is a sequence of letters. Words will be treated in a case-insensitive manner. A simple option is to store all the words in lower case. Initially, a word is obtained as a sequence of characters separated by whitespace characters. Then numbers and punctuation characters, if any, must be removed. More specifically:
If the word "and" occurs twice in the file, it is only stored once in the lexicon.
If the word "you" also occurs as "YOU", only one version is stored.
Punctuation, such as commas, quotes, hyphens, full stops and question marks, are removed.
Numeric characters are removed.
If the word "don't" or the word "second-hand" appears in the input texts, they will be stored without punctuation (i.e. "dont" and "secondhand" respectively).
The 'word' "1", for example, will be ignored as it contains all numeric characters.
For each word, we also keep the following information:
The frequency: How many times the word appears in the input files.
The list of neighbours: A neighbour of a word w is a word that is of the same length and differs from w by only one letter. For example, if the word w refers to "cat", then:
"cathy" is not w's neighbour, for they are not of the same length.
"man" is not w's neighbour, for they differ by more than one letter.
"can" is a neighbour of w
"fat" is a neighbour of w
"cot" is a neighbour of w.
In this part of the assignment, you will need to:
Read one text file and construct a lexicon that contains words from the file
Write the words (including spellings, frequencies and neighbours) from the lexicon to a text file
The format of the text file MUST be the same as in Part 1 of the assignment
You should only include one solution. It is entirely up to you which data structure(s) and/or sorting algorithm(s) will be used, as long as they are included in the subject; however, you must take the program's efficiency in consideration, as the test file is much larger in this part.
The steps required to complete this part of the assignment should be similar to what you did in part 1
To simplify the structure of this notebook, sections have been created for you to populate with parts of your solution. With that said, you are more than welcome to create any additional code blocks as you see fit, just make sure it is clear how it is relevant to your final solution. As you create your solution, make sure you test it as you go to ensure all components work.
Once you have completed this assignment, you will also need to write a brief report describing the data structures and/or sorting algorithms you selected.
Instructions
In this task you to write a function, build_lexicon(), that has two parameters, an input_filename and output_filename.
This function should:
Read all words from the file described by input_filename, constructing a lexicon with all words
Write all words, including spellings, frequencies and neighbours from the lexicon to the file described by output_filename.
This functionality should exactly match what was required in part 1 of the assignment. That is, when writing words to the text file,
The words must be written in sorted alphabetical order.
The information on each word, which is also written to the text file, includes the frequency and the list of its neighbours.
The list of neighbours must also be in alphabetical order.
Neighbours in the list of neighbours can either be written with or without quotation marks around them (refer to part 1 of the assignment).
To output each word, first the spelling of the word is displayed, followed by the frequency, and then followed by the word's neighbours (in alphabetical order, inside a pair of square brackets).
Use the provided sample files to check if the output of your program is correct.
You may implement as many additional functions, etc. as you see fit to implement this task (e.g. similar to part 1 of the assignment where you had functions to read the data, write the data, etc.).
We have provided code that will call your function and create an output text file. Calling your function in this way is how we will test if your solution works as expected. You may not modify this function call.
Purpose/Rationale
The purpose of this assignment is to get practice implementing an efficient solution to a task using data structures and algorithms that you have learned in the subject so far. This is an extension of Part 1 of the assignment.
Task Details
This assignment focusses on content relating to algorithms and data structures you have covered in this subject so far. You are expected to use what you have learned about data structures and algorithms to solve this assignment. In addition, it is expected that you have prior Python programming knowledge to successfully complete this assignment.
Instructions
This assignment takes the form of a notebook, much like those which you have already encountered during your lab sessions. You are required to read through the tasks in the notebook and place your solutions in the code cells as instructed in the notebook itself. In addition to the notebook. you are also required to submit a report that addresses some questions as detailed in the notebook.