Reference no: EM132211506
Assignment
Write a Python program using NLTK that contains the following parts (please use Python 2.7).
Part 1
Given no parameters, create a function that returns a list of tuples where the first element in the tuple is a list of brown words for a given fileid and the second element is a category name.
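A minimal sketch of Part 1, assuming the Brown corpus has already been fetched with nltk.download('brown'). The function name get_documents and the optional corpus argument are illustrative assumptions, not part of the assignment; the argument only exists so the function can be exercised with a small stand-in object, and calling it with no parameters uses the real corpus.

```python
def get_documents(corpus=None):
    """Return [(list_of_words, category), ...], one tuple per file id."""
    if corpus is None:
        # Imported lazily; requires NLTK and the downloaded Brown corpus.
        from nltk.corpus import brown as corpus
    return [(list(corpus.words(fileids=fileid)), category)
            for category in corpus.categories()
            for fileid in corpus.fileids(categories=category)]
```

Called as get_documents(), this walks every category and every file id within it, so the result has one tuple per Brown file.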
Part 2
Given no parameters, create a function that creates a frequency distribution of brown words and returns a list of the top 1000 words.
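One possible shape for Part 2 is sketched below. The name get_word_features and the optional corpus and size arguments are assumptions made for illustration. FreqDist.most_common() returns (word, count) pairs ordered by frequency, so the sketch keeps only the words. The fallback to collections.Counter is also an assumption, added so the sketch can run where NLTK is not installed; the assignment itself expects FreqDist.

```python
try:
    from nltk import FreqDist  # what the assignment asks for
except ImportError:
    # Stand-in with the same most_common() method, used only if NLTK is missing.
    from collections import Counter as FreqDist


def get_word_features(corpus=None, size=1000):
    """Return the `size` most frequent words as a plain list."""
    if corpus is None:
        # Imported lazily; requires NLTK and the downloaded Brown corpus.
        from nltk.corpus import brown as corpus
    freq = FreqDist(corpus.words())
    return [word for word, _count in freq.most_common(size)]
```

Words are counted exactly as they appear in the corpus; whether to lowercase them first is a design choice the assignment leaves open.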
Part 3
Given no parameters, create a function that returns a list containing the first file name from each brown category. Note: the built-in function (brown.fileids() called with a category) returns a list of file ids for that category. To select the first file name, apply the following selection rule to that list:
[:1][0]
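The selection rule can be applied inside a list comprehension, as in the sketch below. The name get_first_fileids and the optional corpus argument are illustrative assumptions; with no argument the function reads the real Brown corpus.

```python
def get_first_fileids(corpus=None):
    """Return the first file id of every category, in category order."""
    if corpus is None:
        # Imported lazily; requires NLTK and the downloaded Brown corpus.
        from nltk.corpus import brown as corpus
    # [:1] slices off a one-element list, [0] then unwraps that element.
    return [corpus.fileids(categories=category)[:1][0]
            for category in corpus.categories()]
```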
Part 4
Given a parameter containing a document, create a function that returns a dictionary of features. Note: you will not be able to test this function until more of the program is completed.
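A common way to build such a dictionary is to record, for each word in a fixed word list, whether the document contains it. The sketch below assumes that approach. The three-word word_features list is a hypothetical stand-in for the Part 2 result, and the 'contains(word)' key format is a naming choice, not something the assignment prescribes.

```python
# Hypothetical stand-in; the full program would use the top-1000 list from Part 2.
word_features = ['the', 'of', 'jury']


def document_features(document):
    """Map 'contains(word)' to True/False for every word in word_features."""
    document_words = set(document)  # set membership tests are fast
    features = {}
    for word in word_features:
        features['contains(%s)' % word] = (word in document_words)
    return features
```

For example, document_features(['the', 'jury', 'said']) marks 'contains(the)' and 'contains(jury)' as True and 'contains(of)' as False.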
Part 5
Given a parameter containing a dictionary of features, create a function that returns an inverse dictionary of features.
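Because many features share the same value, one reasonable reading of "inverse dictionary" is a mapping from each value to the list of feature names that have it. The sketch below takes that reading; the name inverse_features is an assumption.

```python
def inverse_features(features):
    """Map each feature value (True or False) to a list of feature names."""
    inverse = {}
    for name, value in features.items():
        # setdefault creates the list the first time a value is seen.
        inverse.setdefault(value, []).append(name)
    return inverse
```

With boolean feature values the result has at most two keys, True and False.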
Part 6
Given a parameter containing a file name, first create a features dictionary using the function in Part 4. Second, create an inverse dictionary of the features dictionary using the function in Part 5. Finally, return a list of True features for the file. Note: You must use the word True to extract the correct contents. Do not enclose the word in quotes or apostrophes. Capitalize as shown.
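The three steps might be chained as in the sketch below. To keep it runnable on its own, it includes compact stand-ins for the Part 4 and Part 5 functions and a hypothetical three-word word_features list; all names, and the optional corpus argument, are assumptions for illustration.

```python
# Hypothetical stand-in for the Part 2 word list.
word_features = ['the', 'jury', 'of']


def document_features(document):
    # Compact stand-in for the Part 4 function.
    words = set(document)
    return dict(('contains(%s)' % w, w in words) for w in word_features)


def inverse_features(features):
    # Compact stand-in for the Part 5 function.
    inverse = {}
    for name, value in features.items():
        inverse.setdefault(value, []).append(name)
    return inverse


def true_features(fileid, corpus=None):
    """Return the names of the features that are True for one file."""
    if corpus is None:
        # Imported lazily; requires NLTK and the downloaded Brown corpus.
        from nltk.corpus import brown as corpus
    features = document_features(corpus.words(fileids=fileid))
    inverse = inverse_features(features)
    # The key is the boolean True, not the string 'True'.
    return inverse.get(True, [])
```

Using .get(True, []) instead of inverse[True] avoids a KeyError for a file in which none of the feature words occur.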
Part 7
Given a parameter containing a list of the first files from each brown category, create a function that returns a list of tuples with three elements. The first element is the category. The second element is the file name. The third element is a list of True features for a given file. Use the function from Part 6.
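Since the function receives only file names, the category has to be recovered from each file id. The sketch below does that with the corpus reader's categories(fileids=...) call and takes the first category reported. The true_features function here is only a placeholder returning a fixed list so the sketch runs on its own; in the full program it would be the Part 6 function. The name get_first_file_features and the optional corpus argument are likewise assumptions.

```python
def true_features(fileid):
    # Placeholder for the Part 6 function; returns a fixed list for illustration.
    return ['contains(the)']


def get_first_file_features(first_files, corpus=None):
    """Return [(category, fileid, list_of_true_features), ...]."""
    if corpus is None:
        # Imported lazily; requires NLTK and the downloaded Brown corpus.
        from nltk.corpus import brown as corpus
    result = []
    for fileid in first_files:
        # Look up which category this file id belongs to.
        category = corpus.categories(fileids=[fileid])[0]
        result.append((category, fileid, true_features(fileid)))
    return result
```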
Part 8
Given a parameter containing a list of documents, create a function that returns a list of tuples containing a dictionary of features for the first element and a category name for the second element.
Part 9
Writing the main program
After you create the document list, remember to do a random shuffle (random.shuffle).
Use pprint to print the True features list.
After you create the featuresets, split the list into a training and testing list.
Create a classifier using a Naive Bayes Classifier.
Print the results from an accuracy test. Note: the result may change for each run due to the random shuffle.
Display the 10 most informative features. Note: the result may change for each run due to the random shuffle.
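The last four steps of the main program could look like the sketch below. It assumes featuresets is the list of (features, category) tuples from Part 8, built from documents that were already shuffled. The function names and the test_size default of 100 are arbitrary assumptions; the NLTK calls used are NaiveBayesClassifier.train, nltk.classify.accuracy and show_most_informative_features.

```python
from __future__ import print_function


def split_featuresets(featuresets, test_size):
    """Return (training_list, testing_list); the first test_size items are held out."""
    return featuresets[test_size:], featuresets[:test_size]


def train_and_report(featuresets, test_size=100):
    """Train a Naive Bayes classifier, print its accuracy and top features."""
    import nltk  # imported lazily so split_featuresets works without NLTK
    train_set, test_set = split_featuresets(featuresets, test_size)
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    # Accuracy varies between runs because the documents were shuffled.
    print(nltk.classify.accuracy(classifier, test_set))
    classifier.show_most_informative_features(10)
    return classifier
```

Holding out the first items rather than random ones is fine here only because the shuffle has already randomized the order.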