Compare the similarity of any two documents

Assignment Help JAVA Programming
Reference no: EM132509407

Programming Assignment

Task 1:

Write a script or a program that reads a text file, pre-processes it and saves the results into a new file.

The text file contains documents, one document per line. Each document consists of one or more sentences. Your program should take three parameters:
input file name, output file name, stopword list file name
It should pre-process the documents so that they can later be used to create an inverted index. Basic pre-processing should cover:
tokenization on punctuation
case normalization (lower-casing or upper-casing), punctuation, and numbers
stop word removal (a list will be provided, one word per line, in a file called stopwords.txt)
stemming, which must use one of the Porter stemmer libraries you can find here:
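
The pre-processing steps listed above can be sketched roughly as follows. This is a minimal illustration, not a reference solution. The class name Preprocessor, the method preprocess, and the IDENTITY_STEMMER placeholder are all hypothetical names introduced here. The sketch assumes ASCII English text, lower-cases everything, and discards digits and punctuation outright, which is only one possible reading of the requirements. The placeholder performs no stemming at all and would need to be replaced by a call into whichever Porter stemmer library is chosen.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

class Preprocessor {
    // Hypothetical placeholder: returns the token unchanged.
    // Swap in a real Porter stemmer call from the library you pick.
    static final UnaryOperator<String> IDENTITY_STEMMER = token -> token;

    /** Lower-cases one document, splits on every non-letter run, drops stop words, stems the rest. */
    static String preprocess(String line, Set<String> stopwords, UnaryOperator<String> stemmer) {
        List<String> kept = new ArrayList<>();
        // Assumption: ASCII text only; digits and punctuation act as separators and are discarded.
        for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || stopwords.contains(token)) {
                continue;
            }
            kept.add(stemmer.apply(token));
        }
        return String.join(" ", kept);
    }

    // Arguments: input file name, output file name, stop word list file name.
    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java Preprocessor <input> <output> <stopwords>");
            return;
        }
        Set<String> stopwords = new HashSet<>();
        for (String word : Files.readAllLines(Paths.get(args[2]), StandardCharsets.UTF_8)) {
            String trimmed = word.trim().toLowerCase(Locale.ROOT);
            if (!trimmed.isEmpty()) {
                stopwords.add(trimmed);
            }
        }
        List<String> output = new ArrayList<>();
        for (String document : Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            output.add(preprocess(document, stopwords, IDENTITY_STEMMER));
        }
        Files.write(Paths.get(args[1]), output, StandardCharsets.UTF_8);
    }
}
```

If the Task 1 input already carries a document identifier before a TAB character, the identifier would have to be split off before calling preprocess and written back unchanged. The sketch does not do this.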

Task 2:

Write a script or a program that reads a text file of pre-processed documents, creates a Term Document Incidence Matrix (TDIM) and an inverted index, and saves them to files called TDIM.txt and InvIndex.txt respectively. Your program should take one parameter:
the input file name. The input file should be assumed to be in the same directory as the application.

The input file is made up of pre-processed documents. Each document has an identifier/title separated from the content by a TAB character.

InvIndex.txt: Each line of the output file should contain at least the term and the titles of all documents in which the term occurs.

TDIM.txt: Each column must be separated by a TAB character. (see sample file)
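
One possible way to build both outputs is sketched below. The sample file is not reproduced here, so the layout is an assumption: a header row holding the document titles, then one row per term with a 1 or 0 per document, all TAB-separated. The class name IndexBuilder and its method names are hypothetical, and document titles are assumed to be unique.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class IndexBuilder {
    /** Maps each term (sorted) to the titles of the documents that contain it. */
    static Map<String, Set<String>> invertedIndex(List<String> lines) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t", 2); // title TAB content
            if (parts.length < 2) {
                continue; // skip lines without a TAB separator
            }
            for (String term : parts[1].trim().split("\\s+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(parts[0]);
                }
            }
        }
        return index;
    }

    /** Document titles in input order. */
    static List<String> titles(List<String> lines) {
        Set<String> seen = new LinkedHashSet<>();
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                seen.add(parts[0]);
            }
        }
        return new ArrayList<>(seen);
    }

    /** Assumed TDIM layout: header row of titles, then one 0/1 row per term. */
    static List<String> tdimRows(List<String> titles, Map<String, Set<String>> index) {
        List<String> rows = new ArrayList<>();
        rows.add("term\t" + String.join("\t", titles));
        for (Map.Entry<String, Set<String>> entry : index.entrySet()) {
            StringBuilder row = new StringBuilder(entry.getKey());
            for (String title : titles) {
                row.append('\t').append(entry.getValue().contains(title) ? 1 : 0);
            }
            rows.add(row.toString());
        }
        return rows;
    }

    /** One line per term: the term followed by the titles it occurs in. */
    static List<String> invIndexRows(Map<String, Set<String>> index) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : index.entrySet()) {
            rows.add(entry.getKey() + "\t" + String.join("\t", entry.getValue()));
        }
        return rows;
    }

    // Argument: input file name (resolved against the working directory).
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java IndexBuilder <input>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Map<String, Set<String>> index = invertedIndex(lines);
        Files.write(Paths.get("TDIM.txt"), tdimRows(titles(lines), index), StandardCharsets.UTF_8);
        Files.write(Paths.get("InvIndex.txt"), invIndexRows(index), StandardCharsets.UTF_8);
    }
}
```

For the two input lines "d1 TAB cat dog" and "d2 TAB dog fish", the TDIM rows would be cat 1 0, dog 1 1 and fish 0 1 under this assumed layout.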

Task 3:

Write a script or a program that reads a text file containing a TDIM as defined in the previous task and uses it to produce a TF.IDF weighted matrix. Using the vector space model (VSM), it should then compare any two documents and return their cosine similarity.

Your application should take as arguments the name of the TDIM file and the identifiers of the two documents to be compared.

It is up to you how you manage the logic of this process, but the speed of your script/application will be measured as the total time for three separate runs and comparisons using the SAME CORPUS.
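
The weighting and comparison step might look like the sketch below. It rests on several assumptions that the task text does not confirm: the TDIM has a header row of document identifiers followed by one TAB-separated row per term, each cell is treated as a term frequency (with a binary matrix that is simply 0 or 1), and the inverse document frequency is ln(N / df), which is one common variant among several. The class name CosineSimilarity and the method cosine are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class CosineSimilarity {
    /** Cosine similarity of two document columns after TF.IDF weighting. */
    static double cosine(List<String> tdimLines, String idA, String idB) {
        String[] header = tdimLines.get(0).split("\t");
        int colA = Arrays.asList(header).indexOf(idA);
        int colB = Arrays.asList(header).indexOf(idB);
        if (colA < 1 || colB < 1) { // column 0 holds the term, not a document
            throw new IllegalArgumentException("unknown document identifier");
        }
        int docCount = header.length - 1;
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (String line : tdimLines.subList(1, tdimLines.size())) {
            String[] cells = line.split("\t");
            if (cells.length != header.length) {
                continue; // skip blank or malformed rows
            }
            int df = 0; // number of documents containing this term
            for (int i = 1; i < cells.length; i++) {
                if (Double.parseDouble(cells[i]) > 0) {
                    df++;
                }
            }
            if (df == 0) {
                continue;
            }
            double idf = Math.log((double) docCount / df); // assumed variant: ln(N / df)
            double a = Double.parseDouble(cells[colA]) * idf;
            double b = Double.parseDouble(cells[colB]) * idf;
            dot += a * b;
            normA += a * a;
            normB += b * b;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // a zero vector has no direction; report no similarity
        }
        return dot / Math.sqrt(normA * normB);
    }

    // Arguments: TDIM file name, first document identifier, second document identifier.
    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java CosineSimilarity <tdim> <docA> <docB>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println(cosine(lines, args[1], args[2]));
    }
}
```

This sketch computes the weights for only the two requested columns in a single pass over the file, rather than storing the full weighted matrix. Whether that satisfies the requirement to "produce" the matrix is a judgment call. A version that materializes the whole matrix would reuse the same idf and weight calculations.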

Attachment:- Programming Assignment.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd