Examine the term-document matrix

Reference no: EM131926008

Problem

Classifying Classified Ads Submitted Online. Consider the case of a website that caters to the needs of a specific farming community, and carries classified ads intended for that community. Anyone, including robots, can post an ad via a web interface, and the site owners have problems with ads that are fraudulent, spam, or simply not relevant to the community. They have provided a file with 4143 ads, each ad in a row, and each ad labeled as either -1 (not relevant) or 1 (relevant). The goal is to develop a predictive model that can classify ads automatically.

• Open the file farm-ads.csv, and briefly review some of the relevant and non-relevant ads to get a flavor for their contents.

• Following the example in the chapter, preprocess the data in R and create a term-document matrix and a concept matrix. Limit the number of concepts to 20.
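
To make the term-document matrix concrete, here is a minimal sketch on a toy scale. The assignment asks for R; this sketch uses Python with only the standard library instead. The three ads, the stopword list, and the function names are all made up for illustration and do not come from farm-ads.csv. It stores raw term counts, whereas the chapter's example may apply a weighting such as TF-IDF, and it stops before the concept (dimension reduction) step.

```python
import re
from collections import Counter

# Made-up ads for illustration only; they are not rows from farm-ads.csv.
ads = [
    "Tractor for sale, low hours, good tires",
    "Cheap pills online, buy pills now",
    "Hay bales and tractor parts for sale",
]

# Tiny hypothetical stopword list; a real run would use a fuller one.
STOPWORDS = {"for", "and", "now", "the", "a"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def term_document_matrix(docs):
    """Return (sorted vocabulary, matrix): one row per term, one column per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def sparsity(matrix):
    """Fraction of matrix entries that are zero."""
    cells = [v for row in matrix for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)


vocab, tdm = term_document_matrix(ads)

# Print each term with its count in every ad, then the share of zeros.
for term, row in zip(vocab, tdm):
    print(f"{term:8s} {row}")
print(f"sparsity = {sparsity(tdm):.3f}")
```

Each non-zero entry reads as "term t occurs n times in ad d"; for instance, the entry for pills in the second ad is 2. On these three ads, 24 of the 39 entries are zero. With thousands of ads and a much larger vocabulary, the share of zeros is typically far higher.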

a. Examine the term-document matrix.
   i. Is it sparse or dense?
   ii. Find two non-zero entries and briefly interpret their meaning in words (you do not need to derive their calculation).

b. Briefly explain the difference between the term-document matrix and the concept-document matrix. Relate the latter to what you learned in the principal components chapter (Chapter 4).

c. Partition the data (60% training, 40% validation) and use logistic regression to develop a model that classifies the documents as 'relevant' or 'non-relevant'. Comment on its efficacy.

d. Why use the concept-document matrix, and not the term-document matrix, to provide the predictor variables?
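
The partition-and-classify step in part c can be sketched as follows. This is an illustration under stated assumptions, not the chapter's R workflow: it uses Python with only the standard library, fits the logistic regression by hand-written gradient descent, and runs on synthetic data. The two numeric columns stand in for concept scores and are randomly generated, not derived from farm-ads.csv. Labels are recoded as 1 (relevant) and 0 (not relevant) instead of 1 and -1, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, xi):
    """Linear predictor: bias w[0] plus the weighted features."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the log-loss; returns weights with the bias first."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, xi):
    """Classify as 1 (relevant) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(score(w, xi)) >= 0.5 else 0


# Synthetic stand-in for concept scores (assumption: NOT the farm-ads data).
rng = random.Random(0)  # fixed seed so the split is reproducible
data = []
for _ in range(50):
    data.append(([rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)], 1))
    data.append(([rng.gauss(-1.0, 0.5), rng.gauss(-1.0, 0.5)], 0))
rng.shuffle(data)

# 60% training, 40% validation.
cut = len(data) * 60 // 100
train, valid = data[:cut], data[cut:]

w = fit_logistic([x for x, _ in train], [y for _, y in train])
accuracy = sum(predict(w, x) == y for x, y in valid) / len(valid)
print(f"validation accuracy = {accuracy:.2f}")
```

Accuracy is measured on the validation records only, since accuracy on the training records would overstate how well the model classifies new ads. On the real data, a confusion matrix on the validation set would give a fuller picture than a single accuracy figure.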

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd