What is the best model to classify the data?

Reference no: EM133371018

The spam data file contains 4601 emails, 1813 of which are spam. The file has 57 features, including indicators for the presence of 54 keywords (e.g., free, deal, !), counts of capitalized characters, etc., and a numeric spam variable recording whether a human reader tagged each email as spam (the spam column is 1 for spam, 0 for important, non-spam emails).
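The counts above fix a useful reference point before any model is built: a classifier that always predicts the majority class. A minimal sketch of that arithmetic, using only the numbers stated in the problem (the variable names are illustrative):

```python
# Counts taken from the problem statement.
n_total = 4601
n_spam = 1813
n_ham = n_total - n_spam  # non-spam ("important") emails

# Share of emails that are spam.
spam_rate = n_spam / n_total

# Accuracy of a trivial model that always predicts the majority class.
baseline_accuracy = max(n_spam, n_ham) / n_total

print(f"non-spam emails: {n_ham}")
print(f"spam rate: {spam_rate:.3f}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

Any KNN or logistic regression model should beat this baseline accuracy (about 0.606) to be worth reporting.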

Your task is to predict the probability that a message is spam.

1) Partition the data into a training set (70% of the observations) and a testing set (30% of the observations), using a random state of 12345 for cross-validation.
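One way to do the 70/30 partition is sketched below. It assumes scikit-learn, which the assignment does not name, and uses a synthetic stand-in with the same shape as the spam data because the real file is not included here. The `stratify` argument is an optional extra, not something the assignment asks for.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the spam data's shape (4601 rows, 57 features).
# With the real file, X would hold the 57 feature columns and y the spam column.
X, y = make_classification(
    n_samples=4601, n_features=57, n_informative=10, random_state=0
)

# 70% training, 30% testing, with the random state given in the assignment.
# stratify=y keeps the spam share similar in both parts (optional assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=12345, stratify=y
)

print("training rows:", X_train.shape[0])
print("testing rows:", X_test.shape[0])
```

Fixing `random_state` makes the split reproducible, so the accuracy numbers for both models are computed on the same test rows.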

2) On the partitioned data, build the best KNN model and report its accuracy. (Hint: What is the best value of k? How do you decide on the "best k"?)
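A common way to choose k is to cross-validate on the training set only and then score the chosen model once on the test set. The sketch below assumes scikit-learn and synthetic stand-in data; the grid of odd k values from 1 to 31, the 5 folds, and the feature scaling step are illustrative choices, not requirements from the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in; replace with the real spam features and labels.
X, y = make_classification(
    n_samples=1000, n_features=57, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=12345
)

# KNN is distance-based, so features are standardized inside the pipeline.
knn = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# Candidate k values (odd numbers avoid ties in a two-class vote).
param_grid = {"knn__n_neighbors": list(range(1, 32, 2))}

# 5-fold cross-validation on the training set picks k; the test set stays unseen.
search = GridSearchCV(knn, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
knn_cv_accuracy = search.best_score_
knn_test_accuracy = search.score(X_test, y_test)

print("best k:", best_k)
print(f"cross-validated accuracy: {knn_cv_accuracy:.3f}")
print(f"test accuracy: {knn_test_accuracy:.3f}")
```

Choosing k by test-set accuracy instead would leak test information into the model, which is why the search only sees the training rows.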

3) On the partitioned data, build the best logistic regression model and report its accuracy.
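For logistic regression, "best" can be read as tuning the regularization strength. The following is a sketch under that assumption, again with scikit-learn and synthetic stand-in data; the grid of `C` values is illustrative. It also shows how to obtain the predicted spam probability that the task asks for.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in; replace with the real spam features and labels.
X, y = make_classification(
    n_samples=1000, n_features=57, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=12345
)

# Scaling helps the solver converge; max_iter is raised for the same reason.
logit = Pipeline(
    [("scale", StandardScaler()), ("logit", LogisticRegression(max_iter=1000))]
)

# Smaller C means stronger regularization; values here are illustrative.
param_grid = {"logit__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(logit, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_C = search.best_params_["logit__C"]
logit_test_accuracy = search.score(X_test, y_test)

# Column 1 of predict_proba is the probability of class 1 (spam).
spam_probability = search.predict_proba(X_test)[:, 1]

print("best C:", best_C)
print(f"test accuracy: {logit_test_accuracy:.3f}")
print("first five spam probabilities:", spam_probability[:5].round(3))
```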

4) Based on the results of the k-nearest neighbor and logistic regression models, which is the best model to classify the data? Provide an explanation to support your argument.
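The comparison is easier to argue when both models are scored on the same test rows with more than one metric. The sketch below assumes scikit-learn and synthetic stand-in data; the KNN setting `n_neighbors=5` is a placeholder for whatever k the tuning step selects, and ROC AUC and the confusion matrix are additions beyond the accuracy the assignment asks for.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in; replace with the real spam features and labels.
X, y = make_classification(
    n_samples=1000, n_features=57, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=12345
)

# Placeholder hyperparameters; substitute the tuned values in practice.
models = {
    "knn": Pipeline(
        [("scale", StandardScaler()), ("clf", KNeighborsClassifier(n_neighbors=5))]
    ),
    "logistic": Pipeline(
        [("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))]
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of class 1 (spam)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
        "confusion": confusion_matrix(y_test, pred),
    }

# Pick the model with the higher test accuracy.
best_model = max(results, key=lambda n: results[n]["accuracy"])

for name, r in results.items():
    print(f"{name}: accuracy={r['accuracy']:.3f}, roc_auc={r['roc_auc']:.3f}")
    print(r["confusion"])
print("higher test accuracy:", best_model)
```

For spam filtering, the confusion matrix matters because marking an important email as spam is usually more costly than letting a spam email through, so the model with the higher accuracy is not automatically the better choice.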




