Detecting Spam Email

Reference no: EM133300734

Detecting Spam Email (from the UCI Machine Learning Repository):

A team at Hewlett Packard collected data on a large number of email messages from their postmaster and personal email for the purpose of finding a classifier that can separate spam from non-spam (also known as "ham") messages. The spam concept is diverse: it includes advertisements for products or websites, "make money fast" schemes, chain letters, pornography, and so on. The definition used here is "unsolicited commercial e-mail". The file Spambase.xls contains information on 4601 email messages, of which 1813 are tagged "spam". There are 57 predictors, most of which are the average number of times a certain word (e.g., mail, George) or symbol (e.g., #, !) appears in the email. A few predictors relate to the number and length of capitalized words.

1. To reduce the number of predictors to a manageable size, examine how each predictor differs between the spam and non-spam emails by comparing the spam-class average and non-spam-class average. Which are the 11 predictors that appear to vary the most between spam and non-spam emails? From these 11, which words/signs occur more often in spam?

2. Partition the data into training and validation sets, then perform a discriminant analysis on the training data using only the 11 predictors.

3. If we are interested mainly in detecting spam messages, is this model useful? Use the confusion matrix, lift chart, and decile chart for the validation set for the evaluation.

4. In the sample, almost 40% of the email-messages were tagged as spam. However, suppose that the actual proportion of spam messages in these email accounts is 10%. Compute the constants of the classification functions to account for this information.

5. A spam filter based on your model is used, so that only messages classified as non-spam are delivered, while messages classified as spam are quarantined. In this case, misclassifying a non-spam email as spam has much more serious consequences. Suppose that the cost of quarantining a non-spam email is 20 times that of not detecting a spam message. Compute the constants of the classification functions to account for these costs (assume that the proportion of spam is reflected correctly by the sample proportion).
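
The arithmetic behind questions 1, 4, and 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the assignment's intended solution: the function names, the 1/0 label coding, and the toy rows are hypothetical, and the adjustment shown is the common rule of adding ln(prior x cost) to each class's classification-function constant, where the cost is that of misclassifying a member of that class. Whether the constants reported by a given tool already contain a prior term depends on the tool, so the optional `built_in_prior` argument is an assumption to check against the software you use. Note also that a raw difference in means depends on each predictor's scale.

```python
import math


def rank_by_mean_gap(rows, labels, names, top=11):
    """Rank predictors by |spam mean - non-spam mean|, largest gap first.

    rows: list of equal-length numeric lists, one per message.
    labels: 1 for spam, 0 for non-spam (assumed coding).
    Returns (name, spam_mean - nonspam_mean) pairs; a positive gap means
    the word or symbol is more frequent in spam on average.
    """
    spam = [r for r, y in zip(rows, labels) if y == 1]
    ham = [r for r, y in zip(rows, labels) if y == 0]
    gaps = []
    for j, name in enumerate(names):
        spam_mean = sum(r[j] for r in spam) / len(spam)
        ham_mean = sum(r[j] for r in ham) / len(ham)
        gaps.append((name, spam_mean - ham_mean))
    gaps.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return gaps[:top]


def adjust_constant(constant, prior, cost=1.0, built_in_prior=None):
    """Shift a classification-function constant by ln(prior * cost).

    prior: assumed true proportion of the class.
    cost: cost of misclassifying a member of this class.
    built_in_prior: prior already baked into `constant`, if any
    (hypothetical parameter; None means no prior term is present).
    """
    if prior <= 0 or cost <= 0:
        raise ValueError("prior and cost must be positive")
    shift = math.log(prior * cost)
    if built_in_prior is not None:
        shift -= math.log(built_in_prior)  # undo the old prior term
    return constant + shift


# Toy data (made up for illustration, not taken from Spambase.xls).
names = ["free", "george", "!"]
rows = [[0.6, 0.0, 0.8], [0.4, 0.0, 0.6], [0.1, 1.0, 0.1], [0.1, 1.4, 0.1]]
labels = [1, 1, 0, 0]
ranked = rank_by_mean_gap(rows, labels, names, top=3)

# Question 4: assumed true spam proportion of 10%, equal costs.
q4_shift_spam = adjust_constant(0.0, prior=0.10)
q4_shift_ham = adjust_constant(0.0, prior=0.90)

# Question 5: sample proportions as priors, non-spam errors cost 20x.
p_spam = 1813 / 4601
q5_shift_spam = adjust_constant(0.0, prior=p_spam, cost=1.0)
q5_shift_ham = adjust_constant(0.0, prior=1 - p_spam, cost=20.0)

print(ranked)
print(q4_shift_spam, q4_shift_ham, q5_shift_spam, q5_shift_ham)
```

Passing a constant of 0.0 returns just the shift, which can then be added to whatever constants the discriminant analysis in question 2 produces. A message is classified into the class whose adjusted classification score is larger.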






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd