Run the Naïve Bayes Classifier in Weka on the Data

Reference no: EM131875377

Questions:

Q1. The following dataset is based on the fraud detection data discussed in class, with one extra record (the last one) added. It also adds another predictor, AccountAge, which has three categories, <10, 10~30 and >30, giving the number of days since the account was created. Using the Naïve Bayes method, calculate by hand the probabilities that the last record is truthful and that it is fraudulent. Does Naïve Bayes classify this new record correctly? Use all 11 records in your calculation, and show the calculation steps in the same way as the Naïve Bayes lecture notes.

| Transaction Time | Transaction Amount | Account Age | Class |
|---|---|---|---|
| night | small | >30 | truthful |
| day | small | 10~30 | truthful |
| day | large | <10 | truthful |
| day | large | >30 | truthful |
| day | small | <10 | truthful |
| day | small | >30 | truthful |
| night | small | <10 | fraudulent |
| night | large | 10~30 | fraudulent |
| day | large | >30 | fraudulent |
| night | large | 10~30 | fraudulent |
| day | small | 10~30 | fraudulent |
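As a cross-check on the hand calculation, here is a minimal Python sketch of the Naïve Bayes arithmetic for the last record (day, small, 10~30) using all 11 records. It assumes plain relative-frequency estimates with no Laplace smoothing, which may differ from the convention in the lecture notes; `naive_bayes_scores` and `posteriors` are hypothetical helper names.

```python
from collections import Counter
from fractions import Fraction

# Q1 data: (time, amount, account_age, class); the last row is the new record.
RECORDS = [
    ("night", "small", ">30", "truthful"),
    ("day", "small", "10~30", "truthful"),
    ("day", "large", "<10", "truthful"),
    ("day", "large", ">30", "truthful"),
    ("day", "small", "<10", "truthful"),
    ("day", "small", ">30", "truthful"),
    ("night", "small", "<10", "fraudulent"),
    ("night", "large", "10~30", "fraudulent"),
    ("day", "large", ">30", "fraudulent"),
    ("night", "large", "10~30", "fraudulent"),
    ("day", "small", "10~30", "fraudulent"),
]


def naive_bayes_scores(records, query):
    """Unnormalized P(class) * product of P(x_i | class) for each class,
    estimated from raw relative frequencies (no smoothing, an assumption)."""
    class_counts = Counter(r[-1] for r in records)
    n = len(records)
    scores = {}
    for cls, n_cls in class_counts.items():
        rows = [r for r in records if r[-1] == cls]
        score = Fraction(n_cls, n)  # prior P(class)
        for i, value in enumerate(query):
            matches = sum(1 for r in rows if r[i] == value)
            score *= Fraction(matches, n_cls)  # likelihood P(x_i | class)
        scores[cls] = score
    return scores


def posteriors(scores):
    """Normalize the scores so they sum to 1."""
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}


query = RECORDS[-1][:3]  # ("day", "small", "10~30")
scores = naive_bayes_scores(RECORDS, query)
post = posteriors(scores)
for cls in ("truthful", "fraudulent"):
    print(f"{cls}: score={scores[cls]} posterior={float(post[cls]):.4f}")
print("predicted:", max(post, key=post.get), "| actual:", RECORDS[-1][-1])
```

With these counts the truthful score is 6/11 × 5/6 × 4/6 × 1/6 = 5/99 and the fraudulent score is 5/11 × 2/5 × 2/5 × 3/5 = 12/275, giving posteriors of about 0.5365 and 0.4635. Under these assumptions the record is labeled truthful although it is actually fraudulent. Exact `Fraction` arithmetic keeps the intermediate values in the same fractional form as a hand calculation.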

Q2. Download the data file CongressVote.arff. Open it with Notepad or WordPad and read the description of the data. The task is to classify each record (i.e., a House member) as either a democrat or a republican based on his or her voting record. Note that this dataset has many missing values, marked by '?'.

a. Run the Naïve Bayes classifier in Weka on the data, using the default parameters. What is the 10-fold cross-validation error rate? Show the output screen with the error rate and confusion matrix.
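In 10-fold cross-validation each instance is predicted exactly once, by a model trained on the other nine folds. The reported error rate is the share of those pooled predictions that are wrong, and the confusion matrix tallies the same predictions with rows as actual classes and columns as predicted classes. A minimal Python sketch of that bookkeeping; the helper names are hypothetical, and the toy labels are illustrative, not CongressVote results:

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def error_rate(matrix):
    """Share of instances off the diagonal, i.e. misclassified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


# Toy predictions pooled over all folds (illustrative, not Weka output).
labels = ["democrat", "republican"]
actual = ["democrat", "democrat", "republican", "republican"]
predicted = ["democrat", "republican", "republican", "republican"]
m = confusion_matrix(actual, predicted, labels)
print(m)              # [[1, 1], [0, 2]]
print(error_rate(m))  # 0.25
```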

b. Run the k-nearest neighbor classifier in Weka on the data, using the default parameters apart from setting k = 5. What is the 10-fold cross-validation error rate? With all attributes categorical, how can the distance between two records be measured? Explain your answer using the following three records (records 27, 28 and 29 of the dataset). Which two of these records are closest to each other, and why?

y,n,y,n,n,n,y,y,y,n,y,n,n,n,y,y,democrat

y,y,y,n,n,n,y,y,y,n,y,n,n,n,y,y,democrat

y,n,n,y,y,n,y,y,y,n,n,y,y,y,n,y,republican
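One common way to measure distance between records whose attributes are all categorical is the overlap (Hamming) measure: count the attributes on which two records disagree, leaving out the class label. To our understanding, the default `EuclideanDistance` used by Weka's IBk scores each differing nominal attribute as 1 and each matching one as 0, which gives the square root of the same count. A minimal Python sketch for records 27, 28 and 29; the helper names are hypothetical, and because these three records have no missing values, no rule for '?' is needed here.

```python
import math

# Votes of records 27, 28 and 29, class label removed.
R27 = "y,n,y,n,n,n,y,y,y,n,y,n,n,n,y,y".split(",")
R28 = "y,y,y,n,n,n,y,y,y,n,y,n,n,n,y,y".split(",")
R29 = "y,n,n,y,y,n,y,y,y,n,n,y,y,y,n,y".split(",")


def mismatches(a, b):
    """Number of attributes on which two categorical records differ (Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(1 for x, y in zip(a, b) if x != y)


def overlap_euclidean(a, b):
    """Euclidean distance where each attribute contributes 0 (same) or 1 (different)."""
    return math.sqrt(mismatches(a, b))


pairs = {"27-28": (R27, R28), "27-29": (R27, R29), "28-29": (R28, R29)}
for name, (a, b) in pairs.items():
    print(name, mismatches(a, b), round(overlap_euclidean(a, b), 3))
```

It prints mismatch counts of 1, 8 and 9 for the pairs 27-28, 27-29 and 28-29, so records 27 and 28 are the closest pair: they differ only on the second vote.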

Q3. Download the BostonHousing2.xls file and read the data description. The dataset in the FullData sheet is taken from the BostonHousing.xls file used in Assignment 1. The target attribute is CATMEDV, a binary attribute derived from MEDV (which has been removed).

a. Consider the data in the SmallData sheet, which contains the first 10 records of the full data and a subset of the original predictors. Use Excel to classify record 6 (row 7, highlighted) with 1-NN and 3-NN, respectively, based on the other 9 records. Present your results in Excel in a format similar to the screenshot on page 2 of the Nearest Neighbors lecture notes. Do 1-NN and 3-NN classify the record correctly?
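The hold-one-out 1-NN/3-NN procedure can be sketched in Python as a cross-check on the Excel work: hold out record 6, compute its distance to each of the other 9 records, and take a majority vote among the k nearest. This sketch assumes all SmallData predictors are numeric and uses Euclidean distance; min-max normalization is included as an optional step because the scaling convention of the lecture notes is not stated here. The SmallData values are not reproduced in this document, so the rows below are placeholders, and all helper names are hypothetical.

```python
import math
from collections import Counter


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns become 0.0.
    Assumed preprocessing step, not confirmed by the assignment."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def knn_classify(train, query, k):
    """Majority vote among the k training rows closest to `query`
    (Euclidean distance). `train` is a list of (features, label) pairs."""
    ranked = sorted(train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def classify_held_out(features, labels, held_out, k):
    """Classify row `held_out` (0-based, so record 6 is index 5)
    using every other row as the training set."""
    train = [
        (f, y) for i, (f, y) in enumerate(zip(features, labels)) if i != held_out
    ]
    return knn_classify(train, features[held_out], k)


# Placeholder rows, NOT the SmallData values: two predictors, binary target.
features = min_max_normalize(
    [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0], [1.5, 1.5]]
)
labels = [0, 0, 1, 1, 0]
print(classify_held_out(features, labels, held_out=4, k=1))  # 0
print(classify_held_out(features, labels, held_out=4, k=3))  # 0
```

For the real task, load the 10 SmallData rows into `features` and `labels` and call `classify_held_out` with `held_out=5` for k = 1 and k = 3. With k = 3 and two classes a vote tie cannot occur; ties in distance are broken by row order, which is an arbitrary choice.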

b. Now work with the FullData sheet. In Excel, save the FullData sheet as a CSV file. Run k-NN in Weka on the CSV data file using the default parameters (10-fold cross-validation, k = 1). Show the output screen that displays the 10-fold cross-validation error rate and the related confusion matrix.

c. Run the C4.5 (J48) decision tree algorithm in Weka on the CSV data file created for Part (b) above. Show the output screen that displays the 10-fold cross-validation error rate and the related confusion matrix.

d. Which technique do you think is better, k-NN or decision trees, and why? Consider factors other than the error rates, which are about the same for the two techniques. (This is an open-ended question; justifying your choice matters more than the choice itself.)

e. Now return to the small dataset with 10 records. Save the data as a CSV file. Write and run R commands to classify record 6 (row 7) with 1-NN and 3-NN, respectively, based on the other 9 records. Show the R commands and results (similar to those in the Nearest Neighbors lecture notes for the Admission example).

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd