Run the Naïve Bayes classifier in Weka on the data

Reference no: EM131875377

Questions -

Q1. The following dataset is based on the fraud detection data discussed in class. An extra record (the last one) has been added, along with another predictor, AccountAge, which has three categories (<10, 10~30 and >30) giving the number of days since the account was created. Using the Naïve Bayes method, calculate by hand the probabilities of the last record being truthful or fraudulent. Does Naïve Bayes classify this new record correctly? Use all 11 records in your calculation. Show calculation steps similar to those in the Naïve Bayes lecture notes.

Transaction Time    Transaction Amount    Account Age    Class
night               small                 >30            truthful
day                 small                 10~30          truthful
day                 large                 <10            truthful
day                 large                 >30            truthful
day                 small                 <10            truthful
day                 small                 >30            truthful
night               small                 <10            fraudulent
night               large                 10~30          fraudulent
day                 large                 >30            fraudulent
night               large                 10~30          fraudulent
day                 small                 10~30          fraudulent
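As a cross-check on the hand calculation, the arithmetic can be sketched in Python. This is a minimal sketch, not the method from the lecture notes: it assumes plain relative-frequency estimates with no Laplace smoothing, and the function name `naive_bayes_posteriors` is made up for illustration. Exact fractions are used so the intermediate values can be compared with hand-written steps.

```python
from fractions import Fraction

# The 11 records from the table: (time, amount, account_age, class).
RECORDS = [
    ("night", "small", ">30",   "truthful"),
    ("day",   "small", "10~30", "truthful"),
    ("day",   "large", "<10",   "truthful"),
    ("day",   "large", ">30",   "truthful"),
    ("day",   "small", "<10",   "truthful"),
    ("day",   "small", ">30",   "truthful"),
    ("night", "small", "<10",   "fraudulent"),
    ("night", "large", "10~30", "fraudulent"),
    ("day",   "large", ">30",   "fraudulent"),
    ("night", "large", "10~30", "fraudulent"),
    ("day",   "small", "10~30", "fraudulent"),
]


def naive_bayes_posteriors(records, query):
    """Return normalized class posteriors for `query` (a tuple of predictor values).

    Assumption: unsmoothed frequency estimates, i.e. prior times the product
    of per-attribute conditional frequencies within each class.
    """
    scores = {}
    for label in sorted({r[-1] for r in records}):
        rows = [r for r in records if r[-1] == label]
        score = Fraction(len(rows), len(records))  # class prior
        for i, value in enumerate(query):
            matches = sum(1 for r in rows if r[i] == value)
            score *= Fraction(matches, len(rows))  # P(attribute = value | class)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


# Classify the last record using all 11 records, as the question asks.
query = RECORDS[-1][:3]
posteriors = naive_bayes_posteriors(RECORDS, query)
for label, p in posteriors.items():
    print(f"P({label} | {query}) = {p} ~ {float(p):.4f}")
```

Comparing the larger posterior with the record's actual class answers the question of whether the record is classified correctly.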

Q2. Download the data file CongressVote.arff. Open it with Notepad or WordPad and read the information about the data. Our task is to classify each record (i.e., a House member) as either a Democrat or a Republican based on his/her voting record. Note that this dataset has many missing values, marked with '?'.

a. Run the Naïve Bayes classifier in Weka on the data, using the default parameters. What is the 10-fold cross-validation error rate? Show the output screen with the error rate and confusion matrix.

b. Run the k-nearest neighbor classifier in Weka on the data, using the default parameters. What is the 10-fold cross-validation error rate when k = 5? With all attributes categorical, how can the distances between records be measured? Illustrate your answer using the following three records (records 27, 28 and 29 of the dataset). Which two of these records are closest to each other? Why?

y,n,y,n,n,n,y,y,y,n,y,n,n,n,y,y,democrat

y,y,y,n,n,n,y,y,y,n,y,n,n,n,y,y,democrat

y,n,n,y,y,n,y,y,y,n,n,y,y,y,n,y,republican
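One common way to measure distance between all-categorical records is to count the attributes on which they disagree (a simple mismatch, or Hamming, distance). The sketch below applies that idea to the three records above, with the class label left out. It is an illustration under that assumption; whether the lecture notes or Weka's default settings use exactly this measure is not confirmed here, and the helper name `mismatch_distance` is hypothetical.

```python
from itertools import combinations

# Votes for records 27, 28 and 29 (class label excluded).
votes = {
    27: "y,n,y,n,n,n,y,y,y,n,y,n,n,n,y,y".split(","),
    28: "y,y,y,n,n,n,y,y,y,n,y,n,n,n,y,y".split(","),
    29: "y,n,n,y,y,n,y,y,y,n,n,y,y,y,n,y".split(","),
}


def mismatch_distance(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


# Distance for every pair of records.
distances = {
    (i, j): mismatch_distance(votes[i], votes[j])
    for i, j in combinations(sorted(votes), 2)
}
for pair, d in distances.items():
    print(pair, d)

closest = min(distances, key=distances.get)
print("closest pair:", closest)
```

The pair with the fewest mismatches is the closest under this measure. A Euclidean-style variant would take the square root of the count, which does not change the ranking.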

Q3. Download the BostonHousing2.xls file and read the data description. The dataset in the FullData sheet is taken from the BostonHousing.xls file used in Assignment 1. The target attribute is CATMEDV, which is a binary attribute converted from MEDV (which was removed).

a. Consider the data in the SmallData sheet, which includes the first 10 records of the full data and a subset of the original predictors. Calculate in Excel to classify record 6 (row 7, highlighted), using 1-NN and 3-NN respectively, based on the other 9 records. Show your results with Excel in a format similar to the screenshot on page 2 of the Nearest Neighbors lecture notes. Do 1-NN and 3-NN classify the record correctly?
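The 1-NN and 3-NN steps can be sketched as follows. The numbers below are hypothetical placeholders, not the SmallData values, and the sketch assumes Euclidean distance on raw (unnormalized) predictors; the lecture notes may normalize first. The function name `knn_predict` is made up for illustration.

```python
import math
from collections import Counter

# Hypothetical training rows: ((predictor_1, predictor_2), class label).
# These are NOT the SmallData records; substitute the 9 real rows.
train = [
    ((1.0, 2.0), 0),
    ((1.5, 1.8), 0),
    ((5.0, 8.0), 1),
    ((6.0, 9.0), 1),
    ((1.2, 0.5), 0),
    ((7.0, 7.5), 1),
]
query = (2.0, 2.5)  # hypothetical record to classify


def knn_predict(train, query, k):
    """Majority vote among the k training rows nearest to `query`."""
    ranked = sorted(train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


for k in (1, 3):
    print(f"{k}-NN prediction:", knn_predict(train, query, k))
```

With an odd k and two classes the vote cannot tie, which is one reason 1-NN and 3-NN are convenient choices.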

b. Now, work on the FullData sheet. Within Excel, save the FullData sheet as a CSV file. Run k-NN in Weka on the CSV data file using the default parameters (10-fold cross-validation, k = 1). Show the output screen that displays the 10-fold cross-validation error rate and the related confusion matrix.

c. Run the C4.5 (J48) decision tree algorithm in Weka on the CSV data file created for Part (b) above. Show the output screen that displays the 10-fold cross-validation error rate and the related confusion matrix.

d. Which technique do you believe is better, k-NN or decision trees? Why? Please consider factors other than the error rates, which are about the same for the two techniques. (This is an open-ended question. It is more important to justify your choice than the choice itself.)

e. Now, back to the small data set with 10 records again. Save the data as a CSV file. Write and run R commands to classify record 6 (row 7), using 1-NN and 3-NN respectively, based on the other 9 records. Show the R commands and results (similar to those in the Nearest Neighbors lecture notes for the Admission example).
