Which model should be selected and why


Reference no: EM132506583

Question 1. Pete, owner of Pistol Pete's Diamond Emporium, is investing in a diamond classification system due to his deteriorating eyesight. Pete buys and sells diamonds of varying quality: Low ($1,000-$3,000), Medium ($4,000-$7,000), and High ($8,000-$10,000). It is very important to Pete that his classifier properly classifies his diamonds, not only so that he can run a profitable business but also so that his customers will continue to trust him as a business owner.

Using the possible cost matrix values given below, fill out the cost matrix that most accurately reflects Pete's needs for his diamond classifier model. After completing the cost matrix, justify your proposed cost matrix.

Possible cost matrix values: -1, -1, 0, 20, 20, 20, 20, 100, 100

	Predicted class
Actual class	High	Medium	Low
High
Medium
Low

Question 2. Weka recently added the fictitious Super Happy Terrific Classifier (SHTC) algorithm to its suite of available classifiers and you would like to use it in your analysis. Upon reading the SHTC documentation, you realize that it only accepts discrete attributes as input. However, many of the attributes in your data set are continuous. Can you still use the SHTC algorithm in your analysis? If yes, explain how. If no, explain why not.
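
One common way to feed continuous attributes to a learner that accepts only discrete inputs is to discretize them first. The sketch below shows equal-width binning, one of several possible schemes. The function name equal_width_bins and the sample values are hypothetical, and SHTC itself is fictitious, so nothing here reflects its actual behavior.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in 0..n_bins-1.

    Equal-width binning: the range [min, max] is split into n_bins
    intervals of identical width. This is an illustrative choice, not
    the only way to discretize.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical attribute values, chosen only for illustration.
sample = [0.0, 2.5, 5.0, 7.5, 10.0]
print(equal_width_bins(sample, 2))
```

In Weka itself, the Discretize filters play this role, so a hand-written routine like this is only needed to see the idea.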

Question 3. You have decided to use J48 as a classifier in Weka for your data set. After your analysis, you have found that the accuracy of J48 for your data set is greater than that of ZeroR, but less than the accuracy of OneR. Should you continue to use J48 as a classifier for your data set? Why or why not?

Question 4. You have performed an unsupervised k-means clustering on a data set with two attributes and the results indicate a k of 2. Later, you determine the class values for each data instance (there are four class values) and a supervised clustering results in a k of 4. Provide a possible explanation for why the two clustering methods disagree on a k value and draw a sketch of the two clusterings to go along with your explanation.

Question 5. You are using a 3-nearest neighbor classifier with Euclidean distance as the metric. Determine the class value of the data point Q (7, 2, 6) using the known data points and their class values listed below. Recall that the general form of the Euclidean distance is

d(p, q) = √( Σ_i (p_i - q_i)² )

P1 (-4, 9, 3), class value 1
P2 (8, -2, 1), class value 1
P3 (6, 1, 5), class value 0
P4 (10, 8, 4), class value 0
P5 (-1, 0, -1), class value 1
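
The procedure can be checked with a short script: compute the Euclidean distance from Q to every known point, keep the k closest, and take a majority vote over their class values. This is a minimal sketch; the helper names euclidean and knn_predict are hypothetical, and ties are not handled specially.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(query, labeled_points, k=3):
    """Return the majority class among the k points nearest to query."""
    ranked = sorted(labeled_points, key=lambda item: euclidean(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Known points and class values from the question.
POINTS = [
    ((-4, 9, 3), 1),   # P1
    ((8, -2, 1), 1),   # P2
    ((6, 1, 5), 0),    # P3
    ((10, 8, 4), 0),   # P4
    ((-1, 0, -1), 1),  # P5
]
Q = (7, 2, 6)

# Print each distance so the three nearest neighbors can be read off.
for index, (point, label) in enumerate(POINTS, start=1):
    print(f"P{index}: distance = {euclidean(point, Q):.3f}, class = {label}")

print("Predicted class for Q:", knn_predict(Q, POINTS, k=3))
```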

Question 6. Run the Nearest Neighbor classifier with a k-value of 7 and a Support Vector Machine with default values, using 10-fold cross-validation, on the diabetes data set (diabetes.arff in Assignment 3 on myCourses) in Weka. Fill in the confusion matrices for the models in the tables below and use the cost matrix to compute the cost for each model. Based upon the cost, which model should be selected and why?

Nearest Neighbor (k=7) Confusion Matrix

	Tested Negative	Tested Positive
Tested Negative
Tested Positive

Support Vector Machine Confusion Matrix

	Tested Negative	Tested Positive
Tested Negative
Tested Positive

Cost Matrix

	Tested Negative	Tested Positive
Tested Negative	0	50
Tested Positive	100	-1
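
A model's total cost is usually taken as the sum, over every cell, of the confusion-matrix count multiplied by the matching cost-matrix entry. The sketch below assumes both matrices use the same row and column order as the tables above. The function name total_cost is hypothetical, and the example counts are placeholders, not Weka output for the diabetes data.

```python
def total_cost(confusion, cost):
    """Sum count * cost over every cell of two same-shaped matrices."""
    return sum(
        count * unit_cost
        for confusion_row, cost_row in zip(confusion, cost)
        for count, unit_cost in zip(confusion_row, cost_row)
    )


# Cost matrix from the question, in the same order as the table above.
COST = [
    [0, 50],
    [100, -1],
]

# HYPOTHETICAL counts for illustration only; replace them with the
# confusion matrix that Weka reports for each model.
example_confusion = [
    [10, 2],
    [3, 5],
]

print(total_cost(example_confusion, COST))
```

Running the same function on each model's confusion matrix gives two totals that can be compared directly; under this cost matrix, the lower total is the cheaper model.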

Attachment:- Final Exam.rar
