Explain the meaning of the probability distributions

Reference no: EM132313022

Machine Learning Laboratory

Question 1: Classification, Decision Trees, Naïve Bayes, k-NN, Weka

Consider the dataset postoperative-patient-data_simplified.arff available on Moodle. This dataset contains health-status attributes of post-operative patients in a hospital, with the target class being whether the patients should be discharged (S) or remain in the hospital (A). Additional documentation regarding these attributes appears in the ARFF file.

1. Before you run the classifiers, use the Weka visualization tool to analyze the data, and report briefly on the types of the different variables and on the variables that appear to be important.

2. Run J48 (=C4.5, decision tree), Naïve Bayes and IBk (k-NN) to learn a model that predicts whether a patient should be discharged. Perform 10-fold cross validation, and analyze the results obtained by these algorithms as follows.

Note: Click on the "Choose" bar to select relevant parameters. Explanations of parameters you should try appear below. You should report on the performance of at least two variations of the operational parameters, e.g., minNumObj and unpruned for J48, and KNN and distanceWeighting for k-NN (the parameters debug and saveInstanceData are not operational).

J48
• binarySplits: whether you use binary splits on nominal attributes when building the trees.
• minNumObj: the minimum number of instances per leaf.
• unpruned: whether pruning is performed (try TRUE and FALSE).
• debug: if set to TRUE, the classifier may output additional information.
• saveInstanceData: whether to save the training data for visualization.

Naïve Bayes (parameter variations are not relevant to this lab)

k-NN (IBk) (under lazy in Weka)
• KNN: the number of neighbours to use.
• crossValidate: whether leave-one-out cross-validation will be used to select the best k value between 1 and the value specified in the KNN parameter.
• distanceWeighting: specifies the distance weighting method used (when k > 1).
• debug: if set to TRUE, the classifier may output additional information.

(a) J48 (=C4.5)
i. Examine the decision tree and indicate which are the main variables.
ii. What is the accuracy of the decision tree? Explain the results in the confusion matrix.
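
As a rough aid for reading a two-class confusion matrix of the kind Weka prints (rows are the actual class, columns the predicted class), the sketch below derives accuracy and per-class precision and recall from the four counts. The counts are made-up placeholders, not results from this dataset, and the function name confusion_summary is hypothetical.

```python
def confusion_summary(matrix):
    """matrix[i][j] = number of instances of actual class i predicted as class j."""
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n_classes))
    summary = {"accuracy": correct / total}
    for i in range(n_classes):
        predicted_i = sum(row[i] for row in matrix)  # column sum: everything predicted as i
        actual_i = sum(matrix[i])                    # row sum: everything that really is i
        summary[f"precision_{i}"] = matrix[i][i] / predicted_i if predicted_i else 0.0
        summary[f"recall_{i}"] = matrix[i][i] / actual_i if actual_i else 0.0
    return summary


# Placeholder counts only (rows = actual A, S; columns = predicted A, S).
example = [[50, 10],
           [20, 20]]
result = confusion_summary(example)
print(result)
```

With these placeholder counts the classifier is much better at recognising class 0 than class 1, which overall accuracy alone would hide; the same reading applies to the A and S rows of the real output.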

(b) Naïve Bayes

i. Explain the meaning of the "probability distributions" in the output, illustrating it with reference to the BP-STBL attribute.

ii. Calculate (by hand) the probability that a person with the following attribute values would be discharged.
L-CORE = mid, L-SURF = low, L-O2 = good, L-BP = high,
SURF-STBL = stable, CORE-STBL = stable, BP-STBL = mod-stable

iii. What is the probability that a person with these attributes will remain in hospital, and what is the probability that s/he will be discharged? What would the Naïve Bayes classifier predict for this person?

iv. What is the accuracy of the Naïve Bayes classifier? Explain the results in the confusion matrix.
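
The hand calculation in parts ii and iii follows the usual Naïve Bayes rule: multiply the class prior by the conditional probability of each attribute value given that class, then normalise over the two classes. The sketch below shows the arithmetic for two attributes only. All probabilities in it are hypothetical placeholders, not values from the dataset, and the function name is an assumption; the real numbers have to be read from the Weka output.

```python
def naive_bayes_posterior(priors, likelihoods, instance):
    """Return normalised P(class | instance) under the naive independence assumption.

    priors:      {class: P(class)}
    likelihoods: {class: {attribute: {value: P(value | class)}}}
    instance:    {attribute: value}
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for attribute, value in instance.items():
            score *= likelihoods[cls][attribute][value]
        scores[cls] = score
    evidence = sum(scores.values())  # normalising constant
    return {cls: score / evidence for cls, score in scores.items()}


# Hypothetical probabilities for two attributes only; NOT taken from the dataset.
priors = {"S": 0.4, "A": 0.6}
likelihoods = {
    "S": {"L-CORE": {"mid": 0.5}, "BP-STBL": {"mod-stable": 0.2}},
    "A": {"L-CORE": {"mid": 0.5}, "BP-STBL": {"mod-stable": 0.4}},
}
patient = {"L-CORE": "mid", "BP-STBL": "mod-stable"}

posterior = naive_bayes_posterior(priors, likelihoods, patient)
prediction = max(posterior, key=posterior.get)
print(posterior, prediction)
```

The two posteriors sum to 1, so the answer to part iii is the pair of normalised values and the prediction is the class with the larger one. Whether the conditional probabilities should include smoothing is something to check against the counts in your own output.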

(c) k-NN
i. Find three instances in the dataset that are similar to the above patient, and use the Jaccard coefficient to calculate (by hand) the predicted outcome for this patient. Show your calculations.
ii. What is the accuracy of the k-NN classifier for different values of k? Explain the results in the confusion matrix.
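
One possible reading of the Jaccard calculation in part i, offered as an assumption rather than the lab's official definition, treats each instance as a set of (attribute, value) pairs, so the coefficient is the number of shared pairs divided by the number of distinct pairs across both instances. The sketch then combines the three neighbours with a similarity-weighted vote, which is also an assumption (a plain majority vote is another option). The instances are made up for illustration.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard coefficient between two instances viewed as sets of (attribute, value) pairs."""
    set_a, set_b = set(a.items()), set(b.items())
    return len(set_a & set_b) / len(set_a | set_b)


def predict(query, neighbours):
    """Similarity-weighted vote over (instance, label) pairs."""
    votes = defaultdict(float)
    for instance, label in neighbours:
        votes[label] += jaccard(query, instance)
    return max(votes, key=votes.get), dict(votes)


# Made-up three-attribute instances, for illustration only.
query = {"L-CORE": "mid", "L-O2": "good", "BP-STBL": "mod-stable"}
neighbours = [
    ({"L-CORE": "mid", "L-O2": "good", "BP-STBL": "stable"}, "A"),
    ({"L-CORE": "mid", "L-O2": "good", "BP-STBL": "mod-stable"}, "S"),
    ({"L-CORE": "low", "L-O2": "good", "BP-STBL": "stable"}, "A"),
]

label, votes = predict(query, neighbours)
print(label, votes)
```

In this toy case the single identical neighbour outweighs two partial matches, which shows how weighting by similarity can give a different answer from an unweighted majority vote.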

3. Draw a table to compare the performance of J48, Naïve Bayes and IBk using the summary measures produced by Weka. Which algorithm does better? Explain in terms of Weka's summary measures. Can you speculate why?

Question 2: Classification, Decision Trees, Naïve Bayes, k-NN, Weka

Consider the dataset tic-tac-toe.arff available on Moodle. Each example in this dataset represents a different game of tic-tac-toe, where the player writing crosses ("x") has the first move. Only those games that don't end in a draw are included, with the positive class representing the case where the first player wins and the negative class the case where the first player loses. The features encode the status of the game at the end, so each square contains a cross "x", a nought "o" or a blank "b".

1. Before you run the classifiers, use the Weka visualization tool to analyze the data.

(a) Which attributes seem to be the most predictive of winning or losing? (hint: if you were the "x" player, where would you put your first cross and why?)

(b) What can you infer about the advantage (or otherwise) of being the first player?

2. Run J48 (=C4.5, decision tree), Naïve Bayes and IBk (=k-NN) to learn a model that predicts whether the "x" player will win. Perform 10-fold cross validation, and analyze the results obtained by these algorithms as follows.

Note: When using IBk, click on the "Choose" bar to set the value of k (default is 1). Consider different values of k.

(a) J48 (=C4.5)

i. Examine the decision tree and indicate the main variables.

ii. Trace the decision tree for the following game. What would it predict?

289_figure.jpg

iii. What is the first split in the decision tree? Calculate (by hand) the Information Gain obtained from the first split in the tree. Show your calculations.

iv. What is the accuracy of the decision tree? Explain the results in the confusion matrix.
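
The Information Gain asked for in part iii is the entropy of the class distribution before the split minus the size-weighted entropy of the branches after it. The sketch below works this through for a three-way split (values x, o, b). The counts are hypothetical placeholders and must be replaced with the counts read from the data; the function names are assumptions.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, child_counts):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = sum(parent_counts)
    remainder = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - remainder


# Hypothetical [positive, negative] counts; NOT taken from tic-tac-toe.arff.
parent = [8, 8]
children = [[4, 0],   # branch "x"
            [2, 2],   # branch "o"
            [2, 6]]   # branch "b"

gain = information_gain(parent, children)
print(round(gain, 4))
```

Note that C4.5 chooses its splits by gain ratio rather than plain information gain, so the attribute at the root is picked by a related but different score from the one computed here.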

(b) Naïve Bayes

i. Calculate (by hand) the predicted probability of a win for the following game. Show your calculations.

289_figure.jpg

ii. What is the probability that a player with this configuration will win? What would the Naïve Bayes classifier predict for this game?

iii. What is the accuracy of the Naïve Bayes classifier? Explain the results in the confusion matrix.

(c) k-NN

i. Find three instances in the dataset that are similar to the following game, and use the Jaccard coefficient to calculate (by hand) the predicted outcome for this game. Show your calculations.

289_figure.jpg

ii. What is the accuracy of the k-NN classifier? Explain the results in the confusion matrix.

3. Draw a table to compare the performance of J48, Naïve Bayes and IBk using the summary measures produced by Weka. Which algorithm does better? Explain in terms of Weka's summary measures. Can you speculate why?

Question 3: Regression

Consider the dataset abs.arff available on Moodle. This dataset contains continuous-valued economic attributes of a country, with the target variable being the unemployment rate. Additional documentation regarding these attributes appears in the ARFF file.

1. Perform a linear regression (under functions in Weka) to learn a linear model of the unemployment rate as a function of the other variables. You can use the default parameters given in Weka. What is the resultant regression function?

2. Using the resultant regression function, calculate by hand the Absolute Error for the year 1986.

3. Calculate the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) obtained by the regression function (you can use the Excel spreadsheet provided on Moodle). How is MAE different from RMSE? (Do these functions emphasize different aspects of performance?)
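
To make the MAE versus RMSE comparison concrete, the sketch below computes both for a small set of invented values (not taken from abs.arff) in which three predictions are exact and one is off by 4. MAE averages the absolute errors, while RMSE squares them before averaging and then takes the square root.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the average squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Invented values for illustration; one prediction has a large error.
actual = [5.0, 6.0, 7.0, 8.0]
predicted = [5.0, 6.0, 7.0, 12.0]

print(mae(actual, predicted), rmse(actual, predicted))
```

Here a single large error leaves the MAE at 1.0 but pushes the RMSE to 2.0, which illustrates that RMSE weights large individual errors more heavily than MAE does.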


Explain the meaning of the probability distributions : FIT5047 - Intelligent systems - Monash University - Machine Learning Laboratory - analyze the data, and report briefly on the types of the different variables

Reviews

len2313022

5/28/2019 3:44:17 AM

This is an Intelligent Systems assignment worth 10%. I am attaching the file for your review. I want it to be done as per requirements with right answers.

Submission instructions:
1. You are allowed to do the lab with one friend. In this case, include his/her name and yours at the top of the submission – MAKE ONLY ONE SUBMISSION FOR BOTH OF YOU.
2. At the end of the lab, upload your answers to Question 1 to Moodle in a zip file named MLlab-StudentID.zip, where StudentID is your Student ID number.
3. Upload your report to Moodle by midnight of the second day after the completion of your lab at the latest. For example, if your lab is on Wednesday, you should upload your report by Friday 11:59 pm; if your lab is on Friday, you should upload your report by Sunday 11:59 pm at the latest.

Important: You may be interviewed about your work in order to determine your mark for this lab.

Late submission policy: 10% of the maximum mark will be deducted for every day a submission is late.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd