Reference no: EM133746801
Assessment
You may discuss your answers with others in your group, but your Orange implementation file and report must be prepared and submitted individually.
Decision Tree Training
Using the Orange Data Mining software, implement a decision tree to predict whether Sara will go sailing given weather, company, and boat size. Import the dataset named ‘Sailing' from Orange and specify the features and prediction target. Which column(s) correspond to features and which column(s) to the prediction target? Perform stratified, replicable train-test splitting on the dataset using the split ratio specified by your class facilitator. How many examples are there in the resulting training and test sets? Now, train a binary decision tree, with at least 2 instances in leaves, not splitting subsets smaller than 2, limiting the maximal tree depth to 3, and stopping splitting when the majority class reaches 90%. Visualize the resulting tree model as a tree graph.
Prediction and Testing
Using the tree graph, manually predict whether Sara will go sailing for the weather, company and boat size given in the first two test examples. Construct a confusion matrix for your tree model on the test set. Calculate the classification accuracy, precision, recall, and F1 score for each outcome. You may check your answers against Orange's Test and Score output.
Logistic Regression
Using the Orange Data Mining software, build a workflow to classify Iris flowers given petal length and width, and sepal length and width. Import the dataset named ‘Iris' from Orange and specify the features and prediction target. Prepare a scatter plot of petal width against petal length. Roughly speaking, how do the probabilities of different Iris species given petal width and petal length change as the two measurements change? Now, perform a stratified, replicable train-test splitting on the data using the split ratio specified by your class facilitator. How many examples are there in the training and testing sets? Train a logistic regression model with L2 regularization (using regularization strength of
Question 1). Write down the resulting logistic regression model equations for the probabilities of Iris Setosa, Iris Virginica and Iris Versicolor given the flower measurements. Hint: the probability of Setosa is given by
Prediction and Testing
Using the logistic regression model equations from 1.3, compute the probabilities of Iris Setosa, Iris Virginica and Iris Versicolor for the measurements given in the first four test examples. Given the probabilities, what is the logistic regression model's prediction for each example?
After Class
The following problems are to be solved individually after class and submitted as part of the report by the assessment submission deadline.
Confusion Matrix
Table 1 lists test examples used for a Wine Type classification task, and predictions made by decision tree and logistic regression models. Complete the confusion matrices for the two models by hand. The use of the Orange Data Mining software is not required.
Table 1. Test dataset used for a Wine Type classification task, and predictions made by decision tree and logistic regression models.
Predicted Predicted
Decision Tree Logistic Regression
Classification Accuracy, Precision and Recall (6 marks)
Calculate the classification accuracy for the decision tree and logistic regression models used in
Additionally, calculate the precision, recall and F1 score per wine type for each model. Show all working. Suppose the decision tree predicts that a given wine is of Type 2. How likely (in %) is this wine actually to be Type 1? Also, what % of Type 3 wines can the logistic regression model correctly classify as Type 3?
Report
In no more than 1000 words, and in no more than 2 pages, summarize your work, answers, and other findings from Sections 1-2 above. Include screen shots, tables, and other figures to help illustrate your understanding of the prediction workflow and machine learning concepts.