Part 2: Exploratory analysis and classification
Background
You are required to carry out exploratory analysis and classification on the Diabetes dataset compiled by the US National Institute of Diabetes and Digestive and Kidney Diseases.
The objective of the dataset is to predict diagnostically whether a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old and of Pima Indian heritage.
The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, her BMI, insulin level, age, and so on.
Step 1:
Complete the following
- Using the KNIME platform, examine the summary statistics.
- Build a Decision Tree Workflow in KNIME.
- Make a validation set: split your dataset into two parts, 'Train' and 'Test'.
- Train and build a Decision Tree Classification model for your dataset.
- Evaluate the performance of your Decision Tree model using the Confusion Matrix and determine the accuracy rate.
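The split and evaluation tasks above are performed with KNIME nodes in this assignment. As a cross-check of the arithmetic only, the following is a minimal Python sketch of a Train/Test split and of how a confusion matrix yields the accuracy rate. The function names, the 30% test fraction, the seed, and the toy labels are all assumptions made for illustration; they are not taken from the dataset or from KNIME.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and return (train, test).

    test_fraction and seed are illustrative defaults, not values
    prescribed by the assignment.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(actual, predicted):
    """Count TP, FP, FN, TN, treating Outcome == 1 as the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        elif a == 1 and p == 0:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def accuracy(cm):
    """Accuracy = (TP + TN) / total number of predictions."""
    total = sum(cm.values())
    return (cm["TP"] + cm["TN"]) / total if total else 0.0


# Toy labels for illustration only (not real patient records).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
print(cm)
print(f"Accuracy: {accuracy(cm):.2f}")
```

When interpreting the matrix, it helps to read the off-diagonal cells separately: a false negative (a diabetic patient predicted as healthy) and a false positive are different kinds of error, even though both lower the accuracy rate equally.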
Step 2:
Having completed the five tasks above, write a report based on your analysis and classification. The report must be completed in a Word document. The report must contain the following:
- The summary statistics of your dataset, including:
- Validation: The Confusion Matrix results for your Train decision tree model and its interpretation.
- A list of rules and their explanations (e.g. if condition1 and condition2 and condition3 then outcome).
- The KNIME Workflows file for your project.
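To illustrate the "if condition1 and condition2 then outcome" form requested for the list of rules, here is a small Python sketch that stores rules as data, prints them as sentences, and applies them to a record. The field names `BMI` and `Age` and every threshold below are hypothetical placeholders; the actual rules must be read from the decision tree learned in KNIME.

```python
import operator

# Comparison symbols mapped to functions, so rules can be stored as data.
OPS = {"<=": operator.le, ">": operator.gt}

# Hypothetical rules: each is (list of conditions, predicted Outcome).
# Thresholds are invented for illustration, not learned from the data.
RULES = [
    ([("BMI", "<=", 30.0)], 0),
    ([("BMI", ">", 30.0), ("Age", "<=", 40)], 0),
    ([("BMI", ">", 30.0), ("Age", ">", 40)], 1),
]


def rule_to_text(conditions, outcome):
    """Render one rule as 'if A and B then Outcome = x'."""
    clauses = " and ".join(f"{f} {op} {v}" for f, op, v in conditions)
    return f"if {clauses} then Outcome = {outcome}"


def classify(patient, rules=RULES):
    """Return the outcome of the first rule whose conditions all hold."""
    for conditions, outcome in rules:
        if all(OPS[op](patient[f], v) for f, op, v in conditions):
            return outcome
    return None  # no rule matched


for conditions, outcome in RULES:
    print(rule_to_text(conditions, outcome))

# Hypothetical patient record, for illustration only.
print(classify({"BMI": 33.5, "Age": 52}))
```

Each path from the root of a decision tree to a leaf corresponds to one such rule, so a tree with three leaves produces three rules whose conditions are mutually exclusive.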