ITEC325 Applied Data Mining and Big Data - Australian Catholic University
Assessment Artefact: RapidMiner File
The primary purpose of this assessment is to give students an opportunity to develop data mining and data analysis skills for finding human-interpretable patterns that describe the data.
Context
Heart disease is one of the leading causes of death for people of most races in the world. According to the CDC, about half of all Americans (47%) have at least 1 of 3 key risk factors for heart disease: high blood pressure, high cholesterol, and smoking. Other key indicators include diabetic status, obesity (high BMI), insufficient physical activity and excessive alcohol consumption. Identifying the factors that have the greatest impact on heart disease, and preventing them, is therefore very important in healthcare.
Instructions
Task 0 Download the data set from LEO.
Task 1 Conduct an exploratory data analysis of the data set using RapidMiner to understand the characteristics of each variable and the relationship of each variable to the other variables in the data set. Summarise the findings of your exploratory data analysis in a table that describes the key characteristics of each variable, such as its minimum, maximum, average, standard deviation, most frequent value (mode), missing values and invalid values, together with its relationships with other variables where relevant.
Hint: The Statistics tab and Chart tab in RapidMiner provide a lot of descriptive statistical information and useful charts, such as bar charts and scatterplots. You may also want to run some correlation analyses and chi-square tests. Indicate in the Task 1 table which variables contribute most to determining the heart disease risk rating.
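For reference, the chi-square test of independence mentioned in the hint compares the observed counts in a contingency table of two categorical variables (for example, a smoking indicator against the heart disease outcome, assuming the data set contains such variables) with the counts that would be expected if the two variables were independent. The standard Pearson statistic is:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1)
```

Here O_ij is the observed count in row i and column j, R_i and C_j are the corresponding row and column totals, N is the total number of examples, and r and c are the numbers of rows and columns. A large chi-square value relative to its degrees of freedom (a small p-value) suggests that the variable is associated with the outcome, although the test on its own does not measure the strength or direction of that association.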
Briefly discuss the key results of your exploratory data analysis and justify your selection of the top five variables for predicting the risk of heart disease, based on the results of your exploratory data analysis and a review of the relevant literature on assessing the risk of heart disease (about 250 words).
Task 2 Build and evaluate two predictive models in RapidMiner for determining the heart disease risk rating, using two appropriate data mining methods you learned in this unit.
Briefly explain your predictive modelling process, justify your choice of data mining methods, and discuss the results of your predictive models, drawing on the key outputs. Base this discussion on the contribution of each of the top five variables to the final decision tree model and on relevant supporting literature (at least 3 credible sources) on interpreting the selected data mining models (about 250 words).
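As background for discussing how much each variable contributes to a decision tree, the split criteria used by common tree learners (such as the information gain and gain ratio criteria available in RapidMiner's Decision Tree operator) are based on entropy. For a set of examples S with class proportions p_k, and a candidate attribute A whose values v partition S into subsets S_v:

```latex
H(S) = -\sum_{k} p_k \log_2 p_k

\mathrm{IG}(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|} \, H(S_v)

\mathrm{GainRatio}(S, A) = \frac{\mathrm{IG}(S, A)}{-\sum_{v} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}}
```

At each node the tree splits on the attribute with the highest value of the chosen criterion, so attributes selected near the root, or selected repeatedly with large gains, are generally the ones that contribute most to separating the risk categories.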
Task 3 Discuss and compare the accuracy of the two data mining models (methods). Use a table to compare the key results from each model's confusion matrix (about 250 words).
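For a binary outcome, the key confusion-matrix results can be derived from four counts: true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). The following is a minimal Python sketch, assuming a binary risk label with the "at risk" class treated as positive, that computes the measures commonly compared in such a table; it may be useful for cross-checking the figures reported by RapidMiner. The class name, function names and example counts are hypothetical and are not results from the data set.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts from a binary confusion matrix (hypothetical helper)."""
    tp: int  # positives correctly predicted as positive
    fp: int  # negatives wrongly predicted as positive
    tn: int  # negatives correctly predicted as negative
    fn: int  # positives wrongly predicted as negative


def safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising ZeroDivisionError when a denominator is zero
    return num / den if den else 0.0


def summarise(cm: ConfusionMatrix) -> dict:
    """Compute the measures typically compared across two models."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    precision = safe_div(cm.tp, cm.tp + cm.fp)
    recall = safe_div(cm.tp, cm.tp + cm.fn)  # also called sensitivity
    return {
        "accuracy": safe_div(cm.tp + cm.tn, total),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(cm.tn, cm.tn + cm.fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only
    model_a = ConfusionMatrix(tp=80, fp=20, tn=90, fn=10)
    for name, value in summarise(model_a).items():
        print(f"{name}: {value:.3f}")
```

Accuracy alone can be misleading when one risk class is much more common than the other, which is why recall (sensitivity) and specificity are usually reported alongside it in a clinical setting.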
Note: The important outputs from the data mining analyses you conduct in RapidMiner should be included in your Assignment 3 report to support your conclusions for each analysis. Export these outputs from RapidMiner as JPG image files and insert the screenshots in the relevant parts of your Assignment 3 report.
Task 4 Based on relevant supporting literature (at least 3 credible sources), briefly discuss ethical perspectives on data mining and identify the possible ethical issues in the context of this case study (about 250 words).
Task 5 Use Zoom to record a short video presentation (4-5 minutes). During the presentation, turn on your webcam so that your face is included, and share your screen to show your predictive process and models in RapidMiner. Briefly explain the steps you followed to create the RapidMiner processes, run the processes, and present the results.
Attachment: Applied Data Mining.rar