Reference no: EM133544383, Length: 3000 words
Business Intelligence
Learning objective 1: analyse and apply strategies, processes and underlying technologies for effective management of data to make evidence-based decisions;
Learning objective 2: critically analyse organisational and societal problems using descriptive and predictive analysis and internal and external data sources to generate insight, create value and support evidence-based decision making;
Learning objective 3: examine legal, ethical and privacy dilemmas that arise from the use of business intelligence, analytics and evidence-based decision making to comply with legal and regulatory requirements;
Learning objective 4: communicate effectively in a clear and concise manner in a written report style for both senior and middle management, with correct and appropriate acknowledgment of the main ideas presented and discussed.
Task 1 Predictive Analytics Case Study
The goal of the Predictive Analytics Case Study is to predict whether a patient is likely to have a stroke or not (see Table 1 Data dictionary for stroke-data.csv below). In completing Task 1 you will apply the business understanding, data understanding, data preparation, modelling and evaluation phases of the CRISP-DM data mining process. It is important that you understand this data set in order to complete Task 1 and its four sub-tasks.
Table 1 Data dictionary for stroke-data.csv
Variable Name | Description | Data Type
id | unique identifier | Numeric
gender | gender of the patient | Categorical: "Male", "Female" or "Other"
age | age of the patient | Numeric
hypertension | patient has hypertension | Binary: 0 = No (the patient does not have hypertension); 1 = Yes (the patient has hypertension)
heart_disease | patient has heart disease | Binary: 0 = No (the patient does not have heart disease); 1 = Yes (the patient has heart disease)
Task 1.1 Exploratory Data Analysis and Data Preparation
Conduct an exploratory data analysis and data preparation of the stroke-data.csv data set using RapidMiner to understand the characteristics of each variable and its relationship to the other variables. Summarise your findings in a table named Table 1.1 Results of Exploratory Data Analysis and Data Preparation. For each variable in the stroke-data.csv data set, the table should describe key characteristics such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values, together with relationships with other variables, any transformation of existing variables and any creation of new variables.
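The statistics listed above are produced by RapidMiner in the assessment itself. Purely to illustrate what each figure means, here is a minimal Python sketch using only the standard library. The `ages` list is made up for illustration and is not taken from stroke-data.csv, and the `summarise` helper is a hypothetical name.

```python
import statistics

# Made-up values for illustration only; NOT taken from stroke-data.csv.
# None marks a missing value.
ages = [67, 45, None, 80, 45, 52]

def summarise(values):
    """Return basic descriptive statistics, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
        "mode": statistics.mode(present),    # most frequent value
        "missing": len(values) - len(present),
    }

summary = summarise(ages)
for name, value in summary.items():
    print(name, value)
```

Each key of `summary` corresponds to one of the characteristics that Table 1.1 is expected to report for a numeric variable.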
Hint: the Statistics tab and Chart tab in RapidMiner provide a lot of descriptive statistical information and useful charts, such as bar charts and scatterplots, required for Task 1.1. You might also like to run some correlations and/or chi-square tests, depending on whether a variable is categorical or numeric. Indicate in Table 1.1 which variables contribute most to predicting whether a patient is likely to have a stroke or not. You could also consider transforming some variables, creating new variables and converting the target/label variable into a binominal variable to facilitate analysis in Tasks 1.2, 1.3 and 1.4.
Briefly discuss the key findings of your exploratory data analysis and data preparation and justification for variables most likely to predict whether a patient is likely to have a stroke or not (500 words).
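The hint for Task 1.1 mentions chi-square tests for categorical variables. As a rough illustration of what such a test computes, the sketch below calculates the Pearson chi-square statistic for a contingency table in plain Python. The counts are hypothetical and are not results from stroke-data.csv, and `chi_square` is an illustrative helper rather than a RapidMiner operator.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = hypertension (0, 1), columns = stroke (no, yes).
counts = [[90, 10],
          [30, 20]]
stat = chi_square(counts)
print(stat)
```

For a 2 x 2 table there is one degree of freedom, so a statistic above roughly 3.84 corresponds to p < 0.05. A larger statistic suggests a stronger association between the categorical variable and the label.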
Task 1.2 Decision Tree Model
Build a Decision Tree model for predicting whether a patient is likely to have a stroke or not on the stroke-data.csv data set, using RapidMiner and a set of data mining operators determined in part by your exploratory data analysis in Task 1.1. Provide these outputs from RapidMiner: (1) Final Decision Tree Model process, (2) Final Decision Tree diagram and (3) Decision Tree rules. Briefly explain your Final Decision Tree Model process and discuss the results of the Final Decision Tree Model, drawing on key outputs (Decision Tree diagram, Decision Tree rules), for predicting whether a patient is likely to have a stroke or not based on key contributing variables and relevant supporting literature on the interpretation of decision trees (150 words).
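When interpreting a decision tree, it can help to see how a single split is scored. The sketch below uses Gini impurity, one common splitting criterion. This is an assumption for illustration: the criterion actually used depends on how the Decision Tree operator is configured in RapidMiner. The labels and the rule "age > 60" are made up.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_gini(left, right):
    """Impurity after a split: each branch weighted by its share of the rows."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Made-up labels (1 = stroke, 0 = no stroke), split on a hypothetical rule "age > 60".
older = [1, 1, 1, 0]
younger = [0, 0, 0, 0]

before = gini(older + younger)         # impurity before splitting
after = weighted_gini(older, younger)  # impurity after splitting
print(before, after)
```

A split that lowers impurity the most is preferred, which is why variables near the root of the tree are usually read as the strongest contributors.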
Task 1.3 Logistic Regression Model
Build a Logistic Regression model for predicting whether a patient is likely to have a stroke or not on the stroke-data.csv data set, using RapidMiner and an appropriate set of data mining operators determined in part by your exploratory data analysis in Task 1.1. Provide these outputs from RapidMiner: (1) Final Logistic Regression Model process and (2) key outputs from the Logistic Regression Model.
Hint for Task 1.3 Logistic Regression Model: you may need to change the data types of some variables. Briefly explain your Final Logistic Regression Model process and discuss the results of the Final Logistic Regression Model, drawing on the key outputs (coefficients, standardised coefficients, odds ratios, p-values etc.), for predicting whether a patient is likely to have a stroke or not based on key contributing variables and relevant supporting literature on the interpretation of logistic regression models (150 words).
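Coefficients and odds ratios are linked by a simple relationship: the odds ratio for a variable is the exponential of its coefficient. The sketch below illustrates this along with the logistic function that turns a linear score into a probability. The coefficient value is hypothetical and is not a result from stroke-data.csv, and the function names are illustrative.

```python
import math

def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)

def predicted_probability(intercept, coefficients, values):
    """Probability of the positive class for one row of predictor values."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficient for hypertension; NOT a fitted result.
beta_hypertension = 0.693
print(odds_ratio(beta_hypertension))  # close to 2: the odds roughly double
```

An odds ratio above 1 means the variable is associated with higher odds of the outcome, below 1 with lower odds, and close to 1 with little effect, holding the other variables constant.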
Task 1.4 Model Validation and Performance
You will need to validate your Final Decision Tree Model and Final Logistic Regression Model using the Cross-Validation operator, Apply Model operator and Performance operator in your data mining processes. Discuss and compare the performance of the Final Decision Tree Model with that of the Final Logistic Regression Model for predicting whether a patient is likely to have a stroke or not, based on key results of the confusion matrix presented in Table 1.4 Model Performance Metrics (Decision Tree vs Logistic Regression). Table 1.4 will compare the Final Decision Tree Model with the Final Logistic Regression Model using the following model performance metrics: (1) accuracy, (2) sensitivity, (3) specificity and (4) F1 score (200 words).
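The four metrics required for Table 1.4 can all be derived from the four cells of a confusion matrix. The following sketch shows the standard formulas in Python. The counts are invented for illustration and do not come from any model fitted to stroke-data.csv, and `performance_metrics` is a hypothetical helper name.

```python
def performance_metrics(tp, fp, tn, fn):
    """Derive common metrics from confusion-matrix counts.

    tp/fn: actual positives predicted positive/negative.
    tn/fp: actual negatives predicted negative/positive.
    Assumes no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # also called recall or true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Invented counts for illustration only.
metrics = performance_metrics(tp=30, fp=20, tn=930, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```

In this invented example accuracy is high while sensitivity is much lower, which is typical when the positive class is rare. That is one reason to compare models on all four metrics rather than on accuracy alone.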
Note 1: the important outputs from the data mining analyses conducted in RapidMiner for Task 1 must be included in your Report 3 to support the conclusions reached for each analysis conducted in Tasks 1.1, 1.2, 1.3 and 1.4. Note that you can export important outputs from RapidMiner as JPG image files and include these screenshots in the relevant Task 1 parts of your Assessment 3 Report.
Note 2: you will find the North Textbook and RapidMiner Tutorials useful references for the data mining process activities conducted in Task 1 in relation to the exploratory data analysis and data preparation, decision tree analysis, logistic regression analysis and evaluation of the performance of the Final Decision Tree model and the Final Logistic Regression model. These concepts are covered in Module RapidMiner Practicals and Chapters 3, 4, 9, 10 and 13 of North Textbook and RapidMiner Tutorials contained within RapidMiner.
Research and critically review the study materials and other relevant literature to provide a suitable written response to each of the following tasks 2, 3 and 4 supported with an appropriate level of in-text referencing:
Task 2 Customer Relationship Management Analytics (500 words)
Task 2.1 Explain why customer relationship management (CRM) analytics is such an important activity for business (250 words).
Task 2.2 Choose and describe a widely used application of customer relationship management (CRM) analytics and explain how the impact of CRM analytics can be measured in this application area (250 words).
Task 3 Visual Analytics Technologies (500 words)
Task 3.1 Explain why visual analytics is such an important concept in business intelligence, and illustrate your answer with a real-world application of visual analytics (250 words).
Task 3.2 Discuss the key technology building blocks of visual analytics in the context of the same real-world application described in Task 3.1 (250 words).
Task 4 Automated Driving of Road Vehicles - Transforming Work and Ethical Considerations of AI Technologies (1000 words)
Task 4.1 Identify and discuss the AI technologies used in automated driving of road vehicles (500 words).
Task 4.2 Identify and discuss the ethical implications for transportation companies regarding (1) privacy, (2) transparency, (3) bias and discrimination, and (4) governance and accountability when using AI technologies in automated driving of road vehicles to replace human drivers for goods delivery (500 words).
Report Quality: structure, presentation, writing and referencing
Structure and presentation: Cover page, table of contents, page numbers, headings, subheadings, tables and diagrams, use of formatting, spacing, paragraphs.
Writing quality: use of English, with the report written in a clear and concise manner for the intended management audience (correct use of language and grammar, and evidence of spell-checking and proofreading).
Quality of research, as shown by correct and appropriate use of referencing: appropriate level of in-text referencing, reference list provided, Harvard Referencing Style used correctly.
Report 3 must be structured as follows:
Report 3 Cover page
Table of Contents
Task 1 Heading - Sub headings for Tasks 1.1, 1.2, 1.3 and 1.4
Task 2 Heading - Sub headings for Tasks 2.1 and 2.2
List of References
List of Appendices
Attachment:- stroke-data.rar