Reference no: EM132594234
Learning objective 1: apply knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehousing and big data architecture, the data mining process, data visualisation and performance management) and the resulting organisational change, and understand how these apply to the implementation of business intelligence in organisational systems and business processes.
Learning objective 2: identify and solve complex organisational problems creatively and practically through the use of business intelligence, and critically reflect on how evidence-based decision making and sustainable business performance management can effectively address real-world problems.
Learning objective 3: comprehend and address complex ethical dilemmas that arise from evidence-based decision making and business performance management.
Learning objective 4: communicate clearly and concisely in a written report for senior management, with correct and appropriate acknowledgement of the sources of the main ideas presented and discussed.
Task 1
The goal of Task 1 is to predict the likelihood of customer churn for a telecommunications company using several categories of variables: customers who left within the last month (the column called Churn); the services each customer has signed up for (phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies); customer account information (how long they have been a customer, contract, payment method, paperless billing, monthly charges and total charges); and demographic information about customers (gender, age range, and whether they have partners and dependents). Full details of the Telco.csv data are provided in Table 1 Data Dictionary. In completing Task 1 you will apply the business understanding, data understanding, data preparation, modelling and evaluation phases of the CRISP-DM data mining process.
Task 1.1 Conduct an exploratory data analysis and data preparation of the Telco.csv data set using RapidMiner to understand the characteristics of each variable and its relationship to the other variables in the data set. Summarise your findings in a table named Task 1.1 Results of Exploratory Data Analysis and Data Preparation, describing for each variable in Telco.csv its key characteristics (such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values), its relationships with other variables, any transformations of existing variables, and any new variables created.
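As a rough illustration of the per-variable statistics the table asks for, the sketch below computes them for a single column outside RapidMiner. It assumes each column arrives as a list of raw strings, treats blank strings as missing and, for numeric columns, unparseable strings as invalid. The helper name `summarise_column` and the sample values are hypothetical, not taken from Telco.csv.

```python
import statistics
from collections import Counter


def summarise_column(raw_values, numeric=True):
    """Summarise one column given as raw strings (hypothetical helper).

    Blank or whitespace-only strings count as missing. For numeric
    columns, strings that cannot be parsed as numbers count as invalid.
    """
    present = [v.strip() for v in raw_values if v.strip() != ""]
    summary = {"count": len(raw_values), "missing": len(raw_values) - len(present)}

    if not numeric:
        # Categorical column: only the mode and missing count are meaningful.
        summary["invalid"] = 0
        summary["mode"] = Counter(present).most_common(1)[0][0] if present else None
        return summary

    numbers, invalid = [], 0
    for v in present:
        try:
            numbers.append(float(v))
        except ValueError:
            invalid += 1
    summary["invalid"] = invalid
    summary["mode"] = Counter(numbers).most_common(1)[0][0] if numbers else None
    if numbers:
        summary["min"] = min(numbers)
        summary["max"] = max(numbers)
        summary["mean"] = statistics.mean(numbers)
        # Sample standard deviation requires at least two values.
        summary["stdev"] = statistics.stdev(numbers) if len(numbers) > 1 else 0.0
    return summary


# Hypothetical values, not drawn from the assignment data.
print(summarise_column(["1", "34", "2", "45", "2", "", "abc"]))
print(summarise_column(["Yes", "No", "No", ""], numeric=False))
```

In RapidMiner itself, the Statistics view of a data set reports comparable figures; the sketch only makes the definitions explicit.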
Task 1.2 Build a Decision Tree model for predicting whether a customer is likely to churn using RapidMiner, a set of data mining operators, and a reduced Telco.csv data set determined in part by your exploratory data analysis in Task 1.1. Provide these outputs from RapidMiner: (1) the Final Decision Tree Model process, (2) the Final Decision Tree diagram, and (3) the Decision Tree rules.
Briefly explain your Final Decision Tree Model process, and discuss the results of the Final Decision Tree Model for predicting whether a customer is likely to churn. Draw on the key outputs (Decision Tree Diagram, Decision Tree Rules) to identify the key contributing services, account and demographic variables, and on relevant supporting literature on the interpretation of decision trees (150 words).
Task 1.3 Build a Logistic Regression model for predicting whether a customer is likely to churn using RapidMiner, an appropriate set of data mining operators, and a reduced Telco.csv data set determined in part by your exploratory data analysis in Task 1.1. Provide these outputs from RapidMiner: (1) the Final Logistic Regression Model process, (2) the Coefficients, and (3) the Odds Ratios. Hint: for the Task 1.3 Logistic Regression Model you may need to change the data types of some variables.
Briefly explain your Final Logistic Regression Model process, and discuss the results of the Final Logistic Regression Model for predicting whether a customer is likely to churn. Draw on the key outputs (Coefficients, Odds Ratios) to identify the key contributing services, account and demographic variables, and on relevant supporting literature on the interpretation of logistic regression models (150 words).
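In logistic regression, the odds ratio for a predictor is the exponential of its coefficient, exp(β): the factor by which the odds of churn are multiplied for a one-unit increase in that predictor, holding the other predictors constant. The sketch below shows this conversion and how a fitted model turns its linear score into a churn probability. The variable names (`tenure_months`, `month_to_month`) and coefficient values are hypothetical placeholders, not results from Telco.csv or RapidMiner.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a predictor: exp(beta)."""
    return math.exp(coefficient)


def churn_probability(intercept, coefficients, x):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * x[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Convert a probability into odds p / (1 - p)."""
    return p / (1.0 - p)


# Hypothetical coefficients for illustration only.
coefs = {"tenure_months": -0.05, "month_to_month": 1.2}

p_contract = churn_probability(-1.0, coefs, {"tenure_months": 12, "month_to_month": 0})
p_monthly = churn_probability(-1.0, coefs, {"tenure_months": 12, "month_to_month": 1})

# The ratio of the two odds equals exp(1.2), the odds ratio for month_to_month.
print(odds(p_monthly) / odds(p_contract), odds_ratio(coefs["month_to_month"]))
```

An odds ratio above 1 indicates higher odds of churn as the predictor increases, and one below 1 indicates lower odds; the ratio describes odds, not probabilities.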
Task 1.4 You will need to validate your Final Decision Tree Model and Final Logistic Regression Model using the Cross-Validation Operator, Apply Model Operator and Performance Operator in your data mining processes.
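In k-fold cross-validation, the data are split into k folds; each fold is held out once as the test set while the model is trained on the remaining folds, and the performance results are averaged across the k runs. The sketch below shows only the partitioning step. It is a simplification that assigns rows to folds in order, without the shuffling or stratified sampling that RapidMiner's Cross-Validation operator can apply, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Rows are assigned to contiguous folds in order (no shuffling or
    stratification); earlier folds get one extra row when n_rows % k != 0.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        # Training set is every row outside the current test fold.
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size


for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```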
Discuss and compare the performance of the Final Decision Tree Model with that of the Final Logistic Regression Model for predicting whether a customer is likely to churn, based on the confusion matrix and ROC chart for each final model. Use a table to compare the key confusion matrix results for the two final models using these performance metrics: (1) accuracy, (2) sensitivity, (3) specificity and (4) F1 score (200 words).
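The four metrics can all be derived from the counts in a binary confusion matrix. The sketch below assumes that "churn = Yes" is treated as the positive class; the class name `ConfusionMatrix` and the counts in the example are hypothetical and are not results from either model.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # churners correctly predicted to churn
    fn: int  # churners incorrectly predicted to stay
    fp: int  # non-churners incorrectly predicted to churn
    tn: int  # non-churners correctly predicted to stay

    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def sensitivity(self):
        # Also called recall or true positive rate.
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        # True negative rate.
        return self.tn / (self.tn + self.fp)

    def precision(self):
        return self.tp / (self.tp + self.fp)

    def f1(self):
        # Harmonic mean of precision and sensitivity.
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)


# Hypothetical counts for illustration only.
cm = ConfusionMatrix(tp=60, fn=40, fp=20, tn=180)
print(cm.accuracy(), cm.sensitivity(), cm.specificity(), cm.f1())
```

With these counts, accuracy is 0.8 while sensitivity is only 0.6, which illustrates why accuracy alone can hide poor detection of churners when the classes are imbalanced.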
Task 2 Data Assets and Governance
Research and critically review the relevant literature on how organisations can place a value on their data assets and maintain governance over them, and use it to write a suitable response to each of the following sub-tasks, supported by an appropriate level of in-text referencing:
Task 2.1 Describe how organisations can place a valuation on their data assets using an infonomics approach. In your answer, provide a comprehensive definition of infonomics, and identify and explain two different approaches that organisations could use to place a valuation on data assets.
Task 2.2 Choose a large Australian organisation that is publicly listed on the Australian stock market and that you believe is already actively engaged in the Information Age.
Conduct desktop research by analysing the security and privacy policy statements available on the organisation's website, and discuss how the organisation addresses the governance of data security and privacy. Use the nine core principles of the Australian Data Governance Draft Code of Practice to guide your analysis and discussion:
1. No-harm rule
2. Honesty & transparency
3. Fairness
4. Choice
5. Accuracy and access
6. Accountability
7. Stewardship
8. Security
9. Enforcement
Attachment: Specifications.rar