Business Intelligence
Assessment - Report
This assessment task sheet provides you with information about the requirements for your assessment. Detailed instructions and resources are included for completing the task. The Criterion Reference Assessment (CRA) Rubric that markers use to grade the assessment task is included.
Task overview
Learning outcome 1: Analyse and apply strategies and technologies for effective data management that supports evidence-based decisions.
Learning outcome 2: Research organisational and societal problems using descriptive, predictive and prescriptive analytics models drawing on both internal and external data sources to generate insight, create value and support evidence-based decision making.
Learning outcome 3: Communicate effectively in a clear and concise written manner for both senior and middle management with correct and appropriate acknowledgment of the main ideas presented and discussed.
Task description 1: You are required to prepare a report that undertakes three tasks to analyse data-driven decision-making and descriptive and predictive analytics. Task 1 examines Google's data-driven practices, focusing on meticulous data analysis for search algorithms and advertising strategies. Task 1.1 evaluates Google's data management strategies, while Task 1.2 explores the technologies used and assesses their overall contribution. Task 2 involves a detailed exploration of the Melbourne_housing.csv dataset, encompassing exploratory data analysis (Task 2.1) and building a Linear Regression model (Task 2.2) for predicting residential property prices. Finally, Task 3 focuses on predictive analytics, predicting the income levels (<= 50K or >50K) of a population. Task 3.1 involves exploratory data analysis and data preparation, and Task 3.2 constructs a Decision Tree model. Together, these tasks contribute to a comprehensive understanding of data analytics in diverse business scenarios.
Task details
Task 1 Case Study Analysis
In the business world, Google is a prime example of effective data-driven decision-making. The company relies heavily on data to enhance its search algorithms, advertising strategies, and user experience. Google's success in providing relevant search results, targeted advertisements, and personalized user experiences is attributed to its use of data. The company analyses vast amounts of data to understand user behaviour, preferences, and trends.
You are expected to read the following papers as a starting point to look for other relevant references to investigate data management strategies and technologies employed for data management at Google to support your investigation of Tasks 1.1 and 1.2:
Task 1.1 Investigate the data management strategies employed by Google and evaluate and discuss the effectiveness of these strategies in supporting evidence-based decisions. (10 marks 400 words)
Task 1.2 Investigate the technologies employed for data management at Google and assess and discuss their contribution to the overall effectiveness of data management. (10 marks 400 words)
Task 2 Exploratory Data Analysis and Linear Regression Analysis (40 Marks)
Carefully study the Melbourne_housing.csv data set (see Appendix A Data Dictionary for Melbourne Housing Price Data Set) and the accompanying description of each variable. Each record in the Melbourne_housing.csv data set contains twenty-one variables that determine Price (fifth variable). You should conduct some research to identify determinants/drivers of the selling price of residential properties so that you can fully understand and interpret the key findings of your exploratory data analysis (EDA) and Linear Regression Model for the Melbourne_housing.csv data set.
Task 2.1 Conduct and report on exploratory data analysis (EDA) of the Melbourne_housing.csv data set using the Altair AI Studio data mining tool. (20 marks 800 words) (CLO2, CLO5)
You are required to provide the following:
A screen capture of your final EDA process and a brief description of your EDA process diagram.
Summarise key results of your exploratory data analysis in Table 2.1 Results of Exploratory Data Analysis for Melbourne_housing.csv. Table 2.1 should include key characteristics of each variable in the Melbourne_housing.csv data set, such as maximum and minimum values, average, standard deviation, most frequent values (mode), missing values and invalid values.
Discuss the key results of the exploratory data analysis presented in Table 2.1 and provide a rationale for selecting the top 5 variables to predict the selling price of residential properties (Price). Focus on the relationships among independent variables, as well as their connections with the dependent variable (Price). Draw insights from the results of the EDA analysis and relevant literature on determinants affecting the selling price of residential properties.
Hint: The Statistics Tab and Chart Tab in Altair AI Studio provide a lot of descriptive statistical information and the ability to create useful charts such as Bar charts, Scatterplots and Boxplot charts for EDA analysis. You might also like to run correlations and/or chi-square tests as appropriate to determine which variables contribute most to predicting the selling price of residential properties (Price).
Task 2.2 Build and report on your Linear Regression model for predicting the selling price of residential properties (Price) using an Altair AI Studio data mining process, an appropriate set of data mining operators and a reduced set of variables from the Melbourne_housing.csv data set. (20 marks 800 words) (CLO2, CLO5)
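Illustration only: the required tool for this task is Altair AI Studio, but the quantities requested for Table 2.1 and the correlation check mentioned in the hint can be sketched in a few lines of Python. The rows below are invented for demonstration and are not taken from Melbourne_housing.csv; the column names Rooms and Distance are assumptions, so check Appendix A for the actual variable names.

```python
import math
import statistics

# Invented rows for illustration; NOT taken from Melbourne_housing.csv.
# Column names other than Price are assumptions. None marks a missing value.
rows = [
    {"Rooms": 2, "Distance": 2.5, "Price": 850000},
    {"Rooms": 3, "Distance": 5.0, "Price": 1100000},
    {"Rooms": 3, "Distance": None, "Price": 1000000},
    {"Rooms": 4, "Distance": 10.0, "Price": 1250000},
    {"Rooms": 3, "Distance": 12.0, "Price": 600000},
]


def summarise(rows, column):
    """Return the Table 2.1 style statistics for one numeric column."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "mode": statistics.mode(values),
        "missing": sum(1 for r in rows if r[column] is None),
    }


def pearson(rows, x, y):
    """Pearson correlation between two columns, skipping rows with gaps."""
    pairs = [(r[x], r[y]) for r in rows
             if r[x] is not None and r[y] is not None]
    xs, ys = zip(*pairs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys))
    return cov / (spread_x * spread_y)


summary = {col: summarise(rows, col) for col in ("Rooms", "Distance", "Price")}
corr_rooms_price = pearson(rows, "Rooms", "Price")

for col, stats in summary.items():
    print(col, stats)
print("Correlation of Rooms with Price:", round(corr_rooms_price, 3))
```

A variable whose correlation with Price is close to zero is a weaker candidate for the top 5 predictors, although correlation only captures linear relationships and should be read alongside the charts.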
You are required to provide the following:
A screen capture of your Final Linear Regression Model process and a brief description of your Final Linear Regression Model process diagram.
Table 2.2 named Results of Final Linear Regression Model for Task 2.2 for Melbourne_housing.csv data set.
Discuss the results of the Final Linear Regression Model for the Melbourne_housing.csv data set, drawing on key outputs (coefficients, standardised coefficients, t-statistic values, p-values, significance levels etc.) for predicting the selling price of residential properties (Price) and on relevant supporting literature on the interpretation of a Linear Regression Model. Include in your Report all appropriate outputs, such as Altair AI Studio Processes, Graphs and Tables, that support key aspects of the exploratory data analysis and linear regression model analysis of the Melbourne_housing.csv data set.
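Illustration only: to help interpret the coefficient and t-statistic columns that the Linear Regression output reports, the sketch below fits a one-predictor ordinary least squares model by hand. The data values are invented, the predictor name rooms is an assumption, and Altair AI Studio's own operator may apply additional steps (such as feature selection) that are not reproduced here.

```python
import math

# Invented values for illustration; NOT taken from Melbourne_housing.csv.
rooms = [2, 3, 3, 4, 5, 4]                               # assumed predictor
price = [600.0, 800.0, 850.0, 1000.0, 1300.0, 1100.0]    # in thousands


def simple_ols(x, y):
    """Fit y = intercept + slope * x and report the slope's t-statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - mean_y) ** 2 for yi in y)

    # Residual variance uses n - 2 degrees of freedom
    # because two parameters (intercept and slope) are estimated.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": se_slope,
        "t_slope": slope / se_slope,
        "r_squared": 1 - sse / sst,
    }


fit = simple_ols(rooms, price)
for name, value in fit.items():
    print(f"{name}: {value:.3f}")
```

The t-statistic is the coefficient divided by its standard error. The p-value reported by the tool is the probability of a t-statistic at least this extreme under a t-distribution with n - 2 degrees of freedom when the true coefficient is zero, so a large absolute t-statistic corresponds to a small p-value.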
Task 3 Predictive Analytics Case Study
The goal of the Predictive Analytics Case Study is to predict the income of a given population, which is labelled as <= 50K and >50K (refer to Appendix B Data Dictionary for Income Dataset). The study aims to identify the variables that are most likely to predict the income of the population. You will apply the business understanding, data understanding, data preparation, modelling, and evaluation phases of the CRISP-DM data mining process. It is important that you understand this data set to complete Tasks 3.1 and 3.2.
Task 3.1 Conduct an exploratory data analysis (EDA) and data preparation of the income.csv data set, summarise the key findings of the EDA and data preparation in a table, and discuss the key findings. (800 words) (CLO2, CLO5).
You are required to summarise the findings of your exploratory data analysis and data preparation in a table named Table 3.1 Results of Exploratory Data Analysis and Data Preparation. The table should describe key characteristics of each variable in the income.csv data set, such as maximum and minimum values, average, standard deviation, most frequent values (mode), missing values and invalid values, as well as relationships with other variables, transformation of existing variables and creation of new variables. Hint: The Statistics Tab and Chart Tab in Altair AI Studio provide a lot of descriptive statistical information and useful charts, such as Bar charts and Scatterplots, required for Task 3.1. You might also like to run some correlations and/or chi-square tests depending on whether a variable is a categorical variable or a numeric variable. Indicate in Table 3.1 which variables contribute most to predicting the income of a given population, which is labelled as <= 50K and >50K. You could also consider transforming some variables, creating new variables and converting the target/label variable into a binominal variable to facilitate analysis in Task 3.2. Briefly discuss the key findings of your exploratory data analysis and data preparation and your justification for the variables most likely to predict the income of a given population, which is labelled as <= 50K and >50K.
Task 3.2 Build a Decision Tree model for predicting the income of a given population, which is labelled as <= 50K and >50K, on the income.csv data set using Altair AI Studio; provide the following outputs: (1) Decision Tree process, (2) Decision Tree diagram, (3) Decision Tree rules; discuss key results of the Decision Tree model drawing on these outputs. (800 words) (CLO2, CLO5)
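Illustration only: the chi-square test mentioned in the hint checks whether a categorical variable is associated with the income label. The sketch below computes the test statistic from a contingency table of counts. The counts and the two category names are invented and do not come from income.csv.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows; each row holds the observed counts for one
    category of the variable, with one entry per income class.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the variable and income were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Invented counts: rows are two hypothetical categories of one variable,
# columns are the income classes (<= 50K, >50K).
observed_counts = [
    [30, 10],   # hypothetical category A
    [20, 40],   # hypothetical category B
]

statistic, dof = chi_square(observed_counts)
print(f"chi-square = {statistic:.3f}, degrees of freedom = {dof}")
```

With 1 degree of freedom, the critical value at the 0.05 significance level is 3.841, so a statistic above that value suggests the variable and the income label are not independent. A larger statistic on its own does not measure how strong the relationship is, because it also grows with sample size.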
You are required to briefly explain your final Decision Tree Model Process and discuss the results of the Final Decision Tree Model drawing on key outputs (Decision Tree Diagram, Decision Tree Rules) for predicting the income of a given population, which is labelled as <= 50K and >50K based on key contributing variables and relevant supporting literature on interpretation of decision trees.
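Illustration only: when interpreting the Decision Tree diagram and rules, it can help to see how a tree chooses which variable to split on. The sketch below computes information gain, one common splitting criterion; the criterion actually used depends on the settings of the Decision Tree operator in your process. The records and the attribute names education and hours are invented and are not taken from income.csv.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(records, attribute, label):
    """Reduction in label entropy achieved by splitting on `attribute`."""
    base = entropy([r[label] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[label])
    weighted = sum(len(g) / len(records) * entropy(g)
                   for g in groups.values())
    return base - weighted


# Invented records; attribute names and values are assumptions.
records = [
    {"education": "Bachelors", "hours": "full", "income": ">50K"},
    {"education": "Bachelors", "hours": "part", "income": ">50K"},
    {"education": "HS-grad", "hours": "full", "income": "<=50K"},
    {"education": "HS-grad", "hours": "part", "income": "<=50K"},
]

gains = {attr: information_gain(records, attr, "income")
         for attr in ("education", "hours")}
best_attribute = max(gains, key=gains.get)
print(gains)
print("Split first on:", best_attribute)
```

In this toy data the education attribute separates the two income classes perfectly, so it has the highest gain and would appear at the root of the tree. Each path from the root to a leaf then reads as one rule of the form "if education = Bachelors then income >50K".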