Conduct an exploratory data analysis and data preparation


Reference no: EM133727189

Business Intelligence

Assessment - Report

This assessment task sheet provides you with information about the requirements for your assessment. Detailed instructions and resources are included for completing the task. The Criterion Reference Assessment (CRA) Rubric that markers use to grade the assessment task is included.

Task overview

Learning outcome 1: Analyse and apply strategies and technologies for effective data management that supports evidence-based decisions.

Learning outcome 2: Research organisational and societal problems using descriptive, predictive and prescriptive analytics models drawing on both internal and external data sources to generate insight, create value and support evidence-based decision making.

Learning outcome 3: Communicate effectively in a clear and concise written manner for both senior and middle management with correct and appropriate acknowledgment of the main ideas presented and discussed.

Task description 1: You are required to prepare a report that completes three tasks analysing data-driven decision-making and descriptive and predictive analytics. Task 1 examines Google's data-driven practices, focusing on its detailed data analysis for search algorithms and advertising strategies: Task 1.1 evaluates Google's data management strategies, and Task 1.2 explores the technologies it uses and assesses their overall contribution. Task 2 involves a detailed exploration of the Melbourne_housing.csv data set, comprising exploratory data analysis (Task 2.1) and a Linear Regression model (Task 2.2) for predicting residential property prices. Finally, Task 3 focuses on predictive analytics, predicting the income level (<= 50K or >50K) of a population: Task 3.1 involves exploratory data analysis and data preparation, and Task 3.2 constructs a Decision Tree model. Together, these tasks build a comprehensive understanding of data analytics in diverse business scenarios.

Task details

Task 1 Case Study Analysis
In the business world, Google is a prime example of effective data-driven decision-making. The company relies heavily on data to enhance its search algorithms, advertising strategies, and user experience. Google's success in providing relevant search results, targeted advertisements, and personalized user experiences is attributed to its use of data. The company analyses vast amounts of data to understand user behaviour, preferences, and trends.

As a starting point, read the following papers and use them to find further relevant references on the data management strategies and technologies employed at Google, to support your investigation in Tasks 1.1 and 1.2:

Task 1.1 Investigate the data management strategies employed by Google, and evaluate and discuss how effectively these strategies support evidence-based decisions. (10 marks, 400 words)

Task 1.2 Investigate the technologies employed for data management at Google, and assess and discuss their contribution to the overall effectiveness of data management. (10 marks, 400 words)

Task 2 Exploratory Data Analysis and Linear Regression Analysis (40 marks)

Carefully study the Melbourne_housing.csv data set (see Appendix A Data Dictionary for Melbourne Housing Price Data Set) and the accompanying description of each variable. Each record in Melbourne_housing.csv contains twenty-one variables that determine Price (the fifth variable). You should research the determinants/drivers of the selling price of residential properties so that you can fully understand and interpret the key findings of your exploratory data analysis (EDA) and Linear Regression model for the Melbourne_housing.csv data set.
Task 2.1 Conduct and report on an exploratory data analysis (EDA) of the Melbourne_housing.csv data set using the Altair AI Studio data mining tool. (20 marks, 800 words) (CLO2, CLO5)

You are required to provide the following:

A screen capture of your final EDA process, with a brief description of your EDA process diagram.

Summarise the key results of your exploratory data analysis in Table 2.1 Results of Exploratory Data Analysis for Melbourne_housing.csv. Table 2.1 should include key characteristics of each variable in the Melbourne_housing.csv data set, such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values.
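Table 2.1 is produced in Altair AI Studio, but it can help to see exactly what each summary statistic means. The following is a minimal Python sketch, not part of the required workflow, that profiles one numeric column. It assumes missing cells are represented as `None`; `profile_numeric` and its `is_valid` rule are hypothetical names, and the toy values are made up for illustration, not taken from Melbourne_housing.csv.

```python
import statistics
from typing import Callable, Optional, Sequence


def profile_numeric(
    values: Sequence[Optional[float]],
    is_valid: Callable[[float], bool] = lambda v: True,
) -> dict:
    """Summarise one numeric column with the statistics Table 2.1 asks for.

    Assumptions: missing cells arrive as None, and `is_valid` is a
    hypothetical, column-specific rule for flagging invalid values.
    Statistics are computed on the valid, non-missing values only.
    """
    present = [v for v in values if v is not None]
    valid = [v for v in present if is_valid(v)]
    return {
        "missing": len(values) - len(present),
        "invalid": len(present) - len(valid),
        "min": min(valid),
        "max": max(valid),
        "mean": statistics.mean(valid),
        "std": statistics.stdev(valid),  # sample standard deviation (n - 1)
        "mode": statistics.mode(valid),
    }


# Hypothetical toy column; a negative count is treated as invalid.
column = [3, 2, None, 4, 3, 3, None, 5, -1]
print(profile_numeric(column, is_valid=lambda v: v > 0))
```

Keeping missing and invalid counts separate mirrors the brief's distinction between the two; what counts as "invalid" has to be decided per variable from the data dictionary.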

Discuss the key results of the exploratory data analysis presented in Table 2.1 and provide a rationale for selecting the top 5 variables to predict the selling price of residential properties (Price). Focus on the relationships among the independent variables, as well as their relationships with the dependent variable (Price). Draw on the results of the EDA and on relevant literature about the determinants of residential property selling prices.

Hint: the Statistics Tab and Chart Tab in Altair AI Studio provide extensive descriptive statistics and the ability to create useful charts, such as bar charts, scatterplots and boxplots, for EDA. You might also run correlations and/or chi-square tests, as appropriate, to determine which variables contribute most to predicting the selling price of residential properties (Price).
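The correlations and chi-square tests suggested in the hint can be sketched in plain Python as below. This illustrates the statistics only and does not replace Altair AI Studio: `pearson_r` applies to pairs of numeric variables, `chi_square` to a contingency table of two categorical variables, and `rank_by_abs_correlation` is a hypothetical helper that shortlists predictors by the strength of their linear relationship with the target. The chi-square p-value, which needs the chi-square distribution, is left to the tool.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length numeric columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_abs_correlation(
    columns: Dict[str, Sequence[float]], target: Sequence[float], k: int
) -> List[str]:
    """Names of the k columns most strongly correlated (either sign) with the target."""
    scores = {name: abs(pearson_r(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def chi_square(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical toy columns, not from the data set.
toy = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, 3, 2, 4]}
print(rank_by_abs_correlation(toy, [1, 2, 3, 4], k=2))
print(chi_square([[10, 20], [20, 10]]))
```

Ranking by |r| looks only at each predictor's relationship with Price; correlations among the shortlisted predictors themselves should still be checked, as the task asks.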

Task 2.2 Build and report on a Linear Regression model for predicting the selling price of residential properties (Price) using an Altair AI Studio data mining process, an appropriate set of data mining operators and a reduced set of variables from the Melbourne_housing.csv data set. (20 marks, 800 words) (CLO2, CLO5)

You are required to provide the following:

A screen capture of your final Linear Regression model process, with a brief description of the process diagram.

Table 2.2, named Results of Final Linear Regression Model for Task 2.2, for the Melbourne_housing.csv data set.

Discuss the results of the final Linear Regression model for the Melbourne_housing.csv data set, drawing on key outputs (coefficients, standardised coefficients, t-statistics, p-values, significance levels etc.) for predicting the selling price of residential properties (Price) and on relevant literature about interpreting a Linear Regression model. Include in your report all appropriate outputs, such as Altair AI Studio processes, graphs and tables, that support key aspects of the exploratory data analysis and linear regression analysis of the Melbourne_housing.csv data set.
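To make the Table 2.2 outputs concrete, the sketch below fits ordinary least squares with a single predictor and reports the coefficient, its t-statistic and the standardised coefficient. The single predictor is a simplifying assumption: the assignment model uses several predictors, where Altair AI Studio computes these quantities from the full set of variables. The p-value would come from a t-distribution with n - 2 degrees of freedom, which the Python standard library does not provide, so it is omitted. `simple_ols` is a hypothetical name and the toy data are illustrative only.

```python
import math
import statistics
from typing import Dict, Sequence


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """One-predictor ordinary least squares with coefficient, t-statistic and standardised coefficient."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard error uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        # Large |t| means the slope is many standard errors away from zero.
        "t_slope": slope / se_slope,
        # Standardised coefficient: change in y (in SDs) per one-SD change in x.
        "std_slope": slope * statistics.stdev(x) / statistics.stdev(y),
    }


# Hypothetical toy data, not from Melbourne_housing.csv.
print(simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

Standardised coefficients are what make predictors measured in different units (for example, counts versus distances) comparable when discussing which variable influences Price most.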

Task 3 Predictive Analytics Case Study

The goal of the Predictive Analytics Case Study is to predict the income of a given population, labelled as <= 50K and >50K (refer to Appendix B Data Dictionary for Income Dataset). The study aims to identify the variables most likely to predict the income of the population. You will apply the business understanding, data understanding, data preparation, modelling and evaluation phases of the CRISP-DM data mining process. A good understanding of this data set is essential to complete Tasks 3.1 and 3.2.

Task 3.1 Conduct an exploratory data analysis (EDA) and data preparation of the income.csv data set, summarise the key findings of the EDA and data preparation in a table, and discuss these findings. (800 words) (CLO2, CLO5)

You are required to summarise the findings of your exploratory data analysis and data preparation in a table named Table 3.1 Results of Exploratory Data Analysis and Data Preparation. The table should describe key characteristics of each variable in the income.csv data set, such as maximum and minimum values, average, standard deviation, most frequent value (mode), missing values and invalid values, together with relationships with other variables, transformations of existing variables and creation of new variables. Hint: the Statistics Tab and Chart Tab in Altair AI Studio provide extensive descriptive statistics and useful charts, such as bar charts and scatterplots, for Task 3.1. You might also run correlations and/or chi-square tests, depending on whether a variable is numeric or categorical. Indicate in Table 3.1 which variables contribute most to predicting the income of a given population, labelled as <= 50K and >50K. You could also consider transforming some variables, creating new variables and converting the target/label variable into a binominal variable to facilitate the analysis in Task 3.2. Briefly discuss the key findings of your exploratory data analysis and data preparation, and justify your choice of the variables most likely to predict the income of a given population, labelled as <= 50K and >50K.
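Converting the label to a binominal (two-valued) variable, as suggested above, amounts to a mapping like the one sketched below. The exact spelling of the labels and of any missing-value placeholder in income.csv is an assumption here (the code accepts `<=50K`/`<= 50K` and `>50K`, and treats `?` or an empty cell as missing), so check Appendix B and the raw file before relying on it. Both helper functions are hypothetical, and in Altair AI Studio the equivalent step is done with its own operators.

```python
from typing import Optional, Tuple

# Assumption: labels are "<=50K" / ">50K", possibly with stray spaces,
# and "?" or an empty cell marks a missing categorical value.
POSITIVE_LABEL = ">50K"
NEGATIVE_LABEL = "<=50K"
MISSING_TOKENS: Tuple[str, ...] = ("?", "")


def to_binary_label(raw: str) -> int:
    """Map the income label to 1 (>50K) or 0 (<=50K); fail loudly on anything else."""
    value = raw.replace(" ", "")  # e.g. "<= 50K" -> "<=50K"
    if value == POSITIVE_LABEL:
        return 1
    if value == NEGATIVE_LABEL:
        return 0
    raise ValueError(f"unexpected income label: {raw!r}")


def clean_category(raw: str) -> Optional[str]:
    """Strip whitespace and turn placeholder tokens into None (missing)."""
    value = raw.strip()
    return None if value in MISSING_TOKENS else value


print([to_binary_label(v) for v in [" <=50K", ">50K", "<= 50K"]])
print([clean_category(v) for v in [" Sales ", "?", ""]])
```

Raising an error on unexpected labels, rather than silently mapping them to 0, surfaces invalid values during data preparation instead of hiding them in the model.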

Task 3.2 Build a Decision Tree model for predicting the income of a given population, labelled as <= 50K and >50K, on the income.csv data set using Altair AI Studio; provide the following outputs: (1) Decision Tree process, (2) Decision Tree diagram, (3) Decision Tree rules; and discuss the key results of the Decision Tree model drawing on these outputs. (800 words) (CLO2, CLO5)
You are required to briefly explain your final Decision Tree Model Process and discuss the results of the Final Decision Tree Model drawing on key outputs (Decision Tree Diagram, Decision Tree Rules) for predicting the income of a given population, which is labelled as <= 50K and >50K based on key contributing variables and relevant supporting literature on interpretation of decision trees.
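Decision tree learners choose each split by scoring candidate attributes with a criterion such as information gain, which is why the attribute at the root of the tree diagram is the one the criterion ranked highest on the full training data. The sketch below computes information gain for one categorical attribute. It illustrates one common criterion and is not necessarily the criterion your Altair AI Studio Decision Tree operator is configured to use, so check the operator's parameters. The attribute values are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy reduction from splitting `labels` on each distinct value of `feature`."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    total = len(labels)
    # Weighted average entropy of the child nodes after the split.
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


income = [">50K", ">50K", "<=50K", "<=50K"]
print(information_gain(["a", "a", "b", "b"], income))  # attribute separates classes
print(information_gain(["a", "b", "a", "b"], income))  # attribute is uninformative
```

A gain of 1 bit here means the attribute separates the two income classes perfectly; a gain of 0 means it carries no information about the label. Each path from root to leaf in the tree diagram corresponds to one of the Decision Tree rules the task asks you to report.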

Course: CIS6008 Business Intelligence, T2 2024, University of Southern Queensland.

Outpatient mental health service states : A new client at an outpatient mental health service states, When I have to face new people or situations-any situations in public

What are acceptable confidence level for thing like your car : Nothing is for certain. What are acceptable confidence levels for things like your car starting or your paycheck showing up on time?

Write conclusion on relationship between religion and music : Write a conclusion about the relationship between religion and music, especially religion and these types of music jazz, Latin pop, hip-hop/rap, and country.

Comprehensive strategic plan : To address the challenges posed by current EHR systems, a comprehensive strategic plan is needed. To improve interoperability,

User Account

All Pages