Reference no: EM133155902
FIT5196 Data wrangling - Monash University
For this assessment, you are required to write Python code to integrate several datasets into a single schema and to find and fix possible problems in the data. The input and output of this assessment are shown below:
Task 1: Data Integration
In this task, you are required to integrate the input datasets from several sources into one dataset with the following schema.
Task 2: Data Reshaping
In this task, you need to study the effect of different normalization and transformation methods (i.e. standardization, min-max normalization, log, power, and Box-Cox transformation) on the scraped columns, and observe and explain their effect, assuming we want to develop a linear model to predict "House_quarterly_growth" using the "Median_house_price", "House_twelve_month_growth", and "House_average_annual_growth" attributes. When reshaping the data, we have two main criteria. First, we want our features to be on the same scale. Second, we want our features to have as strong a linear relationship as possible with the target variable (i.e., House_quarterly_growth). You need to first explore the data to see whether any scaling or transformation is necessary (if yes, why? and if not, why not?), then perform the appropriate actions and document your results and observations.
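The two criteria above can be checked separately, and the following is a minimal sketch of one way to do so. It uses synthetic data rather than the assignment's dataset: the arrays price and growth, the helper names, and the fixed Box-Cox lambda of 0.25 are all assumptions made for illustration. In practice lambda is usually estimated from the data, for example with scipy.stats.boxcox.

```python
import numpy as np


def standardize(x):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    return (x - x.mean()) / x.std()


def min_max(x):
    """Rescale values linearly onto the [0, 1] interval."""
    return (x - x.min()) / (x.max() - x.min())


def box_cox(x, lam):
    """Box-Cox transform of strictly positive x for a fixed, user-chosen lambda."""
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Synthetic stand-ins (assumption): a right-skewed, strictly positive feature
# and a target that is linear in the log of that feature plus noise.
rng = np.random.default_rng(0)
price = rng.lognormal(mean=13.5, sigma=0.5, size=500)
growth = 2.0 * np.log(price) + rng.normal(0.0, 0.3, size=500)

candidates = {
    "raw": price,
    "standardized": standardize(price),
    "min-max": min_max(price),
    "log": np.log(price),
    "power (sqrt)": np.sqrt(price),
    "box-cox (lambda=0.25)": box_cox(price, 0.25),
}

# Compare how linearly each version of the feature relates to the target.
correlations = {name: pearson_r(values, growth) for name, values in candidates.items()}
for name, r in correlations.items():
    print(f"{name:>22}: r = {r:.3f}")
```

Standardization and min-max normalization are linear rescalings, so they change the scale of a feature but leave its Pearson correlation with the target unchanged. Only the nonlinear transformations (log, power, Box-Cox) can change how linear the relationship is, which is why the two criteria are worth examining one at a time.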
Task 3: Documentation
The main focus of the documentation is the quality of your explanation for Task 2. As in the previous assignments, your notebook file should be well formatted, with proper sections and subsections.
Attachment: Python code.rar