Reference no: EM132310412
Data wrangling and Data exploration and visualisation Assignment -
Assignment 1 - Data wrangling
For this assessment, you are required to write Python (Python 2/3) code to integrate several datasets into a single schema and to find and fix possible problems in the data. The inputs and outputs of this assessment are shown below:
Table 1. The input and output of the task
Inputs | Output | Jupyter notebook
vic_suburb_boundary.zip, gtfs.zip, Crimebylocation.xlsx, <student_no>.csv | <student_no>_solution.csv | <student_no>_ass3.ipynb
You are given multiple datasets in various formats, and the task is to create housing information for Victoria, Australia. Your assessment is to perform the following tasks.
Task 1: Data Integration
In this task, you are required to integrate these datasets into one with the following schema. The table describing the final schema is in the attached file.
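As a rough illustration of the integration step, the sketch below left-joins property records to per-area crime figures on a shared key, using only the Python 3 standard library. The `Suburb` key, the `ID` and `Price` columns, and all the values are hypothetical placeholders; only `crime_C_average` is named in this brief, and the real schema is the one in the attached file. With pandas, the same join would typically be written with `DataFrame.merge(..., how="left")`.

```python
import csv
import io

# Hypothetical inputs: the real files, columns, and values come from the assignment data.
properties_csv = """ID,Suburb,Price
1,CARLTON,900000
2,RICHMOND,1100000
3,UNKNOWN,500000
"""
crime_rows = [
    {"Suburb": "Carlton", "crime_C_average": 12.5},
    {"Suburb": "Richmond ", "crime_C_average": 20.0},
]


def normalise_key(name):
    """Make join keys comparable: trim whitespace and upper-case."""
    return name.strip().upper()


# Build a lookup table keyed by the normalised suburb name.
crime_by_suburb = {
    normalise_key(row["Suburb"]): row["crime_C_average"] for row in crime_rows
}

# Left join: every property is kept, matched or not.
integrated = []
for row in csv.DictReader(io.StringIO(properties_csv)):
    key = normalise_key(row["Suburb"])
    integrated.append({
        "ID": int(row["ID"]),
        "Suburb": key,
        "Price": float(row["Price"]),
        # None marks a property with no matching crime record.
        "crime_C_average": crime_by_suburb.get(key),
    })

# Unmatched rows are worth reporting, since they often point at data problems.
unmatched = [row["ID"] for row in integrated if row["crime_C_average"] is None]
print(unmatched)
```

Normalising the key before joining is one way to catch the spelling and whitespace inconsistencies that the assessment asks you to find and fix; listing the unmatched rows makes the remaining gaps explicit instead of silently dropping them.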
Task 2: Data Reshaping
In this task, you need to study the effect of different normalization/transformation methods (i.e. standardization, min-max normalization, log, power, and root transformation) on the Rooms, crime_C_average, travel_min_to_CBD, and property_age attributes. Observe and explain their effects, assuming that we want to build a linear model of price using these attributes as predictors, and recommend which method(s) you think would work better on this data. When building the linear model, the same normalization/transformation method can be applied to each of these attributes.
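The five methods named above can be sketched with the Python 3 standard library alone. The block below is a minimal illustration, not a prescribed solution: the function names are hypothetical, the sample values are toy numbers that are not taken from the assignment data, and the Pearson correlation with price is used only as a rough proxy for how linear the relationship is after each transformation. The log variant uses log(1 + x) as an assumption so that zeros are tolerated.

```python
import math
import statistics


def standardize(values):
    """Z-score: (x - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def log_transform(values):
    """log(1 + x), which tolerates zeros; requires x > -1."""
    return [math.log1p(v) for v in values]


def power_transform(values, exponent=2):
    """Raise each value to a fixed power (squared by default)."""
    return [v ** exponent for v in values]


def root_transform(values):
    """Square root; requires non-negative values."""
    return [math.sqrt(v) for v in values]


def pearson_r(xs, ys):
    """Pearson correlation, used here as a rough check of linearity."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy values for illustration only; they are not taken from the assignment data.
property_age = [1.0, 4.0, 9.0, 16.0, 25.0]
price = [10.0, 20.0, 30.0, 40.0, 50.0]

methods = {
    "raw": lambda v: list(v),
    "standardization": standardize,
    "min-max": min_max,
    "log": log_transform,
    "power": power_transform,
    "root": root_transform,
}
results = {name: pearson_r(fn(property_age), price) for name, fn in methods.items()}
for name, r in results.items():
    print(f"{name:>15}: r = {r:.4f}")
```

Standardization and min-max normalization are linear rescalings, so they change the scale of a predictor but not the strength of its linear relationship with price. Log, power, and root transformations change the shape of the relationship, which is why they can make a difference to a linear model.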
Task 3: Documentation and Methodology
The main focus of the documentation is the quality of your explanation of how you completed these tasks. Your notebook file should be well formatted, with proper sections and subsections.
Assignment 2 - Data exploration and visualisation - Visualisation Project
In this continuation of the Data Exploration Project you get a chance to create an interactive visualisation that communicates some of your findings from the Data Exploration Project.
Learning outcomes -
- Choose an appropriate data visualisation;
- Implement interactive data visualisations using Python, R and other tools.
Details of task:
1. Identify which findings from the Data Exploration Project you wish to communicate and who the intended audience is. Be selective: you do not need to, and should not, communicate everything you found. The intended audience might be your classmates, the general public, politicians, or whoever you like.
2. Design a narrative visualisation to communicate your findings to the intended audience. It should allow some viewer interaction and be designed using the five-sheet design methodology.
3. Implement your visualisation as a web-based presentation using R or JavaScript and D3. In unusual cases you may use other tools but you need to obtain prior permission from your tutor.
Report & Final Product: At the start of the Exam Period you need to submit (through Moodle) a directory containing the implementation code for your narrative visualisation and a written report of no more than 15 pages, excluding the Appendix. It must be organised with section headings as follows:
1. Introduction: Precise description of what message you wanted your narrative visualisation to convey and who the intended audience is.
2. Design: Description of the visualization design process. This should summarise the 5 design sheets, detailing the alternatives you considered and the justification for choosing your final design.
3. Implementation: Description of the implementation including libraries used and reasons for the implementation decisions for your narrative visualisation.
4. User guide: Instructions for viewing and exploring the narrative visualisation using a standard web browser and images showing how the visualization works.
5. Conclusion: Summarise what you achieved, and reflect on what you learnt in this project and on what, in hindsight, you might have done differently to improve the result.
6. Appropriate references and bibliography.
7. Appendix: Your 5 design sheets (images are acceptable).
Your report should contain images of the final product and should point out any reasons why your project was difficult, e.g. a large data set, use of D3, etc.
Attachment:- Assignment Files.rar