Find and fix possible problems in the data

Reference no: EM132310412

Data wrangling and Data exploration and visualisation Assignment -

Assignment 1 - Data wrangling

For this assessment, you are required to write Python (Python 2/3) code to integrate several datasets into a single schema and to find and fix possible problems in the data. The inputs and outputs of this assessment are shown below:

Table 1. The input and output of the task

Inputs: vic_suburb_boundary.zip, gtfs.zip, Crimebylocation.xlsx, <student_no>.csv

Output: <student_no>_solution.csv

Jupyter notebook: <student_no>_ass3.ipynb

You are given multiple datasets in various formats and the task is about creating housing information in Victoria, Australia. Your assessment is to perform the following tasks.

Task 1: Data Integration

In this task, you are required to integrate these datasets into one with the following schema. The table describing the final schema is in the attached file.

Task 2: Data Reshaping

In this task, you need to study the effect of different normalization/transformation methods (i.e. standardization, min-max normalization, log, power, and root transformation) on the Rooms, crime_C_average, travel_min_to_CBD, and property_age attributes. Observe and explain their effect, assuming that we want to build a linear model on price using these attributes as predictors, and recommend which method(s) you think would work better on this data. When building the linear model, the same normalization/transformation method can be applied to each of these attributes.
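As a minimal sketch of how the methods listed in Task 2 might be compared, the following uses only the Python standard library. The sample values for travel_min_to_CBD and price are made up for illustration and do not come from the assignment data, and the helper names (standardize, min_max, pearson_r and so on) are hypothetical. Pearson correlation with price is used here only as a rough proxy for how well a single predictor suits a linear model; it is an assumption of this sketch, not a method prescribed by the task.

```python
import math
import statistics


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def log_transform(values):
    """Natural log; only defined for strictly positive values."""
    return [math.log(v) for v in values]


def power_transform(values, exponent=2):
    """Raise every value to a fixed power (square by default)."""
    return [v ** exponent for v in values]


def root_transform(values):
    """Square root; only defined for non-negative values."""
    return [math.sqrt(v) for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up illustrative values, NOT taken from the assignment data.
travel_min_to_CBD = [5.0, 12.0, 20.0, 35.0, 60.0]
price = [1500000.0, 1100000.0, 900000.0, 750000.0, 600000.0]

transforms = {
    "raw": list,
    "standardization": standardize,
    "min-max": min_max,
    "log": log_transform,
    "power (square)": power_transform,
    "root": root_transform,
}

for name, transform in transforms.items():
    r = pearson_r(transform(travel_min_to_CBD), price)
    print(f"{name:16s} r = {r:+.3f}")
```

Standardization and min-max normalization are linear rescalings with a positive scale factor, so they leave the correlation with price unchanged; only the log, power, and root transformations change the shape of the relationship. That distinction is worth stating explicitly when explaining the observed effects.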

Task 3: Documentation and Methodology

The main focus of the documentation is the quality of your explanation of how you completed these tasks. Your notebook file should be well formatted, with proper sections and subsections.

Assignment 2 - Data exploration and visualisation - Visualisation Project

In this continuation of the Data Exploration Project you get a chance to create an interactive visualisation that communicates some of your findings from the Data Exploration Project.

Learning outcomes -

  • Choose an appropriate data visualisation;
  • Implement interactive data visualisations using Python, R and other tools.

Details of task:

1. Identify which findings from the Data Exploration Project you wish to communicate and who the intended audience is. Be selective: you do not need to, and should not, communicate everything you found. The intended audience might be your classmates, the general public, politicians, or whoever you like.

2. Design a narrative visualisation to communicate your findings to the intended audience. It should allow some viewer interaction and be designed using the five-sheet design methodology.

3. Implement your visualisation as a web-based presentation using R or JavaScript and D3. In unusual cases you may use other tools but you need to obtain prior permission from your tutor.

Report & Final Product: At the start of the Exam Period you need to submit (through Moodle) a directory containing the implementation code for your narrative visualization and a written report of no more than 15 pages excluding Appendix. It must be organised with section headings as follows:

1. Introduction: Precise description of what message you wanted your narrative visualisation to convey and who the intended audience is.

2. Design: Description of the visualization design process. This should summarise the 5 design sheets, detailing the alternatives you considered and the justification for choosing your final design.

3. Implementation: Description of the implementation including libraries used and reasons for the implementation decisions for your narrative visualisation.

4. User guide: Instructions for viewing and exploring the narrative visualisation using a standard web browser and images showing how the visualization works.

5. Conclusion: Summary of what you achieved and a reflection on what you learnt in this project and what, in hindsight, you might have done differently to improve the result.

6. Appropriate references and bibliography.

7. Appendix: Your 5 design sheets (images are perfect).

Your report should contain images of the final product as well as pointing out any reasons why your project was difficult, e.g. large data set, use of D3 etc.

Attachment:- Assignment Files.rar


Find and fix possible problems in the data: FIT5196 - Data wrangling and FIT5147 - Data exploration and visualisation Assignment, Monash University, Australia.

Reviews

len2310412

5/22/2019 11:33:54 PM

I have two new assignments: one is Data Wrangling and the other is Data Visualization. Both need to be done as per the specifications. For Data Visualization, I will let you know the topic for which I want the five design sheets and a polished visualization for presentation. For Data Wrangling, I will provide all the relevant files needed to finish that assignment. Note: Data Wrangling and Data Visualization are two different assignments. There is no connection between them.

len2310412

5/22/2019 11:33:48 PM

Data Wrangling - This is an individual assessment and is worth 30% of your total mark. Note 1: the output csv file must have exactly the same columns as specified in the schema. If you decide not to calculate any of the required attributes, you must still have a column for that attribute in your final data-frame, with the default value as the value of all the rows. Please note that an output file that is not in the correct format, as specified in the integrated schema, won't be marked.
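Note 1 can be illustrated with a small sketch that forces every output row into a fixed column list and fills any attribute that was not calculated with a default value. The column list below is a hypothetical subset built from attribute names mentioned in this brief (the real schema is in the attached file), and DEFAULT_VALUE, conform_rows and write_csv are likewise assumed names and values for illustration only.

```python
import csv
import io

# Hypothetical subset of the schema; the real column list is in the attached file.
SCHEMA_COLUMNS = ["suburb", "Rooms", "crime_C_average",
                  "travel_min_to_CBD", "property_age"]

# Hypothetical placeholder; use whatever default the schema specifies.
DEFAULT_VALUE = "not available"


def conform_rows(rows, columns=SCHEMA_COLUMNS, default=DEFAULT_VALUE):
    """Return rows with exactly the schema columns, in schema order.

    Attributes that were not calculated get the default value, and any
    working columns that are not part of the schema are dropped.
    """
    return [{col: row.get(col, default) for col in columns} for row in rows]


def write_csv(rows, columns, handle):
    """Write rows to an open text handle with a header in schema order."""
    writer = csv.DictWriter(handle, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)


# Made-up rows: the second is missing attributes and the first has an extra one.
rows = [
    {"Rooms": 3, "suburb": "CARLTON", "property_age": 40, "scratch": 1},
    {"suburb": "RICHMOND", "Rooms": 2},
]

buffer = io.StringIO()
write_csv(conform_rows(rows), SCHEMA_COLUMNS, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch self-contained; for the actual submission the same function could be given a file opened with `open("<student_no>_solution.csv", "w", newline="")`.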

len2310412

5/22/2019 11:33:41 PM

Note 2: the radius of the earth is still 6378 km!

Note 3: In Table 2, numbers in front of some of the rows in the format of (a/b) are the allocated mark associated with that attribute. For example, the "suburb" attribute carries 20% of the total mark of task 1. Please note that 10% of the total marks for task 1 is marked on any other issue that may occur during the data integration process.

Note 4: You can only use the vic_suburb_boundary.zip file to extract the suburb name of the property. Using other external datasets or packages (e.g., geopy) to directly get the suburb information will be penalized (this will result in 0 marks for the suburb attribute).
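Notes 2 and 4 both involve geometry on coordinates, which the following sketch illustrates with the standard library only. It assumes that distances are great-circle distances computed with the haversine formula (the brief gives only the radius, not the formula) and that a suburb boundary has already been read from vic_suburb_boundary.zip into a simple list of (longitude, latitude) vertices; reading the shapefile itself is not shown. The function names and the sample square are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6378.0  # radius stated in Note 2


def haversine_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the polygon ring?

    `ring` is a list of (lon, lat) vertices. Holes and multi-part
    boundaries are ignored in this simplified sketch.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Made-up square boundary, NOT a real suburb polygon.
sample_ring = [(144.9, -37.9), (145.0, -37.9), (145.0, -37.8), (144.9, -37.8)]

print(point_in_polygon(144.95, -37.85, sample_ring))
print(point_in_polygon(145.10, -37.85, sample_ring))
print(round(haversine_km(-37.85, 144.95, -37.80, 145.00), 3))
```

To assign a suburb, each property's coordinates could be tested against every boundary polygon until one contains the point. This uses only the supplied boundary file, in line with Note 4.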

len2310412

5/22/2019 11:33:35 PM

Data visualization - This is an individual assignment and is worth 40% of your total mark. Your report should contain images of the final product and point out any reasons why your project was difficult, e.g. large data set, use of D3 etc. The uploaded code must contain all data and files required to run your visualisation.

len2310412

5/22/2019 11:33:30 PM

Marking Rubric - Design [15%]- Appropriate use of five design sheet methodology and evaluation of alternatives. Quality of final design: clear signposting of messages and intended narrative, provision of appropriate context for reader, good use of colour, references to data sources and appropriateness for intended audience. Justification of final design in terms of human perceptual system and human communication assumptions.

len2310412

5/22/2019 11:33:25 PM

Implementation [7%] - Correctness and robustness, speed, accessibility, and comments and code quality.

Difficulty [10%] - Degree of difficulty, e.g. use of non-tabular data, large dataset, D3 programming, sophisticated user interaction.

Presentation [3%] - Quality of oral presentation (confidence, speed, voice) and quality of slides (legibility, design, images etc.), logical structure, and choice of content (completeness, appropriate level, discussion of design and implementation alternatives).

Written report [5%] - Quality of writing, referencing, images, logical structure, and completeness.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd