Introduction to Data Science Assignment
The purpose of this data analysis report is to demonstrate your data processing skills and your ability to analyse real-world data. It helps to develop a deeper understanding of the importance of data and information in business.
Assignment Task - A research team planned to study Australian road transport crash fatalities from 2010 to 2018 (inclusive). As a team member, you were given the dataset about Australian Road Death Fatalities, and were requested to analyse the data and prepare a report about your work and findings.
The dataset can be downloaded from Blackboard or the above website. It contains basic demographic and crash details of Australian road crashes between 1989 and 2019. As the team does not have a specific goal for the analysis, you are free to explore the data and dig out anything you find interesting or significant. However, you must limit your research and analysis to the years 2010 to 2018.
The potential audiences include other researchers, business representatives, and government agencies. They may have limited ICT or mathematical knowledge.
To prepare the report, please include the following sections:
1. Introduction
Provide an introduction to the problem. Include background material as appropriate: who cares about this problem, what impact it has, where the data comes from, and what the dimensions and structure of the data are.
2. Data Setup
Describe how to load the data, and how the pre-processing is performed.
The original dataset is not ready for analysis, and its form differs from the data we have worked with in previous practices. This means we need to do some pre-processing, either for the whole dataset or for the subset required by each sub-task described later.
Once you have some ideas for exploratory or advanced analysis, you need to adjust the form of the dataset. This can be achieved either by manipulating records in R (by transposition or subsetting), or with other tools (e.g. Notepad or Excel) before reading the data into R. Please explain your solution in this section.
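As an illustration of the subsetting step described above, the following is a minimal sketch only. The file name in the comment, the column names (Year, State, Age) and the rows are all made up for the example, and the idea that negative ages mark unknown values is an assumption; check the real dataset before relying on any of them.

```r
# In practice the data would be read from file, e.g. (hypothetical file name):
#   crashes <- read.csv("fatalities.csv", stringsAsFactors = FALSE)
# Toy stand-in with made-up rows and assumed column names.
crashes <- data.frame(
  Year  = c(1995, 2009, 2010, 2014, 2018, 2019),
  State = c("NSW", "VIC", "QLD", "NSW", "WA", "VIC"),
  Age   = c(34, 21, 57, -9, 45, 30),
  stringsAsFactors = FALSE
)

# Keep only the study period 2010 to 2018 (inclusive).
study <- subset(crashes, Year >= 2010 & Year <= 2018)

# Assumption: negative ages are placeholders for unknown values; recode as NA.
study$Age[study$Age < 0] <- NA

# Count fatalities per year and state (assumes one row per fatality).
counts <- as.data.frame(table(Year = study$Year, State = study$State))
print(study)
```

Subsetting first keeps every later step restricted to the required years; the same result could equally be reached by filtering the file in Excel before loading it.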
3. Exploratory Data Analysis
3.1 - One-variable analysis - One-variable analysis studies one variable (one row or one column) at a time. For example, we can select a particular Australian state or year to get a column of numbers, which can then be shown as a histogram.
Perform 2 one-variable analyses. Plot one graph for each variable. Explain the finding for each graph.
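A one-variable analysis of this kind might look like the sketch below. The ages and states are randomly generated toy values, not taken from the dataset, and the variable names are assumptions chosen for the example.

```r
set.seed(1)

# Made-up ages standing in for a single numeric column.
ages <- round(runif(200, min = 0, max = 95))

# Histogram of one numeric variable, in 10-year bins.
h <- hist(ages, breaks = seq(0, 100, by = 10),
          main = "Age at death (toy data)", xlab = "Age (years)", col = "grey")

# Made-up states standing in for a single categorical column.
states <- sample(c("NSW", "VIC", "QLD", "WA"), 200, replace = TRUE)

# Bar chart of counts for one categorical variable.
state_counts <- table(states)
barplot(state_counts, main = "Fatalities by state (toy data)",
        xlab = "State", ylab = "Count")
```

A histogram suits a numeric variable such as age, while a bar chart of counts suits a categorical variable such as state.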
3.2 - Two-variable analysis - Two-variable analysis studies the relation between two variables. For example, we can select "Diseases of the nervous system" and "Year", then a time series (scatter) plot can be drawn. Or, we can select "2015" and "Causes".
Perform 2 two-variable analyses. Plot one graph for each analysis. Explain the finding for each graph.
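The sketch below shows one possible shape for a two-variable analysis: a count plotted against year, and a count broken down by a second categorical variable. The rows are randomly generated toy data, and the column names Year and Gender are assumptions made for the example.

```r
set.seed(2)

# Made-up row-level records: one row per fatality (assumed layout).
toy <- data.frame(
  Year   = sample(2010:2018, 300, replace = TRUE),
  Gender = sample(c("Female", "Male"), 300, replace = TRUE),
  stringsAsFactors = FALSE
)

# Variable pair 1: year against number of fatalities (time series plot).
per_year <- as.data.frame(table(Year = toy$Year), stringsAsFactors = FALSE)
per_year$Year <- as.integer(per_year$Year)
plot(per_year$Year, per_year$Freq, type = "b",
     main = "Fatalities per year (toy data)", xlab = "Year", ylab = "Count")

# Variable pair 2: year against gender (grouped bar chart).
by_gender <- table(toy$Gender, toy$Year)
barplot(by_gender, beside = TRUE, legend.text = TRUE,
        main = "Fatalities by gender and year (toy data)",
        xlab = "Year", ylab = "Count")
```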
4. Advanced Analysis
4.1 - Clustering - Briefly explain the concept of clustering and k-means.
Perform 1 clustering analysis to group years according to a selected cause.
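As a rough sketch of grouping years with k-means: the algorithm assigns each observation to the nearest of k cluster centres, recomputes each centre as the mean of its members, and repeats until the assignments stop changing. The yearly counts below are invented toy numbers, not real statistics, and the choice of k = 3 is an assumption made for the example.

```r
set.seed(3)

# Made-up yearly counts for one selected category, 2010 to 2018.
years  <- 2010:2018
counts <- c(52, 48, 50, 31, 29, 33, 12, 10, 14)

# k-means with k = 3; nstart = 10 tries several random starts and keeps the best.
km <- kmeans(counts, centers = 3, nstart = 10)

# Show which cluster each year was assigned to.
result <- data.frame(Year = years, Count = counts, Cluster = km$cluster)
print(result)

# Plot the counts coloured by cluster membership.
plot(years, counts, col = km$cluster, pch = 19,
     main = "Years grouped by k-means (toy data)", xlab = "Year", ylab = "Count")
```

Cluster numbers are arbitrary labels, so only the grouping itself should be interpreted, not the order of the labels.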
4.2 - Linear Regression - Briefly explain the concept of linear regression.
Perform 2 linear regression analyses. Plot the learned models.
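A linear regression fits a straight line, count = a + b * year, by choosing a and b to minimise the sum of squared residuals. The sketch below fits and plots such a model on invented toy counts; the numbers and variable names are assumptions for illustration and say nothing about the real data.

```r
# Made-up yearly counts, 2010 to 2018.
year   <- 2010:2018
deaths <- c(60, 57, 55, 54, 50, 49, 45, 44, 40)

# Fit the linear model deaths = a + b * year.
fit <- lm(deaths ~ year)
print(coef(fit))

# Plot the data and overlay the fitted line.
plot(year, deaths, pch = 19,
     main = "Linear trend (toy data)", xlab = "Year", ylab = "Count")
abline(fit, col = "red")

# Fitted value for a year inside the observed range.
pred_2014 <- predict(fit, newdata = data.frame(year = 2014))
```

The slope from coef(fit) gives the estimated change in the count per year, which is usually the easiest quantity to explain to a non-technical audience.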
5. Conclusion
6. Reflections
In this part, discuss any difficulties you had performing the analysis and how you solved those difficulties. Reflect on how the analysis process went for you, what you learnt, and what you might do differently next time.
For the data analysis (Sections 3 and 4), you need to provide the R code, an explanation of the code, and the result. Please present each R code snippet in a box with some comments.
Report Format - Your report should be no fewer than 1,200 words and preferably no more than 2,000 words. Text in R code snippets is not counted.
The report MUST be formatted using the following guidelines:
- Title Page - Must not contain headers, footers, or page numbering. Include your name as the report's author.
- Header - Report title
- Footer - your name and the page number
- Paragraph text - 12 point Calibri single line spacing
- Headings - Arial in an appropriate type size
- Margins - 2.5 cm on all sides
- Page numbering
  - Executive summary to the last page of the Table of Figures to use roman numerals (i, ii, iii, iv)
  - Introduction and onwards to use conventional numerals (1, 2, 3, 4), starting on page 1 from the introduction.
- The report is to be created as a single Microsoft Word document (version 2007 or later). No other format is acceptable; submitting in another format will result in the deduction of marks.
Please follow the conventions detailed in: Summers, J. & Smith, B., 2014, Communication Skills Handbook, 4th Ed, Wiley, Australia.