Briefly explain the concept of clustering and k-means

Reference no: EM132313907, Length: 2,000 words

Introduction to Data Science Assignment

The purpose of this data analysis report is to demonstrate your data processing skills and your ability to analyse real-world data. It helps to develop a deeper understanding of the importance of data and information in business.

Assignment Task - A research team planned to study Australian road transport crash fatalities from 2010 to 2018 (inclusive). As a team member, you were given the dataset about Australian Road Death Fatalities, and were requested to analyse the data and prepare a report about your work and findings.

The dataset can be downloaded from Blackboard or the above website. The dataset contains basic demographic and crash details of Australian road crashes between 1989 and 2019. As the team does not have a specific goal for the analysis, you are free to explore the data and report anything you find interesting or significant. However, you must limit your research and analysis to the years 2010 to 2018.

The potential audiences include other researchers, business representatives, and government agencies. They may have limited ICT or mathematical knowledge.

To prepare the report, please include the following sections:

1. Introduction

Provide an introduction to the problem. Include background material as appropriate: who cares about this problem, what impact it has, where the data comes from, and what the dimensions and structure of the data are.

2. Data Setup

Describe how to load the data, and how the pre-processing is performed.

The original dataset is not ready for analysis, and its form differs from the data we worked with in previous practices. This means some pre-processing is needed, either for the whole dataset or for the subset of the dataset required by each sub-task described later.

Once you have some ideas for exploratory or advanced analysis, you need to adjust the form of the dataset. This can be achieved either by manipulating records in R (by transposition or subsetting), or with other tools (e.g. Notepad or Excel) before reading them into R. Please explain your solution in this section.
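To make the subsetting step concrete, here is a minimal sketch of restricting records to the 2010 to 2018 window and counting them per year. The assignment itself must be done in R (where `subset()` or logical indexing does the same job); Python is used here only to illustrate the idea. The rows and the column names `Year`, `State` and `Age` are made up for illustration and may not match the real file.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; the real dataset's columns may differ.
SAMPLE_CSV = """Year,State,Age
2009,NSW,34
2010,VIC,22
2010,QLD,57
2014,NSW,41
2018,WA,19
2019,SA,63
"""

def subset_years(rows, first=2010, last=2018):
    """Keep only the records whose Year lies in [first, last] (inclusive)."""
    return [r for r in rows if first <= int(r["Year"]) <= last]

# Read the sample text as if it were a CSV file on disk.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Drop everything outside the study window.
kept = subset_years(rows)

# Collapse the per-person records into one count per year.
deaths_per_year = Counter(int(r["Year"]) for r in kept)

print(len(kept))
print(dict(deaths_per_year))
```

A per-year table like `deaths_per_year` is the kind of reshaped data the later sections (histograms, time-series plots, clustering, regression) can work from.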

3. Exploratory Data Analysis

3.1 - One-variable analysis - One-variable analysis studies one variable (one row or one column) at a time. For example, we can select a particular Australian state or year to get a column of numbers and plot it as a histogram.

Perform 2 one-variable analyses. Plot one graph for each variable. Explain the finding for each graph.

3.2 - Two-variable analysis - Two-variable analysis studies the relation between two variables. For example, we can select "Diseases of the nervous system" and "Year", then a time series (scatter) plot can be drawn. Or, we can select "2015" and "Causes".

Perform 2 two-variable analyses. Plot one graph for each analysis. Explain the finding for each graph.

4. Advanced Analysis

4.1 - Clustering - Briefly explain the concept of clustering and k-means.

Perform 1 clustering analysis to group years according to a selected cause.
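As a rough illustration of the concept: clustering groups similar items without using labels, and k-means does so by alternating between assigning each item to its nearest centroid and moving each centroid to the mean of its members. The sketch below applies this to one number per year. It is written in Python purely to show the mechanics; in R the built-in `kmeans()` function performs this task. The yearly counts are invented toy values, not figures from the dataset, and the evenly spread starting centroids are a simplifying assumption (standard implementations usually start randomly).

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on a list of numbers; returns (labels, centroids)."""
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and the number of values")

    # Deterministic start (an assumption of this sketch): pick k values
    # spread evenly across the sorted data.
    ordered = sorted(values)
    centroids = [float(ordered[i * (len(ordered) - 1) // (k - 1)]) for i in range(k)]

    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # no value changed cluster, so the algorithm has converged
        labels = new_labels

        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

# Toy yearly counts (made up for illustration only).
years = list(range(2010, 2019))
counts = [130, 125, 128, 110, 108, 112, 95, 92, 90]

labels, centroids = kmeans_1d(counts, k=3)
for year, label in zip(years, labels):
    print(year, "-> cluster", label)
```

With these toy values the years fall into three consecutive groups (high, middle, low), which is the kind of "grouping of years" the task asks you to interpret.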

4.2 - Linear Regression - Briefly explain the concept of linear regression.

Perform 2 linear regression analyses. Plot the learned models.
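For orientation, simple linear regression fits a straight line y = intercept + slope * x by choosing the slope and intercept that minimise the sum of squared vertical distances between the line and the data points. The sketch below computes that least-squares line directly from the textbook formulas. It is in Python only to expose the arithmetic; in R the built-in `lm()` function fits the same model. The counts are invented and deliberately lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data (made up): a count that falls by exactly 10 each year.
years = list(range(2010, 2019))
counts = [1300 - 10 * (year - 2010) for year in years]

slope, intercept = fit_line(years, counts)

# Use the fitted line to estimate the value for a year outside the data.
predicted_2019 = intercept + slope * 2019

print(round(slope, 6), round(intercept, 6), round(predicted_2019, 6))
```

Real data will not sit exactly on the line, so the report should plot the fitted line over the observed points and comment on how well it describes them.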

5. Conclusion

6. Reflections

In this part, discuss any difficulties you had performing the analysis and how you solved those difficulties. Reflect on how the analysis process went for you, what you learnt, and what you might do differently next time.

For the data analysis (Sections 3 and 4), you need to provide the R code, an explanation of the code, and the results. Please present each R code snippet in a box with some comments.

Report Format - Your report should be at least 1,200 words and preferably no longer than 2,000 words. Text in R code snippets is not counted.

The report MUST be formatted using the following guidelines:

  • Title Page - Must not contain headers, footers, or page numbering. Include your name as the report's author.
  • Header - Report title
  • Footer - your name and the page number
  • Paragraph text - 12-point Calibri, single line spacing
  • Headings - Arial in an appropriate type size
  • Margins - 2.5cm on all margins
  • Page numbering:
      • Executive summary to the last page of Table of Figures to use roman numerals (i, ii, iii, iv)
      • Introduction and onwards to use conventional numerals (1, 2, 3, 4), starting on page 1 from the introduction
  • The report is to be created as a single Microsoft Word document (version 2007 or later). No other format is acceptable, and submitting in another format will result in the deduction of marks.

Please follow the conventions detailed in: Summers, J. & Smith, B., 2014, Communication Skills Handbook, 4th Ed, Wiley, Australia.




