STAT6001 Data Wrangling and Visualisation Assignment

Reference no: EM132634293

STAT6001 Data Wrangling and Visualisation - University of Newcastle

Section A - Space Race (all launches since 1957)

For Section A of this assignment you will use Excel and/or PowerBI to prepare the dataset "A1A space race.csv" and to create visualisations that help answer questions about the data. The dataset was sourced from Kaggle; it was scraped from … and contains data on all space missions since 1957.

Question 1

a) Create a variable for Country based on the launch location. Document any decisions you make regarding the country of any launches conducted at sea or on islands.

Show a table of the total number of launches by country. Which two countries have the highest number of launches?
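The country logic can be sketched in Python, although the assignment itself expects Excel and/or PowerBI. The sketch assumes each location string ends with the country (or body of water) after its last comma, for example "LC-39A, Kennedy Space Center, Florida, USA". The override table is hypothetical: it is where your own documented decisions for sea and island launch sites would go.

```python
from collections import Counter

# Hypothetical decisions for sea/island launch sites; replace these with
# the choices you document in your answer.
COUNTRY_OVERRIDES = {
    "Pacific Ocean": "Sea launch",
    "Barents Sea": "Russia",
}


def country_from_location(location: str) -> str:
    """Take the text after the last comma, then apply any override."""
    last_part = location.split(",")[-1].strip()
    return COUNTRY_OVERRIDES.get(last_part, last_part)


def launches_by_country(locations):
    """Return (country, launch count) pairs, most launches first."""
    return Counter(country_from_location(loc) for loc in locations).most_common()
```

The first two entries of `launches_by_country(...)` give the two countries with the most launches.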

b) Create a line graph showing the number of launches per year since 1957. According to the graph, what year was the peak?
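The counts behind the line graph can be described with a short Python sketch. It assumes the launch year has already been extracted from each launch date as an integer, and it fills years with no launches with 0 so the line does not skip them.

```python
from collections import Counter


def launches_per_year(years, start=1957):
    """Launch counts for every year from `start` to the latest year present.
    Assumes at least one year is >= start."""
    counts = Counter(y for y in years if y >= start)
    # Counter returns 0 for years that never appear.
    return {year: counts[year] for year in range(start, max(counts) + 1)}


def peak_year(per_year):
    """Year with the most launches (the earliest year wins a tie)."""
    return max(per_year, key=per_year.get)
```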

c) Filter the data to launches in the USA only. Is there any seasonal trend in the timing of launches throughout the year?

d) Create a graph that shows the status of rockets and a graph that shows the status of missions. What proportion of rockets are active and what proportion of missions have been successful?

e) Create a table that shows the number and total cost of rocket launches by country. Which two countries have spent the most on rocket launches? What issues are there with this comparison?
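A Python sketch of this summary, assuming cost is recorded per launch and left blank for some launches (represented here as `None`). Keeping a separate count of launches that actually have a cost shows how much of each country's total rests on incomplete data.

```python
def launches_and_cost(rows):
    """rows: (country, cost) pairs, with cost None where none was recorded.
    Returns {country: (launches, launches_with_cost, total_cost)}."""
    summary = {}
    for country, cost in rows:
        launches, with_cost, total = summary.get(country, (0, 0, 0.0))
        if cost is not None:
            with_cost += 1
            total += cost
        summary[country] = (launches + 1, with_cost, total)
    return summary
```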

Question 2

a) Dichotomise mission status into "Successful" and "Not successful". Create a stacked bar chart with heights set to 100% that shows the mission success rates of Russia and the USA.
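A minimal Python sketch of the dichotomisation and of the percentages that a 100% stacked bar would display. It assumes the mission status labels include "Success" plus one or more failure labels; check the actual labels in the data before reusing this rule.

```python
def dichotomise(status: str) -> str:
    # Assumption: only the label "Success" counts as successful; every other
    # status (e.g. partial or prelaunch failures) becomes "Not successful".
    return "Successful" if status == "Success" else "Not successful"


def success_percentages(rows, countries=("Russia", "USA")):
    """rows: (country, mission_status) pairs. Returns the percentage of each
    dichotomised outcome per country, i.e. the bar segment heights."""
    rows = list(rows)
    result = {}
    for target in countries:
        outcomes = [dichotomise(s) for c, s in rows if c == target]
        n = len(outcomes)
        result[target] = {
            label: 100 * outcomes.count(label) / n if n else 0.0
            for label in ("Successful", "Not successful")
        }
    return result
```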

b) Compare the annual number of launches over time for Russia and the USA. What periods of high activity and/or trend(s) do you see in terms of mission launches for the two nations?

Hint: the period of the ‘space race’ is generally considered to be 1955-1975, and the Cold War between the US and the Soviet Union spanned 1947 to 1991.

c) Collapse "Russia" and "Kazakhstan" into a single category called "USSR/Russia". How would this affect the results of previous parts of Questions 1 and 2?
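The recoding is a simple lookup. This Python sketch applies it before re-counting launches, so that the earlier tables can be regenerated with the combined category. Country names are assumed to match the labels created in Question 1a.

```python
from collections import Counter

# Both source categories are relabelled; every other country is unchanged.
COLLAPSE = {"Russia": "USSR/Russia", "Kazakhstan": "USSR/Russia"}


def collapse_country(country: str) -> str:
    return COLLAPSE.get(country, country)


def recount(countries):
    """Launch counts after collapsing, most launches first."""
    return Counter(collapse_country(c) for c in countries).most_common()
```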

Section B - Earthquakes 1965-2016

Import the dataset "A1B earthquakes.csv" into SAS to answer the following questions. The dataset contains the date, time, location, size, and source of significant earthquakes (magnitude 5.5 or higher) recorded by seismograph networks between 1965 and 2016. The data were recorded by the National Earthquake Information Center (NEIC) and made available online by the United States Geological Survey (USGS).

Description of variables
• Latitude - number of degrees north or south of the equator (negative values for southern hemisphere, positive values for northern hemisphere), -90 to +90
• Longitude - number of degrees east or west of the prime meridian (negative values indicate west, positive values indicate east), -180 to +180
• Type - type of seismic event
• Depth - in kilometres, vertical distance below mean sea level
• Depth seismic stations - number of seismic stations that supplied data for the depth measurement
• Magnitude - best available estimate of the size of the seismic event at its source, measured on a (base 10) logarithmic scale
• Magnitude type - type of algorithm used to calculate magnitude
• Magnitude seismic stations - number of seismic stations that supplied data for the magnitude measurement
• Azimuthal gap - in degrees (0-360), the largest azimuthal gap between adjacent seismic stations. Larger values indicate higher uncertainty in the depth and location measurements
• Horizontal distance - in kilometres, indicates uncertainty in the horizontal location measurement
• Status - indicates whether the event has been reviewed for validity by a human or automatically processed by the system.

Questions
For each part of a question, your answer should include only the necessary SAS output (tables, graphs), together with brief sections of SAS code.

Question 1
a) Explore the variables in the dataset and complete the table below.

For each variable in the table, list the type (e.g., continuous, discrete, ordinal, categorical) and the number of rows missing an entry for that variable. If the variable is categorical or ordinal list the number of levels; if the variable is continuous or discrete list the minimum and maximum values.

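Variable                    | Variable type | N levels (if categorical) | Min, Max (if numeric) | N missing
----------------------------|---------------|---------------------------|-----------------------|----------
Latitude                    |               |                           |                       |
Longitude                   |               |                           |                       |
Type                        |               |                           |                       |
Depth                       |               |                           |                       |
Depth seismic stations      |               |                           |                       |
Magnitude                   |               |                           |                       |
Magnitude_type              |               |                           |                       |
Magnitude seismic stations  |               |                           |                       |
Azimuthal_gap               |               |                           |                       |
Horizontal_distance         |               |                           |                       |
Status                      |               |                           |                       |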
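In SAS these figures typically come from PROC MEANS (with the N, NMISS, MIN and MAX statistics) and PROC FREQ (with the NLEVELS option). The Python sketch below only restates the logic as a cross-check; it assumes each variable's values are held in a list, with missing entries stored as `None`, empty strings or NaN.

```python
import math


def is_missing(value) -> bool:
    """Treat None, empty strings and NaN as missing."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def numeric_summary(values):
    """(min, max, n_missing) for a numeric variable held in a list."""
    present = [v for v in values if not is_missing(v)]
    n_missing = len(values) - len(present)
    if not present:
        return (None, None, n_missing)
    return (min(present), max(present), n_missing)


def categorical_summary(values):
    """(n_levels, n_missing) for a categorical variable held in a list."""
    present = [v for v in values if not is_missing(v)]
    return len(set(present)), len(values) - len(present)
```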

b) Are there any range errors for the numeric variables? Explain why/why not.
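One way to check for range errors is to compare each value with the limits given in the variable descriptions (latitude -90 to +90, longitude -180 to +180, azimuthal gap 0 to 360). A Python sketch, assuming each row is a dictionary keyed by variable name and that missing values are `None`:

```python
# Limits taken from the variable descriptions for this dataset.
VALID_RANGES = {
    "Latitude": (-90, 90),
    "Longitude": (-180, 180),
    "Azimuthal_gap": (0, 360),
}


def range_errors(rows):
    """Return (row_index, variable, value) for every value outside its
    documented range. Missing values are skipped, not flagged."""
    errors = []
    for i, row in enumerate(rows):
        for var, (lo, hi) in VALID_RANGES.items():
            v = row.get(var)
            if v is not None and not lo <= v <= hi:
                errors.append((i, var, v))
    return errors
```

An empty result means no value falls outside the documented limits.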

c) Use an appropriate graph and summary statistics to describe the distribution of magnitude.
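As a cross-check on the SAS summary statistics, the usual numbers can be computed with Python's standard `statistics` module. Missing values are assumed to be `None`. Quartile definitions differ between software packages, so the quartiles here (the module's default "exclusive" method) may not exactly match SAS output.

```python
import statistics


def describe(values):
    """Summary statistics for a numeric variable; needs at least two
    non-missing values."""
    data = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),
        "min": data[0],
        "q1": q1,
        "median": statistics.median(data),
        "q3": q3,
        "max": data[-1],
    }
```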

d) Create a formatted numeric variable that categorises magnitude according to the following classes:

Show your SAS code and a frequency table of magnitude class.

e) Examine the distribution of depth using a histogram.

The depth of earthquakes can be categorised into three zones: shallow earthquakes are 0-70 km deep, intermediate earthquakes are 70-300 km deep, and deep earthquakes are 300-700 km deep.

Create a formatted numeric variable that categorises depth for earthquakes only (not other seismic events that are recorded in the dataset). Show your SAS code and a frequency table of depth zone.

What proportion of earthquakes occur in the Deep zone?
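In SAS the zones would typically be defined with a PROC FORMAT value format using ranges such as `0 -< 70`. The Python sketch below shows the same classification and the Deep-zone proportion. It assumes earthquakes are labelled "Earthquake" in the type variable, that the 70 km and 300 km boundaries belong to the deeper zone, and that depths outside 0-700 km are left unclassified; each of these is a choice you would need to document.

```python
def depth_zone(depth_km):
    """Classify depth (km) into the three zones in the question.
    Assumption: 70 and 300 km go to the deeper zone; depths outside
    0-700 km, or missing, return None."""
    if depth_km is None or depth_km < 0 or depth_km > 700:
        return None
    if depth_km < 70:
        return "Shallow"
    if depth_km < 300:
        return "Intermediate"
    return "Deep"


def deep_proportion(events):
    """events: (event_type, depth_km) pairs. Only rows typed 'Earthquake'
    are classified; unclassified depths are left out of the denominator."""
    zones = [depth_zone(d) for t, d in events if t == "Earthquake"]
    zones = [z for z in zones if z is not None]
    return zones.count("Deep") / len(zones) if zones else 0.0
```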

f) Examine the relationship between depth zone and magnitude class for earthquakes using a contingency table.

Does magnitude differ by depth zone? Use appropriate summary table(s) and graph(s) to support your conclusion.
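In SAS a two-way TABLES request in PROC FREQ produces this table. The Python sketch below shows the same counts with row percentages, which let magnitude classes be compared across depth zones of very different sizes. The magnitude class labels are left to the caller because the classes are defined in Question 1d.

```python
from collections import Counter


def contingency_table(pairs, row_levels, col_levels):
    """pairs: (depth_zone, magnitude_class) for each earthquake.
    Returns {zone: {class: (count, row_percentage)}}."""
    counts = Counter(pairs)
    table = {}
    for r in row_levels:
        row_total = sum(counts[(r, c)] for c in col_levels)
        table[r] = {
            c: (counts[(r, c)], 100 * counts[(r, c)] / row_total if row_total else 0.0)
            for c in col_levels
        }
    return table
```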

Question 2 - Own question

Propose your own question that can be answered by this dataset and investigate the answer using tables and/or charts. Write a summary of your findings (approximately 100-200 words).

For example, you might like to investigate one of these topics:
• Create maps in PowerBI showing the location of earthquakes. Show the depth zones and then the magnitude classes.
• Has the annual number of earthquakes changed over time? What about the average magnitude?
• What are the characteristics of the events that were not earthquakes?
• Investigate the bump in the tail of the distribution of depth.
• Compare one of the error variables (e.g., azimuthal gap, horizontal distance) by whether or not the measurements were verified by a human.

1. Describes a question or topic of interest as it relates to variables in the dataset.
2. Statements are supported by relevant tables or charts as evidence from the data.
3. Refers to specific quantities (counts, percentages, statistics) as part of written answer.
4. Communicates clearly regarding filtered/grouped data or categories when summarising data or making comparisons.
5. Writes with clarity and organisation using report-style language.
