Create a numeric variable for last eruption date

Reference no: EM133705010

Data Wrangling and Visualization Assignment

Section A - Volcanoes of the Holocene

The Global Volcanism Program (GVP) at the Smithsonian Institution maintains documentation on global volcanic activity. In this section you will use the GVP's database that catalogues volcanoes that have erupted in the last 10,000 years.

The supplied dataset is:
A2A volcanos.csv

Tasks:

1) Mount Vesuvius is a stratovolcano located in Italy that has erupted dozens of times. Does this dataset contain a list of all eruptions in the Holocene epoch and how do you know?

2) Examine the variables Latitude, Longitude, Elevation, and Last Eruption Date and comment on whether there are any range errors or missing data.

3) Create a numeric variable for last eruption date with negative values for BCE dates and positive values for CE dates.
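
One way to approach task 3 is sketched below. The assignment expects SAS, so this Python is only an illustration of the logic. It assumes the raw field holds text such as `1944 CE`, `8040 BCE`, or `Unknown`; that layout and the function name `signed_year` are assumptions, not details taken from the dataset.

```python
import re
from typing import Optional

# Assumed layout: "<year> CE" or "<year> BCE"; anything else is treated as missing.
_DATE_PATTERN = re.compile(r"^\s*(\d+)\s*(BCE|CE)\s*$", re.IGNORECASE)


def signed_year(raw: str) -> Optional[int]:
    """Return a signed year: negative for BCE, positive for CE, None if unparseable."""
    match = _DATE_PATTERN.match(raw or "")
    if match is None:
        return None  # e.g. "Unknown" or a blank field
    year = int(match.group(1))
    return -year if match.group(2).upper() == "BCE" else year


examples = ["1944 CE", "8040 BCE", "Unknown", ""]
print([signed_year(value) for value in examples])
```

Returning a missing value for anything that does not match keeps unknown dates out of later frequency counts instead of silently coding them as zero.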

4) Examine the frequency distribution of the numeric date variable that you created in task 3. Is the interpretation that eruptions are becoming more frequent over time valid?

Hint: try PROC HPBIN to perform 'bucket' binning rather than PROC FREQ.
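
Bucket binning here means equal-width intervals. The sketch below shows that idea in Python for illustration only (the assignment itself expects PROC HPBIN in SAS). The function name `bucket_bins` and the sample years are made up.

```python
from collections import Counter
from typing import List, Tuple


def bucket_bins(values: List[float], n_bins: int) -> List[Tuple[float, float, int]]:
    """Equal-width binning: return (lower, upper, count) for each of n_bins intervals."""
    if not values or n_bins < 1:
        raise ValueError("need at least one value and one bin")
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # avoid zero width when all values are equal
    counts = Counter()
    for value in values:
        # Clamp the maximum value into the last bin.
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i]) for i in range(n_bins)]


# Made-up signed years, for illustration only.
years = [-8000, -4000, -1500, 200, 1500, 1850, 1950, 2000]
for lower, upper, count in bucket_bins(years, 4):
    print(f"{lower:>8.0f} to {upper:>8.0f}: {count}")
```

Because every interval spans the same number of years, the counts can be compared directly, which a frequency table of individual years does not allow.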

5) Categorise the variable tectonic_setting into two new variables:
Platetype: Intraplate, Rift zone, Subduction zone
Crusttype: Continental crust, Intermediate crust, Ocean crust
Your new variables should be coded numeric variables with custom formats applied. Show the frequency tables. What are the most common types of tectonic setting for volcanic eruptions?
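
The coding step in task 5 can be sketched as follows, in Python for illustration only (in SAS the label lookups would be custom formats). The sketch assumes tectonic_setting values look like `Subduction zone / Continental crust (>25 km)`; that layout, the numeric codes, and all names below are assumptions.

```python
from typing import Dict, Optional, Tuple

# Numeric codes with label lookups, playing the role of custom formats.
PLATE_FORMAT = {1: "Intraplate", 2: "Rift zone", 3: "Subduction zone"}
CRUST_FORMAT = {1: "Continental crust", 2: "Intermediate crust", 3: "Ocean crust"}

# Keywords searched for in the raw text (assumed spellings).
PLATE_KEYWORDS = {"intraplate": 1, "rift": 2, "subduction": 3}
CRUST_KEYWORDS = {"continental": 1, "intermediate": 2, "ocean": 3}


def _code(text: str, keywords: Dict[str, int]) -> Optional[int]:
    """Return the code of the first keyword found in text, or None if none match."""
    lowered = text.lower()
    for keyword, code in keywords.items():
        if keyword in lowered:
            return code
    return None


def code_setting(tectonic_setting: str) -> Tuple[Optional[int], Optional[int]]:
    """Split one tectonic_setting string into (plate code, crust code)."""
    return _code(tectonic_setting, PLATE_KEYWORDS), _code(tectonic_setting, CRUST_KEYWORDS)


plate, crust = code_setting("Subduction zone / Continental crust (>25 km)")
print(plate, PLATE_FORMAT[plate], "|", crust, CRUST_FORMAT[crust])
```

Storing the codes and applying labels only at display time keeps the variables numeric while the frequency tables still show readable categories.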

6) Use an appropriate graph to show the frequency of tectonic settings (crust type and plate type).

7) Describe the distribution of elevation with appropriate summary statistics and a histogram.

8) Compare the distribution of elevation by plate type with summary statistics and a boxplot.

9) Inter-plate earthquakes are responsible for around 90% of the total seismic energy produced globally each year. Is our data consistent with this value? Why/why not?

Note: 'Inter-plate' includes rift zone and subduction zone (activity occurring at the boundaries of tectonic plates) as opposed to Intraplate activity which occurs inside plates.

Section B - Global happiness index
The World Happiness Report ranks countries on the perceived happiness of their citizens. The happiness scores are based partly on data from the Gallup World Poll, which is a nationally representative annual survey of each country's population aged 15 and over. There are five domains of happiness: social support, healthy life expectancy at birth, freedom to make life choices, perceptions of corruption, and generosity. Another strong predictor of happiness is a country's GDP per capita (Gross Domestic Product per person).
In this section you will explore, visualise, and interpret global happiness scores over the period 2015-2019. The supplied datasets are:
A2B happiness_2015.csv
A2B happiness_2016.csv
A2B happiness_2017.csv
A2B happiness_2018.csv
A2B happiness_2019.csv

Tasks:

[A|C|E]
Prepare a new combined dataset for 2015-2019 using the instructions below. Briefly document your decisions using code snippets and/or dot-points and tables.
Create a long format dataset for 2015-2019 containing the following variables:
Year
Country
Happiness_rank
Happiness_score
GDP_per_capita
Social_support
Healthy_life_expectancy
Freedom_to_make_life_choices
Perceptions_of_corruption
Generosity
You will need to harmonise the variable names since they are named differently in the older datasets (2015-2017) than in the newer ones (2018-2019). For example, the variable 'Family' is the same as 'Social support'.
You will need to edit some country names. For example, "Macedonia" is now called "North Macedonia". Check that each country has only one observation per year.
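
The harmonise, stack, and check steps can be sketched as below, in Python for illustration only (the assignment expects SAS). The rename map is hypothetical apart from the 'Family' example given above; the source column name `Country or region` and the sample rows are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical rename map: source column name -> harmonised name.
RENAME = {
    "Family": "Social_support",
    "Social support": "Social_support",
    "Country or region": "Country",  # assumed name in the newer files
}
COUNTRY_FIXES = {"Macedonia": "North Macedonia"}


def harmonise(rows: List[Dict[str, object]], year: int) -> List[Dict[str, object]]:
    """Rename columns, fix country names, and tag each row with its year."""
    out = []
    for row in rows:
        clean = {RENAME.get(name, name): value for name, value in row.items()}
        clean["Country"] = COUNTRY_FIXES.get(clean["Country"], clean["Country"])
        clean["Year"] = year
        out.append(clean)
    return out


def duplicate_keys(rows: List[Dict[str, object]]) -> List[Tuple[object, object]]:
    """Return every (Country, Year) pair that appears more than once."""
    counts = Counter((row["Country"], row["Year"]) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Tiny made-up inputs.
rows_2015 = [{"Country": "Macedonia", "Family": 0.9}]
rows_2019 = [{"Country or region": "North Macedonia", "Social support": 1.1}]
long_rows = harmonise(rows_2015, 2015) + harmonise(rows_2019, 2019)
print(long_rows)
print(duplicate_keys(long_rows))  # an empty list means one observation per country per year
```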

[A]
Show a frequency table for the variable Year for the long dataset.

[A|C]
List the happiness ranks and happiness scores for Australia and the countries that were in the top 5 or the bottom 5 in 2019.

[A|C]
Create two graphs for the change in happiness index over time, one for each of the two groupings identified in the previous task: the top five plus Australia, and the bottom five. Let SAS choose the natural scale for the vertical axis. Then create a third graph with all eleven countries together.

[E]
How do we interpret what the above two types of plots are telling us in terms of the scale?

[A|C]
Create a scatterplot of life expectancy by GDP per capita for the year 2019.

[E]
How do we interpret the position of Saudi Arabia relative to Hong Kong in the scatterplot?

[A|C|E]
How well does economy (measured in terms of GDP per capita) correlate with the other four measures in the happiness index? Show the necessary workings and justify.
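
A common measure for this kind of question is the Pearson correlation coefficient, sketched below in Python for illustration only (in SAS, PROC CORR would normally be used). Whether Pearson is the intended measure is an assumption, and the sample values are made up.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values standing in for GDP_per_capita and Healthy_life_expectancy.
gdp = [0.3, 0.8, 1.1, 1.4]
life = [0.2, 0.6, 0.8, 1.0]
print(round(pearson_r(gdp, life), 3))
```

Repeating the call with GDP_per_capita against each of the other measures gives one coefficient per pair, which can then be compared side by side.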

[A|C]
Use the geographical region variable from the 2015 dataset and merge this information into the stacked dataset. Show a frequency table of region by year.

[A|C]
Create a single plot that shows the distribution of happiness score by region. Only include regions in Asia, Europe, and Africa and make sure the regions appear grouped within these three super regions.

Section C - Australian road safety
Over the last 50 years Australia has implemented a variety of successful interventions for reducing the number of fatalities and serious injuries on our roads, including seatbelts, licensing schemes, and targeted campaigns against drink driving and speeding. Along with safer vehicles and improved roads, these interventions have been shown to be effective in reducing road fatalities over time despite our growing population. National data on fatal crashes on Australian roads are recorded in the Australian Road Deaths Database (ARDD) maintained by the Bureau of Infrastructure and Transport Research Economics (BITRE).
In this question you will investigate data on fatal crashes from January 1989 to October 2020. The files provided are:
A2C road crashes.csv
Details on the timing, location, and setting of crashes involving at least one fatality.
A2C road fatalities.csv
Basic demographic information on people killed on roads.
A2C ARDD data dictionary.pdf
Data dictionary for the variables in the road crashes and fatalities datasets.
A2C ABS population data.xls
Quarterly population data for Australian states from June 1981 to March 2020 from the ABS.

Tasks:
[C|E]

Merge the crashes and fatalities datasets. Describe the relationship between these datasets and what checks you need to make to ensure the merge performs correctly.
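
The summary table in this section lists 46,631 crashes against 51,833 fatalities, which is consistent with one crash row matching one or more fatality rows. The checks this implies can be sketched as below, in Python for illustration only (the assignment expects SAS). The column names `Crash ID` and `Number Fatalities` are assumptions and should be confirmed against the data dictionary; the sample rows are made up.

```python
from collections import Counter
from typing import Dict, List


def check_merge(crashes: List[Dict], fatalities: List[Dict], key: str = "Crash ID") -> Dict[str, list]:
    """Report problems that would make a one-to-many merge unreliable."""
    crash_counts = Counter(row[key] for row in crashes)
    fatality_counts = Counter(row[key] for row in fatalities)
    return {
        # The key must be unique on the "one" side.
        "duplicate_crash_keys": sorted(k for k, n in crash_counts.items() if n > 1),
        # Every fatality should belong to a recorded crash, and vice versa.
        "fatalities_without_crash": sorted(set(fatality_counts) - set(crash_counts)),
        "crashes_without_fatality": sorted(set(crash_counts) - set(fatality_counts)),
        # The recorded death count should equal the number of matching fatality rows.
        "count_mismatch": sorted(
            row[key]
            for row in crashes
            if row.get("Number Fatalities") is not None
            and row["Number Fatalities"] != fatality_counts.get(row[key], 0)
        ),
    }


def merge_one_to_many(crashes: List[Dict], fatalities: List[Dict], key: str = "Crash ID") -> List[Dict]:
    """Attach crash details to each fatality row that has a matching crash."""
    by_key = {row[key]: row for row in crashes}
    return [{**by_key[f[key]], **f} for f in fatalities if f[key] in by_key]


# Made-up rows.
crashes = [{"Crash ID": 1, "Number Fatalities": 2}, {"Crash ID": 2, "Number Fatalities": 1}]
fatalities = [{"Crash ID": 1, "Age": 30}, {"Crash ID": 1, "Age": 41}, {"Crash ID": 3, "Age": 19}]
print(check_merge(crashes, fatalities))
print(len(merge_one_to_many(crashes, fatalities)))
```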

[A|E]
Using the provided ABS time series data for the state populations, compare the number of fatalities and fatality rates per capita (deaths per 100,000 persons) by state over time. Describe the steps you have taken to produce the results. Are Australian State roads becoming safer according to these statistics? Include appropriate graphs to support your answer.
HINT: for the ABS population dataset you can delete unnecessary rows or columns in Excel before reading it into SAS.
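
The rate calculation itself is deaths divided by population, multiplied by 100,000. A minimal sketch follows, in Python for illustration only (the assignment expects SAS). The function names, the (state, year) keys, and the figures are made up, and which quarter's population represents a given year is a choice left to the analyst.

```python
from typing import Dict, Tuple


def fatality_rate_per_100k(deaths: int, population: float) -> float:
    """Deaths per 100,000 persons."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 100_000


def rates_by_state_year(
    deaths: Dict[Tuple[str, int], int],
    population: Dict[Tuple[str, int], float],
) -> Dict[Tuple[str, int], float]:
    """Compute a rate for every (state, year) key present in both inputs."""
    return {
        key: fatality_rate_per_100k(count, population[key])
        for key, count in deaths.items()
        if key in population
    }


# Made-up figures, for illustration only.
deaths = {("State A", 2000): 300, ("State A", 2019): 240}
population = {("State A", 2000): 6_000_000, ("State A", 2019): 8_000_000}
for key, rate in rates_by_state_year(deaths, population).items():
    print(key, round(rate, 2))
```

Comparing rates rather than raw counts removes the effect of population growth, which matters when states differ greatly in size.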

[E]
The table below contains summary statistics for the road crashes and fatalities. Provide a concise (max 500 words) yet comprehensive report of relationships between different variables for the Royal Commission on Road Safety. Your report should describe the data from different dimensions/perspectives and refer to the specific numbers from the table, so that the Royal Commission is able to understand as many aspects of road safety as possible.

Do not include any recommendations.

 

                                  Crashes                 Fatalities
Factors                        Number  Percent   Number  Females    Males  %Male
Total                          46,631     100%   51,833   14,743   37,063    72%
Crash type
  Multiple                     19,804      42%   23,151    7,401   15,729    68%
  Single                       26,827      58%   28,682    7,342   21,334    74%
Bus involved
  No                           45,823      98%   50,838   14,363   36,449    72%
  Yes                             788     1.7%      973      375      597    61%
Heavy rigid truck involved
  No                           27,070      58%   29,776    8,263   21,492    72%
  Yes                           1,386     3.0%    1,544      437    1,105    72%
Articulated truck involved
  No                           42,204      91%   46,559   13,410   33,129    71%
  Yes                           4,407     9.5%    5,252    1,328    3,917    75%
Time of day
  Day                          26,546      57%   29,499    9,906   19,570    66%
  Night                        20,085      43%   22,334    4,837   17,493    78%
Time of week
  Weekday                      27,622      59%   30,463    9,309   21,134    69%
  Weekend                      19,009      41%   21,370    5,434   15,929    75%

* Percentages that do not add up to 100% are due to missing or unknown data.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd