Calculate an estimate of an award winner's age

Reference no: EM133685008

Data Wrangling and Visualization Assignment

Section A - Academy Award winners

Section B - Rogue waves

For Section A you may use any software of your choice. For Section B you must use SAS. There are two datasets (one for each Section), which are available on the Assessment page:
A1A Academy awards.xls
A1B Rogue waves.csv
Marks are awarded for the following components (not every question requires all three):
[A] Answer / output.
[C] Code that you used to generate output.
[E] Justification / explanation / discussion.
It is assumed that you have read Modules 1, 2, and 3 (sections 1-6), and worked through the corresponding examples and exercises.

If you need to clarify the wording of any of the questions or if you have technical issues you may post in the Discussion Forum on Canvas, but your final submission must consist of your own work in accordance with the Academic Integrity Policy.

Section A - Academy Award winners

The Academy Awards, or "Oscars", are international awards given for meritorious achievement in the film industry. The dataset ‘A1A Academy awards.xls’ contains demographic data on award winners up until the year 2014 and was compiled by Kaggle user fmejia21. In this section you will investigate the demographics of Oscar winners. You may use any software of your choice.
[A]
Complete the complementary Quiz.

[A|E]
Investigate the answer to ONE of the following questions using appropriate tables and/or charts. Write a summary of your findings (approximately 100-200 words).

Age - Calculate an estimate of an award winner's age from their date of birth and year of award. What is the average age and age range of winners? Are there any age differences across minority groups or award categories?

Birthplace - Categorise the place of birth for winners as born in USA or born overseas. What proportion of winners were not born in the USA? Is there any difference in the proportion of winners born in the USA across minority groups or award categories?
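For the Age option, one simple way to form the estimate is to subtract the birth year from the award year. The sketch below illustrates that arithmetic in Python (any software is allowed in Section A). The function name `estimate_age` and the three records are hypothetical and are not taken from the dataset; because the exact date of the ceremony is not used, the estimate may be off by up to one year.

```python
from datetime import date
from statistics import mean


def estimate_age(date_of_birth: date, award_year: int) -> int:
    """Approximate age at the time of the award: award year minus birth year."""
    return award_year - date_of_birth.year


# Hypothetical (date of birth, year of award) records, for illustration only.
winners = [
    (date(1950, 3, 14), 1990),
    (date(1962, 11, 2), 1995),
    (date(1931, 7, 30), 2001),
]

ages = [estimate_age(dob, year) for dob, year in winners]
average_age = mean(ages)             # average age of winners
age_range = (min(ages), max(ages))   # youngest and oldest estimated ages

print(f"Average age: {average_age:.1f}, range: {age_range[0]} to {age_range[1]}")
```

The same summary can then be repeated within each minority group or award category to compare ages across groups.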

Your answer will be assessed on the following elements:
Describes results/findings that answer the question being asked.
Statements are supported by relevant tables or charts as evidence from the data.
Refers to specific quantities (counts, percentages, other statistics) as part of written answer.
Communicates clearly regarding filtered/grouped data or categories when summarising data or making comparisons.
Writes with clarity and organisation using report-style language.

Section B - Rogue waves

A ‘rogue’ or ‘freak’ wave is an abnormally large wave relative to the conditions. Rogue waves are surface waves occurring due to gravity and are not to be confused with tsunamis, which are caused by sudden impacts or shifting of the sea floor. The existence of rogue waves was confirmed in 1995 by the measurement of the Draupner wave off the coast of Norway. There have since been few empirical studies of rogue waves. One study [1] analysed six years of time series data from the South Indian Ocean offshore from South Africa over 1998-2003 and identified over 1500 potential rogue waves, 15 of which were unexpectedly large. The authors hypothesise that these outliers may be actual wave measurements rather than errors.

Due to the growing evidence for rogue waves, the "linear" model is suspected to be insufficient for predicting the likelihood of particularly large rogue waves. The model posits a linear relationship between the maximum wave height in a wave series, hmax, and the significant wave height, hs (the average of the highest third of waves in a series) [2]. The study defines "typical" rogue waves as those whose ratio hmax/hs is greater than 2 and less than 4, and "uncommon" rogue waves as those with hmax/hs > 4.
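The classification above can be sketched as a small function. This is only an illustration of the logic in Python (Section B itself must be completed in SAS); the function name `classify_wave`, the category labels, and the sample values are hypothetical. The text does not say how a ratio of exactly 4 is classified, so the sketch assumes it counts as "typical".

```python
def classify_wave(hmax: float, hs: float) -> str:
    """Classify a wave record by its hmax/hs ratio.

    Assumption: a ratio of exactly 4 is treated as "typical", since the
    source only defines the categories for ratios strictly below or above 4.
    """
    if hs <= 0:
        raise ValueError("hs must be positive to form the ratio hmax/hs")
    ratio = hmax / hs
    if ratio > 4:
        return "uncommon rogue"
    if ratio > 2:
        return "typical rogue"
    return "not rogue"


# Hypothetical (hmax, hs) pairs in metres, for illustration only.
samples = [(3.0, 2.0), (5.0, 2.0), (9.0, 2.0)]
labels = [classify_wave(hmax, hs) for hmax, hs in samples]
print(labels)
```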

For Section B of this assignment, you will analyse the dataset "A1B Rogue waves.csv". The dataset contains measurements from buoys off the coast of Mooloolaba, Queensland, Australia over the period 2017-2019 and was sourced from the Queensland Government website [2]. A time series of waves is measured every half hour. This dataset contains the processed data including variables for hmax and hs.

In this section you must use SAS to clean and analyse the dataset.

[1] Liu, P.C. and Machutchon, K.R., Are there different kinds of rogue waves?, Proceedings of OMAE2006 25th International Conference on Offshore Mechanics and Arctic Engineering, June 4-9, 2006, Hamburg, Germany.

Are there any range errors for the numeric variables? Explain why/why not and explain how you would deal with them.

Remove any erroneous values from the variables hmax and hs. Show a table of appropriate descriptive statistics for these variables.

Generate a new variable for the ratio of hmax/hs. Show the histogram and describe its distribution.

Categorise ratio according to the classification of rogue waves:

Show the frequency table (make sure you use a custom format to label the categories of your new variable). What percentage of these observations are rogue waves?

Create a scatterplot of hmax vs hs and show a reference line that indicates the cutoff for rogue waves (ratio of 2:1). Make sure your graph and axes have titles.

The Rayleigh distribution approximately describes the frequency distribution of wave height. For a given ratio of hmax/hs, the expected frequency is 1 in every ?? waves, where

According to this formula, what is the expected frequency of the wave in the Mooloolaba dataset that has the highest ratio?

Create a scatterplot showing the line of best fit. What is its slope? Explain your workings.

Hint: you can either estimate the no-intercept linear model (e.g., look up "PROC REG") or approximate it with mean(hmax)/mean(hs).
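To see how the two approaches in the hint relate, the sketch below computes both on a tiny made-up sample. It assumes the model is hmax = b × hs with hs on the horizontal axis, in which case the no-intercept least-squares slope is b = Σ(hs·hmax) / Σ(hs²). The code is Python and illustrates the arithmetic only (the assignment answer itself must come from SAS); the function names and data values are hypothetical.

```python
from statistics import mean


def no_intercept_slope(x, y):
    """Least-squares slope of y = b * x with no intercept: sum(x*y) / sum(x*x)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx


def ratio_of_means(x, y):
    """Approximation from the hint: mean(y) / mean(x)."""
    return mean(y) / mean(x)


# Hypothetical wave heights in metres, for illustration only.
hs = [1.0, 2.0, 3.0]
hmax = [1.5, 3.5, 5.0]

slope_fit = no_intercept_slope(hs, hmax)
slope_approx = ratio_of_means(hs, hmax)
print(f"no-intercept fit: {slope_fit:.4f}, ratio of means: {slope_approx:.4f}")
```

The two values are generally close but not identical, because the least-squares fit gives more weight to observations with larger hs.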

Calculate an estimate of an award winners age : STAT6001 Data Wrangling and Visualization Assignment, University of Newcastle - Calculate an estimate of an award winner's age from their date of birth and year
