Create histograms, QQ-norm and box-whisker plots for ELO

Reference no: EM132345133

Graphs Assignment -

General Instructions - There are 5 exercises. You are required to solve at least one exercise in R, and at least one in SAS.

Experimental - Again, you will be allowed to provide one solution using Python. Elaborate on the similarities and differences between Python function definitions and R or IML or Macro language.

Exercise 1 -

Part a - Load the ncaa2018.csv data set and create histograms, QQ-norm and box-whisker plots for ELO. Add a title to each plot, identifying the data.

Part b - A common recommendation to address issues of non-normality is to transform data to correct for skewness. One common transformation is the log transform.

Transform ELO to log(ELO) and produce histograms, box-whisker and qqnorm plots of the transformed values. Are the transformed values more or less skewed than the original? (Note - the log transform is used to correct skewness; it is less useful for correcting kurtosis.)
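As a rough illustration of the comparison Part b asks for, the sketch below computes a moment-based skewness before and after a log transform, using only the Python standard library. The ratings in `elo` are made up for the example and are not taken from ncaa2018.csv; the `skewness` helper is an assumption about which estimator is wanted (population moments, m3 / m2^1.5).

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment over m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed ratings, NOT the real ELO column.
elo = [1200, 1250, 1300, 1320, 1350, 1400, 1480, 1600, 1900, 2400]
log_elo = [math.log(v) for v in elo]

skew_raw = skewness(elo)
skew_log = skewness(log_elo)
print(f"skewness(ELO)      = {skew_raw:.3f}")
print(f"skewness(log(ELO)) = {skew_log:.3f}")
```

For right-skewed positive data like this toy sample, the log compresses the long upper tail, so the second number comes out smaller than the first. Whether the same holds for the real ELO column has to be checked on the data itself.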

Exercise 2 -

Review Exercise 4, Homework 6, where you calculated skewness and kurtosis. We will reproduce the histograms, and add qqnorm and box-whisker plots.

Part a - Use the code below from lecture to draw 1000 samples from the normal distribution.

norm.sample <- rnorm(1000, mean=0, sd=1)

Look up the corresponding r* functions in R for the Cauchy distribution (use location=0, scale=1), and the Weibull distribution (use shape = 1.5). For the double exponential, you can use the *laplace functions from the rmutil library, or you can use rexp(1000) - rexp(1000).

Draw 1000 samples from each of these distributions. Calculate skewness and kurtosis for each sample. You may use your own function, or use the moments library.
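If Python is used for this exercise, the sampling step might look like the sketch below, which relies only on the standard library. The Cauchy draw uses the inverse-CDF formula tan(pi * (u - 0.5)), and the double exponential uses the difference of two exponentials suggested above. The seed, the helper names and the Weibull scale of 1 are assumptions made for the example.

```python
import math
import random

def central_moment(values, k):
    """k-th central moment using population (divide-by-n) weights."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n

def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Non-excess kurtosis: a normal sample gives a value near 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

rng = random.Random(42)  # arbitrary seed, chosen only for repeatability
n = 1000
samples = {
    "Normal": [rng.gauss(0, 1) for _ in range(n)],
    # Standard Cauchy (location 0, scale 1) via the inverse CDF.
    "Cauchy": [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)],
    # weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
    "Weibull": [rng.weibullvariate(1, 1.5) for _ in range(n)],
    # Difference of two unit exponentials, mirroring rexp(1000) - rexp(1000).
    "Double exponential": [rng.expovariate(1) - rng.expovariate(1) for _ in range(n)],
}

for name, x in samples.items():
    print(f"{name}: skewness = {skewness(x):.3f}, kurtosis = {kurtosis(x):.3f}")
```

The Cauchy numbers will swing widely from seed to seed because that distribution has no finite moments, which is part of what the plots in the later parts should make visible.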

Part b - Plot the histograms for each distribution. Use par(mfrow=c(2,2)) in your code chunk to combine the four histograms in a single plot. Add titles to the histograms indicating the distribution. Set the x-axis label to show the calculated skewness and kurtosis, i.e. skewness = ####, kurtosis = ####

par(mfrow=c(2,2))

Part c - Repeat Part b, but with QQ-norm plots.

Part d - Repeat Part b, but with box-whisker plots.

Exercise 3 -

Part a - We will create a series of graphs illustrating how the Poisson distribution approaches the normal distribution with large λ. We will iterate over a sequence of lambda, from 2 to 64, doubling lambda each time. For each 'lambda' draw 1000 samples from the Poisson distribution.

Calculate the skewness of each set of samples, and produce histograms, QQ-norm and box-whisker plots. You can use par(mfrow=c(1,3)) to display all three for one lambda in one line. Add lambda=## to the title of the histogram, and skewness=## to the title of the box-whisker plot.
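The loop over lambda can be sketched in Python as follows, leaving out the plotting. The standard library has no Poisson generator, so the example assumes a small helper, `rpois`, based on Knuth's multiplication method, which is adequate for lambda up to 64. For reference, the theoretical skewness of a Poisson distribution is 1/sqrt(lambda), which the sample values should track.

```python
import math
import random

def skewness(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def rpois(rng, lam):
    """One Poisson draw: count uniforms until their product drops below exp(-lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(1)  # arbitrary seed; rerun with others as Part b suggests
lambdas = [2 ** p for p in range(1, 7)]  # 2, 4, 8, 16, 32, 64
for lam in lambdas:
    draws = [rpois(rng, lam) for _ in range(1000)]
    print(f"lambda={lam}: sample skewness = {skewness(draws):.3f}, "
          f"theoretical = {1 / math.sqrt(lam):.3f}")
```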

Part b - Remember that lambda represents the mean of a discrete (counting) variable. At what size mean is Poisson data no longer skewed, relative to normally distributed data? You might run this 2 or 3 times, with different seeds; this number varies in my experience.

par(mfrow=c(1,3))

If you do this in SAS, create a data table with data columns each representing a different µ. You can see combined histogram, box-whisker and QQ-norm plots for all columns by calling

proc univariate data=Distributions plot;
run;

At what µ is skewness of the Poisson distribution small enough to be considered normal?

Exercise 4 -

Part a - Write a function that accepts a vector vec, a vector of integers, a main axis label and an x axis label. This function should
1. iterate over each element i in the vector of integers
2. produce a histogram for vec, setting the number of bins in the histogram to i
3. label main and x-axis with the specified parameters
4. label the y-axis to read Frequency, bins = and the number of bins.

Hint: You can simplify this function by using the parameter ... - see ?plot or ?hist
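For the Python comparison mentioned in the general instructions, one possible shape for such a function is sketched below. It does not draw anything: it only computes the bin counts and the labels, so that it runs with the standard library alone. The names `bin_counts` and `plot_histograms`, and the returned dictionaries, are illustrative assumptions. The `**kwargs` parameter is the closest Python analogue of R's `...`, passing any extra named arguments through untouched.

```python
def bin_counts(vec, n_bins):
    """Counts for n_bins equal-width bins spanning min(vec)..max(vec)."""
    lo, hi = min(vec), max(vec)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    if width == 0:  # all values identical: everything lands in the first bin
        counts[0] = len(vec)
        return counts
    for v in vec:
        # The maximum value is placed in the last bin, not in a bin of its own.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

def plot_histograms(vec, bins, main, xlab, **kwargs):
    """Build one histogram description per requested bin count."""
    specs = []
    for i in bins:
        specs.append({
            "counts": bin_counts(vec, i),
            "main": main,
            "xlab": xlab,
            "ylab": f"Frequency, bins = {i}",
            **kwargs,  # extra options forwarded as-is, like `...` in R
        })
    return specs

# Made-up thickness values, NOT the hidalgo data.
demo = [0.06, 0.07, 0.07, 0.08, 0.08, 0.08, 0.09, 0.10, 0.11, 0.13]
for spec in plot_histograms(demo, [2, 5], main="Demo", xlab="Thickness (mm)", col="grey"):
    print(spec["ylab"], spec["counts"])
```

In R, a single number passed as `breaks` to `hist` is treated as a suggestion, so the number of bins actually drawn may differ from i unless explicit break points are supplied.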

Part b - Test your function with the hidalgo data set (see below), using bin numbers 12, 36, and 60. You should be able to call your function with something like

plot.histograms(hidalgo.dat[,1],c(12,36,60), main="1872 Hidalgo issue",xlab= "Thickness (mm)")

to plot three different histograms of the hidalgo data set.

If you do this in SAS, write a macro that accepts a table name, a column name, a list of integers, a main axis label and an x axis label. This macro should scan over each element in the list of integers and produce a histogram for each integer value, setting the bin count to the element in the input list, and labeling main and x-axis with the specified parameters. You should label the y-axis to read Frequency, bins = and the number of bins.

Test your macro with the hidalgo data set (see below), using bin numbers 12, 36, and 60. You should be able to call your macro with something like

%plot_histograms(hidalgo, y, 12 36 60, main="1872 Hidalgo issue", xlabel="Thickness (mm)");

to plot three different histograms of the hidalgo data set.

Hint: Assume 12 36 60 resolve to a single macro parameter and use %scan. Your macro definition can look something like

%macro plot_histograms(table_name, column_name, number_of_bins, main="Main", xlabel="X Label");

Exercise 5 -

We've been working with data from Wansink and Payne, Table 1:

Reproducing part of Wansink Table 1 (see attached file)

However, in Homework 2, we also considered the value given in the text

The resulting increase of 168.8 calories (from 268.1 calories . . . to 436.9 calories . . . ) represents a 63.0% increase . . . in calories per serving.

There is a discrepancy between the two values reported for calories per serving, 2006. We will use graphs to attempt to determine which value is most consistent with the rest of the data.
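The size of the discrepancy can be checked with a few lines of arithmetic. The sketch below uses only numbers that appear in this handout: 268.1 and 436.9 from the quoted text, and 384.4 from the table. The variable names and the `pct_increase` helper are made up for the example.

```python
baseline = 268.1     # earlier calories per serving, as quoted in the text
text_2006 = 436.9    # 2006 value given in the text
table_2006 = 384.4   # 2006 value given in Table 1

def pct_increase(old, new):
    """Percentage change from old to new."""
    return 100 * (new - old) / old

print(f"text:  +{text_2006 - baseline:.1f} calories, "
      f"{pct_increase(baseline, text_2006):.1f}% increase")
print(f"table: +{table_2006 - baseline:.1f} calories, "
      f"{pct_increase(baseline, table_2006):.1f}% increase")
```

The text's figures agree with each other (an increase of 168.8 calories, or 63.0%), while the table value implies a noticeably smaller increase, so the arithmetic alone cannot say which 2006 value is right.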

First, consider the relationship between Calories per Serving and Calories per Recipe:

Calories per Serving = Calories per Recipe / Servings per Recipe

Since Servings per Recipe is effectively constant over time (12.4-13.0), we can assume the relationship between Calories per Serving and Calories per Recipe is linear,

Calories per Serving = β0 + β1 × Calories per Recipe

with Servings per Recipe = 1/β1
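To see why the slope can be read as 1 / Servings per Recipe, the sketch below fits an ordinary least-squares line to synthetic points generated with a fixed number of servings. Everything numeric here is hypothetical: the value 12.7 is simply picked from inside the 12.4-13.0 range, and the recipe calories are invented, not the Wansink data. `fit_line` is a plain textbook least-squares helper, not the course's code.

```python
def fit_line(x, y):
    """Ordinary least squares for y = beta0 + beta1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

servings = 12.7  # hypothetical constant servings per recipe
calories_per_recipe = [2000.0, 2500.0, 3000.0, 3500.0]  # made-up values
calories_per_serving = [c / servings for c in calories_per_recipe]

beta0, beta1 = fit_line(calories_per_recipe, calories_per_serving)
print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.5f}, 1/beta1 = {1 / beta1:.2f}")
```

With exactly constant servings the intercept is zero and 1/beta1 recovers the number of servings. With real data, where servings drift slightly, both estimates will only be approximate.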

We will fit a linear model, with Calories per Recipe as the independent variable against two sets of values for Calories per Serving, such that

  • Assumption 1. The value in the table (384.4) is correct.
  • Assumption 2. The value in the text (436.9) is correct.

We use the data:

Part a - Plot the regression. Use points to plot Assumption1 vs CaloriesPerRecipe, and Assumption2 vs CaloriesPerRecipe, on the same graph. Add lines (i.e. abline) to show the fit from the regression. Use different colors for the two assumptions. Which of the two lines appears to best explain the data?

Part b - Produce diagnostic plots of the residuals from both linear models (in R, use residuals(Assumption1.lm)). qqnorm or box-whisker plots will probably be the most effective; there are too few points for a histogram. Use the code below to place two plots side by side. You can produce more than one pair of plots, if you wish.

par(mfrow=c(1,2))

From these plots, which assumption is most likely correct? That is, which assumption produces a linear model that least violates assumptions of normality of the residual errors? Which assumption produces outliers in the residuals?

I've included similar data and linear models for SAS in the SAS template. If you choose SAS, you will need to modify the PROC GLM code to produce the appropriate diagnostic plots.

Attachment:- Graphs Assignment Files.rar


Reviews

len2345133

7/24/2019 9:41:15 PM

Instructions: Please read instructions in the pdf file. All codes should be in the rmd template for all the 4 exercises, no exception. Provide output/results in pdf format. General Instructions - There are 5 exercises, each is worth 10 points. You are required to solve at least one exercise in R, and at least one in SAS. You are required to provide five solutions, each solution will be worth 10 points. For this exercise, you may use whatever graphics library you desire.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd