Write a general Cohen d function to be more generally useful


Reference no: EM132333913

Control Structures Assignment -

There are six exercises. You are required to provide solutions for at least four of the six. You are required to solve at least one exercise in R, and at least one in SAS.

Exercise 1 -

Write a more general Cohen's d function that accepts a wider range of arguments. For convenience, name this function general.d.

The new function should accept two parameters, m and s.

In your function, check for these conditions:

If m is of length 1 and s is length 1, then simply divide m/s - that is, proceed with the calculations as if m = %Diff and s = CV.
If m is of length 2, then calculate the difference and proceed with the calculations.
If m is of length greater than 2, find the difference between the min and max of m and proceed with the calculations.
If s is of length greater than 1, calculate the pooled sd as

$$s_{pooled} = \sqrt{\frac{\sum_{i=1}^{k} s_i^2}{k}}$$
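The four conditions above can be sketched in R roughly as follows. This is a minimal sketch rather than a reference solution: the order of the checks and the use of abs() for the two-value difference are assumptions, since the exercise does not say which sign the difference should carry.

```r
# Sketch of general.d; branch order and abs() are assumptions.
general.d <- function(m, s) {
  # Pool the standard deviations when more than one is supplied:
  # s_pooled = sqrt(sum(s_i^2) / k)
  if (length(s) > 1) {
    s <- sqrt(sum(s^2) / length(s))
  }
  # Reduce m to a single difference
  if (length(m) == 2) {
    m <- abs(m[1] - m[2])
  } else if (length(m) > 2) {
    m <- max(m) - min(m)
  }
  # length(m) == 1: m is already a difference (m = %Diff, s = CV)
  m / s
}

general.d(5, 10)                      # single m and single s: m/s
general.d(c(10, 15), c(3, 4))         # difference over pooled sd
general.d(c(10, 12, 15), c(3, 4, 5))  # range of m over pooled sd
```

Pooling s before reducing m keeps the two arguments independent, so any combination of lengths falls through to the same final division.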

Exercise 2 -

Previously, we've calculated required replicates based on the z distribution. In this exercise, you will calculate required replicates based on the t distribution. You must implement one of the two algorithms given below. For both algorithms, calculate degrees of freedom as ν = n × k - k, where n is the current estimate for required replicates, and let k = 2.

Algorithm 1 (from Cochran and Cox, Experimental Design)

Use the formula:

$$n \ge 2 \left(\frac{CV}{\%Diff}\right)^2 \left(t_{\alpha/2,\nu} + t_{\beta,\nu}\right)^2$$

1. Start with a small n, say, 2.

2. Calculate critical t_α/2 and t_β quantiles with ν d.f., then calculate required replicates. Label this n_current.

3. Update ν using n_current, then recalculate critical values and required replicates. Label this n_next.

4. If n_current = n_next then the algorithm has converged. Otherwise, set n_current to n_next, and repeat steps 2-3.

5. If the algorithm hasn't converged after some sufficiently large number of iterations (say, 20), print a message and return the larger of n_current and n_next.
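One way the five steps of Algorithm 1 might look in R is sketched below. The function name replicates.cc, the defaults alpha = 0.05 and beta = 0.8, rounding up with ceiling(), and the floor of n = 2 (to keep ν positive) are all assumptions made for illustration. Here beta is treated as the desired power, so that qt(beta, nu) is positive; the argument d stands for %Diff/CV.

```r
# Sketch of Algorithm 1; names, defaults and rounding are assumptions.
replicates.cc <- function(d, alpha = 0.05, beta = 0.8, k = 2, max.iter = 20) {
  # One evaluation of the formula for a given n; (CV/%Diff)^2 = 1/d^2
  next.n <- function(n) {
    nu <- n * k - k
    n.req <- 2 * (1 / d)^2 * (qt(1 - alpha / 2, nu) + qt(beta, nu))^2
    max(2, ceiling(n.req))  # never drop below 2, so nu stays positive
  }
  n.current <- next.n(2)            # steps 1-2: start small
  n.prev <- n.current
  n.next <- n.current
  for (i in seq_len(max.iter)) {
    n.next <- next.n(n.current)     # step 3: update nu and recalculate
    if (n.next == n.current) {
      return(n.current)             # step 4: converged
    }
    n.prev <- n.current
    n.current <- n.next
  }
  # step 5: no convergence within max.iter iterations
  message("No convergence after ", max.iter, " iterations")
  max(n.prev, n.next)
}

replicates.cc(1)
replicates.cc(0.5)
```

Because the first estimate at n = 2 uses very heavy-tailed t quantiles, it overshoots, and later iterations approach the answer from both sides; that is why the sketch tracks the previous value for the non-convergent case.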

Algorithm 2 -

1. Start with a small n, say, 2.

2. Calculate critical t_α quantile using the central t distribution with ν d.f.

3. Estimate Type II error (p-value) under the alternate hypothesis using the non-central t distribution with ν d.f, at the critical t from 2. Calculate non-centrality parameter as

$$NCP = \frac{\%Diff}{CV}\sqrt{\frac{n}{2}}$$

4. If the resulting error is less than 1 - β, accept the current value of n. Otherwise increment n and repeat 2-3.

5. If desired power is not achieved after a large number of iterations (say, 1000), terminate the calculations and return NA.

Implement the algorithm as a function or macro named required.replicates.t, with parameters mu, sigma and an optional parameter k. Test your function by comparing with required replicates from prior exercises for calories per serving, 1936 versus 2006, 1936 vs 1997 and 1997 vs 2006.
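A hedged sketch of Algorithm 2 under the required name follows. Several details are assumptions rather than part of the assignment: the defaults alpha = 0.05 and beta = 0.8 (read as desired power, so 1 - beta is the tolerated Type II error), the two-sided critical value at 1 - alpha/2, the way an effect size is derived from mu and sigma as a stand-in for %Diff/CV, and counting only the upper rejection tail.

```r
# Sketch of Algorithm 2; defaults and the effect-size calculation are assumptions.
required.replicates.t <- function(mu, sigma, k = 2, alpha = 0.05, beta = 0.8,
                                  max.iter = 1000) {
  # Stand-in for %Diff/CV: mean difference over pooled standard deviation
  d <- abs(mu[1] - mu[2]) / sqrt(mean(sigma^2))
  n <- 2                                   # step 1
  for (i in seq_len(max.iter)) {
    nu <- n * k - k
    t.crit <- qt(1 - alpha / 2, nu)        # step 2: central t quantile
    ncp <- d * sqrt(n / 2)                 # step 3: non-centrality parameter
    type2 <- pt(t.crit, nu, ncp = ncp)     # P(t < t.crit) under the alternative
    if (type2 < 1 - beta) {
      return(n)                            # step 4: power reached
    }
    n <- n + 1
  }
  NA                                       # step 5: give up
}

required.replicates.t(mu = c(0, 1), sigma = c(1, 1))
```

Unlike Algorithm 1, this search only moves upward one replicate at a time, so it cannot oscillate; the cost is more iterations when the effect size is small.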

For either algorithm, you might consider starting with an initial value of n calculated using the z critical values as before. Can you be certain that the z formula will not estimate more required replicates than the t algorithm?

Exercise 3 -

Calculate a cumulative probability value from the normal pdf, using the Newton-Cotes formula

$$\int_{x_0}^{x_n} f(x)\,dx \approx \sum_{i=0}^{n} h f(x_i)$$

where $x_0, \dots, x_n$ is a sequence of evenly spaced numbers from -2 to 2, with $x_i = x_0 + hi$, n is the number of $x_i$ in the sequence and step size $h = (x_n - x_0)/n$.

We will calculate this integral by calculating successive approximations of f = L(x; 0, 1) = norm.pdf over series of x with increasingly smaller step sizes.

Part a - Calculate L₀ by summing over L(X₀), where X₀ is a series from x₀ = -2, . . . , x_n = 2 incremented by h₀ = 0.1. Multiply this sum by h₀ for an approximate $\int_{x_0}^{x_n} L(x)\,dx$.

Think of this as the sum of a series of rectangles, each h wide and a height given by the normal pdf.

Part b - Create a second series X₁ by setting h₁ = h₀/2. Compute L₁ from this series as in part a. Let i = 1. You now have the area of twice as many rectangles as in part a, but each is half as wide.

Part c - Compute $\delta = |L_i - L_{i-1}|$. If δ < 0.0001, your sequence of iterations has converged on a solution for L. Finish with Part d. Otherwise, increment i, let $h_i = h_{i-1}/2$. Create the next series X_i and compute the next L_i.

Hint: code this first as a for loop of a small number of i until you know your code will converge toward a solution.

Part d - Report i, n and h.
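Parts a through d can be sketched as a short loop. In this sketch the helper name rect.sum is hypothetical, R's built-in dnorm stands in for the course's norm.pdf, and the cap of 20 halvings is an added safety limit; none of these come from the assignment itself.

```r
# Sketch of Parts a-d; rect.sum, dnorm and the iteration cap are assumptions.
rect.sum <- function(h, lower = -2, upper = 2) {
  x <- seq(lower, upper, by = h)  # evenly spaced x_0, ..., x_n
  h * sum(dnorm(x))               # sum of rectangles, each h wide
}

h <- 0.1
L.prev <- rect.sum(h)             # Part a: L_0
i <- 0
delta <- Inf
while (delta >= 0.0001 && i < 20) {
  i <- i + 1
  h <- h / 2                      # Parts b-c: halve the step size
  L.cur <- rect.sum(h)
  delta <- abs(L.cur - L.prev)    # change between successive estimates
  L.prev <- L.cur
}
n <- length(seq(-2, 2, by = h))

# Part d: report i, n and h, plus the final estimate
c(i = i, n = n, h = h, L = L.cur)
```

Note that the stopping rule compares successive estimates, not the estimate with the true value, so the final error can be somewhat larger than the last δ.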

To check your results, compare your final L_i to

pnorm(2, lower.tail = TRUE) - pnorm(-2, lower.tail = TRUE)

## [1] 0.9544997

Is your estimate within 0.0001 of this value?

You might find it useful to produce staircase plots for the first 2-4 iterations (plot L_i vs X_i on one graph). You might also find it interesting to plot δ or L versus i or h. You can create vectors to hold the intermediate steps - 10 iterations should be enough. How many iterations might it take to get within 0.000001 of the expected value from R?

Exercise 4 -

Part a - Write a function to compute mean, standard deviation, skewness and kurtosis from a single vector of numeric values. You can use the built-in mean function, but must use one (and only one) for loop to compute the rest. Be sure to include a check for missing values. Note that computationally efficient implementations of moments calculations take advantage of $(Y_i - \bar{Y})^4 = (Y_i - \bar{Y}) \times (Y_i - \bar{Y})^3$, etc.

Your function should return a list with Mean, SD, Skewness and Kurtosis. If you use IML, you will need to implement this as a subroutine and use call by reference; include these variables in the parameter list.

Part b - Test your function by computing moments for Price from pumpkins.csv, for ELO from elo.csv or the combined observations from SiRstvt. I find that ELO shows both skewness and kurtosis, Price is kurtotic but not skewed, while the SiRstvt observations are approximately normal.

If you wish, compare your function results with the skewness and kurtosis in the moments package. This package also implements tests of significance for skewness and kurtosis.
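A minimal R sketch of Part a is shown below. The function name moments.loop is hypothetical, and two choices are assumptions: missing values are silently dropped rather than raising an error, and skewness and kurtosis divide the central moments by n (non-excess kurtosis), which may differ from the convention your course expects.

```r
# Sketch of Part a; NA handling and the divide-by-n moments are assumptions.
moments.loop <- function(y) {
  y <- y[!is.na(y)]               # check for (and drop) missing values
  n <- length(y)
  y.bar <- mean(y)
  m2 <- 0
  m3 <- 0
  m4 <- 0
  for (i in seq_len(n)) {         # the single permitted loop
    dev <- y[i] - y.bar
    dev2 <- dev * dev
    dev3 <- dev2 * dev            # reuse the lower power
    m2 <- m2 + dev2
    m3 <- m3 + dev3
    m4 <- m4 + dev3 * dev         # (Y - Ybar)^4 = (Y - Ybar) * (Y - Ybar)^3
  }
  list(Mean = y.bar,
       SD = sqrt(m2 / (n - 1)),
       Skewness = (m3 / n) / (m2 / n)^1.5,
       Kurtosis = (m4 / n) / (m2 / n)^2)
}

moments.loop(c(1, 2, 3, 4, NA, 10))
```

Accumulating the three sums in one pass is what satisfies the one-loop restriction; the final ratios are computed once, after the loop.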

Exercise 5 -

In this exercise, we will use run-time profiling and timing to compare the speed of execution for different functions or calculations. In general, the algorithm will be

1. Write a loop to execute a large number of iterations. I find 10⁶ to be useful; you might start with a smaller number as you develop your code.

2. In this loop, call a function or perform a calculation. You don't need to use or print the results, just assign the result to a local variable.

3. Repeat 1 and 2, but with a different function or formula.

4. Repeat steps 1-3 10 times, saving the time of execution for each pair of the 10 tests. Calculate mean, standard deviation and effect size for the two methods tested.

If you choose R, I've included framework code using Rprof; I've included framework code for IML in the SAS template.

Test options - In homework, you were given two formulas for the Poisson pmf,

$$f(x; \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \exp(-\lambda)\,\frac{1}{x!}\,\exp[x \log(\lambda)]$$

Compare the computational efficiency of these two formulas.
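The four timing steps, applied to the two Poisson formulas, might be sketched like this. The sketch uses system.time rather than the Rprof framework mentioned above, purely to stay self-contained; the names pois.a, pois.b and time.one, the test point x = 4, lambda = 2, and the reduced iteration count are all assumptions.

```r
# Sketch of steps 1-4 with system.time (an assumption; the course uses Rprof).
pois.a <- function(x, lambda) exp(-lambda) * lambda^x / factorial(x)
pois.b <- function(x, lambda) exp(-lambda) * (1 / factorial(x)) * exp(x * log(lambda))

# Steps 1-2: run f many times, assigning the result to a throwaway variable
time.one <- function(f, iters) {
  elapsed <- system.time(for (j in seq_len(iters)) tmp <- f(4, 2))
  unname(elapsed["elapsed"])
}

iters <- 1e4   # kept small here; increase toward 1e6 for real comparisons
reps <- 10
t.a <- numeric(reps)
t.b <- numeric(reps)
for (r in seq_len(reps)) {          # step 4: repeat the pair of tests 10 times
  t.a[r] <- time.one(pois.a, iters)
  t.b[r] <- time.one(pois.b, iters) # step 3: same loop, second formula
}

# Mean, standard deviation and effect size for the two methods
pooled <- sqrt((var(t.a) + var(t.b)) / 2)
effect <- (mean(t.a) - mean(t.b)) / pooled  # not finite if all timings tie
c(mean.a = mean(t.a), sd.a = sd(t.a), mean.b = mean(t.b), sd.b = sd(t.b), d = effect)
```

Timings vary from run to run and machine to machine, so the effect size is only meaningful when the iteration count is large enough for the elapsed times to be well above the timer resolution.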

Create a sequence x of numbers from -3 to 3 of length 10⁶ or so. In the first test, determine the amount of time it takes to compute 10⁵ estimates of norm.pdf by visiting each element of x in a loop. In the second test, simply pass x as an argument to norm.pdf. Does R or IML optimize vector operations?

The mathematical statement √x can be coded as either sqrt(x) or x^(1/2). Similarly, e^x can be written as exp(1)^x or exp(x). These pairs are mathematically equivalent, but are they computationally equivalent? Write two test loops to compare formulas with either √x or e^x of some form (the normal pdf, perhaps).

Exercise 6 -

Write an improved Poisson pmf function, call this function smart.pois, using the same parameters x and lambda as before, but check x for the following conditions:

1. If x is negative, return a missing value (NA in R, . in SAS).

2. If x is non-integer, truncate x, then proceed.

3. If x is too large for the factorial function, return the smallest possible numeric value for your machine.

What x is too large? You could test the return value of factorial against Inf.

You can reuse previously tested code by writing this function as a wrapper for a previously written pois.pmf, calling that function only after testing for the specified conditions.

Test this function by repeating the plots from Homework 4, Ex 4. How is the function different from dpois?

Warning: You may not be able to call this new function exactly as in the last exercise (Hint - what are the rules for conditions in if statements?). Instead, you might need to create a matrix or data table and use apply functions, or write a loop to visit each element in a vector of x.
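The wrapper and the element-by-element call can be sketched as follows. The pois.pmf defined here is only a stand-in for the previously written function, and taking .Machine$double.xmin as "the smallest possible numeric value" is an assumption; your course may intend a different constant.

```r
# Sketch of smart.pois; pois.pmf is a stand-in and double.xmin is an assumption.
pois.pmf <- function(x, lambda) exp(-lambda) * lambda^x / factorial(x)

smart.pois <- function(x, lambda) {
  if (x < 0) {
    return(NA_real_)                       # condition 1: negative x
  }
  x <- trunc(x)                            # condition 2: drop the fraction
  if (is.infinite(suppressWarnings(factorial(x)))) {
    return(.Machine$double.xmin)           # condition 3: factorial overflows
  }
  pois.pmf(x, lambda)                      # safe to call the tested function
}

# if() expects a single TRUE/FALSE, so visit each element with sapply
x <- c(-1, 0, 2.7, 5, 200)
p <- sapply(x, smart.pois, lambda = 3)
p
```

Passing the whole vector x straight to smart.pois would hand if() a condition of length greater than one, which is the problem the warning above hints at; sapply sidesteps it by calling the function once per element.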

Note - Just do 4 exercises in R and 1 in SAS.

Attachment:- Control Structures Assignment Files.rar

You are required to provide five solutions; each solution will be worth 10 points. Thus, you may choose to provide both R and SAS solutions for a single exercise, or you may solve five of the six problems, mixing the languages as you wish. Warning: I will continue restricting the use of external libraries in R, particularly tidyverse libraries. You may choose to use ggplot2, but take care that the plots you produce are at least as readable as the equivalent plots in base R. You will be allowed to use whatever libraries tickle your fancy in the midterm and final projects.
