Write a general Cohen d function to be more generally useful

Reference no: EM132333913

Control Structures Assignment -

There are six exercises. You are required to provide solutions for at least four of the six. You are required to solve at least one exercise in R, and at least one in SAS.

Exercise 1 -

Write a general Cohen d function to be more generally useful, accepting a wider range of arguments. For convenience, name this general.d.

The new function should accept two parameters, m and s.

In your function, check for these conditions:

  • If m is of length 1 and s is length 1, then simply divide m/s - that is, proceed with the calculations as if m = %Diff and s = CV.
  • If m is of length 2, then calculate the difference and proceed with the calculations.
  • If m is of length greater than 2, find the difference between the min and max of m and proceed with the calculations.
  • If s is of length greater than 1, calculate the pooled sd as

$s_{pooled} = \sqrt{\sum_{i=1}^{k} s_i^2 / k}$

where k is the number of standard deviations in s.
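The conditions above can be sketched in R as follows. This is a minimal sketch, not a definitive solution: it assumes the difference for two means is taken as an absolute value, and that s is pooled before any division whenever it has more than one element.

```r
# Sketch of general.d; the branching follows the bullet list above.
# Assumption: the difference of two means is reported as an absolute value.
general.d <- function(m, s) {
  # Pool the standard deviations when more than one is supplied:
  # s_pooled = sqrt(sum(s_i^2) / k), with k = length(s).
  if (length(s) > 1) {
    s <- sqrt(sum(s^2) / length(s))
  }
  # Reduce m to a single difference.
  if (length(m) == 1) {
    difference <- m                  # m is already a difference (e.g. %Diff)
  } else if (length(m) == 2) {
    difference <- abs(m[1] - m[2])   # difference of two means
  } else {
    difference <- max(m) - min(m)    # range when more than two means are given
  }
  difference / s
}

general.d(10, 5)                 # single difference and single sd
general.d(c(3, 1), 2)            # two means, one sd
general.d(c(1, 5, 3), c(2, 2))   # several means, pooled sd
```

Checking the length of s first keeps the three cases for m independent of how the standard deviation was supplied.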

Exercise 2 -

Previously, we've calculated required replicates based on the z distribution. In this exercise, you will calculate required replicates based on the t distribution. You must implement one of two algorithms given below. For both algorithms, calculate degrees of freedom as ν = n × k - k, where n is the current estimate for required replicates, and let k = 2.

Algorithm 1 (from Cochran and Cox, Experimental Design)

Use the formula:

$n \ge 2 \times (CV / \%Diff)^2 \times (t_{\alpha/2,\nu} + t_{\beta,\nu})^2$

1. Start with a small n, say, 2.

2. Calculate critical $t_{\alpha/2}$ and $t_{\beta}$ quantiles with ν d.f., then calculate required replicates. Label this $n_{current}$.

3. Update ν using $n_{current}$, then recalculate critical values and required replicates. Label this $n_{next}$.

4. If $n_{current} = n_{next}$ then the algorithm has converged. Otherwise, set $n_{current}$ to $n_{next}$, and repeat 2-3.

5. If after some sufficiently large number of iterations (say, 20) the algorithm hasn't converged, print a message and return the larger of $n_{current}$ and $n_{next}$.
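The five steps of Algorithm 1 can be sketched in R as below. The function name replicates.t.cc and its arguments pct.diff and cv are hypothetical (the assignment asks for required.replicates.t with mu and sigma). The defaults alpha = 0.05 and beta = 0.2 are assumptions, as are rounding n up to a whole number and never letting it drop below 2 so that ν stays positive.

```r
# Sketch of Algorithm 1 (iterating the Cochran and Cox formula).
# Assumptions: beta is the Type II error rate, n is rounded up with
# ceiling(), and n is kept at 2 or more so the degrees of freedom are > 0.
replicates.t.cc <- function(pct.diff, cv, alpha = 0.05, beta = 0.2,
                            k = 2, max.iter = 20) {
  # Required replicates given the degrees of freedom implied by n.
  n.for <- function(n) {
    nu <- n * k - k
    t.alpha <- qt(1 - alpha / 2, df = nu)   # upper alpha/2 quantile
    t.beta <- qt(1 - beta, df = nu)         # upper beta quantile
    max(2, ceiling(2 * (cv / pct.diff)^2 * (t.alpha + t.beta)^2))
  }
  n.current <- n.for(2)                     # steps 1-2: start from n = 2
  n.last <- n.current
  for (iter in seq_len(max.iter)) {
    n.next <- n.for(n.current)              # step 3
    if (n.next == n.current) {
      return(n.current)                     # step 4: converged
    }
    n.last <- n.current
    n.current <- n.next
  }
  # Step 5: no convergence, report and return the larger estimate.
  message("No convergence after ", max.iter, " iterations")
  max(n.last, n.current)
}

replicates.t.cc(pct.diff = 1, cv = 1)
```

Because n is an integer, the iteration can oscillate between two neighbouring values instead of settling, which is why step 5 returns the larger of the last two estimates.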

Algorithm 2 -

1. Start with a small n, say, 2.

2. Calculate the critical $t_{\alpha}$ quantile using the central t distribution with ν d.f.

3. Estimate Type II error (p-value) under the alternate hypothesis using the non-central t distribution with ν d.f., at the critical t from 2. Calculate the non-centrality parameter as

$NCP = \frac{\%Diff}{CV} \sqrt{n/2}$

4. If the resulting error is less than 1 - β, accept the current value of n. Otherwise increment n and repeat 2-3.

5. If desired power is not achieved after a large number of iterations (say, 1000), terminate the calculations and return NA.
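Algorithm 2 might look like the following in R. This is a sketch under stated assumptions: the name replicates.t.nct and the arguments pct.diff and cv are hypothetical, β is treated as the Type II error rate (so n is accepted once the estimated error is at most β, that is, once power reaches 1 - β), and the critical value is two-sided to match Algorithm 1. Adjust the threshold and the quantile if your course defines them differently.

```r
# Sketch of Algorithm 2 (search over n using the non-central t distribution).
# Assumptions: beta is the Type II error rate and the test is two-sided.
replicates.t.nct <- function(pct.diff, cv, alpha = 0.05, beta = 0.2,
                             k = 2, max.iter = 1000) {
  n <- 2                                      # step 1
  for (iter in seq_len(max.iter)) {
    nu <- n * k - k
    t.crit <- qt(1 - alpha / 2, df = nu)      # step 2: central t quantile
    ncp <- (pct.diff / cv) * sqrt(n / 2)      # non-centrality parameter
    # Step 3: probability of falling below the critical value when the
    # alternative is true, i.e. the estimated Type II error.
    type2 <- pt(t.crit, df = nu, ncp = ncp)
    if (type2 <= beta) {
      return(n)                               # step 4: power reached
    }
    n <- n + 1
  }
  NA                                          # step 5: give up
}

replicates.t.nct(pct.diff = 1, cv = 1)
```

Starting the search from a z-based estimate instead of n = 2 would shorten the loop, but only if that estimate is not already larger than the answer.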

Implement the algorithm as a function or macro named required.replicates.t, with parameters mu, sigma and an optional parameter k. Test your function by comparing with required replicates from prior exercises for calories per serving, 1936 versus 2006, 1936 vs 1997 and 1997 vs 2006.

For either algorithm, you might consider starting with an initial value of n calculated using the z critical values as before. Can you be certain that the z formula will not estimate more required replicates than the t algorithm?

Exercise 3 -

Calculate a cumulative probability value from the normal pdf, using the Newton-Cotes formula

$\int_{x_0}^{x_n} f(x)\,dx \approx \sum_{i=0}^{n} h f(x_i)$

where $x_1, \ldots, x_n$ are a sequence of evenly spaced numbers from -2 to 2, with $x_i = x_0 + h i$, n is the number of $x_i$ in the sequence and step size $h = (x_n - x_0)/n$.

We will calculate this integral by calculating successive approximations of f = L(x; 0, 1) = norm.pdf over series of x with increasingly smaller step sizes.

Part a - Calculate $L_0$ by summing over $L(X_0)$, where $X_0$ is a series from $x_0 = -2, \ldots, x_n = 2$ incremented by $h_0 = 0.1$. Multiply this sum by $h_0$ for an approximate $\int_{x_0}^{x_n} L(x)\,dx$.

Think of this as the sum of a series of rectangles, each h wide and a height given by the normal pdf.

Part b - Create a second series $X_1$ by setting $h_1 = h_0/2$. Compute $L_1$ from this series as in part a. Let i = 1. You now have the area of twice as many rectangles as part a, but each is half as wide.

Part c - Compute $\delta = |L_i - L_{i-1}|$. If δ < 0.0001, your sequence of iterations has converged on a solution for L. Finish with Part d. Otherwise, increment i, let $h_i = h_{i-1}/2$. Create the next series $X_i$ and compute the next $L_i$.

Hint: code this first as a for loop of a small number of i until you know your code will converge toward a solution.

Part d - Report i, n and h.
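Parts a through d can be sketched as a single loop in R. The helper names norm.pdf, rect.area and integrate.rect are assumptions for illustration; the sketch uses a bounded for loop, as the hint suggests, and stops early once δ falls below the tolerance.

```r
# Normal pdf written out directly (assumed to match the course's norm.pdf).
norm.pdf <- function(x, mu = 0, sigma = 1) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

# Area of rectangles of width h under the pdf between lower and upper.
rect.area <- function(h, lower = -2, upper = 2) {
  x <- seq(lower, upper, by = h)
  h * sum(norm.pdf(x))
}

# Halve the step size until successive areas differ by less than tol.
integrate.rect <- function(h0 = 0.1, tol = 0.0001, max.iter = 20) {
  h <- h0
  L.prev <- rect.area(h)                  # Part a
  for (i in seq_len(max.iter)) {
    h <- h / 2                            # Parts b and c: halve the step
    L.curr <- rect.area(h)
    delta <- abs(L.curr - L.prev)
    if (delta < tol) break                # converged
    L.prev <- L.curr
  }
  # Part d: report i, n and h, plus the estimate itself.
  list(i = i, n = length(seq(-2, 2, by = h)), h = h, L = L.curr, delta = delta)
}

result <- integrate.rect()
result
```

Note that δ measures the change between successive estimates, not the distance from the true value, so a small δ does not by itself guarantee the same accuracy.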

To check your results, compare your final $L_i$ to

pnorm(2, lower.tail = TRUE)-pnorm(-2, lower.tail = TRUE)

## [1] 0.9544997

Is your estimate within 0.0001 of this value?

You might find it useful to produce staircase plots for the first 2-4 iterations (plot $L_i$ vs $X_i$ on one graph). You might also find it interesting to plot δ or L versus i or h. You can create vectors to hold the intermediate steps - 10 iterations should be enough. How many iterations might it take to get within 0.000001 of the expected value from R?

Exercise 4 -

Part a - Write a function to compute mean, standard deviation, skewness and kurtosis from a single vector of numeric values. You can use the built-in mean function, but must use one (and only one) for loop to compute the rest. Be sure to include a check for missing values. Note that computationally efficient implementations of moments calculations take advantage of $(Y_i - \bar{Y})^4 = (Y_i - \bar{Y}) \times (Y_i - \bar{Y})^3$, etc.

Your function should return a list with Mean, SD, Skewness and Kurtosis. If you use IML, you will need to implement this as a subroutine and use call by reference; include these variables in the parameter list.
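A single-loop version of Part a might look like this in R. The name moments.loop is hypothetical, and the definitions are assumptions: SD uses the n - 1 denominator (as R's sd does), while skewness and kurtosis use the population moments $m_3 / m_2^{3/2}$ and $m_4 / m_2^2$, with kurtosis not reduced by 3.

```r
# Sketch: mean, SD, skewness and kurtosis with one for loop.
# Assumptions: missing values are dropped; kurtosis is not excess kurtosis.
moments.loop <- function(y) {
  y <- y[!is.na(y)]            # check for (and drop) missing values
  n <- length(y)
  y.bar <- mean(y)
  sum2 <- 0
  sum3 <- 0
  sum4 <- 0
  for (yi in y) {
    d <- yi - y.bar
    d2 <- d * d                # (Yi - Ybar)^2
    d3 <- d * d2               # (Yi - Ybar)^3, reusing the square
    sum2 <- sum2 + d2
    sum3 <- sum3 + d3
    sum4 <- sum4 + d * d3      # (Yi - Ybar)^4, reusing the cube
  }
  m2 <- sum2 / n
  m3 <- sum3 / n
  m4 <- sum4 / n
  list(Mean = y.bar,
       SD = sqrt(sum2 / (n - 1)),
       Skewness = m3 / m2^1.5,
       Kurtosis = m4 / m2^2)
}

moments.loop(c(1, 2, 3, 4, 5, NA))
```

Each power is built from the previous one inside the loop, which is the reuse the note about $(Y_i - \bar{Y})^4$ describes.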

Part b - Test your function by computing moments for Price from pumpkins.csv, for ELO from elo.csv or the combined observations from SiRstvt. I find that ELO shows both skewness and kurtosis, Price is kurtotic but not skewed, while SiRstvt are approximately normal.

If you wish, compare your function results with the skewness and kurtosis in the moments package. This package also implements tests of significance for skewness and kurtosis.

Exercise 5 -

In this exercise, we will use run-time profiling and timing to compare the speed of execution for different functions or calculations. In general, the algorithm will be:

1. Write a loop to execute a large number of iterations. I find $10^6$ to be useful; you might start with a smaller number as you develop your code.

2. In this loop, call a function or perform a calculation. You don't need to use or print the results, just assign the result to a local variable.

3. Repeat 1 and 2, but with a different function or formula.

4. Repeat steps 1-3 10 times, saving the time of execution for each pair of the 10 tests. Calculate mean, standard deviation and effect size for the two methods tested.
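The four steps can be sketched in R with system.time, which is swapped in here for the Rprof framework code mentioned in the assignment. The helper time.calls and the two functions being compared are illustrative assumptions, and the iteration count is kept small so the sketch runs quickly.

```r
# Sketch: time a function over many iterations, repeated several times.
# Assumption: system.time() is used instead of the course's Rprof framework.
time.calls <- function(f, iterations = 1e4, repeats = 10) {
  elapsed <- numeric(repeats)
  for (r in seq_len(repeats)) {
    elapsed[r] <- system.time(
      for (i in seq_len(iterations)) tmp <- f()   # result is discarded
    )[["elapsed"]]
  }
  elapsed
}

t1 <- time.calls(function() sqrt(2))
t2 <- time.calls(function() 2^(1 / 2))

# Mean and standard deviation of the 10 timings for each method.
c(mean = mean(t1), sd = sd(t1))
c(mean = mean(t2), sd = sd(t2))

# Effect size: difference in means over the pooled standard deviation.
# This can be NaN or Inf when the timings are too short to vary.
effect.size <- abs(mean(t1) - mean(t2)) / sqrt((var(t1) + var(t2)) / 2)
effect.size
```

Wrapping each calculation in a function adds call overhead to both methods equally, so the comparison stays fair even though the absolute times are inflated.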

If you choose R, I've included framework code using Rprof; I've included framework code for IML in the SAS template.

Test options - In homework, you were given two formulas for the Poisson pmf,

$f(x; \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}$

$= \exp(-\lambda)(1/x!)\exp[x \times \log(\lambda)]$

Compare the computational efficiency of these two formulas.
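Before timing the two Poisson formulas, it helps to confirm they return the same values. The function names below are hypothetical; the sketch checks both forms against each other and against R's built-in dpois.

```r
# Direct form: exp(-lambda) * lambda^x / x!
pois.pmf.direct <- function(x, lambda) {
  exp(-lambda) * lambda^x / factorial(x)
}

# Log form: exp(-lambda) * (1 / x!) * exp(x * log(lambda))
pois.pmf.log <- function(x, lambda) {
  exp(-lambda) * (1 / factorial(x)) * exp(x * log(lambda))
}

x <- 0:10
all.equal(pois.pmf.direct(x, 3), pois.pmf.log(x, 3))
all.equal(pois.pmf.direct(x, 3), dpois(x, 3))
```

Once the two agree, either can be passed to a timing loop without worrying that a speed difference reflects different results.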

  • Create a sequence x of numbers -3 to 3 of length $10^6$ or so. In the first test, determine the amount of time it takes to compute $10^5$ estimates of norm.pdf by visiting each element of x in a loop. In the second test, simply pass x as an argument to norm.pdf. Does R or IML optimize vector operations?
  • The mathematical statement $\sqrt{x}$ can be coded as either sqrt(x) or x^(1/2). Similarly, $e^x$ can be written as exp(1)^x or exp(x). These pairs are mathematically equivalent, but are they computationally equivalent? Write two test loops to compare formulas with either $\sqrt{x}$ or $e^x$ of some form (the normal pdf, perhaps).
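The loop versus vector comparison in the first option can be sketched as follows. The norm.pdf definition is an assumption standing in for the course's version, and the sequence length is reduced to $10^5$ so the sketch runs quickly.

```r
# Normal pdf written out directly (assumed to match the course's norm.pdf).
norm.pdf <- function(x, mu = 0, sigma = 1) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

x <- seq(-3, 3, length.out = 1e5)

# Test 1: visit each element of x in a loop.
loop.time <- system.time({
  y.loop <- numeric(length(x))
  for (i in seq_along(x)) {
    y.loop[i] <- norm.pdf(x[i])
  }
})[["elapsed"]]

# Test 2: pass the whole vector in a single call.
vec.time <- system.time(y.vec <- norm.pdf(x))[["elapsed"]]

all.equal(y.loop, y.vec)   # both approaches give the same values
c(loop = loop.time, vectorized = vec.time)
```

Preallocating y.loop keeps the loop timing from being dominated by repeated vector growth, so the comparison reflects the per-element call cost.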

Exercise 6 -

Write an improved Poisson pmf function, call this function smart.pois, using the same parameters x and lambda as before, but check x for the following conditions.

1. If x is negative, return a missing value (NA in R, . in SAS).

2. If x is non-integer, truncate x then proceed.

3. If x is too large for the factorial function, return the smallest possible numeric value for your machine. What x is too large? You could test the return value of factorial against Inf.

You can reuse previously tested code by writing this function as a wrapper for a previously written pois.pmf, and call that function only after testing for the specified conditions.

Test this function by repeating the plots from Homework 4, Ex 4. How is the function different than dpois?

Warning: You may not be able to call this new function exactly as in the last exercise (Hint - what are the rules for conditions in if statements?). Instead, you might need to create a matrix or data table and use apply functions, or write a loop to visit each element in a vector of x.
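A wrapper along these lines might look like the sketch below. The pois.pmf shown is an assumed stand-in for the previously written function, and .Machine$double.xmin is one reading of "the smallest possible numeric value". Because if expects a single condition, the wrapper handles one x at a time and is applied over a vector with sapply.

```r
# Assumed stand-in for the previously written Poisson pmf.
pois.pmf <- function(x, lambda) {
  exp(-lambda) * lambda^x / factorial(x)
}

# Sketch of smart.pois for a single value of x.
smart.pois <- function(x, lambda) {
  if (x < 0) {
    return(NA_real_)                    # condition 1: negative x
  }
  x <- trunc(x)                         # condition 2: drop the fraction
  if (is.infinite(factorial(x))) {
    return(.Machine$double.xmin)        # condition 3: factorial overflows
  }
  pois.pmf(x, lambda)
}

# Apply element by element, since if() cannot test a whole vector at once.
sapply(c(-1, 2.7, 3, 200), smart.pois, lambda = 3)
```

Returning a tiny positive number instead of zero for very large x is what the exercise specifies; whether that is preferable to what dpois returns is part of the comparison the exercise asks for.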

Note - Just do 4 exercises in R and 1 in SAS.

Attachment:- Control Structures Assignment Files.rar



Reviews

len2333913

7/5/2019 10:54:30 PM

There are six exercises. You are required to provide solutions for at least four of the six. You are required to solve at least one exercise in R, and at least one in SAS. You are required to provide five solutions; each solution will be worth 10 points. Thus, you may choose to provide both R and SAS solutions for a single exercise, or you may solve five of the six problems, mixing the languages as you wish. Warning: I will continue restricting the use of external libraries in R, particularly tidyverse libraries. You may choose to use ggplot2, but take care that the plots you produce are at least as readable as the equivalent plots in base R. You will be allowed to use whatever libraries tickle your fancy in the midterm and final projects.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd