Create an ordered treatment pairs table

Assignment Help Other Subject
Reference no: EM132335312

Assignment - Data Manipulation

There are six exercises below. You are required to provide solutions for at least four of the five. You are required to solve at least one exercise in R, and at least one in SAS. If you choose SAS for an exercise, you may use IML, DATA operations or PROC SQL at your discretion.

Reuse - For many of these exercises, you may be able to reuse functions written in prior homework. Define those functions here.

Exercise 1 -

Part a - Repeat the table from Homework 5, Exercise 2. The table will contain 30 rows, each corresponding to a unique combination of CV from 8, 12, ..., 28 and Diff from 5, 10, ..., 25. However, for this exercise you only need to calculate one column for required replicates (α = 0.05 and β = 0.2)

Define the table. Do not print this table.

Part b - Create two subset tables, one that contains the combinations of CV and Diff that require the five largest number of replicates and one the contains the combinations of CV and Diff the five smallest number of replicates. You can determine the subset by ranking or sorting by required replicates. You can add a rank column to your table if you wish. Call one table LargestFive and one table SmallestFive.

Part c - Print LargestFive sorted by required replicates in descending order, and print SmallestFive in ascending order.

Exercise 2 -

Part a - Go to Statistical Reference Datasets (StRD) website and use the data listed under Data File in Table Format. You may reuse the file from Homework 6. Load the data into a table below.

Part b - Reshape or transpose this table from the wide format to the long format. Make sure the resulting table has two columns - Resistance and Instrument.

Part c - To confirm that the table was reshaped correctly, use aggregate or tapply to calculate mean Resistance grouped by Instrument from the long table, and use apply or colMeans to calculate column means from the wide table. Print and compare the results.

Note that the reshaped table should be equivalent to the file linked under 'Data File in Two-Column Format'.

Exercise 3 -

Create an ordered treatment pairs table from the pumpkin.csv. In the submitted work print the table only once at the end of the exercise.

Part a - Read the pumpkin data and compute mean mi, standard deviation si and count ni for each level i of Class.

Part b - Create a table over all possible pairs i, j of k Classes from these data. Let one table column be i and another column be j. Let i = 1, 2, . . . (k - 1) and j = i + 1, i + 2, . . . k. There will be (k × (k - 1))/2 rows in this table. I usually create an empty table, then fill the table using a pair of nested loops, the outer loop over i and the inner loop over j. Use a counter variable to keep track of the current row and increment the counter in each step of the inner loop.

Part c - Calculate Cohen's d for each Class pair in this table. Use a pooled standard deviation given by

spooled = √(∑i(ni-1)si2/N-k)

You may add Class means to the table if you wish. Sort the table by d in descending order and print the table.

Exercise 4 -

Part a - Download the two files from D2L ncaa2018.csv and ncaa2019.csv, and read into data frames or tables. ncaa2018.csv comes from the same source as elo.csv from Homework 5, while ncaa2019.csv is the corresponding more recent data. These tables do not contain identical sets of columns, but we will be able to merge Finish by individual wrestlers. Do not print these tables.

Part b - The tables list the wrestlers qualifying for the NCAA 2018 and 2019 National Champions, respectively. Merge the tables into a single table that contains only those wrestlers who qualified for both tournaments. Use the columns Last and First to merge on; neither is unique for all wrestlers.

The merged table should have columns corresponding to Finish 2018 and Finish 2019 - you can leave the column names as the defaults produced by R or SAS. To check the merge, print the number of rows in the table, and determine if there are any missing values in either Finish column (sum or any are sufficient. Do not print the table.

Part c - Print a contingency table comparing Weight for 2018 and Weight for 2019. The sum of all cells in this table will be equal to the total number of wrestlers that competed in both tournaments; the sum of the main diagonal will be the number of wrestlers that competed in the same weight class for both. How many wrestlers changed weight classes?

Exercise 5 -

Background - I'm working on software that produces a repeated measures analysis. To test my code, I use published data and compare results. For one analysis, I used data from Contemporary Statistical Models for the Plant and Soil Sciences, Oliver Schabenberger and Francis J. Pierce, 2001. These data are measurements of the diameter of individual apples from selected apple trees.

Part a - Download the AppleData.csv if you choose R, the SAS data is included in the SAS template. Note the file include comments for the data; you may need to specify comment character in import. Do not print this table.

To simplify this exercise, create a subset of the AppleData including only trees number 3, 7 and 10.

Part b - Reshape or transpose this data from the long form to the wide form. Call this data AppleWide. This table should have one column for Tree, one column for Apple and six columns, diam.1 - diam.6. The values in the time columns come from diam in AppleData.

Part c - To confirm that you've reshaped correctly, print column means for the wide data set and use an aggregate or apply function to compute time means for the long format.

Part d - I choose this example for a test case because it shows a case where the best repeated measures model is an auto-regressive model - each measure is correlated with the preceding measure. We can estimate the degree of using the following R code. You don't need to evaluate this code for this exercise; it's provided as a motivation for reshaping the data.

mult.lm <- lm(cbind(diam.1, diam.2, diam.3, diam.4, diam.5, diam.6) ~ tree, data=AppleWide)

mult.manova <- manova(mult.lm)

print(cov2cor(estVar(mult.lm)))

Exercise 6 -

This is an exercise in computing the Wilcoxon Signed Rank test. We will be using an example from NIST (NATR332.DAT).

The data are provided:

NATR332.DAT <- data.frame(

Y1 = c(146,141,135,142,140,143,138,137,142,136),

Y2 = c(141,143,139,139,140,141,138,140,142,138)

)

Part a - Add a column Difference that is the difference between Y1 and Y2. For further analysis, exclude any rows where the difference is 0.

Next add the column Rank, which will be the rank of the absolute value of Difference.

Part c - Add the column SignedRank by applying the sign (+ or -) of Difference, to Rank (that is, if Difference is < 0, then SignedRank is -Rank, otherwise SignedRank is Rank).

Part d - Compute the sum of the positive ranks, and the absolute value of the sum of the negative ranks. Let W be the minimum of these two sums. Print W.

The expected mean of W is calculated by

µW = Nr ∗ (Nr + 1)/4

with a standard deviation of

σW = √(Nr(Nr + 1)(2Nr + 1)/6)

where Nr is the number of ranked values (excluding differences of 0). Calculate a z score by

zW = (W - µW)/σW

Print both µW and zW.

The NIST page gives a p-values based on the continuity correction. We are not computing this correction.

You can compute the P(z > zW) of your zW (using the normal distribution) and compare it to

wilcox.test(NATR332.DAT$Y1, NATR332.DAT$Y2, paired = TRUE, correct = FALSE, alternative = "less")

while the corrected p-values are given by

wilcox.test(NATR332.DAT$Y1, NATR332.DAT$Y2, paired = TRUE, correct = TRUE, alternative = "less")

wilcox.test(NATR332.DAT$Y1, NATR332.DAT$Y2, paired = TRUE, correct = TRUE, alternative = "greater")

Attachment:- Data Manipulation Assignbment File.rar

Reference no: EM132335312

Questions Cloud

Why two databases are better than a general internet search : Identify two GCU Library scholarly databases that will help you find the best research articles to support your EBP proposal. Discuss why these two databases.
Based on governance and sustainability at nike : Based on governance and sustainability at Nike, What competitive advantage does this provide the leader at Nike when attempting to engage in business globally?
Is there one in particular that you believe : Describe the four major emphasis strategies discussed in Chapter 2 that are used to enhance verbal communication.
At what point does management need to become involved : How much decision making power and authority should the teams have? At what point does management need to become involved?
Create an ordered treatment pairs table : Assignment - Data Manipulation. Create an ordered treatment pairs table from the pumpkin.csv. Read the pumpkin data and compute mean mi
Representative of a specific culture : Please note that a cultural event is an event that is representative of a specific culture (cultural events can include art, music, or sporting events, among
Attempting to engage in business globally : What competitive advantage does this provide the leader at Nike when attempting to engage in business globally?
Explain how the roles of medical social workers : Explain how the roles of medical social workers have evolved over the course of the epidemic. how HIV/AIDS epidemic has changed since it first emerged in 1981
Define barriers that hold nursing practice : The Institute of Medicine has stated a goal that 90% of practice be evidence-based by 2020. According to HealthyPeople.gov, the United States is currently at.

Reviews

len2335312

7/8/2019 10:36:24 PM

There are six exercises. You are required to provide solutions for at least four of the five. You are required to solve at least one exercise in R, and at least one in SAS. You are required to provide five solutions, each solution will be worth 10 points. Thus, you may choose to provide both R and SAS solutions for a single exercise, or you may solve five of the sixth problems, mixing the languages as you wish. If you choose SAS for an exercise, you may use IML, DATA operations or PROC SQL at your discretion. Warning I will continue restricting the use of external libraries in R, particularly tidyverse libraries. You may choose to use ggplot2, but take care that the plots you produce are at least as readable as the equivalent plots in base R. You will be allowed to use whatever libraries tickle your fancy in the midterm and final projects.

Write a Review

Other Subject Questions & Answers

  Cross-cultural opportunities and conflicts in canada

Short Paper on Cross-cultural Opportunities and Conflicts in Canada.

  Sociology theory questions

Sociology are very fundamental in nature. Role strain and role constraint speak about the duties and responsibilities of the roles of people in society or in a group. A short theory about Darwin and Moths is also answered.

  A book review on unfaithful angels

This review will help the reader understand the social work profession through different concepts giving the glimpse of why the social work profession might have drifted away from its original purpose of serving the poor.

  Disorder paper: schizophrenia

Schizophrenia does not really have just one single cause. It is a possibility that this disorder could be inherited but not all doctors are sure.

  Individual assignment: two models handout and rubric

Individual Assignment : Two Models Handout and Rubric,    This paper will allow you to understand and evaluate two vastly different organizational models and to effectively communicate their differences.

  Developing strategic intent for toyota

The following report includes the description about the organization, its strategies, industry analysis in which it operates and its position in the industry.

  Gasoline powered passenger vehicles

In this study, we examine how gasoline price volatility and income of the consumers impacts consumer's demand for gasoline.

  An aspect of poverty in canada

Economics thesis undergrad 4th year paper to write. it should be about 22 pages in length, literature review, economic analysis and then data or cost benefit analysis.

  Ngn customer satisfaction qos indicator for 3g services

The paper aims to highlight the global trends in countries and regions where 3G has already been introduced and propose an implementation plan to the telecom operators of developing countries.

  Prepare a power point presentation

Prepare the power point presentation for the case: Santa Fe Independent School District

  Information literacy is important in this environment

Information literacy is critically important in this contemporary environment

  Associative property of multiplication

Write a definition for associative property of multiplication.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd