Create an ordered treatment pairs table

Reference no: EM132339167

Processing Text Assignment -

Instructions - There are six exercises below. You are required to provide solutions for at least four of the six using R. Please confirm which one you would be solving. Please share the output files. You are required to solve at least one exercise in R, and at least one in SAS. If you choose SAS for an exercise, you may use IML, DATA operations or PROC SQL at your discretion.

Exercise 1 - Write a loop or a function to convert a matrix to a CSV compatible string. Given a matrix of the form

C1	C2	C3
a	b	c
d	e	f
g	h	i

produce a string of the form

a,b,c\nd,e,f\ng,h,i

where \n is the newline character.

You are only required to convert a matrix to CSV format, but you may choose to write code to convert data tables to CSV; in this case, include column names in the output string. Use NATR332.DAT as a test case.

NATR332.DAT <- data.frame(
  Y1 = c(146,141,135,142,140,143,138,137,142,136),
  Y2 = c(141,143,139,139,140,141,138,140,142,138)
)

If you choose SAS, I've included the NATR332 data table and framework code for IML in the template. I used the CATX function in IML. I found I could do this in one line in R, with judicious use of apply, but I haven't found the equivalent in IML. Instead, I used a pair of nested loops to "accumulate" an increasingly longer string.
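One possible base R approach to Exercise 1 is sketched below. The function names matrix_to_csv and df_to_csv are hypothetical, and the sketch assumes rows are joined by a bare newline with no trailing newline after the last row.

```r
# Convert a matrix to a CSV-compatible string:
# commas between cells, "\n" between rows. Base R only.
matrix_to_csv <- function(m) {
  # Collapse each row into "a,b,c", then join the rows with newlines.
  rows <- apply(m, 1, paste, collapse = ",")
  paste(rows, collapse = "\n")
}

# Data table variant (hypothetical helper): put the column names first.
df_to_csv <- function(df) {
  header <- paste(names(df), collapse = ",")
  paste(header, matrix_to_csv(as.matrix(df)), sep = "\n")
}

# The 3 x 3 example matrix from the exercise text.
m <- matrix(letters[1:9], nrow = 3, byrow = TRUE)
csv_m <- matrix_to_csv(m)
cat(csv_m, "\n")

# The NATR332.DAT test case from the exercise text.
NATR332.DAT <- data.frame(
  Y1 = c(146, 141, 135, 142, 140, 143, 138, 137, 142, 136),
  Y2 = c(141, 143, 139, 139, 140, 141, 138, 140, 142, 138)
)
csv_df <- df_to_csv(NATR332.DAT)
cat(csv_df, "\n")
```

The data table variant relies on as.matrix, which is only a safe shortcut here because every column is numeric; a table with mixed column types may need each column converted to character first.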

Exercise 2 - Create an ordered treatment pairs table from the pumpkin data. Before printing the table, iterate over each row to create a vector of row names that are more descriptive. First, use levels to get the text associated with each Class, then combine the Class text to create a row name of the form:

Blue vs Cinderella

Here Blue is the Class description for class 1 and Cinderella is the description for class 2, so this text should be the row name in the row corresponding to i = 1 and j = 2. You may instead add a column with the specified descriptions, if you wish, but it must then be the first column of the printed table.
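The row-naming step for Exercise 2 might look like the sketch below. The pumpkin data is not reproduced here, so the Class factor is a hypothetical stand-in: Blue and Cinderella come from the exercise text, the third level is invented, and a real pairs table would also carry the comparison statistics from the earlier homework.

```r
# Hypothetical stand-in for the Class column of the pumpkin data.
# "Blue" and "Cinderella" are from the exercise; "Howden" is invented.
Class <- factor(c("Blue", "Cinderella", "Howden", "Blue", "Howden"))

# levels() returns the text associated with each class number.
class_names <- levels(Class)
k <- length(class_names)

# All ordered pairs with i < j, one pair per row.
pairs <- t(combn(k, 2))
pairs_table <- data.frame(i = pairs[, 1], j = pairs[, 2])

# Iterate over each row to build a descriptive row name.
row_labels <- character(nrow(pairs_table))
for (r in seq_len(nrow(pairs_table))) {
  row_labels[r] <- paste(class_names[pairs_table$i[r]], "vs",
                         class_names[pairs_table$j[r]])
}
rownames(pairs_table) <- row_labels

print(pairs_table)
```

Because row names print to the left of every column, assigning them with rownames satisfies the requirement that the description appear first without adding a separate column.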

Exercise 3 - Calculate MSW, MSB, F and p for the data from Wansink Table 1 where

$$\mathrm{MSB} = \frac{\sum_i n_i (x_i - \bar{x})^2}{k - 1}$$

$$\mathrm{MSW} = \frac{\sum_i (n_i - 1) s_i^2}{N - k}$$

Start with the strings:

Means <- "268.1 271.1 280.9 294.7 285.6 288.6 384.4"

StandardDeviations <- "124.8 124.2 116.2 117.7 118.3 122.0 168.3"

SampleSizes <- "18 18 18 18 18 18 18"

Tokenize the strings, then convert the tokens to create vectors of numeric values. Use these vectors to compute and print MSW, MSB, F and p.

If you use SAS, I've provided macro variables that can be tokenized in either macro language or using SAS functions. You can mix and match macro, DATA, IML or SQL processing as you wish, but you must write code to convert the text into numeric tokens before processing.

Compare your results with those from previous homework, or with the resource given in previous homework, to confirm that the text was correctly converted to numeric values.
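A minimal sketch of Exercise 3 in base R follows. The helper to_numeric is a hypothetical name, and the sketch assumes that the grand mean is the sample-size-weighted mean of the group means and that p is the upper-tail probability of the F distribution with k - 1 and N - k degrees of freedom.

```r
Means <- "268.1 271.1 280.9 294.7 285.6 288.6 384.4"
StandardDeviations <- "124.8 124.2 116.2 117.7 118.3 122.0 168.3"
SampleSizes <- "18 18 18 18 18 18 18"

# Hypothetical helper: split on single spaces, then convert to numeric.
to_numeric <- function(s) as.numeric(strsplit(s, " ", fixed = TRUE)[[1]])

x <- to_numeric(Means)               # group means x_i
s <- to_numeric(StandardDeviations)  # group standard deviations s_i
n <- to_numeric(SampleSizes)         # group sizes n_i

k <- length(x)  # number of groups
N <- sum(n)     # total sample size

# ASSUMPTION: the grand mean is weighted by group size.
grand_mean <- sum(n * x) / N

MSB <- sum(n * (x - grand_mean)^2) / (k - 1)
MSW <- sum((n - 1) * s^2) / (N - k)
F_ratio <- MSB / MSW

# Upper-tail probability of the F distribution.
p_value <- pf(F_ratio, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

print(c(MSW = MSW, MSB = MSB, F = F_ratio, p = p_value))
```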

Exercise 4 - Repeat the regression analysis, but start with the text

Rate <- "Rate | 23000 | 24000 | 25000 | 26000 | 27000 | 28000 | 29000"

Yield <- "Yield | 111.4216 | 155.0326 | 181.1176 | 227.5800 | 233.4623 | 242.1753 | 231.3890"

Note that by default, strsplit in R will read split as a regular expression, and | is a special character in regular expressions. You will need to change one of the default parameters for this exercise.
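As a sketch of the point about strsplit, the fixed argument is the default most likely meant here: setting fixed = TRUE makes R treat the separator as literal text. Escaping the bar in the regular expression is an alternative that gives the same tokens.

```r
Rate <- "Rate | 23000 | 24000 | 25000 | 26000 | 27000 | 28000 | 29000"

# fixed = TRUE: the separator " | " is matched literally,
# not interpreted as a regular expression.
tokens <- strsplit(Rate, " | ", fixed = TRUE)[[1]]

# Alternative: keep regular-expression matching but escape the bar.
tokens_regex <- strsplit(Rate, " \\| ")[[1]]

# The first token is the row label; the rest are the numbers.
label <- tokens[1]
rate_values <- as.numeric(tokens[-1])

print(label)
print(rate_values)
```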

Tokenize these strings and convert to numeric vectors, then use these vectors to define the response vector y and the design matrix X.

Solve for and print β̂.

Compare your results with those from previous homework, or with the resource given in previous homework, to confirm that the text was correctly converted to numeric values.
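The sketch below shows one way the estimate could be computed from the text. The model used in the previous homework is not stated here, so this assumes a straight-line model (Yield = b0 + b1 * Rate) solved with the normal equations; parse_row is a hypothetical helper, and the columns of X should be changed to match whatever model the earlier homework used.

```r
Rate <- "Rate | 23000 | 24000 | 25000 | 26000 | 27000 | 28000 | 29000"
Yield <- "Yield | 111.4216 | 155.0326 | 181.1176 | 227.5800 | 233.4623 | 242.1753 | 231.3890"

# Hypothetical helper: split on the literal " | ", drop the label, convert.
parse_row <- function(s) {
  as.numeric(strsplit(s, " | ", fixed = TRUE)[[1]][-1])
}

rate <- parse_row(Rate)
yield <- parse_row(Yield)

# ASSUMPTION: straight-line model with an intercept column.
X <- cbind(Intercept = 1, Rate = rate)
y <- yield

# Normal equations: (X'X) beta = X'y.
beta_hat <- solve(t(X) %*% X, t(X) %*% y)
print(beta_hat)
```

Comparing beta_hat with coef(lm(yield ~ rate)) is a quick check that the text was converted correctly under the same straight-line assumption.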

Exercise 5 - Use the file openmat2015.csv from D2L. This is a list of top-ranked high school wrestlers in 2015, their high School, Weight class and in some cases the College where they expected to enroll and compete after high school.

We wish to know how many went on to compete in the national championship in 2019, so we will merge this table with the data from Homework 7, ncaa2019.csv. The openmat2015.csv data contains only a single column, Name. You will need to split the text in this column to create the columns First and Last required to merge with ncaa2019.csv.

Do not print these tables in the submitted work. Instead, print a contingency table comparing Weight for 2015 and Weight for 2019. What is the relationship between high school and college weight classes? You may instead produce a scatter plot or box-whisker plot, using high school weight class as the independent variable.

If you do this in SAS, use the openmat2015SAS.csv file; it will import College correctly.
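The split-and-merge step for Exercise 5 could be sketched as follows. The two small data frames are invented stand-ins for openmat2015.csv and ncaa2019.csv, and the sketch assumes that Name is stored as "First Last" separated by a space and that both tables have a Weight column; check the real files before relying on either assumption.

```r
# Invented stand-in for openmat2015.csv (assumed "First Last" names).
openmat <- data.frame(
  Name = c("Alex Smith", "Jordan Lee", "Sam Brown"),
  Weight = c(126, 145, 170),
  stringsAsFactors = FALSE
)

# Invented stand-in for ncaa2019.csv (assumed First, Last, Weight columns).
ncaa <- data.frame(
  First = c("Alex", "Sam"),
  Last = c("Smith", "Brown"),
  Weight = c(141, 184),
  stringsAsFactors = FALSE
)

# Split Name on spaces: first token is First, the remainder is Last.
parts <- strsplit(openmat$Name, " ", fixed = TRUE)
openmat$First <- sapply(parts, `[`, 1)
openmat$Last <- sapply(parts, function(p) paste(p[-1], collapse = " "))

# Inner join on First and Last; the suffixes keep the two Weight columns apart.
merged <- merge(openmat, ncaa, by = c("First", "Last"),
                suffixes = c(".2015", ".2019"))

# Contingency table of high school versus college weight class.
print(table(merged$Weight.2015, merged$Weight.2019, dnn = c("2015", "2019")))
```

Keeping everything after the first token as Last is a design choice that preserves multi-word surnames; names with suffixes or middle names may still need manual review.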

Exercise 6 - Use the file openmat2015.csv from Exercise 5, and use partial text matching to answer these questions. To show your results, print only the rows from the table that match the described text patterns, but to save space, print only Name, School and College. Each of these can be answered in a single step.

Which wrestlers come from a School with a name starting with St.?
Which wrestlers were intending to attend an Iowa College?
Which wrestlers were intending to start College in 2016 or 2017 (College will end with 16 or 17)?
Which wrestlers are intending to compete in a sport other than wrestling? (Look for a sport in parentheses in the College column. Note - ( is a special character in regular expressions, so to match the exact character, it needs to be preceded by the escape character \. However, \ is itself a special character in strings, so it must also be preceded by the escape character.)
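The four questions above can each be answered with a single grepl call, as in the sketch below. The data frame is an invented stand-in for openmat2015.csv, so the names, schools and colleges are hypothetical, and the "Iowa" pattern assumes that any College containing the text Iowa counts as an Iowa college.

```r
# Invented stand-in for openmat2015.csv.
wrestlers <- data.frame(
  Name = c("A One", "B Two", "C Three", "D Four"),
  School = c("St. Mark", "Stanton", "St. Luke", "Eastview"),
  College = c("Iowa 16", "", "Example State (Football)", "Northern Iowa 17"),
  stringsAsFactors = FALSE
)
cols <- c("Name", "School", "College")

# School starts with "St."; the period is escaped so it matches literally.
st_school <- wrestlers[grepl("^St\\.", wrestlers$School), cols]

# College mentions Iowa (literal match, no regular expression needed).
iowa <- wrestlers[grepl("Iowa", wrestlers$College, fixed = TRUE), cols]

# College ends with 16 or 17.
start_16_17 <- wrestlers[grepl("1[67]$", wrestlers$College), cols]

# College contains an opening parenthesis: "\\(" is the string form of \(.
other_sport <- wrestlers[grepl("\\(", wrestlers$College), cols]

print(st_school)
print(iowa)
print(start_16_17)
print(other_sport)
```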

Attachment:- Processing Text Assignment Files.rar


Reviews

len2339167

7/15/2019 12:44:44 AM

Instructions - 4 problems to be solved using R. Please confirm which one you would be solving. Please share the output files. I've shared all the details, but also read the instructions at the top of the first page of the exercise PDF file. Also, the expert can use the templates when answering the exercises; there is one for R. Let me know if you have questions ahead of time, not on the due date.

7/15/2019 12:44:39 AM

There are six exercises below. You are required to provide solutions for at least four of the six. You are required to solve at least one exercise in R, and at least one in SAS. You are required to provide five solutions; each solution will be worth 10 points. Thus, you may choose to provide both R and SAS solutions for a single exercise, or you may solve five of the six problems, mixing the languages as you wish. If you choose SAS for an exercise, you may use IML, DATA operations or PROC SQL at your discretion.

7/15/2019 12:44:31 AM

Warning - I will continue restricting the use of external libraries in R, particularly tidyverse libraries. You may choose to use ggplot2, but take care that the plots you produce are at least as readable as the equivalent plots in base R. You will be allowed to use whatever libraries tickle your fancy in the midterm and final projects.

Reuse - For many of these exercises, you may be able to reuse functions written in prior homework. Define those functions here.
