Reference no: EM132260148
Statistics Assignment
You should include the relevant statistical (R and Stata) code; please set it in a different font (Courier 10pt is recommended) and format it neatly. Including full copies of the commented scripts as an appendix is recommended.
QUESTION 1 - Answer all parts of the question using R
The data set required for this question is the heart transplant data (HTD.csv). This data set contains 11 variables.
Table 1 - Description of variables in the HTD.csv dataset
Variable | Description |
Patient id | Unique patient identifier |
Year of acceptance | Year admitted |
Age | Age of the patient |
Survival Status | Describes if the patient is alive or dead (1 = dead, 0 = alive) |
Survival time | Number of months survived |
Prior surgery | Surgery history (1 = Yes, 0 = No) |
Transplant status | Has the patient received transplant or not (1 = transplanted, 0 = not transplanted) |
Waiting time for transplant | Number of months the patient had to wait for transplant |
Mismatch on alleles | |
Mismatch on antigen | |
Mismatch score | |
a. Read the data into R.
1. Label all variables and apply formats to the categorical variables.
2. Produce a list of the first 10 records in the dataset.
3. Produce a list of the first 10 records for patients who survived.
4. Produce two separate tables which show (1) the number of people transplanted and (2) the number of people who had prior surgery. Also produce a cross table of transplant status and prior surgery.
5. Create categories of age and label the categories (<40, 41-45, 46-50, 51-55, 56-60, 60+)
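The steps in part a can be sketched in base R as follows. This is a minimal illustration, not a model answer: the column names (`id`, `age`, `status`, `surgery`, `transplant`) and the toy values are hypothetical stand-ins for whatever HTD.csv actually contains. The age boundaries are also an assumption, because the listed categories leave age 40 unassigned and overlap at 60; here 40 goes into the lowest group and 60 into "56-60".

```r
# Toy stand-in for HTD.csv; column names and values are hypothetical.
# With the real file one would start from: htd <- read.csv("HTD.csv")
htd <- data.frame(
  id         = 1:12,
  age        = c(35, 40, 42, 45, 47, 50, 52, 55, 58, 60, 63, 70),
  status     = c(1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0),
  surgery    = c(0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1),
  transplant = c(1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1)
)

# Formats: turn the 0/1 codes into labelled factors.
htd$status     <- factor(htd$status, levels = c(0, 1),
                         labels = c("Alive", "Dead"))
htd$surgery    <- factor(htd$surgery, levels = c(0, 1),
                         labels = c("No", "Yes"))
htd$transplant <- factor(htd$transplant, levels = c(0, 1),
                         labels = c("Not transplanted", "Transplanted"))

# Variable labels: base R has no dedicated label slot, so one simple
# option is to store them as attributes on each column.
attr(htd$status, "label")     <- "Survival status"
attr(htd$surgery, "label")    <- "Prior surgery"
attr(htd$transplant, "label") <- "Transplant status"

# First 10 records, then the first 10 records among survivors.
head(htd, 10)
head(htd[htd$status == "Alive", ], 10)

# One-way tables and the cross table.
table(htd$transplant)
table(htd$surgery)
table(Transplant = htd$transplant, PriorSurgery = htd$surgery)

# Age categories. cut() uses right-closed intervals by default,
# so the groups are (-Inf,40], (40,45], ..., (55,60], (60,Inf).
htd$agecat <- cut(htd$age,
                  breaks = c(-Inf, 40, 45, 50, 55, 60, Inf),
                  labels = c("<=40", "41-45", "46-50",
                             "51-55", "56-60", ">60"))
table(htd$agecat)
```

If the intended boundaries differ (for example, 40 belonging to a separate group), only the `breaks` and `labels` vectors need to change.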
b. Check whether any patients have a transplant recorded after their death. Produce a list of such records, if any.
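One possible reading of part b is that a transplant "after death" shows up as a waiting time longer than the survival time for a patient who died and was transplanted. The sketch below applies that reading to toy data; the column names (`survtime`, `wait`, and so on) are hypothetical, and the rule itself is an assumption that should be checked against how the two times are measured in the real file.

```r
# Toy data; column names are hypothetical stand-ins for HTD.csv.
htd <- data.frame(
  id         = 1:6,
  status     = c(1, 1, 0, 1, 0, 1),       # 1 = dead, 0 = alive
  survtime   = c(10, 3, 40, 12, 25, 5),   # months survived
  transplant = c(1, 1, 1, 0, 1, 1),       # 1 = transplanted
  wait       = c(2, 5, 1, NA, 3, 5)       # months waited for transplant
)

# Assumed rule: died, was transplanted, and waited longer than survived.
# subset() drops rows where the condition evaluates to NA.
late <- subset(htd, status == 1 & transplant == 1 & wait > survtime)

late        # the suspicious records
nrow(late)  # how many there are
```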
c. Are there any patients who were alive, had prior surgery, and were missing information about mismatch on antigen? Tabulate the percentage and state how many such cases there are.
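A combined condition like the one in part c can be built as a single logical flag and then tabulated. The following sketch assumes hypothetical column names (`status`, `surgery`, `antigen`) and that missing values are read in as `NA`.

```r
# Toy data; column names are hypothetical.
htd <- data.frame(
  status  = c(0, 0, 1, 0, 1, 0, 0, 1),      # 0 = alive
  surgery = c(1, 1, 1, 0, 0, 1, 0, 1),      # 1 = prior surgery
  antigen = c(NA, 1, NA, NA, 0, NA, 1, 0)   # NA = missing
)

# TRUE for patients who are alive, had prior surgery,
# and have no antigen mismatch value.
flag <- htd$status == 0 & htd$surgery == 1 & is.na(htd$antigen)

# Fixing the factor levels keeps both rows in the table even if
# one of them happens to be empty.
flag_f <- factor(flag, levels = c(FALSE, TRUE), labels = c("No", "Yes"))

n_cases <- sum(flag)                              # number of such cases
pct     <- round(100 * prop.table(table(flag_f)), 1)  # percentages

n_cases
pct
```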
d. Find the mean, median, and inter-quartile range of waiting time for transplant among people who died.
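The summary statistics in part d can be computed on the subset of patients who died. In this sketch the column names are hypothetical, and missing waiting times are assumed to be coded as `NA` and excluded.

```r
# Toy data; column names are hypothetical.
htd <- data.frame(
  status = c(1, 1, 1, 0, 1, 0, 1),     # 1 = dead
  wait   = c(1, 2, 3, 10, 4, 20, NA)   # waiting time in months
)

# Waiting times for patients who died.
dead_wait <- htd$wait[htd$status == 1]

wait_mean   <- mean(dead_wait, na.rm = TRUE)
wait_median <- median(dead_wait, na.rm = TRUE)
wait_q      <- quantile(dead_wait, c(0.25, 0.75), na.rm = TRUE)
wait_iqr    <- IQR(dead_wait, na.rm = TRUE)

c(mean = wait_mean, median = wait_median, IQR = wait_iqr)
wait_q   # the quartiles themselves, often reported alongside the IQR
```

`quantile()` and `IQR()` use R's default quantile definition (type 7); other software may use a different definition and give slightly different quartiles on small samples.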
e. Produce a cross tabulation of survival status and transplant status; present the table with variable and value labels.
f. Present a table showing the number of people who survived within each age category.
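Parts e and f both come down to `table()` calls on labelled factors. The sketch below is one way to get variable names and value labels into the printed output; the column names, toy values, and age boundaries are assumptions, not taken from HTD.csv.

```r
# Toy data; column names are hypothetical.
htd <- data.frame(
  status     = c(0, 1, 0, 1, 0, 0, 1, 0),
  transplant = c(1, 1, 0, 0, 1, 1, 1, 0),
  age        = c(38, 44, 49, 53, 57, 62, 41, 47)
)

# Value labels via factors.
htd$status     <- factor(htd$status, levels = c(0, 1),
                         labels = c("Alive", "Dead"))
htd$transplant <- factor(htd$transplant, levels = c(0, 1),
                         labels = c("Not transplanted", "Transplanted"))

# Assumed age boundaries (right-closed intervals).
htd$agecat <- cut(htd$age,
                  breaks = c(-Inf, 40, 45, 50, 55, 60, Inf),
                  labels = c("<=40", "41-45", "46-50",
                             "51-55", "56-60", ">60"))

# Part e: naming the arguments supplies the variable labels.
xt <- table("Survival status"   = htd$status,
            "Transplant status" = htd$transplant)
addmargins(xt)   # adds row and column totals

# Part f: survivors counted within each age category.
survivors <- table("Age category" = htd$agecat[htd$status == "Alive"])
survivors
```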
QUESTION 2 - Answer all parts of the question using Stata
The data set for this question is contained in the file breakfastdata.dat. This data set has 15 variables.
Table 2 - Description of variables in the breakfastdata.dat dataset
Variable | Description |
Name | Cereal name |
Mfr | Cereal manufacturer (A = American Home Food Products, G = General Mills, K = Kelloggs, N = Nabisco, P = Post, Q = Quaker Oats, R = Ralston Purina) |
Type | Type (C = cold, H = hot) |
Calories | Calories (number) |
Protein | Protein (g) |
Fat | Fat (g) |
Sodium | Sodium (mg) |
Fiber | Dietary fiber (g) |
Carbo | Complex carbohydrates (g) |
Sugars | Sugars (g) |
Shelf | Display shelf (coded as 1, 2, 3) |
Potass | Potassium (mg) |
Vitamins | Vitamins and minerals (0 = none added, 25 = "enriched, often to 25% FDA recommended", 100 = "100% of FDA recommended") |
Weight | Weight (in ounces) |
Cups | Cups per serving |
a) Apply variable and value labels to each of the variables using the descriptions provided in Table 2. Recode the -1 category in each variable to missing. Produce univariate statistics for the following variables: type, mfr, vitamins, sugars, calories, protein. Present your descriptive table as if it were ready to be published in a report.
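Question 2 asks for Stata, and the Stata syntax is not shown here. Since the closing note allows Stata or R, the recoding step can be illustrated in R as a rough sketch: every -1 in a numeric column becomes `NA` before any statistics are computed. The column names follow Table 2 in lower case, which is an assumption about the file, and the values are made up.

```r
# Toy stand-in for the cereal data; values are invented.
cereal <- data.frame(
  name     = c("Cereal A", "Cereal B", "Cereal C", "Cereal D"),
  sugars   = c(6, -1, 3, 12),
  calories = c(110, 100, -1, 120),
  stringsAsFactors = FALSE
)

# Recode -1 to missing in every numeric column.
num_cols <- sapply(cereal, is.numeric)
cereal[num_cols] <- lapply(cereal[num_cols],
                           function(x) replace(x, x == -1, NA))

# Univariate statistics now ignore the recoded values.
summary(cereal$sugars)
mean(cereal$calories, na.rm = TRUE)
```

Recoding before summarising matters because a -1 left in place would pull means and minimums downward.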
b) List 8 records for cereals manufactured by General Mills and 8 records for cereals manufactured by Kelloggs. Both sets of records should also show 100% of FDA recommended vitamins and minerals.
c) Create sodium categories (6 possible categories, including 0 as one of the categories), describe the criteria used, and define the variable and value labels.
d) Using the categorical sodium variable created above, provide the syntax to compute the following table.
sodc | Type and Manufacturer                                 |
     | C                         | H                         |
     | A | G | K | N | P | Q | R | A | G | K | N | P | Q | R |
0    |   |   | 6 | 4 | 2 | 3 | 1 | 1 |   |   | 1 |   | 1 |   |
1    |   | 9 | 6 | 1 | 5 | 2 | 1 |   |   |   |   |   |   |   |
2    |   | 7 | 3 |   | 2 |   | 3 |   |   |   |   |   |   |   |
3    |   | 4 | 5 |   |   | 2 | 2 |   |   |   |   |   |   |   |
4    |   | 2 | 2 |   |   |   | 1 |   |   |   |   |   |   |   |
5    |   |   | 1 |   |   |   |   |   |   |   |   |   |   |   |
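The layout of the table above (sodium category in rows, manufacturer nested within type in columns) is a three-way flat contingency table. The question asks for Stata syntax, which is not shown here; as a rough R counterpart, permitted by the closing note, `ftable()` produces the same nesting. The variable names `type`, `mfr`, and `sodc` and the toy values are assumptions.

```r
# Toy data; values are invented and do not reproduce the table above.
cereal <- data.frame(
  type = c("C", "C", "C", "H", "C", "H"),
  mfr  = c("G", "K", "K", "Q", "G", "N"),
  sodc = c(1, 0, 1, 0, 2, 0)
)

# Fixing the factor levels keeps all 2 x 7 columns,
# including type/manufacturer combinations with no cereals.
cereal$type <- factor(cereal$type, levels = c("C", "H"))
cereal$mfr  <- factor(cereal$mfr,
                      levels = c("A", "G", "K", "N", "P", "Q", "R"))

# Count every sodc x type x mfr cell, then flatten:
# sodc in rows, mfr nested within type in columns.
counts <- xtabs(~ sodc + type + mfr, data = cereal)
tab <- ftable(counts, row.vars = "sodc", col.vars = c("type", "mfr"))
tab
```

Unlike the table above, `ftable()` prints empty cells as 0 rather than leaving them blank.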
e) Provide appropriate descriptive summaries of each variable (calories, protein, fat, sodium, fiber, carbo, sugars, and potass) for each cereal manufacturer. (State in one sentence why you think your chosen summary measure is appropriate.)
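Grouped summaries of this kind can be produced with `aggregate()`. The sketch below is in R rather than Stata (the closing note allows either) and reports the median and inter-quartile range per manufacturer. That is one option often chosen when groups are small or distributions are skewed; whether it is the appropriate choice here depends on the real data. Column names and values are hypothetical, and only two of the eight variables are shown.

```r
# Toy data; values are invented.
cereal <- data.frame(
  mfr      = c("G", "G", "G", "K", "K", "K"),
  calories = c(100, 110, 120, 90, 110, 160),
  sugars   = c(3, 6, 9, 2, 5, 14)
)

# Median and IQR of each variable within each manufacturer.
# Note: the formula interface drops rows with an NA in any listed variable.
med <- aggregate(cbind(calories, sugars) ~ mfr, data = cereal, FUN = median)
iqr <- aggregate(cbind(calories, sugars) ~ mfr, data = cereal, FUN = IQR)

med
iqr
```

Swapping `FUN = median` for `FUN = mean` (and `IQR` for `sd`) gives the mean and standard deviation version of the same table.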
Note - The assignment must be solved using Stata or R.