Project
Step 2 – One Variable Software Results and Data
STEP 2 Directions
Part 1: Prepare the data: The results of this step will be evident in future output. DO NOT document your work on this step in your software results submission.
You will be asked to submit your SAS Code as part of this assignment (preferably copied into WORD so that the colors are preserved).
1. Import the data into your software. Be sure to check your imported data for any obvious problems before proceeding (for example, variables that are the wrong type or values that just don’t look right).
2. For your original quantitative EXPLANATORY variable, create TWO new variables which are coded categorical versions based upon cutoffs of your choice using the requirements below.
One of these new variables will be a binary version (two levels).
The other will be a multi-level version with 3-5 groups.
ALL groups must have AT LEAST 10% of the overall sample. (CHECK USING FREQUENCY TABLES requested in Parts 2 and 3 of this assignment)
Be sure you define the groups for each categorization so that they do not overlap.
Be sure you do not miss any observations (the sample sizes must match those of the original variables).
When creating these new categorical variables, be sure to use numeric codes (not character text) to represent the levels of the new categorical variables. You will translate these coded categorical variables in Question 4.
SAS Tutorial: Topic 2E – (8:34) Categorize a Quantitative Variable
3. For your original quantitative RESPONSE variable, create TWO new variables which are coded categorical versions based upon cutoffs of your choice using the EXACT same requirements given above as Question 2.
4. Create translations for the FOUR NEW VARIABLES created in Questions 2 and 3 which utilize the range of values of the variable used to create the groups. See below for clarification and ask if you still do not understand what we mean here.
SAS Tutorial: Topic 2B – (7:12) Translating Categorical Variables (Using PROC FORMAT)
For example, if your binary variable for age has 0 = under 50 and 1 = 50+ as your two groups, you might use the following translations: 0 = “Under 50” and 1 = “50 and older.”
For a multi-level age variable you might have something like: 1 = “Under 20”, 2 = “20-49”, 3 = “50-69” and 4 = “70 and older.”
You could also use interval notation such as 2 = “[20, 50)” for the above example to indicate that 20 is in the interval but 50 is not.
I simply want the numeric values to appear in the translations. Everyone has a different dataset, which makes the groups difficult to keep track of when grading, and descriptions such as LOW and HIGH are NOT helpful to the grader.
5. Label ALL SIX VARIABLES with descriptive titles which are different from the variable names.
See SAS Tutorial: 2A – (6:23) Permanently Labeling Variables in a SAS Dataset
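The preparation steps in Questions 2 through 5 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the required solution: the dataset names WORK.PROJECT and WORK.PROJECT2, the variable AGE, the new variable names AGE_BIN and AGE_GRP, the format names AGEBINF and AGEGRPF, and the label text are all hypothetical. The cutoffs reuse the age example above; your own cutoffs must still satisfy the 10% rule.

```sas
/* Hypothetical names throughout: WORK.PROJECT, AGE, AGE_BIN, AGE_GRP. */

/* Question 4: translations that show the numeric ranges */
proc format;
  value agebinf 0 = 'Under 50'
                1 = '50 and older';
  value agegrpf 1 = 'Under 20'
                2 = '20-49'
                3 = '50-69'
                4 = '70 and older';
run;

data work.project2;
  set work.project;

  /* Question 2: binary version with numeric codes.
     A missing AGE stays missing so the sample size matches the original. */
  if missing(age) then age_bin = .;
  else if age < 50 then age_bin = 0;
  else age_bin = 1;

  /* Question 2: multi-level version. The IF/ELSE IF chain guarantees
     that the groups do not overlap and that no observation is skipped. */
  if missing(age) then age_grp = .;
  else if age < 20 then age_grp = 1;
  else if age < 50 then age_grp = 2;
  else if age < 70 then age_grp = 3;
  else age_grp = 4;

  /* Question 5: descriptive labels that differ from the variable names */
  label age     = 'Age of participant (years)'
        age_bin = 'Age, two groups (under 50 vs. 50 and older)'
        age_grp = 'Age, four groups';

  /* Attach the translations to the coded variables */
  format age_bin agebinf. age_grp agegrpf.;
run;
```

The same pattern would be repeated for the response variable in Question 3, giving four new coded variables and six labeled variables in total.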
Part 2: Descriptive Summary of EXPLANATORY variable
6. Calculate the sample size, sample mean, sample median, sample standard deviation, min, max, Q1, Q3, and 95% confidence interval for the population mean for your original quantitative EXPLANATORY variable. Provide the software output containing these results in your solution. The sample size can be provided via the case processing summary or the histogram label.
Note: 95% confidence intervals for the population mean can be obtained in PROC MEANS by adding the keyword CLM to your list of keywords. This is presented later in Topic 7C – Two Sample T-Test.
7. Construct a histogram, boxplot, and QQ-plot for your original quantitative EXPLANATORY variable. Provide ONLY THESE 3 GRAPHS in your solution.
8. Construct a frequency table for the binary version for your EXPLANATORY variable created in Question 2.
9. Construct a frequency table for the multi-level version for your EXPLANATORY variable created in Question 2.
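One possible way to request the output for Questions 6 through 9 is sketched below. It assumes a hypothetical dataset WORK.PROJECT2 that already contains a quantitative variable AGE and its coded versions AGE_BIN and AGE_GRP; substitute your own dataset and variable names. The choice of PROC UNIVARIATE and PROC SGPLOT for the graphs is an assumption, and your course tutorials may use different procedures.

```sas
/* Hypothetical names: WORK.PROJECT2, AGE, AGE_BIN, AGE_GRP. */

/* Question 6: summary statistics plus the 95% confidence
   interval for the population mean (keyword CLM) */
proc means data=work.project2 n mean median std min max q1 q3 clm alpha=0.05;
  var age;
run;

/* Question 7: histogram and QQ-plot.
   NOPRINT suppresses the statistics tables so only the graphs appear. */
proc univariate data=work.project2 noprint;
  var age;
  histogram age;
  qqplot age / normal(mu=est sigma=est);
run;

/* Question 7: boxplot */
proc sgplot data=work.project2;
  vbox age;
run;

/* Questions 8 and 9: frequency tables for the binary and
   multi-level versions; check that every group has at least 10% */
proc freq data=work.project2;
  tables age_bin age_grp;
run;
```

The Percent column of each frequency table is where the 10% requirement from Part 1 can be checked, and the total frequency should equal the sample size reported for the original variable.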
Part 3: Descriptive Summary of RESPONSE variable
10. Calculate the sample size, sample mean, sample median, sample standard deviation, min, max, Q1, Q3, and 95% confidence interval for the population mean for your original quantitative RESPONSE variable. Provide the software output containing these results in your solution. The sample size can be provided via the case processing summary or the histogram label.
Note: 95% confidence intervals for the population mean can be obtained in PROC MEANS by adding the keyword CLM to your list of keywords. This is presented later in Topic 7C – Two Sample T-Test.
11. Construct a histogram, boxplot, and QQ-plot for your original quantitative RESPONSE variable. Provide ONLY THESE 3 GRAPHS in your solution.
12. Construct a frequency table for the binary version for your RESPONSE variable created in Question 3.
13. Construct a frequency table for the multi-level version for your RESPONSE variable created in Question 3.