Reference no: EM132451643
Lab Assignment - R Basics and Working with Data: Introduction to Statistical Reasoning
Objectives:
1. Demonstrate basic R skills (create vectors, perform vector operations, install packages).
2. Use basic statistical functions (mean, min, max, median, sd).
3. Visualize data (dot plot, histogram, box plot, contingency tables, scatter plot, bubble plot).
Section 1 - R and RStudio Basics - Exercises
1 - Vectors:
a. Create a vector named heights that contains the heights, in inches, of yourself and two students near you. Print the contents of this vector.
b. Create a vector named names that contains the names of these people. Print the contents of this vector.
c. Try typing cbind(heights, names). What did this command do? What class is this new object?
Hint: Try the class() function.
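As a rough illustration of parts a to c, the sketch below uses made-up heights and names (the values and the `_example` suffix are placeholders, not data from the lab). It prints both vectors, binds them column-wise, and inspects the result.

```r
# Placeholder values for illustration only; substitute your own measurements.
heights_example <- c(64, 70, 67)
names_example <- c("Ana", "Ben", "Chris")

print(heights_example)
print(names_example)

# cbind() places the two vectors side by side as columns of one object.
combined_example <- cbind(heights_example, names_example)
print(combined_example)

# Inspect what kind of object cbind() returned and how its values are stored.
print(class(combined_example))
print(typeof(combined_example))
```

Comparing typeof() for the combined object with typeof(heights_example) may help when describing what the command did.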
2 - Downloading data:
a. Download the data set births.csv from the CCLE site and upload it into RStudio. Name the data frame NCbirths.
b. Demonstrate that you have been successful by typing head(NCbirths) and copying and pasting the output into your word processing document.
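One possible way to do this from the console, assuming births.csv has been saved in the current working directory (the path is an assumption; the RStudio import dialog works as well):

```r
# Assumed location: births.csv sits in the current working directory.
births_path <- "births.csv"

if (file.exists(births_path)) {
  NCbirths <- read.csv(births_path)  # read the file into a data frame
  print(head(NCbirths))              # first six rows
} else {
  message("births.csv not found in ", getwd(), "; adjust births_path.")
}
```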
3 - Load the maps package:
a. Install the maps package. Verify its installation by typing find.package("maps") and include the output in your answer.
b. Type library(maps) to load the package. Type map("state") and include the plot output in your answer.
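The sketch below strings these steps together. It checks whether maps is already installed before loading it, so it does nothing harmful on a machine without the package; the variable name maps_installed is only for illustration.

```r
# TRUE if the maps package can be found in the library paths.
maps_installed <- requireNamespace("maps", quietly = TRUE)

if (maps_installed) {
  print(find.package("maps"))  # folder where the package is installed
  library(maps)                # attach the package
  map("state")                 # outline map of US states
} else {
  message('The maps package is not installed; run install.packages("maps") first.')
}
```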
4 - Perform vector operations:
a. Extract the weight variable as a vector from the data frame by typing weights <- NCbirths$weight
b. What units do you think the weights are in?
c. Create a new vector named weights_in_pounds that contains the weights of the babies in pounds. You can look up conversion factors on the internet.
d. Demonstrate your success by typing weights_in_pounds[1:20] and including the output in your word processing document.
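Vector arithmetic in R is element-wise, so a unit conversion is a single division or multiplication. The sketch below uses made-up values and assumes, purely for illustration, that they are in ounces; check the units of the actual data before choosing a conversion factor.

```r
# Made-up values, assumed to be in ounces for this illustration only.
example_weights <- c(112, 120, 96)

ounces_per_pound <- 16  # 1 pound = 16 ounces

# Dividing a vector by a number divides every element.
example_weights_in_pounds <- example_weights / ounces_per_pound

# Square brackets with a range select elements by position.
print(example_weights_in_pounds[1:2])
```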
Section 2 - Summarizing Data (one variable) - Exercises
1. What is the mean weight of the babies in pounds?
2. What percentage of the mothers in the sample smoke? Hint: use the tally function with the format argument. Use the help screen for guidance.
3. According to the Centers for Disease Control, approximately 21% of adult Americans are smokers. How far off is the percentage you found in Exercise 2 from the CDC's report?
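As a cross-check on the percentage and the comparison, here is a base R sketch that does not use tally. The four values are made up, and the category labels are assumptions rather than the labels in the lab data.

```r
# Made-up smoking status for four mothers (labels are assumptions).
smoking_example <- c("Smoker", "NonSmoker", "NonSmoker", "NonSmoker")

# table() counts each category; prop.table() turns counts into proportions.
percent_example <- prop.table(table(smoking_example)) * 100
print(percent_example)

# Difference, in percentage points, from the 21% figure quoted above.
smoker_percent <- as.numeric(percent_example["Smoker"])
cdc_percent <- 21
difference_in_points <- smoker_percent - cdc_percent
print(difference_in_points)
```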
Section 3 - Visualizing Data (one quantitative variable) - Exercises
1. Produce a dot plot of the weights in pounds.
2. Produce three different histograms of the weights in pounds. Use 3 bins, 20 bins, and 100 bins. Which histogram seems to give the best visualization, and why?
3. We can use the syntax boxplot(vector1, vector2) to make a side-by-side box plot. Create a side-by-side box plot of the mothers' ages and the fathers' ages. Which group tends to be older, mothers or fathers?
4. Try typing histogram(~ weight | Habit, data = NCbirths, layout = c(1, 2)). Describe what this code does. Based on the graph, do you see any major differences between baby weights from smoking moms vs. non-smoking moms?
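The base R sketch below shows one way to draw these plot types. All data are simulated with rnorm(), so the shapes say nothing about the real births data, and every variable name is a placeholder. Note that passing a single number to the breaks argument of hist() is only a suggestion, so the sketch builds explicit bin edges to get an exact bin count.

```r
set.seed(1)
# Simulated stand-ins for the lab variables (not real data).
sim_weights <- rnorm(200, mean = 7, sd = 1.2)
sim_mother_age <- rnorm(200, mean = 27, sd = 6)
sim_father_age <- rnorm(200, mean = 30, sd = 7)

# Dot plot: rounding creates ties so that method = "stack" piles up the dots.
stripchart(round(sim_weights, 1), method = "stack", pch = 16,
           xlab = "Weight (pounds)")

# Histograms with exactly 3, 20, and 100 equal-width bins.
for (n_bins in c(3, 20, 100)) {
  bin_edges <- seq(min(sim_weights), max(sim_weights), length.out = n_bins + 1)
  hist(sim_weights, breaks = bin_edges,
       main = paste(n_bins, "bins"), xlab = "Weight (pounds)")
}

# Side-by-side box plot of two vectors.
boxplot(sim_mother_age, sim_father_age,
        names = c("Mothers", "Fathers"), ylab = "Age (years)")
```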
Section 4 - Visualizing Data (two categorical) - Exercises
1. Consider the other categorical variables in this data. Of those that record the health of the baby, which do you think will be associated with the mother's smoking, and why? Make a two-way summary table to check your hypothesis. Do you have evidence that this variable is associated with smoking? Why?
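A two-way table can be built in base R with table(), and row percentages make the groups comparable when their sizes differ. The toy data frame below is invented for illustration; the Outcome column and its values are hypothetical and do not correspond to any variable in the lab data.

```r
# Invented toy data; "Outcome" is a hypothetical health-related variable.
toy <- data.frame(
  Habit = c("Smoker", "Smoker", "NonSmoker", "NonSmoker", "NonSmoker"),
  Outcome = c("A", "B", "A", "A", "B")
)

# Counts for every combination of the two categorical variables.
two_way <- table(toy$Habit, toy$Outcome)
print(two_way)

# Proportions within each row (each smoking group sums to 1).
row_props <- prop.table(two_way, margin = 1)
print(row_props)
```

If the row proportions are similar across the smoking groups, that suggests little association; clearly different rows suggest the opposite.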
Section 5 - Visualizing Data (two quantitative) - Exercises
1. Produce a nicely formatted scatter plot of the weight of the baby vs. the mother's age.
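For a plot of y vs. x, the baby's weight goes on the vertical axis and the mother's age on the horizontal axis. The sketch below shows the usual formatting arguments of plot() on simulated data; the variable names and values are placeholders, not the columns of the lab data.

```r
set.seed(2)
# Simulated stand-ins (not real data).
sim_mother_age <- round(runif(100, min = 18, max = 40))
sim_baby_weight <- rnorm(100, mean = 7, sd = 1.2)

plot(sim_mother_age, sim_baby_weight,
     main = "Baby weight vs. mother's age",  # plot title
     xlab = "Mother's age (years)",          # axis labels with units
     ylab = "Baby weight (pounds)",
     pch = 16,                               # solid circles
     col = "steelblue")
```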
Section 6 - Visualizing Data (geographic data) - Exercises
1. To demonstrate that you have followed the handout, produce a modified version of the colored bubble plot. Change the title to "California ozone bubble plot". Also, use a different point style and different colors for each air quality category.
Attachment: R Basics and Working with Data Lab Assignment File.rar