Reference no: EM132848149
Case-control studies help determine whether certain exposures are associated with outcomes such as developing cancer. The built-in dataset esoph contains data from a case-control study in France comparing people with esophageal cancer (cases, counted in ncases) to people without esophageal cancer (controls, counted in ncontrols) who are carefully matched on a variety of demographic and medical characteristics. The study compares alcohol intake in grams per day (alcgp) and tobacco intake in grams per day (tobgp) between cases and controls, grouped by age range (agegp).
--------------------------------------------------------------------------------------------------------------
The dataset is available in base R and can be called with the variable name esoph:
head(esoph)
You will be using this dataset to answer the following four multi-part questions (Questions 3-6).
You may wish to use the tidyverse package:
library(tidyverse)
The following three parts have you explore some basic characteristics of the dataset.
Each row contains one group in the study. Each group has a different combination of age, alcohol consumption, and tobacco consumption. The number of cancer cases and number of controls (individuals without cancer) are reported for each group.
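The row-per-group layout described above can be checked directly. The following is a minimal sketch in base R (no tidyverse required); the variable names n_groups, all_cases, and all_controls are illustrative choices, not part of the dataset.

```r
# Inspect the built-in esoph data frame using base R only.
# n_groups, all_cases and all_controls are illustrative names.

str(esoph)                  # column types: three ordered factors plus two counts

n_groups <- nrow(esoph)     # one row per age/alcohol/tobacco combination
all_cases <- sum(esoph$ncases)        # total cancer cases across all groups
all_controls <- sum(esoph$ncontrols)  # total controls across all groups

# The grouping variables are ordered factors; list their levels
levels(esoph$agegp)
levels(esoph$alcgp)
levels(esoph$tobgp)

n_groups
all_cases
all_controls
```

Because alcgp and tobgp are ordered factors, the first and last entries of levels() identify the lowest and highest consumption groups without typing the level labels by hand.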
Use the R statistical language to answer the following questions:
- What is the probability that a subject in the highest alcohol consumption group is a cancer case?
- What is the probability that a subject in the lowest alcohol consumption group is a cancer case?
- Given that a person is a case, what is the probability that they smoke 10g or more a day?
- Given that a person is a control, what is the probability that they smoke 10g or more a day?
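One possible way to approach the four questions above is sketched here in base R. It rests on two assumptions that are worth verifying against ?esoph for your R version: that ncontrols counts controls only (not cases plus controls), and that the lowest level of tobgp is the only one below 10 g/day. The helper p_case_given_alc and the other variable names are hypothetical, introduced only for this illustration.

```r
# Sketch: conditional probabilities from the grouped counts in esoph.
# Assumption: ncontrols counts controls only (check ?esoph for your R version).

# Lowest and highest alcohol groups, taken from the factor's level order
alc_levels <- levels(esoph$alcgp)
lowest_alc <- alc_levels[1]
highest_alc <- alc_levels[length(alc_levels)]

# Hypothetical helper: P(case | alcohol group) = cases / (cases + controls)
p_case_given_alc <- function(level) {
  grp <- esoph[esoph$alcgp == level, ]
  sum(grp$ncases) / (sum(grp$ncases) + sum(grp$ncontrols))
}

p_case_highest <- p_case_given_alc(highest_alc)
p_case_lowest <- p_case_given_alc(lowest_alc)

# Assumption: every tobgp level above the lowest means 10 g/day or more
smokes_10_plus <- as.integer(esoph$tobgp) > 1

# P(smokes 10 g or more | case) and P(smokes 10 g or more | control)
p_smoke_given_case <- sum(esoph$ncases[smokes_10_plus]) / sum(esoph$ncases)
p_smoke_given_control <- sum(esoph$ncontrols[smokes_10_plus]) / sum(esoph$ncontrols)

p_case_highest
p_case_lowest
p_smoke_given_case
p_smoke_given_control
```

Each probability is a ratio of summed counts: the first two condition on the alcohol group and divide cases by all subjects in that group, while the last two condition on case or control status and divide by the total number of cases or controls.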