Reference no: EM132359252
Assignment -
Table - Variables in BMI
Variable | Description
Age | The age of the participant (in years)
Gender | The gender of the participant: 1 = Male, 2 = Female
Smoking | Whether the person smokes or not: 1 = Yes, 2 = No
Alcohol | Alcohol consumption (grams of ethanol)
PHYSACT | Whether the subject participates in regular physical activity: 1 = Yes, 2 = No
SES | The socio-economic status of the participant: 1 = Lower, 2 = Medium, 3 = High
BMI | Body mass index (in kg/m2)
QUESTION 1 - Before starting to do the multivariable regression analysis, the researcher would like to learn some basic knowledge related to multiple regression analysis from you. Her first question is related to the concept of interaction. Which of the following statements best describes an interaction effect occurring?
a. The effect of one independent variable (on a dependent variable) does not depend on the levels of another independent variable
b. The effect of one independent variable (on a dependent variable) equals the effect of another independent variable
c. One independent variable and the dependent variable interact with each other
d. The effect of one independent variable (on a dependent variable) is different for the different levels of another independent variable
e. Both answer c) and d) are correct
QUESTION 2 - The researcher's second question is given below: In a multiple regression analysis, if one additional independent variable is added to the model, which of the following statements is correct?
a. The unexplained variability of the dependent variable will increase or stay the same
b. The total variability of the dependent variable will increase
c. The estimated coefficient of this new independent variable will be positive
d. The explained variability of the dependent variable will increase
e. The p value of this new added variable will be < 0.05
QUESTION 3 - The researcher's third question is given below: After a multiple regression analysis, if the scatter plot of the standardised residuals against a continuous independent variable has a clear pattern (e.g., fan-shaped), which of the assumptions are not met?
a. Normality of the standardised residuals
b. Homoscedasticity of the standardised residuals
c. Independence of the standardised residuals
d. Random sampling of the standardised residuals
e. Constant variation of the standardised residuals
f. Assumptions b) and e) are the same and they are not met
QUESTION 4 - Now the researcher has gained more confidence to do her regression analysis. Before starting the multivariable regression analysis, she would like to test the marginal association between BMI and Gender, i.e., she wants to test the null hypothesis that there is no difference in population mean BMI between females and males. Which of the following tests should you recommend?
a. Paired samples t-test
b. Independent samples (two-samples) t-test with equal variances
c. Independent samples (two-samples) t-test with unequal variances
d. One-way ANOVA
e. Chi-square test
QUESTION 5 - After conducting the statistical test you recommended in the previous question, what can you conclude?
a. The test statistic is 1.4425, the p-value is 0.1538 > 0.05, the 95% CI of the difference is (-0.414, 2.574) and does include '0', suggesting that the population mean BMI is significantly different between females and males
b. The test statistic is 1.4425, the p-value is 0.1538 > 0.05, the 95% CI of the difference is (-0.414, 2.574) and does include '0', suggesting that the population mean BMI is not significantly different between females and males
c. The test statistic is 1.5519, the p-value is 0.1236 > 0.05, the 95% CI of the difference is (-0.299, 2.459) and does include '0', suggesting that the population mean BMI is significantly different between females and males
d. The test statistic is 1.5519, the p-value is 0.1236 > 0.05, the 95% CI of the difference is (-0.299, 2.459) and does include '0', suggesting that the population mean BMI is not significantly different between females and males
e. The test statistic is 67.2091, the p-value is < 0.001, the 95% CI of the difference is (22.402, 23.763) and does not include '0', suggesting that the population mean BMI is significantly different between females and males
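For readers who want to see how the two independent samples t statistics named in Question 4 differ, the following is a minimal sketch using only the Python standard library. The function names `pooled_t` and `welch_t` and the sample values are assumptions made for illustration; the values are made up and are not the assignment data, so the output says nothing about which option above is correct.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic and df, assuming equal population variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


def welch_t(a, b):
    """Two-sample t statistic and df without assuming equal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    se = math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return (mean(a) - mean(b)) / se, df


# Made-up BMI values for illustration only; NOT the assignment data.
group_1 = [22.1, 24.5, 26.0, 23.3, 27.8, 25.1]
group_2 = [21.0, 23.9, 22.4, 24.8, 20.7]

print("equal variances:  ", pooled_t(group_1, group_2))
print("unequal variances:", welch_t(group_1, group_2))
```

The two versions differ only in the standard error and the degrees of freedom, which is why the choice between them is usually guided by a check of the equal variance assumption such as Levene's test.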
QUESTION 6 - Before starting to do the multivariable regression analysis, the researcher also wanted to test the marginal association between BMI and Smoking, i.e., she wanted to test the null hypothesis that there is no difference in population mean BMI between smokers and non-smokers. Which of the following statements is correct?
a. An independent samples (two-sample) t test with unequal variances (Levene's test p = 0.037) should be performed. This test suggested that the difference in the population mean BMI between smokers and non-smokers is not significant (t = 1.7766, p = 0.0852) at 5% significance level
b. An independent samples (two-sample) t test with equal variances (Levene's test p = 0.037) should be performed. This test suggested that the difference in the population mean BMI between smokers and non-smokers is statistically significant (t = 2.0020, p = 0.0478) at 5% significance level
c. An independent samples (two-sample) t test with unequal variances (Levene's test p = 0.037) should be performed. This test suggested that the difference in the population mean BMI between smokers and non-smokers is statistically significant (t = 66.9032, p <0.001) at 5% significance level
d. Answers (b) and (c) are both correct
QUESTION 7 - Before starting to do the multivariable regression analysis, the researcher also wanted to test the marginal association between BMI and PHYSACT, i.e., she wanted to test the null hypothesis that there is no difference in population mean BMI between people who participate in regular physical activity and those who do not. Which of the following statements is correct?
a. Based on the independent samples t-test with unequal variances (Levene's test p = 0.629), the population mean BMI of the two physical activity groups (Yes vs No) is significantly different (t = -3.3054, p = 0.0014), with those who do not undertake regular physical activity having a significantly higher BMI on average
b. Based on the independent samples t-test with equal variances (Levene's test p = 0.629), the population mean BMI of the two physical activity groups (Yes vs No) is significantly different (t = -3.2667, p = 0.0015), with those who do not undertake regular physical activity having a significantly higher BMI on average
c. Based on the independent samples t-test with equal variances (Levene's test p = 0.629), the population mean BMI of the two physical activity groups (Yes vs No) is significantly different (t = -3.2667, p = 0.0015), with those who do not undertake regular physical activity having a significantly lower BMI on average
d. Based on the independent samples t-test with equal variances (Levene's test p = 0.629), the population mean BMI of the two physical activity groups (Yes vs No) is significantly different (t = 67.9958, p < 0.001), with those who do not undertake regular physical activity having a significantly higher BMI on average
QUESTION 8 - Then the researcher wanted to test the null hypothesis that the population mean BMI (denoted as µ) is the same among the three SES groups (Lower, Medium, Higher). Which of the following statements is correct?
a. Ho: µLower = µMedium = µHigher, Ha: at least two of the population mean BMIs (µ) differ from each other
b. Ho: µLower = µMedium = µHigher, Ha: µLower ≠ µMedium ≠ µHigher
c. Based on the one-way ANOVA, the differences in the population mean BMI amongst the three different SES groups are not significant because p=0.491 is greater than 0.05
d. Based on the one-way ANOVA, the differences in the population mean BMI amongst the three different SES groups are not significant because p=0.451 is greater than 0.05
e. Answers (a) and (d) are both correct
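The F statistic behind a one-way ANOVA such as the one in Question 8 can be sketched in a few lines. This is an illustrative standard-library version, not the output of any particular statistics package; the function name `one_way_anova_f` and the toy groups are assumptions, and the numbers are not the assignment data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their group mean
    for g in groups:
        g_mean = mean(g)
        ss_between += len(g) * (g_mean - grand_mean) ** 2
        ss_within += sum((x - g_mean) ** 2 for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data for illustration only; NOT the assignment data.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

The p value is then the upper tail probability of the F distribution with (df_between, df_within) degrees of freedom, which a statistics package reports directly.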
QUESTION 9 - Lastly, before starting to do the multivariable regression analysis, the researcher wanted to test the association between BMI and Age using scatter plot and Pearson's correlation coefficient. Which of the following statements is appropriate?
a. There was an association between BMI and age (Pearson's correlation coefficient = 0.2659, p = 0.005), indicating that people who have a higher BMI get older more quickly
b. There was a positive association between BMI and age (Pearson's correlation coefficient = 0.2659, p = 0.005), indicating that people with a higher BMI tend to be older
c. There was a weak positive linear, yet statistically significant, relationship between BMI and age (Pearson's correlation coefficient = 0.2659, p = 0.005), indicating that older people tend to have a higher BMI
d. There was a strong positive linear, yet statistically significant, relationship between BMI and age (Pearson's correlation coefficient = 0.2659, p = 0.005), indicating that if you have a higher BMI, you tend to be older
e. Answers (c) and (d) are both correct
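As a reminder of what Pearson's correlation coefficient in Question 9 measures, the sketch below computes r by hand and converts it to the t statistic used to test the null hypothesis of zero correlation (with n - 2 degrees of freedom). The function names `pearson_r` and `r_to_t` and the toy data are assumptions for illustration; they are not the assignment data.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for H0: rho = 0, on n - 2 degrees of freedom.

    Undefined when r is exactly +1 or -1 (division by zero).
    """
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Toy data for illustration only; NOT the assignment data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
print(r, r_to_t(r, len(x)))
```

Note that statistical significance depends on both r and the sample size, so the size of r and its p value answer different questions.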
QUESTION 10 - The researcher requests your help to build a parsimonious regression model for BMI using a backward elimination process (remove the variable with the largest p value > 0.05, one at a time), treating all given independent variables equally, i.e., there is no major variable of interest. She also does not want to test for interaction or confounding effects.
Please perform the regression analysis and obtain a parsimonious regression model. Then answer the following relevant questions.
Which of the following reasons are correct for describing your final parsimonious model?
a. A significant interaction is included in my final model
b. All variables included in my final model are statistically significant at 5% level (i.e., all p values < 0.05)
c. My final model includes only two variables, Age and PHYSACT, because all other variables are not significantly associated with BMI
d. My final model includes only three variables Age, Smoking and PHYSACT
e. Answers (b) and (d) both are correct
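The backward elimination rule described in Question 10 can be written as a short loop. The sketch below is a generic outline under stated assumptions: `fit_p_values` is a hypothetical callback standing in for whatever regression routine is actually used, and the p values in `FAKE_FITS` are invented purely to exercise the loop (the variable names `x1` to `x3` are placeholders, not the assignment variables).

```python
from typing import Callable, Dict, List


def backward_eliminate(
    candidates: List[str],
    fit_p_values: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Drop the variable with the largest p value > alpha, one at a time."""
    kept = list(candidates)
    while kept:
        # Refit with the variables still in the model.
        p_values = fit_p_values(kept)
        worst = max(kept, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break  # no remaining p value exceeds alpha
        kept.remove(worst)
    return kept


# Hypothetical p values keyed by the set of variables in the model.
# They are invented for this demonstration and come from no real fit.
FAKE_FITS = {
    frozenset({"x1", "x2", "x3"}): {"x1": 0.01, "x2": 0.40, "x3": 0.20},
    frozenset({"x1", "x3"}): {"x1": 0.02, "x3": 0.03},
}


def fake_fit(names: List[str]) -> Dict[str, float]:
    return FAKE_FITS[frozenset(names)]


print(backward_eliminate(["x1", "x2", "x3"], fake_fit))  # ['x1', 'x3']
```

In the invented example the p value of `x3` changes once `x2` is removed, which is the reason for refitting after every removal rather than dropping all non-significant variables at once.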
QUESTION 11 - The researcher would like to understand the concept of "multicollinearity" and to know whether it is a concern for your final model. Which of the following statements do you think is correct?
a. Multicollinearity occurs when two independent variables are highly correlated with a correlation coefficient greater than 0.90 (i.e., r > 0.9)
b. Multicollinearity occurs when the dependent variable (DV) and one independent variable (IV) are highly correlated with a correlation coefficient greater than 0.90 (i.e., r > 0.9)
c. If both highly correlated independent variables are included in the same model, neither of them will be identified as a significant predictor of the DV, even though individually they may be significant predictors of the DV
d. In my final model, there is no concern about multicollinearity, because no two independent variables are highly correlated with a correlation coefficient greater than 0.90
e. Only answer b) is incorrect
QUESTION 12 - Now, based on your final parsimonious model (Question 10), which of the following prediction equations is correct? (Hint: read your regression model parameter table output carefully, and use 3 decimal places in the equation.)
a. Predicted Mean BMI = 22.337 - 1.801*[PHYSACT = no] + 1.586*[Smoking = non-smokers] + 0.068*age
b. Predicted Mean BMI = 22.337 - 1.801*[PHYSACT = no] + 1.586*[Smoking = smokers] + 0.068*age
c. Predicted Mean BMI = 22.337 - 1.801*[PHYSACT = yes] + 1.586*[Smoking = smokers] + 0.068*age
d. Predicted Mean BMI = 22.337 - 1.801*[PHYSACT = yes] + 1.586*[Smoking = non-smokers] + 0.068*age
e. Predicted Mean BMI = 22.337 - 1.801*[PHYSACT] + 1.586*[Smoking] + 0.068*age
QUESTION 13 - The researcher is happy with your model. Based on your final model (see Question 10 & 12), which of the following statements is correct? Please note the Yes group refers to those who participate in regular physical activity, and the No group refers to those who do not.
a. Because PHYSACT is not a significant predictor of BMI, there are no differences in the mean BMI between the two PHYSACT groups, i.e., between Yes group and No group
b. The differences in the mean BMI between the two PHYSACT groups, i.e., between Yes group and No group, cannot be assessed by the available regression analysis
c. The No group has a higher BMI on average than the Yes group only for older people
d. The No group has a higher BMI on average than the Yes group only for younger people
e. The No group has a higher BMI on average than the Yes group controlling for the subject's Age and their Smoking status
QUESTION 14 - Based on your final model (see Question 10 & 12), which of the following statements is correct? Please note the Yes group refers to those who participate in regular physical activity, and the No group refers to those who do not.
a. The estimated mean difference in BMI between those who do and do not participate in regular physical activity is -1.801 kg/m2; with the Yes group estimated with 95% certainty to have a higher BMI on average (by between -3.135 and -0.466 kg/m2) in the population, after controlling for Age and their Smoking status
b. The estimated mean difference in BMI between those who do and do not participate in regular physical activity is -1.801 kg/m2; with the No group estimated with 95% certainty to have a lower BMI on average (by between -3.135 and -0.466 kg/m2) in the population, after controlling for Age and their Smoking status
c. The estimated mean difference in BMI between those who do and do not participate in regular physical activity is -1.801 kg/m2; with the No group estimated with 95% certainty to have a higher (by between 0.466 and 3.135 kg/m2) BMI on average in the population, after controlling for Age and their Smoking status
d. The estimated mean difference in BMI between those who do and do not participate in regular physical activity is -1.801 kg/m2; with the Yes group estimated with 95% certainty to have a lower (by between 0.466 and 3.135 kg/m2) BMI on average in the population, after controlling for Age and their Smoking status
e. Both statements c) and d) are correct
QUESTION 15 - Based on your final model (see Question 10 & 12), which of the following statements is correct?
a. Age is not a significant predictor of BMI for any people in the sample
b. Age is a significant predictor of BMI only for smokers who do not participate in regular physical activity
c. Age is a significant predictor of BMI only for non-smokers who participate in regular physical activity
d. Age is a significant predictor of BMI for all smokers and non-smokers who do and do not participate in regular physical activity
e. The significance of Age as a predictor of BMI cannot be assessed from the available information
QUESTION 16 - The researcher interpreted the estimated regression coefficient for Age in your final model (see Question 10 & 12) in the following ways, and she is unsure which is correct. Please help her find the correct one.
a. For every one-year increase in Age, the mean BMI was estimated to increase by 0.068 kg/m2 only for smokers who do not participate in the regular physical activity in the population
b. For every one-year increase in Age, the mean BMI was estimated to increase by 0.068 kg/m2 only for non-smokers who participate in the regular physical activity in the population
c. For every one-year increase in Age, the mean BMI was estimated to increase by 0.068 kg/m2 only for those who participate in the regular physical activity in the population, regardless of their Smoking status
d. For every one-year increase in Age, the mean BMI was estimated to increase by 0.068 kg/m2 only for those who do not participate in the regular physical activity in the population, regardless of their Smoking status
e. For every one year increase in Age, the mean BMI was estimated to increase by 0.068 kg/m2 for all subjects in the population, controlling for their PHYSACT and Smoking status
QUESTION 17 - Based on your final model (see Question 10 & 12), the researcher also made her conclusions regarding the subject's smoking status as follows. Which of her conclusions do you think is correct?
a. As its p value is 0.041 < 0.05, Smoking status is a significant predictor of BMI, however only for smokers who participate in regular physical activity
b. As its p value is 0.041 < 0.05, Smoking status is a significant predictor of BMI, however only for smokers who do not participate in regular physical activity
c. As its p value is 0.041 < 0.05, Smoking status is a significant predictor of BMI, even after subject's age and physical activity participation status were adjusted in the final model
d. As its p value is 0.041 < 0.05, Smoking status is a significant predictor of BMI, however only for younger smokers who do not participate in regular physical activity
e. As its p value is 0.041 < 0.05, Smoking status is a significant predictor of BMI, however only for older smokers who participate in regular physical activity
QUESTION 18 - Based on your final model (see Question 10 & 12), the researcher also interpreted the estimated coefficient of Smoking as follows. Which of the following researcher's interpretations is correct?
a. The BMI of non-smokers was estimated to be 1.586 kg/m2 higher on average compared to that of smokers, and the population mean BMI of non-smokers was estimated with 95% certainty to be between 0.068 and 3.104 kg/m2 higher than that of smokers, after adjustment for subjects' age and their physical activity status
b. The BMI of smokers was estimated to be 1.586 kg/m2 lower on average compared to that of non-smokers, and the population mean BMI of smokers was estimated with 95% certainty to be between 0.068 and 3.104 kg/m2 lower than that of non-smokers, after adjustment for subjects' age and their physical activity status
c. For each additional cigarette smoked per day, non-smokers' BMI increased by 1.586 kg/m2 on average, and the population mean BMI of non-smokers increased, with 95% certainty, by between 0.068 and 3.104 kg/m2, after adjustment for subjects' age and their physical activity status
d. For each additional cigarette smoked per day, smokers' BMI decreased by 1.586 kg/m2 on average, and the population mean BMI of smokers decreased, with 95% certainty, by between 0.068 and 3.104 kg/m2, after adjustment for subjects' age and their physical activity status
e. None of the above
QUESTION 19 - Based on your final model (see Question 10 & 12), the researcher would like to make a mean BMI prediction for a 50-year-old non-smoker who participates in regular physical activity, and her result is 27.323 kg/m2. Do you agree?
a. No. The researcher's prediction is incorrect and the correct prediction should be 25.737 kg/m2
b. No. The researcher's prediction is incorrect and the correct prediction should be 23.936 kg/m2
c. Yes. The researcher's prediction of 27.323 kg/m2 is correct
d. No. The researcher's prediction is incorrect and the correct prediction should be 25.522 kg/m2
e. No. The researcher's prediction is incorrect and the correct prediction should be the coefficient for age, which is 0.068*50 = 3.4 kg/m2
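A prediction such as the one in Question 19 is simply the prediction equation evaluated at the subject's values. The sketch below uses the coefficients that appear in every option of Question 12 and leaves the indicator coding open: which category each indicator represents must be read from your own regression output and is not assumed here. The function name `predicted_bmi` is a hypothetical helper.

```python
def predicted_bmi(age, physact_indicator, smoking_indicator):
    """Evaluate the prediction equation quoted in Question 12.

    Each indicator is 1 when the subject falls in the category named inside
    the square brackets of the equation, and 0 otherwise. Which category
    that is depends on the regression output and is NOT assumed here.
    """
    return (22.337
            - 1.801 * physact_indicator
            + 1.586 * smoking_indicator
            + 0.068 * age)


# All four indicator combinations for a 50-year-old subject.
for physact in (0, 1):
    for smoking in (0, 1):
        print(physact, smoking, round(predicted_bmi(50, physact, smoking), 3))
```

Each printed value corresponds to one way of coding the two categorical variables, so choosing among them comes down to identifying the reference categories in the parameter table.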
QUESTION 20 - Based on your final model (see Question 10 & 12), consider two subjects, one 20 years old and the other 40 years old. What is the mean increase in BMI between these two subjects, assuming they have exactly the same smoking status and the same participation status in regular physical activity?
a. 0.068*40 kg/m2
b. 0.068*20 kg/m2
c. 22.337*20 kg/m2
d. -1.801*20 kg/m2
e. 1.586*20 kg/m2
QUESTION 21 - Based on your final model (see Question 10 & 12), the researcher would like to predict the mean value of BMI for an 18-year-old non-smoker who participates in regular physical activity. Which of the following predictions is appropriate?
a. 23.561 kg/m2
b. 25.147 kg/m2
c. 21.760 kg/m2
d. 23.346 kg/m2
e. None of the above is correct as the prediction cannot be made
QUESTION 22 - Based on your final model (see Question 10 & 12), the researcher saved the standardised residuals and she would like to assess the normality of the standardised residuals. You suggested the 5-measures approach, and the researcher concluded that the standardised residuals can be assumed to have a normal distribution. Do you agree?
a. Yes, the standardised residuals can be assumed to have a normal distribution
b. No, the standardised residuals have a strongly positively skewed distribution
c. No, the standardised residuals have a strongly negatively skewed distribution
d. No, the standardised residuals have a bimodal distribution
QUESTION 23 - Based on your final model (see Question 10 & 12), the researcher would like to assess the assumption of the constant variation related to continuous variables. She has done a scatter plot using the standardised residuals against the continuous variable in your final model, which of the following conclusions is correct?
a. From the scatter plot of standardised residuals against Smoking, constant variation can be assumed as no clear (funnel or curved) shape is evident and the plot displays equal variation of residuals across the values of Smoking.
b. From the scatter plot of standardised residuals against PHYSACT, constant variation can be assumed as no clear (funnel or curved) shape is evident and the plot displays equal variation of residuals across the values of PHYSACT.
c. From the scatter plot of standardised residuals against Age, constant variation can be assumed as no clear (funnel or curved) shape is evident and the plot displays equal variation of residuals across the values of Age.
d. The researcher used the wrong method, as the constant variation related to a continuous variable should be assessed by using Levene's test.
e. It is actually not necessary to assess the constant variation as it is always assumed for any model.
QUESTION 24 - Based on your final model (see Question 10 & 12), the researcher would like to assess the assumption of the equal variances related to the categorical variables in your final model. Which of the following conclusions is correct?
a. Based on Levene's test for Smoking status, equal variances can be assumed (p = 0.148 > 0.05) between the smoker and non-smoker groups; and based on Levene's test for PHYSACT, equal variances can also be assumed between the group that is regularly physically active and the group that is not (p = 0.195 > 0.05)
b. Based on Levene's test for Smoking status, equal variances can be assumed (p = 0.148 > 0.05) between the smoker and non-smoker groups
c. Based on Levene's test for PHYSACT status, equal variances can be assumed (p = 0.195 > 0.05) between the group that is regularly physically active and the group that is not
d. Based on Levene's test for Smoking and PHYSACT status, equal variances can be assumed (p = 0.291 > 0.05) among the groups defined by these two variables
e. The researcher does not need a Levene's test because the equal variances related to a categorical variable should be assessed by using a scatter plot of standardised residuals against the categorical variable, which has been done in the previous question
QUESTION 25 - Based on your final model (see Question 10 & 12), the researcher would like to assess the goodness-of-fit of your final model using the adjusted R2 value. Which of the following interpretations is correct?
a. The researcher should interpret the R2 value rather than the adjusted R2 value to assess the goodness-of-fit of your final multiple linear regression model
b. As the adjusted R2 value is 0.1446, the final model will accurately predict BMI 14.46% of the time
c. The adjusted R2 value is 0.1446, indicating that the final model explains only approximately 14.46% of the variation in BMI through the subject's age, smoking status and physical activity participation status together
d. As the adjusted R2 value is 0.1446, there is 85.54% of total variation in BMI unexplained by the final model
e. Answer (c) and (d) are correct
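For reference, the adjusted R2 discussed in Question 25 is obtained from the ordinary R2 by penalising for the number of predictors. The sketch below shows the standard formula; the function name `adjusted_r2` and the example inputs are assumptions for illustration, since the sample size and R2 of the assignment model are not stated in the questions.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors (intercept excluded)
    fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical inputs for illustration only; NOT the assignment values.
print(adjusted_r2(0.5, 11, 2))  # 0.375
```

Because the penalty grows with k, adding a predictor raises the adjusted value only if it improves the fit by more than would be expected by chance.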
QUESTION 26 - Based on your final model (see Question 10 & 12), the researcher would like to know how good the model is and she used two different criteria. Which of the following conclusions is correct?
a. The range of the standardised residual values is (-1.883, 2.654), indicating that no standardised residuals are found to be outside the reference range of (-3, 3) and the model does fit well for all sample observations
b. The model is not practically useful for predictions due to its low adjusted R2
c. BMI would be impacted by many other variables not considered in this study
d. If other important potential factors were included, the model performance would be improved with a higher R2 value
e. All of the above are correct
QUESTION 27 - Based on your final model (see Question 10 & 12), the researcher concluded that the difference in the mean predicted BMI between her two friends (one is a smoker and the other a non-smoker) is 1.586 kg/m2. Do you agree with the researcher?
a. Yes. The researcher's conclusion is correct as the estimated coefficient for smoking is 1.586, hence the difference in the mean predicted BMI between her two friends is 1.586 kg/m2
b. No. The researcher's conclusion is incorrect unless her two friends are the same age (within the range of 19 to 69 years) and also have the same physical activity participation status (e.g., both are physically active or both are physically inactive)
c. Not really. The researcher's conclusion is only correct when her two friends are the same age (within the range of 19 to 69 years) and have exactly the same physical activity participation status
d. No. The researcher's conclusion is wrong because she did not mention the gender of her friends
e. No. I think statements (b) and (c) are both correct
QUESTION 28 - The researcher would like to know the steps for assessing a confounder for her future study. Given a dependent variable Y (continuous), a study factor of interest X and a possible confounding variable C, which of the following steps are NOT correct?
a. Obtain a crude estimated regression coefficient for X using a simple linear regression for Y with only X
b. Obtain an adjusted estimated regression coefficient for X and p value for C using a multiple linear regression for Y with X and C
c. Calculate the change between the crude estimate and the adjusted estimate using 100%*(crude estimate - adjusted estimate) / crude estimate
d. C is a confounder if the change is greater than 15%
e. C is concluded to be a confounder if p value for C < 0.05
f. Answer d) and e)
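The percentage change quoted in option (c) of Question 28 is simple arithmetic on the two estimated coefficients for X. The sketch below implements the formula exactly as it is written in that option; the function name `percent_change` and the example coefficients are assumptions, and no decision threshold is built in because the cut-off is part of what the question asks about.

```python
def percent_change(crude, adjusted):
    """100% * (crude estimate - adjusted estimate) / crude estimate."""
    if crude == 0:
        raise ValueError("crude estimate must be non-zero")
    return 100.0 * (crude - adjusted) / crude


# Hypothetical coefficients for X, for illustration only.
crude_estimate = 2.0     # from the regression of Y on X alone
adjusted_estimate = 1.5  # from the regression of Y on X and C
print(percent_change(crude_estimate, adjusted_estimate))  # 25.0
```

In practice the magnitude of the change is what is compared with the chosen cut-off, so the sign of the result is usually ignored.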