MPH406 Quantitative Methods Assignment problem

Reference no: EM132377900



EPID6002 Quantitative Methods MPH406

Assignment Case study application (CSA)

Case study application ONE

This case study application uses a random sub-sample (Weight 2019S2QM.dta) drawn from a larger-scale study. The dataset records weight, exercise level and gender for 72 participants. It provides one continuous variable Y (weight) and two categorical variables (exercise and gender), as listed in Table 1.

Table 1 Variables in Weight 2019S2QM.dta

Weight (kg)
Exercise index: 1 = Low, 2 = High
Gender: 1 = Female, 2 = Male

The researcher who conducted the study would like to answer the following questions with your help.
1) Is gender an effect modifier of the association between exercise and weight, i.e., does gender modify the effect of exercise on weight?
2) Is the mean weight loss due to different exercise levels the same for males and females?
Use a significance level α of 5% for your tests and conclusions.

• You may find it helpful to follow the instructions for the t test in the computing lab "Review of introductory biostatistics".
• You may find it helpful to follow the analysis strategy given in the computing lab "Interactions".

1. The researcher believes that investigating the relationship between weight and exercise, accounting for gender as a potential effect modifier would be able to answer the research questions. Do you agree?

Yes, I agree.

Which variable should be the dependent variable?____________________________

Which are independent variables? ________________________________________

Which variable is the possible effect modifier? ______________________________

No, I do not agree.

Then how do you analyse this data to answer the research question?

Which variable should be the dependent variable? ___________________________

Which are independent variables? _______________________________________

2. First, the researcher asks for your help in testing the null hypothesis that the population mean weight is the same regardless of exercise level, i.e., she would like to compare the population mean weight between the two exercise levels.

(No Stata output(s) are required for this question)

i) What are the hypotheses?
H0: __________________________________________________

HA: __________________________________________________

ii) Which t test would you recommend for testing the null hypothesis?

iii) Conclusion of the t test:
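
As a rough illustration of how two group means can be compared, the sketch below computes an equal-variance (pooled) two-sample t statistic and its degrees of freedom in plain Python. It is only a sketch under stated assumptions: the function name `pooled_t` and the weights are hypothetical and are not taken from Weight 2019S2QM.dta, and the choice of test for this question remains yours to justify. The p value would still come from a t table or from Stata.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for an equal-variance two-sample t test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df


# Hypothetical weights in kg, for illustration only (not the assignment data).
low_exercise = [82.0, 79.5, 88.1, 91.3, 76.4]
high_exercise = [74.2, 70.8, 80.5, 77.9, 69.6]

t_stat, df = pooled_t(low_exercise, high_exercise)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```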

3. Having compared the population mean weight between the two exercise levels, the researcher would like to perform a multiple regression analysis for comparing the difference in the population mean weight between the two exercise levels, accounting for gender in the model as a potential effect modifier.

i. Which of the multiple regression models you learnt in this unit is appropriate for the researcher's analysis? Why?

ii. The researcher plotted the following mean plots to check whether there is an interactive effect.


a) Which mean plot (Plot 1 or 2) indicates that gender probably modifies the effect of exercise on weight? Why? _______________________________________________________________________

b) Based on the mean plot you chose, justify whether the interaction term between exercise and gender should be included and assessed in the multiple regression model.

iii. Fit the multiple regression model you recommended (in Question 3.i) for the researcher.

Attach relevant Stata output (e.g., ANOVA table) here


iv. Based on the output of your regression model (Question 3.iii), test the hypothesis that there is no interaction between exercise and gender in the population, and answer the question "Does gender modify the effect of exercise on weight?"

v. The researcher believes that any variable with a p value > 0.05 needs to be removed from the multiple regression model, and that a further simplified model is necessary. Do you agree with her? Comment on whether a further model is necessary by selecting an answer below (2 marks):

a) Yes I agree, the non-significant variable (list it here_______________) should be removed from the model and hence I can have a further simpler model. Briefly justify your answer.

Attach Stata output (e.g., parameter estimation table) here


b) No, I disagree, there is no need to remove any variable from the model and hence no need to have a further model. Briefly justify your answer.
Attach Stata output (e.g., parameter estimation table) here


4. Based on your final model obtained in Q3 v,

a. What is the prediction equation (estimated regression coefficients rounded to 3 decimal places)?

b. The researcher believes the constant in your final model has no meaningful interpretation (1 mark).

I do not agree with the researcher. Then give your interpretation of the constant in the final model here

I do agree with the researcher. Then briefly explain your reasons here.


c. Calculate the predicted mean weight loss for the low and high exercise levels for each gender, based on the regression equation obtained in Q4 a.


d. The researcher interpreted the regression coefficient '7.733' as 'People who participated in low level exercises were heavier by 7.733 kg on average than those who participated in high level exercises, regardless of their gender'. Do you think this interpretation is correct? (2 marks)

Yes. I think this interpretation is correct because...

No. I think this interpretation is incorrect because...
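
For Q4 c, the arithmetic of turning a prediction equation with an interaction term into four cell means can be sketched as follows. This is a minimal sketch, assuming 0/1 indicator coding for exercise and gender (the dataset itself uses 1/2 codes, and the reference categories depend on how the model is fitted); the function name and every coefficient below are hypothetical and are not results from the assignment data.

```python
def predicted_mean(b0, b_exercise, b_gender, b_interaction, exercise, gender):
    """Predicted mean from a two-factor linear model with an interaction term.

    exercise and gender are 0/1 indicators (0 = reference category).
    """
    return (b0
            + b_exercise * exercise
            + b_gender * gender
            + b_interaction * exercise * gender)


# Hypothetical coefficients, for illustration only.
coef = dict(b0=70.0, b_exercise=-5.0, b_gender=10.0, b_interaction=-3.0)

for gender in (0, 1):
    for exercise in (0, 1):
        value = predicted_mean(**coef, exercise=exercise, gender=gender)
        print(f"gender={gender} exercise={exercise}: predicted mean = {value:.3f}")

# With an interaction term, the exercise effect differs by gender.
effect_ref_gender = predicted_mean(**coef, exercise=1, gender=0) - predicted_mean(**coef, exercise=0, gender=0)
effect_other_gender = predicted_mean(**coef, exercise=1, gender=1) - predicted_mean(**coef, exercise=0, gender=1)
print(effect_ref_gender, effect_other_gender)
```

With these made-up values the exercise effect is the exercise coefficient alone in the reference gender and the exercise coefficient plus the interaction coefficient in the other gender, which is why a single coefficient cannot be read as an effect "regardless of gender" when an interaction is in the model.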

5. Using information you obtained from the final model in Q4, write a detailed conclusion with supporting evidence to help the researcher answer the research questions (page 3).

Case study application TWO

A researcher would like to investigate the association between exposure to a special chemical and a specific type of cancer using a random sample (Cancer 2019S2QM.dta) from a practical study. He also collected other variables, such as the number of cigarettes smoked per day and living distance from the site of exposure, which will be accounted for in his analysis.

The main research question of this study is
1) "Does exposure to a special chemical increase the risk of having the cancer?"

The other research questions are
2) "Is living close to the site of exposure associated with a higher risk of having the cancer?"
3) "Is a greater number of cigarettes smoked per day associated with a higher risk of having the cancer?"

Furthermore, the researcher would like to predict the probability of having the cancer for individuals with different personal characteristics.

Use a 5% significance level for all statistical tests and conclusions. The information of the variables in the data set is given below in Table 2:

Table 2: Variables in Cancer 2019S2QM.dta

i. You may find it helpful to follow the analysis strategy given in the computing labs "Logistic Regression I & II".

1. Given this data, to answer the research questions, you need to help the researcher identify:

1.1. Which variable is the dependent variable?
1.2. Which variable is the main independent variable of interest?
1.3. Which other variables should be controlled in your analysis?

1.4. Given the type of the dependent variable, which kind of regression analysis should you use, and why?

2. The literature shows that the number of cigarettes smoked per day may confound the association between exposure to a special chemical and the risk of having the specific type of cancer. Help the researcher assess the possible confounding effect using the steps covered in lectures/labs.

2.1 Find the crude odds ratio (and its 95% confidence interval (CI)) of having the Cancer, comparing individuals who were exposed to the special chemical with individuals who were not.
Attach Stata output here

Crude odds ratio (related to Exposure): _____________________________
Corresponding 95% CI: ______________________________

2.2 Assess whether NumberCigarette confounds the effect of Exposure on the risk of having the Cancer. Assess this possible confounding effect by calculating the relevant statistic from both epidemiological and statistical perspectives. Attach Stata output here

Adjusted Odds Ratio (related to Exposure): ______________________________
P value for NumberCigarette: _______________________________________
Assessment of confounding effect:
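
The calculations behind a crude odds ratio and a change-in-estimate check can be sketched in plain Python as below. This is an illustration under stated assumptions, not the unit's prescribed method: the confidence interval uses the Woolf (log-based) approximation, which may differ slightly from the interval Stata reports; the percent change is taken relative to the crude odds ratio, although some texts divide by the adjusted value; and the 2x2 counts and the adjusted odds ratio are hypothetical, not taken from Cancer 2019S2QM.dta.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-based) 95% CI from a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def percent_change(crude, adjusted):
    """Percent change in the odds ratio after adjustment, relative to the crude value."""
    return abs(crude - adjusted) / crude * 100


# Hypothetical counts and adjusted odds ratio, for illustration only.
crude_or, ci_low, ci_high = odds_ratio_ci(a=40, b=60, c=20, d=80)
hypothetical_adjusted_or = 2.2
change = percent_change(crude_or, hypothetical_adjusted_or)

print(f"crude OR = {crude_or:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")
print(f"change after adjustment = {change:.1f}%")
```

A change of more than about 10% is a commonly used rule of thumb for confounding from the epidemiological perspective; whether the unit uses that threshold is an assumption here, so check the lecture notes.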


3. Given that living distance from the site of exposure and exposure to the special chemical may have a combined or multiplicative effect on the risk of having the Cancer, the researcher would like to test whether there is an interactive effect between Distance (continuous) and Exposure on the risk of having the Cancer. Help the researcher answer the following questions with evidence from your analysis.

Attach Stata output here

a) Is the effect of exposure to the special chemical modified by the living distance from the site of exposure? Answer this question with evidence from your analysis.


b) Can you obtain a simpler model by removing this interaction term? Answer this question with at least two reasons.

4. The researcher wanted to predict the probability of having the cancer for some individuals with different personal characteristics. He would like to perform a multiple regression analysis you recommended in Q1, and include Exposure, NumberCigarette, and Distance (continuous) in his model without any interactions between the independent variables.
Attach Stata output (e.g., parameter estimation table with Coefficients) here

4.1 What is the regression equation (coefficients rounded to 3 decimal places)? P is the probability of having the Cancer.


4.2 Now help the researcher calculate the predicted probability of having the cancer for an individual who

a) was not exposed to the special chemical, smoked 50 cigarettes a day and lived 12 km from the site of exposure. Make a brief comment.


b) was exposed to the special chemical, smoked 20 cigarettes a day and lived 0.1 km from the site of exposure. Make a brief comment.
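
The step from a fitted logistic regression equation to a predicted probability can be sketched as follows. This is a minimal sketch, assuming Exposure is coded 0 = not exposed and 1 = exposed (the variable table is not reproduced here, so the coding is a guess) and using made-up coefficients; the function name and all numbers are hypothetical and are not estimates from Cancer 2019S2QM.dta.

```python
import math


def predicted_probability(b0, b_exposure, b_cigarettes, b_distance,
                          exposure, cigarettes, distance_km):
    """Inverse-logit of the linear predictor: P = 1 / (1 + exp(-logit))."""
    logit = (b0
             + b_exposure * exposure
             + b_cigarettes * cigarettes
             + b_distance * distance_km)
    return 1 / (1 + math.exp(-logit))


# Hypothetical coefficients on the log-odds scale, for illustration only.
coef = dict(b0=-2.0, b_exposure=1.0, b_cigarettes=0.05, b_distance=-0.1)

p_a = predicted_probability(**coef, exposure=0, cigarettes=50, distance_km=12)
p_b = predicted_probability(**coef, exposure=1, cigarettes=20, distance_km=0.1)
print(f"individual a: P = {p_a:.3f}")
print(f"individual b: P = {p_b:.3f}")
```

When commenting on a prediction, it is worth checking whether the covariate values (for example 50 cigarettes a day) fall inside the range observed in the data, since predictions outside that range are extrapolations.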


5. To help the researcher answer the research questions, you need to perform the multiple regression analysis you recommended in Q1, including Exposure, NumberCigarette, and Distance (continuous) in your model without any interactions between the independent variables.
Attach Stata output (e.g., parameter estimation table with Odds Ratios) here

1) Does exposure to a special chemical increase the risk of having the cancer? Answer this question by interpreting the adjusted odds ratio (and 95% CI) related to Exposure.

2) Is living close to the site of exposure associated with a higher risk of having the cancer? Answer this question by interpreting the adjusted odds ratio (and 95% CI) related to Distance.

3) Is a greater number of cigarettes smoked per day associated with a higher risk of having the cancer? Answer this question by interpreting the adjusted odds ratio (and 95% CI) related to NumberCigarette.

6. Given the categorised distance, the researcher would like to know from which living distance category the risk of having the cancer is significantly decreased. You need to perform another regression analysis including Exposure, NumberCigarette, and DistanceCat (categorical) in your model without any interactions between the independent variables.
Attach Stata outputs (e.g., parameter estimation table with Odds Ratios) here

a) Do you think DistanceCat is, overall, a significant predictor of having the Cancer? Why?

b) From which living distance category is the risk of having the cancer significantly decreased? By what percentage? With 95% confidence, by how much do the population odds decrease?

c) Compare the multivariable model with continuous Distance (Q5) and the model with categorical DistanceCat (this question, Q6). If you were the researcher, which model would you use to explain the association between the risk of cancer and living distance from the site of exposure? Justify your choice.
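
For Q6 b, an odds ratio below 1 and its confidence limits can be restated as percent decreases in the odds relative to the reference category. The sketch below shows only that conversion; the function name and the odds ratio of 0.40 with limits 0.20 and 0.80 are hypothetical values chosen for easy arithmetic, not output from the assignment data.

```python
def percent_decrease_in_odds(odds_ratio, ci_low, ci_high):
    """Convert an odds ratio below 1 and its CI into percent decreases in the odds.

    Returns (point estimate, smallest decrease, largest decrease).
    The upper CI limit of the odds ratio corresponds to the smallest decrease.
    """
    point = (1 - odds_ratio) * 100
    smallest = (1 - ci_high) * 100
    largest = (1 - ci_low) * 100
    return point, smallest, largest


# Hypothetical adjusted odds ratio and 95% CI, for illustration only.
point, smallest, largest = percent_decrease_in_odds(0.40, 0.20, 0.80)
print(f"odds decreased by {point:.0f}% (95% CI: {smallest:.0f}% to {largest:.0f}%)")
```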

Case study application THREE

This case study application uses information from the published paper "Survival of neonates and predictors of their mortality in Tigray region, Northern Ethiopia: prospective cohort study" by H G Mengesha et al., BMC Pregnancy and Childbirth (2016) 202.

1. Briefly describe the study to answer the following questions:
i. What were the study design and objectives?
ii. How were the participants recruited: where, when, how many?
iii. What are the main limitations (list two) of the study?

2. Read the Methods section and answer the following questions:
1) How did the authors define the main outcome variable, Neonatal Mortality (NM)? What is the event of interest, how did the authors calculate survival time (i.e., time-to-event), and how did the authors define the censored neonates?

2) As if you were the authors, complete the table below for the 10 neonates with different conditions, where Censoring status is coded "1" for event and "0" for censored.
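
The coding of survival time and censoring status can be sketched as below. This is a sketch under loudly stated assumptions: it takes follow-up to end at 28 days (the standard length of the neonatal period), treats death within follow-up as the event, and treats neonates who are alive at the end of follow-up or who leave the study earlier as censored. The paper's own definitions take precedence over these assumptions, and the function name and example neonates are hypothetical.

```python
NEONATAL_PERIOD_DAYS = 28  # assumed end of follow-up; check the paper's Methods section


def code_neonate(days_followed, died):
    """Return (survival_time_days, status), with status 1 = event (death) and 0 = censored."""
    if died and days_followed <= NEONATAL_PERIOD_DAYS:
        # Death observed within follow-up: time is the age at death.
        return days_followed, 1
    # Alive at last contact, lost to follow-up, or died after follow-up ended:
    # censored at the last day observed within the follow-up window.
    return min(days_followed, NEONATAL_PERIOD_DAYS), 0


# Hypothetical neonates, for illustration only.
examples = [
    (3, True),    # died on day 3
    (28, False),  # alive at the end of follow-up
    (10, False),  # lost to follow-up on day 10
]
for days, died in examples:
    print(days, died, "->", code_neonate(days, died))
```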

3. Read the Data Management and Analysis section and Table 6, then answer the following questions:

i. Which statistical regression analysis, with which regression strategy, was used for Table 6? Which method did the authors use to assess the proportional-hazards assumption?

ii. How many statistically significant factors associated with neonatal mortality were identified by the multivariable Cox proportional hazards model in Table 6? List them all. How do you know they are statistically significant?

iii. Choose ONE significant factor from the multivariable Cox proportional hazards model in Table 6 and interpret its effect on neonatal mortality, and its 95% CI, in your own words.

RESEARCH ARTICLE - Survival of neonates and predictors of their mortality in Tigray region, Northern Ethiopia: prospective cohort study Hayelom Gebrekirstos Mengesha, Alem Desta Wuneh, Wondwossen Terefe Lerebo and Tesfay Hailu Tekle


This task provides a clear working example of binary logistic regression analysis. It determines the risk factors for cancer and the risk factors for neonatal mortality. Two different Stata files were used to assess the risk factors for cancer and the risk factors for neonatal mortality. Once significant risk factors were identified, multivariate logistic regression was used to test the claim.


INSTRUCTIONS:
1) Open the corresponding data sets or paper. Answer all questions listed in the assignment. You need Stata to complete this assignment. You also need to copy and paste the relevant Stata outputs into the assignment against the questions that request them. Marks will be deducted for missing outputs, but do not include more than one copy of each table/graph. Do not submit Stata output separately from your assignment.
2) Step 4: Save your completed assignment as one Word document (other formats will not be accepted for marking); you are then ready for submission.
1) Step 1: Submit your assignment to Turnitin (plagiarism detection software) via "Turnitin Assignment (CSA)".
2) Step 2: Revise your assignment according to the Originality Report from Turnitin in Step 1. You need to resubmit the revised assignment to "Turnitin Assignment (CSA): Revision 1". This is the final version for marking.
