Reference no: EM133256623
Assignment - Research Methods and Statistics
Question 1. For a particular illness, there are two possible treatments: Medicine A and Medicine B. Both medicines are known to have an unwanted, but non-life-threatening, side effect. A hospital is conducting a study to determine whether these medicines are equally likely to cause this side effect, which will help it decide which medicine to use. Over a period of 3 months the hospital randomly allocates one of these medicines to patients with this particular illness.
During the study period 250 patients were given Medicine A, and 135 of these patients had the side effect. During the study period 240 patients were given Medicine B, and 96 of these patients had the side effect.
You will perform a hypothesis test to determine if there is a difference between the two medicines in terms of the probability that a person taking that medicine will have the side effect.
In the following parts all calculations must be done by hand, unless otherwise specified.
(a) What kind of hypothesis test will you be doing?
(b) Define the variables and parameter of interest.
(c) What are the null and alternative hypotheses?
(d) Calculate the estimate for the parameter of interest.
(e) Calculate the appropriate test statistic.
(f) Calculate the p-value (you will need to use MATLAB).
(g) Based on the p-value do you reject or retain the null hypothesis at the 5% level of significance? Provide justification for your answer.
(h) Summarise your conclusion from Part (g) in the context of the question.
(i) Calculate the 95% confidence interval for the parameter of interest.
(j) (1 point) Summarise your confidence interval in the context of the question.
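The by-hand calculations in Parts (d) to (i) can be cross-checked numerically. The following is a minimal MATLAB sketch, not a model answer. It assumes a two-sided two-sample z-test for proportions, with a pooled standard error for the test statistic and an unpooled standard error for the confidence interval. The variable names are illustrative and do not come from the assignment. normcdf and norminv require the Statistics and Machine Learning Toolbox.

```matlab
% Counts taken from the question text.
nA = 250;  xA = 135;   % Medicine A: patients treated, patients with side effect
nB = 240;  xB = 96;    % Medicine B: patients treated, patients with side effect

% Sample proportions and the estimated difference pA - pB.
pHatA = xA / nA;
pHatB = xB / nB;
diffHat = pHatA - pHatB;

% Pooled proportion and standard error under H0: pA = pB
% (assumption: the course uses the pooled form for the test statistic).
pPool = (xA + xB) / (nA + nB);
sePool = sqrt(pPool * (1 - pPool) * (1/nA + 1/nB));

% Test statistic and two-sided p-value from the standard normal distribution.
z = diffHat / sePool;
pValue = 2 * (1 - normcdf(abs(z)));

% 95% confidence interval for pA - pB using the unpooled standard error.
seUnpooled = sqrt(pHatA*(1 - pHatA)/nA + pHatB*(1 - pHatB)/nB);
zCrit = norminv(0.975);
ci = diffHat + [-1, 1] * zCrit * seUnpooled;

fprintf('difference = %.4f, z = %.4f, p = %.4g\n', diffHat, z, pValue);
fprintf('95%% CI: (%.4f, %.4f)\n', ci(1), ci(2));
```

If the course notes define the standard error differently (for example, unpooled for the test as well), the corresponding line should be changed to match.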
Question 2. The NOAA Atlantic hurricane database contains data about the positions and attributes of storms in the Atlantic from 1975 to 2020, measured every six hours during the lifetime of a storm.
Two of these attributes are the storm's maximum sustained wind speed in knots (wind) and the air pressure at the storm's center in millibars (pressure). This data, storm data.xlsx, can be found on the Assignment 6 page on MyUni. Perform a linear regression analysis to investigate the relationship between wind and pressure by completing the following steps.
In these steps, assume that wind is the predictor variable and pressure is the response variable.
(a) Load the data into MATLAB and perform a linear regression using the MATLAB command fitlm(). Provide your code and output.
(b) Use MATLAB to create an appropriate scatterplot of this regression and add the line of best fit to your plot. Make sure your axes are labelled.
(c) Based on the output of your linear regression, write down the equation of the line of best fit.
(d) Perform an appropriate hypothesis test to determine whether there is a statistically significant linear relationship between wind and pressure at the 5% level of significance by completing the following steps.
i. Write down the appropriate null and alternative hypotheses.
ii. From your MATLAB output in Part (a), state the observed value of the test statistic.
iii. From your MATLAB output in Part (a), state the p-value.
iv. Based on your p-value do you retain or reject your null hypothesis at the 5% level of significance? Provide justification for your answer.
v. Write your conclusion in the context of the question.
(e) An important part of performing a linear regression is assumption checking. Complete the following steps to check if our assumptions are valid.
i. Using MATLAB, create a residuals vs fitted values scatterplot. Based on this plot, are our assumptions of linearity and constant spread valid? Provide justification for your answers.
ii. Using MATLAB, create a normal probability plot of the residuals. Based on this plot, is our assumption of normality valid? Provide justification for your answer.
iii. A statistician has looked at this problem and believes the assumption of independence is not valid. Provide a reason why the statistician believes this.
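The MATLAB steps in Parts (a), (b) and (e) could be sketched as follows. This is a hedged outline, not a model answer. It assumes the spreadsheet sits in the current folder under the name given in the question and that its columns are named wind and pressure. Both assumptions should be checked against the real file. fitlm, predict and plotResiduals require the Statistics and Machine Learning Toolbox.

```matlab
% Assumption: file name and column names ('wind', 'pressure') match the real
% spreadsheet; adjust them if the file on MyUni differs.
dataFile = 'storm data.xlsx';
storms = readtable(dataFile);

% (a) Simple linear regression: pressure is the response, wind the predictor.
mdl = fitlm(storms, 'pressure ~ wind');
disp(mdl);

% (b) Scatterplot of the data with the fitted line and labelled axes.
figure;
scatter(storms.wind, storms.pressure, 8, 'filled');
hold on;
windGrid = linspace(min(storms.wind), max(storms.wind), 100)';
fittedLine = predict(mdl, table(windGrid, 'VariableNames', {'wind'}));
plot(windGrid, fittedLine, 'r-', 'LineWidth', 1.5);
hold off;
xlabel('Maximum sustained wind speed (knots)');
ylabel('Air pressure at storm centre (millibars)');
legend('Observations', 'Line of best fit');

% (c), (d) The slope row holds the estimate, SE, t statistic and p-value.
disp(mdl.Coefficients('wind', :));

% (e) i. Residuals against fitted values.
figure;
plotResiduals(mdl, 'fitted');

% (e) ii. Normal probability plot of the residuals.
figure;
plotResiduals(mdl, 'probability');
```

The intercept and slope printed by disp(mdl) give the equation for Part (c) in the form pressure = intercept + slope × wind.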
Question 3. In order to produce better wines, and hence improve sales, researchers have studied how the physicochemical properties of wine affect its quality. In the paper "Modeling wine preferences by data mining from physicochemical properties", the authors note that an increase in alcohol often improves the quality of vinho verde white wine.
We will consider a predictive model for the alcohol content in wine (alcohol) based on four predictor variables, the amount of residual sugar in the wine (residual sugar), the amount of chlorides in the wine (chlorides), the pH value of the wine (pH), and the amount of sulphates in the wine (sulphates). The data we are using comes from the aforementioned study, and includes a sample of 4898 vinho verde white wines. The output of the multiple linear regression in MATLAB is:
>> X = table(residual_sugar, chlorides, pH, sulphates, alcohol);
>> wine_model = fitlm(X)
wine_model =
Linear regression model:
    alcohol ~ 1 + residual_sugar + chlorides + pH + sulphates
Estimated Coefficients:
                      Estimate          SE     tStat        pValue
    (Intercept)         11.743     0.32436    36.203   1.6755e-254
    residual_sugar    -0.10189   0.0029509   -34.528   5.3552e-234
    chlorides          -18.091     0.67513   -26.796   1.0428e-147
    pH                 0.12099     0.10036    1.2056       0.22805
    sulphates         -0.27563     0.12996   -2.1208      0.033986
Number of observations: 4898, Error degrees of freedom: 4893
Root Mean Squared Error: 1.02
R-squared: 0.307, Adjusted R-Squared: 0.307
F-statistic vs. constant model: 542, p-value = 0
When performing multiple linear regression we determine our regression model by initially including all predictor variables, and then we perform hypothesis tests on each predictor variable to determine if it could be removed from the model. Determine if either residual sugar or pH could be removed from our model by completing the following steps.
(a) To test if residual sugar can be removed from our model, we have the following hypotheses:
$H_0 : \beta_{\text{residual sugar}} = 0$,
$H_A : \beta_{\text{residual sugar}} \neq 0$,
where $\beta_{\text{residual sugar}}$ is the regression coefficient for residual sugar.
i. Based on the given MATLAB output, state the p-value for this test.
ii. Based on the p-value, do we reject or retain the null hypothesis at the 5% level of significance? Provide justification for your answer.
(b) To test if pH can be removed from our model, we have the following hypotheses:
$H_0 : \beta_{\text{pH}} = 0$,
$H_A : \beta_{\text{pH}} \neq 0$,
where $\beta_{\text{pH}}$ is the regression coefficient for pH.
i. Based on the given MATLAB output, state the p-value for this test.
ii. Based on the p-value, do we reject or retain the null hypothesis at the 5% level of significance? Provide justification for your answer.
(c) Based on your conclusions from Parts (a) and (b), if you were going to reduce this model, would you keep residual sugar or pH in the model?
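The tStat and pValue columns in the output above follow from the Estimate and SE columns. The following MATLAB sketch illustrates this for residual_sugar and pH. It assumes a two-sided t-test with the 4893 error degrees of freedom reported in the output, and the variable names are illustrative only. tcdf requires the Statistics and Machine Learning Toolbox.

```matlab
% Values copied from the regression output in the question.
names    = {'residual_sugar', 'pH'};
estimate = [-0.10189, 0.12099];
se       = [0.0029509, 0.10036];
dfError  = 4893;   % error degrees of freedom from the output
alpha    = 0.05;   % 5% level of significance

% t statistic = estimate / standard error.
tStat = estimate ./ se;

% Two-sided p-value from the t distribution. The 'upper' option evaluates the
% upper tail directly, which avoids 1 - tcdf(...) rounding to zero.
pValue = 2 * tcdf(abs(tStat), dfError, 'upper');

for k = 1:numel(names)
    if pValue(k) < alpha
        decision = 'reject H0';
    else
        decision = 'retain H0';
    end
    fprintf('%-15s t = %8.3f, p = %.4g, %s\n', ...
        names{k}, tStat(k), pValue(k), decision);
end
```

Because the inputs are the rounded values printed in the table, the recomputed statistics agree with the MATLAB output only to the displayed precision.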
Note: A solution is needed for Question 2 only.