Calculate the value of the test statistic


Reference no: EM132315617

Principles of Statistical Inference (PSI) Assignment -

Question 1 - Neurofibromatosis Type 1 (NF1) is a human genetic disorder. As well as physical symptoms, affected children often suffer from impaired cognition and learning. A learning task that involves recognising and remembering the location of patterns on a screen is administered. If the child makes an error the task is presented again, and the number of attempts recorded. We are interested in estimating the population mean number of unsuccessful attempts before solving the task correctly in children with NF1 and in healthy controls.

Although the Poisson distribution is often used for statistical models of count data, data which exhibit greater than expected variability ("overdispersion") may be modelled by the negative binomial distribution, which has probability function

f_X(x) = P(X = x) = [Γ(k+x) / (Γ(x+1) Γ(k))] (μ/(k+μ))^x (k/(k+μ))^k

where

x = 0, 1, 2, . . .

μ > 0 is the mean

k > 0 is known as the dispersion parameter
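As a quick sanity check on the probability function above, the following minimal sketch evaluates it on the log scale and confirms numerically that it sums to 1 with mean μ and variance μ + μ²/k. The function name `neg_binom_pmf` and the parameter values are illustrative assumptions, not part of the assignment.

```python
import math


def neg_binom_pmf(x: int, mu: float, k: float) -> float:
    """P(X = x) for the negative binomial with mean mu and dispersion k."""
    # Work on the log scale so the gamma functions do not overflow.
    log_p = (
        math.lgamma(k + x) - math.lgamma(x + 1) - math.lgamma(k)
        + x * math.log(mu / (k + mu))
        + k * math.log(k / (k + mu))
    )
    return math.exp(log_p)


# Illustrative parameter values only (not estimates from the assignment data).
mu, k = 3.0, 10.0
probs = [neg_binom_pmf(x, mu, k) for x in range(200)]  # tail beyond 200 is negligible here

total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

print(f"total = {total:.6f}, mean = {mean:.4f}, variance = {variance:.4f}")
```

The variance exceeds the mean by μ²/k, which is the overdispersion relative to the Poisson model; as k grows the two distributions become similar.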

Assume that n₁ typically developing "control" children are each given this task, and the number of unsuccessful attempts taken by child i is described by the random variable X_i, which has a negative binomial distribution with mean μ₁. A further n₂ children with NF1 are given the task, and the number of unsuccessful attempts taken by child i in this group is described by the random variable Y_i, which has a negative binomial distribution with mean μ₂.

The data in the file "PSI Ass 2 Semester 2 2019 data.xlsx" in sheet "NF1" are the observed values from attempting this task in two groups of children:

x_i for n₁ = 42 control children without NF1

y_i for n₂ = 107 children with NF1.

When evaluating with the above data, assume that the dispersion parameter k = 10 and is the same in both populations. However, derive all results in general terms for any value of k before evaluating numerical results for this known value of k.

We wish to test whether the population means are equal, i.e. to test the null hypothesis

H₀: μ₁ = μ₂ versus H₁: μ₁ ≠ μ₂

Carry out a likelihood ratio test of H₀: μ₁ = μ₂

a) Write down the log-likelihood for the full model, calculate the likelihood equations and find the general form of the MLEs for μ₁ and μ₂. Compute the MLEs for the observed data above. Obtain the maximum value achieved by the log-likelihood in the full model [the constant term should be omitted in this calculation].

b) Write down the likelihood function in the reduced model, i.e. under the assumption that a common parameter μ = μ₁ = μ₂ can be used to describe the number of unsuccessful attempts in both populations. Derive the MLE for μ, first in general terms and then for the data above. Obtain the maximum value achieved by the log-likelihood under the reduced model [as above, the constant must be omitted].

c) Using your results from parts (a) and (b) write down the likelihood ratio test statistic for testing the null hypothesis H₀: μ₁ = μ₂, evaluate the test statistic and compute its p-value. What do you conclude about the mean number of unsuccessful attempts in the two populations?
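The calculation in parts (a) to (c) can be sketched numerically as follows. This is a minimal illustration, not a model answer: it relies on the standard result that, with k known, the MLE of a negative binomial mean is the sample mean, and it refers the statistic to a χ² distribution with 1 degree of freedom. The function names and the two small data sets are hypothetical; the real values are in the "NF1" sheet.

```python
import math


def nb_loglik(data, mu, k):
    """Negative binomial log-likelihood in mu, omitting terms that do not involve mu."""
    n, s = len(data), sum(data)
    return s * math.log(mu / (k + mu)) + n * k * math.log(k / (k + mu))


def lrt_equal_means(x, y, k):
    """Likelihood ratio test of H0: mu1 = mu2 with known dispersion k.

    Assumes every sample mean is strictly positive.
    """
    mu1_hat = sum(x) / len(x)                        # MLE in the full model
    mu2_hat = sum(y) / len(y)
    mu_hat = (sum(x) + sum(y)) / (len(x) + len(y))   # MLE in the reduced model

    ll_full = nb_loglik(x, mu1_hat, k) + nb_loglik(y, mu2_hat, k)
    ll_reduced = nb_loglik(x, mu_hat, k) + nb_loglik(y, mu_hat, k)

    stat = 2.0 * (ll_full - ll_reduced)
    # Upper tail of chi-squared(1): P(chi2_1 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Hypothetical counts for illustration only.
x_demo = [0, 2, 1, 3, 0, 1, 2, 4]
y_demo = [3, 5, 2, 6, 4, 1, 7, 3, 5, 4]

stat, p_value = lrt_equal_means(x_demo, y_demo, k=10.0)
print(f"LRT statistic = {stat:.4f}, p-value = {p_value:.4f}")
```

The omitted constant cancels in the difference of log-likelihoods, so dropping it does not change the test statistic.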

Carry out a Wald test of H₀: μ₁ = μ₂.

d) Compute the expected information and the asymptotic variance-covariance matrix of the MLEs in the full model. (In general terms, i.e. don't substitute in data at this stage).

Consider the new parameter δ = μ₁ - μ₂ in the full model; then testing H₀ is equivalent to testing δ = 0.

e) Using your answers for the previous questions only (do not rewrite the log-likelihood and derive these parameters), give the MLE for δ and its standard error in general terms. Justify your answers.

f) Calculate the MLE and its standard error for the data above and calculate a 95% confidence interval for δ.

g) Give the formula for an approximate Wald test of δ = 0. Calculate the value of the test statistic and its associated p-value, using the data. What can you conclude about the mean number of unsuccessful attempts in the two populations using only this information?
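A numerical sketch of parts (e) to (g) is given below. It assumes the asymptotic variance of each sample mean is μ(k + μ)/(nk), which is the negative binomial variance μ + μ²/k divided by n, and that the two groups are independent so the variances add. The function name and the demonstration data are hypothetical.

```python
import math
from statistics import NormalDist


def wald_test_difference(x, y, k, level=0.95):
    """Wald test of delta = mu1 - mu2 = 0 with known dispersion k."""
    n1, n2 = len(x), len(y)
    mu1_hat, mu2_hat = sum(x) / n1, sum(y) / n2

    # Estimated asymptotic variances of the two MLEs.
    var1 = mu1_hat * (k + mu1_hat) / (n1 * k)
    var2 = mu2_hat * (k + mu2_hat) / (n2 * k)

    delta_hat = mu1_hat - mu2_hat
    se = math.sqrt(var1 + var2)          # independent samples: variances add

    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    ci = (delta_hat - z_crit * se, delta_hat + z_crit * se)

    z = delta_hat / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return delta_hat, se, ci, z, p_value


# Hypothetical counts for illustration only.
x_demo = [0, 2, 1, 3, 0, 1, 2, 4]
y_demo = [3, 5, 2, 6, 4, 1, 7, 3, 5, 4]

delta_hat, se, ci, z, p_value = wald_test_difference(x_demo, y_demo, k=10.0)
print(f"delta_hat = {delta_hat:.4f}, se = {se:.4f}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f}), z = {z:.4f}, p = {p_value:.4f}")
```

The confidence interval excludes 0 exactly when the two-sided Wald p-value falls below 1 minus the chosen level, so the interval and the test give the same verdict.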

h) Are your conclusions from the two hypothesis tests consistent with each other?

Question 2 - In Phase I clinical trials the maximum tolerated dose (MTD) of a drug treatment is often chosen by first finding the dose at which no more than one patient in a cohort of six experiences any dose limiting toxicities (DLT). The recommended dose for further development is often then set to a dose below the MTD.

Assume that the number (X) of patients experiencing a DLT in a group of six patients follows a binomial distribution with parameter p:

X ∼ Bin(6, p)

a) Write down the probability that at most one patient experiences a DLT.

b) Tabulate this probability for values of p = 0.05, 0.10,..., 0.95.

How high does the parameter p need to be before the probability of at most one patient with DLT being observed becomes < 0.1? Use the above values only - you do not need to calculate it exactly.
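The tabulation in part (b) can be produced with a few lines of code. The sketch below uses P(X ≤ 1) = (1 − p)⁶ + 6p(1 − p)⁵ for X ∼ Bin(6, p); the function name `prob_at_most_one_dlt` is an illustrative choice.

```python
def prob_at_most_one_dlt(p: float, n: int = 6) -> float:
    """P(X <= 1) for X ~ Bin(n, p): no DLT, or exactly one DLT."""
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1)


# Grid p = 0.05, 0.10, ..., 0.95, rounded to avoid floating-point drift in the keys.
grid = [round(0.05 * i, 2) for i in range(1, 20)]
table = {p: prob_at_most_one_dlt(p) for p in grid}

for p, prob in table.items():
    print(f"p = {p:.2f}   P(X <= 1) = {prob:.4f}")
```

Reading down the printed column shows where the probability first drops below 0.1.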

c) What conclusion could you make about the underlying rate of DLT if we observe at most one patient with DLT in a cohort of 6 patients?

For a particularly severe outcome (toxic death), we wish to be confident that the population rate (p_t) is at most 10%. Such an event is highly unlikely to be observed in a cohort of 6 patients if p_t is as low as 10%, so the investigators propose a Bayesian monitoring rule for the next study. This is designed to trigger stopping of the trial if the posterior probability that p_t > 0.1 exceeds 75%.

We assume a prior Beta(1, 3) distribution for p_t and assume that the number of toxic deaths follows a Bin(n, p_t) distribution.

d) Suggest why a Beta prior distribution has been chosen.

e) What is the prior probability that p_t > 0.1?

f) Give the posterior distribution for p_t if d toxic deaths are observed in the first n patients.

g) If no toxic deaths are observed in the first 10 patients, what is the posterior probability that p_t > 0.1? Suppose that the next two patients (i.e. patients 11 and 12) both experience toxic death. Would you consider stopping the study?

h) Tabulate the posterior probabilities that p_t > 0.1 for 2/20, 4/40, 6/60, 8/80 and 10/100 observed toxic deaths.

What do you notice about these probabilities? Discuss this in terms of the behaviour of the posterior distribution as data accumulate.
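The posterior probabilities asked for in parts (e) to (h) can be computed without special libraries. The sketch below assumes the usual conjugate update, in which a Beta(a, b) prior combined with d events in n binomial trials gives a Beta(a + d, b + n − d) posterior, and it uses the identity that for integer a and b the Beta upper tail equals a binomial lower tail. The function names are illustrative.

```python
import math


def beta_tail_prob(a: int, b: int, x: float) -> float:
    """P(p > x) for p ~ Beta(a, b), valid for positive integer a and b.

    Uses P(Beta(a, b) > x) = P(Bin(a + b - 1, x) <= a - 1).
    """
    m = a + b - 1
    return sum(math.comb(m, j) * x ** j * (1 - x) ** (m - j) for j in range(a))


def posterior_tail_prob(d: int, n: int, x: float = 0.1,
                        prior_a: int = 1, prior_b: int = 3) -> float:
    """P(p_t > x | d toxic deaths in n patients) under a Beta(prior_a, prior_b) prior."""
    return beta_tail_prob(prior_a + d, prior_b + n - d, x)


print(f"prior          P(p_t > 0.1) = {beta_tail_prob(1, 3, 0.1):.4f}")
print(f"0/10 observed  P(p_t > 0.1) = {posterior_tail_prob(0, 10):.4f}")
for d, n in [(2, 20), (4, 40), (6, 60), (8, 80), (10, 100)]:
    print(f"{d}/{n} observed  P(p_t > 0.1) = {posterior_tail_prob(d, n):.4f}")
```

Comparing any of these values with the 75% threshold applies the monitoring rule directly; the same function can be called with d = 2 and n = 12 for the scenario in part (g).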

i) Plot the prior distribution and the posterior distribution after 10/100 observed toxic deaths on a single graph.

j) If the study had continued to observe this number of events (10/100) what would you conclude about the prevalence of toxic death in the study?

Question 3 - The six-minute walk test is used as a measure of exercise tolerance in a number of medical conditions. It involves walking as far as possible during a six-minute time period on a flat straight track of length 30 metres. An exercise physiologist wishes to perform this test on a group of 48 children with Perthes disease.

It is not known whether there is any suitable parametric model for the walk distances so we will investigate non-parametric methods.

The data in the file "PSI Ass 2 Semester 2 2019 data.xlsx" in sheet "SixMWT" are the observed distances walked by the 48 children.

a) Calculate appropriate summary statistics and thus give the parameters for a normal distribution that may be applicable to these data.

b) Using the observed data, calculate the empirical distribution function. Plot the empirical distribution function and the CDF of the normal distribution described in (a) on a single graph.
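A minimal sketch of the two curves in part (b) follows: the empirical distribution function, which is the proportion of observations at or below t, and the CDF of a normal distribution fitted by the sample mean and standard deviation. The distances below are hypothetical placeholders for the 48 values in the "SixMWT" sheet, and plotting is left out so the example needs only the standard library.

```python
import bisect
import math
import statistics


def make_ecdf(data):
    """Return the empirical distribution function F_n(t) = #{x_i <= t} / n."""
    ordered = sorted(data)
    n = len(ordered)

    def ecdf(t: float) -> float:
        return bisect.bisect_right(ordered, t) / n

    return ecdf


def normal_cdf(t: float, mean: float, sd: float) -> float:
    """CDF of N(mean, sd^2) evaluated at t."""
    return 0.5 * (1.0 + math.erf((t - mean) / (sd * math.sqrt(2.0))))


# Hypothetical walk distances in metres, for illustration only.
distances = [412.0, 455.5, 390.0, 470.0, 430.0, 445.0, 401.5, 480.0]

mean = statistics.fmean(distances)
sd = statistics.stdev(distances)       # sample standard deviation (n - 1 divisor)
ecdf = make_ecdf(distances)

for t in sorted(distances):
    print(f"t = {t:6.1f}  ECDF = {ecdf(t):.3f}  normal CDF = {normal_cdf(t, mean, sd):.3f}")
```

Plotting the two columns against t on one set of axes gives the graph requested; large or systematic gaps between the curves argue against the normal model.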

c) Do you think the normal distribution is an appropriate model for the data? Justify your answer.

The mean six-minute walk distance in healthy control children is 503 metres.

Carry out the Wilcoxon signed-rank test on these data to test the null hypothesis that the mean walk distance for children with Perthes disease is the same as for healthy controls.

d) Calculate the value of the test statistic and give the approximate normal distribution of the test statistic under the null hypothesis.

e) Calculate the p-value for the test assuming a two-sided alternative hypothesis. Interpret the p-value.
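The mechanics of parts (d) and (e) can be sketched as below. The statistic is taken to be W⁺, the sum of the ranks of the positive differences from 503, with null mean n(n + 1)/4 and variance n(n + 1)(2n + 1)/24. This version drops zero differences, gives tied absolute differences their average rank, and applies neither a tie correction to the variance nor a continuity correction; these are simplifying assumptions, and the data are hypothetical.

```python
import math


def wilcoxon_signed_rank(data, null_value):
    """Wilcoxon signed-rank test with a normal approximation (two-sided)."""
    diffs = [v - null_value for v in data if v != null_value]  # drop zero differences
    n = len(diffs)

    # Rank the absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1          # positions i..j correspond to ranks i+1..j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, mean_w, var_w, z, p_value


# Hypothetical walk distances in metres, for illustration only.
distances = [412.0, 455.5, 390.0, 510.0, 430.0, 445.0, 520.5, 480.0]
w_plus, mean_w, var_w, z, p_value = wilcoxon_signed_rank(distances, 503.0)
print(f"W+ = {w_plus}, null mean = {mean_w}, null variance = {var_w}")
print(f"z = {z:.4f}, two-sided p-value = {p_value:.4f}")
```

Statistical packages may report a different but equivalent statistic (for example the smaller of W⁺ and W⁻) or apply corrections, so small numerical differences from this sketch are to be expected.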

f) What do you conclude about the mean walk distance for children with Perthes disease compared to healthy controls?

g) Describe in a few sentences how you would calculate a 95% confidence interval for the mean distance without assuming any particular parametric model for the data.

You do not need to calculate the confidence interval.
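Although the calculation is not required, one possible way to make the description in (g) concrete is a percentile bootstrap, sketched below on hypothetical data. The bootstrap is an assumed choice here, not necessarily the method the assignment intends; the function name, the number of resamples and the seed are all illustrative.

```python
import math
import random
import statistics


def bootstrap_mean_ci(data, level=0.95, n_resamples=5000, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)            # fixed seed so the sketch is reproducible
    n = len(data)

    # Resample n observations with replacement and record each resample mean.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )

    alpha = 1.0 - level
    lo_idx = math.floor(alpha / 2 * n_resamples)
    hi_idx = math.ceil((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical walk distances in metres, for illustration only.
distances = [412.0, 455.5, 390.0, 510.0, 430.0, 445.0, 520.5, 480.0]
lower, upper = bootstrap_mean_ci(distances)
print(f"sample mean = {statistics.fmean(distances):.1f}")
print(f"95% percentile bootstrap CI = ({lower:.1f}, {upper:.1f})")
```

The interval is read off the empirical quantiles of the resampled means, so no parametric form for the walk distances is assumed.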

Attachment: Assignment & Data File.rar
