Calculate the value of the test statistic

Reference no: EM132315617

Principles of Statistical Inference (PSI) Assignment -

Question 1 - Neurofibromatosis Type 1 (NF1) is a human genetic disorder. As well as physical symptoms, affected children often suffer from impaired cognition and learning. A learning task that involves recognising and remembering the location of patterns on a screen is administered. If the child makes an error the task is presented again, and the number of attempts is recorded. We are interested in estimating the population mean number of unsuccessful attempts before solving the task correctly in children with NF1 and in healthy controls.

Although the Poisson distribution is often used for statistical models of count data, data which exhibit greater than expected variability ("overdispersion") may be modelled by the negative binomial distribution, which has probability function

$$f_X(x) = P(X = x) = \frac{\Gamma(k+x)}{\Gamma(x+1)\,\Gamma(k)} \left(\frac{\mu}{k+\mu}\right)^{x} \left(\frac{k}{k+\mu}\right)^{k}$$

where

x = 0, 1, 2, . . .

μ > 0 is the mean

k > 0 is known as the dispersion parameter
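
As a quick sanity check on this parameterisation, the probability function can be evaluated directly. The sketch below is illustrative only: the function name and the values mu = 3, k = 10 are assumptions chosen for the demonstration, not estimates from the assignment data. It checks numerically that the probabilities sum to 1, that the mean is mu, and that the variance is mu + mu^2/k, which is where the overdispersion relative to the Poisson comes from.

```python
import math

def neg_binomial_pmf(x: int, mu: float, k: float) -> float:
    """P(X = x) for the negative binomial with mean mu and dispersion k."""
    # Work on the log scale so the gamma functions do not overflow.
    log_p = (
        math.lgamma(k + x) - math.lgamma(x + 1) - math.lgamma(k)
        + x * math.log(mu / (k + mu))
        + k * math.log(k / (k + mu))
    )
    return math.exp(log_p)

# Illustrative values only (assumed, not taken from the data file).
mu_demo, k_demo = 3.0, 10.0

# Truncate the infinite support at 200; the remaining tail mass is negligible here.
probs = [neg_binomial_pmf(x, mu_demo, k_demo) for x in range(200)]
total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

print(f"total={total:.6f} mean={mean:.6f} variance={variance:.6f}")
```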

Assume that n1 typically developing "control" children are each given this task, and the number of unsuccessful attempts taken by child i is described by the random variable Xi, which has a negative binomial distribution with mean μ1. A further n2 children with NF1 are given the task, and the number of unsuccessful attempts taken by child i in this group is described by the random variable Yi, which has a negative binomial distribution with mean μ2.

The data in the file "PSI Ass 2 Semester 2 2019 data.xlsx" in sheet "NF1" are the observed values from attempting this task in two groups of children:

xi for n1 = 42 control children without NF1

yi for n2 = 107 children with NF1.

When evaluating with the above data, assume that the dispersion parameter k = 10 and is the same in both populations. However, derive all results in general terms for any value of k before evaluating numerical results for this known value of k.

We wish to test whether the population means are equal, i.e. to test the null hypothesis

H0: μ1 = μ2 versus H1: μ1 ≠ μ2

Carry out a likelihood ratio test of H0: μ1 = μ2

a) Write down the log-likelihood for the full model, calculate the likelihood equations and find the general form of the MLEs for μ1 and μ2. Compute the MLEs for the observed data above. Obtain the maximum value achieved by the log-likelihood in the full model [the constant term should be omitted in this calculation].

b) Write down the likelihood function in the reduced model, i.e. under the assumption that a common parameter μ = μ1 = μ2 can be used to describe the number of unsuccessful attempts in both populations. Derive the MLE for μ, first in general terms and then for the data above. Obtain the maximum value achieved by the log-likelihood under the reduced model [as above, the constant must be omitted].

c) Using your results from parts (a) and (b) write down the likelihood ratio test statistic for testing the null hypothesis H0: μ1 = μ2, evaluate the test statistic and compute its p-value. What do you conclude about the mean number of unsuccessful attempts in the two populations?
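
The mechanics of parts (a) to (c) can be sketched numerically. This is a minimal sketch under stated assumptions, not a model answer: it assumes the likelihood equations give the sample mean as the MLE in each group (and the pooled mean in the reduced model), which should be confirmed by the derivation the question asks for, and it compares the statistic to a chi-square distribution with 1 degree of freedom. The function names and the two small samples are hypothetical; the real data are in the "NF1" sheet.

```python
import math

def nb_loglik_kernel(data, mu, k):
    """Negative binomial log-likelihood in mu, omitting terms that do not involve mu."""
    n, s = len(data), sum(data)
    return s * math.log(mu / (k + mu)) + n * k * math.log(k / (k + mu))

def likelihood_ratio_test(x, y, k):
    """Return (statistic, p-value) for H0: mu1 = mu2 with known dispersion k."""
    mu1_hat = sum(x) / len(x)                          # assumed MLE: sample mean
    mu2_hat = sum(y) / len(y)
    mu0_hat = (sum(x) + sum(y)) / (len(x) + len(y))    # pooled mean under H0
    full = nb_loglik_kernel(x, mu1_hat, k) + nb_loglik_kernel(y, mu2_hat, k)
    reduced = nb_loglik_kernel(x, mu0_hat, k) + nb_loglik_kernel(y, mu0_hat, k)
    stat = max(2.0 * (full - reduced), 0.0)            # guard against rounding below zero
    # Upper tail of chi-square(1): P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts, for illustration only.
x_demo = [0, 1, 2, 1, 0, 3, 2, 1]
y_demo = [2, 4, 3, 5, 1, 6, 2, 3]
lrt_stat, lrt_p = likelihood_ratio_test(x_demo, y_demo, k=10.0)
print(f"LRT statistic = {lrt_stat:.4f}, p-value = {lrt_p:.4f}")
```

Because the gamma-function terms are identical in the full and reduced models, omitting them does not change the difference of the two maximised log-likelihoods.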

Carry out a Wald test of H0: μ1 = μ2.

d) Compute the expected information and the asymptotic variance-covariance matrix of the MLEs in the full model. (In general terms, i.e. don't substitute in data at this stage).

Consider the new parameter δ = μ1 - μ2 in the full model; then testing H0 is equivalent to testing δ = 0.

e) Using your answers for the previous questions only (do not rewrite the log-likelihood and derive these parameters), give the MLE for δ and its standard error in general terms. Justify your answers.

f) Calculate the MLE and its standard error for the data above and calculate a 95% confidence interval for δ.

g) Give the formula for an approximate Wald test of δ = 0. Calculate the value of the test statistic and its associated p-value, using the data. What can you conclude about the mean number of unsuccessful attempts in the two populations using only this information?
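
Parts (e) to (g) can be sketched as follows. The sketch assumes, subject to the derivation in part (d), that the expected information for each mean is n k / (μ (k + μ)), so that the asymptotic variance of each MLE is μ (k + μ) / (n k), and that the two samples are independent so the covariance is zero. The function name and the data are hypothetical, and 1.96 is used as the approximate 97.5% standard normal quantile.

```python
import math

def wald_test(x, y, k, z_crit=1.96):
    """Wald estimate, standard error, 95% CI, z statistic and p-value for delta = mu1 - mu2."""
    n1, n2 = len(x), len(y)
    mu1_hat, mu2_hat = sum(x) / n1, sum(y) / n2
    # Assumed asymptotic variances, evaluated at the MLEs.
    var1 = mu1_hat * (k + mu1_hat) / (n1 * k)
    var2 = mu2_hat * (k + mu2_hat) / (n2 * k)
    delta_hat = mu1_hat - mu2_hat
    se = math.sqrt(var1 + var2)          # independent samples: variances add
    z = delta_hat / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    ci = (delta_hat - z_crit * se, delta_hat + z_crit * se)
    return delta_hat, se, ci, z, p_value

# Hypothetical counts, for illustration only.
x_demo = [0, 1, 2, 1, 0, 3, 2, 1]
y_demo = [2, 4, 3, 5, 1, 6, 2, 3]
delta_hat, se, ci, z, p_value = wald_test(x_demo, y_demo, k=10.0)
print(f"delta = {delta_hat:.3f}, se = {se:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```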

h) Are your conclusions from the two hypothesis tests consistent with each other?

Question 2 - In Phase I clinical trials the maximum tolerated dose (MTD) of a drug treatment is often chosen by first finding the dose at which no more than one patient in a cohort of six experiences any dose-limiting toxicities (DLT). The recommended dose for further development is often then set to a dose below the MTD.

Assuming that the number (X) of patients experiencing a DLT in a group of six patients follows a binomial distribution with parameter p

X ∼ Bin(6, p)

a) Write down the probability that at most one patient experiences a DLT.

b) Tabulate this probability for values of p = 0.05, 0.10,..., 0.95.

How high does the parameter p need to be before the probability of at most one patient with DLT being observed becomes < 0.1? Use the above values only - you do not need to calculate it exactly.
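
For X ∼ Bin(6, p), the probability of at most one DLT is P(X = 0) + P(X = 1) = (1 − p)^6 + 6p(1 − p)^5. A short sketch of the tabulation over the stated grid follows; the function and variable names are illustrative choices.

```python
def prob_at_most_one_dlt(p: float, n: int = 6) -> float:
    """P(X <= 1) for X ~ Bin(n, p)."""
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1)

# Grid p = 0.05, 0.10, ..., 0.95 (rounded to avoid floating-point drift in the keys).
grid = [round(0.05 * i, 2) for i in range(1, 20)]
table = {p: prob_at_most_one_dlt(p) for p in grid}

for p, prob in table.items():
    print(f"p = {p:.2f}   P(X <= 1) = {prob:.4f}")

# Smallest grid value at which the probability drops below 0.1.
first_below = min(p for p, prob in table.items() if prob < 0.1)
print("first grid value with probability < 0.1:", first_below)
```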

c) What conclusion could you make about the underlying rate of DLT if we observe at most one patient with DLT in a cohort of 6 patients?

For a particularly severe outcome (toxic death), we wish to be confident that the population rate (pt) is at most 10%. Such an event is highly unlikely to be observed in a cohort of 6 patients if pt is as low as 10%, so the investigators propose a Bayesian monitoring rule for the next study. This is designed to trigger stopping of the trial if the posterior probability that pt > 0.1 exceeds 75%.

We assume a prior Beta(1, 3) distribution for pt and assume that the number of toxic deaths follows a Bin(n, pt) distribution.

d) Suggest why a Beta prior distribution has been chosen.

e) What is the prior probability that pt > 0.1?

f) Give the posterior distribution for pt if d toxic deaths are observed in the first n patients.

g) If no toxic deaths are observed in the first 10 patients, what is the posterior probability that pt > 0.1? Suppose that the next two patients (i.e. patients 11 and 12) both experience toxic death. Would you consider stopping the study?

h) Tabulate the posterior probabilities that pt > 0.1 for 2/20, 4/40, 6/60, 8/80 and 10/100 observed toxic deaths.

What do you notice about these probabilities? Discuss this in terms of the behaviour of the posterior distribution as data accumulate.
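
The prior and posterior tail probabilities in parts (e) to (h) can be computed with a short sketch. It assumes the standard conjugate update, in which a Beta(a, b) prior combined with d events in n binomial trials gives a Beta(a + d, b + n − d) posterior. To stay within the standard library it uses the identity, valid for positive integer a and b, that P(p > x) for p ∼ Beta(a, b) equals P(J ≤ a − 1) for J ∼ Bin(a + b − 1, x). The function names are illustrative.

```python
import math

def beta_upper_tail(x: float, a: int, b: int) -> float:
    """P(p > x) for p ~ Beta(a, b), with a and b positive integers."""
    m = a + b - 1
    # Binomial-sum identity for the regularised incomplete beta function.
    return sum(math.comb(m, j) * x ** j * (1 - x) ** (m - j) for j in range(a))

def posterior_prob_exceeds(d: int, n: int, threshold: float = 0.1,
                           prior_a: int = 1, prior_b: int = 3) -> float:
    """Posterior P(pt > threshold) after d toxic deaths in n patients."""
    return beta_upper_tail(threshold, prior_a + d, prior_b + n - d)

# n = 0 reproduces the prior; the remaining rows follow parts (g) and (h).
scenarios = [(0, 0), (0, 10), (2, 12), (2, 20), (4, 40), (6, 60), (8, 80), (10, 100)]
for d, n in scenarios:
    print(f"{d:>2}/{n:<3}  P(pt > 0.1) = {posterior_prob_exceeds(d, n):.4f}")
```

With non-integer Beta parameters this identity no longer applies, and a numerical incomplete beta routine would be needed instead.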

i) Plot the prior distribution and the posterior distribution after 10/100 observed toxic deaths on a single graph.

j) If the study had continued to observe this number of events (10/100) what would you conclude about the prevalence of toxic death in the study?

Question 3 - The six minute walk test is used as a measure of exercise tolerance in a number of medical conditions. It involves walking as far as possible during a six minute time period on a flat straight track of length 30 metres. An exercise physiologist wishes to perform this test on a group of 48 children with Perthes disease.

It is not known whether there is any suitable parametric model for the walk distances so we will investigate non-parametric methods.

The data in the file "PSI Ass 2 Semester 2 2019 data.xlsx" in sheet "SixMWT" are the observed distances walked by the 48 children.

a) Calculate appropriate summary statistics and thus give the parameters for a normal distribution that may be applicable to these data.

b) Using the observed data, calculate the empirical distribution function. Plot the empirical distribution function and the CDF of the normal distribution described in (a) on a single graph.
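
A minimal sketch of the two curves in part (b) is given below. The empirical distribution function at t is the proportion of observations less than or equal to t, and the normal CDF uses the sample mean and sample standard deviation from part (a). The ten distances are made up for illustration and are not the "SixMWT" data; plotting is left out so the sketch needs only the standard library.

```python
import bisect
import math
import statistics

def make_ecdf(data):
    """Return F_n, where F_n(t) is the proportion of observations <= t."""
    xs = sorted(data)
    n = len(xs)
    return lambda t: bisect.bisect_right(xs, t) / n

def normal_cdf(t: float, mean: float, sd: float) -> float:
    """CDF of N(mean, sd^2) evaluated at t."""
    return 0.5 * (1.0 + math.erf((t - mean) / (sd * math.sqrt(2.0))))

# Hypothetical walk distances in metres, for illustration only.
distances_demo = [412.0, 455.0, 468.0, 480.0, 495.0, 502.0, 515.0, 530.0, 548.0, 590.0]
mean_demo = statistics.fmean(distances_demo)
sd_demo = statistics.stdev(distances_demo)     # sample SD (n - 1 denominator)
ecdf_demo = make_ecdf(distances_demo)

for t in sorted(distances_demo):
    print(f"{t:6.1f}  ECDF = {ecdf_demo(t):.2f}  normal CDF = {normal_cdf(t, mean_demo, sd_demo):.3f}")
```

Plotting the ECDF as a step function against the smooth normal curve makes systematic departures, such as skewness or heavy tails, easier to see than the printed values.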

c) Do you think the normal distribution is an appropriate model for the data? Justify your answer.

The mean six-minute walk distance in healthy control children is 503 metres.

Carry out the Wilcoxon signed-rank test on these data to test the null hypothesis that the mean walk distance for children with Perthes disease is the same as for healthy controls.

d) Calculate the value of the test statistic and give the approximate normal distribution of the test statistic under the null hypothesis.

e) Calculate the p-value for the test assuming a two-sided alternative hypothesis. Interpret the p-value.
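
The calculation in parts (d) and (e) can be sketched as follows. This is one common form of the test and rests on stated assumptions: observations equal to the reference value are dropped, tied absolute differences receive average ranks, the statistic W+ is the sum of ranks of the positive differences, and under the null hypothesis W+ is approximately normal with mean n(n + 1)/4 and variance n(n + 1)(2n + 1)/24. No tie or continuity correction is applied. The function name and the data are hypothetical.

```python
import math

def signed_rank_test(data, mu0):
    """Wilcoxon signed-rank W+, its null mean and variance, z and two-sided p-value."""
    diffs = [v - mu0 for v in data if v != mu0]     # zero differences are dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied absolute differences starting at position i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1                  # average of ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    null_mean = n * (n + 1) / 4
    null_var = n * (n + 1) * (2 * n + 1) / 24
    z = (w_plus - null_mean) / math.sqrt(null_var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, null_mean, null_var, z, p_value

# Hypothetical walk distances in metres, for illustration only.
distances_demo = [412.0, 455.0, 468.0, 480.0, 495.0, 502.0, 515.0, 530.0, 548.0, 590.0]
w_plus, null_mean, null_var, z, p_value = signed_rank_test(distances_demo, mu0=503.0)
print(f"W+ = {w_plus}, mean = {null_mean}, variance = {null_var}")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```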

f) What do you conclude about the mean walk distance for children with Perthes disease compared to healthy controls?

g) Describe in a few sentences how you would calculate a 95% confidence interval for the mean distance without assuming any particular parametric model for the data.

You do not need to calculate the confidence interval.
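
One model-free approach that could be described for part (g) is the percentile bootstrap; it is offered here as a possible method, not as the one the assignment expects. The sketch resamples the data with replacement, recomputes the mean for each resample, and takes the 2.5% and 97.5% quantiles of the resampled means. The function name, the number of resamples, the seed and the data are all illustrative assumptions.

```python
import random
import statistics

def bootstrap_mean_ci(data, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)                      # fixed seed for reproducibility
    n = len(data)
    boot_means = sorted(
        statistics.fmean(rng.choices(data, k=n))   # resample n values with replacement
        for _ in range(n_boot)
    )
    alpha = 1.0 - level
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical walk distances in metres, for illustration only.
distances_demo = [412.0, 455.0, 468.0, 480.0, 495.0, 502.0, 515.0, 530.0, 548.0, 590.0]
ci_lower, ci_upper = bootstrap_mean_ci(distances_demo)
print(f"95% bootstrap CI for the mean: ({ci_lower:.1f}, {ci_upper:.1f})")
```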

Attachment:- Assignment & Data File.rar


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd