Calculate the likelihood equations

Reference no: EM132324315

Principles of Statistical Inference (PSI) Assignment -

Question 1 - Neurofibromatosis Type 1 (NF1) is a human genetic disorder. As well as physical symptoms, affected children often suffer from impaired cognition and learning. A learning task that involves recognising and remembering the location of patterns on a screen is administered. If the child makes an error, the task is presented again, and the number of attempts is recorded. We are interested in estimating the population mean number of unsuccessful attempts before solving the task correctly in children with NF1 and in healthy controls.

Although the Poisson distribution is often used for statistical models of count data, data which exhibit greater than expected variability ("overdispersion") may be modelled by the negative binomial distribution, which has probability function

$$f_X(x) = P(X = x) = \frac{\Gamma(k+x)}{\Gamma(x+1)\,\Gamma(k)} \left(\frac{\mu}{k+\mu}\right)^{x} \left(\frac{k}{k+\mu}\right)^{k}$$

where

x = 0, 1, 2,...

μ > 0 is the mean

k > 0 is known as the dispersion parameter
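
As a quick sanity check on this parameterisation, the sketch below evaluates the probability function numerically and confirms that it sums to 1, has mean μ, and has variance μ + μ²/k (the usual negative binomial variance, which exceeds the Poisson variance μ). The values of μ and k and the function names are illustrative assumptions, not part of the assignment.

```python
import math

def nb_log_pmf(x, mu, k):
    """Log of the negative binomial probability function with mean mu and dispersion k."""
    return (math.lgamma(k + x) - math.lgamma(x + 1) - math.lgamma(k)
            + x * math.log(mu / (k + mu))
            + k * math.log(k / (k + mu)))

def nb_pmf(x, mu, k):
    """Negative binomial probability P(X = x)."""
    return math.exp(nb_log_pmf(x, mu, k))

# Illustrative parameter values (assumed for this sketch only).
mu, k = 2.5, 10.0
support = range(0, 200)  # truncation point chosen so the tail is negligible

total = sum(nb_pmf(x, mu, k) for x in support)
mean = sum(x * nb_pmf(x, mu, k) for x in support)
var = sum((x - mean) ** 2 * nb_pmf(x, mu, k) for x in support)

print(f"sum of probabilities: {total:.6f}")
print(f"mean: {mean:.6f}  (mu = {mu})")
print(f"variance: {var:.6f}  (mu + mu^2/k = {mu + mu**2 / k})")
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions for large x.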

Assume that n1 typically developing "control" children are each given this task, and the number of unsuccessful attempts taken by child i is described by the random variable Xi, which has a negative binomial distribution with mean μ1. A further n2 children with NF1 are given the task, and the number of unsuccessful attempts taken by each is described by the random variable Yi, which has a negative binomial distribution with mean μ2.

The data in the file "PSI Ass 2 Semester 2 2019 data.xlsx" in sheet "NF1" are the observed values from attempting this task in two groups of children:

  • xi for n1 = 42 control children without NF1
  • yi for n2 = 107 children with NF1.

When evaluating with the above data, assume that the dispersion parameter k = 10 and is the same in both populations. However, derive all results in general terms for any value of k before evaluating numerical results for this known value of k.

We wish to test whether the population means are equal, i.e. to test the null hypothesis

H0: μ1 = μ2 versus H1: μ1 ≠ μ2

Carry out a likelihood ratio test of H0: μ1 = μ2

a) Write down the log-likelihood for the full model, calculate the likelihood equations and find the general form of the MLEs for μ1 and μ2. Compute the MLEs for the observed data above. Obtain the maximum value achieved by the log-likelihood in the full model [the constant term should be omitted in this calculation].
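
One way to check an analytical MLE from part (a) is to maximise the log-likelihood numerically and compare. The sketch below does this for a single group, dropping the gamma-function terms that do not involve μ. The data are hypothetical (not the NF1 sheet), and the comparison with the sample mean is a check on the derivation rather than a substitute for it.

```python
import math

def nb_loglik(mu, data, k):
    """Negative binomial log-likelihood in mu, omitting terms that do not depend on mu."""
    n, s = len(data), sum(data)
    return s * math.log(mu / (k + mu)) + n * k * math.log(k / (k + mu))

# Hypothetical counts for illustration only.
controls = [0, 1, 0, 2, 1, 3, 0, 1]
k = 10.0

# Crude grid search over mu in (0, 10] with step 0.001.
grid = [i / 1000 for i in range(1, 10001)]
mu_hat_grid = max(grid, key=lambda m: nb_loglik(m, controls, k))
sample_mean = sum(controls) / len(controls)

print(f"grid maximiser: {mu_hat_grid:.3f}")
print(f"sample mean:    {sample_mean:.3f}")
print(f"maximised log-likelihood (constant omitted): {nb_loglik(mu_hat_grid, controls, k):.4f}")
```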

b) Write down the likelihood function in the reduced model, i.e. under the assumption that a common parameter μ = μ1 = μ2 can be used to describe the number of unsuccessful attempts in both populations. Derive the MLE for μ, first in general terms and then for the data above. Obtain the maximum value achieved by the log-likelihood under the reduced model [as above, the constant must be omitted].

c) Using your results from parts (a) and (b) write down the likelihood ratio test statistic for testing the null hypothesis H0: μ1 = μ2, evaluate the test statistic and compute its p-value. What do you conclude about the mean number of unsuccessful attempts in the two populations?
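
The mechanics of the likelihood ratio test can be sketched as follows, assuming (as should be verified in parts (a) and (b)) that with k known the MLEs are the group sample means in the full model and the pooled sample mean in the reduced model. The statistic is twice the difference in maximised log-likelihoods and is referred to a chi-squared distribution with 1 degree of freedom. The data and function names are hypothetical.

```python
import math

def nb_loglik(mu, data, k):
    """Negative binomial log-likelihood in mu, omitting terms that do not depend on mu."""
    n, s = len(data), sum(data)
    return s * math.log(mu / (k + mu)) + n * k * math.log(k / (k + mu))

def likelihood_ratio_test(x, y, k):
    """Return (statistic, p-value) for H0: mu1 = mu2 with known dispersion k."""
    mu1 = sum(x) / len(x)                          # full-model MLE, group 1
    mu2 = sum(y) / len(y)                          # full-model MLE, group 2
    mu0 = (sum(x) + sum(y)) / (len(x) + len(y))    # reduced-model MLE
    full = nb_loglik(mu1, x, k) + nb_loglik(mu2, y, k)
    reduced = nb_loglik(mu0, x, k) + nb_loglik(mu0, y, k)
    stat = max(2.0 * (full - reduced), 0.0)        # guard against rounding below zero
    # For 1 degree of freedom, P(chi2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts for illustration only.
x_demo = [0, 1, 0, 2, 1, 3, 0, 1]
y_demo = [2, 4, 1, 3, 5, 2, 6, 3, 2, 4]

stat, p_value = likelihood_ratio_test(x_demo, y_demo, k=10.0)
print(f"LRT statistic = {stat:.4f}, p-value = {p_value:.4f}")
```

The omitted constant is the same in both models, so it cancels in the difference and does not affect the statistic.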

Carry out a Wald test of H0: μ1 = μ2.

d) Compute the expected information and the asymptotic variance-covariance matrix of the MLEs in the full model. (In general terms, i.e. don't substitute in data at this stage).

Consider the new parameter δ = μ1 - μ2 in the full model; then testing H0 is equivalent to testing δ = 0.

e) Using your answers for the previous questions only (do not rewrite the log-likelihood and derive these parameters), give the MLE for δ and its standard error in general terms. Justify your answers.

f) Calculate the MLE and its standard error for the data above and calculate a 95% confidence interval for δ.

g) Give the formula for an approximate Wald test of δ = 0. Calculate the value of the test statistic and its associated p-value, using the data. What can you conclude about the mean number of unsuccessful attempts in the two populations using only this information?
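
A minimal sketch of the Wald calculation in parts (e) to (g) is given below. It assumes that the asymptotic variance of each MLE is μ(k + μ)/(nk), i.e. Var(X)/n for this distribution, evaluated at the MLE, and that the two groups are independent so the variances add; both points should be justified from part (d). The data and names are hypothetical.

```python
import math

def wald_test(x, y, k, z_crit=1.96):
    """Wald test of delta = mu1 - mu2 = 0 with known dispersion k.

    Returns (delta_hat, standard error, z statistic, two-sided p-value, 95% CI).
    """
    n1, n2 = len(x), len(y)
    mu1, mu2 = sum(x) / n1, sum(y) / n2
    var1 = mu1 * (k + mu1) / (n1 * k)   # assumed asymptotic variance of mu1_hat
    var2 = mu2 * (k + mu2) / (n2 * k)   # assumed asymptotic variance of mu2_hat
    delta = mu1 - mu2
    se = math.sqrt(var1 + var2)          # independence: variances add
    z = delta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail
    ci = (delta - z_crit * se, delta + z_crit * se)
    return delta, se, z, p_value, ci

# Hypothetical counts for illustration only.
x_demo = [0, 1, 0, 2, 1, 3, 0, 1]
y_demo = [2, 4, 1, 3, 5, 2, 6, 3, 2, 4]

delta, se, z, p_value, ci = wald_test(x_demo, y_demo, k=10.0)
print(f"delta_hat = {delta:.3f}, SE = {se:.3f}")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```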

h) Are your conclusions from the two hypothesis tests consistent with each other?

Question 2 - In Phase I clinical trials the maximum tolerated dose (MTD) of a drug treatment is often chosen by first finding the dose at which no more than one patient in a cohort of six experiences any dose limiting toxicities (DLT). The recommended dose for further development is often then set to a dose below the MTD.

Assuming that the number (X) of patients experiencing a DLT in a group of six patients follows a binomial distribution with parameter p

X ∼ Bin(6, p)

a) Write down the probability that at most one patient experiences a DLT.

b) Tabulate this probability for values of p = 0.05, 0.10,..., 0.95.

How high does the parameter p need to be before the probability of at most one patient with DLT being observed becomes < 0.1? Use the above values only - you do not need to calculate it exactly.
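
The tabulation in part (b) can be sketched as below, using P(X ≤ 1) = (1 − p)^6 + 6p(1 − p)^5 for X ∼ Bin(6, p). The function and variable names are illustrative choices.

```python
def prob_at_most_one(p, n=6):
    """P(X <= 1) for X ~ Bin(n, p): no events plus exactly one event."""
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1)

# Grid p = 0.05, 0.10, ..., 0.95 as specified in part (b).
grid = [round(0.05 * i, 2) for i in range(1, 20)]
table = [(p, prob_at_most_one(p)) for p in grid]

for p, prob in table:
    print(f"p = {p:.2f}   P(X <= 1) = {prob:.4f}")

# First grid value at which the probability falls below 0.1.
first_below = next(p for p, prob in table if prob < 0.1)
print(f"first grid value with probability < 0.1: {first_below:.2f}")
```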

c) What conclusion could you make about the underlying rate of DLT if we observe at most one patient with DLT in a cohort of 6 patients?

For a particularly severe outcome (toxic death), we wish to be confident that the population rate (pt) is at most 10%. Such an event is highly unlikely to be observed in a cohort of 6 patients if pt is as low as 10%, so the investigators propose a Bayesian monitoring rule for the next study. This is designed to trigger stopping of the trial if the posterior probability that pt > 0.1 exceeds 75%.

We assume a prior Beta(1, 3) distribution for pt and assume that the number of toxic deaths follows a Bin(n, pt) distribution.

d) Suggest why a Beta prior distribution has been chosen.

e) What is the prior probability that pt > 0.1?

f) Give the posterior distribution for pt if x toxic deaths are observed in the first n patients.

g) If no toxic deaths are observed in the first 10 patients, what is the posterior probability that pt > 0.1? Suppose that the next two patients (i.e. patients 11 and 12) both experience toxic death. Would you consider stopping the study?

h) Tabulate the posterior probabilities that pt > 0.1 for 2/20, 4/40, 6/60, 8/80 and 10/100 observed toxic deaths.

What do you notice about these probabilities? Discuss this in terms of the behaviour of the posterior distribution as data accumulate.
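
The prior and posterior tail probabilities can be computed without a statistics library when the Beta parameters are integers. The sketch below assumes the standard conjugate update (a Beta(a, b) prior with d events in n binomial trials gives a Beta(a + d, b + n − d) posterior) and uses the identity P(p > c) = P(Bin(a + b − 1, c) ≤ a − 1) for p ∼ Beta(a, b) with integer a and b. Function names are illustrative.

```python
from math import comb

def beta_tail(c, a, b):
    """P(p > c) for p ~ Beta(a, b), valid for positive integer a and b."""
    m = a + b - 1
    return sum(comb(m, j) * c ** j * (1 - c) ** (m - j) for j in range(a))

def posterior_params(deaths, n, a0=1, b0=3):
    """Conjugate update of a Beta(a0, b0) prior after `deaths` events in n patients."""
    return a0 + deaths, b0 + n - deaths

threshold = 0.1
prior_prob = beta_tail(threshold, 1, 3)
print(f"prior P(pt > 0.1) = {prior_prob:.4f}")

scenarios = [(2, 20), (4, 40), (6, 60), (8, 80), (10, 100)]
posterior_probs = {}
for deaths, n in scenarios:
    a, b = posterior_params(deaths, n)
    posterior_probs[(deaths, n)] = beta_tail(threshold, a, b)
    print(f"{deaths}/{n}: Beta({a}, {b}), P(pt > 0.1) = {posterior_probs[(deaths, n)]:.4f}")
```

The same two functions cover part (g) by calling `posterior_params(0, 10)` and `posterior_params(2, 12)`.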

i) Plot the prior distribution and the posterior distribution after 10/100 observed toxic deaths on a single graph.

j) If the study had continued to observe this number of events (10/100) what would you conclude about the prevalence of toxic death in the study?

Question 3 - The six minute walk test is used as a measure of exercise tolerance in a number of medical conditions. It involves walking as far as possible during a six minute time period on a flat straight track of length 30 metres. An exercise physiologist wishes to perform this test on a group of 48 children with Perthes disease.

It is not known whether there is any suitable parametric model for the walk distances so we will investigate non-parametric methods.

The data in the file "PSI Ass 2 Semester 2 2019 data.xlsx" in sheet "SixMWT" are the observed distances walked by the 48 children.

a) Calculate appropriate summary statistics and thus give the parameters for a normal distribution that may be applicable to these data.

b) Using the observed data, calculate the empirical distribution function. Plot the empirical distribution function and the CDF of the normal distribution described in (a) on a single graph.
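
A minimal sketch of the two functions to be plotted in part (b) follows: the empirical distribution function (the proportion of observations at or below t) and the CDF of a normal distribution with the sample mean and sample standard deviation. The distances are hypothetical, not the SixMWT sheet, and the plotting itself is left to whichever graphics tool is preferred.

```python
import math
import statistics

def make_ecdf(data):
    """Return the empirical distribution function F(t) = #{x_i <= t} / n."""
    xs = sorted(data)
    n = len(xs)
    def F(t):
        return sum(1 for v in xs if v <= t) / n
    return F

def normal_cdf(t, mean, sd):
    """CDF of a Normal(mean, sd^2) distribution."""
    return 0.5 * (1.0 + math.erf((t - mean) / (sd * math.sqrt(2.0))))

# Hypothetical walk distances in metres, for illustration only.
distances = [412, 455, 390, 478, 501, 433, 467, 420]

F = make_ecdf(distances)
m = statistics.mean(distances)
s = statistics.stdev(distances)   # sample standard deviation (n - 1 divisor)

for t in sorted(distances):
    print(f"t = {t}: ECDF = {F(t):.3f}, normal CDF = {normal_cdf(t, m, s):.3f}")
```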

c) Do you think the normal distribution is an appropriate model for the data? Justify your answer.

The mean six-minute walk distance in healthy control children is 503 metres.

Carry out the Wilcoxon signed-rank test on these data to test the null hypothesis that the mean walk distance for children with Perthes disease is the same as for healthy controls.

d) Calculate the value of the test statistic and give the approximate normal distribution of the test statistic under the null hypothesis.

e) Calculate the p-value for the test assuming a two-sided alternative hypothesis. Interpret the p-value.
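
The signed-rank calculation in parts (d) and (e) can be sketched as follows. Under the null hypothesis the sum of positive ranks W+ has mean n(n + 1)/4 and variance n(n + 1)(2n + 1)/24, where n is the number of non-zero differences. This sketch assigns average ranks to tied absolute differences but applies neither a tie correction to the variance nor a continuity correction, which is a simplifying assumption. The data are hypothetical.

```python
import math

def signed_rank_test(data, m0):
    """Wilcoxon signed-rank test of location m0 using the normal approximation.

    Returns (W+, z, two-sided p-value).
    """
    diffs = [v - m0 for v in data if v != m0]   # zero differences are dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over any run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1               # average of ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (w_plus - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, z, p_value

# Hypothetical walk distances in metres, for illustration only.
walks = [412, 455, 390, 478, 510, 433, 467, 420]
w_plus, z, p_value = signed_rank_test(walks, m0=503)
print(f"W+ = {w_plus}, z = {z:.3f}, p-value = {p_value:.4f}")
```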

f) What do you conclude about the mean walk distance for children with Perthes disease compared to healthy controls?

g) Describe in a few sentences how you would calculate a 95% confidence interval for the mean distance without assuming any particular parametric model for the data. You do not need to calculate the confidence interval.
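
One possible approach to part (g), among others, is the nonparametric bootstrap with a percentile interval: resample the data with replacement many times, compute the mean of each resample, and take the 2.5th and 97.5th percentiles of those means. The sketch below illustrates the idea on hypothetical data; the number of resamples, the seed and the function name are arbitrary choices.

```python
import random
import statistics

def bootstrap_mean_ci(data, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n))   # resample with replacement
        for _ in range(n_resamples)
    )
    alpha = 1.0 - level
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical walk distances in metres, for illustration only.
walks = [412, 455, 390, 478, 510, 433, 467, 420]
lo, hi = bootstrap_mean_ci(walks)
print(f"sample mean = {statistics.fmean(walks):.1f}")
print(f"95% percentile bootstrap CI: ({lo:.1f}, {hi:.1f})")
```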

Attachment:- Assignment File.rar


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd