Economic Evaluation Assignment

Exercise 1: Regression analysis

A cohort study was undertaken to explore the effectiveness and cost-effectiveness of a new treatment for patients with allergies. The study recruited 300 patients who were then allocated to receive a new treatment or the old one. The data contain information on the total pharmacy, primary care, and hospital costs (TC) in $AUD accumulated within 1 year. Age is the age of the patient in years and Male is equal to 1 for males and 0 for females. NI is equal to 1 if they were randomised to the new intervention and 0 if they were randomised to the old one. EQ-5D is the health-related quality of life at the start of the trial.

(a) What is the mean annual total cost and the standard deviation of this mean for the 300 patients?

(b) What is the mean annual total cost and the standard deviation of this mean for the 150 patients randomised to the new treatment?

(c) What is the mean annual total cost and the standard deviation of this mean for the 150 patients randomised to the old treatment?

(d) Estimate the linear regression model below (where the β's are parameters to be estimated and ε is the random error):

TC = β0 + β1NI + ε

Write out the estimated equation in the form above.
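
As a hedged illustration of what a model with a single 0/1 regressor estimates: ordinary least squares returns the mean of the NI = 0 group as the intercept and the difference between the two group means as the slope. The Python sketch below checks this on made-up numbers. The cost values and the function name ols_simple are assumptions for illustration only, not the assignment data, and the assignment itself is to be completed in Stata.

```python
from statistics import mean


def ols_simple(x, y):
    """Closed-form OLS estimates for y = b0 + b1*x + error."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up costs (NOT the assignment data): NI = 1 for new treatment, 0 for old.
ni = [0, 0, 0, 0, 1, 1, 1, 1]
tc = [1200.0, 900.0, 1500.0, 1000.0, 1400.0, 1100.0, 1800.0, 1300.0]

b0, b1 = ols_simple(ni, tc)
mean_old = mean([c for c, g in zip(tc, ni) if g == 0])
mean_new = mean([c for c, g in zip(tc, ni) if g == 1])

print(f"b0 = {b0:.2f}, mean cost in old-treatment group = {mean_old:.2f}")
print(f"b1 = {b1:.2f}, difference in group means = {mean_new - mean_old:.2f}")
```

This is why the estimates from the simple regression can be cross-checked against the group means from parts (b) and (c).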

(e) A health economist suggests that estimating the following equation would be better to estimate the impact of the new treatment on cost:

TC = β0 + β1NI + β2Age + β3Male + β4EQ5D + ε

Write out this estimated equation and discuss the merit of using this equation versus the simple equation estimated above.

(f) Based on the equations estimated above does the new treatment affect the health costs compared to the old one at the 5% level of significance? Explain why.

(g) Based on the new equation estimated in (e), predict the health care costs in the first year for a 70-year-old female with allergies who is on the new treatment and had an EQ-5D at baseline of 0.7. What are the sources of uncertainty in this prediction?
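
A point prediction of this kind is obtained by plugging the covariate values into the fitted equation. The Python sketch below shows the arithmetic only. The coefficient values are placeholders invented for illustration, not estimates from the assignment data, and the function name predict_cost is hypothetical; substitute the coefficients from your own Stata output.

```python
def predict_cost(coefs, ni, age, male, eq5d):
    """Evaluate TC = b0 + b1*NI + b2*Age + b3*Male + b4*EQ5D at given values."""
    return (coefs["const"]
            + coefs["NI"] * ni
            + coefs["Age"] * age
            + coefs["Male"] * male
            + coefs["EQ5D"] * eq5d)


# Placeholder coefficients for illustration only; replace with your estimates.
placeholder = {"const": 1000.0, "NI": 200.0, "Age": 10.0,
               "Male": 50.0, "EQ5D": -500.0}

# 70-year-old female (Male = 0) on the new treatment (NI = 1), EQ-5D of 0.7.
predicted = predict_cost(placeholder, ni=1, age=70, male=0, eq5d=0.7)
print(f"predicted first-year cost: {predicted:.2f}")
```

The sketch returns a single number; it does not quantify the uncertainty in the coefficients or the residual variation around the regression line.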

(h) Discuss the outputs of your regression in point (e).

(i) An expert suggests that the impact of the new treatment on costs is likely to depend on the severity of the individual's allergy to begin with (where EQ-5D may provide a proxy for this). Comment on whether this is already being taken into account in the regression and, if not, how you might adjust the regression to take this into account (Tip: Think about interactions).
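
One way to let the effect of treatment vary with baseline health is to add the product of NI and EQ5D as an extra regressor, for example TC = β0 + β1NI + β2Age + β3Male + β4EQ5D + β5(NI × EQ5D) + ε, where β5 is a symbol introduced here for illustration. Under that assumed specification, the cost difference between new and old treatment at a given baseline EQ-5D is β1 + β5 × EQ5D. The Python sketch below evaluates that expression with placeholder coefficients that are not estimates from the data; the function name treatment_effect is hypothetical.

```python
def treatment_effect(b_ni, b_interaction, eq5d):
    """Cost difference (new minus old treatment) at a given baseline EQ-5D
    when the model contains both NI and the product NI*EQ5D."""
    return b_ni + b_interaction * eq5d


# Placeholder coefficients for illustration only (NOT estimates from the data).
b_ni, b_interaction = 400.0, -300.0

for eq5d in (0.4, 0.7, 1.0):
    effect = treatment_effect(b_ni, b_interaction, eq5d)
    print(f"EQ5D = {eq5d}: effect of NI on cost = {effect:.1f}")
```

If the interaction coefficient is zero, the expression collapses to β1 and the effect is the same for every patient, which is what the model without the product term assumes.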

(j) Interpret the findings from your regressions (d), (e), and (i).

Exercise 2: Capturing uncertainty using bootstrap in Stata

(a) Set the seed for the random-number generator to 12345 using the command "set seed 12345", then run the command "bsample" to draw a bootstrap sample. Use the summarize command to obtain the mean total cost in the new sample. Load the original data set into Stata again, run "bsample" again, and obtain the mean total cost. Repeat until you have drawn 10 bootstrap samples from the original data set, and record the 10 sample means.

What is the mean of the 10 sample means?

(b) It is time-consuming to answer question (a) by manually running the same set of commands, especially when the number of bootstrap replications is large. Stata provides the command "forvalues" to loop over consecutive values, which can be used to execute a block of commands repeatedly. Read the help for this command and use it to answer question (a) without performing any manual calculation. The following steps serve as a guide:

  • Use command "matrix M = J(10, 1, .)" to create a matrix of 10 rows and 1 column which contains missing values
  • Set the seed for the random generator at 12345
  • Use the command "forvalues i = 1(1)10 { }" to loop over i 10 times
  • Inside the for-loop, i.e. inside {}, read in the original data set and create a new bootstrap sample. Then compute the mean total cost using the command "summarize", take the mean from the stored result r(mean), and assign it to row i of column 1 of the matrix M. This can be done using the following command: matrix M[`i', 1] = r(mean). Note the left single quote (backtick) before i, found on the key to the left of the number 1 key, and the ordinary single quote after i.
  • Outside the for-loop, save the matrix M as a new variable (the last column, named M1) in the current data set using the following command: svmat M
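
The logic of the steps above (resample with replacement, store each resample's mean, then summarise the stored means) can be sketched in a language-neutral way. The Python version below is an illustration under stated assumptions: the cost values are made up, the function name bootstrap_means is hypothetical, and Python's random-number generator will not reproduce the draws Stata makes with the same seed.

```python
import random
from statistics import mean, stdev


def bootstrap_means(values, n_reps, seed):
    """Draw n_reps resamples (with replacement, same size as `values`)
    and return the mean of each resample."""
    rng = random.Random(seed)          # seeded once, before the loop
    means = []
    for _ in range(n_reps):
        resample = rng.choices(values, k=len(values))
        means.append(mean(resample))
    return means


# Made-up costs (NOT the assignment data).
costs = [1200.0, 900.0, 1500.0, 1000.0, 1400.0, 1100.0, 1800.0, 1300.0]

boot = bootstrap_means(costs, n_reps=10, seed=12345)
print("mean of the bootstrap means:", round(mean(boot), 2))
print("SD of the bootstrap means:  ", round(stdev(boot), 2))
```

The list boot plays the role of the matrix M: one entry per replication, filled inside the loop and summarised after it.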

What are the mean and standard deviation of the sample means?


Mean:

Standard deviation:

(c) Use the technique in (b) to create 1000 bootstrap samples and compute the mean and standard deviation of the sample means. Remember to set the seed to 12345 before the for-loop, and set the maximum matrix size using the command "set matsize 1000" before creating the matrix.


Mean:

Standard deviation:

(d) Compare your findings with the findings of Exercise 1.

(e) Create a matrix named C_all with 1000 rows and 4 columns. Set the seed to 12345. Use the for-loop to create 1000 bootstrap samples from the original data set. For each bootstrap sample, fit a linear regression of total cost on 4 predictors: NI, Age, Male and EQ5D. Type "help regress" in the command window to read the help on this command. After the "regress" command, extract the coefficients of the fitted model and assign them to a matrix using the following command: matrix C_sample = e(b). Then transfer each of the coefficients in the matrix C_sample to the matrix C_all as follows:

matrix C_all[`i', 1] = C_sample[1, 1]

matrix C_all[`i', 2] = C_sample[1, 2]

matrix C_all[`i', 3] = C_sample[1, 3]

matrix C_all[`i', 4] = C_sample[1, 4]

Use the command "svmat C_all, names(col)" to save the matrix to the data set.
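
As a hedged sketch of the procedure just described (resample whole patients, refit the regression, keep the four slope coefficients), the Python code below runs the same loop on synthetic data. Everything in it is an assumption for illustration: the data are simulated, the "true" coefficients used to simulate them are arbitrary, the function names ols and bootstrap_coefs are hypothetical, and the small normal-equations solver stands in for Stata's regress. Results will not match Stata output.

```python
import random
from statistics import mean


def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    `rows` holds the predictors per observation; a constant is added here.
    Returns [const, b1, ..., bk]. Raises ZeroDivisionError if X'X is singular."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def bootstrap_coefs(data, n_reps, rng):
    """Each result row is [b_NI, b_Age, b_Male, b_EQ5D] from one resample."""
    out = []
    for _ in range(n_reps):
        sample = rng.choices(data, k=len(data))   # resample whole patients
        b = ols([row[:4] for row in sample], [row[4] for row in sample])
        out.append(b[1:])                         # keep the four slopes only
    return out


# Synthetic patients (NOT the assignment data): (NI, Age, Male, EQ5D, TC).
rng = random.Random(12345)
data = []
for _ in range(60):
    ni, male = rng.randint(0, 1), rng.randint(0, 1)
    age, eq5d = rng.uniform(20, 80), rng.uniform(0.3, 1.0)
    tc = 1000 + 200 * ni + 10 * age + 50 * male - 500 * eq5d + rng.gauss(0, 100)
    data.append((ni, age, male, eq5d, tc))

c_all = bootstrap_coefs(data, 1000, rng)
for name, column in zip(["NI", "Age", "Male", "EQ5D"], zip(*c_all)):
    print(f"mean bootstrapped coefficient for {name}: {mean(column):.2f}")
```

The list c_all plays the role of the matrix C_all: one row per bootstrap replication and one column per predictor.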

What are the means of the bootstrapped coefficients in columns c1-c4? How do these compare with those estimated in Exercise 1 (e)? Discuss.

Compute the 2.5th and 97.5th percentiles of the sampled coefficients for NI (the variable c1) using the following commands:

egen p025 = pctile(c1), p(2.5)

egen p975 = pctile(c1), p(97.5)

Browse the data set and compare p025 and p975 with the lower and upper bounds of the 95% CI of the coefficient for NI estimated in Exercise 1 (e). Discuss your findings.
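
The percentile interval takes the values that cut off the lowest 2.5% and highest 2.5% of the bootstrapped coefficients. The Python sketch below uses a simple nearest-rank rule on stand-in numbers; the function name percentile is hypothetical, the input is not real bootstrap output, and Stata's egen pctile may use a slightly different rule when a percentile falls between two observations.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the data lie at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Stand-in for 1000 bootstrapped coefficients: the integers 1..1000.
draws = list(range(1, 1001))

lower = percentile(draws, 2.5)
upper = percentile(draws, 97.5)
print(lower, upper)   # prints: 25 975
```

With 1000 replications the interval is, roughly, the 25th smallest and the 975th smallest bootstrapped coefficient.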


