Investigate the assumptions of the regression model

Reference no: EM132365764

Linear Models (LMR) Assignment -

Part A -

Dataset: Axial_2019.dta

Background: The data for this assignment were modified from the following article: Determination of iris thickness development in children using swept-source anterior-segment optical coherence tomography.

This article uses linear regression to understand the relationships between patient characteristics and the physical properties of the eye in a group of children. This assignment question uses modified data from the article and restricts the variables to the four below:

  • pid: Participant id number
  • axial: The axial length of the eye (mm) - the outcome for this assignment
  • age: The age of the participant
  • sex: Sex of the participant (0 = male, 1 = female)

The purpose of this assignment question is to investigate the relationship between the axial length and age, and see if sex confounds or modifies this relationship. Investigate this by answering the questions below:

Note: You do not need to investigate the assumptions of regression analysis for questions 1 - 3, as this will be done in question 4.

Question 1: Ignoring sex for now, examine the evidence for an association between axial length (outcome) and age (predictor). Interpret the key results of this analysis.
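For Question 1, a minimal sketch of what a simple linear regression of axial length on age computes: the least-squares slope, its standard error on n − 2 degrees of freedom, and the t statistic whose P-value tests "no association". It is written in plain Python so it runs without packages; in practice one would load the file (for example with `pandas.read_stata`) and use a regression routine. The `simple_ols` helper and the toy values are hypothetical, not the Axial_2019.dta data.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, se_b1, t_b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance estimated on n - 2 degrees of freedom
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b0, b1, se_b1, b1 / se_b1

# Hypothetical toy values, NOT the Axial_2019.dta data
age = [6, 7, 8, 9, 10, 11, 12, 13]
axial = [22.1, 22.4, 22.6, 23.0, 23.1, 23.5, 23.6, 24.0]
b0, b1, se, t = simple_ols(age, axial)
print(f"intercept={b0:.3f}, slope={b1:.3f} mm/year, SE={se:.3f}, t={t:.2f}")
```

The slope is the estimated mean difference in axial length (mm) per one-year difference in age; the intercept is an extrapolation to age 0 and is rarely of interest on its own.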

Question 2: Describe the conditions under which we would expect sex to confound the relationship in Question 1. Have these conditions been met in this data?

Now using regression methodology, examine how taking account of sex changes the relationship between axial length and age. Do you consider sex to be acting as a confounder here?
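For Question 2, sex can only confound the age effect if it is associated with the outcome (axial length), associated with the predictor (age), and not on the causal pathway between them. A common regression check is to compare the crude age slope with the sex-adjusted slope; a change of more than about 10% is often used as a rule of thumb for "meaningful" confounding, but that cut-off is a convention, not a test. The sketch below assumes plain Python, a hypothetical `ols` helper (normal equations solved by Gauss-Jordan elimination) and invented toy data.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]

# Hypothetical toy data, NOT Axial_2019.dta: here the girls (sex = 1) are older
# on average and have shorter eyes at a given age, so sex is associated with
# both age and axial length
age = [6, 7, 8, 9, 10, 8, 9, 10, 11, 12]
sex = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
axial = [21.9, 22.1, 22.5, 22.7, 23.0, 21.6, 21.8, 22.2, 22.4, 22.7]

crude = ols([[1, a] for a in age], axial)[1]                       # axial ~ age
adjusted = ols([[1, a, s] for a, s in zip(age, sex)], axial)[1]    # axial ~ age + sex
change = 100 * (crude - adjusted) / adjusted
print(f"crude slope={crude:.3f}, sex-adjusted slope={adjusted:.3f}, change={change:.1f}%")
```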

Question 3: Use regression to test for interaction (effect modification) between age and sex on the outcome of axial length. Interpret each of the regression coefficients in this analysis.
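For Question 3, with sex coded 0/1 the interaction model axial = b0 + b1·age + b2·sex + b3·age·sex fits one line per sex: b0 and b1 are the male intercept and slope, b2 is the female minus male difference at age 0 (an extrapolation), and b3 is the female minus male difference in slopes. The sketch below recovers these point estimates from separate per-sex fits, which is a standard equivalence for a fully interacted binary factor. The interaction test itself is the t-test of b3 in the joint model, which pools the residual variance, so the standard errors from separate fits would differ. The helpers and toy values are hypothetical.

```python
def line_fit(x, y):
    """Intercept and slope of a least-squares line."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    slope = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum((a - xb) ** 2 for a in x)
    return yb - slope * xb, slope

def interaction_coefs(age, sex, axial):
    """Point estimates of (b0, b1, b2, b3) in
    axial = b0 + b1*age + b2*sex + b3*age*sex, with sex 0 = male, 1 = female."""
    males = [(a, y) for a, s, y in zip(age, sex, axial) if s == 0]
    females = [(a, y) for a, s, y in zip(age, sex, axial) if s == 1]
    m0, m1 = line_fit(*zip(*males))
    f0, f1 = line_fit(*zip(*females))
    return m0, m1, f0 - m0, f1 - m1

# Hypothetical toy data, NOT Axial_2019.dta
age = [6, 7, 8, 9, 10, 6, 7, 8, 9, 10]
sex = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
axial = [22.0, 22.3, 22.6, 22.9, 23.2, 21.8, 22.0, 22.2, 22.4, 22.6]
b0, b1, b2, b3 = interaction_coefs(age, sex, axial)
print(f"b0={b0:.2f} b1={b1:.3f} b2={b2:.2f} b3={b3:.3f}")
```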

Question 4: Investigate the assumptions of the regression model you believe to be most informative to report on. This should include a clear indication of whether you believe the assumptions have been met, and justifications for these conclusions with reference to appropriate diagnostic figures. This should also include some discussion of any outliers, leverage points or influential points that might affect the conclusions, and your recommendations for handling any such data points.
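For Question 4, a minimal sketch of the influence measures usually inspected alongside residual plots, assuming a single-predictor model: leverage h = 1/n + (x − x̄)²/Sxx, internally studentised residuals, and Cook's distance D = (r²/p)·h/(1 − h). If the preferred model also contains sex or an interaction, leverage has to come from the full hat matrix instead. The cut-offs 2p/n, |r| > 2 and 4/n are common rules of thumb, not formal tests; the function name and toy data are hypothetical.

```python
import math

def influence_simple(x, y):
    """(leverage, studentised residual, Cook's distance) for each point of a
    simple linear regression of y on x."""
    n, p = len(x), 2
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((a - xb) ** 2 for a in x)
    b1 = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sxx
    b0 = yb - b1 * xb
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    out = []
    for a, e in zip(x, resid):
        h = 1 / n + (a - xb) ** 2 / sxx
        r = e / math.sqrt(s2 * (1 - h))   # internally studentised residual
        d = r * r / p * h / (1 - h)       # Cook's distance
        out.append((h, r, d))
    return out

# Hypothetical toy data; the last point has an extreme x value
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9, 9.2, 9.8, 15.0]
n, p = len(x), 2
for i, (h, r, d) in enumerate(influence_simple(x, y)):
    flags = []
    if h > 2 * p / n:
        flags.append("high leverage")
    if abs(r) > 2:
        flags.append("large residual")
    if d > 4 / n:
        flags.append("influential")
    print(i, f"h={h:.3f} r={r:.2f} D={d:.3f}", ", ".join(flags))
```

A flagged point is a prompt to check the data and refit with and without it, not an automatic reason to delete it.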

Question 5: Write a conclusion suitable for a non-statistician researcher that summarises your findings and analysis. This researcher would be familiar with introductory statistics concepts such as P-values and confidence intervals but would be unfamiliar with the technical details of a regression analysis.

Part B -

Dataset: virus_2019.dta

Background: Genome size varies enormously across organisms. For example, the human genome is approximately 3 billion nucleotides (nt) long (a nucleotide is one of the 4 letters that make up the genetic language), the genome of the Japanese flower Paris japonica is approximately 150 billion nucleotides long, and the genome of a nematode is about 100 million nucleotides long. Bacterial and viral genomes are much smaller but still vary considerably in size. For example, Influenza (the flu) is only 2,400 nt, whereas Pandoravirus is approximately 2.4 million nucleotides long. What drives this variation in genome size is an active area of research. The motivation for this assignment question comes from the article: An Allometric Relationship between the Genome Length and Virion Volume of Viruses.

A study investigating this topic sampled the genome size of 82 independent viruses. For each of these viruses, the volume of the virus (the physical space it occupies) was also measured. The data have the following columns:

  • glength: The genome length of the virus in nucleotides (nt)
  • vvolume: The volume of the virus (nm³)

(Note: These data have been modified from the original research study.)

The study would like to investigate whether there is a relationship between genome length and volume of a virus.

Question 1: Examine the evidence for a relationship between genome length and virus volume where genome length (untransformed) is the outcome and virus volume (untransformed) is the predictor variable.

Investigate the validity of the assumptions of this regression analysis. Now examine the evidence for this relationship, and the assumptions of this analysis when both variables are log transformed. Which analysis (log transformed or untransformed) do you prefer and why?
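For the log-transformed analysis, a minimal sketch of the fit log(glength) = b0 + b1·log(vvolume). On this scale b1 is an allometric exponent: doubling the volume multiplies the typical (geometric-mean) genome length by 2^b1, whichever log base is used for both variables. Natural logs are an assumption here, and the helper name and toy values are hypothetical; the residual plots should be re-examined on the log scale before preferring this model.

```python
import math

def loglog_fit(volume, glength):
    """Least-squares fit of log(glength) = b0 + b1*log(volume), natural logs."""
    lx = [math.log(v) for v in volume]
    ly = [math.log(g) for g in glength]
    n = len(lx)
    xb, yb = sum(lx) / n, sum(ly) / n
    b1 = sum((a - xb) * (b - yb) for a, b in zip(lx, ly)) / sum((a - xb) ** 2 for a in lx)
    return yb - b1 * xb, b1

# Hypothetical toy values, NOT virus_2019.dta
volume = [5000, 12000, 30000, 80000, 150000, 400000]   # nm^3
glength = [4000, 9000, 15000, 40000, 60000, 150000]    # nt
b0, b1 = loglog_fit(volume, glength)
print(f"exponent b1={b1:.3f}; doubling volume multiplies typical genome length by {2 ** b1:.2f}")
```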

Question 2: Provide a suitable interpretation of your preferred analysis in terms that you could explain to a non-statistician who is familiar with introductory statistics concepts such as P-values and confidence intervals but would be unfamiliar with the technical details of a regression analysis.

Question 3: You are interested in predicting the genome length of a virus with a volume of 32,000 nm³. Use your final model interpreted in Question 2 to give the predicted mean genome length and its 95% confidence interval on the log scale; then provide a 95% confidence interval on the original scale.
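A minimal sketch of the calculation, assuming the natural-log model above: the fitted mean at x0 = log(32000) is b0 + b1·x0 with standard error s·sqrt(1/n + (x0 − x̄)²/Sxx), and exponentiating the fit and both limits gives an interval on the original scale. Under normal errors on the log scale, that back-transformed interval describes the geometric mean (median) genome length, not the arithmetic mean. The t critical value is passed in because the standard library has no t quantile function; the helper name and toy data are hypothetical.

```python
import math

def mean_ci_log(volume, glength, v0, t_crit):
    """Fitted mean log(glength) at volume v0 with its CI, plus the back-transformed values."""
    lx = [math.log(v) for v in volume]
    ly = [math.log(g) for g in glength]
    n = len(lx)
    xb, yb = sum(lx) / n, sum(ly) / n
    sxx = sum((a - xb) ** 2 for a in lx)
    b1 = sum((a - xb) * (b - yb) for a, b in zip(lx, ly)) / sxx
    b0 = yb - b1 * xb
    s = math.sqrt(sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(lx, ly)) / (n - 2))
    x0 = math.log(v0)
    fit = b0 + b1 * x0
    se = s * math.sqrt(1 / n + (x0 - xb) ** 2 / sxx)   # SE of the fitted mean
    lo, hi = fit - t_crit * se, fit + t_crit * se
    return (fit, lo, hi), (math.exp(fit), math.exp(lo), math.exp(hi))

# Hypothetical toy values, NOT virus_2019.dta
volume = [5000, 12000, 30000, 80000, 150000, 400000]
glength = [4000, 9000, 15000, 40000, 60000, 150000]
# 2.776 is the 0.975 quantile of t with 4 df (this toy example has n = 6);
# with the full data (n = 82, 80 df) it would be about 1.99
log_ci, orig_ci = mean_ci_log(volume, glength, 32000, t_crit=2.776)
print("log scale:", [round(v, 3) for v in log_ci])
print("original scale (nt):", [round(v) for v in orig_ci])
```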

Part C -

Dataset: dosevd_2019.xlsx

Background: This part of the assignment requires some more theoretical work based on fitting a linear regression model to investigate the effect of three dosage levels on an outcome. Suppose a clinical investigator is interested in the effect of increasing doses of a Vitamin D supplement given to individuals who are Vitamin D deficient. They perform a randomised trial in which volunteers are allocated at random to three groups receiving 1000 IU (International Units), 2000 IU or 3000 IU of supplement per day for three months, after which the serum level of a key metabolite of Vitamin D, 25(OH)D, is measured in each participant.

Question 1 - One possible analysis of the data described is to estimate the linear effect of dose, i.e. to assume a linear relationship between the expected outcome (labelled Y, as usual) and dose level, which for simplicity we will represent as X = 1, 2, 3 for doses of 1000 IU, 2000 IU and 3000 IU respectively. To estimate the average rate of change in Y with dose, we would fit the simple linear regression model with the standard assumptions for the error term:

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

The objective is to show (algebraically) that if the sample sizes in groups 1, 2 and 3 are allocated in the ratio 1:1:4 (i.e. $n_1 = n$, $n_2 = n$, $n_3 = 4n$), then the least squares estimate of $\beta_1$ is

$\hat{\beta}_1 = \dfrac{4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2}{7}$

where $\bar{Y}_j$ is the mean outcome in dose group $j$. For simplicity, assume that $X_i = 1$ for $i = 1, \dots, n$; $X_i = 2$ for $i = n + 1, \dots, 2n$; and $X_i = 3$ for $i = 2n + 1, \dots, 6n$.

i) Show that the overall mean of X is $\bar{X} = 5/2$.

ii) Show that $\sum_{i=1}^{6n} (X_i - \bar{X})^2 = 7n/2$.

iii) Use these results to prove the formula above for $\hat{\beta}_1$.
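One way to set out parts i) to iii), writing $\bar{Y}_j$ for the mean outcome in dose group $j$ (group sizes $n$, $n$, $4n$). The key step is that $\sum_i (X_i - \bar{X})\bar{Y} = 0$, so the centred cross-product reduces to a weighted sum of the group means:

```latex
\begin{aligned}
\bar{X} &= \frac{n \cdot 1 + n \cdot 2 + 4n \cdot 3}{6n} = \frac{15n}{6n} = \frac{5}{2},\\
\sum_{i=1}^{6n} (X_i - \bar{X})^2 &= n\left(-\tfrac{3}{2}\right)^2 + n\left(-\tfrac{1}{2}\right)^2 + 4n\left(\tfrac{1}{2}\right)^2
  = n\left(\tfrac{9}{4} + \tfrac{1}{4} + 1\right) = \frac{7n}{2},\\
\sum_{i=1}^{6n} (X_i - \bar{X}) Y_i &= -\tfrac{3}{2}\, n\bar{Y}_1 - \tfrac{1}{2}\, n\bar{Y}_2 + \tfrac{1}{2}\, 4n\bar{Y}_3
  = \frac{n}{2}\left(4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2\right),\\
\hat{\beta}_1 &= \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}
  = \frac{\sum_i (X_i - \bar{X}) Y_i}{\sum_i (X_i - \bar{X})^2}
  = \frac{\tfrac{n}{2}\left(4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2\right)}{\tfrac{7n}{2}}
  = \frac{4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2}{7}.
\end{aligned}
```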

Question 2 - The dataset provided contains simulated data that might have arisen from the study just described, with 15 participants in each of dose groups 1 and 2, and 60 participants in dose group 3. Fit the regression model discussed above and demonstrate that the result obtained for $\hat{\beta}_1$ in Question 1 holds in this sample.
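Before using the real file (which could be read with, for example, `pandas.read_excel`), the identity can be sanity-checked on simulated data with the same 15:15:60 allocation. The sketch below is hypothetical throughout: the data-generating values (mean 50 + 10·X, noise SD 8) are arbitrary and are not the dosevd_2019.xlsx data. It compares the ordinary least-squares slope with (4·Ȳ3 − 3·Ȳ1 − Ȳ2)/7; the two agree up to floating-point rounding for any outcome values, because the result is an algebraic identity rather than an approximation.

```python
import random

def check_dose_formula(n=15, seed=1):
    """Return (OLS slope, group-mean formula) for simulated data with n, n, 4n per dose group."""
    rng = random.Random(seed)
    x = [1] * n + [2] * n + [3] * (4 * n)
    # Hypothetical simulated 25(OH)D values, NOT the dosevd_2019.xlsx data
    y = [50 + 10 * xi + rng.gauss(0, 8) for xi in x]
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    b1_ols = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum((a - xb) ** 2 for a in x)

    def group_mean(level):
        vals = [b for a, b in zip(x, y) if a == level]
        return sum(vals) / len(vals)

    b1_formula = (4 * group_mean(3) - 3 * group_mean(1) - group_mean(2)) / 7
    return b1_ols, b1_formula

b1_ols, b1_formula = check_dose_formula(n=15, seed=2019)
print(f"OLS slope={b1_ols:.6f}, formula={b1_formula:.6f}")
```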

Attachment:- Linear Models Assignment Files.rar


Reviews

len2365764

9/4/2019 9:59:53 PM

Answer the questions in an essay-style approach when appropriate. Make sure to include all relevant computer output (and exclude irrelevant output), that is presented neatly and integrated through the discussion and interpretation. Do not include an appendix. Do not repeat the assignment questions. Do not include an assignment cover page. Do not include your name in the assignment (so that I can mark blind, don’t worry eLearning will track whose is whose).

len2365764

9/4/2019 9:59:46 PM

Note that there are not necessarily unique correct answers for these questions, and marks will be awarded for appropriate analysis using regression models, with corresponding justifications and explanations. Marks may be subtracted for unfocussed or disorganised presentation of material. Where equations or formulae are to be presented, please attempt to do this electronically using a word processor rather than including images of scanned-in handwritten work. When this is not possible, scanned-in written work must be extremely neat and legible. Writing mathematics electronically is an important skill to learn and practise. The size of your assignment can be quite variable and will depend on things such as table formatting and size of plots. However, concise presentation and discussion is encouraged.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd