Investigate the assumptions of the regression model

Reference no: EM132365764

Linear Models (LMR) Assignment -

Part A -

Dataset: Axial_2019.dta

Background: The data for this assignment was modified from the following article - Determination of iris thickness development in children using swept-source anterior-segment optical coherence tomography.

This article uses linear regression to understand the relationships between patient characteristics and the physical properties of the eye in a group of children. This assignment question uses modified data from the article and restricts the variables to those listed below:

  • pid: Participant id number
  • axial: The axial length of the eye (mm) - the outcome for this assignment
  • age: The age of the participant
  • sex: Sex of the participant (0 = male, 1 = female)

The purpose of this assignment question is to investigate the relationship between the axial length and age, and see if sex confounds or modifies this relationship. Investigate this by answering the questions below:

Note: You do not need to investigate the assumptions of regression analysis for questions 1 - 3, as this will be done in question 4.

Question 1: Ignoring sex for now, examine the evidence for an association between axial length (outcome) and age (predictor). Interpret the key results of this analysis.
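A minimal sketch of the calculation behind Question 1, assuming a simple linear regression of axial length on age. The function name `simple_ols` and the data values are hypothetical and are not taken from Axial_2019.dta. The slope's t statistic would be compared with a t distribution on n - 2 degrees of freedom.

```python
import math


def simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares and return the slope's standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # estimated change in y per unit of x
    b0 = y_bar - b1 * x_bar             # estimated mean of y when x = 0
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                  # residual variance estimate
    se_b1 = math.sqrt(s2 / sxx)
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1, "df": n - 2}


# Hypothetical values for illustration only (not the assignment data).
age = [6, 7, 8, 9, 10, 11, 12]
axial = [22.1, 22.4, 22.6, 23.0, 23.1, 23.5, 23.6]
fit = simple_ols(age, axial)
print(fit)
```

In practice the same quantities come directly from a statistical package's regression output; the sketch only makes explicit where the slope, its standard error and the degrees of freedom come from.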

Question 2: Describe the conditions under which we would expect sex to confound the relationship in Question 1. Have these conditions been met in this data?

Now using regression methodology, examine how taking account of sex changes the relationship between axial length and age. Do you consider sex to be acting as a confounder here?
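One way to examine confounding is to compare the age coefficient from the unadjusted model with the coefficient from the model that also includes sex. The sketch below does this with a small least-squares solver written for illustration; the helper `ols` and all data values are hypothetical, and the percentage change is only a descriptive summary (a change of around 10% is one commonly quoted rule of thumb, not a formal test).

```python
def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    p = len(X[0])
    # Build the normal equations (X'X) b = X'y as an augmented matrix.
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    return [A[r][p] / A[r][r] for r in range(p)]


# Hypothetical values for illustration only (not the assignment data).
age = [6, 7, 8, 9, 10, 11, 12, 13]
sex = [0, 0, 1, 0, 1, 1, 0, 1]
axial = [22.0, 22.3, 22.2, 22.9, 22.8, 23.1, 23.7, 23.6]

crude = ols([age], axial)            # [intercept, age slope]
adjusted = ols([age, sex], axial)    # [intercept, age slope, sex effect]
change = 100 * (adjusted[1] - crude[1]) / crude[1]
print("crude age slope:", crude[1])
print("adjusted age slope:", adjusted[1])
print("percent change:", change)
```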

Question 3: Use regression to test for interaction (effect modification) between age and sex on the outcome of axial length. Interpret each of the regression coefficients in this analysis.
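As a guide to interpreting the coefficients, the interaction model can be written out separately for each sex. This is a sketch assuming the coding sex = 0 for males and 1 for females given in the variable list; the symbols β0 to β3 are labels chosen here.

```latex
\begin{align*}
E[\text{axial}] &= \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{sex}
                   + \beta_3\,(\text{age} \times \text{sex}) \\
\text{Males (sex = 0):}\quad   E[\text{axial}] &= \beta_0 + \beta_1\,\text{age} \\
\text{Females (sex = 1):}\quad E[\text{axial}] &= (\beta_0 + \beta_2)
                   + (\beta_1 + \beta_3)\,\text{age}
\end{align*}
```

Under this parameterisation β1 is the age slope in males, β3 is the difference in age slopes between females and males, and β2 is the female-male difference in mean axial length at age 0. The test for interaction is the test of β3 = 0.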

Question 4: Investigate the assumptions of the regression model you believe to be most informative to report on. This should include a clear indication of whether you believe the assumptions have been met, and justifications for these conclusions with reference to appropriate diagnostic figures. This should also include some discussion of any outliers, leverage points or influential points that might affect the conclusions, and your recommendations for handling any such data points.
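The quantities usually inspected for unusual observations can be computed directly for a simple regression. The sketch below is an illustration with hypothetical data (the last point is deliberately placed far from the other ages); the function name `influence_measures` and the 2p/n leverage cut-off are assumptions made here, and such cut-offs are conventions rather than strict rules.

```python
import math


def influence_measures(x, y):
    """Leverage, standardised residual and Cook's distance for y = b0 + b1*x."""
    n = len(x)
    p = 2  # number of fitted parameters (intercept and slope)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    rows = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx   # leverage
        r = e / math.sqrt(s2 * (1 - h))       # internally standardised residual
        d = (r * r / p) * h / (1 - h)         # Cook's distance
        rows.append({"leverage": h, "std_resid": r, "cooks_d": d})
    return rows


# Hypothetical values for illustration only (not the assignment data).
age = [6, 7, 8, 9, 10, 11, 12, 18]
axial = [22.1, 22.4, 22.6, 23.0, 23.1, 23.5, 23.6, 23.0]
rows = influence_measures(age, axial)
cutoff = 2 * 2 / len(age)  # assumed rule of thumb: leverage above 2p/n
for a, row in zip(age, rows):
    note = "high leverage" if row["leverage"] > cutoff else ""
    print(a, round(row["leverage"], 3), round(row["std_resid"], 2),
          round(row["cooks_d"], 3), note)
```

A point with high leverage is not necessarily influential; it is the combination of leverage and a large residual, summarised by Cook's distance, that indicates an observation may be driving the fitted line.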

Question 5: Write a conclusion suitable for a non-statistician researcher that summarises your findings and analysis. This researcher would be familiar with introductory statistics concepts such as P-values and confidence intervals but would be unfamiliar with the technical details of a regression analysis.

Part B -

Dataset: virus_2019.dta

Background: The size of genomes varies enormously across organisms. For example, the human genome is approximately 3 billion nucleotides (nt) long (a nucleotide is one of the 4 letters that make up the genetic language), the genome of the Japanese flower Paris japonica is approximately 150 billion nucleotides long, and the genome of a nematode is about 100 million nucleotides long. In bacteria and viruses, the genomes are much smaller, but still vary considerably in size. For example, Influenza (the flu) is only 2400 nt, whereas Pandoravirus is approximately 2.4 million nucleotides long. What drives this variation in genome sizes is an active area of research. The motivation for this assignment question comes from the article - An Allometric Relationship between the Genome Length and Virion Volume of Viruses.

A study investigating this topic sampled the genome size of 82 independent viruses. For each of these viruses, the volume of the virus (in terms of the physical space it occupies) was measured. The data have the following columns:

  • glength: The genome length of the virus in nucleotides (nt)
  • vvolume: The volume of the virus (nm³)

(Note: These data have been modified from the original research study)

The study would like to investigate whether there is a relationship between genome length and volume of a virus.

Question 1: Examine the evidence for a relationship between genome length and virus volume where genome length (untransformed) is the outcome and virus volume (untransformed) is the predictor variable.

Investigate the validity of the assumptions of this regression analysis. Now examine the evidence for this relationship, and the assumptions of this analysis when both variables are log transformed. Which analysis (log transformed or untransformed) do you prefer and why?
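When comparing the two analyses it helps to see what the log-transformed model implies on the original scale. The following is a sketch assuming natural logarithms and the usual error assumptions on the log scale; β0 and β1 are labels chosen here.

```latex
\begin{align*}
\log(\text{glength}) &= \beta_0 + \beta_1 \log(\text{vvolume}) + \varepsilon \\
\Longrightarrow\quad \text{glength} &= e^{\beta_0}\,\text{vvolume}^{\beta_1}\,e^{\varepsilon}
\end{align*}
```

So the log-log model describes a power-law (allometric) relationship: multiplying the volume by a factor k multiplies the typical genome length by k^{β1}. For example, doubling the volume multiplies it by 2^{β1}, and a 1% increase in volume corresponds to roughly a β1% change in genome length.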

Question 2: Provide a suitable interpretation of your preferred analysis in terms that you could explain to a non-statistician who is familiar with introductory statistics concepts such as P-values and confidence intervals but would be unfamiliar with the technical details of a regression analysis.

Question 3: You are interested in predicting the genome length of a virus with a volume of 32,000 nm³. Use your final model interpreted in Question 2 to give the predicted mean genome length and its 95% confidence interval on the log scale; then provide a 95% confidence interval on the original scale.
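A minimal sketch of this calculation, assuming the preferred model is the log-log regression and natural logarithms are used. The function name `mean_response_ci` and the data are hypothetical, and the t critical value is passed in by the user rather than computed (about 1.990 for 80 degrees of freedom, which would apply to 82 viruses; 2.776 for the 4 degrees of freedom in the toy data below).

```python
import math


def mean_response_ci(x, y, x0, t_crit):
    """Fitted mean of y at x0 and its confidence interval from y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    fit = b0 + b1 * x0
    se_fit = math.sqrt(s2 * (1 / n + (x0 - x_bar) ** 2 / sxx))
    return fit, fit - t_crit * se_fit, fit + t_crit * se_fit


# Hypothetical values for illustration only (not virus_2019.dta).
vvolume = [1.0e4, 2.5e4, 6.0e4, 1.5e5, 4.0e5, 1.0e6]
glength = [5.0e3, 9.0e3, 1.5e4, 3.0e4, 5.5e4, 1.0e5]
log_v = [math.log(v) for v in vvolume]
log_g = [math.log(g) for g in glength]

fit, lower, upper = mean_response_ci(log_v, log_g, math.log(32000), t_crit=2.776)
print("log scale:", fit, lower, upper)
# Exponentiating the interval endpoints gives an interval on the original scale.
print("original scale:", math.exp(fit), math.exp(lower), math.exp(upper))
```

Note that the back-transformed value estimates the geometric mean (the median, if the log-scale errors are normal) of genome length at that volume, not the arithmetic mean.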

Part C -

Dataset: dosevd_2019.xlsx

Background: This part of the assignment requires some more theoretical work based on fitting a linear regression model to investigate the effect of three dosage levels on an outcome. Suppose a clinical investigator is interested in examining the effect of increasing doses of a Vitamin D supplement given to individuals who are Vitamin D deficient. They perform a randomised trial in which they allocate (at random) volunteers to three groups, receiving 1000 IU (International Units), 2000 IU or 3000 IU of supplement per day for a period of three months, after which the serum levels of a key metabolite of Vitamin D called 25(OH)D are measured in each participant.

Question 1 - One possible analysis of the data described is to estimate the linear effect of dose, i.e. to assume a linear relationship of expected outcome (labelled Y, as usual) to dose level, which for simplicity we will represent as X = 1, 2, 3 representing doses of 1000 IU, 2000 IU and 3000 IU respectively. To estimate the average rate of change in Y with dose we would fit the simple linear regression model with the standard assumptions for the error term:

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

The objective is to show (algebraically) that if the sample size allocation between group 1, group 2 and group 3 is 1:1:4 (i.e. $n_1 = n$, $n_2 = n$, $n_3 = 4n$), then the least squares estimate of $\beta_1$ is

$\hat{\beta}_1 = (4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2)/7$

where $\bar{Y}_j$ is the mean of the outcome in dose group $j$. For simplicity we will assume that $X_i = 1$ for $i = 1, \ldots, n$, $X_i = 2$ for $i = n + 1, \ldots, 2n$ and $X_i = 3$ for $i = 2n + 1, \ldots, 6n$.

i) Show that the overall mean of X is $\bar{X} = 5/2$.

ii) Show that $\sum_i (X_i - \bar{X})^2 = 7n/2$.

iii) Use these results to prove the formula above for β1.
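One possible outline of the algebra, written as a sketch rather than a model answer. It assumes the usual least squares slope formula and uses the fact that the deviations of X from its mean sum to zero, so the sample mean of Y drops out of the numerator.

```latex
\begin{align*}
\bar{X} &= \frac{n(1) + n(2) + 4n(3)}{6n} = \frac{15n}{6n} = \frac{5}{2} \\[4pt]
\sum_i (X_i - \bar{X})^2
  &= n\left(1 - \tfrac{5}{2}\right)^2 + n\left(2 - \tfrac{5}{2}\right)^2
     + 4n\left(3 - \tfrac{5}{2}\right)^2
   = \frac{9n}{4} + \frac{n}{4} + \frac{4n}{4} = \frac{7n}{2} \\[4pt]
\sum_i (X_i - \bar{X})(Y_i - \bar{Y})
  &= \sum_i (X_i - \bar{X})\,Y_i
   = -\tfrac{3}{2}\,n\bar{Y}_1 - \tfrac{1}{2}\,n\bar{Y}_2 + \tfrac{1}{2}\,(4n)\bar{Y}_3
   = \frac{n}{2}\left(4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2\right) \\[4pt]
\hat{\beta}_1
  &= \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}
   = \frac{\tfrac{n}{2}\left(4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2\right)}{\tfrac{7n}{2}}
   = \frac{4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2}{7}
\end{align*}
```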

Question 2 - The dataset provided contains some simulated data that might have arisen from the study just described, with 15 participants in groups 1 and 2, and 60 participants in dose group 3. Fit the regression model discussed above and demonstrate that the result obtained for β1 in question 1 is true in this sample.
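The identity can also be checked numerically. The sketch below uses values simulated inside the script (they are hypothetical and are not the contents of dosevd_2019.xlsx) with the same 15:15:60 allocation, and compares the least squares slope with the group-mean formula. The function names are chosen here for illustration.

```python
import random


def ls_slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def group_mean_slope(x, y):
    """Slope from the group means, valid for a 1:1:4 allocation at X = 1, 2, 3."""
    means = {}
    for g in (1, 2, 3):
        values = [yi for xi, yi in zip(x, y) if xi == g]
        means[g] = sum(values) / len(values)
    return (4 * means[3] - 3 * means[1] - means[2]) / 7


# Simulated outcome values for illustration only; the mean and spread are arbitrary.
rng = random.Random(2019)
x = [1] * 15 + [2] * 15 + [3] * 60
y = [rng.gauss(20 + 10 * xi, 5) for xi in x]

slope = ls_slope(x, y)
formula = group_mean_slope(x, y)
print(slope, formula)
```

The two numbers agree up to floating-point rounding whatever outcome values are used, because the result depends only on the allocation ratio and the coding of X.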



Answer the questions in an essay-style approach when appropriate. Make sure to include all relevant computer output (and exclude irrelevant output), that is presented neatly and integrated through the discussion and interpretation. Do not include an appendix. Do not repeat the assignment questions. Do not include an assignment cover page. Do not include your name in the assignment (so that I can mark blind, don’t worry eLearning will track whose is whose).


Note that there are not necessarily unique correct answers for these questions, and marks will be awarded for appropriate analysis using regression models, with corresponding justifications and explanations. Marks may be subtracted for unfocussed or disorganised presentation of material. Where equations or formulae are to be presented, please attempt to do this electronically using a word processor rather than including scanned images of handwritten work. When this is not possible, scanned handwritten work must be extremely neat and legible. Writing mathematics electronically is an important skill to learn and practice. The size of your assignment can be quite variable and will depend on things such as table formatting and size of plots. However, concise presentation and discussion are encouraged.
