Write a conclusion suitable for a non-statistician researcher

Reference no: EM132372428

Linear Models (LMR)

Assignment 1

Part A

Background: The data for this assignment was modified from the following article.

Article- Determination of iris thickness development in children using swept-source anterior-segment optical coherence tomography By Shunsuke Nakakura and Yuki Nagata.

It is not necessary to read this for the assignment or course, and the link is provided purely for your own interest.

This article uses linear regression to understand the relationships between patient characteristics and the physical properties of the eye in a group of children. This assignment question uses modified data from the article and restricts the variables to those listed below:

• pid: Participant id number

• axial: The axial length of the eye (mm) – the outcome for this assignment

• age: The age of the participant

• sex: Sex of the participant (0 = male, 1 = female)

The purpose of this assignment question is to investigate the relationship between the axial length and age, and see if sex confounds or modifies this relationship. Investigate this by answering the questions below:

Note: You do not need to investigate the assumptions of regression analysis for questions 1 – 3, as this will be done in question 4.

Question 1:

Ignoring sex for now, examine the evidence for an association between axial length (outcome) and age (predictor). Interpret the key results of this analysis.

Question 2:

Describe the conditions under which we would expect sex to confound the relationship in Question 1. Have these conditions been met in this data?

Now using regression methodology, examine how taking account of sex changes the relationship between axial length and age. Do you consider sex to be acting as a confounder here?

Question 3:

Use regression to test for interaction (effect modification) between age and sex on the outcome of axial length. Interpret each of the regression coefficients in this analysis.
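As a reading aid for the interaction question, the model can be written out explicitly. This is a sketch under the usual formulation of an age-by-sex interaction with sex coded 0 = male, 1 = female as in the variable list; the symbols $\beta_0,\dots,\beta_3$ are notation introduced here, not taken from the assignment.

$$
\text{axial}_i = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{sex}_i + \beta_3\,(\text{age}_i \times \text{sex}_i) + \epsilon_i
$$

$$
\begin{aligned}
\text{Males } (\text{sex}=0):\quad & E[\text{axial}] = \beta_0 + \beta_1\,\text{age} \\
\text{Females } (\text{sex}=1):\quad & E[\text{axial}] = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,\text{age}
\end{aligned}
$$

Under this coding, $\beta_1$ is the age slope in males, $\beta_2$ is the female-minus-male difference in expected axial length at age 0, and $\beta_3$ is the difference in age slopes between females and males. The test for interaction is the test of $H_0\colon \beta_3 = 0$.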

Question 4:

Investigate the assumptions of the regression model you believe to be most informative to report on. This should include a clear indication of whether you believe the assumptions have been met, and justifications for these conclusions with reference to appropriate diagnostic figures. This should also include some discussion of any outliers, leverage points or influential points that might affect the conclusions, and your recommendations for handling any such data points.

Question 5:

Write a conclusion suitable for a non-statistician researcher that summarises your findings and analysis. This researcher would be familiar with introductory statistics concepts such as P-values and confidence intervals but would be unfamiliar with the technical details of a regression analysis.

Part B

Background: The size of genomes across organisms varies enormously. For example, the human genome is approximately 3 billion nucleotides (nt) long (a nucleotide is one of the 4 letters that make up the genetic language), the genome of the Japanese flower Paris japonica is approximately 150 billion nucleotides long, and the genome of a nematode is about 100 million nucleotides long. In bacteria and viruses, the genomes are much smaller, but still vary considerably in size. For example, Influenza (the flu) is only 2400 nt, whereas Pandoravirus is approximately 2.4 million nucleotides long. What drives this variation in genome size is an active area of research. The motivation for this assignment question comes from the article below, which does not need to be read for this assignment and is included for your interest only.

Article- An Allometric Relationship between the Genome Length and Virion Volume of Viruses By Jie Cui, Timothy E. Schlub and Edward C. Holmes.

A study investigating this topic sampled the genome size of 82 independent viruses. For each of these viruses, the volume of the virus (in terms of the physical space it occupies) was measured. The data has the following columns:

• glength: The genome length of the virus in nucleotides (nt)

• vvolume: The volume of the virus (nm³)

(Note: These data have been modified from the original research study)

The study would like to investigate whether there is a relationship between genome length and volume of a virus.

Question 1:

Examine the evidence for a relationship between genome length and virus volume where genome length (untransformed) is the outcome and virus volume (untransformed) is the predictor variable. Investigate the validity of the assumptions of this regression analysis. Now examine the evidence for this relationship, and the assumptions of this analysis when both variables are log transformed. Which analysis (log transformed or untransformed) do you prefer and why?
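To illustrate why a log-log analysis can be attractive here, the following minimal sketch fits a straight line to log-transformed values generated from a power law. The data are hypothetical and noise-free (they are not the assignment data), the constants 50 and 0.7 are arbitrary, and `fit_line` is a helper written for this illustration using only the Python standard library.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical power law, NOT the assignment data:
# glength = 50 * vvolume ** 0.7
vvolume = [1e3, 1e4, 1e5, 1e6, 1e7]
glength = [50 * v ** 0.7 for v in vvolume]

# Natural-log transform of both variables.
log_v = [math.log(v) for v in vvolume]
log_g = [math.log(g) for g in glength]

b0, b1 = fit_line(log_v, log_g)

# On the log-log scale the slope estimates the power-law exponent and
# exp(intercept) estimates the multiplicative constant.
print("slope:", b1)
print("exp(intercept):", math.exp(b0))
```

With real data the points will scatter around the line, so the same fit should be accompanied by residual diagnostics on both scales before choosing between the two analyses.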

Question 2:

Provide a suitable interpretation of your preferred analysis in terms that you could explain to a non-statistician who is familiar with introductory statistics concepts such as P-values and confidence intervals but would be unfamiliar with the technical details of a regression analysis.

Question 3

You are interested in predicting the genome length of a virus with a volume of 32000 nm³. Use your final model interpreted in question 2 to give the predicted mean genome length and its 95% confidence interval on the log scale; then provide a 95% confidence interval on the original scale.
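The mechanics of this calculation can be sketched as follows, assuming the preferred model is a simple linear regression of log(glength) on log(vvolume) using natural logs. Every fitted quantity below (`b0`, `b1`, `s`, `x_bar`, `sxx`) is a hypothetical placeholder, not a result from the assignment data; only n = 82 and the volume of 32000 come from the text. The critical value 1.99 is the approximate 97.5th percentile of a t distribution with 80 degrees of freedom.

```python
import math


def mean_prediction_ci(b0, b1, s, n, x_bar, sxx, x0, t_crit):
    """Fitted mean and confidence interval at x0 for simple linear regression.

    s is the residual standard deviation, x_bar the mean of the predictor,
    and sxx the sum of squared deviations of the predictor from its mean.
    """
    fit = b0 + b1 * x0
    se = s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / sxx)
    return fit, fit - t_crit * se, fit + t_crit * se


# Predictor value on the log scale (natural log assumed).
x0 = math.log(32000)

# Hypothetical placeholders for the fitted model; replace with real output.
fit, lo, hi = mean_prediction_ci(
    b0=3.0, b1=0.6, s=1.0, n=82, x_bar=11.0, sxx=400.0, x0=x0, t_crit=1.99
)

# Back-transform by exponentiating the fit and both interval limits.
fit_original = math.exp(fit)
ci_original = (math.exp(lo), math.exp(hi))

print("log scale:", fit, (lo, hi))
print("original scale:", fit_original, ci_original)
```

Exponentiating the interval limits gives a valid interval on the original scale, but the back-transformed point estimate is a geometric mean (equivalently, a median under log-normal errors) rather than an arithmetic mean, which is worth stating when reporting it.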

Part C

Background: This part of the assignment requires some more theoretical work based on fitting a linear regression model to investigate the effect of three dosage levels on an outcome. Suppose a clinical investigator is interested in examining the effect of increasing doses of a Vitamin D supplement given to individuals who are Vitamin D deficient. They perform a randomised trial in which they allocate (at random) volunteers to three groups, receiving 1000 IU (International Units), 2000 IU or 3000 IU of supplement per day for a period of three months, after which the serum levels of a key metabolite of Vitamin D called 25(OH)D are measured in each participant.

Question 1

One possible analysis of the data described is to estimate the linear effect of dose, i.e. to assume a linear relationship of expected outcome (labelled Y, as usual) to dose level, which for simplicity we will represent as X = 1, 2, 3, representing doses of 1000 IU, 2000 IU and 3000 IU respectively. To estimate the average rate of change in Y with dose we would fit the simple linear regression model with the standard assumptions for the error term:

$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$

The objective is to show (algebraically) that if the sample size allocation between group 1, group 2 and group 3 is 1:1:4 (i.e. $n_1 = n$, $n_2 = n$, $n_3 = 4n$), then the least squares estimate of $\beta_1$ is

$\hat{\beta}_1 = (4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2)/7$

where $\bar{Y}_j$ is the mean of the outcome in dose group $j$ ($j = 1, 2, 3$). For simplicity we will assume that
$X_i = 1$ for $i = 1, \dots, n$, $X_i = 2$ for $i = n + 1, \dots, 2n$ and $X_i = 3$ for $i = 2n + 1, \dots, 6n$.

i) Show that the overall mean of X is $\bar{X} = 5/2$

ii) Show that $\sum_{i=1}^{6n} (X_i - \bar{X})^2 = 7n/2$

iii) Use these results to prove the formula above for β1
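One way the three steps fit together is outlined below. This is a sketch using the standard least squares formula for the slope, with the group means written as $\bar{Y}_1$, $\bar{Y}_2$, $\bar{Y}_3$; it is not the only valid route to the result.

$$
\bar{X} = \frac{n \cdot 1 + n \cdot 2 + 4n \cdot 3}{6n} = \frac{15n}{6n} = \frac{5}{2}
$$

$$
\sum_{i=1}^{6n} (X_i - \bar{X})^2
= n\left(1 - \tfrac{5}{2}\right)^2 + n\left(2 - \tfrac{5}{2}\right)^2 + 4n\left(3 - \tfrac{5}{2}\right)^2
= \frac{9n}{4} + \frac{n}{4} + n = \frac{7n}{2}
$$

Because $\sum_i (X_i - \bar{X}) = 0$, the numerator of the slope estimate simplifies to $\sum_i (X_i - \bar{X}) Y_i$, and the sum of $Y_i$ within each group is the group size times the group mean:

$$
\sum_{i=1}^{6n} (X_i - \bar{X})\,Y_i
= -\tfrac{3}{2}\,n\bar{Y}_1 - \tfrac{1}{2}\,n\bar{Y}_2 + \tfrac{1}{2}\,4n\bar{Y}_3
= \frac{n}{2}\left(4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2\right)
$$

$$
\hat{\beta}_1
= \frac{\sum_i (X_i - \bar{X})\,Y_i}{\sum_i (X_i - \bar{X})^2}
= \frac{\tfrac{n}{2}\left(4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2\right)}{\tfrac{7n}{2}}
= \frac{4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2}{7}
$$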

Question 2

The dataset provided contains some simulated data that might have arisen from the study just described, with 15 participants in each of groups 1 and 2, and 60 participants in dose group 3. Fit the regression model discussed above and demonstrate that the result obtained for β1 in question 1 holds in this sample.
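The kind of check being asked for can be sketched numerically as below. The outcome values are simulated here with an arbitrary seed, mean structure and noise level, purely as a stand-in for the provided dataset (which is not reproduced); only the 15/15/60 allocation comes from the text. The code uses the Python standard library only.

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility only

# Group sizes from the text: 15, 15 and 60 participants (a 1:1:4 allocation).
sizes = {1: 15, 2: 15, 3: 60}

# Simulated stand-in outcomes; the true dataset is NOT used here.
x, y = [], []
for dose, size in sizes.items():
    for _ in range(size):
        x.append(dose)
        y.append(20 + 10 * dose + random.gauss(0, 5))

# Least squares slope from the usual formula Sxy / Sxx.
n_total = len(x)
x_bar = sum(x) / n_total
y_bar = sum(y) / n_total
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope_ols = sxy / sxx

# Slope from the group-mean formula (4*Ybar3 - 3*Ybar1 - Ybar2) / 7.
group_mean = {
    dose: sum(yi for xi, yi in zip(x, y) if xi == dose) / size
    for dose, size in sizes.items()
}
slope_formula = (4 * group_mean[3] - 3 * group_mean[1] - group_mean[2]) / 7

print("least squares slope:", slope_ols)
print("group-mean formula: ", slope_formula)
```

The two printed values should agree up to floating-point rounding for any outcome values, because the identity depends only on the 1:1:4 allocation and not on the data themselves.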

Attachment:- Data File.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd