What is the mean number of differences per base-pair

Reference no: EM132281135

Experimental Design and Statistics Assignment

Section 1: Which test?

The Human Microbiome Project analyzed the diversity of the microbial communities that live in and on the human body by taking samples from healthy individuals and sequencing the DNA of the microbes that were present in different regions of the body (see image in attached file). This allowed the identification of the different taxa of bacteria present in each region, as well as quantification of the relative number of each taxon.

There are many statistical questions that can be addressed with these data. For each research question below, state the null and alternate hypotheses and the test you would use, including the variables to be tested. Be as specific as possible, including whether the test should be one- or two-tailed where appropriate.

A) Is there a difference in the number of bacterial taxa present in the saliva of men and women? Assume that the number of taxa present follows a normal distribution, with the same standard deviation in men and women.

H0:

HA:

Test:

B) Do people tend to harbor more bacterial taxa on the skin behind their ears or in their elbows? Assume the measurements for the left and right sides for each area are combined for each person.

H0:

HA:

Test:

C) An earlier paper proposed that individuals could be classified as belonging to one of three "enterotypes" based on the types of bacteria present in their gut. Is there a difference in the frequencies of the three enterotypes among meat-eaters and vegetarians?

H0:

HA:

Test:

D) The enterotype hypothesis comes partly from the observation that the distributions of frequencies of some bacterial groups across individuals are bimodal; for one of these taxa, Prevotella, people have either fairly high frequencies or nearly undetectable levels. Few people have moderate frequencies. You want to test whether the frequency of Prevotella in the gut is affected by dietary fat levels, so you talk to a friend who has been doing an unrelated study where subjects were randomly assigned to either a high- or low-fat diet. You do not know what each individual's Prevotella level was before the study began, but you can measure the current level.

H0:

HA:

Test:

Section 2: Snakes and Snails

A number of snake species in south-east Asia have evolved to prey extensively or exclusively on land snails, and have evolved special morphological features to facilitate extracting snails from their shells, including jaws with many teeth to grip the slippery, slimy beasts. Most snails' shells coil to the right, so a snake with a similar asymmetry in its own morphology might have an advantage in predation.

Researchers measured the asymmetry in snake jaws across a number of snake species by counting the number of teeth on the right and left side of the jaw (R and L, respectively) and calculating an asymmetry index: 100 × (R - L)/(R + L). This index was normally distributed within species.
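As a minimal sketch of the formula above, the index can be computed as follows. The function name and the tooth counts are hypothetical, chosen only for illustration, and are not taken from the study.

```python
def asymmetry_index(right: int, left: int) -> float:
    """Return 100 * (R - L) / (R + L) for tooth counts R (right) and L (left)."""
    total = right + left
    if total == 0:
        raise ValueError("R + L must be positive")
    return 100 * (right - left) / total


# Hypothetical counts (not from the study): both jaws have R - L = 4,
# but the index differs because the total number of teeth differs.
small_jaw = asymmetry_index(12, 8)    # 100 * 4 / 20
large_jaw = asymmetry_index(27, 23)   # 100 * 4 / 50
print(small_jaw, large_jaw)
```

The two hypothetical jaws share the same raw difference, which makes it easy to see how dividing by R + L changes the comparison between animals with different tooth counts.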

A) Why did they calculate the asymmetry index rather than just using R - L?

B) One of the snake species, Pareas iwasakii, had a mean asymmetry index in a sample of 28 snakes of 17.5, with a standard deviation of 8.5. Perform an appropriate test to determine if P. iwasakii shows significant asymmetry in tooth number. Be sure to clearly state your conclusions.

C) To test whether an asymmetrical jaw was helpful in predation against coiled snails, the researchers tested snake predation success on a number of different snails with either left- or right-handed shells. The snakes were scored based on the frequency with which they successfully extracted and ate the snails. Each snake was tested only on one type of shell. Using the data below, test whether the snakes are better at extracting snails with coils of one direction or the other. You may assume the extraction frequencies follow a normal distribution in each group.

 

                    left-handed shells      right-handed shells
Success Rate (%)    80  68  57  79          82  92  91  75

D) To improve the experiment above, a scientist decides to (1) test more snakes and (2) have each snake attempt to open both left- and right-handed shells. To simplify planning, she tests each snake (3) first on left-handed shells, then on right-handed shells. Describe the effects on sampling error and/or bias for each of the three modifications.
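For part B, one possible approach is a one-sample t-test of the mean asymmetry index against zero (no asymmetry on average). The sketch below computes only the test statistic and degrees of freedom from the summary values given in part B. The null value of zero, the 5% significance level, and the two-tailed critical value of 2.052 for 27 degrees of freedom (read from a standard t table) are assumptions, not part of the original question.

```python
import math

n = 28             # snakes sampled (part B)
mean_index = 17.5  # sample mean asymmetry index (part B)
sd_index = 8.5     # sample standard deviation (part B)
null_mean = 0.0    # assumed null hypothesis: no asymmetry on average

standard_error = sd_index / math.sqrt(n)
t_stat = (mean_index - null_mean) / standard_error
df = n - 1

# Assumed two-tailed 5% critical value for df = 27, from a t table.
T_CRIT_05_DF27 = 2.052

print(f"t = {t_stat:.2f}, df = {df}")
print("reject H0 at the 5% level:", abs(t_stat) > T_CRIT_05_DF27)
```

An exact p-value would need the t distribution's cumulative function, which the Python standard library does not provide; a statistics package or a table would be used for that step.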

Section 3: False and False

All of the following statements are false. Please correct the statement or explain the error.

A) According to the Central Limit Theorem, the larger the sample size, the closer a sample's distribution will be to the normal distribution.

B) An experiment with a larger sample size will always be more accurate, with less bias, than one with a smaller sample.

C) A scientist observed grizzly bears fishing for salmon in a stream. After the bear has left, she collects the fish carcasses and measures the jawbones of the fish to estimate their sizes. In a sample of 10 fish, she finds a mean jawbone length of 6.8 cm, with a standard deviation of 1.2 cm. Assuming jaw lengths in the population are normally distributed, her 95% confidence interval for the mean is 6.05 - 7.54 cm.

D) 6.8 cm is an unbiased estimate of the mean jaw length of the salmon in the stream.

E) In a case-control study of rates of smoking and lung cancer in Beijing, 126 of 226 smokers were found to have lung cancer, as compared to 35 lung cancer cases in a sample of 96 non-smokers. This means the odds ratio for lung cancer associated with smoking (in Beijing) is 1.53.

F) The odds ratio for lung cancer associated with smoking is much lower in China than in the United States. This suggests that lung cancer rates in China are lower than in the United States.
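The arithmetic behind statements C and E can be checked with a short sketch. For C it computes a 95% interval using both a normal and a t critical value; for E it computes both a ratio of proportions and a ratio of odds from the same counts. The critical values (1.96, and 2.262 for 9 degrees of freedom) come from standard tables, and the variable names are illustrative assumptions. Deciding which quantity each statement should have used is left to the reader.

```python
import math

# Statement C: n = 10 fish, mean 6.8 cm, standard deviation 1.2 cm.
n, mean_len, sd_len = 10, 6.8, 1.2
se = sd_len / math.sqrt(n)
Z_975 = 1.96       # normal critical value (table lookup)
T_975_DF9 = 2.262  # t critical value for df = 9 (table lookup)
ci_z = (mean_len - Z_975 * se, mean_len + Z_975 * se)
ci_t = (mean_len - T_975_DF9 * se, mean_len + T_975_DF9 * se)

# Statement E: lung cancer cases among smokers and non-smokers.
smoker_cases, smoker_total = 126, 226
nonsmoker_cases, nonsmoker_total = 35, 96
ratio_of_proportions = (smoker_cases / smoker_total) / (
    nonsmoker_cases / nonsmoker_total
)
ratio_of_odds = (smoker_cases / (smoker_total - smoker_cases)) / (
    nonsmoker_cases / (nonsmoker_total - nonsmoker_cases)
)

print(f"z-based interval: {ci_z[0]:.2f} to {ci_z[1]:.2f} cm")
print(f"t-based interval: {ci_t[0]:.2f} to {ci_t[1]:.2f} cm")
print(f"ratio of proportions: {ratio_of_proportions:.2f}")
print(f"ratio of odds: {ratio_of_odds:.2f}")
```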

Section 4: Oh, reporters!

Article - Putting a Value to 'Real' in Medical Research By NICHOLAS BAKALAR.

The first paragraph and the last one are mostly okay, though I might have some quibbles. The real trouble is in that middle paragraph.

A) Rewrite the first sentence of the second paragraph to make it accurate.

B) The last sentence of the second paragraph implies that a p-value of 0.06 indicates that a study's results "were probably due only to chance." Why is this incorrect?

C) If we actually wanted to quantify the probability that the results of an experiment were due to chance, what probability would we need to know in addition to the p-value?

Section 5: Elephant Evolution

Recently, it was discovered that African elephants, previously classified as one species, are actually two distinct species: the African forest elephant and the African savannah elephant. You want to get a sense of the rate at which differences have accumulated in DNA between the forest and savannah elephants, so you sequence 1000 base pair segments of DNA from each of 100 genetic regions in the two species, and count the number of differences between the species. The results appear below.

Differences    Number of regions    Expected number of regions
0              34                   24.91
1              35                   34.62
2              17                   24.06
3              6                    11.15
4              3                    3.87
5              2                    1.08
8              1                    0.00861
9              1                    0.00133
13             1                    0.000000289

A) What is the mean number of differences per base-pair between the two species?

B) I have pre-calculated the expected values for the number of regions with a given number of differences. What distribution did I use? What is the null hypothesis associated with this set of expected values?

C) Perform the appropriate test of the null hypothesis and report your results.

D) By sequencing the Asian elephant and wooly mammoth, it is sometimes possible to identify whether a mutation that separates the two African species occurred in the ancestors of the forest elephants or savannah elephants. Looking at those mutations, we can then classify them by whether the mutation occurred in an amino acid coding region or between genes (intergenic). If 8 of 56 mutations that occurred in the forest elephant are from coding regions and 9 of 48 mutations in the savannah elephant are from coding regions, do the two species differ in the proportion of their mutations that occur in coding regions? Perform the appropriate test and report your results.
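One way to explore parts A and B is to recompute the summary statistics from the table of observed counts. The sketch below assumes the expected values come from a Poisson distribution whose mean is the observed mean number of differences per region; that modeling choice is an assumption, although it does reproduce the tabulated expected values to the printed precision. The helper name `poisson_expected` is hypothetical.

```python
import math

# Observed table from Section 5: number of differences -> number of regions.
observed = {0: 34, 1: 35, 2: 17, 3: 6, 4: 3, 5: 2, 8: 1, 9: 1, 13: 1}
BASES_PER_REGION = 1000  # each region is a 1000 base pair segment

n_regions = sum(observed.values())
total_differences = sum(k * count for k, count in observed.items())
mean_per_region = total_differences / n_regions
mean_per_base = mean_per_region / BASES_PER_REGION


def poisson_expected(k: int, mean: float, n: int) -> float:
    """Expected number of regions (out of n) with exactly k differences."""
    return n * math.exp(-mean) * mean ** k / math.factorial(k)


expected = {k: poisson_expected(k, mean_per_region, n_regions) for k in observed}

print(f"mean per region = {mean_per_region}, per base pair = {mean_per_base}")
for k, value in expected.items():
    print(k, observed[k], f"{value:.3g}")
```

A goodness-of-fit test for part C would typically pool the sparse upper categories before comparing observed and expected counts, since several expected values are far below 1.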

Attachment:- Assignment Files.rar


Instructions: Circle your final answers, and be sure they are on the front of the page (near the question they are the answer to), so it is clear which part they correspond to. For statistical tests, always be sure to show test statistics, degrees of freedom (when appropriate), and p values. Include units where appropriate. There are 100 points on the exam. Just as before, if you guess within 10 points of your actual score, you will receive a 5 point bonus.


Show your work! I can’t give partial credit if the only thing I see is an incorrect final answer. You may use all available space, including the back of the page, but make sure I am able to follow your work and where it was done, and, to repeat, put your final answer on the front. You will need to use a calculator, but you should still show all the steps of your calculations, in case you mistype something along the way (and so I can tell if your answer differs because of rounding errors). Good luck!


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd