Calculate the sample mean and the sample median


Reference no: EM132665397

Nature of Data / Statistics for Data Science

Fill in your answers to the assignment in this notebook.

Make sure to have each markdown text and R code segment in cells after each part of the question, together with executed results, so that we are able to independently verify the results. Don't leave the code out.

Question 1
Greenhouse.csv contains the photosynthetic performance of ten plants in two environments in a greenhouse (shady and sunny).

a) Plot the data in an appropriate graph or graphs to get a good visualisation of the performance.

b) Calculate the mean performance in the sun and in the shade.

The hypothesis is that the average performance in the sunny environment is different from that in the shade.

c) Write down the null and alternative hypotheses. Then use an appropriate test statistic to determine, at the 0.04 significance level (p < 0.04), whether the data is consistent with the hypothesis. Can we accept the alternative hypothesis?

A new hypothesis is proposed that the performance in the sun is better.

d) Reformulate the null and alternative hypotheses, and verify again as in c).

Somebody looked at the above data analysis and said it was an inefficient way to do it (they said it was "stupid"), as important information was neglected. This person was right.

e) What is this missing information? Redo the analysis, incorporating this information, with an appropriate test statistic, and calculate a p-value based on this statistic.
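As a rough illustration of the logic behind parts c) to e), the sketch below compares two sets of measurements taken on the same plants. It assumes the neglected information is that each of the ten plants was measured in both environments, so the data are paired. The assignment expects R and most likely a t-test; this Python sketch swaps in an exact sign-flip permutation test on the paired differences because it needs only the standard library. The function name `sign_flip_p_value` and the measurements are hypothetical placeholders, not the contents of Greenhouse.csv.

```python
from itertools import product


def sign_flip_p_value(x, y, alternative="two-sided"):
    """Exact sign-flip permutation p-value for paired samples x and y.

    Under the null hypothesis of no difference, each paired difference
    is equally likely to be positive or negative, so all 2**n sign
    patterns are enumerated (feasible for n = 10).
    """
    diffs = [a - b for a, b in zip(x, y)]
    observed = sum(diffs)
    tol = 1e-12  # guards against floating-point ties
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = sum(s * d for s, d in zip(signs, diffs))
        if alternative == "greater":
            # One-sided: x tends to be larger than y (part d).
            extreme += stat >= observed - tol
        else:
            # Two-sided: x differs from y in either direction (part c).
            extreme += abs(stat) >= abs(observed) - tol
        total += 1
    return extreme / total


# Placeholder measurements, NOT the values in Greenhouse.csv.
sunny = [0.71, 0.65, 0.80, 0.74, 0.69, 0.77, 0.72, 0.68, 0.75, 0.70]
shady = [0.66, 0.63, 0.74, 0.75, 0.64, 0.70, 0.69, 0.66, 0.71, 0.67]

print("two-sided p:", sign_flip_p_value(sunny, shady))
print("one-sided p:", sign_flip_p_value(sunny, shady, "greater"))
```

Either p-value would then be compared with the 0.04 threshold from part c). Ignoring the pairing and treating the two environments as independent samples discards the plant-to-plant variation that the differences cancel out.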

Question 2

The variable "height" in countryheight.csv is a sample from the population of country A. We want to estimate the population mean, and try to say something in general about the height distribution.

a) Calculate the sample mean and the sample median of the height variable. What does the relationship between the values of the sample mean and sample median suggest?

b) Calculate a 95% confidence interval on the population mean using bootstrapping.

c) Calculate a 95% confidence interval on the population mean using the normal approximation.

The data scientist Jane believes the population might be consistent with a normal distribution.

d) Create an appropriate plot to test Jane's hypothesis.

e) Does the data agree with her hypothesis? Why/why not?

Jane got more height data, this time a sample from country B. The measurements are in the variable "height2". From previous height studies, it is believed that people in country B are, on average, taller than those in country A.

f) Formulate the null hypothesis and alternative hypothesis for this belief.

g) Use a test statistic to determine if the null hypothesis can be rejected, and calculate the p-value.

h) Can we conclude that the (population) mean height in country B is statistically significantly different from that in country A? Justify your answer.

i) Suggest one improvement to this test to improve the quality of the possible conclusions, explaining why it would help.
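The two confidence intervals in parts b) and c) can be sketched as follows. This is a minimal Python illustration using only the standard library (the assignment itself expects R). It assumes a percentile bootstrap for b) and a z-based interval for c). The helper names `bootstrap_ci_mean` and `normal_ci_mean`, the number of resamples, the seed, and the height values are all hypothetical and are not taken from countryheight.csv.

```python
import random
from statistics import NormalDist, mean, stdev


def bootstrap_ci_mean(data, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = 1 - level
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def normal_ci_mean(data, level=0.95):
    """Normal-approximation confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    m = mean(data)
    se = stdev(data) / len(data) ** 0.5  # standard error of the mean
    return m - z * se, m + z * se


# Placeholder heights in cm, NOT the values in countryheight.csv.
heights = [162.0, 171.5, 168.2, 175.0, 159.8, 181.3,
           166.4, 170.1, 177.9, 164.7, 172.6, 169.0]

print("bootstrap CI:", bootstrap_ci_mean(heights))
print("normal CI:   ", normal_ci_mean(heights))
```

The normal-approximation interval is symmetric about the sample mean by construction, while the bootstrap interval need not be, so a noticeable disagreement between the two can hint at skew in the sample.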

Question 3

Consider the following data set of the number of drivers who died in a collision with a train, and the amount of crude oil exported from Norway to the USA, for the years 1999 to 2009.

(a) Plot the most appropriate graph to determine if the data is correlated.

(b) Run the most appropriate test for linear correlation, and report the calculated 95% confidence interval.

(d) Perform a least-squares fit, plotting the original data points plus the appropriate line on the same graph.

(e) Looking at the line, do you think this fit is a good explanation? Please give reasons for your choice. Then, using a test given already in the course, plot a graph to demonstrate if this is indeed a good fit.

(f) Can we conclude that the number of drivers who died as a result of a train collision affects the amount of oil exported into the USA from Norway? Explain your answer.

Attachment:- Statistics for Data Science.rar


