Calculate the sample standard deviation

Reference no: EM131682224

Question 1 -

The following table lists some variables that might be of interest in your next data analysis. For each variable, complete the associated table indicating whether it is categorical (and if so, is it nominal or ordinal) or numerical (and if so, is it discrete or continuous).

| Item | Variable | Categorical: nominal | Categorical: ordinal | Numerical: discrete | Numerical: continuous |
|---|---|---|---|---|---|
| Example | Eye Color | X | | | |
| 1a | Sex | | | | |
| 1b | Number of runs scored in a baseball game | | | | |
| 1c | Profession | | | | |
| 1d | Temperature, measured in Fahrenheit | | | | |
| 1e | Confidence in one's ability to do statistics, as measured by "yes/no" to the statement: "I will do well" | | | | |
| 1f | Number of siblings | | | | |
| 1g | Distance an individual can run in five minutes | | | | |
| 1h | Ethnicity | | | | |
| 1i | Number of MDs who also have a PhD | | | | |
| 1j | Lack of coordination, as measured by the time it takes an individual to complete a certain puzzle | | | | |





Question 2 -

Here is a hypothetical situation. In 2015, a program aimed at reducing infant mortality was implemented in two regions, Pepi and Quepi. The following (hypothetical) table shows the numbers of births and infant deaths in each region in each of two years: 2014 and 2016.

| Year | Pepi: Births | Pepi: Infant Deaths | Quepi: Births | Quepi: Infant Deaths |
|---|---|---|---|---|
| 2014 | 100,000 | 300 | 1,000,000 | 5,000 |
| 2016 | 100,000 | 60 | 1,000,000 | 4,000 |

2a. In which region is there more convincing evidence that the reduction in mortality was caused by the program?

2b. If the program can be continued in one region ONLY, which would you choose? In developing your answer, you may assume that the reductions shown were in fact caused by the program.
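One way to frame 2a and 2b is to put relative and absolute reductions side by side. The following is a minimal sketch that uses only the counts from the table; the function name `summarize` and the per-1,000-births scaling are illustrative assumptions, not part of the assignment, and the numbers alone do not settle either question.

```python
def summarize(births_before, deaths_before, births_after, deaths_after):
    """Return infant mortality per 1,000 births in each year and the change."""
    rate_before = 1000 * deaths_before / births_before
    rate_after = 1000 * deaths_after / births_after
    return {
        "rate_before": rate_before,
        "rate_after": rate_after,
        # Fraction by which the rate fell between the two years.
        "relative_reduction": (rate_before - rate_after) / rate_before,
        # A plain difference in counts; comparable here only because the
        # number of births is the same in both years within each region.
        "fewer_deaths": deaths_before - deaths_after,
    }


pepi = summarize(100_000, 300, 100_000, 60)
quepi = summarize(1_000_000, 5000, 1_000_000, 4000)

for name, result in (("Pepi", pepi), ("Quepi", quepi)):
    print(name, result)
```

The relative reduction speaks to how striking the change is within a region, while the difference in counts speaks to how many deaths are at stake if the program continues there.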

Question 3 -

The following are some data on some famous statisticians. Yes! Florence Nightingale, among her other talents, was a statistician!

| Statistician | Gender | Year of Birth | Year of Death |
|---|---|---|---|
| Sir Francis Galton | 2 | 1822 | 1911 |
| Karl Pearson | 2 | 1857 | 1936 |
| William Sealy Gosset | 2 | 1876 | 1937 |
| Ronald Aylmer Fisher | 2 | 1890 | 1962 |
| Harald Cramer | 2 | 1893 | 1985 |
| Prasanta Mahalanobis | 2 | 1893 | 1972 |
| Jerzy Neyman | 2 | 1894 | 1981 |
| Egon S. Pearson | 2 | 1895 | 1980 |
| Gertrude Cox | 1 | 1900 | 1978 |
| Samuel S Wilks | 2 | 1906 | 1964 |
| Florence Nightingale | 1 | 1909 | 1995 |
| David John Tukey | 2 | 1915 | 2000 |

3a. By any means you like (by hand is just fine), create a stem-and-leaf summary of the data on the variable YEAR OF BIRTH. Display it here. Then use this visual summary to answer questions #3b - #3e below.

3b. Are there any outliers (i.e., extreme values) in this distribution? Explain.

3c. How would you describe the shape of this distribution? Explain.

3d. What is/are the most frequently occurring score(s) in this distribution? How many times does it/do they occur?

3e. Can we use this stem-and-leaf to obtain the original set of values for this variable? Explain.
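For 3a, a stem-and-leaf display can also be produced programmatically. The sketch below assumes one reasonable layout (stem = the first three digits of the year, leaf = the last digit); other stem choices are equally valid, and the helper name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Year of Birth column from the table above.
birth_years = [1822, 1857, 1876, 1890, 1893, 1893,
               1894, 1895, 1900, 1906, 1909, 1915]


def stem_and_leaf(values):
    """Return display lines with stem = value // 10 and leaf = last digit."""
    leaves = defaultdict(list)
    for value in sorted(values):
        leaves[value // 10].append(value % 10)
    lines = []
    # Empty stems are kept so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {digits}")
    return lines


for line in stem_and_leaf(birth_years):
    print(line)
```

Because each leaf is a single digit of the original year, every stem/leaf pair can be read back as a full value, which is relevant to 3e.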

Question 4 -

4a. When a distribution is skewed to the right

i) TRUE or FALSE: The median is greater than the mean.

ii) TRUE or FALSE: The distribution is uni-modal

iii) TRUE or FALSE: The majority of observations are less than the mean.

4b. The shape of a frequency distribution can be described using:

i) TRUE or FALSE: A box and whisker plot.

ii) TRUE or FALSE: A table of frequencies

iii) TRUE or FALSE: A histogram

4c. For the sample 3, 1, 7, 2 and 2:

i) TRUE or FALSE: The sample mean is 3

ii) TRUE or FALSE: The sample median is 7

iii) TRUE or FALSE: The range is 1

iv) TRUE or FALSE: The sample variance is 5.5
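The four statements in 4c can be checked directly. This is a small sketch using Python's standard `statistics` module, where `variance` is the sample variance (divisor n - 1).

```python
from statistics import mean, median, variance

sample = [3, 1, 7, 2, 2]

sample_mean = mean(sample)
sample_median = median(sample)
sample_range = max(sample) - min(sample)
sample_variance = variance(sample)  # sample variance: divides by n - 1

print("mean:", sample_mean)
print("median:", sample_median)
print("range:", sample_range)
print("variance:", sample_variance)
```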

Question 5 -

The following table shows the numbers of geriatric admissions, each week from May through September, to a certain facility in each of two years, 2012 and 2013.

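| Week | # Admissions 2012 | # Admissions 2013 | Week | # Admissions 2012 | # Admissions 2013 |
|---|---|---|---|---|---|
| 1 | 24 | 20 | 12 | 11 | 25 |
| 2 | 22 | 17 | 13 | 6 | 22 |
| 3 | 21 | 21 | 14 | 10 | 26 |
| 4 | 22 | 17 | 15 | 13 | 12 |
| 5 | 24 | 22 | 16 | 19 | 33 |
| 6 | 15 | 23 | 17 | 13 | 19 |
| 7 | 23 | 20 | 18 | 17 | 21 |
| 8 | 21 | 16 | 19 | 10 | 28 |
| 9 | 18 | 24 | 20 | 16 | 19 |
| 10 | 21 | 21 | 21 | 24 | 13 |
| 11 | 17 | 20 | 22 | 15 | 29 |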

5a. By any means you like (by hand is just fine), summarize these data graphically. Display it here. Then use this visual summary to answer question #5b.

5b. Why do you think these two years were different? Note - There is no single correct answer here. I will accept any well-reasoned interpretation. I'm looking for you to think about what you see!
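For 5a, one possible graphical summary is a week-by-week plot of both years. The sketch below draws a plain-text version so that it runs without any plotting library; the function name `text_plot` and the bar symbols are illustrative assumptions, and a line chart or side-by-side box plot would serve equally well.

```python
# Weekly admissions, weeks 1 to 22, read from the table above.
admissions_2012 = [24, 22, 21, 22, 24, 15, 23, 21, 18, 21, 17,
                   11, 6, 10, 13, 19, 13, 17, 10, 16, 24, 15]
admissions_2013 = [20, 17, 21, 17, 22, 23, 20, 16, 24, 21, 20,
                   25, 22, 26, 12, 33, 19, 21, 28, 19, 13, 29]


def text_plot(first, second, width=35):
    """Return one line per week: first-year bar ('*'), then second-year bar ('+')."""
    lines = []
    for week, (a, b) in enumerate(zip(first, second), start=1):
        left_bar = ("*" * a).ljust(width)  # pad so the second bar lines up
        right_bar = "+" * b
        lines.append(f"{week:>2} {left_bar}| {right_bar}")
    return lines


for line in text_plot(admissions_2012, admissions_2013):
    print(line)
```

Reading down the two columns of bars makes it easier to see whether the years diverge in the earlier or the later weeks, which is the kind of observation 5b asks about.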

Question 6 -

6a. You read that the median income of U.S. households in 2010 was $49,455. In 1-2 sentences at most, explain in plain language what "the median income" is.

6b. The Census Bureau website gives several choices for "average income" in its historical income data. In 2010, the median income of American households was $49,455. The mean household income was $67,530. The median income of families was $60,395, and the mean family income was $78,361. The Census Bureau says, "Households consist of all people who occupy a housing unit. The term 'family' refers to a group of two or more people related by birth, marriage, or adoption who reside together". In at most 5 sentences, explain carefully why mean incomes are higher than median incomes and why family incomes are higher than household incomes.

6c. A January 2012 magazine article reported that the average income for readers of the business magazine Forbes was $217,000. In your opinion, is the median wealth of these readers greater or less than $217,000? In at most 1-2 sentences, explain your reasoning.

6d. The distribution of individual incomes in the United States is strongly skewed to the right. In 2008, the mean and median incomes of the top 1% of Americans were $558,726 and $1,137,680. Which of these numbers is the mean and which is the median? In at most 1-2 sentences, explain your reasoning.

6e. By any means you like (by hand is fine) which of the following two data sets is more spread out? Show your work. In at most 1-2 sentences, explain your reasoning.

Data set "A": 4  0  1  4  3  6

Data set "B": 5  3  1  3  4  2
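For 6e, "spread" can be quantified in more than one way. The sketch below reports two common choices, the range and the sample standard deviation; picking these two measures is an assumption, and the helper name `spread` is hypothetical.

```python
from statistics import stdev

data_a = [4, 0, 1, 4, 3, 6]
data_b = [5, 3, 1, 3, 4, 2]


def spread(values):
    """Return (range, sample standard deviation) for a list of numbers."""
    return max(values) - min(values), stdev(values)


for name, data in (("A", data_a), ("B", data_b)):
    data_range, data_sd = spread(data)
    print(f"Data set {name}: range = {data_range}, sample SD = {data_sd:.3f}")
```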

Question 7 -

A box plot is the graph of a five-number summary. The central box spans the quartiles. The line in the box marks the median. The size of the box is a measure of spread. The lines extending out from the box give an indication of extremes, if any. Side-by-side box plots are useful for comparing two distributions. As an example, consider the following table. It lists the average monthly temperature (Fahrenheit) of Springfield, Massachusetts and San Francisco, California.

| Month | Ave Temp (F) Springfield | Ave Temp (F) San Francisco |
|---|---|---|
| January | 32 | 49 |
| February | 36 | 52 |
| March | 45 | 53 |
| April | 56 | 55 |
| May | 65 | 58 |
| June | 73 | 61 |
| July | 78 | 62 |
| August | 77 | 63 |
| September | 70 | 64 |
| October | 58 | 61 |
| November | 45 | 55 |
| December | 36 | 49 |

7a. Obtain the five number summary for the average monthly temperatures, separately for each data set, Springfield versus San Francisco. Use these values to complete the following table.


| | Springfield | San Francisco |
|---|---|---|
| Minimum | | |
| Q1 | | |
| Q2 = median | | |
| Q3 | | |
| Maximum | | |



7b. By any means you like (by hand is fine), produce a side-by-side box and whisker plot of the two distributions of average monthly temperatures. You will use this visual to answer question #7c.

7c. i) Are the 2 cities similar in their typical (median) average temp?

ii) Are the 2 cities similar in terms of temperature spread? Explain

iii) Which city requires owning a larger wardrobe of clothes?
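For 7a, the five-number summary depends on which quartile convention is used. The sketch below assumes the "median of each half" rule (Q1 and Q3 are the medians of the lower and upper halves of the sorted data); software packages often use interpolation rules that give slightly different quartiles, and the function name `five_number_summary` is hypothetical.

```python
from statistics import median

# Average monthly temperatures (F), January through December.
springfield = [32, 36, 45, 56, 65, 73, 78, 77, 70, 58, 45, 36]
san_francisco = [49, 52, 53, 55, 58, 61, 62, 63, 64, 61, 55, 49]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd number of values, the middle value belongs to neither half.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


for city, temps in (("Springfield", springfield),
                    ("San Francisco", san_francisco)):
    print(city, five_number_summary(temps))
```

The interquartile range (Q3 minus Q1) from each summary gives the box length in 7b and is one way to compare spread in 7c.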

Question 8 -

This last exercise gives you practice working with the fundamentals of calculations of the sample mean, the sample variance and the sample standard deviation. It also gives you practice producing and interpreting a histogram.

On the next page is a table of data on X = blood glucose levels (mmol/L) obtained from a simple random sample of n=40 first year medical students. The students are indexed using a subscript "i" that ranges from i = 1 to i = 40.

8a. First calculate the sample mean. To do this, obtain the sum of the individual blood glucose values and divide this by the sample size.

i) $\sum_{i=1}^{40} x_i =$

ii) $n =$

iii) Sample mean $= \bar{x} = \sum_{i=1}^{40} x_i / n =$ (fill in) / (fill in) $=$

8b. Next, calculate the individual squared values of individual blood glucose levels. In developing your answer complete the entries to the 3rd column of the table. All done? Now obtain the sum of the squared values of the individual blood glucose levels. Enter this total at the bottom.

8c. Next, calculate the individual squared values of the deviations of the individual blood glucose levels about the sample mean. In developing your answer complete the entries to the 4th and 5th columns of the table. All done? Now obtain the sum of the individual squared values of the deviations of the individual blood glucose values about the sample mean. Enter this total at the bottom of the 5th column.

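| $i$ | $x_i$ | $x_i^2$ | $(x_i - \bar{x})$ | $(x_i - \bar{x})^2$ |
|---|---|---|---|---|
| 1 | 4.7 | | | |
| 2 | 4.2 | | | |
| 3 | 3.9 | | | |
| 4 | 3.4 | | | |
| 5 | 3.6 | | | |
| 6 | 4.1 | | | |
| 7 | 4.8 | | | |
| 8 | 4.0 | | | |
| 9 | 3.8 | | | |
| 10 | 4.4 | | | |
| 11 | 3.3 | | | |
| 12 | 3.8 | | | |
| 13 | 2.2 | | | |
| 14 | 5.0 | | | |
| 15 | 3.3 | | | |
| 16 | 4.1 | | | |
| 17 | 4.7 | | | |
| 18 | 3.7 | | | |
| 19 | 3.6 | | | |
| 20 | 3.8 | | | |
| 21 | 4.1 | | | |
| 22 | 3.6 | | | |
| 23 | 4.6 | | | |
| 24 | 4.4 | | | |
| 25 | 3.6 | | | |
| 26 | 2.9 | | | |
| 27 | 3.4 | | | |
| 28 | 4.9 | | | |
| 29 | 4.0 | | | |
| 30 | 3.7 | | | |
| 31 | 4.5 | | | |
| 32 | 4.9 | | | |
| 33 | 4.4 | | | |
| 34 | 4.7 | | | |
| 35 | 3.3 | | | |
| 36 | 4.3 | | | |
| 37 | 5.1 | | | |
| 38 | 3.4 | | | |
| 39 | 4.0 | | | |
| 40 | 6.0 | | | |
| Total of column | | | | |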





8d. Calculate the sample variance using the appropriate column totals in TWO ways. Show your work. Tip - You should get the same answer. This illustrates a shortcut for hand calculation and should clear up any confusion from having seen more than one formula for the sample variance.

i) $s^2 = \dfrac{\sum_{i=1}^{40} (x_i - \bar{x})^2}{n - 1}$

ii) $s^2 = \dfrac{\left[\sum_{i=1}^{40} x_i^2\right] - n\,\bar{x}^2}{n - 1}$

8e. Finally, calculate the sample standard deviation.
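The hand calculations in 8a through 8e can be cross-checked with a short script. This is a minimal sketch that mirrors the column totals of the table (sum of $x_i$, sum of $x_i^2$, sum of squared deviations); the variable names are illustrative.

```python
import math

# Blood glucose levels (mmol/L) for students i = 1..40, from the table.
glucose = [4.7, 4.2, 3.9, 3.4, 3.6, 4.1, 4.8, 4.0, 3.8, 4.4,
           3.3, 3.8, 2.2, 5.0, 3.3, 4.1, 4.7, 3.7, 3.6, 3.8,
           4.1, 3.6, 4.6, 4.4, 3.6, 2.9, 3.4, 4.9, 4.0, 3.7,
           4.5, 4.9, 4.4, 4.7, 3.3, 4.3, 5.1, 3.4, 4.0, 6.0]

n = len(glucose)
total = sum(glucose)                      # 8a: sum of x_i
x_bar = total / n                         # 8a: sample mean
sum_sq = sum(x * x for x in glucose)      # 8b: sum of x_i squared
sum_sq_dev = sum((x - x_bar) ** 2 for x in glucose)  # 8c: sum of squared deviations

# 8d: the definitional formula and the shortcut formula.
var_definition = sum_sq_dev / (n - 1)
var_shortcut = (sum_sq - n * x_bar ** 2) / (n - 1)

# 8e: sample standard deviation.
sd = math.sqrt(var_definition)

print(f"n = {n}, sum = {total:.1f}, mean = {x_bar:.4f}")
print(f"variance (definition) = {var_definition:.4f}")
print(f"variance (shortcut)   = {var_shortcut:.4f}")
print(f"standard deviation    = {sd:.4f}")
```

The two variance values agree up to floating-point rounding. When working by hand, the shortcut is sensitive to rounding of the mean, so it helps to carry extra decimal places in $\bar{x}$.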

8f. By any means you like (by hand is fine), produce a histogram of these data.

8g. Calculate the mean ±1 standard deviation and the mean ±2 standard deviations. Indicate these points on your histogram.

8h. What term best describes the shape of the distribution of blood glucose in this sample: symmetrical, skewed to the right, or skewed to the left?
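For 8f and 8g, the sketch below prints a plain-text histogram and the mean ±1 and ±2 standard deviation limits. The bin width of 0.5 mmol/L is an arbitrary illustrative assumption; a different width may show the shape more or less clearly.

```python
from collections import Counter
from statistics import mean, stdev

glucose = [4.7, 4.2, 3.9, 3.4, 3.6, 4.1, 4.8, 4.0, 3.8, 4.4,
           3.3, 3.8, 2.2, 5.0, 3.3, 4.1, 4.7, 3.7, 3.6, 3.8,
           4.1, 3.6, 4.6, 4.4, 3.6, 2.9, 3.4, 4.9, 4.0, 3.7,
           4.5, 4.9, 4.4, 4.7, 3.3, 4.3, 5.1, 3.4, 4.0, 6.0]

# Work in tenths (integers) so that bin edges are not affected by
# floating-point rounding; bin k covers [0.5 * k, 0.5 * (k + 1)).
counts = Counter(round(x * 10) // 5 for x in glucose)

for k in range(min(counts), max(counts) + 1):
    low, high = 0.5 * k, 0.5 * (k + 1)
    print(f"[{low:.1f}, {high:.1f}) {'#' * counts.get(k, 0)}")

x_bar = mean(glucose)
s = stdev(glucose)  # sample standard deviation (divisor n - 1)
limits = {k: (x_bar - k * s, x_bar + k * s) for k in (1, 2)}
for k, (low, high) in limits.items():
    print(f"mean ± {k} SD: {low:.2f} to {high:.2f}")
```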


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd