Calculate the variance of a series of values

Reference no: EM133091581

Describing and Visualising Statistical Data

Exercises

Question 1: In a new, empty .py file, write a small program that calculates and prints out the mean, median, and mode of the following set of values:

1978, 1936, 1941, 1999, 2000, 2001, 2020, 2049, 2000, 1801, 1664

When calculating the mode, it's easiest to use 'from collections import Counter' to get access to a Counter object, which will do the instance counting for you (see slides 10-12).
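As a starting point for Question 1, here is a minimal sketch that computes the three averages by hand. The function names calculate_median and calculate_mode are our own illustrative choices, not names prescribed by the slides. This version reports a single mode: Counter.most_common(1) returns the most frequent value, and ties are resolved in favour of the value encountered first.

```python
from collections import Counter

def calculate_mean(numbers):
    # Arithmetic mean: sum of the values divided by how many there are
    return sum(numbers) / len(numbers)

def calculate_median(numbers):
    # Sort a copy so the caller's list is left untouched
    ordered = sorted(numbers)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the median is the single middle value
        return ordered[middle]
    # Even count: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

def calculate_mode(numbers):
    # most_common(1) returns a one-item list: [(value, count)]
    value, _count = Counter(numbers).most_common(1)[0]
    return value

values = [1978, 1936, 1941, 1999, 2000, 2001, 2020, 2049, 2000, 1801, 1664]
print("Mean:", calculate_mean(values))
print("Median:", calculate_median(values))
print("Mode:", calculate_mode(values))
```

For these eleven values the median is 1999 and the mode is 2000.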

Question 2: Add the value 2001 to the above list of values. Now both 2000 and 2001 occur twice, so the data has multiple modes. Modify your program to cater for this: when asked for the mode of this list, it should return a list containing both 2000 and 2001. See slide 13 if you need help.
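One possible way to handle multiple modes is sketched below, assuming the mode should always be returned as a list (even when there is only one). The helper name find_modes is illustrative; the slides may structure this differently. It finds the highest count and keeps every value that reaches it.

```python
from collections import Counter

def find_modes(numbers):
    # Count how many times each value occurs
    counts = Counter(numbers)
    # The highest frequency anywhere in the data
    max_count = max(counts.values())
    # Keep every value that reaches that frequency, sorted for predictable output
    return sorted(value for value, count in counts.items() if count == max_count)

# The Question 1 values with an extra 2001 appended
values = [1978, 1936, 1941, 1999, 2000, 2001, 2020, 2049, 2000, 1801, 1664, 2001]
print("Mode(s):", find_modes(values))
```

Python 3.8 and later also provide statistics.multimode, which returns the modes in the order they are first encountered; the hand-written version keeps the focus on Counter.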

Question 3: Modify your code to print out the highest and lowest values in the list, and from these values calculate and print the range of the values (i.e. the difference between the highest and lowest values).
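A short sketch for Question 3, using the built-in max and min functions on the list from Question 2 (including the extra 2001). The helper name describe_range is illustrative.

```python
def describe_range(numbers):
    # max() and min() each scan the list once
    highest = max(numbers)
    lowest = min(numbers)
    # The range is the spread between the two extremes
    return highest, lowest, highest - lowest

values = [1978, 1936, 1941, 1999, 2000, 2001, 2020, 2049, 2000, 1801, 1664, 2001]
highest, lowest, value_range = describe_range(values)
print("Highest:", highest)
print("Lowest:", lowest)
print("Range:", value_range)
```

For this list the highest value is 2049, the lowest is 1664, and the range is 385.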

Question 4: The following functions can be used to calculate the variance of a series of values. From the variance, you can calculate the standard deviation as its square root (in Python you can do this by raising a value to the power of 0.5, for example: value = 9, sqr_root = value ** 0.5).
def calculate_mean(numbers):
    s = sum(numbers)
    N = len(numbers)
    mean = s / N
    return mean

def find_differences(numbers):
    # Difference of each value from the mean
    mean = calculate_mean(numbers)

    differences = []
    for num in numbers:
        differences.append(num - mean)

    return differences

def calculate_variance(numbers):
    # Population variance: the average squared difference from the mean (divides by N)
    differences = find_differences(numbers)

    squared_diff = []
    for d in differences:
        squared_diff.append(d ** 2)

    sum_squared_diff = sum(squared_diff)
    variance = sum_squared_diff / len(numbers)

    return variance
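To see the square-root step on numbers that are easy to check by hand, here is a compact sketch equivalent to the three functions above, plus an illustrative calculate_std_dev helper that is not part of the original exercise. The list [2, 4, 4, 4, 5, 5, 7, 9] has a mean of 5 and squared differences summing to 32, so its variance is 32 / 8 = 4 and its standard deviation is 2.

```python
def calculate_variance(numbers):
    # Population variance: mean of the squared differences from the mean
    mean = sum(numbers) / len(numbers)
    return sum((num - mean) ** 2 for num in numbers) / len(numbers)

def calculate_std_dev(numbers):
    # Standard deviation is the square root of the variance
    return calculate_variance(numbers) ** 0.5

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print("Variance:", calculate_variance(sample))
print("Standard deviation:", calculate_std_dev(sample))
```

Like the original, this divides by N (the population variance). Python's statistics.pvariance and statistics.pstdev use the same convention, whereas statistics.variance and statistics.stdev divide by N - 1.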
There is a file on your Moodle shell under this week's materials called: pokemon_num_name_height_metres_weight_kgs.csv
This file contains all the numbers, names, heights and weights of over 800 Pokemon (which I found here: https://pokemondb.net/pokedex/stats/height-weight). Take a look at this file in a text editor or Excel to see the kind of data we're working with.
To open the file and split up each line into a list of four strings, we can use code like this:
import csv

with open('pokemon_num_name_height_metres_weight_kgs.csv', newline='') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        # 0 is number, 1 is name, 2 is height (m), 3 is weight (kg)
        print(row[0], row[1], row[2], row[3])

Now, using the above calculate_variance function, load the file, calculate the variance and standard deviation of the Pokemon heights and weights, and print them to the screen.
REMEMBER: Each row value (row[0], row[1] etc.) will be a string, so if we want to do any math with the numerical fields (which we do), we'll need to convert them to floats!
If you've done this right, you should see output like this:
Height variance: 1.2243628057924427
Height standard deviation: 1.1065092886155283
-----
Weight variance: 15580.033463417874
Weight standard deviation: 124.82000425980554
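A minimal sketch of the whole Question 4 task is below. It assumes each row of the CSV is data in the order number, name, height (m), weight (kg), with no header row; if your copy has a header, skip the first row. The filename is the one given in the exercise text, and the helper names load_heights_and_weights and report are illustrative. The script only reads the file if it exists in the current folder.

```python
import csv
import os

# Filename as given in the exercise text; adjust it to match your download
DATA_FILE = 'pokemon_num_name_height_metres_weight_kgs.csv'

def calculate_variance(numbers):
    # Population variance (divides by N), same result as the calculate_variance function above
    mean = sum(numbers) / len(numbers)
    return sum((num - mean) ** 2 for num in numbers) / len(numbers)

def load_heights_and_weights(filename):
    # Assumption: every row is data (no header row): number, name, height (m), weight (kg)
    heights = []
    weights = []
    with open(filename, newline='') as csvfile:
        for row in csv.reader(csvfile, delimiter=','):
            # CSV fields are strings, so convert the numeric ones to float before doing math
            heights.append(float(row[2]))
            weights.append(float(row[3]))
    return heights, weights

def report(label, values):
    # Print the variance and its square root (the standard deviation)
    variance = calculate_variance(values)
    print(label, "variance:", variance)
    print(label, "standard deviation:", variance ** 0.5)

if os.path.exists(DATA_FILE):
    heights, weights = load_heights_and_weights(DATA_FILE)
    report("Height", heights)
    print("-----")
    report("Weight", weights)
else:
    print("Could not find", DATA_FILE)
```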

Question 5: As we have the heights and weights for our Pokemon - let's create a quick scatterplot of the data:

import matplotlib.pyplot as plt

# weights and heights must already be lists of floats loaded from the CSV file (see Question 4)

# 'x' draws each Pokemon as a cross marker with no connecting lines
plt.plot(weights, heights, 'x')
plt.title('Pokemon Height Vs. Weight')
plt.xlabel('Weight in Kilograms')
plt.ylabel('Height in Metres')
plt.show()
If everything's going as planned, you should see a scatterplot with weight on the x-axis and height on the y-axis.

Question 6: Our final task for the day will be to determine whether there is a correlation between the height and the weight of a Pokemon; that is, are taller Pokemon usually heavier? From looking at the plot, what do you think? Is there any correlation? Or maybe a weak positive, or weak negative correlation?
Here's some code we can use to calculate the (Pearson) correlation coefficient of two sets of values:
def find_correlation(x, y):
    # Find the length of the lists
    n = len(x)

    # Find the sum of the products
    products = []
    for xi, yi in zip(x, y):
        products.append(xi * yi)
    sum_products = sum(products)

    # Find the sum of each list
    sum_x = sum(x)
    sum_y = sum(y)

    # Find the squared sum of each list
    squared_sum_x = sum_x ** 2
    squared_sum_y = sum_y ** 2

    # Find the sum of the squared lists
    x_square = []
    for xi in x:
        x_square.append(xi ** 2)
    x_square_sum = sum(x_square)

    y_square = []
    for yi in y:
        y_square.append(yi ** 2)
    y_square_sum = sum(y_square)

    # Use formula to calculate correlation
    numerator = n * sum_products - sum_x * sum_y
    denominator1 = n * x_square_sum - squared_sum_x
    denominator2 = n * y_square_sum - squared_sum_y
    denominator = (denominator1 * denominator2) ** 0.5
    correlation = numerator / denominator
    return correlation
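If you want to cross-check find_correlation, one option is the algebraically equivalent mean-centred form of Pearson's r, sketched below. The function name find_correlation_centered and the small test lists are illustrative. On Python 3.10 and later, statistics.correlation computes the same Pearson coefficient and can serve as a second check.

```python
import statistics

def find_correlation_centered(x, y):
    # Pearson's r from differences to the mean; equal to find_correlation above
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Sum of products of paired differences (n times the covariance)
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    # Square roots of the summed squared differences for each list
    spread_x = sum(a * a for a in dx) ** 0.5
    spread_y = sum(b * b for b in dy) ** 0.5
    return covariance_sum / (spread_x * spread_y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(find_correlation_centered(x, y))
# statistics.correlation only exists on Python 3.10+
if hasattr(statistics, "correlation"):
    print(statistics.correlation(x, y))
```

Working with mean-centred differences avoids subtracting two very large sums (n * sum_products and sum_x * sum_y), which can lose floating-point precision when the values are large. For data on the scale of the Pokemon heights and weights, the two versions are expected to agree closely.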
Use the above function to calculate and print the correlation between the height and weight of Pokemon - if you've done it correctly, you should get output similar to the following:
Correlation between height and weight is: 0.6424145098518806
A coefficient of about 0.64 means that there IS a positive correlation: the height of a Pokemon is an indicator that can help us estimate its weight. However, the correlation is far from perfect, so any estimate we come up with may have a large margin of error compared to the actual weight of the Pokemon!
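One common way to put a number on that margin of error is to square r, giving the coefficient of determination. For a straight-line fit, r squared is the share of the variance in one variable accounted for by the other. A minimal sketch using the correlation value printed above (the helper name is illustrative):

```python
def coefficient_of_determination(r):
    # r squared: share of the variance explained by a linear relationship
    return r ** 2

r = 0.6424145098518806  # height/weight correlation reported above
print(f"r^2 = {coefficient_of_determination(r):.2f}")
```

Here r squared is about 0.41, so roughly 41% of the variation in weight is accounted for by a linear relationship with height, and the remaining variation is not.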

