Calculate the variance of a series of values

Assignment Help Python Programming
Reference no: EM133091581

Describing and Visualising Statistical Data

Exercises

Question 1: In a new, empty .py file, write a small program that calculates and prints out the mean, median, and mode of the following set of values:

1978, 1936, 1941, 1999, 2000, 2001, 2020, 2049, 2000, 1801, 1664

When calculating the mode, it's easiest to use 'from collections import Counter' to get access to a Counter object, which will do the instance counting for you (see slides 10-12).
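One possible way to approach Question 1 is sketched below. The function names are illustrative choices rather than anything prescribed by the slides, and this version of the mode returns only the single most common value.

```python
from collections import Counter

values = [1978, 1936, 1941, 1999, 2000, 2001, 2020, 2049, 2000, 1801, 1664]


def calculate_mean(numbers):
    # The mean is the sum of the values divided by how many there are
    return sum(numbers) / len(numbers)


def calculate_median(numbers):
    # The median is the middle value of the sorted list, or the average
    # of the two middle values when the list has an even length
    ordered = sorted(numbers)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def calculate_mode(numbers):
    # Counter tallies how often each value occurs;
    # most_common(1) returns a list holding one (value, count) pair
    counts = Counter(numbers)
    return counts.most_common(1)[0][0]


print('Mean:', calculate_mean(values))
print('Median:', calculate_median(values))
print('Mode:', calculate_mode(values))
```

For this list the median is 1999 and the mode is 2000, since 2000 is the only value that appears twice.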

Question 2: Add the value 2001 to the above list of values. Now, both 2000 and 2001 occur twice - so the data has multiple modes. Modify your program to cater for this, i.e. if you ask it to calculate the mode of the list of values it will return a list containing both 2000 and 2001. See slide 13 if you need help.
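A minimal sketch of one way to handle multiple modes follows. The name calculate_modes is hypothetical, and sorting the result is a presentation choice, not a requirement of the question.

```python
from collections import Counter


def calculate_modes(numbers):
    # Count the occurrences of every value
    counts = Counter(numbers)
    # Find the highest occurrence count in the data
    highest_count = max(counts.values())
    # Keep every value that occurs that many times
    return sorted(value for value, count in counts.items()
                  if count == highest_count)


values = [1978, 1936, 1941, 1999, 2000, 2001, 2020, 2049, 2000, 1801, 1664, 2001]
print(calculate_modes(values))  # [2000, 2001]
```

Because the function always returns a list, the single-mode case from Question 1 is just a list with one element.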

Question 3: Modify your code to print out the highest and lowest values in the list, and from these values calculate and print the range of the values (i.e. the difference between the highest and lowest values).
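As a small sketch, the built-in max and min functions are enough for this step. The variable names are illustrative, and the list below assumes the extra 2001 from Question 2 has already been added.

```python
values = [1978, 1936, 1941, 1999, 2000, 2001, 2020, 2049, 2000, 1801, 1664, 2001]

highest = max(values)
lowest = min(values)
# The range is the difference between the highest and lowest values
value_range = highest - lowest

print('Highest:', highest)    # 2049
print('Lowest:', lowest)      # 1664
print('Range:', value_range)  # 385
```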

Question 4: The following functions can be used to calculate the variance of a series of values, and from the variance you can calculate the standard deviation as the square root of the variance. In Python you can do this by raising a value to the power of 0.5, for example: value = 9, sqr_root = value ** 0.5.
def calculate_mean(numbers):
    s = sum(numbers)
    N = len(numbers)
    mean = s / N
    return mean


def find_differences(numbers):
    mean = calculate_mean(numbers)

    # Difference of each value from the mean
    differences = []
    for num in numbers:
        differences.append(num - mean)

    return differences


def calculate_variance(numbers):
    differences = find_differences(numbers)

    # Square each difference so negatives do not cancel positives
    squared_diff = []
    for d in differences:
        squared_diff.append(d ** 2)

    sum_squared_diff = sum(squared_diff)
    variance = sum_squared_diff / len(numbers)

    return variance
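As a quick sanity check of the approach above, here is a condensed sketch run on a small made-up list. Note that dividing by len(numbers) gives the population variance (not the sample variance, which divides by N - 1), so the result can be compared against statistics.pvariance from the standard library. The sample list is an assumption chosen so the arithmetic is easy to follow by hand.

```python
import statistics


def calculate_variance(numbers):
    # Condensed version of the same calculation: mean, squared
    # differences from the mean, then the average of those squares
    mean = sum(numbers) / len(numbers)
    squared_diff = [(num - mean) ** 2 for num in numbers]
    return sum(squared_diff) / len(numbers)


# Made-up sample: the mean is 5 and the squared differences sum to 32
sample = [2, 4, 4, 4, 5, 5, 7, 9]

variance = calculate_variance(sample)
std_dev = variance ** 0.5

print('Variance:', variance)           # 4.0
print('Standard deviation:', std_dev)  # 2.0
print('Matches statistics.pvariance:', statistics.pvariance(sample) == variance)
```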
There is a file on your Moodle shell under this week's materials called: pokemon_num_name_height_metres_weight_kgs.csv
This file contains the numbers, names, heights and weights of over 800 Pokemon (which I found here: https://pokemondb.net/pokedex/stats/height-weight). Take a look at this file in a text editor or Excel to see the kind of data we're working with.
To open the file and split up each line into a list of four strings we can use code like this:
import csv

with open('pokemon_num_name_height_metres_weight_kg.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        # 0 is number, 1 is name, 2 is height (m), 3 is weight (kg)
        print(row[0], row[1], row[2], row[3])

Now, using the above calculate_variance function, load the file and calculate the variance and standard deviation of the Pokemon heights and weights - and print them to the screen.
REMEMBER: Each row value (row[0], row[1] etc.) will be a string - so if we want to do any math with any of the numerical fields (which we do) then we'll need to cast them to be a float!
If you've done this right, you should see output like this:
Height variance: 1.2243628057924427
Height standard deviation: 1.1065092886155283
-----
Weight variance: 15580.033463417874
Weight standard deviation: 124.82000425980554
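The sketch below shows the general shape of the load-and-cast step. To stay self-contained it reads from an in-memory stand-in with three made-up rows instead of the real file, and it assumes the data has no header row. With the real file you would pass the opened file object to csv.reader instead, and your numbers will differ from the ones printed here.

```python
import csv
import io

# Made-up stand-in for the real CSV: number, name, height (m), weight (kg)
sample_csv = io.StringIO(
    "1,Alpha,0.5,4.0\n"
    "2,Beta,1.5,20.0\n"
    "3,Gamma,2.5,60.0\n"
)


def calculate_variance(numbers):
    # Population variance: average squared difference from the mean
    mean = sum(numbers) / len(numbers)
    return sum((num - mean) ** 2 for num in numbers) / len(numbers)


heights = []
weights = []
for row in csv.reader(sample_csv, delimiter=','):
    # Every field arrives as a string, so cast before doing any maths
    heights.append(float(row[2]))
    weights.append(float(row[3]))

height_variance = calculate_variance(heights)
weight_variance = calculate_variance(weights)

print('Height variance:', height_variance)
print('Height standard deviation:', height_variance ** 0.5)
print('-----')
print('Weight variance:', weight_variance)
print('Weight standard deviation:', weight_variance ** 0.5)
```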

Question 5: As we have the heights and weights for our Pokemon - let's create a quick scatterplot of the data:

from pylab import plot, show, title, xlabel, ylabel

# Have access to the Pokemon heights and weights here!

plot(weights, heights, 'x')
title('Pokemon Height Vs. Weight')
xlabel('Weight in Kilograms')
ylabel('Height in Metres')
# show() displays all open figures, so it does not need the plot passed in
show()
If everything's going as planned you should see a plot like this:

Question 6: Our final task for the day will be to determine if there is a statistically significant correlation between the height and the weight of a Pokemon - that is, are bigger Pokemon usually heavier? From looking at the plot, what do you think? Is there any correlation? Or maybe a weak positive, or weak negative correlation?
Here's some code we can use to determine the correlation coefficient of two sets of values:
def find_correlation(x, y):
    # Find the length of the lists
    n = len(x)

    # Find the sum of the products
    products = []
    for xi, yi in zip(x, y):
        products.append(xi * yi)
    sum_products = sum(products)

    # Find the sum of each list
    sum_x = sum(x)
    sum_y = sum(y)

    # Find the squared sum of each list
    squared_sum_x = sum_x ** 2
    squared_sum_y = sum_y ** 2

    # Find the sum of the squared lists
    x_square = []
    for xi in x:
        x_square.append(xi ** 2)
    x_square_sum = sum(x_square)

    y_square = []
    for yi in y:
        y_square.append(yi ** 2)
    y_square_sum = sum(y_square)

    # Use formula to calculate correlation
    numerator = n * sum_products - sum_x * sum_y
    denominator1 = n * x_square_sum - squared_sum_x
    denominator2 = n * y_square_sum - squared_sum_y
    denominator = (denominator1 * denominator2) ** 0.5
    correlation = numerator / denominator
    return correlation
Use the above function to calculate and print the correlation between the height and weight of Pokemon - if you've done it correctly, you should get output similar to the following:
Correlation between height and weight is: 0.6424145098518806
Looking at the below, this means that there IS a positive correlation - that is, the height of a Pokemon is an indicator that can allow us to estimate its weight... but the correlation is weak, so any estimate that we come up with may have a large margin of error to the actual weight of the Pokemon!
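To build some intuition for what the coefficient means, the following sketch applies a condensed version of the same formula to two made-up data sets: one where y rises in exact proportion to x, and one where it falls. The lists are assumptions chosen for illustration. A perfectly linear rising relationship should give a value of about 1, and a perfectly linear falling one about -1.

```python
def find_correlation(x, y):
    # Condensed version of the same correlation formula
    n = len(x)
    sum_products = sum(xi * yi for xi, yi in zip(x, y))
    sum_x = sum(x)
    sum_y = sum(y)
    x_square_sum = sum(xi ** 2 for xi in x)
    y_square_sum = sum(yi ** 2 for yi in y)

    numerator = n * sum_products - sum_x * sum_y
    denominator1 = n * x_square_sum - sum_x ** 2
    denominator2 = n * y_square_sum - sum_y ** 2
    return numerator / (denominator1 * denominator2) ** 0.5


x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # y doubles x exactly
falling = [10, 8, 6, 4, 2]   # y drops by 2 for each step in x

print('Rising:', find_correlation(x, rising))
print('Falling:', find_correlation(x, falling))
```

Real data such as the Pokemon heights and weights falls between these extremes, which is why the coefficient there is positive but below 1.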

Attachment:- Visualising Statistical Data.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd