Perform basic data analysis on a dataset

Reference no: EM132521546

ITC558 Programming Principles - Charles Sturt University

TASK

In this assignment, you will perform some basic data analysis on a dataset obtained from the Gapminder website which collects and presents authentic statistics of all countries worldwide.

Download this zip package, which contains three dataset files: 'life.csv', 'bmi_men.csv' and 'bmi_women.csv'. The first file contains data about average life expectancy (in years) for most countries worldwide. The other two files contain data about the average Body Mass Index (BMI) of men and women for the same set of countries. These are plain text files with all data separated by commas. You can also open the files in a spreadsheet application to better understand their contents. All three files have a similar structure: the first row contains the year headers and the first column contains the country names. There is data for 186 countries for the period 1980 to 2008.

Your program should perform the following steps.
(1) Read all the data from the files and store it in a 2D list and two dictionaries.
The life expectancy data should be stored in a two-dimensional list whose outer list has 186 elements. Each inner list contains the data for one specific country.
The BMI data from both files should be stored in two dictionaries which map country names to a list of data values. Both dictionaries will contain 186 keys, with each key associated with a list of 29 values (BMI data from 1980 to 2008).
The following diagram illustrates the required data structures. Note that all numbers have been converted from string to float data types.
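
A minimal sketch of how step 1 might be approached, using only built-in features so that it stays within the library restriction. It assumes each file has a header row of years followed by one row per country, with no commas inside country names and no missing values; check the actual files before relying on that. The function names (parse_table, read_lines, to_rows, to_dict) and the sample values are hypothetical, and the layout of each inner list (country name first, then the floats) is an assumption about the diagram.

```python
def parse_table(lines):
    """Split CSV text lines into a list of years and a list of (country, values) pairs."""
    rows = [line.strip().split(',') for line in lines if line.strip()]
    years = [int(y) for y in rows[0][1:]]  # header row, skipping the country column
    data = [(row[0], [float(v) for v in row[1:]]) for row in rows[1:]]
    return years, data


def read_lines(filename):
    """Return all lines of a text file; raises OSError if it cannot be read."""
    with open(filename) as f:
        return f.readlines()


def to_rows(data):
    """2D list: each inner list is [country, value, value, ...] (assumed layout)."""
    return [[country] + values for country, values in data]


def to_dict(data):
    """Dictionary mapping each country name to its list of float values."""
    return {country: values for country, values in data}


# Tiny in-memory sample standing in for a real file (values are made up).
sample = ["Country,1980,1981\n", "Country A,70.1,70.4\n", "Country B,65.0,65.5\n"]
years, data = parse_table(sample)
life = to_rows(data)
bmi_men = to_dict(data)
```

With the real files, something like parse_table(read_lines('life.csv')) would replace the in-memory sample.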

You should use these collections for the next five steps - do not read the files again.
(2) Some users may be interested in gender-neutral BMI data. For this purpose, create another Python dictionary bmi_all of the same structure and size as bmi_men (or bmi_women) and populate it with worldwide gender-average BMI values. For example, the bmi_all value for Zimbabwe in 2008 would be 23.3.
(3) Use the bmi_all dictionary from step 2 to calculate worldwide statistics (min, max and median) for a user-selected year. See the example in the sample run below. The median value should be displayed with a precision of 3 decimal places.
(4) Compare the latest 5-year BMI data for men against women for the three most populous countries in the world (China, India, United States). First work out the 2004 to 2008 average of men's BMI for these countries. Repeat the same for women's BMI. Then display the men's and women's BMI values and the percentage difference between the two. Display all values with a precision of 2 decimal places.
(5) Plot the life expectancy trend of a user-selected country. Your program will prompt the user for a country name (case insensitive) and then create a line chart showing life expectancy variation over the years. The sample run below shows an example.
(6) To explore the correlation between BMI and life expectancy, plot worldwide average values of the two on the same chart. For this purpose, your program will create two lists of 29 elements each to store worldwide average BMI and life expectancy data for each year.
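
The calculations in steps 2 to 4 could be sketched as follows. This is an illustration under stated assumptions, not the required solution: the function names are hypothetical, the data values are made up, and the percentage difference is assumed to be the absolute difference relative to the mean of the two values (the task does not define the formula, so confirm which definition is expected). The median is computed by hand because the statistics module is not allowed.

```python
def average_dicts(men, women):
    """Element-wise mean of two dictionaries that share keys and list lengths."""
    return {c: [(m + w) / 2 for m, w in zip(men[c], women[c])] for c in men}


def year_stats(bmi_all, years, year):
    """Return (min_country, max_country, median_value) for one year."""
    i = years.index(year)
    pairs = sorted((values[i], country) for country, values in bmi_all.items())
    n = len(pairs)
    mid = n // 2
    if n % 2:  # odd count: the middle value
        median = pairs[mid][0]
    else:      # even count: mean of the two middle values
        median = (pairs[mid - 1][0] + pairs[mid][0]) / 2
    return pairs[0][1], pairs[-1][1], median


def recent_mean(values, count=5):
    """Mean of the last `count` values (2004 to 2008 for the 29-value lists)."""
    recent = values[-count:]
    return sum(recent) / len(recent)


def percent_difference(a, b):
    """Absolute difference relative to the mean of a and b (assumed definition)."""
    return abs(a - b) / ((a + b) / 2) * 100


# Made-up data for three fictional countries and two years.
years = [2007, 2008]
men = {"A": [20.0, 22.0], "B": [24.0, 26.0], "C": [30.0, 28.0]}
women = {"A": [22.0, 22.0], "B": [26.0, 24.0], "C": [28.0, 30.0]}

bmi_all = average_dicts(men, women)
low, high, median = year_stats(bmi_all, years, 2008)
print(f"Min: {low}, max: {high}, median: {median:.3f}")
print(f"Percent difference: {percent_difference(19.0, 21.0):.2f}%")
```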

For plotting the charts in steps 5 and 6, use the matplotlib library. Consult the textbook section 7-8 to learn how to draw simple charts. The chart for step 6 is rather complex because it contains two y-axes. For this part, please review and adapt the sample code below.
import matplotlib.pyplot as plt

x_data = [20, 21, 22, 23, 24]
y1_data = [1, 3, 5, 8, 10]
y2_data = [100, 150, 190, 180, 115]

fig = plt.figure()
ax1 = fig.add_subplot()
ax1.set_xlabel('X data')
ax1.plot(x_data, y1_data,'b*-')
ax1.tick_params(axis='y', labelcolor='b')
ax1.set_ylabel('Y1 data', color='b')

ax2 = ax1.twinx() # create a second axes that shares the same x-axis
ax2.plot(x_data, y2_data, 'ro-')
ax2.tick_params(axis='y', labelcolor='r')
ax2.set_ylabel('Y2 data', color='r')

plt.show()
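
To adapt the sample above for step 6, the program needs one worldwide average per year for each measure. A possible sketch, assuming the data structures from step 1 and that each inner life expectancy list starts with the country name (an assumption); yearly_means is a hypothetical helper name and the values are made up:

```python
def yearly_means(series):
    """Average a collection of equal-length value lists position by position."""
    series = list(series)
    return [sum(column) / len(column) for column in zip(*series)]


# Made-up data for two fictional countries and two years.
bmi_all = {"A": [21.0, 22.0], "B": [25.0, 26.0]}
life = [["A", 70.0, 71.0], ["B", 60.0, 63.0]]

avg_bmi = yearly_means(bmi_all.values())           # one value per year
avg_life = yearly_means(row[1:] for row in life)   # skip the country name
```

With the real data, each list would hold 29 values; they could then take the place of y1_data and y2_data in the sample code, with the list of years as x_data.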
Important Note: Other than matplotlib, you can NOT use any library module or third party module in this assessment.
Your program should be able to handle the following invalid inputs or error situations.
• Any of the three dataset files does not exist or cannot be read.
• Non-numeric or out of range year value provided by user.
• Incorrect country name provided by user.
A sample run of the program is given below to clearly demonstrate all the requirements.
A simple data analysis program

--- Step 1 ---
All dataset has been read into memory.

--- Step 2 ---
Gender-average BMI data stored in a new dictionary.

--- Step 3 ---
Select a year to find statistics (1980 to 2008): garbage
<error> That is an invalid year.

Select a year to find statistics (1980 to 2008): 1990
In 1990, countries with minimum and maximum BMI values were 'Vietnam' and 'Tonga' respectively.
Median BMI value in 1990 was 24.450

--- Step 4 ---
Men vs women BMI in highest population countries:

*** China ***
Men: 22.82
Women: 22.86
Percent difference: 0.18%

*** India ***
Men: 20.92
Women: 21.22
Percent difference: 1.42%

*** United States ***
Men: 28.30
Women: 28.18
Percent difference: 0.42%

--- Step 5 ---
Enter the country to visualize life expectancy data: jupiter
<error> 'jupiter' is not a valid country.

Enter the country to visualize life expectancy data: sRilaNka
Plot for 'Sri Lanka' opens in a new window.

--- Step 6 ---
Correlation plot opens in a new window.

Your assignment should consist of the following tasks.

Part 1
Draw flowcharts that represent the algorithms of step 2 and step 6. Include flowcharts of any functions that are called during these steps. You can draw the flowcharts with a pen or pencil on paper and scan them for submission, as long as the handwriting is clear and legible. However, it is strongly recommended to draw the flowcharts using drawing software.

Part 2
Select six sets of test data that will demonstrate the 'normal' operation of your program; that is, test data that will demonstrate what happens when a VALID input is entered. Select four sets of test data that will demonstrate the 'abnormal' operation of your program.
Set out the test cases in a tabular form as follows. It is important that the output listings (i.e., screenshots) are not edited in any way.

Test Data Table
Test data type | Test data | The reason it was selected | The output expected due to the use of the test data | The screenshot of actual output when the test data is used
Normal | | | |
Normal | | | |
Abnormal | | | |
Abnormal | | | |

Part 3
Implement your algorithm in Python. Comment on your code as necessary to explain it clearly. Run your program using the test data you have selected and complete the final column of test data table above.
Your submission will consist of:
1. Your algorithm through flowchart/s
2. The table recording your chosen test data and results (it should be a PDF file)
3. Source code for your Python implementation
Thus, your assignment directory will contain at least two or three files (depending on whether you put the flowchart and the test table in the same file). These files should then be compressed into a single ZIP file before uploading to TURNITIN.

It is critically important that your test runs are unmodified outputs from your program, and that these results can be reproduced by the marker running your saved .py Python program.

RATIONALE

This assessment task will work towards assessing the following learning outcomes:
• be able to analyse the steps involved in a disciplined approach to problem-solving, algorithm development and coding.
• be able to demonstrate and explain elements of good programming style.
• be able to identify, isolate and correct errors; and evaluate the corrections in all phases of the programming process.
• be able to interpret and implement algorithms and program code.
• be able to apply sound program analysis, design, coding, debugging, testing and documentation techniques to simple programming problems.
• be able to write code in an appropriate coding language.

Attachment:- python.rar
