Perform data analysis on countries in the region

Reference no: EM131479924, Length: 1500 words

Introduction to Data Science

Task

Background

A research team planned to study the health development of the world over the past 15 years. The team retrieved a dataset of Health and Population Statistics between 2001 and 2015 from the World Bank (https://databank.worldbank.org).

The dataset contains the following attributes:
- Birth rate, crude (per 1,000 people)
- Fertility rate, total (births per woman)
- Adolescent fertility rate (births per 1,000 women ages 15-19)
- Death rate, crude (per 1,000 people)
- Cause of death, by communicable diseases and maternal, prenatal and nutrition conditions (% of total)
- Cause of death, by injury (% of total)
- Cause of death, by non-communicable diseases (% of total)
- Mortality caused by road traffic injury (per 100,000 people)
- Health expenditure per capita (current US$)
- GNI per capita, Atlas method (current US$)
- Health expenditure, private (% of GDP)
- Health expenditure, public (% of GDP)
- Health expenditure, total (% of GDP)
- Maternal mortality ratio (national estimate, per 100,000 live births)
- Immunization, BCG (% of one-year-old children)
- Life expectancy at birth, male (years)
- Life expectancy at birth, female (years)
- Life expectancy at birth, total (years)
- School enrollment, primary (% gross)
- School enrollment, secondary (% gross)
- School enrollment, tertiary (% gross)
- School enrollment, tertiary, female (% gross)
- Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age)
- Unemployment, female (% of female labor force) (modeled ILO estimate)
- Unemployment, male (% of male labor force) (modeled ILO estimate)
- Unemployment, total (% of total labor force) (modeled ILO estimate)

More details about the data attributes and data content can be found in the attached documents.

Assignment Task

You are a member of the team, and need to perform data analysis on countries in the region of East Asia & Pacific.

The team has not set any specific goal for the analysis. You therefore have the freedom to explore the data and dig out anything you find interesting or significant.

You have been requested to prepare a data analysis report about your work and explain your findings. The potential audiences include other researchers, business representatives, and government agencies. They may have limited ICT or mathematical knowledge.

To prepare the report, please use the following outline:

1. Introduction
Provide an introduction to the problem. Include background material as appropriate: who cares about this problem, what impact it has, and where the data comes from.

2. Data Setup
Describe how to load the data and which libraries are needed. Provide an overview of the data, including its dimensions and structure.
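
A data setup step along these lines could be sketched as follows. The column names and every value in the miniature CSV are made up for illustration and are not real statistics; the real export would be read from its own file path instead. The `na.strings = ".."` setting is an assumption about how missing values are marked in the export and should be checked against the actual file.

```r
# Hypothetical miniature of the export: all values are invented placeholders.
csv_text <- "Country,Year,LifeExpectancy,BirthRate
A,2001,70.1,15.2
A,2002,70.4,15.0
B,2001,78.3,9.8
B,2002,78.5,.."

# Write the toy data to a temporary file so read.csv() is used the same
# way it would be on the real CSV file.
path <- tempfile(fileext = ".csv")
writeLines(csv_text, path)

# Assumption: ".." marks a missing value, so it is read in as NA.
health <- read.csv(path, stringsAsFactors = FALSE, na.strings = c("..", ""))

dim(health)      # number of rows and columns
str(health)      # column names and types
summary(health)  # per-column summaries, including NA counts
head(health)     # first few rows
```

Only base R is needed here; any extra packages used for the real analysis would be loaded with `library()` at the top of this section.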

3. Exploratory Data Analysis
Perform three one-variable analyses. Plot at least one graph for each variable. Explain why the selected graph is appropriate.

Perform two two-variable analyses. Plot at least one graph for each variable. Explain why the selected graph is appropriate.

The analysis can be performed on all years and all countries, or on a subset of your interest.
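
The one-variable and two-variable analyses could be sketched with base R graphics as below. The data frame `toy` and its columns `gni_per_capita` and `life_expectancy` are hypothetical stand-ins filled with simulated numbers, not values from the World Bank dataset.

```r
set.seed(1)

# Simulated placeholder data (NOT real statistics): 30 made-up observations.
n   <- 30
gni <- exp(rnorm(n, mean = 8, sd = 1))
toy <- data.frame(
  gni_per_capita  = gni,
  life_expectancy = 50 + 2.5 * log(gni) + rnorm(n, sd = 2)
)

# One-variable analysis: a histogram shows the shape of a numeric variable.
hist(toy$life_expectancy,
     main = "Life expectancy at birth (simulated)",
     xlab = "Years")

# One-variable analysis: a boxplot highlights the median, spread and outliers.
boxplot(toy$gni_per_capita,
        main = "GNI per capita (simulated)",
        ylab = "Current US$")

# Two-variable analysis: a scatter plot shows how two numeric variables
# move together; the x axis is logged because income is strongly skewed.
plot(toy$gni_per_capita, toy$life_expectancy, log = "x",
     main = "Life expectancy vs GNI per capita (simulated)",
     xlab = "GNI per capita (current US$, log scale)",
     ylab = "Life expectancy (years)")
```

A histogram or boxplot suits a single numeric attribute, while a scatter plot suits a pair of numeric attributes; a bar chart would be the more natural choice when one of the variables is categorical, such as country.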

4. Advanced Analysis
Clustering
Briefly explain the concepts of clustering and k-means.
Perform a clustering analysis to group countries according to some selected attributes.
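
A k-means clustering step could be sketched as follows, using the `kmeans()` function from base R's stats package. The data are simulated placeholders with two deliberately well-separated groups, and the choice of two clusters and of the columns `life_exp` and `birth_rate` is an assumption made only for this illustration.

```r
set.seed(42)

# Simulated placeholder data (NOT real statistics): two artificial groups
# of 10 "countries" each, built to be clearly separated.
toy <- data.frame(
  life_exp   = c(rnorm(10, mean = 65, sd = 1), rnorm(10, mean = 80, sd = 1)),
  birth_rate = c(rnorm(10, mean = 25, sd = 1), rnorm(10, mean = 10, sd = 1))
)

# Standardise the columns so no attribute dominates the distance measure
# purely because of its units.
scaled <- scale(toy)

# Run k-means with 2 clusters; nstart = 25 tries several random starts
# and keeps the best solution.
fit <- kmeans(scaled, centers = 2, nstart = 25)

table(fit$cluster)  # how many observations fall in each cluster

# Colour each point by its assigned cluster.
plot(toy$birth_rate, toy$life_exp, col = fit$cluster, pch = 19,
     main = "k-means clusters (simulated data)",
     xlab = "Birth rate (per 1,000 people)",
     ylab = "Life expectancy (years)")
```

On real data the number of clusters is not known in advance, so it would need to be justified, for example by comparing `fit$tot.withinss` across several values of `centers`.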

Linear Regression
Briefly explain the concept of linear regression.
Perform two linear regression analyses. Plot the learned models.
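
One such regression could be sketched with base R's `lm()` as below. The data are simulated, and the variable names `health_exp` and `life_exp` as well as the coefficients used to generate them are assumptions for illustration only, not findings from the dataset.

```r
set.seed(7)

# Simulated placeholder data (NOT real statistics).
toy <- data.frame(health_exp = runif(40, min = 50, max = 3000))
toy$life_exp <- 62 + 0.005 * toy$health_exp + rnorm(40, sd = 1.5)

# Fit a simple linear model: life expectancy explained by health expenditure.
model <- lm(life_exp ~ health_exp, data = toy)

summary(model)  # coefficients, R-squared and p-values
coef(model)     # intercept and slope only

# Plot the observations and overlay the fitted regression line.
plot(toy$health_exp, toy$life_exp,
     main = "Linear regression (simulated data)",
     xlab = "Health expenditure per capita (current US$)",
     ylab = "Life expectancy (years)")
abline(model, col = "red", lwd = 2)
```

The slope reported by `coef(model)` is the estimated change in the response for a one-unit increase in the predictor, which is the figure a non-technical reader is most likely to want explained.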

The analysis can be performed on all years and all countries, or on a subset of your interest.

5. Conclusion

6. Reflections
In this part, discuss any difficulties you had performing the analysis and how you solved those difficulties. Reflect on how the analysis process went for you, what you learnt, and what you might do differently next time.

For the data analysis, you need to provide both the R code and an explanation of the code and its results. For sections 2 to 4, present each R code snippet in a box with some comments. For example:
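
A commented snippet in this style might look like the sketch below. It uses R's built-in `cars` dataset purely as a stand-in, which is an assumption of this illustration and not part of the assignment data.

```r
# Load the built-in 'cars' dataset (a stand-in, not the assignment data).
data(cars)

# Show the number of rows and columns.
dim(cars)

# Summarise each column (minimum, quartiles, mean, maximum).
summary(cars)
```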

Report Format

Your report should be at least 1,200 words and ideally no longer than 2,000 words. All comments and graph titles are counted.

The report MUST be formatted using the following guidelines:
- Paragraph text - 12 point Calibri single line spacing
- Headings - Arial in an appropriate type size
- Margins - 2.5cm on all margins
- Header - Report title
- Footer - page number (including the word "Page")
- Page numbering - roman numerals (i, ii, iii, iv) up to and including the Table of Contents, restart numbering using conventional numerals (1, 2, 3, 4) from the first page after the Table of Contents.
- Title Page - Must not contain headers or footers. Include your name as the report's author but DO NOT include any reference to your student ID, course code or course name.
- The report is to be created as a single Microsoft Word document. No other format is acceptable and doing so will result in the deduction of marks.


Reviews


inf1479924

5/19/2017 4:52:27 AM

These are the two data sources for the project: 23124186_1Health and Population Statistics Data.csv and 23124154_2Health and Population Statistics Definition and Source.csv. Regarding the expert's previous question, "Health and Population Statistics_Data.csv" is the file containing the data set. I attached it to my post, and I have attached this file again to this post. I have also attached the additional file "Health and Population Statistics_Definition and Source.csv", which provides additional information about the data set: 23124110_1Health and Population Statistics Data.csv and 23124110_2Health and Population Statistics Definition and Source.csv. Just confirming: is the project still under development?

len1479924

5/1/2017 3:54:14 AM

This assignment will take a number of weeks to complete and will require a good understanding of data science and management for successful completion. It is imperative that students take heed of the following points in relation to doing this assignment:
1. Ensure that you clearly understand the requirements for the assignment: what has to be done and what the deliverables are.
2. If you do not understand any of the assignment requirements, please ask the course coordinator or your tutor.
3. Each time you work on any aspect of the assignment, reread the assignment requirements to ensure that what is required is clearly understood.

len1479924

5/1/2017 3:54:06 AM

- Outstanding: An outstanding attempt; a well formatted and professionally presented piece of work.
- High Distinction: An excellent piece of work that meets all the specified criteria with very minor omissions or mistakes.
- Distinction: More than competently meets the criteria specified with only minor mistakes or omissions.

Two references for the explanation of clustering and two for linear regression are required. These references should follow the Harvard method of referencing. Note that ALL references should be from journal articles, conference papers, technical papers or a recognized expert in the field. DO NOT use Wikipedia as a reference. The use of unqualified references will result in the deduction of marks.

len1479924

5/1/2017 3:53:49 AM

The project must be completed with The R Project for Statistical Computing. Submit your assignment to Blackboard Task 2. Please follow the submission instructions on Blackboard. The assignment will be marked out of a total of 100 marks and forms 30% of the total assessment for the course. ALL assignments will be checked for plagiarism automatically by the SafeAssign system provided by Blackboard. Refer to your Course Outline or the Course Web Site for a copy of the “Student Misconduct, Plagiarism and Collusion” guidelines. Assignment submission extensions will only be made using the official Faculty of Arts, Business and Law Guidelines.
