Reference no: EM132011792
Background
A research team plans to study the world's health development over the past 15 years. The team retrieved the World Bank's Health and Population Statistics dataset, covering the years 2001 to 2015.
The dataset contains the following attributes:
- Birth rate, crude (per 1,000 people)
- Fertility rate, total (births per woman)
- Adolescent fertility rate (births per 1,000 women ages 15-19)
- Death rate, crude (per 1,000 people)
- Cause of death, by communicable diseases and maternal, prenatal and nutrition conditions (% of total)
- Cause of death, by injury (% of total)
- Cause of death, by non-communicable diseases (% of total)
- Mortality caused by road traffic injury (per 100,000 people)
- Health expenditure per capita (current US$)
- GNI per capita, Atlas method (current US$)
- Health expenditure, private (% of GDP)
- Health expenditure, public (% of GDP)
- Health expenditure, total (% of GDP)
- Maternal mortality ratio (national estimate, per 100,000 live births)
- Immunization, BCG (% of one-year-old children)
- Life expectancy at birth, male (years)
- Life expectancy at birth, female (years)
- Life expectancy at birth, total (years)
- School enrollment, primary (% gross)
- School enrollment, secondary (% gross)
- School enrollment, tertiary (% gross)
- School enrollment, tertiary, female (% gross)
- Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age)
- Unemployment, female (% of female labor force) (modeled ILO estimate)
- Unemployment, male (% of male labor force) (modeled ILO estimate)
- Unemployment, total (% of total labor force) (modeled ILO estimate)
More details about the data attributes and data content can be found in the attached documents.
Assignment Task
You are a member of the team and need to perform data analysis on countries in the East Asia & Pacific region.
The team has not set a specific goal for the analysis. You are therefore free to explore the data and report anything you find interesting or significant.
You have been requested to prepare a data analysis report about your work and explain your findings. The potential audiences include other researchers, business representatives, and government agencies. They may have limited ICT or mathematical knowledge.
To prepare the report, please use the following outline:
1. Introduction
Provide an introduction to the problem. Include background material as appropriate: who cares about this problem, what impact it has, where does the data come from.
2. Data Setup
Describe how to load the data and which libraries are needed. Provide an overview of the data, including its dimensions and structure.
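As a rough illustration of this step, the sketch below loads a CSV file and prints an overview using base R only. The file, its column names, and its values are invented stand-ins (the real file name and layout come from the attached archive), so treat every name here as an assumption.

```r
# Hypothetical stand-in for the extracted data file: a tiny CSV written to a
# temporary location so that the sketch runs on its own. All values are invented.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Country,Year,LifeExpectancy,BirthRate",
  "A,2001,70.1,15.2",
  "A,2002,70.4,15.0",
  "B,2001,65.3,22.8",
  "B,2002,,22.5"
), csv_path)

# Load the data; stringsAsFactors = FALSE keeps country names as plain text
MyData <- read.csv(csv_path, stringsAsFactors = FALSE)

# Overview of the data: number of rows and columns, column types, summaries
dim(MyData)
str(MyData)
summary(MyData)

# Count the missing values in each column (empty numeric cells are read as NA)
colSums(is.na(MyData))
```

With the real dataset, only the path passed to read.csv() would change; the overview functions work the same way on any data frame.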
3. Exploratory Data Analysis
Perform 3 one-variable analyses. Plot at least one graph for each variable. Explain why the selected graph is appropriate.
Perform 2 two-variable analyses. Plot at least one graph for each pair of variables. Explain why the selected graph is appropriate.
The analysis can be performed on all years and all countries, or on a subset of your interest.
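The kinds of plots this section asks for might look like the sketch below. It uses randomly generated numbers, not the World Bank data, and the column names LifeExpectancy and HealthExpPerCapita are hypothetical.

```r
# Invented data for illustration only; not taken from the World Bank dataset
set.seed(1)
MyData <- data.frame(
  LifeExpectancy     = rnorm(30, mean = 72, sd = 5),
  HealthExpPerCapita = runif(30, min = 50, max = 3000)
)

# One-variable analysis: a histogram shows the shape of a numeric distribution
hist(MyData$LifeExpectancy,
     main = "Life expectancy at birth",
     xlab = "Years")

# One-variable analysis: a boxplot highlights the median, spread, and outliers
boxplot(MyData$HealthExpPerCapita,
        main = "Health expenditure per capita",
        ylab = "Current US$")

# Two-variable analysis: a scatter plot shows how two numeric variables relate
plot(MyData$HealthExpPerCapita, MyData$LifeExpectancy,
     main = "Life expectancy vs health expenditure",
     xlab = "Health expenditure per capita (current US$)",
     ylab = "Life expectancy (years)")

# Pearson correlation; use = "complete.obs" skips rows with missing values
r <- cor(MyData$HealthExpPerCapita, MyData$LifeExpectancy,
         use = "complete.obs")
r
```

A histogram or boxplot suits a single numeric variable, while a scatter plot suits a pair of numeric variables; a categorical variable would call for a bar chart instead.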
4. Advanced Analysis
Clustering
Briefly explain the concept of clustering and k-means.
Try to do a clustering analysis to group countries according to some selected attributes.
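A minimal k-means sketch, assuming two numeric attributes per country, could look like the following. The six countries and their values are invented, and the choice of two clusters is an assumption made for the example.

```r
# Invented values for six hypothetical countries; not real data
set.seed(42)
Countries <- data.frame(
  Country        = c("A", "B", "C", "D", "E", "F"),
  LifeExpectancy = c(82, 81, 80, 66, 65, 67),
  BirthRate      = c(9, 10, 11, 28, 30, 27)
)

# Standardise the attributes so that each contributes equally to distances
Features <- scale(Countries[, c("LifeExpectancy", "BirthRate")])

# Run k-means with 2 clusters; nstart = 10 repeats the random initialisation
# and keeps the best result
Fit <- kmeans(Features, centers = 2, nstart = 10)
Countries$Cluster <- Fit$cluster

# Plot the countries, coloured by assigned cluster
plot(Countries$BirthRate, Countries$LifeExpectancy,
     col = Fit$cluster, pch = 19,
     main = "k-means clusters (k = 2)",
     xlab = "Birth rate (per 1,000 people)",
     ylab = "Life expectancy (years)")
text(Countries$BirthRate, Countries$LifeExpectancy,
     labels = Countries$Country, pos = 3)
```

Scaling matters because k-means relies on Euclidean distance, so an attribute measured in large units would otherwise dominate the grouping.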
Linear Regression
Briefly explain the concept of linear regression.
Try to do 2 linear regression analyses. Plot the learned models.
The analysis can be performed on all years and all countries, or on a subset of your interest.
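One way to fit and plot a simple linear regression is sketched below. The data are simulated with a known upward trend plus noise, and the column names GNIPerCapita and LifeExpectancy are hypothetical.

```r
# Simulated data for illustration only: a linear trend plus random noise
set.seed(7)
MyData <- data.frame(GNIPerCapita = seq(1000, 20000, by = 1000))
MyData$LifeExpectancy <- 60 + 0.001 * MyData$GNIPerCapita +
  rnorm(nrow(MyData), sd = 1)

# Fit the model: LifeExpectancy = intercept + slope * GNIPerCapita
Model <- lm(LifeExpectancy ~ GNIPerCapita, data = MyData)

# Coefficients, standard errors, and R-squared
summary(Model)

# Plot the observations and overlay the fitted regression line
plot(MyData$GNIPerCapita, MyData$LifeExpectancy,
     main = "Life expectancy vs GNI per capita",
     xlab = "GNI per capita (current US$)",
     ylab = "Life expectancy (years)")
abline(Model, col = "red", lwd = 2)
```

The slope estimates the change in the response for a one-unit increase in the predictor, and R-squared indicates how much of the variation the line explains.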
5. Conclusion
6. Reflections
In this part, discuss any difficulties you had performing the analysis and how you solved those difficulties. Reflect on how the analysis process went for you, what you learnt, and what you might do differently next time.
For the data analysis, you need to provide both the R code and an explanation of the code and its results. For sections 2 to 4, please present each R code snippet in a box with some comments. For example:
# Draw a boxplot on the attribute "Income"
boxplot(MyData$income)
Report Format
Your report should be at least 1,200 words and preferably no longer than 2,000 words. All comments and graph titles are counted.
The report MUST be formatted using the following guidelines:
- Paragraph text - 12 point Calibri single line spacing
- Headings - Arial in an appropriate type size
- Margins - 2.5cm on all margins
- Header - Report title
- Footer - page number (including the word "Page")
- Page numbering - roman numerals (i, ii, iii, iv) up to and including the Table of Contents, restart numbering using conventional numerals (1, 2, 3, 4) from the first page after the Table of Contents.
- Title Page - Must not contain headers or footers. Include your name as the report's author but DO NOT include any reference to your student ID, course code or course name.
- The report is to be created as a single Microsoft Word document. No other format is acceptable, and submitting in another format will result in the deduction of marks.
For advice on report writing, the following book is a good resource:
Summers, J. & Smith, B., 2014, Communication Skills Handbook, 4th Ed, Wiley, Australia.
Attachment: Health and Population Statistics Data.rar