Create an economy feature that combines population and GDP

Reference no: EM133707402

Homework: Public Health Factors have the Greatest Impact on Life Expectancy

Learning Objectives

I. Perform exploratory data analysis using suitable visualization tools.

II. Learn data preparation to build various ML algorithms.

III. Understand data strategy for addressing different business problems.

IV. Develop classification algorithms such as logistic regression, decision tree learning, and random forest to improve sales conversion.

V. Understand the importance of clustering and build clusters using techniques such as K-means clustering and hierarchical clustering.

VI. Identify cluster characteristics and corresponding business insights.

VII. Demonstrate the application of recommender systems in cross-selling to customers.

Which Public Health Factors have the Greatest Impact on Life Expectancy?

Life expectancy is a crucial metric for evaluating population health. It gives the average number of years that a group of people in a population is estimated to live, and it is estimated from various public health factors. The task of this project is to determine which of these factors help in determining life expectancy.

Data Source:

The raw data was extracted from the Global Health Observatory (GHO) data repository of the World Health Organization (WHO), which keeps track of health status. The features of the dataset include:

Country

HIV/AIDS

Measles

Year

Hepatitis B

Body Mass Index (BMI)

Life expectancy

Polio

Status

Adult mortality

Diphtheria

Prevalence for malnutrition 5-9

Infant mortality

Gross Domestic Product (GDP)

Education

Alcohol consumption

Population

Total expenditure on health

Expenditure on health (%)

Prevalence for malnutrition 1-19

Task I:

Read the raw data from the source file in Python.

Perform feature engineering:

A. Population Size - Create a population range that includes three categories:

a. Small - a population between 1,000 and 29,999,
b. Medium - a population between 30,000 and 99,999, and
c. Large - a population of 100,000 or more.

B. Lifestyle - Create a lifestyle feature that combines alcohol consumption and BMI.

C. Economy - Create an economy feature that combines population and GDP.

D. Death Ratio - Determine the death ratio between adult and infant mortality.
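The four engineered features can be sketched as follows. This is a minimal illustration, not a prescribed solution: the column names (`Population`, `Alcohol`, `BMI`, `GDP`, `Adult Mortality`, `Infant Deaths`) and the toy rows are assumptions, and because the task does not say how to "combine" two columns, a simple product is used for both Lifestyle and Economy.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration; real column names may differ.
df = pd.DataFrame({
    "Population": [5_000, 45_000, 2_500_000],
    "Alcohol": [1.2, 4.5, 8.0],
    "BMI": [19.0, 24.5, 27.0],
    "GDP": [600.0, 3_200.0, 41_000.0],
    "Adult Mortality": [260.0, 150.0, 70.0],
    "Infant Deaths": [60.0, 12.0, 0.0],
})

# A. Population size: left-closed bins matching the ranges in the task.
#    Populations below 1,000 fall outside every bin and become NaN.
bins = [1_000, 30_000, 100_000, np.inf]
df["Population Size"] = pd.cut(
    df["Population"], bins=bins, labels=["Small", "Medium", "Large"], right=False
)

# B. Lifestyle: product of alcohol consumption and BMI (assumed combination).
df["Lifestyle"] = df["Alcohol"] * df["BMI"]

# C. Economy: product of population and GDP (assumed combination).
df["Economy"] = df["Population"] * df["GDP"]

# D. Death ratio: adult mortality divided by infant mortality.
#    A zero denominator is replaced by NaN to avoid division by zero.
df["Death Ratio"] = df["Adult Mortality"] / df["Infant Deaths"].replace(0, np.nan)

print(df[["Population Size", "Lifestyle", "Economy", "Death Ratio"]])
```

Other combinations (a sum of standardized columns, or a ratio) would be equally defensible; whichever is chosen should be stated in the write-up.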

Task II:

Perform data cleaning by either removing any fragmented observations or imputing missing values as necessary. Generate scatter plots of each predictor against the target variable to check for a linear relationship, and apply data transformations such as a log transform if necessary.
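One possible cleaning pass is sketched below, assuming pandas and hypothetical column names. Rows without a target value are dropped, remaining gaps in the predictors are filled with the column median (one of several reasonable imputation choices), and a skewed predictor is log-transformed. The values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented values; column names are assumptions.
df = pd.DataFrame({
    "Life expectancy": [65.0, np.nan, 72.1, 80.3, 58.4],
    "GDP": [600.0, 1_200.0, np.nan, 41_000.0, 350.0],
    "Schooling": [10.1, 11.0, 12.5, np.nan, 8.2],
})
target = "Life expectancy"

# Remove observations that lack the target; they cannot be used for training.
clean = df.dropna(subset=[target]).copy()

# Impute remaining missing predictor values with each column's median.
predictors = [c for c in clean.columns if c != target]
clean[predictors] = clean[predictors].fillna(clean[predictors].median())

# Log-transform a right-skewed predictor; log1p also handles zeros safely.
clean["log_GDP"] = np.log1p(clean["GDP"])

print(clean)
```

A scatter plot of each predictor against the target, for example `clean.plot.scatter(x="log_GDP", y=target)`, can then be compared before and after the transform.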

Task III:

Generate a correlation heat map to assess multicollinearity, with the threshold set at 0.75. Variables whose correlation exceeds 0.75 need to be dropped.
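The thresholding step might look like the sketch below. It uses synthetic data and one common interpretation of the rule: for every pair of predictors whose absolute correlation exceeds 0.75, the later column of the pair is dropped and the earlier one is kept. Dropping both members of a pair would be a stricter reading of the task.

```python
import numpy as np
import pandas as pd

# Synthetic predictors: "a_copy" is almost a duplicate of "a".
rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)
X = pd.DataFrame({
    "a": a,
    "a_copy": a + rng.normal(scale=0.05, size=n),
    "b": rng.normal(size=n),
})

# Absolute pairwise correlations.
corr = X.corr().abs()

# Keep only the strict upper triangle so each pair is examined once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flag a column if it correlates above the threshold with any earlier column.
threshold = 0.75
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
X_reduced = X.drop(columns=to_drop)

print("dropped:", to_drop)
```

The heat map itself can be drawn from the same `corr` matrix, for instance with `seaborn.heatmap(corr, annot=True)` if seaborn is available.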

Task IV:

Generate box-and-whisker plots to identify possible outliers, and eliminate them.
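A box plot's default whiskers extend 1.5 times the interquartile range beyond the quartiles, so the same rule can be applied numerically. The sketch below assumes that convention and uses invented values; a different whisker setting would change which points count as outliers.

```python
import pandas as pd

# Invented values with one obvious outlier.
s = pd.Series(
    [61.0, 63.5, 64.0, 66.2, 67.0, 68.1, 70.4, 120.0], name="Life expectancy"
)

# Quartiles and interquartile range.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are treated as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
filtered = s[s.between(lower, upper)]

print(f"kept {len(filtered)} of {len(s)} values")
```

On a full data frame the same mask would be computed per column and combined before filtering rows.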

Task V:

Perform data analysis to answer the following questions:

A. Should a country having a lower life expectancy value (<65) increase its healthcare expenditure to improve its average lifespan?

B. What is the impact of schooling on the lifespan of humans?

C. Does Life Expectancy have a positive or negative relationship with drinking alcohol?

D. Do densely populated countries tend to have a lower life expectancy?
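Questions of this kind can be approached with simple correlations on the whole data set and on the relevant subgroup. The sketch below illustrates the mechanics for question A only. The data is synthetic and the column names are assumptions, so the printed numbers say nothing about the real answer.

```python
import numpy as np
import pandas as pd

# Synthetic data in which spending and life expectancy are positively related.
rng = np.random.default_rng(1)
n = 300
spend = rng.uniform(1, 12, size=n)
life = 50 + 2.0 * spend + rng.normal(scale=3.0, size=n)
df = pd.DataFrame({"Total expenditure": spend, "Life expectancy": life})

# Pearson correlation across all rows.
r_all = df["Total expenditure"].corr(df["Life expectancy"])

# Same correlation restricted to the low life expectancy group (< 65).
low = df[df["Life expectancy"] < 65]
r_low = low["Total expenditure"].corr(low["Life expectancy"])

print(f"all rows: r = {r_all:.2f}; life expectancy < 65: r = {r_low:.2f}")
```

A correlation describes association only; it does not by itself show that raising expenditure would raise life expectancy.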

Task VI:

Split the remaining data into around 75% for training and 25% for the test set. Train the linear regression model and assess the performance on the training set, test set, and the entire dataset.

For assessing model performance, use various metrics such as Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and R² score.
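The split, fit, and evaluation steps can be sketched with scikit-learn as below. The data is synthetic, and the helper `evaluate` is a name introduced here for illustration. Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction rather than a percentage.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)
from sklearn.model_selection import train_test_split

# Synthetic regression problem standing in for the cleaned data set.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
y = 70 + X @ np.array([4.0, -2.0, 1.0]) + rng.normal(scale=2.0, size=400)

# 75% training, 25% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
model = LinearRegression().fit(X_train, y_train)


def evaluate(model, X, y):
    """Return RMSE, MAPE (as a fraction), and R2 for one data split."""
    pred = model.predict(X)
    return {
        "RMSE": float(np.sqrt(mean_squared_error(y, pred))),
        "MAPE": float(mean_absolute_percentage_error(y, pred)),
        "R2": float(r2_score(y, pred)),
    }


splits = {"train": (X_train, y_train), "test": (X_test, y_test), "all": (X, y)}
scores = {name: evaluate(model, Xs, ys) for name, (Xs, ys) in splits.items()}

for name, metrics in scores.items():
    print(name, metrics)
```

Comparable scores on the training and test sets suggest the model is not overfitting; a large gap between them is a warning sign.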

Draw a residual scatter plot with the target variable on the x-axis and the predicted values on the y-axis. The scatter plot should contain an ideal unity line that represents the cases where the predicted values equal the target values. The plot should also contain dotted error lines at +/- 5 years, colored yellow, and at +/- 10 years, colored red. These lines make it easier to see how widely the predictions scatter around the ideal line.
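A possible matplotlib rendering of this plot is sketched below. The actual and predicted values are synthetic stand-ins, and the non-interactive `Agg` backend is selected only so that the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic actual and predicted values standing in for model output.
rng = np.random.default_rng(0)
y_true = rng.uniform(45, 85, size=150)
y_pred = y_true + rng.normal(scale=4.0, size=150)

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(y_true, y_pred, s=12, alpha=0.6)

# Unity line: predicted equals actual.
line = np.array([y_true.min(), y_true.max()])
ax.plot(line, line, color="black", label="unity")

# Dotted error lines at +/- 5 years (yellow) and +/- 10 years (red).
for offset, color in [(5, "yellow"), (10, "red")]:
    ax.plot(line, line + offset, linestyle=":", color=color,
            label=f"+/- {offset} years")
    ax.plot(line, line - offset, linestyle=":", color=color)

ax.set_xlabel("Actual life expectancy (years)")
ax.set_ylabel("Predicted life expectancy (years)")
ax.legend()

# Share of predictions that fall inside the outer error lines.
within_10 = float(np.mean(np.abs(y_pred - y_true) <= 10))
print(f"within +/- 10 years: {within_10:.1%}")
```

Calling `fig.savefig(...)` or `plt.show()` afterwards writes or displays the figure.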

Draw a residual histogram.

Perform appropriate cross-validation to check whether the linear regression model overfits the data. Generate a box plot to display model performance for each fold. Also, determine the mean and standard deviation of the overall performance.
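A k-fold check could be sketched as follows, assuming scikit-learn, five shuffled folds, and R² as the per-fold score. All three are illustrative choices, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression problem standing in for the cleaned data set.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
y = 70 + X @ np.array([4.0, -2.0, 1.0]) + rng.normal(scale=2.0, size=400)

# Five shuffled folds; each fold is held out once for scoring.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
fold_r2 = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

# Mean and standard deviation of performance across folds.
mean_r2, std_r2 = float(fold_r2.mean()), float(fold_r2.std())
print(f"R2 per fold: {np.round(fold_r2, 3)}")
print(f"mean = {mean_r2:.3f}, std = {std_r2:.3f}")
```

The per-fold scores in `fold_r2` can be passed directly to `matplotlib.pyplot.boxplot` for the requested box plot. A small standard deviation across folds indicates stable performance.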

Task VII:

Determine the minimum number of features, and which features they are, needed to ensure that all the data falls within the error lines mentioned above.
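One way to approach this search is greedy forward selection: add the feature that most reduces the largest absolute error, and stop once every prediction lies within the outer bound. The sketch below assumes a bound of +/- 10 years, uses synthetic data with hypothetical feature names, and measures error on the data used for fitting. A greedy search is not guaranteed to find the true minimum subset; an exhaustive search over subsets would be needed for that.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: only f0 and f1 carry signal; f2 and f3 are pure noise.
rng = np.random.default_rng(7)
n = 300
X = rng.normal(size=(n, 4))
names = ["f0", "f1", "f2", "f3"]  # hypothetical feature names
y = 70 + 8 * X[:, 0] + 5 * X[:, 1] + rng.normal(scale=1.0, size=n)


def max_abs_error(cols):
    """Largest absolute residual of a linear fit on the given columns."""
    model = LinearRegression().fit(X[:, cols], y)
    return float(np.max(np.abs(y - model.predict(X[:, cols]))))


bound = 10.0  # outer error line, in years
selected, remaining = [], list(range(X.shape[1]))

while remaining:
    # Add the feature that yields the smallest worst-case error.
    best = min(remaining, key=lambda j: max_abs_error(selected + [j]))
    selected.append(best)
    remaining.remove(best)
    if max_abs_error(selected) <= bound:
        break  # every point is now inside the error lines

chosen = [names[j] for j in selected]
print("features:", chosen, "max error:", round(max_abs_error(selected), 2))
```

On real data the bound may never be met, in which case the loop ends with all features selected and the remaining worst-case error should be reported.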
