Explore the data to gain insights

Reference no: EM132943296

Applied Machine Learning Report

1 Report Overview
Given the "Life Expectancy Data.csv" dataset, build a model to predict a country's "Life expectancy" using some of the following features from the dataset:
• Year;
• Status;
• Adult Mortality;
• infant deaths;
• Alcohol;
• percentage expenditure;
• Hepatitis B;
• Measles;
• BMI;
• under-five deaths;
• Polio;
• Total expenditure;
• Diphtheria;
• HIV/AIDS;
• GDP;
• Population;
• thinness 1-19 years;
• thinness 5-9 years;
• Income composition of resources;
• Schooling.

Ideally, your report should contain the following contents corresponding to the machine learning project checklist we discussed during Week 2's lecture.

2.1 Frame the Problem
At this initial step, first consider what type of machine learning solution the problem calls for, e.g.:
• supervised or unsupervised learning;
• batch or mini-batch/online learning;
• instance-based or model-based.

2.2 Get the Data
Preferably, the data can be loaded automatically from a fixed folder within your local machine, e.g., see the download script from Slide No. 134 of Week 2's lecture. It is also a good idea to convert the dataset into a pandas DataFrame.
Examine the general dataset structure and check for missing (null) values within the columns (attributes) of some instances (you may also profile your dataset using the info() method of pandas DataFrame objects). Recall also that this is the step at which you should create your test set.
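The loading, inspection and test-set steps above could look roughly like the following. This is only a sketch under stated assumptions: the `datasets` folder, the helper names `load_life_expectancy` and `make_toy_frame`, and the 80/20 split are illustrative choices, not part of the brief. The toy frame holds made-up values so that the code runs even when the real CSV is not present.

```python
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical location of the CSV; adjust to your own folder layout.
DATA_PATH = Path("datasets") / "Life Expectancy Data.csv"


def load_life_expectancy(path=DATA_PATH):
    """Read the CSV into a DataFrame and strip stray whitespace from headers."""
    df = pd.read_csv(path)
    df.columns = df.columns.str.strip()
    return df


def make_toy_frame(n_rows=50, seed=0):
    """Made-up stand-in data (not real measurements) so the sketch can run."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Status": np.where(np.arange(n_rows) % 3 == 0, "Developed", "Developing"),
        "Adult Mortality": rng.uniform(50, 400, size=n_rows),
        "Schooling": rng.uniform(4, 18, size=n_rows),
        "Life expectancy": rng.uniform(45, 85, size=n_rows),
    })
    df.loc[::10, "Schooling"] = np.nan  # simulate a few missing values
    return df


# Use the real file when it exists, otherwise fall back to the toy frame.
df = load_life_expectancy() if DATA_PATH.exists() else make_toy_frame()

df.info()                               # dtypes and non-null counts per column
missing_per_column = df.isnull().sum()  # number of null values per column
print(missing_per_column)

# Set the test set aside now, before any further exploration.
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
```

Fixing `random_state` keeps the split reproducible between runs, so the test set stays unseen while you iterate on the rest of the report.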

2.3 Explore the Data to Gain Insights
Visualise the data to look for possible correlations. You may also want to experiment with different attribute combinations.
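As a rough illustration of this step, the sketch below ranks the numeric attributes by their linear correlation with the target and tries one combined attribute. The data are made up for the example, and the combined column `infant_deaths_per_capita` is a hypothetical choice, not one prescribed by the brief; in practice you would run this on your training set only.

```python
import numpy as np
import pandas as pd

# Made-up stand-in data for illustration only; replace with the training set.
rng = np.random.default_rng(0)
n = 200
adult_mortality = rng.uniform(50, 400, size=n)
schooling = rng.uniform(4, 18, size=n)
train = pd.DataFrame({
    "Adult Mortality": adult_mortality,
    "Schooling": schooling,
    "Population": rng.uniform(1e5, 1e8, size=n),
    "infant deaths": rng.integers(0, 500, size=n),
    # The target is generated from two of the features plus noise.
    "Life expectancy": 70 - 0.05 * adult_mortality + 0.8 * schooling
                       + rng.normal(0, 2, size=n),
})

# Hypothetical attribute combination to experiment with.
train["infant_deaths_per_capita"] = train["infant deaths"] / train["Population"]

# Pearson correlation of every numeric attribute with the target.
corr_matrix = train.select_dtypes(include="number").corr()
target_corr = (
    corr_matrix["Life expectancy"]
    .drop("Life expectancy")
    .sort_values(ascending=False)
)
print(target_corr)
```

Correlation coefficients only capture linear relationships, so it is worth pairing the ranking with scatter plots, e.g. `train.plot(kind="scatter", x="Schooling", y="Life expectancy")`, which requires matplotlib.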

2.4 Prepare the Data for Machine Learning Algorithms
At this step, you may consider:
• Data cleansing: null/missing values cannot be handled by some machine learning algorithms.
• Handling non-numerical data: convert text/categorical data into numerical.
• Custom transformers: creating your own custom transformers, e.g., see the code in Slide No. 322 in Week 2's lecture that introduces combined attributes as new features.
• Feature scaling: some machine learning algorithms (e.g., SVMs) are sensitive to unscaled features, so consider scaling the features for these algorithms.
• Transformation pipelines: ideally, automate the whole data transformation and training processes, e.g., see Slide No. 348 of Week 4's lecture.
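The preparation steps listed above can be chained with scikit-learn's `Pipeline` and `ColumnTransformer`. The following is a minimal sketch, not the lecture's code: the `RatioAdder` class, the choice of median imputation, and the toy data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class RatioAdder(TransformerMixin, BaseEstimator):
    """Hypothetical custom transformer: appends column `num` / column `den`."""

    def __init__(self, num=0, den=1):
        self.num = num
        self.den = den

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        ratio = X[:, self.num] / X[:, self.den]
        return np.c_[X, ratio]


# Made-up stand-in data for illustration only.
rng = np.random.default_rng(0)
n = 60
features = pd.DataFrame({
    "infant deaths": rng.integers(0, 500, size=n).astype(float),
    "Population": rng.uniform(1e5, 1e8, size=n),
    "Schooling": rng.uniform(4, 18, size=n),
    "Status": np.where(np.arange(n) % 3 == 0, "Developed", "Developing"),
})
features.loc[::7, "Schooling"] = np.nan  # simulate missing values

num_cols = ["infant deaths", "Population", "Schooling"]
cat_cols = ["Status"]

num_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # data cleansing
    ("ratio", RatioAdder(num=0, den=1)),           # combined attribute
    ("scale", StandardScaler()),                   # feature scaling
])

preprocess = ColumnTransformer([
    ("num", num_pipeline, num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),  # text -> numeric
])

prepared = preprocess.fit_transform(features)
print(prepared.shape)
```

Fit the preprocessing on the training set only and reuse the fitted object with `preprocess.transform(...)` on the test set, so that imputation values and scaling statistics do not leak from the test data.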

2.5 Select and Train a Model
2.5.1 Consider several models and evaluate using cross-validation
For this step, you may further consider training several models, e.g.:
• Linear/logistic/softmax regression;
• Polynomial regression;
• SVM regression;
• Decision trees/random forests;
• Ensemble learning;
• Artificial neural networks, etc.
Each model can be further evaluated using cross-validation, e.g., see Slides No. 378-415. Preferably, you should also discuss why you have not considered some of the models above in your machine learning solution. Also, consider the computation cost of training and generating the predictions from your models.
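One possible way to compare several candidate regressors with cross-validation is sketched below. The selection of models, the 5-fold setting and the RMSE metric are illustrative assumptions, and the data are made up; with the real dataset you would pass the prepared training features and the "Life expectancy" column instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Made-up regression data for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 60 + 20 * X[:, 0] - 10 * X[:, 1] + rng.normal(0, 2, size=200)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=42),
    "forest": RandomForestRegressor(n_estimators=50, random_state=42),
    # SVMs are sensitive to feature scale, so the scaler goes in the pipeline.
    "svr": make_pipeline(StandardScaler(), SVR()),
}

rmse = {}
for name, model in models.items():
    # scikit-learn maximises scores, so RMSE is reported as a negative number.
    scores = cross_val_score(
        model, X, y, scoring="neg_root_mean_squared_error", cv=5
    )
    rmse[name] = -scores.mean()
    print(f"{name}: mean RMSE = {rmse[name]:.2f} (std {scores.std():.2f})")
```

Reporting the spread across folds alongside the mean helps show whether a difference between two models is meaningful or within the fold-to-fold noise.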

2.5.2 Fine-tuning the model
You may further consider fine-tuning your model using:
• Grid/Randomized search;
• Performance measures, e.g.: accuracy, precision, F1 scores, mean squared error, etc.;
• Ensemble methods;
• Evaluating on the test set.
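The fine-tuning steps above could be sketched as a grid search followed by a single evaluation on the held-out test set. The parameter grid, the random forest and the made-up data are assumptions chosen to keep the example small, not recommended settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Made-up regression data for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 60 + 20 * X[:, 0] - 10 * X[:, 1] + rng.normal(0, 1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Deliberately tiny, hypothetical grid; widen it for a real search.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)  # cross-validates on the training data only
print("best parameters:", search.best_params_)

# Touch the test set once, after the model has been selected.
best_model = search.best_estimator_
test_rmse = np.sqrt(mean_squared_error(y_test, best_model.predict(X_test)))
print(f"test RMSE: {test_rmse:.2f}")
```

For larger search spaces, `RandomizedSearchCV` takes the same estimator and samples a fixed number of parameter combinations instead of trying them all.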

3 Machine Learning Solution Format
Your machine learning solution should be coded in Python, with the machine learning algorithm classes taken from the scikit-learn library. You should submit a zipped folder containing both your Report document and your Python code. Your report should contain enough empirical evaluations and arguments to show that your machine learning model is indeed fit enough.

Attachment:- Applied Machine Learning Report.rar
