CE802 Machine Learning Assignment

Assignment Help Other Subject
Reference no: EM133071740 , Length: word count:1500

CE802 Machine Learning - University of Essex

Assignment: Design and Application of a Machine Learning System for a Practical Problem

Assignment objective: This document specifies the coursework assignment to be submitted by students taking CE802. Aims of this assignment are: a) to learn to identify machine learning techniques appropriate for a particular practical problem; and b) to undertake a comparative evaluation of several machine learning procedures when applied to the specific problem.

Pilot-Study Proposal:
Imagine that you work as Machine Learning independent consultant, providing scientific advisory and consulting services to companies seeking to apply data analytics to their business activities. The manager of a large hotel chain contacts you to investigate the feasibility of using machine learning to predict whether a new hotel opened in a given location will be profitable or not (i.e. whether it will make a profit or a loss, regardless of the amount). The manager has access to historical data of successful and unsuccessful hotels opened under the chain's brand and she offers to provide you with geographical and socio-economical data about the locations and neighbourhoods.

In the first part of your assignment, you are asked to write a detailed proposal for a pilot study to investigate whether machine learning procedures could be used to successfully solve this problem. Your report should discuss several aspects of the problem, including the following main points:
ˆ the type of predictive task that must be performed (e.g., classification, regression, clustering, rules mining, ...);
ˆ examples of possibly informative features that you would like to be provided with (what type of information that the company could obtain about the location is likely to be a good predictor?);
ˆ the learning procedure or procedures (e.g., DTs, k-NN, k-means, linear regression, Apriori, SVMs, ...) you would choose and the reason for your choice;
ˆ how you would evaluate the performance of your system before deploying it.

You can assume that the manager has some knowledge of machine learning and you do not need to explain how the recommended learning method works. Simply discuss your recommendation and back it with sound arguments and/or references.

This document should consist of approximately 500-750 words of narrative (i.e. excluding references, pictures, and diagrams).

2. Comparative Study

Thanks to the convincing arguments in your pilot-study proposal, the company decides to collect the data that you suggested and to hire you to perform the proposed study. They provide you with a training set of historical data with features for each hotel opened in the past and a label representing whether the hotel was profitable or not. These data are available in the CE802_P2_Data.zip archive available from the CE802 moodle page. In this part of the assignment, you are asked to perform the following two tasks.

a) Investigate the performance of a number of machine learning procedures on this dataset. Using the data in the file CE802_P2_Data.csv contained in the CE802_P2_Data.zip archive, you are required to perform a comparative study of the following machine learning proce- dures:
ˆ a Decision Tree classifier;
ˆ at least two more ML technique to predict if the hotel will make a profit.

You will notice that one of the features is missing for some of the instances. You are therefore required to deal with the problem of missing features before you can proceed with the prediction step. As a baseline approach you may try to discard the feature altogether and train on the remaining features. You are then encouraged to experiment with different imputation methods.
The company uses Python internally and therefore Python with scikit-learn is the required

language and machine learning library for the problem. For this task, you are expected to submit a Jupyter Notebook called CE802_P2_Notebook.ipynb containing the Python code used to perform the comparative analysis and produce the results, as well as the code used to perform the predictions described in task "b" below.

b) Prediction on a hold-out test set. An additional dataset, CE802_P2_Test.csv, is provided inside the CE802_P2_Data.zip archive. Binary outcomes are withheld for this test set (i.e. the "Class" column is empty). In this second task you are required to produce class predictions of the records in the test set using one approach of your choice among those tested in task "a" (for example the one achieving the best performance). These data must not be used other than to test the algorithm trained on the training data.
As part of your submission you should submit a file called CE802_P2_Test_Predictions.csv

in CSV format, which must be identical to CE802_P2_Test.csv except that the missing class is replaced with the output predictions obtained using your chosen approach. This second task will be marked based on the prediction accuracy on the test set.

3. Additional Comparative Study

Thanks to the good results obtained in the comparative study, the hotel chain has deployed your system and is obtaining good profit. Now a similar company would like to hire you to design a similar system for them but, unlike the first system, they would like you to predict not only if a new business will be profitable but also its annual profit.

They provide you with a training set of historical data containing features of each business and a numerical value representing the profit (which may be positive or negative). These data are available in the CE802_P3_Data.zip archive available from the CE802 moodle page. In this part of the assignment, you are asked to perform the following two tasks.

a) Investigate the performance of a number of machine learning procedures on this dataset. Using the data in the file CE802_P3_Data.csv contained in the CE802_P3_Data.zip archive, you are required to perform a comparative study of the following machine learning proce- dures:
ˆ Linear Regression;
ˆ at least two more ML technique to predict the profit.

This company too uses Python internally and therefore Python with scikit-learn is the required language and machine learning library for the problem. For this task, you are expected to submit a Jupyter Notebook called CE802_P3_Notebook.ipynb containing the Python code used to perform the comparative analysis and produce the results as well as the code used to perform the predictions described in task "b" below.

b) Prediction on a hold-out test set. An additional dataset, CE802_P3_Test.csv, is provided inside the CE802_P3_Data.zip archive. Target values are withheld for this test set (i.e. the "Target" column is empty). In this second task you are required to produce predictions of the records in the test set using one approach of your choice among those tested in task "a" (for example the one achieving the best performance). These data must not be used other than to test the algorithm trained on the training data.
As part of your submission you should submit a file called CE802_P3_Test_Predictions.csv

in CSV format, which must be identical to CE802_P3_Test.csv except that the missing "Target" column is replaced with the output predictions obtained using your chosen approach. This second task will be marked based on the root mean squared error on the test set.

4. Report on the Investigation

After conducting the studies in parts 2 and 3, you are asked to write a report containing an account of your investigation. There should be a brief summary of the experiments performed followed by one or more tables and/or graphs summarizing the performance of the different solutions. Any numerical data that you include should be in a suitable graphical or tabular form. The rest of the report should concentrate on your interpretation of the results and what they tell you about the relative strengths and weaknesses of the alternative methods when applied to the given data.

This document should consist of approximately 750-1500 words of narrative (i.e. excluding references, pictures, and diagrams). Please report your word count on the title page. The document must be submitted in PDF format with file name CE802_Report.pdf.

Attachment:- Machine Learning.rar

Reference no: EM133071740

Questions Cloud

Explain the bond yield-to-maturity : List and define 3 reasons that one bond's yield to maturity (YTM) would be different than another bond's yield-to-maturity?
Calculate the delta of the option : Stock A is trading for $150. Consider an option that allows you to buy the stock for $150. What type of option is it, and in which state is it at?
Prepare the entry to record the cost of the ore mine : Question - Diamond Company acquires on are mine at a cost of $1,300,000. Prepare the entry to record the cost of the ore mine
What does the cost of capital represent : Instruction: Please show your work (explain how you arrived at the answer). You are encouraged to use Excel for computation. You can submit your report in Word,
CE802 Machine Learning Assignment : CE802 Machine Learning Assignment Help and Solution, University of Essex - Assessment Writing Service - Design and Application of a Machine Learning System
What is the value of inventory turnover : What is the value of Inventory Turnover(INVT)? (Choose the closest)
Find Hartman unlevered beta : Hartman Motors has $20 million in assets, which were financed with $5 million of debt and $15 million in equity. Find Hartman unlevered beta
Calculate xyz firm cash conversion cycle : You need to compute XYZ company's cash conversion cycle. Looking at the financial statements, you observe that last year average inventory was $23,600, accounts
Explain the short-term financing : Assuming the company maintains its target cash balance at $135,000, what sales growth rate would result in a zero need for short-term financing?

Reviews

len3071740

1/21/2022 11:38:13 PM

Please start the assignment. There are 6 files needed to be uploaded. Just do it right Please read the assignment questionarie fully and do it. Thanks. I did the previous sem assignments.

Write a Review

Other Subject Questions & Answers

  Cross-cultural opportunities and conflicts in canada

Short Paper on Cross-cultural Opportunities and Conflicts in Canada.

  Sociology theory questions

Sociology are very fundamental in nature. Role strain and role constraint speak about the duties and responsibilities of the roles of people in society or in a group. A short theory about Darwin and Moths is also answered.

  A book review on unfaithful angels

This review will help the reader understand the social work profession through different concepts giving the glimpse of why the social work profession might have drifted away from its original purpose of serving the poor.

  Disorder paper: schizophrenia

Schizophrenia does not really have just one single cause. It is a possibility that this disorder could be inherited but not all doctors are sure.

  Individual assignment: two models handout and rubric

Individual Assignment : Two Models Handout and Rubric,    This paper will allow you to understand and evaluate two vastly different organizational models and to effectively communicate their differences.

  Developing strategic intent for toyota

The following report includes the description about the organization, its strategies, industry analysis in which it operates and its position in the industry.

  Gasoline powered passenger vehicles

In this study, we examine how gasoline price volatility and income of the consumers impacts consumer's demand for gasoline.

  An aspect of poverty in canada

Economics thesis undergrad 4th year paper to write. it should be about 22 pages in length, literature review, economic analysis and then data or cost benefit analysis.

  Ngn customer satisfaction qos indicator for 3g services

The paper aims to highlight the global trends in countries and regions where 3G has already been introduced and propose an implementation plan to the telecom operators of developing countries.

  Prepare a power point presentation

Prepare the power point presentation for the case: Santa Fe Independent School District

  Information literacy is important in this environment

Information literacy is critically important in this contemporary environment

  Associative property of multiplication

Write a definition for associative property of multiplication.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd