Machine Learning Assignment -
Machine learning is an active area of research with a high level of impact on real-world problems.
The objective of this assignment is to allow you to explore an interesting and relevant machine learning dataset using Scikit-Learn. More specifically, you will be required to perform pre-processing, build and evaluate machine learning models, and write a report on the results.
You will also be required to pick a specific area to research. This research should be integrated into your methodology and evaluation (more detail on this below).
Dataset - Your initial task will be to select an appropriate dataset. You should select either a regression or classification dataset. Ideally you should pick a dataset where machine learning algorithms have already been applied (although this is not essential). When selecting a dataset, you should identify the column that will act as the classification or regression target for the model.
- Avoid using time series, text classification, image or audio data.
- Avoid datasets where you have to spend time merging a number of disparate datasets.
- I recommend that you limit your dataset to 25MB. To give you an example, 10-fold cross-validation took 13 seconds on a 14MB file, about 20 seconds on a 20MB file, and 25 seconds on a 25MB file. These tests were carried out on an Intel i5 using a decision tree model. These times will vary significantly depending on the model and the characteristics of the data. You should keep in mind that you will also need to do hyper-parameter optimization, which will take much longer. Please note this is just a recommendation; if you are really interested in using a bigger dataset, please let me know.
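To estimate how long cross-validation will take on your own machine before committing to a dataset, you could time it directly. The following is a minimal sketch, not the setup used for the timings above: the helper name `time_cross_validation` is hypothetical, the data is a synthetic stand-in generated with `make_classification`, and `DecisionTreeClassifier` is assumed as the decision tree model.

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def time_cross_validation(X, y, folds=10):
    """Return (mean accuracy, elapsed seconds) for k-fold CV with a decision tree."""
    model = DecisionTreeClassifier(random_state=0)
    start = time.perf_counter()
    scores = cross_val_score(model, X, y, cv=folds)
    elapsed = time.perf_counter() - start
    return scores.mean(), elapsed


# Synthetic stand-in data (assumption); replace with your own features and target.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

mean_accuracy, seconds = time_cross_validation(X, y)
print(f"10-fold CV accuracy: {mean_accuracy:.3f} in {seconds:.2f}s")
```

Running the same helper on your candidate dataset gives a rough lower bound on the cost of a single evaluation; a hyper-parameter search repeats that cost once per parameter combination.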
Project Overview -
The project requires you to build machine learning models for your chosen dataset. You will need to perform pre-processing on your data. Follow the pre-processing steps outlined in the Scikit-Learn lecture notes. You will need to build and comprehensively evaluate a range of machine learning models. The most promising models should then undergo hyper-parameter optimization.
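One possible shape for this workflow is sketched below. It is only an illustration under stated assumptions: the data is synthetic, the pre-processing steps (median imputation and standard scaling) stand in for whatever the lecture notes prescribe, and the three candidate models, the parameter grids, and the helper `make_pipeline_for` are hypothetical choices rather than requirements of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); replace with your own dataset.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def make_pipeline_for(model):
    """Wrap a model with pre-processing so each CV fold is pre-processed separately."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("model", model),
    ])


# Candidate models and illustrative parameter grids (both are assumptions).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}
param_grids = {
    "logistic_regression": {"model__C": [0.01, 0.1, 1, 10]},
    "decision_tree": {"model__max_depth": [3, 5, 10, None],
                      "model__min_samples_leaf": [1, 5, 10]},
    "knn": {"model__n_neighbors": [3, 5, 11, 21]},
}

# Step 1: evaluate every candidate with 10-fold cross-validation on the training split.
cv_results = {
    name: cross_val_score(make_pipeline_for(model), X_train, y_train, cv=10).mean()
    for name, model in candidates.items()
}
for name, score in cv_results.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")

# Step 2: tune the most promising candidate with a grid search.
best_name = max(cv_results, key=cv_results.get)
search = GridSearchCV(make_pipeline_for(candidates[best_name]),
                      param_grids[best_name], cv=5)
search.fit(X_train, y_train)

# Step 3: report performance on the held-out test split.
test_accuracy = search.score(X_test, y_test)
print(f"Tuned {best_name}: params = {search.best_params_}, "
      f"test accuracy = {test_accuracy:.3f}")
```

Keeping the pre-processing inside the `Pipeline` means the imputer and scaler are fitted only on the training portion of each fold, which avoids leaking information from the validation data into the evaluation.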
You are also required to pick a specific topic to research and then incorporate the results of this research into your models and evaluate the impact. For example, if your dataset is imbalanced, your research could focus on the techniques that are commonly used to address imbalance. You would then proceed to incorporate some of these into your evaluation and assess the impact on your results.
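As an illustration of the imbalance example, the sketch below compares a baseline model with one common technique, class weighting. The assumptions are loud ones: the data is a synthetic 95/5 split, `LogisticRegression` with `class_weight="balanced"` is just one of several possible techniques, and the helper `evaluate` is hypothetical. The point is the pattern of evaluating with and without the technique under the same cross-validation splits.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic imbalanced data (assumption): roughly 95% negative, 5% positive.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# Stratified folds keep the class ratio roughly constant in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
metrics = ("accuracy", "recall", "f1")


def evaluate(class_weight):
    """Return mean CV scores for a logistic regression with the given class weighting."""
    model = LogisticRegression(max_iter=1000, class_weight=class_weight)
    scores = cross_validate(model, X, y, cv=cv, scoring=list(metrics))
    return {metric: scores[f"test_{metric}"].mean() for metric in metrics}


baseline = evaluate(None)        # no correction for imbalance
weighted = evaluate("balanced")  # weights inversely proportional to class frequency

for metric in metrics:
    print(f"{metric}: baseline = {baseline[metric]:.3f}, "
          f"balanced = {weighted[metric]:.3f}")
```

Accuracy alone can look strong on imbalanced data even when the minority class is rarely detected, so reporting recall and F1 for the minority class alongside it makes the impact of the technique easier to assess.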
You should also compose a research report detailing the work you have undertaken and the overall findings. You will find a template for the research paper in the assignment folder. This template adheres to the Springer paper specification. The paper you submit should contain the following sections:
(i) Abstract
(ii) Introduction
(iii) Research
(iv) Methodology
(v) Evaluation
(vi) Conclusions and Future Work
Attachment:- Assignment Files.rar