Implement an ML solution for a classification problem

Reference no: EM133129425, Length: 3500 words

COMP1804 Applied Machine Learning

Learning Outcome 1: Rationalise appropriate scenarios for Machine Learning applications and evaluate the choice of machine learning methods for given application requirements.

Learning Outcome 2: Demonstrate competency in using appropriate libraries/toolkits to solve given real-world Machine Learning problems, and develop and evaluate a suitable application.

Learning Outcome 3: Understand and apply the relevant input data preparation and processing required for the Machine Learning models used, and quantitatively evaluate and qualitatively interpret the learning outcome.

Learning Outcome 4: Recognise and critically address the ethical, legal, social and professional issues that can arise when applying Machine Learning technologies.

Sub-task 1: Text Classification/regression - peer reviews.

This task is to implement an ML solution for text classification/regression (long texts). It uses a dataset of ML paper peer reviews from the International Conference on Learning Representations (years 2017 to 2020) [1,2].
Specifically, you will use as input a text document concatenating the title of the paper, the abstract of the paper, the review comments, and the final acceptance/rejection comment. This input should be used to predict the following attributes:
• Acceptance status ('Accept' or 'Reject').
• Review score (an integer between 1 and 10).
Note that for the latter attribute you can choose whether to use multiclass classification or regression. You can choose whether to predict both attributes simultaneously or separately.

Additionally, the dataset is provided with a further attribute, the reviewer confidence score (an integer number between 1 and 5), which is optional to use. If you want to explore the data further, a separate dataset with the text field split into the original fields "review comments", "paper title", "paper abstract" and "final acceptance/rejection comment" can be provided upon request.
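
As a minimal sketch of the choice between regression and multiclass classification for the review score: if a regression model is used, its continuous outputs can be rounded and clipped to the valid 1 to 10 range so that they can also be scored as class predictions. The function names and the sample values below are hypothetical and are not taken from the provided dataset.

```python
import math


def to_score_class(prediction, low=1, high=10):
    """Round a continuous prediction half up and clip it to the valid range."""
    rounded = int(math.floor(prediction + 0.5))
    return max(low, min(high, rounded))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical regression outputs and true review scores (illustrative only).
raw_predictions = [6.4, 0.2, 11.7, 3.5, 8.1]
true_scores = [6, 1, 10, 4, 7]

# Convert the regression outputs into valid integer scores.
class_predictions = [to_score_class(p) for p in raw_predictions]

print("class predictions:", class_predictions)
print("MAE after rounding:", mean_absolute_error(true_scores, class_predictions))
```

Mean absolute error respects the ordering of the scores (predicting 8 for a true 7 costs less than predicting 1), which plain classification accuracy does not; this is one reason to consider the regression formulation.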

Sub-task 2: Image classification - skin lesions.

This task is to implement an ML solution for a classification problem from images. Specifically, you are provided with images of skin lesions [3] and your task is to correctly predict the following attributes:
• Whether a skin lesion is benign or malignant (1 for 'is_benign', 0 for 'is_malign').
• The fine-grained diagnosis for the skin lesion (7 possible categories).
You can choose whether to predict both attributes simultaneously or separately. Additionally, the dataset is provided with a further attribute, the location of the skin lesion (for example, "scalp"), which is optional to use. If you want to explore the data further, a separate dataset with more attributes can be provided upon request. The dataset has been adapted to the requirements of this module; the original dataset was released under the terms of the CC BY-NC 4.0 licence by Tschandl et al. [3].

Sub-task 3: Image classification - advertisements.

This task is to implement an ML solution for a classification problem from images. Specifically, you are provided with images of advertisements [4] and your task is to correctly predict the topic of each advertisement.
• Images are of different sizes and there are 39 possible topic categories.
• You may choose to group together some of the categories (keeping no fewer than 12 categories). You should thoroughly discuss (and will be evaluated on) the reasons behind and the implications of grouping together different categories.
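
One way to make a grouping of categories explicit and checkable is a mapping from original topics to coarser groups, together with a check that at least 12 categories remain. The sketch below assumes hypothetical topic names and group names; the real 39 topic labels come from the dataset and are not reproduced here.

```python
from collections import Counter

MIN_CATEGORIES = 12  # lower bound stated in the brief

# Hypothetical topic names: five named topics plus 34 placeholders give 39.
ALL_TOPICS = ["restaurants", "coffee", "soda", "cars", "motorcycles"] + [
    f"topic_{i}" for i in range(34)
]

# Hypothetical grouping; topics that are not listed keep their original label.
TOPIC_GROUPS = {
    "restaurants": "food_and_drink",
    "coffee": "food_and_drink",
    "soda": "food_and_drink",
    "cars": "vehicles",
    "motorcycles": "vehicles",
}


def group_label(label, mapping):
    """Return the group of a label, or the label itself if it is not grouped."""
    return mapping.get(label, label)


def check_grouping(topics, mapping, minimum=MIN_CATEGORIES):
    """Return the number of categories after grouping; raise if below minimum."""
    remaining = {group_label(topic, mapping) for topic in topics}
    if len(remaining) < minimum:
        raise ValueError(f"only {len(remaining)} categories left, need {minimum}")
    return len(remaining)


n_categories = check_grouping(ALL_TOPICS, TOPIC_GROUPS)

# Class counts after grouping show how the grouping changes class balance.
sample_labels = ["cars", "coffee", "soda", "topic_0", "motorcycles"]
grouped = [group_label(label, TOPIC_GROUPS) for label in sample_labels]
grouped_counts = Counter(grouped)
print(n_categories, dict(grouped_counts))
```

Keeping the mapping in one place makes it straightforward to report which categories were merged and to compare class balance before and after grouping when discussing the implications.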

Sub-task 4: Text classification - Amazon reviews.

This task is to implement an ML solution for a multi-task classification problem from text data (mostly short texts). Specifically, you are provided with Amazon reviews [5] (the text is the review title and the review main body joined together) and your task is to predict the following attributes:
• The number of stars associated with the review (on a scale of 1 to 5).
• Whether a product is from the category "Video Games" ("video_games") or "Musical Instrument" ("musical_instrument").
Note that for the first attribute you can choose whether to use multiclass classification or regression. You can choose whether to predict both attributes simultaneously or separately.

Additionally, the dataset is provided with a further attribute: whether the review is verified or not (either True or False), which is optional to use. If you want to explore the data further, a separate dataset with the text field split into the original fields "review title", "review main body" can be provided upon request.
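
If the two attributes are predicted simultaneously, one simple option is to combine them into a single joint class (5 star values times 2 categories gives 10 classes) and decode the prediction afterwards. This is only one possible design, assumed here for illustration; a model with two separate output heads is another. The helper names are hypothetical.

```python
CATEGORIES = ["video_games", "musical_instrument"]
STARS = [1, 2, 3, 4, 5]


def encode_joint(stars, category):
    """Map a (stars, category) pair to a single class index in 0..9."""
    return CATEGORIES.index(category) * len(STARS) + STARS.index(stars)


def decode_joint(index):
    """Recover the (stars, category) pair from a joint class index."""
    category_idx, star_idx = divmod(index, len(STARS))
    return STARS[star_idx], CATEGORIES[category_idx]


# Round trip over every possible label pair.
for category in CATEGORIES:
    for stars in STARS:
        assert decode_joint(encode_joint(stars, category)) == (stars, category)

print(encode_joint(5, "musical_instrument"), decode_joint(3))
```

A joint class treats the star rating as unordered and splits the data into more, smaller classes, so it is worth comparing against separate predictors before committing to it.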

Tasks:

1. Practical Assignment (complete code that is executed without errors). The source code must be well documented and error free (i.e. no debugging necessary to run). For each dataset, the assignment includes:
o Exploratory Data Analysis (e.g. label distributions per attribute and per set).
o Data cleaning.
o Data Splitting (into training and test sets, but see below) and Data Pre-processing (where appropriate: normalization/standardization, data augmentation, over/under-sampling, text processing).
o 2 ML Methodologies (a basic one & an additional one): appropriate ML methods should be used that have coherent implementations and sound pipelines, without any errors; (if the basic ML method is a Neural Network, the additional one can be another Neural Network).
o Systematic experimentation: you should choose one parameter/attribute to change for each ML methodology (the attribute/parameter can be the same or different across the two methodologies) and show how it affects the results using clear and well-formatted figures and tables. Bonus points are given for experimenting using a validation dataset.
o Evaluation of the 2 methods using at least 2 metrics and showing 3-10 examples from the test dataset.
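
The splitting and exploratory steps listed above can be sketched together: a stratified split into training, validation and test sets, followed by the label distribution per set. This is a standard-library illustration under assumed split fractions (70/15/15) and with made-up labels; in practice a library routine such as scikit-learn's train_test_split with its stratify argument would normally be used.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, fractions=(0.7, 0.15), seed=0):
    """Return train/validation/test index lists that preserve label proportions.

    fractions holds the train and validation shares; the rest goes to test.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)  # shuffle within each class before slicing
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


def label_distribution(labels, indices):
    """Count how often each label occurs among the given indices."""
    return Counter(labels[i] for i in indices)


# Hypothetical imbalanced labels, for illustration only.
labels = ["Reject"] * 80 + ["Accept"] * 20
train_idx, val_idx, test_idx = stratified_split(labels)

for name, idx in [("train", train_idx), ("validation", val_idx), ("test", test_idx)]:
    print(name, dict(label_distribution(labels, idx)))
```

Holding out a validation set in this way lets the systematic experimentation be tuned without touching the test set, which is then used only for the final evaluation.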

2. Written Report:
• Document in IEEE conference format. Use the template available on Moodle or on Overleaf (make a copy of the Overleaf template).
• Should include references (citing other work) where appropriate (when images, data, code, or any other resources have been used from other sources).
• Document structure:
o Abstract: Briefly summarise what the report contains. That is: the task you are solving and why it is important; the outline of the ML methods you implemented and the systematic experimentation performed; the summary of your results and your conclusions. The abstract should be between 100 and 200 words.
o Introduction and related work: This section should talk about the following:
• The problem to be solved, why and to whom it matters, why it is challenging.
• Existing work related to your chosen task (it can be about the exact same task or a similar one).
• A brief overview of the dataset and the data pre-processing steps implemented.
• Your chosen ML implementations and a brief overview about why they are appropriate.
• What your systematic experiment is.
o Ethical discussion: Identify and discuss some of the social, ethical and legal implications of your chosen task, from data collection and processing to the ML prediction. The discussion should take into account communities and people that may be affected by the ML system.
o Dataset preparation: Describe exploratory data analysis, data cleaning, splitting and pre-processing and the reasons behind your design choices.
o ML methods: Describe and explain the 2 methods used and the reasons behind your design choices.
o Experiments and evaluation: Describe the systematic experimentation implemented for each ML method. Based on the experiments, evaluate, present, analyse and explain method performance and the metrics used (why are the metrics appropriate?).
o Discussion and future work: Reflections on a) what worked well and what worked less well; b) reasons behind the performance obtained; c) how your work could be extended in the future and what additions could be made to it.
o Conclusions: A brief summary of the work done and what the main highlights were.
o References: All existing works and resources (code/images/etc) you used or talked about in your report must be cited properly.
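
When explaining why the chosen metrics are appropriate, it can help to show how two metrics disagree on imbalanced data. The sketch below computes accuracy and macro-averaged F1 from scratch on made-up predictions; the labels and values are illustrative assumptions, and a library such as scikit-learn provides equivalent functions (accuracy_score, and f1_score with average="macro").

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so rare classes count equally."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical test labels and a model that always predicts the majority class.
y_true = ["Reject"] * 8 + ["Accept"] * 2
y_pred = ["Reject"] * 10

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

Here the majority-class predictor looks reasonable on accuracy but poor on macro F1, because it never identifies the minority class; reporting both makes that failure visible.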
