COS60008 Introduction to Data Science Assignment

Reference no: EM132548991

COS60008 Introduction to Data Science - Swinburne University of Technology

Project Overview

You are provided with a dataset, chocolate.csv, on chocolate bars. Your goal is to develop a machine learning model which takes the properties of a specific chocolate bar (e.g. the percentage of cocoa, the origin of the beans) and outputs its rating. The dataset contains the relevant information for a number of chocolate bars, along with expert ratings as the ground truth.

Data source
The dataset is from Brady Brelinski, Founding Member of the Manhattan Chocolate Society. The data is also used in a Kaggle competition.

Columns description
• Company (Maker-if known): name of the company (string).
• Specific Bean Origin: the geographical origin for the chocolate bar (string).
• REF: a value indicating when the review was entered in the database. A higher value indicates a more recent entry (integer).
• Review Year: the year in which the review was published (integer).
• Cocoa Percentage: cocoa percentage of the chocolate bar (string).
• Company Location: the country of the manufacturer (string).
• Rating: expert rating for the chocolate bar (float). This is the label to be predicted by the model. It is a number from 1 (lowest quality) to 5 (highest quality).
• Bean Type: the type of cocoa bean used (string).
• Broad Bean Origin: the broader geographical origin of the cocoa bean (string).
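As a minimal sketch of how these column types affect pre-processing, the snippet below converts the cocoa percentage from a string to a number, fills missing bean types and creates dummy variables. The rows are made up for illustration, the exact header names in chocolate.csv may differ from the labels above, and filling missing values with an "Unknown" category is only one possible choice.

```python
import pandas as pd

# Hypothetical rows standing in for chocolate.csv (made-up values;
# the real header names may differ from the ones used here).
df = pd.DataFrame({
    "Cocoa Percentage": ["63%", "70%", "75%", "70%"],
    "Company Location": ["Country A", "Country A", "Country B", "Country C"],
    "Bean Type": ["Type X", " ", None, "Type Y"],
    "Rating": [3.75, 2.75, 4.0, 3.5],
})

# The cocoa percentage is stored as a string such as "70%"; strip the
# percent sign and convert it to a float so a model can use it.
df["Cocoa Percentage"] = df["Cocoa Percentage"].str.rstrip("%").astype(float)

# Treat blank strings as missing values, then fill every missing bean
# type with an explicit "Unknown" category (an assumption, not a rule).
bean_type = df["Bean Type"].str.strip().replace("", float("nan"))
df["Bean Type"] = bean_type.fillna("Unknown")

# Create dummy (one-hot) variables for the categorical attributes,
# keeping the target column out of the feature table.
features = pd.get_dummies(
    df.drop(columns=["Rating"]),
    columns=["Company Location", "Bean Type"],
)
target = df["Rating"]

print(features.head())
```

On the full dataset, high-cardinality columns such as the company name may produce a very large number of dummy variables, so grouping rare categories first may be worth considering.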

Dataset dimension
• Samples (rows): 1500
• Attributes (columns): 9 (including the target: rating)

Tasks

Your team will need to accomplish the following tasks. You should apply the suitable techniques covered in the lectures and tutorials.

1. Perform data pre-processing. This includes but is not limited to checking typos, dealing with missing values and creating dummy variables.
2. Formulate the problem as a machine learning task.
3. Select three learning algorithms based on the previous task and identify the corresponding hyperparameters if any. There must be at least one hyperparameter (to be optimised in Task 5).
4. Perform data partitioning. This will split the data into the training data and the test data. The training data will be used for model development, with the test data for performance evaluation.
5. Perform model development

o List all your learning algorithms by expanding on the hyperparameters. For example, you might select RandomForest, K-Nearest Neighbours (K-NN) and Artificial Neural Networks (ANN) as the three learning algorithms. For K-NN, you nominate the number of neighbours N as the hyperparameter and propose 5 possible values (e.g. 6, 7, 8, 9, 10). Effectively, you will then have the following algorithms:
• RandomForest (0 hyperparameters, 1 model)
• ANN (0 hyperparameters, 1 model)
• K-NN (1 hyperparameter with 5 possible values, 5 models)
» K-NN (N=6)
» K-NN (N=7)
» K-NN (N=8)
» K-NN (N=9)
» K-NN (N=10)
o Assess each learning algorithm on the training data. For a given learning algorithm L, you will assess its validation performance as follows:
• Define an n-fold cross-validation within the training data, where n is from 3 to 5.
• In each fold, identify the actual training data trData and the validation data vlData. Train L on trData and test it on vlData to get the validation performance P.
• Average P over all folds to obtain the final performance of L.
o Select the model M with the highest validation performance.
6. Perform performance assessment
o Apply M to the test data to obtain its predictions.
o Calculate the accuracy and the confusion matrix.
7. Conduct other analysis to be decided by the team members. For example:
o Identify the most predictive attributes.
o Plot the chocolate ratings on a geographical map.
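Tasks 4 to 6 can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than a prescribed solution: the data is a synthetic stand-in for the pre-processed chocolate features, the rating is assumed to have been formulated as a class label (which is what makes accuracy and a confusion matrix applicable), and the 80/20 split, 5 folds and model settings are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the pre-processed features X and the rating
# labels y (assumed here to be three rating classes).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

# Task 4: hold out a test set that is never used during model development.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Task 5: expand the algorithms over their hyperparameter values.
candidates = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "ANN": MLPClassifier(max_iter=1000, random_state=0),
}
for n in (6, 7, 8, 9, 10):
    candidates[f"K-NN (N={n})"] = KNeighborsClassifier(n_neighbors=n)

# Assess each candidate with 5-fold cross-validation on the training
# data only; the mean accuracy over the folds is its final performance.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Select the model M with the highest validation performance.
best_name = max(cv_scores, key=cv_scores.get)

# Task 6: refit M on all training data, then evaluate once on the test set.
best_model = candidates[best_name].fit(X_train, y_train)
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(best_name, round(test_accuracy, 3))
print(cm)
```

Keeping the test set out of the cross-validation loop means the reported accuracy is not influenced by the model selection step.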

Attachment:- Introduction to Data Science.rar


