Normalize the training and testing data

Reference no: EM132108691

Purpose

In this assessment, you need to demonstrate your skills in applying regularized logistic regression to perform two-class and multi-class classification for real-world tasks. You also need to demonstrate your skill in recognizing underfitting/overfitting situations.

Instructions

This is a group assessment task. Students will be required to analyse a given real-world scenario and contribute to the classifier design.

The group response to the problem solution should not exceed 30 pages. Students will be required to consolidate their individual solutions and propose the best solution, one that evidences each group member's contribution, along with a rationale for the group's response to solving the problem.

Task A - Binary Classification

For this problem, we will use a subset of here. Note that this dataset has some information missing.

1.1 Data Munging

Cleaning the data is essential when dealing with real-world problems. The training and testing data are stored in the "data/wisconsin_data" folder. You have to perform the following:

- Read the training and testing data. Print the number of features in the dataset.

- For the data label, print the total number of 1's and 0's in the training and testing data. Comment on the class distribution. Is it balanced or unbalanced?

- Print the number of features with missing entries.

- Fill the missing entries. For filling any feature, you can use either mean or median value of the feature values from observed entries.

- Normalize the training and testing data.
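The munging steps above can be sketched in plain Python. This is a minimal illustration, not the assignment's reference solution: it assumes missing entries have already been parsed as `None`, it interprets "normalize" as z-score standardization (min-max scaling would be an equally valid reading), and all function names are hypothetical. The fill values and scaling statistics are computed on the training data only and then reused for the test data, so no test information leaks into preprocessing.

```python
import statistics

def column(rows, j):
    """Values of feature j across all rows."""
    return [row[j] for row in rows]

def count_features_with_missing(rows):
    """Number of features that contain at least one missing (None) entry."""
    n_features = len(rows[0])
    return sum(any(v is None for v in column(rows, j)) for j in range(n_features))

def fit_imputer(train_rows, use_median=False):
    """Per-feature fill value computed from the observed training entries only."""
    agg = statistics.median if use_median else statistics.fmean
    fills = []
    for j in range(len(train_rows[0])):
        observed = [v for v in column(train_rows, j) if v is not None]
        fills.append(agg(observed))
    return fills

def impute(rows, fills):
    """Replace each missing entry with the fill value of its feature."""
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in rows]

def fit_scaler(train_rows):
    """Per-feature (mean, standard deviation) of the imputed training data."""
    stats = []
    for j in range(len(train_rows[0])):
        col = column(train_rows, j)
        std = statistics.pstdev(col) or 1.0  # guard against constant features
        stats.append((statistics.fmean(col), std))
    return stats

def standardize(rows, stats):
    """Apply z-score scaling using previously fitted statistics."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

# Toy rows standing in for the real files; None marks a missing entry.
train = [[1.0, 10.0], [None, 20.0], [3.0, 30.0]]
test = [[2.0, None]]

n_missing = count_features_with_missing(train)
fills = fit_imputer(train)  # pass use_median=True for median filling
train_filled, test_filled = impute(train, fills), impute(test, fills)
stats = fit_scaler(train_filled)
train_norm = standardize(train_filled, stats)
test_norm = standardize(test_filled, stats)
```

Counting the 1's and 0's in the label column works the same way, for example with `collections.Counter` applied to the labels.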

1.2 Logistic Regression

Train logistic regression models with L1 regularization and L2 regularization using alpha = 0.1 and lambda = 0.1. Report accuracy, precision, recall, f1-score and print the confusion matrix.
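To make the difference between the two penalties concrete, here is a small from-scratch sketch. It assumes the objective is the mean log loss plus `alpha * sum(|w_j|)` for L1 or `lambda / 2 * sum(w_j^2)` for L2; the brief does not define how alpha and lambda enter the objective, so this mapping is an assumption, and a library implementation may parameterize the penalty differently. The L1 case uses proximal gradient descent (soft-thresholding), which is one of several possible solvers. All names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink towards zero, clip at zero."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def train_logreg(X, y, penalty="l2", strength=0.1, lr=0.1, epochs=500):
    """Batch (proximal) gradient descent; the bias term is not penalized.

    penalty="l1": strength plays the role of alpha (assumed meaning).
    penalty="l2": strength plays the role of lambda (assumed meaning).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        if penalty == "l2":
            # Gradient of (strength / 2) * w^2 is strength * w.
            w = [wj - lr * (gj + strength * wj) for wj, gj in zip(w, grad_w)]
        else:
            # Gradient step on the loss, then soft-threshold for the L1 term.
            w = [soft_threshold(wj - lr * gj, lr * strength) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Predict 1 when the estimated probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0 for xi in X]

# Toy data: feature 0 separates the classes, feature 1 carries little signal.
X = [[-2.0, 0.3], [-1.0, -0.2], [-1.5, 0.1], [1.0, -0.3], [2.0, 0.2], [1.5, -0.1]]
y = [0, 0, 0, 1, 1, 1]
w_l1, b_l1 = train_logreg(X, y, penalty="l1", strength=0.1)
w_l2, b_l2 = train_logreg(X, y, penalty="l2", strength=0.1)
```

On this toy data the L1 model keeps the weight of the uninformative feature at exactly zero, while the L2 model only shrinks it. That sparsity is why L1 regularization is also used for feature selection.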

1.3 Choosing the best hyper-parameter

For the L1 model, choose the best alpha value from the following set:

{0.1, 1, 3, 10, 33, 100, 333, 1000, 3333, 10000, 33333}.

For the L2 model, choose the best lambda value from the following set:

{0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 33}.

To choose the best hyperparameter (alpha/lambda) value, you have to do the following:

- For each value of hyperparameter, perform 100 random splits of training data into training and validation data.

- Find the average validation accuracy over the 100 train/validation pairs for each hyperparameter value. The best hyperparameter is the one that gives the maximum average validation accuracy.

Use the best alpha and lambda values to re-train your final L1 and L2 regularized models. Evaluate the prediction performance on the test data and report the following:

- Precision

- Accuracy

- The top 5 features selected in decreasing order of feature weights.

- Confusion matrix
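The reported quantities can all be derived from the four confusion-matrix counts. The sketch below assumes labels are 0/1 with 1 as the positive class, and it ranks features by the absolute value of their weight; the brief says only "decreasing order of feature weights", so ranking by magnitude is an assumption. Function and feature names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for 0/1 labels, with 1 as the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 computed from the confusion counts."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: true 0/1, cols: predicted 0/1
    }

def top_features(weights, names, k=5):
    """(name, weight) pairs ranked by decreasing absolute weight."""
    ranked = sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)
    return ranked[:k]

# Made-up labels and weights, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
top2 = top_features([0.2, -1.5, 0.0, 0.9], ["f0", "f1", "f2", "f3"], k=2)
```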

Finally, discuss if there is any sign of underfitting or overfitting with appropriate reasoning.
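The selection procedure in this section (repeated random splits, average validation accuracy, pick the maximum) is independent of the model being tuned, so it can be sketched with pluggable `fit` and `predict` callables. To keep the example short and self-contained, the demo uses a deliberately simple stand-in learner (a shrunken class-mean difference on one feature), not logistic regression. The validation fraction of 30%, the seed, and every name here are assumptions; the brief does not fix the split ratio.

```python
import math
import random
import statistics

def random_split(n, val_fraction, rng):
    """Shuffle indices 0..n-1 and return (train_indices, validation_indices)."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_val = max(1, int(round(n * val_fraction)))
    return idx[n_val:], idx[:n_val]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def select_hyperparameter(X, y, candidates, fit, predict,
                          n_splits=100, val_fraction=0.3, seed=0):
    """Average validation accuracy over n_splits random splits per candidate."""
    rng = random.Random(seed)
    mean_val_acc = {}
    for value in candidates:
        scores = []
        for _ in range(n_splits):
            tr, va = random_split(len(X), val_fraction, rng)
            model = fit([X[i] for i in tr], [y[i] for i in tr], value)
            preds = predict(model, [X[i] for i in va])
            scores.append(accuracy([y[i] for i in va], preds))
        mean_val_acc[value] = statistics.fmean(scores)
    # On ties, max() keeps the first candidate in the given order.
    best = max(mean_val_acc, key=mean_val_acc.get)
    return best, mean_val_acc

def fit_shrunken(X, y, alpha):
    """Stand-in learner: class-mean difference on feature 0, shrunk by alpha."""
    pos = [x[0] for x, t in zip(X, y) if t == 1]
    neg = [x[0] for x, t in zip(X, y) if t == 0]
    mean_pos, mean_neg = statistics.fmean(pos), statistics.fmean(neg)
    diff = mean_pos - mean_neg
    weight = math.copysign(max(abs(diff) - alpha, 0.0), diff)
    return weight, (mean_pos + mean_neg) / 2.0

def predict_shrunken(model, X):
    weight, midpoint = model
    return [1 if weight * (x[0] - midpoint) > 0 else 0 for x in X]

# Toy one-feature data: class 0 lies below -1, class 1 lies above +1.
X = [[-1.0 - 0.1 * i] for i in range(10)] + [[1.0 + 0.1 * i] for i in range(10)]
y = [0] * 10 + [1] * 10
best_alpha, mean_acc = select_hyperparameter(
    X, y, [0.1, 1, 10], fit_shrunken, predict_shrunken)
```

With the strongest penalty the stand-in weight is shrunk to zero and every point is predicted as class 0, so its average validation accuracy drops. This is the kind of underfitting signal the final discussion asks for. After selection, the final model is re-trained on the full training set with the chosen value and evaluated once on the test data.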

Task B - Multiclass Classification

For this experiment, we will use a small subset of the MNIST dataset for handwritten digits. This dataset has no missing data. You will have to implement a one-versus-rest scheme to perform multi-class classification using a binary classifier based on L1 regularized logistic regression.

2.1 Read and understand the data, create a default One-vs-Rest Classifier

1- Use the data from the file reduced_mnist.csv in the data directory. Begin by reading the data. Print the following information:

- Number of data points

- Total number of features

- Unique labels in the data

2- Split the data into 70% training data and 30% test data. Fit a One-vs-Rest Classifier (which uses a logistic regression classifier with alpha=1) on the training data, and report accuracy, precision and recall on the testing data.
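A one-versus-rest scheme trains one binary classifier per class (that class against all others) and predicts the class whose classifier gives the highest score. The sketch below is a minimal from-scratch illustration on toy 2-D data, not the expected submission. It assumes alpha multiplies the L1 norm of the weights in a mean-log-loss objective, and the demo uses a small alpha of 0.01 instead of the brief's alpha=1 because the appropriate scale depends on how the penalty is defined. All names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_binary_l1(X, y, alpha, lr=0.1, epochs=300):
    """L1-regularized logistic regression via proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        stepped = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        # Soft-thresholding: shrink each weight by lr * alpha, clipping at zero.
        w = [math.copysign(max(abs(s) - lr * alpha, 0.0), s) for s in stepped]
        b -= lr * grad_b
    return w, b

def fit_one_vs_rest(X, labels, alpha):
    """Train one 'class c versus everything else' model per distinct label."""
    models = {}
    for c in sorted(set(labels)):
        y_bin = [1 if t == c else 0 for t in labels]
        models[c] = fit_binary_l1(X, y_bin, alpha)
    return models

def predict_one_vs_rest(models, X):
    """Pick, for each point, the class whose binary model scores highest."""
    preds = []
    for xi in X:
        scores = {c: sum(wj * xj for wj, xj in zip(w, xi)) + b
                  for c, (w, b) in models.items()}
        preds.append(max(scores, key=scores.get))
    return preds

# Toy data: three well-separated clusters standing in for digit classes.
X = [[2.0, 0.1], [2.2, -0.1], [1.8, 0.0],
     [-1.0, 1.7], [-1.1, 1.9], [-0.9, 1.6],
     [-1.0, -1.7], [-1.2, -1.8], [-0.8, -1.6]]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
models = fit_one_vs_rest(X, labels, alpha=0.01)
train_preds = predict_one_vs_rest(models, X)
```

Comparing raw linear scores rather than thresholded 0/1 predictions avoids ties when several (or none) of the binary models claim a point.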

2.2 Choosing the best hyper-parameter

1- As in section 1.3 above, now create 10 random splits of training data into training and validation data. Choose the best value of alpha from the following set: {0.1, 1, 3, 10, 33, 100, 333, 1000, 3333, 10000, 33333}. To choose the best alpha hyperparameter value, you have to do the following:

- For each value of hyperparameter, perform 10 random splits of training data into training and validation data as described above.

- For each value of hyperparameter, use its 10 random splits and find the average training and validation accuracy.

- On a graph, plot both the average training accuracy (in red) and average validation accuracy (in blue) w.r.t. each hyperparameter setting. Comment on this graph by identifying regions of overfitting and underfitting.

- Print the best value of alpha hyperparameter.
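Before plotting, it can help to tabulate the two averaged curves and label each hyperparameter setting. The heuristic below is only a sketch: the tolerances `gap_tol` and `drop_tol` are arbitrary assumptions, the accuracy values are made up for illustration and are not results, and the function name is hypothetical. The same three lists can then be passed to a plotting library of your choice to draw the red (training) and blue (validation) curves.

```python
def summarize_curves(alphas, train_acc, val_acc, gap_tol=0.05, drop_tol=0.05):
    """Pick the alpha with the highest validation accuracy and label each setting.

    overfitting : training accuracy exceeds validation accuracy by more than gap_tol
    underfitting: validation accuracy is more than drop_tol below the best one
                  while the train/validation gap stays small
    """
    best_i = max(range(len(alphas)), key=lambda i: val_acc[i])
    best_val = val_acc[best_i]
    rows = []
    for a, tr, va in zip(alphas, train_acc, val_acc):
        if tr - va > gap_tol:
            region = "overfitting"
        elif best_val - va > drop_tol:
            region = "underfitting"
        else:
            region = "good fit"
        rows.append((a, tr, va, region))
    return alphas[best_i], rows

# Hypothetical averaged accuracies, one entry per alpha (illustration only).
alphas = [0.1, 1, 10, 100, 1000]
train_acc = [1.00, 0.99, 0.95, 0.85, 0.40]
val_acc = [0.86, 0.90, 0.92, 0.84, 0.39]

best_alpha, rows = summarize_curves(alphas, train_acc, val_acc)
for a, tr, va, region in rows:
    print(f"alpha={a:<6} train={tr:.2f} val={va:.2f} {region}")
print("best alpha:", best_alpha)
```

In these made-up numbers, a large gap between the curves marks the overfitting region, and both curves falling together marks the underfitting region. Which end of the alpha range corresponds to which region depends on how alpha enters the objective.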

2- Evaluate the prediction performance on test data and report the following:

- Total number of non-zero features in the final model.

- The confusion matrix

- Precision, recall and accuracy for each class.
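These per-class quantities follow from a multi-class confusion matrix by treating each class in turn as "positive" and all others as "negative". The sketch below makes two assumptions that the brief leaves open: "accuracy for each class" is read as the one-vs-rest accuracy (TP + TN) / total, and a feature counts as non-zero if at least one of the per-class weight vectors gives it a non-zero weight. Names and sample values are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] = number of points with true label i predicted as label j."""
    index = {c: i for i, c in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_report(matrix, labels):
    """Precision, recall and one-vs-rest accuracy for every class."""
    total = sum(sum(row) for row in matrix)
    report = {}
    for i, c in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # rest of the true-class row
        fp = sum(row[i] for row in matrix) - tp  # rest of the predicted column
        tn = total - tp - fn - fp
        report[c] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "accuracy": (tp + tn) / total,
        }
    return report

def count_nonzero_features(weight_vectors, tol=0.0):
    """Features whose weight is non-zero in at least one per-class model."""
    d = len(weight_vectors[0])
    return sum(any(abs(w[j]) > tol for w in weight_vectors) for j in range(d))

# Made-up predictions and weights, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
labels = [0, 1, 2]
matrix = confusion_matrix(y_true, y_pred, labels)
report = per_class_report(matrix, labels)
n_nonzero = count_nonzero_features([[0.0, 0.5, 0.0], [0.0, 0.0, -0.2]])
```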

Finally, discuss if there is any sign of underfitting or overfitting with appropriate reasoning.

Attachment:- Machine learning.zip


Reviews

len2108691

9/9/2018 11:18:48 PM

Criteria 2:

- Create 10 random splits of training data into training and validation data. For L1 model, choose the best alpha value from the provided set of values.
- Evaluate the prediction performance on test data and report requested results.
- Discuss if there is any sign of underfitting or overfitting with appropriate reasoning.

7 marks: Successfully completed all three tasks.
5 marks: Successfully completed any two of the three tasks.
4 marks: Successfully completed any one of the three tasks.
0 marks: Failed to complete any given task.

len2108691

9/9/2018 11:18:42 PM

PART 2

Criteria 1:

- Read and report requested properties of the provided data set.
- Split the data into 70% training data and 30% test data. Fit a One-vs-Rest Classifier.

Excellent (3 marks): Successfully completed all three tasks.
Good (2 marks): Successfully completed any two of the three tasks.
Fair (1 mark): Successfully completed any one of the three tasks.
Unsatisfactory (0 marks): Failed to complete any given task.

len2108691

9/9/2018 11:18:33 PM

Criteria 3:

- For L1 model, choose the best alpha value from the provided set of values.
- For L2 model, choose the best lambda value from the provided set of values.
- Evaluate the prediction performance on test data, report results and discuss if there is any sign of underfitting or overfitting with appropriate reasoning.

5 marks: Successfully completed all three tasks.
3 marks: Successfully completed any two of the three tasks.
2 marks: Successfully completed only one of the three tasks.
0 marks: Failed to complete any given task.

len2108691

9/9/2018 11:18:23 PM

Criteria 2:

- Train logistic regression model with L1 regularization using alpha = 0.1.
- Train logistic regression model with L2 regularization using lambda = 0.1.
- Report accuracy, precision, recall, f1-score and print the confusion matrix.

5 marks: Successfully completed all three tasks.
3 marks: Successfully completed any two of the three tasks.
2 marks: Successfully completed any one of the three tasks.
0 marks: Failed to complete any given task.

len2108691

9/9/2018 11:17:58 PM

PART 1

Criteria 1:

- Read the training and testing data. Print the number of features in the dataset.
- For the data label, print the total number of 1's and 0's in the training and testing data. Comment on the class distribution. Is it balanced or unbalanced?
- Print the number of features with missing entries.
- Fill the missing entries. For filling any feature, you can use either mean or median value of the feature values from observed entries.
- Normalize the training and testing data.

Excellent (3 marks): Successfully completed all four tasks.
Good (2 marks): Successfully completed at least 2 tasks and satisfactorily tried other tasks.
Fair (1 mark): Successfully completed only one task.
Unsatisfactory (0 marks): Failed to complete any task satisfactorily.

len2108691

9/9/2018 11:17:41 PM

This document supplies detailed information on assessment tasks for this unit.

Key information

- Due: 5th by 11.30pm AEST
- Weighting: 25%
- Word count: Max 30 pages

Learning Outcomes

This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning Outcomes (GLO):

- ULO 2: Work collaboratively and apply linear and logistic regression, and linear Support Vector Machines for designing accurate classifier. Related GLOs: GLO 1: Discipline knowledge and capabilities; GLO 4: Critical thinking; GLO 5: Problem solving.
- ULO 5: Implement model selection and compute relevant evaluation measure for a given problem. Related GLOs: GLO 1: Discipline knowledge and capabilities; GLO 4: Critical thinking.
