Experiment with various neural network parameters

Reference no: EM131746067

The point: this coursework is designed to give you experience with, and hence more understanding of:

- Overfitting: finding a classifier that does very well on your training data doesn't mean it will do well on unseen (test) data.
- The relationship between overfitting and complexity of the classifier - the more degrees of freedom in your classifier, the more chances it has to overfit the training data.
- The relationship between overfitting and the size of the training set.
- Bespoke machine learning: you don't have to just use one of the standard types of classifier - the 'client' may specifically want a certain type of classifier (here, a ruleset that works in a certain way), and you can develop algorithms that try to find the best possible such classifier.

Students wishing to complete the tasks below in other languages, such as R, Matlab or Python, are welcome to do so, assuming they have prior knowledge of these languages.

The task spec below assumes that the majority of the class uses Weka. Please adapt the instructions accordingly if you use a different programming language.

1. Convert the above files into ARFF format and load them into Weka.

Dealing with big data sets: in CW2, you were given several options for dealing with large data sets in Weka (increasing the heap size for the Weka GUI, using the Weka command line with an increased heap, wrapping the Weka command line in scripts that automate the experiments, or just reducing the size of the data set using Weka's randomization and attribute selection methods). You will have to make one such decision for this coursework, too.

2. Create folders on your computer to store classifiers, screenshots and results of all your experiments, as explained below.
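
The "command line with increased heap, wrapped in a script" option mentioned above can be sketched in Python roughly as follows. This is only an illustration under assumptions: `weka_j48_command` is a hypothetical helper, the file names `train.arff` and `test.arff`, the jar location `weka.jar` and the 4 GB heap are placeholders, and the script only builds and prints the commands rather than running them. The options used are the standard Weka ones: `-t` (training file), `-T` (test file), `-x` (number of cross-validation folds), and for J48 `-C` (pruning confidence) and `-M` (minimum instances per leaf).

```python
import shlex


def weka_j48_command(train_arff, test_arff=None, heap="4g", folds=10,
                     confidence=0.25, min_per_leaf=2, weka_jar="weka.jar"):
    """Build (but do not run) a Weka command line for J48 as an argument list.

    All default values and file names here are illustrative assumptions.
    """
    cmd = ["java", f"-Xmx{heap}", "-cp", weka_jar,
           "weka.classifiers.trees.J48", "-t", train_arff]
    if test_arff is not None:
        cmd += ["-T", test_arff]      # evaluate on a separate testing set
    else:
        cmd += ["-x", str(folds)]     # otherwise use k-fold cross-validation
    cmd += ["-C", str(confidence), "-M", str(min_per_leaf)]
    return cmd


if __name__ == "__main__":
    # Print one command per confidence value; each could be passed to
    # subprocess.run once the placeholder paths point at real files.
    for c in (0.1, 0.25, 0.5):
        print(shlex.join(weka_j48_command("train.arff", confidence=c)))
```

Redirecting the output of each command to a file in the results folder from step 2 would keep one record per experiment.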

Your coursework will consist of two parts: in Part 1 you will work with Decision trees, and in Part 2 with Linear Classifiers and Neural Networks.

For each of the two parts, you will do the following:

3. Using the provided data sets, and Weka's facility for 10-fold cross validation, run the classifier, and note its accuracy for varying learning parameters provided by Weka. (Below you will find more instructions on those.) Record all your findings and explain them. Make sure you understand and can explain logically the meaning of the confusion matrix, as well as the information contained in the "Detailed Accuracy" field: TP Rate, FP Rate, Precision, Recall, F Measure, ROC Area.

4. Use Visualization tools to analyze and understand the results: Weka has comprehensive tools for visualizing and manipulating Decision trees and Neural Networks.

5. Repeat steps 3 and 4, this time using the testing data set instead of Weka's cross validation.

6. Make new training and testing sets, by moving 3000 of the instances in the testing set into the training set. Then, repeat steps 3 and 4.

7. Make new training and testing sets again, this time enlarging the training set with 6000 instances from the testing set, and again repeat steps 3 and 4.

8. Analyse your results from the point of view of the problem of classifier over-fitting.
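
Steps 6 and 7 above amount to moving instances from one set to the other. A minimal Python sketch, assuming each data set has already been read into a list of rows (ARFF parsing is not shown) and assuming the moved instances are chosen at random, which the task does not specify; `move_instances` is a hypothetical helper, not a Weka function:

```python
import random


def move_instances(train, test, n, seed=0):
    """Return (new_train, new_test) with n randomly chosen test rows moved to train.

    Random selection with a fixed seed is an assumption made for
    reproducibility; the inputs are left unmodified.
    """
    if n > len(test):
        raise ValueError("cannot move more instances than the testing set holds")
    rng = random.Random(seed)
    picked = set(rng.sample(range(len(test)), n))
    moved = [row for i, row in enumerate(test) if i in picked]
    kept = [row for i, row in enumerate(test) if i not in picked]
    return train + moved, kept


if __name__ == "__main__":
    # Toy data standing in for the real sets; with the coursework data
    # n would be 3000 for step 6 and 6000 for step 7.
    toy_train = [("train", i) for i in range(10)]
    toy_test = [("test", i) for i in range(20)]
    new_train, new_test = move_instances(toy_train, toy_test, 5)
    print(len(new_train), len(new_test))
```

Keeping the seed fixed means the same instances are moved on every run, so results from different classifiers are compared on identical splits.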

Part 1. Decision tree learning.

In this part, you are asked to explore the following three decision tree algorithms implemented in Weka:
1. J48 Algorithm
2. User Classifier (This option allows you to construct decision trees semi-manually)
3. One other Decision tree algorithm.
You should compare their relative performance on the given data set. For this:
- Experiment with various decision tree parameters: binary splits or multiple branching, pruning, confidence threshold for pruning, and the minimal number of instances permissible per leaf.
- Experiment with their relative performance based on the output of confusion matrices as well as other metrics (TP Rate, FP Rate, Precision, Recall, F Measure, ROC Area). Note that different algorithms can perform differently on various metrics. Does this happen in your experiments? Discuss.
- When working with User Classifier, you will learn to work with both Data and Tree Visualizers in Weka. Please reduce the number of attributes as in CW2 to prototype more efficiently in Visualizers.
- Record all the above results by going through the steps 3-8.
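
To make the metrics named above concrete, the sketch below derives TP Rate, FP Rate, Precision, Recall and F Measure for one class from a confusion matrix. It assumes the matrix is laid out as in Weka's output, with rows as actual classes and columns as predicted classes; the function name and the example counts are made up for illustration. ROC Area is left out because it needs the classifier's per-instance scores, not just the counts.

```python
def per_class_metrics(matrix, cls):
    """Compute per-class metrics from a square confusion matrix.

    matrix[i][j] is the number of instances of actual class i
    that were predicted as class j.
    """
    n = len(matrix)
    tp = matrix[cls][cls]
    fn = sum(matrix[cls]) - tp                        # missed instances of cls
    fp = sum(matrix[i][cls] for i in range(n)) - tp   # other classes labelled cls
    tn = sum(sum(row) for row in matrix) - tp - fn - fp

    tp_rate = tp / (tp + fn) if tp + fn else 0.0      # identical to recall
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + tp_rate
    f_measure = 2 * precision * tp_rate / denom if denom else 0.0
    return {"tp_rate": tp_rate, "fp_rate": fp_rate, "precision": precision,
            "recall": tp_rate, "f_measure": f_measure}


if __name__ == "__main__":
    # Invented two-class example: 60 instances of class 0, 40 of class 1.
    example = [[50, 10],
               [5, 35]]
    for name, value in per_class_metrics(example, 0).items():
        print(f"{name}: {value:.3f}")
```

Computing the numbers by hand once, and checking them against Weka's "Detailed Accuracy" output for the same matrix, is a quick way to confirm how each column is defined.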

Part 2. Neural Networks.

In this part, you will work with the MultilayerPerceptron algorithm in Weka.

- Run MultilayerPerceptron. Experiment with various Neural Network parameters: add or remove nodes, layers and connections, and vary the learning rate, epochs, momentum and validation threshold.
- You will need to work with Weka's Neural Network Visualiser in order to perform some of the above tasks. You are allowed to use smaller data sets when working with the Visualiser.
- Experiment with relative performance of Neural Networks and changing parameters. Base your comparative study on the output of confusion matrices as well as other metrics (TP Rate, FP Rate, Precision, Recall, F Measure, ROC Area).
- Record all the above results by going through the steps 3-8.
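
As a rough intuition for what the learning rate, momentum and number of epochs control, the sketch below runs gradient descent with momentum on the one-dimensional function f(w) = w^2. It is a toy illustration only, not Weka's MultilayerPerceptron implementation; the function name, starting point and parameter values are assumptions chosen for the example.

```python
def train(learning_rate, momentum, epochs, w0=5.0):
    """Minimise f(w) = w**2 by gradient descent with momentum.

    Returns the loss after each epoch so different settings can be compared.
    """
    w, velocity = w0, 0.0
    losses = []
    for _ in range(epochs):
        grad = 2.0 * w                                    # derivative of w**2
        velocity = momentum * velocity - learning_rate * grad
        w += velocity                                     # momentum reuses past steps
        losses.append(w * w)
    return losses


if __name__ == "__main__":
    settings = [(0.1, 0.0), (0.1, 0.9), (1.1, 0.0)]      # (learning rate, momentum)
    for lr, mom in settings:
        final = train(lr, mom, epochs=50)[-1]
        print(f"learning_rate={lr} momentum={mom} final_loss={final:.3g}")
```

On this toy problem a small learning rate without momentum shrinks the loss steadily, while a learning rate above 1 overshoots the minimum on every step and the loss grows. Real networks behave less tidily, which is why the parameters have to be explored experimentally.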

9. Deep Learning and Deep Neural Networks have gained popularity recently. Do some research (using the web and the recommended textbook) to find out more about Deep Learning. Use algorithms and tools available in Weka or online. Write a one-page essay comparing Neural Networks and Deep Neural Networks.



You will submit:

(a) All sources with the evidence of conducted experiments: data sets, scripts, tables comparing the accuracy, screenshots, etc. Give a web link to them (GitHub, Bitbucket, Dropbox, own webpage…).

(b) A report of maximum FOUR sides of A4 (11 pt font, margins 2cm on all sides) for Honours BSc students and FIVE sides of A4 (11 pt font, margins 2cm on all sides) for MSc students.

Using the results and screenshots you recorded when completing the steps 3-8, write five sections, respectively entitled:

1. "Variation in performance with size of the training and testing sets"
2. "Variation in performance with change in the learning paradigm (Decision trees versus Neural Nets)"
3. "Variation in performance with varying learning parameters in Decision Trees"
4. "Variation in performance with varying learning parameters in Neural Networks"
5. "Variation in performance according to different metrics (TP Rate, FP Rate, Precision, Recall, F Measure, ROC Area)"
6. (Level 11 students) "Comparative analysis of Neural Networks and Deep Neural Networks"

In each of these sections you will speculate on the reasons that might underpin the performance variations that you see, considering general issues and also issues pertaining to this specific task. You are recommended to represent all your results in one or two big tables, to which you will refer from these five specific sections.

You will get up to 69 points (up to B1 grade) for completing the tasks 1-9 well and thoroughly (task 9 is for level 11 only) and giving a reasonable explanation of the obtained results. In order to get an A grade (70 points and higher), you will need to do well in tasks 1-8(9) but in addition, you will need to show substantial skill in either research or programming:

- Research skills: The submission must show original thinking and give a thorough, logical and technical description of the results that shows mastery of the tools and methods, and understanding of the underlying problems. The student should show an ability to ask his/her own research questions based on the CW material and successfully answer them.
- Programming skills: You will need to produce a sizeable piece of software covering some of the tasks 1-8/9.
