Avoiding Local Minima of Multi-Layered Networks - Artificial Intelligence

The error rate of a multi-layered network over a training set could be calculated as the number of misclassified instances. However, since there are several output nodes, any of which could misfire (for example, giving a value near 1 when it should have output 0, or vice versa), we can be more sophisticated in our error evaluation. In practice, the overall network error is calculated as:

[Formula image: 78_Avoiding Local Minima.png]
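
Written out, and going only by the verbal description in this section, the error has a sum-of-squares form along the following lines. The symbols are notation assumed here rather than taken from the original formula: D is the set of training examples, t_{kd} is the target value and o_{kd} the observed value of output unit k on example d. Some textbooks also scale the sum by 1/2, which does not change where the minima lie.

```latex
E(\vec{w}) \;=\; \sum_{d \in D} \; \sum_{k \in \text{outputs}} \left( t_{kd} - o_{kd} \right)^{2}
```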

This is not as complex as it first seems. The calculation simply involves working out the difference between the observed output and the target output for each output unit, squaring it to make sure it is positive, then adding up all these squared differences over every output unit and every example.
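
As a small sketch of that calculation, the function below sums the squared differences over every example and every output unit. The function name and the toy target and output values are made up for illustration; they do not come from any particular network.

```python
def network_error(targets, outputs):
    """Sum of squared (target - observed) differences.

    targets and outputs are lists with one row per training example,
    each row holding one value per output unit.
    """
    total = 0.0
    for target_row, output_row in zip(targets, outputs):
        for t, o in zip(target_row, output_row):
            total += (t - o) ** 2  # squaring keeps every term non-negative
    return total


# Two examples, two output units each (illustrative values only).
targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.9, 0.2], [0.1, 0.6]]
print(network_error(targets, outputs))
```

A unit that is badly wrong (0.6 against a target of 1) contributes far more to the total than one that is nearly right, which is the extra information a simple misclassification count throws away.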

Backpropagation can be seen as searching a space of network configurations (weights) in order to find a configuration with the least error, calculated in the above fashion. The more complex network structure means that the error surface being searched may have local minima. This is a problem for multi-layer networks, and we look at ways around it below. Even if a learned network is in a local minimum, it may still perform adequately, and multi-layer networks have been used to great effect in real-world situations (see Tom Mitchell's book for a description of an ANN which can drive a car).

One way to address the problem of local minima is to use random restarts, as discussed in the chapter on search techniques. Different random initial weights for the network can mean that it converges to different local minima, and the best of these can be taken as the learned ANN. Alternatively, as described in Mitchell's book, a "committee" of networks could be learned, with the (possibly weighted) average of their decisions taken as the overall decision for a given test example. Another option is to try to skip over some of the smaller local minima, as explained below.
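
The random restart idea can be sketched on a toy problem. Everything here is an assumption made for illustration: a single weight w stands in for the whole network, the "error surface" is the polynomial w^4 - 3w^2 + w (which has a shallow local minimum near w = 1.13 and a deeper one near w = -1.30), and plain gradient descent stands in for backpropagation training.

```python
import random


def error(w):
    # Toy one-weight error surface (hypothetical, not a real network).
    return w**4 - 3 * w**2 + w


def gradient(w):
    # Derivative of error(w) with respect to w.
    return 4 * w**3 - 6 * w + 1


def gradient_descent(start, learning_rate=0.01, steps=1000):
    # Follow the negative gradient from the given starting weight.
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


def random_restarts(n_restarts, rng, low=-2.0, high=2.0):
    # Train from several random starting weights and keep the result
    # with the lowest error.
    candidates = [gradient_descent(rng.uniform(low, high))
                  for _ in range(n_restarts)]
    return min(candidates, key=error)


best = random_restarts(20, random.Random(0))
print(best, error(best))
```

Which minimum a single run reaches depends only on where it starts, so keeping the best of several runs makes it more likely, though not certain, that the deeper minimum is found.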

Adding Momentum

Imagine a ball rolling down a hill. As it does so, it gains momentum, so that its speed increases and it becomes harder to stop. As it rolls down the hill towards the valley floor (the global minimum), it may occasionally wander into local hollows. However, the momentum it has gained may keep it rolling up and out of the hollow and back on track to the valley floor.

This crude analogy describes one heuristic technique for avoiding local minima, called, funnily enough, adding momentum. The method is simple: for each weight, remember the previous value of Δ which was added on to the weight in the last epoch. When updating that weight for the current epoch, add on a little of the previous Δ. How small to make this additional extra is controlled by a parameter α called the momentum, which is set to a value between 0 and 1.
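
A minimal sketch of that update for a single weight is given below. The function name and the learning_rate parameter are assumptions introduced here; the gradient argument stands for whatever error gradient backpropagation has computed for this weight in the current epoch.

```python
def momentum_step(weight, gradient, prev_delta, learning_rate=0.1, alpha=0.9):
    """Return the updated weight and the delta that was applied.

    prev_delta is the change added to this weight in the previous epoch;
    alpha (between 0 and 1) controls how much of it is carried over.
    """
    delta = -learning_rate * gradient + alpha * prev_delta
    return weight + delta, delta


# The previous change was +0.5, but the current gradient points the other way.
new_weight, new_delta = momentum_step(weight=1.0, gradient=2.0, prev_delta=0.5)
print(new_weight, new_delta)
```

In this call the gradient alone would move the weight down by 0.2, but the carried-over 0.9 * 0.5 outweighs it, so the weight still moves up. Setting alpha to 0 recovers the ordinary update.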

To see why this might help bypass local minima, note that if the weight change carries on in the direction it was going in the previous epoch, then the movement will be a little more pronounced in the current epoch. This effect is compounded as the search continues in the same direction. When the trend finally reverses, the search might be at the global minimum, in which case it is hoped that the momentum will not be enough to take it anywhere other than where it is. On the other hand, the search may be at a fairly narrow local minimum. In this case, even though the backpropagation algorithm dictates that Δ will change direction, the additional extra from the previous epoch (the momentum) may be enough to counteract this effect for a few steps. These few steps may be all that is needed to bypass the local minimum.
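
The effect can be seen on a toy one-weight error surface. The surface w^4 - 3w^2 + w, the starting point, the learning rate and the momentum value are all assumptions chosen for illustration; with other settings the outcome may differ.

```python
def gradient(w):
    # Derivative of the toy error surface E(w) = w**4 - 3*w**2 + w, which has
    # a shallow local minimum near w = 1.13 and a deeper one near w = -1.30.
    return 4 * w**3 - 6 * w + 1


def descend(start, alpha, learning_rate=0.01, steps=1000):
    # Gradient descent where each step adds alpha times the previous delta.
    w, delta = start, 0.0
    for _ in range(steps):
        delta = -learning_rate * gradient(w) + alpha * delta
        w += delta
    return w


plain = descend(2.0, alpha=0.0)          # no momentum
with_momentum = descend(2.0, alpha=0.9)  # momentum term switched on
print(plain, with_momentum)
```

Starting from w = 2, the run without momentum settles in the shallow hollow near w = 1.13, while with these particular settings the run with momentum carries on past it and settles in the deeper minimum near w = -1.30.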

In addition to getting over some local minima, when the gradient is constant in one direction, adding momentum will increase the size of the weight change after each epoch, and the network may converge more quickly. Notice that it is possible to have cases where (a) the momentum is not enough to carry the search out of a local minimum, or (b) the momentum carries the search out of the global minimum and into a local minimum. This is why the technique is a heuristic and should be used with some care (although it is used a great deal in practice).
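
The speed-up under a constant gradient can be checked with a few lines of arithmetic. The helper name and the parameter values below are assumptions for illustration. With a constant gradient g, learning rate η and momentum α, the size of each weight change grows from η·g towards η·g / (1 - α).

```python
def momentum_deltas(gradient, steps, learning_rate=0.1, alpha=0.5):
    # Weight changes produced by a gradient that stays the same every epoch.
    deltas = []
    prev_delta = 0.0
    for _ in range(steps):
        prev_delta = -learning_rate * gradient + alpha * prev_delta
        deltas.append(prev_delta)
    return deltas


deltas = momentum_deltas(gradient=1.0, steps=20)
limit = -0.1 * 1.0 / (1 - 0.5)  # -learning_rate * gradient / (1 - alpha)
print(deltas[:4], deltas[-1], limit)
```

With α = 0.5 the step size at most doubles; values of α closer to 1 give a larger boost, but they also make overshooting a minimum more likely, which is the trade-off described above.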

