
Learning rule

An artificial neural network's learning rule or learning process is a method, mathematical logic, or algorithm which improves the network's performance and/or training time. Usually, this rule is applied repeatedly over the network. It is done by updating the weight and bias levels of a network when the network is simulated in a specific data environment.[1] A learning rule may accept the existing conditions (weights and biases) of the network, and will compare the expected result and the actual result of the network to give new and improved values for the weights and biases.[2] Depending on the complexity of the actual model being simulated, the learning rule of the network can be as simple as an XOR gate or mean squared error, or as complex as the result of a system of differential equations.

The learning rule is one of the factors which decides how fast and how accurately the artificial network can be developed. Depending upon the process used to develop the network, there are three main models of machine learning:

  1. Unsupervised learning
  2. Supervised learning
  3. Reinforcement learning

Background

Many of the learning methods in machine learning work similarly to one another and build on each other, which makes it difficult to classify them into clear categories. They can, however, be broadly understood as four categories of learning methods, though these categories do not have clear boundaries and a given method often belongs to more than one of them:[3]

  1. Hebbian - Neocognitron, Brain-state-in-a-box[4]
  2. Gradient Descent - ADALINE, Hopfield Network, Recurrent Neural Network
  3. Competitive - Learning Vector Quantisation, Self-Organising Feature Map, Adaptive Resonance Theory
  4. Stochastic - Boltzmann Machine, Cauchy Machine


Although these learning rules might appear to be based on similar ideas, they do have subtle differences, as each is a generalisation or application of the previous rule; hence it makes sense to study them separately, based on their origins and intents.

Hebbian Learning

Hebbian learning was developed by Donald Hebb in 1949 to describe biological neuron firing. In the mid-1950s it was also applied to computer simulations of neural networks.

$\Delta w_i = \eta \, x_i \, y$

where $\eta$ represents the learning rate, $x_i$ represents the input of neuron $i$, and $y$ is the output of the neuron. It has been shown that Hebb's rule in its basic form is unstable. Oja's rule and BCM theory are other learning rules built on top of or alongside Hebb's rule in the study of biological neurons.
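As a minimal sketch (not part of the original formulation), the basic Hebb update can be written in a few lines of Python with NumPy; the linear output $y = w \cdot x$ and the learning-rate value are assumptions made for illustration:

```python
import numpy as np

def hebb_update(w, x, eta=0.5):
    """One basic Hebbian step: Delta w_i = eta * x_i * y."""
    y = np.dot(w, x)          # linear neuron output (an assumption of the sketch)
    return w + eta * x * y    # strengthen weights where input and output co-fire

# Repeatedly presenting one pattern makes the weights grow without bound,
# illustrating the instability of the basic rule noted above.
w = np.array([0.1, 0.1, 0.1])
x = np.array([1.0, 0.5, -0.5])
for step in range(10):
    w = hebb_update(w, x)
print(w.round(3))
```

Because nothing in the update opposes growth, the weight vector keeps lengthening; normalising rules such as Oja's rule were introduced precisely to fix this.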

Perceptron Learning Rule (PLR)

The perceptron learning rule originates from the Hebbian assumption, and was used by Frank Rosenblatt in his perceptron in 1958. The net input is passed to the activation (transfer) function, and the function's output is used for adjusting the weights. The learning signal is the difference between the desired response and the actual response of a neuron. The step function is often used as the activation function, and the outputs are generally restricted to -1, 0, or 1.

The weights are updated with

$w_{\text{new}} = w_{\text{old}} + \eta \, (t - o) \, x_i$

where $t$ is the target value, $o$ is the output of the perceptron, and $\eta$ is called the learning rate.

The algorithm converges to the correct classification if:[5]

  • the training data is linearly separable*
  • $\eta$ is sufficiently small (though smaller $\eta$ generally means a longer learning time and more epochs)

*It should also be noted that a single-layer perceptron with this learning rule is incapable of handling linearly non-separable inputs; hence the XOR problem cannot be solved using this rule alone.[6]
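A minimal Python sketch of the rule on a toy linearly separable problem (logical AND) may help; the {0, 1} output coding and the bias handled as an appended input column are choices made for the sketch, not mandated by the rule:

```python
import numpy as np

def train_perceptron(X, T, eta=0.1, epochs=100):
    """Perceptron learning rule: w <- w + eta * (t - o) * x.

    X already carries a bias column of ones; T holds targets in {0, 1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(X, T):
            o = 1 if np.dot(w, x) >= 0 else 0   # step activation
            if o != t:
                w += eta * (t - o) * x
                mistakes += 1
        if mistakes == 0:      # converged: every sample classified correctly
            return w
    return w

# Logical AND is linearly separable, so the rule converges;
# replacing T with the XOR targets [0, 1, 1, 0] would never converge.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([0, 0, 0, 1])
print(train_perceptron(X, T))
```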

Backpropagation

Seppo Linnainmaa is said to have developed the backpropagation algorithm in 1970,[7] but the origins of the algorithm go back to the 1960s, with many contributors. It is a generalisation of the least mean squares algorithm in the linear perceptron and of the delta learning rule.

It implements gradient descent search through the space of possible network weights, iteratively reducing the error between the target values and the network outputs.
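As an illustrative sketch, assuming a two-layer sigmoid network trained with a mean-squared-error loss (details the article does not fix), backpropagation on the XOR problem can be written as follows in Python:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# XOR: the classic task that a single-layer perceptron cannot learn
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
ones = np.ones((4, 1))                 # bias inputs

W1 = rng.normal(0.0, 1.0, (3, 4))      # (2 inputs + bias) -> 4 hidden units
W2 = rng.normal(0.0, 1.0, (5, 1))      # (4 hidden + bias) -> 1 output
eta = 0.5

for _ in range(20000):
    # forward pass
    Xb = np.hstack([X, ones])
    H = sigmoid(Xb @ W1)
    Hb = np.hstack([H, ones])
    O = sigmoid(Hb @ W2)
    # backward pass: propagate the error derivative layer by layer
    dO = (O - T) * O * (1 - O)            # output-layer delta (MSE loss)
    dH = (dO @ W2[:-1].T) * H * (1 - H)   # hidden-layer delta (bias row skipped)
    # gradient descent step on both weight matrices
    W2 -= eta * Hb.T @ dO
    W1 -= eta * Xb.T @ dH

print(O.round(2))   # typically close to [[0], [1], [1], [0]]
```

The network size, learning rate, and iteration count are arbitrary choices for the sketch; the essential point is that each layer's delta is computed from the one above it, which is what makes the procedure a generalisation of the single-unit delta rule below.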

Widrow-Hoff Learning (Delta Learning Rule)

Similar to the perceptron learning rule but with a different origin, it was developed for use in the ADALINE network, which differs from the perceptron mainly in terms of training. The weights are adjusted according to the weighted sum of the inputs (the net), whereas in the perceptron only the sign of the weighted sum is used to determine the output, which is thresholded to 0, -1, or +1. This makes ADALINE different from the normal perceptron.

Delta rule (DR) is similar to the Perceptron Learning Rule (PLR), with some differences:

  1. Error (δ) in DR is not restricted to the values 0, 1, or -1 (as in PLR), but may have any value, as the sketch below illustrates
  2. DR can be derived for any differentiable output/activation function f, whereas PLR works only for the threshold output function
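To make the first difference concrete, here is a minimal sketch of the Widrow-Hoff update for a linear unit in Python; the toy regression target is invented purely for illustration:

```python
import numpy as np

def delta_rule_epoch(w, X, T, eta=0.05):
    """One epoch of the Widrow-Hoff / delta rule for a linear unit.

    The unit is trained on its raw net input (o = w . x), so the error
    t - o can take any real value, unlike in the perceptron rule.
    """
    for x, t in zip(X, T):
        o = np.dot(w, x)               # linear output (the "net")
        w = w + eta * (t - o) * x      # least-mean-squares update
    return w

# Toy regression: learn t = 2*x1 - x2 + 0.5 (last column is the bias input).
rng = np.random.default_rng(1)
X = np.hstack([rng.uniform(-1, 1, (50, 2)), np.ones((50, 1))])
T = 2 * X[:, 0] - X[:, 1] + 0.5
w = np.zeros(3)
for _ in range(100):
    w = delta_rule_epoch(w, X, T)
print(w.round(3))   # approaches [2, -1, 0.5]
```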

The name delta rule is sometimes reserved for the case in which Widrow-Hoff learning is applied to binary targets specifically, but the terms are often used interchangeably. The delta rule is considered to be a special case of the back-propagation algorithm.

The delta rule also closely resembles the Rescorla-Wagner model, under which Pavlovian conditioning occurs.[8]

Competitive Learning

Competitive learning is considered a variant of Hebbian learning, but it is special enough to be discussed separately. Competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data.

Models and algorithms based on the principle of competitive learning include vector quantization and self-organizing maps (Kohonen maps).
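A minimal winner-take-all sketch in Python shows the mechanism; the two-cluster toy data and the prototype initialisation are assumptions made for illustration:

```python
import numpy as np

def competitive_step(W, x, eta=0.1):
    """Winner-take-all update: only the prototype closest to x moves toward it."""
    winner = np.argmin(np.linalg.norm(W - x, axis=1))   # the competition
    W[winner] += eta * (x - W[winner])                  # the specialization
    return W

# Two Gaussian clusters; two prototypes specialize, one per cluster.
rng = np.random.default_rng(2)
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                  rng.normal(1.0, 0.1, (50, 2))])
W = rng.uniform(0.0, 1.0, (2, 2))       # random initial prototypes
for _ in range(20):
    for x in rng.permutation(data):
        W = competitive_step(W, x)
print(W.round(2))   # typically one row near (0, 0) and one near (1, 1)
```

Because only the winning node is updated, each prototype ends up specialized to one region of the data, which is why the method is well suited to clustering.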

See also

  • Machine learning
  • Decision tree learning
  • Pattern recognition
  • Bias-variance dilemma
  • Bias of an estimator
  • Expectation-maximization algorithm

References

  1. ^ Simon Haykin (16 July 1998). "Chapter 2: Learning Processes". Neural Networks: A comprehensive foundation (2nd ed.). Prentice Hall. pp. 50–104. ISBN 978-8178083001. Retrieved 2 May 2012.
  2. ^ Russell, S.; Norvig, P. (1995). "Chapter 18: Learning from Examples". Artificial Intelligence: A Modern Approach (3rd ed.). Prentice Hall. pp. 693–859. ISBN 0-13-103805-2. Retrieved 20 Nov 2013.
  3. ^ Rajasekaran, Sundaramoorthy. (2003). Neural networks, fuzzy logic, and genetic algorithms : synthesis and applications. Pai, G. A. Vijayalakshmi. (Eastern economy ed.). New Delhi: Prentice-Hall of India. ISBN 81-203-2186-3. OCLC 56960832.
  4. ^ Golden, Richard M. (1986-03-01). "The "Brain-State-in-a-Box" neural model is a gradient descent algorithm". Journal of Mathematical Psychology. 30 (1): 73–80. doi:10.1016/0022-2496(86)90043-X. ISSN 0022-2496.
  5. ^ Sivanandam, S. N. (2007). Principles of soft computing. Deepa, S. N. (1st ed.). New Delhi: Wiley India. ISBN 978-81-265-1075-7. OCLC 760996382.
  6. ^ Minsky, Marvin; Papert, Seymour (1969). Perceptrons: An Introduction to Computational Geometry. Cambridge, Mass.: MIT Press. ISBN 0-262-13043-2. OCLC 5034.
  7. ^ Schmidhuber, Juergen (January 2015). "Deep Learning in Neural Networks: An Overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
  8. ^ Rescorla, Robert (2008-03-31). "Rescorla-Wagner model". Scholarpedia. 3 (3): 2237. Bibcode:2008SchpJ...3.2237R. doi:10.4249/scholarpedia.2237. ISSN 1941-6016.
