Mathematics of artificial neural networks

An artificial neural network (ANN) combines biological principles with advanced statistics to solve problems in domains such as pattern recognition and game-play. ANNs adopt the basic model of neuron analogues connected to each other in a variety of ways.

Structure

Neuron

A neuron with label $j$ receiving an input $p_j(t)$ from predecessor neurons consists of the following components:[1]

  • an activation $a_j(t)$, the neuron's state, depending on a discrete time parameter,
  • an optional threshold $\theta_j$, which stays fixed unless changed by learning,
  • an activation function $f$ that computes the new activation at a given time $t+1$ from $a_j(t)$, $\theta_j$ and the net input $p_j(t)$, giving rise to the relation
        $a_j(t+1) = f(a_j(t), p_j(t), \theta_j),$
  • and an output function $f_{\text{out}}$ computing the output from the activation
        $o_j(t) = f_{\text{out}}(a_j(t)).$

Often the output function is simply the identity function.

An input neuron has no predecessor but serves as input interface for the whole network. Similarly an output neuron has no successor and thus serves as output interface of the whole network.
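As a concrete illustration of these components, the following minimal Python sketch performs one discrete-time update of a single neuron; the specific choices here (a threshold step for $f$ and the identity for $f_{\text{out}}$) are assumptions made purely for the example, not part of the definition above.

    def neuron_step(a_j, p_j, theta_j):
        # a_j(t+1) = f(a_j(t), p_j(t), theta_j); as an illustrative f,
        # fire (activation 1) when the net input exceeds the threshold.
        return 1.0 if p_j > theta_j else 0.0

    def neuron_output(a_j):
        # o_j(t) = f_out(a_j(t)); here f_out is simply the identity.
        return a_j

    a_next = neuron_step(a_j=0.0, p_j=0.7, theta_j=0.5)
    print(neuron_output(a_next))  # 1.0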

Propagation function

The propagation function computes the input $p_j(t)$ to the neuron $j$ from the outputs $o_i(t)$ and typically has the form[1]

    $p_j(t) = \sum_i o_i(t) w_{ij}.$

Bias

A bias term can be added, changing the form to the following:[2]

    $p_j(t) = \sum_i o_i(t) w_{ij} + w_{0j},$  where $w_{0j}$ is a bias.
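A minimal Python sketch of the propagation function with an optional bias term; the predecessor outputs and weights below are plain lists chosen only for illustration.

    def net_input(outputs, weights, bias=0.0):
        # p_j(t) = sum_i o_i(t) * w_ij + w_0j (bias term optional)
        return sum(o * w for o, w in zip(outputs, weights)) + bias

    # three predecessor neurons feeding neuron j
    print(net_input([0.2, 1.0, 0.5], [0.4, -0.1, 0.3], bias=0.05))  # ~0.18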

Neural networks as functions

Neural network models can be viewed as defining a function that takes an input (observation) and produces an output (decision), $f \colon X \rightarrow Y$, or a distribution over $X$ or both $X$ and $Y$. Sometimes models are intimately associated with a particular learning rule. A common use of the phrase "ANN model" is really the definition of a class of such functions (where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons, number of layers or their connectivity).

Mathematically, a neuron's network function $f(x)$ is defined as a composition of other functions $g_i(x)$, which can further be decomposed into other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between functions. A widely used type of composition is the nonlinear weighted sum $f(x) = K\left(\sum_i w_i g_i(x)\right)$, where $K$ (commonly referred to as the activation function[3]) is some predefined function, such as the hyperbolic tangent, sigmoid function, softmax function, or rectifier function. The important characteristic of the activation function is that it provides a smooth transition as input values change, i.e. a small change in input produces a small change in output. The following refers to a collection of functions $g_i$ as a vector $g = (g_1, g_2, \ldots, g_n)$.
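To make the nonlinear weighted sum concrete, the sketch below evaluates $f(x) = K\left(\sum_i w_i g_i(x)\right)$ with the hyperbolic tangent as $K$ and two arbitrary component functions $g_i$; all concrete values are assumptions for demonstration only.

    import math

    def f(x, weights, gs, K=math.tanh):
        # nonlinear weighted sum: f(x) = K(sum_i w_i * g_i(x))
        return K(sum(w * g(x) for w, g in zip(weights, gs)))

    gs = [lambda x: x, lambda x: x * x]          # illustrative g_1, g_2
    print(f(0.5, weights=[0.8, -0.3], gs=gs))    # tanh(0.8*0.5 - 0.3*0.25)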

 
[Figure: ANN dependency graph]

This figure depicts such a decomposition of $f$, with dependencies between variables indicated by arrows. These can be interpreted in two ways.

The first view is the functional view: the input $x$ is transformed into a 3-dimensional vector $h$, which is then transformed into a 2-dimensional vector $g$, which is finally transformed into $f$. This view is most commonly encountered in the context of optimization.

The second view is the probabilistic view: the random variable $F = f(G)$ depends upon the random variable $G = g(H)$, which depends upon $H = h(X)$, which depends upon the random variable $X$. This view is most commonly encountered in the context of graphical models.

The two views are largely equivalent. In either case, for this particular architecture, the components of individual layers are independent of each other (e.g., the components of $g$ are independent of each other given their input $h$). This naturally enables a degree of parallelism in the implementation.
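The functional view can be sketched in NumPy as a chain of layer maps with the dimensions used in the figure (a 3-dimensional $h$ and a 2-dimensional $g$); the input dimension, the weights and the use of tanh are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(3, 2))   # x (2-dim, assumed) -> h (3-dim)
    W2 = rng.normal(size=(2, 3))   # h (3-dim) -> g (2-dim)
    W3 = rng.normal(size=(1, 2))   # g (2-dim) -> f (scalar)

    def forward(x):
        h = np.tanh(W1 @ x)        # components of h depend only on x
        g = np.tanh(W2 @ h)        # components of g depend only on h
        return W3 @ g              # final output f

    print(forward(np.array([0.1, -0.4])))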

 
[Figure: Two separate depictions of the recurrent ANN dependency graph]

Networks such as the previous one are commonly called feedforward, because their graph is a directed acyclic graph. Networks with cycles are commonly called recurrent. Such networks are commonly depicted in the manner shown at the top of the figure, where $f$ is shown as dependent upon itself. However, an implied temporal dependence is not shown.
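The implied temporal dependence of a recurrent network can be made explicit by unrolling in time. The sketch below uses one common state-update form, $h_t = \tanh(W h_{t-1} + U x_t)$, which is an assumed illustration rather than the specific model in the figure.

    import numpy as np

    rng = np.random.default_rng(1)
    W = rng.normal(size=(3, 3))    # recurrent weights (state -> state)
    U = rng.normal(size=(3, 2))    # input weights (input -> state)

    def run(xs, h0=np.zeros(3)):
        # unroll h_t = tanh(W h_{t-1} + U x_t) over an input sequence
        h = h0
        for x in xs:
            h = np.tanh(W @ h + U @ x)
        return h

    print(run([np.array([0.5, -0.2]), np.array([0.1, 0.3])]))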

Backpropagation

Backpropagation training algorithms fall into three categories:

  • steepest descent (with variable learning rate and momentum, resilient backpropagation);
  • quasi-Newton (Broyden–Fletcher–Goldfarb–Shanno, one step secant);
  • Levenberg–Marquardt and conjugate gradient (Fletcher–Reeves update, Polak–Ribière update, Powell–Beale restart, scaled conjugate gradient).[4]

Algorithm

Let $N$ be a network with $e$ connections, $m$ inputs and $n$ outputs.

Below, $x_1, x_2, \ldots$ denote vectors in $\mathbb{R}^m$, $y_1, y_2, \ldots$ vectors in $\mathbb{R}^n$, and $w_0, w_1, w_2, \ldots$ vectors in $\mathbb{R}^e$. These are called inputs, outputs and weights, respectively.

The network corresponds to a function $y = f_N(w, x)$ which, given a weight $w$, maps an input $x$ to an output $y$.

In supervised learning, a sequence of training examples $(x_1, y_1), \ldots, (x_p, y_p)$ produces a sequence of weights $w_0, w_1, \ldots, w_p$ starting from some initial weight $w_0$, usually chosen at random.

These weights are computed in turn: first compute $w_i$ using only $(x_i, y_i, w_{i-1})$ for $i = 1, \ldots, p$. The output of the algorithm is then $w_p$, giving a new function $x \mapsto f_N(w_p, x)$. The computation is the same in each step, hence only the case $i = 1$ is described.

$w_1$ is calculated from $(x_1, y_1, w_0)$ by considering a variable weight $w$ and applying gradient descent to the function $w \mapsto E(f_N(w, x_1), y_1)$ to find a local minimum, starting at $w = w_0$.

This makes $w_1$ the minimizing weight found by gradient descent.
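A minimal sketch of this step: here $f_N$ is taken to be a single linear unit and $E$ the squared error (both assumptions made only to keep the example small), and repeated gradient descent steps starting from $w_0$ yield $w_1$.

    import numpy as np

    def f_N(w, x):
        return np.dot(w, x)              # toy "network": one linear unit (assumed)

    def E(y_pred, y):
        return (y_pred - y) ** 2         # squared error

    x1, y1 = np.array([1.0, 2.0]), 3.0
    w = np.array([0.0, 0.0])             # w_0, the initial weight
    lr = 0.05
    for _ in range(100):                 # gradient descent on w -> E(f_N(w, x1), y1)
        grad = 2.0 * (f_N(w, x1) - y1) * x1   # analytic gradient for this toy case
        w = w - lr * grad
    print(w, E(f_N(w, x1), y1))          # w_1 and its (near-zero) error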

Learning pseudocode

To implement the algorithm above, explicit formulas are required for the gradient of the function $w \mapsto E(f_N(w, x), y)$, where the function is $E(y, y') = |y - y'|^2$.
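For this choice of $E$, the derivative with respect to the prediction is $2(y - y')$, which backpropagation then combines with the chain rule; a short NumPy sketch (names chosen for illustration):

    import numpy as np

    def E(y, y_target):
        return float(np.sum((y - y_target) ** 2))   # E(y, y') = |y - y'|^2

    def dE_dy(y, y_target):
        return 2.0 * (y - y_target)                 # gradient of E w.r.t. the prediction y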

The learning algorithm can be divided into two phases: propagation and weight update.

Propagation

Propagation involves the following steps:

  • Propagation forward through the network to generate the output value(s)
  • Calculation of the cost (error term)
  • Propagation of the output activations back through the network using the training pattern target to generate the deltas (the difference between the targeted and actual output values) of all output and hidden neurons.

Weight update

For each weight:

  • Multiply the weight's output delta and input activation to find the gradient of the weight.
  • Subtract the ratio (percentage) of the weight's gradient from the weight.

The learning rate is the ratio (percentage) that influences the speed and quality of learning. The greater the ratio, the faster the neuron trains, but the lower the ratio, the more accurate the training. The sign of the gradient of a weight indicates whether the error varies directly with or inversely to the weight. Therefore, the weight must be updated in the opposite direction, "descending" the gradient.
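As a sketch of the per-weight update just described (names chosen only for illustration): the weight's gradient is the product of its output delta and input activation, and the learning rate scales the step taken against that gradient.

    def update_weight(weight, delta, input_activation, learning_rate):
        gradient = delta * input_activation          # gradient of the error w.r.t. this weight
        return weight - learning_rate * gradient     # step in the opposite direction of the gradient

    print(update_weight(weight=0.4, delta=0.12, input_activation=0.9, learning_rate=0.5))  # 0.346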

Learning is repeated (on new batches) until the network performs adequately.

Pseudocode

Pseudocode for a stochastic gradient descent algorithm for training a three-layer network (one hidden layer):

    initialize network weights (often small random values)
    do
        for each training example named ex do
            prediction = neural-net-output(network, ex)                        // forward pass
            actual = teacher-output(ex)
            compute error (prediction - actual) at the output units
            compute Δw_h for all weights from hidden layer to output layer     // backward pass
            compute Δw_i for all weights from input layer to hidden layer      // backward pass continued
            update network weights                                             // input layer not modified by error estimate
    until error rate becomes acceptably low
    return the network

The lines labeled "backward pass" can be implemented using the backpropagation algorithm, which calculates the gradient of the error of the network regarding the network's modifiable weights.[5]
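One way to flesh out this pseudocode is the NumPy sketch below, which trains a three-layer network (one hidden layer) with sigmoid units and squared error by stochastic gradient descent. It is a sketch under those assumptions, and all names (train, neural_net_output, and the XOR example) are chosen here for illustration rather than taken from the sources.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def neural_net_output(W1, W2, x):
        # forward pass through one hidden layer; returns hidden and output activations
        hidden = sigmoid(W1 @ x)
        output = sigmoid(W2 @ hidden)
        return hidden, output

    def train(examples, n_in, n_hidden, n_out, lr=0.5, max_epochs=10000, tol=1e-3):
        rng = np.random.default_rng(0)
        W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))    # initialize network weights
        W2 = rng.normal(scale=0.5, size=(n_out, n_hidden))   # (small random values)
        for _ in range(max_epochs):
            total_error = 0.0
            for x, target in examples:                       # for each training example
                hidden, prediction = neural_net_output(W1, W2, x)   # forward pass
                error = prediction - target                  # error at the output units
                total_error += float(np.sum(error ** 2))
                # backward pass: deltas for output and hidden units (sigmoid derivative is a*(1-a))
                delta_out = error * prediction * (1.0 - prediction)
                delta_hidden = (W2.T @ delta_out) * hidden * (1.0 - hidden)
                # weight updates (the input layer itself is not modified by the error estimate)
                W2 -= lr * np.outer(delta_out, hidden)       # hidden -> output weights
                W1 -= lr * np.outer(delta_hidden, x)         # input -> hidden weights
            if total_error / len(examples) < tol:            # until the error is acceptably low
                break
        return W1, W2

    # example usage: train on the XOR truth table
    data = [(np.array([a, b]), np.array([float(a ^ b)])) for a in (0, 1) for b in (0, 1)]
    W1, W2 = train(data, n_in=2, n_hidden=3, n_out=1)
    for x, t in data:
        print(x, t, neural_net_output(W1, W2, x)[1])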

References

  1. ^ a b Zell, Andreas (2003). "chapter 5.2". Simulation neuronaler Netze [Simulation of Neural Networks] (in German) (1st ed.). Addison-Wesley. ISBN 978-3-89319-554-1. OCLC 249017987.
  2. ^ Dawson, Christian W. (1998). "An artificial neural network approach to rainfall-runoff modelling". Hydrological Sciences Journal. 43 (1): 47–66. doi:10.1080/02626669809492102.
  3. ^ "The Machine Learning Dictionary". www.cse.unsw.edu.au. Archived from the original on 2018-08-26. Retrieved 2019-08-18.
  4. ^ M. Forouzanfar; H. R. Dajani; V. Z. Groza; M. Bolic & S. Rajan (July 2010). Comparison of Feed-Forward Neural Network Training Algorithms for Oscillometric Blood Pressure Estimation. 4th Int. Workshop Soft Computing Applications. Arad, Romania: IEEE.
  5. ^ Werbos, Paul J. (1994). The Roots of Backpropagation. From Ordered Derivatives to Neural Networks and Political Forecasting. New York, NY: John Wiley & Sons, Inc.
