
Nonlinear conjugate gradient method

In numerical optimization, the nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization. For a quadratic function

$$f(x) = \|Ax - b\|^2,$$

the minimum of $f$ is obtained when the gradient is 0:

$$\nabla_x f = 2A^T (Ax - b) = 0.$$

Whereas linear conjugate gradient seeks a solution to the linear equation $A^T A x = A^T b$, the nonlinear conjugate gradient method is generally used to find the local minimum of a nonlinear function using its gradient $\nabla_x f$ alone. It works when the function is approximately quadratic near the minimum, which is the case when the function is twice differentiable at the minimum and the second derivative is non-singular there.
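As an illustrative check (not part of the original article), the following NumPy snippet verifies numerically that the gradient $2A^T(Ax - b)$ vanishes at the least-squares solution of $A^T A x = A^T b$; the test matrix and vector are arbitrary.

```python
import numpy as np

# For f(x) = ||Ax - b||^2, the gradient 2 A^T (Ax - b) is zero exactly at the
# least-squares solution, i.e. the solution of the normal equations A^T A x = A^T b.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 5))
b = rng.normal(size=8)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares minimizer
grad = 2 * A.T @ (A @ x_star - b)                # gradient of ||Ax - b||^2 at x_star

print(np.allclose(grad, 0.0))                    # True: the gradient vanishes at the minimum
```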

Given a function $f(x)$ of $N$ variables to minimize, its gradient $\nabla_x f$ indicates the direction of maximum increase. One simply starts in the opposite (steepest descent) direction:

$$\Delta x_0 = -\nabla_x f(x_0)$$

with an adjustable step length $\alpha$ and performs a line search in this direction until it reaches the minimum of $f$:

$$\alpha_0 := \arg\min_\alpha f(x_0 + \alpha \Delta x_0),$$
$$x_1 = x_0 + \alpha_0 \Delta x_0.$$

After this first iteration in the steepest direction $\Delta x_0$, the following steps constitute one iteration of moving along a subsequent conjugate direction $s_n$, where $s_0 = \Delta x_0$:

  1. Calculate the steepest direction: $\Delta x_n = -\nabla_x f(x_n)$,
  2. Compute $\beta_n$ according to one of the formulas below,
  3. Update the conjugate direction: $s_n = \Delta x_n + \beta_n s_{n-1}$,
  4. Perform a line search: optimize $\alpha_n = \arg\min_\alpha f(x_n + \alpha s_n)$,
  5. Update the position: $x_{n+1} = x_n + \alpha_n s_n$.

With a pure quadratic function the minimum is reached within N iterations (barring roundoff error), but a non-quadratic function makes slower progress. Subsequent search directions lose conjugacy, requiring the search direction to be reset to the steepest descent direction at least every N iterations, or sooner if progress stops. However, resetting every iteration turns the method into steepest descent. The algorithm stops when it finds the minimum, determined when no progress is made after a direction reset (i.e., in the steepest descent direction), or when some tolerance criterion is reached.
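The iteration above translates directly into code. The sketch below is an illustrative implementation, not a reference one: it uses the Fletcher–Reeves formula for $\beta_n$, replaces the exact line search with a simple Armijo backtracking search, and resets to steepest descent every N iterations; all function names, tolerances, and the choice of line search are assumptions made for this example.

```python
import numpy as np

def backtracking_line_search(f, x, direction, grad, t=1.0, shrink=0.5, c=1e-4):
    """Inexact Armijo line search; stands in for the exact arg-min in step 4."""
    descent = grad @ direction            # negative for a descent direction
    while f(x + t * direction) > f(x) + c * t * descent:
        t *= shrink
        if t < 1e-12:                     # give up rather than loop forever
            break
    return t

def nonlinear_cg(f, grad_f, x0, max_iter=1000, tol=1e-8):
    """Nonlinear CG with the Fletcher-Reeves beta and a reset every N iterations."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    dx = -grad_f(x)                       # step 1: steepest-descent direction
    s = dx                                # s_0 = dx_0
    for k in range(max_iter):
        if np.linalg.norm(dx) < tol:      # stop when the gradient is numerically zero
            break
        alpha = backtracking_line_search(f, x, s, -dx)   # step 4
        x = x + alpha * s                                # step 5
        dx_new = -grad_f(x)                              # step 1 for the next pass
        if (k + 1) % n == 0:
            beta = 0.0                                   # periodic reset to steepest descent
        else:
            beta = (dx_new @ dx_new) / (dx @ dx)         # step 2: Fletcher-Reeves
        s = dx_new + beta * s                            # step 3
        dx = dx_new
    return x

# Same quadratic as in the introduction: f(x) = ||A x - b||^2 with an invertible A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: float(np.sum((A @ x - b) ** 2))
g = lambda x: 2 * A.T @ (A @ x - b)
print(nonlinear_cg(f, g, np.zeros(2)))    # close to the exact minimizer ...
print(np.linalg.solve(A, b))              # ... which is A^{-1} b = [0.6, -0.8]
```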

Within a linear approximation, the parameters $\alpha$ and $\beta$ are the same as in the linear conjugate gradient method but have been obtained with line searches. The conjugate gradient method can follow narrow (ill-conditioned) valleys, where the steepest descent method slows down and follows a criss-cross pattern.

Four of the best known formulas for $\beta_n$ are named after their developers:

  • Fletcher–Reeves:[1]
    $$\beta_n^{FR} = \frac{\Delta x_n^T \Delta x_n}{\Delta x_{n-1}^T \Delta x_{n-1}}$$
  • Polak–Ribière:[2]
    $$\beta_n^{PR} = \frac{\Delta x_n^T (\Delta x_n - \Delta x_{n-1})}{\Delta x_{n-1}^T \Delta x_{n-1}}$$
  • Hestenes–Stiefel:[3]
    $$\beta_n^{HS} = \frac{\Delta x_n^T (\Delta x_n - \Delta x_{n-1})}{-s_{n-1}^T (\Delta x_n - \Delta x_{n-1})}$$
  • Dai–Yuan:[4]
    $$\beta_n^{DY} = \frac{\Delta x_n^T \Delta x_n}{-s_{n-1}^T (\Delta x_n - \Delta x_{n-1})}$$

These formulas are equivalent for a quadratic function, but for nonlinear optimization the preferred formula is a matter of heuristics or taste. A popular choice is $\beta = \max\{0, \beta^{PR}\}$, which provides a direction reset automatically.[5]
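Written out in code, the four formulas and the safeguarded $\max\{0, \beta^{PR}\}$ variant look as follows (an illustrative sketch; the function names and the NumPy-array calling convention are choices made for this example):

```python
import numpy as np

# dx_new, dx_old: current and previous steepest-descent directions (negative gradients);
# s_old: previous conjugate direction. All are 1-D NumPy arrays.

def beta_fletcher_reeves(dx_new, dx_old, s_old):
    return (dx_new @ dx_new) / (dx_old @ dx_old)

def beta_polak_ribiere(dx_new, dx_old, s_old):
    return (dx_new @ (dx_new - dx_old)) / (dx_old @ dx_old)

def beta_hestenes_stiefel(dx_new, dx_old, s_old):
    return (dx_new @ (dx_new - dx_old)) / (-s_old @ (dx_new - dx_old))

def beta_dai_yuan(dx_new, dx_old, s_old):
    return (dx_new @ dx_new) / (-s_old @ (dx_new - dx_old))

def beta_pr_plus(dx_new, dx_old, s_old):
    # The popular safeguarded choice beta = max(0, beta^PR): a negative value
    # triggers an automatic reset to the steepest-descent direction.
    return max(0.0, beta_polak_ribiere(dx_new, dx_old, s_old))

# Tiny usage example with arbitrary vectors.
dx_old = np.array([1.0, -2.0]); dx_new = np.array([0.5, 0.3]); s_old = np.array([1.2, -1.9])
print(beta_pr_plus(dx_new, dx_old, s_old))
```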

Algorithms based on Newton's method potentially converge much faster. There, both step direction and length are computed from the gradient as the solution of a linear system of equations, with the coefficient matrix being the exact Hessian matrix (for Newton's method proper) or an estimate thereof (in the quasi-Newton methods, where the observed change in the gradient during the iterations is used to update the Hessian estimate). For high-dimensional problems, the exact computation of the Hessian is usually prohibitively expensive, and even its storage can be problematic, requiring $O(N^2)$ memory (but see the limited-memory L-BFGS quasi-Newton method).
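Both families of methods are available in standard libraries; for example, SciPy's minimize offers a nonlinear conjugate gradient solver (method="CG") alongside the limited-memory quasi-Newton solver (method="L-BFGS-B"). A minimal usage sketch, assuming SciPy is installed:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])

# Nonlinear conjugate gradient (SciPy's 'CG' solver).
res_cg = minimize(rosen, x0, jac=rosen_der, method="CG")

# Limited-memory quasi-Newton (L-BFGS-B), which typically needs fewer iterations.
res_lbfgs = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B")

print(res_cg.nit, res_lbfgs.nit)   # iteration counts; both solutions end near [1, 1]
```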

The conjugate gradient method can also be derived using optimal control theory.[6] In this accelerated optimization theory, the conjugate gradient method falls out as a nonlinear optimal feedback controller,

$$u = k(x, \dot{x}) := -\gamma_a \nabla_x f(x) - \gamma_b \dot{x}$$

for the double integrator system,

$$\ddot{x} = u.$$

The quantities $\gamma_a > 0$ and $\gamma_b > 0$ are variable feedback gains.[6]
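A minimal simulation sketch of this viewpoint (an illustration only; it uses constant rather than variable gains and a plain Euler discretization, neither of which comes from the cited work): the double integrator $\ddot{x} = u$ under the feedback law above is driven toward a minimizer of $f$.

```python
import numpy as np

def accelerated_descent(grad_f, x0, gamma_a=1.0, gamma_b=2.0, dt=0.01, steps=5000):
    """Euler simulation of x'' = -gamma_a * grad_f(x) - gamma_b * x' with constant gains."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)                            # velocity x-dot
    for _ in range(steps):
        u = -gamma_a * grad_f(x) - gamma_b * v      # feedback control law
        x = x + dt * v                              # integrate position
        v = v + dt * u                              # integrate velocity
    return x

# Same quadratic objective as above: f(x) = ||A x - b||^2.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda x: 2 * A.T @ (A @ x - b)
print(accelerated_descent(grad, np.zeros(2)))       # approaches the minimizer [0.6, -0.8]
```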

See also

  • Gradient descent
  • Broyden–Fletcher–Goldfarb–Shanno algorithm
  • Conjugate gradient method
  • L-BFGS (limited-memory BFGS)
  • Nelder–Mead method
  • Wolfe conditions

References

  1. ^ Fletcher, R.; Reeves, C. M. (1964). "Function minimization by conjugate gradients". The Computer Journal. 7 (2): 149–154. doi:10.1093/comjnl/7.2.149.
  2. ^ Polak, E.; Ribière, G. (1969). "Note sur la convergence de méthodes de directions conjuguées". Revue Française d'Automatique, Informatique, Recherche Opérationnelle. 3 (1): 35–43.
  3. ^ Hestenes, M. R.; Stiefel, E. (1952). "Methods of Conjugate Gradients for Solving Linear Systems". Journal of Research of the National Bureau of Standards. 49 (6): 409–436. doi:10.6028/jres.049.044.
  4. ^ Dai, Y.-H.; Yuan, Y. (1999). "A nonlinear conjugate gradient method with a strong global convergence property". SIAM Journal on Optimization. 10 (1): 177–182. doi:10.1137/S1052623497318992.
  5. ^ Shewchuk, J. R. (August 1994). "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain" (PDF).
  6. ^ a b Ross, I. M. (2019). "An Optimal Control Theory for Accelerated Optimization". arXiv:1902.09004 [math.OC].
