
Barzilai-Borwein method

The Barzilai-Borwein method[1] is an iterative gradient descent method for unconstrained optimization that uses either of two step sizes derived from the linear trend of the most recent two iterates. This method, and modifications of it, are globally convergent under mild conditions[2][3] and perform competitively with conjugate gradient methods for many problems.[4] Because it uses only gradient values and does not depend on the objective itself, it can also solve some systems of linear and non-linear equations.

Method

To minimize a convex function $f:\mathbb{R}^n \to \mathbb{R}$ with gradient vector $g$ at point $x$, let there be two prior iterates, $(g_{k-1}, x_{k-1})$ and $(g_k, x_k)$, in which $x_k = x_{k-1} - \alpha_{k-1} g_{k-1}$ where $\alpha_{k-1}$ is the previous iteration's step size (not necessarily a Barzilai-Borwein step size), and for brevity, let $\Delta x = x_k - x_{k-1}$ and $\Delta g = g_k - g_{k-1}$.

A Barzilai-Borwein (BB) iteration is $x_{k+1} = x_k - \alpha_k g_k$, where the step size $\alpha_k$ is either

[long BB step] $\alpha_k^{LONG} = \dfrac{\Delta x \cdot \Delta x}{\Delta x \cdot \Delta g}$, or

[short BB step] $\alpha_k^{SHORT} = \dfrac{\Delta x \cdot \Delta g}{\Delta g \cdot \Delta g}$.
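A minimal sketch of this iteration in Python/NumPy is shown below. It is illustrative only: the function and parameter names (grad, x0, alpha0, tol, max_iter, long_step) are not from the original sources, and the non-monotone safeguarding discussed under Modifications below is omitted.

```python
import numpy as np

def bb_gradient_descent(grad, x0, alpha0=1e-4, tol=1e-8, max_iter=1000, long_step=True):
    """Barzilai-Borwein gradient descent (minimal sketch, no safeguarding)."""
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    x = x_prev - alpha0 * g_prev              # first iteration: plain gradient step
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        dx = x - x_prev                       # Delta x
        dg = g - g_prev                       # Delta g
        if long_step:
            alpha = dx.dot(dx) / dx.dot(dg)   # long BB step
        else:
            alpha = dx.dot(dg) / dg.dot(dg)   # short BB step
        x_prev, g_prev = x, g
        x = x - alpha * g                     # x_{k+1} = x_k - alpha_k g_k
    return x
```

For example, with a symmetric positive-definite matrix A and vector b, bb_gradient_descent(lambda x: A @ x - b, np.zeros(len(b))) minimizes the quadratic $\tfrac{1}{2}x^T A x - b^T x$; as noted next, the same iteration with $g(x)$ in place of the gradient applies to solving $g(x)=0$.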

Barzilai-Borwein also applies to systems of equations $g(x) = 0$ for $g:\mathbb{R}^n \to \mathbb{R}^n$ in which the Jacobian of $g$ is positive-definite in the symmetric part, that is, $\Delta x \cdot \Delta g$ is necessarily positive.

Derivation

Despite its simplicity and optimality properties, Cauchy's classical steepest-descent method[5] for unconstrained optimization often performs poorly.[6] This has motivated many to propose alternate search directions, such as the conjugate gradient method. Jonathan Barzilai and Jonathan Borwein instead proposed new step sizes for the gradient direction by approximating the quasi-Newton method: they form a scalar approximation of the Hessian from the finite difference between the gradients at the two most recent iterates.

In a quasi-Newton iteration,

$x_{k+1} = x_k - B^{-1} g(x_k)$

where $B$ is some approximation of the Jacobian matrix of $g$ (i.e. the Hessian of the objective function) which satisfies the secant equation $B_k \Delta x_k = \Delta g_k$. Barzilai and Borwein simplify $B$ to a scalar $1/\alpha$, which usually cannot satisfy the secant equation exactly, but approximates it as $\frac{1}{\alpha}\Delta x \approx \Delta g$. Approximations by two least-squares criteria are:

[1] Minimize $\left\|\tfrac{1}{\alpha}\Delta x - \Delta g\right\|^2$ with respect to $\alpha$, yielding the long BB step, or

[2] Minimize $\left\|\Delta x - \alpha \Delta g\right\|^2$ with respect to $\alpha$, yielding the short BB step.
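Both least-squares problems are scalar and their closed-form minimizers recover the two BB formulas; the following illustrative NumPy check (random vectors standing in for $\Delta x$ and $\Delta g$) confirms this numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
dx = rng.standard_normal(5)                  # stands in for Delta x
dg = rng.standard_normal(5)                  # stands in for Delta g

# [1] minimizing ||(1/alpha) dx - dg||^2 over 1/alpha gives 1/alpha = (dx.dg)/(dx.dx),
#     i.e. alpha equals the long BB step:
alpha_long = dx.dot(dx) / dx.dot(dg)

# [2] minimizing ||dx - alpha dg||^2 over alpha gives the short BB step:
alpha_short = dx.dot(dg) / dg.dot(dg)

# brute-force check of [2] on a grid of candidate step sizes
alphas = np.linspace(alpha_short - 1, alpha_short + 1, 2001)
residuals = [np.linalg.norm(dx - a * dg) for a in alphas]
print(alpha_short, alphas[int(np.argmin(residuals))])   # approximately equal
```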

Properties

In one dimension, both BB step sizes are equal, and the BB iteration coincides with the classical secant method.

The long BB step size is the same as a linearized Cauchy step, i.e. the first estimate of a secant-method line search (also exact for linear problems). The short BB step size is the same as a linearized minimum-residual step. BB applies these step sizes to the forward direction vector for the next iterate, rather than to the prior direction vector as another line-search step would.

Barzilai and Borwein proved that their method converges R-superlinearly for quadratic minimization in two dimensions. Raydan[2] demonstrates convergence in general for quadratic problems. Convergence is usually non-monotone; that is, neither the objective function nor the residual or gradient magnitude necessarily decreases with each iteration along a successful convergence toward the solution.

If $f$ is a quadratic function with Hessian $A$, $1/\alpha^{LONG}$ is the Rayleigh quotient of $A$ by the vector $\Delta x$, and $1/\alpha^{SHORT}$ is the Rayleigh quotient of $A$ by the vector $\sqrt{A}\,\Delta x$ (here taking $\sqrt{A}$ as a solution to $(\sqrt{A})^T\sqrt{A} = A$, more at Definite matrix).

Fletcher[4] compared its computational performance to conjugate gradient (CG) methods, finding that CG tends to be faster for linear problems, but BB is often faster than applicable CG-based methods for non-linear problems.

BB has low storage requirements, making it suitable for large systems with millions of elements in $x$.

$\dfrac{\alpha^{SHORT}}{\alpha^{LONG}} = \cos^2\theta$, where $\theta$ is the angle between $\Delta x$ and $\Delta g$.
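For a quadratic objective these identities are easy to verify numerically. The sketch below is illustrative: the symmetric square root computed from the eigendecomposition is one particular solution of $(\sqrt{A})^T\sqrt{A} = A$, and the matrix, vector, and step used to generate two iterates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                  # symmetric positive-definite Hessian
b = rng.standard_normal(n)
grad = lambda x: A @ x - b                   # gradient of f(x) = 0.5 x^T A x - b^T x

x_prev = rng.standard_normal(n)
x = x_prev - 1e-3 * grad(x_prev)             # a plain gradient step gives two iterates
dx = x - x_prev
dg = grad(x) - grad(x_prev)                  # equals A @ dx for a quadratic

alpha_long = dx @ dx / (dx @ dg)
alpha_short = dx @ dg / (dg @ dg)

# Rayleigh-quotient identities (using the symmetric square root of A)
eigvals, eigvecs = np.linalg.eigh(A)
sqrtA = eigvecs @ np.diag(np.sqrt(eigvals)) @ eigvecs.T
v = sqrtA @ dx
print(1 / alpha_long,  dx @ A @ dx / (dx @ dx))   # equal
print(1 / alpha_short, v @ A @ v / (v @ v))       # equal

# Ratio of the two step sizes is cos^2 of the angle between dx and dg
cos = dx @ dg / (np.linalg.norm(dx) * np.linalg.norm(dg))
print(alpha_short / alpha_long, cos ** 2)         # equal
```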

Modifications and related methods

Since being demonstrated by Raydan,[3] BB is often applied with the non-monotone safeguarding strategy of Grippo, Lampariello, and Lucidi.[7] This tolerates some rise of the objective, but excessive rise initiates a backtracking line search using smaller step sizes, to assure global convergence. Fletcher[4] finds that allowing wider limits for non-monotonicity tends to result in more efficient convergence.
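One common way to combine a BB step with this kind of non-monotone safeguard is sketched below. This is an illustrative sketch, not the specific algorithm of the cited papers: the parameter values (gamma, sigma, max_backtracks) are arbitrary choices, and the caller is assumed to maintain f_history as the last few (e.g. 10) objective values.

```python
import numpy as np

def safeguarded_bb_step(f, grad_x, x, alpha_bb, f_history,
                        gamma=1e-4, sigma=0.5, max_backtracks=30):
    """One BB step with a Grippo-Lampariello-Lucidi style non-monotone safeguard.

    The trial point x - alpha * g is accepted if the objective there does not
    exceed the maximum of the recent objective values (f_history) minus a
    sufficient-decrease margin; otherwise the step size is shrunk and retried.
    """
    f_ref = max(f_history)                     # non-monotone reference value
    g = grad_x                                 # gradient at the current point x
    alpha = alpha_bb
    for _ in range(max_backtracks):
        x_trial = x - alpha * g
        if f(x_trial) <= f_ref - gamma * alpha * g.dot(g):
            break                              # sufficient (non-monotone) decrease
        alpha *= sigma                         # shrink the step and retry
    return x_trial, alpha
```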

Others[8][9][10][11] have identified a step size that is the geometric mean of the long and short BB step sizes, which exhibits similar properties.
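From the long and short step formulas above, this geometric-mean step size has a simple closed form:

$$\alpha_k^{GEO} \;=\; \sqrt{\alpha_k^{LONG}\,\alpha_k^{SHORT}} \;=\; \sqrt{\frac{\Delta x \cdot \Delta x}{\Delta x \cdot \Delta g}\cdot\frac{\Delta x \cdot \Delta g}{\Delta g \cdot \Delta g}} \;=\; \frac{\|\Delta x\|}{\|\Delta g\|}.$$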

References

  1. Barzilai, Jonathan; Borwein, Jonathan M. (1988). "Two-Point Step Size Gradient Methods". IMA Journal of Numerical Analysis. 8: 141–148. doi:10.1093/imanum/8.1.141.
  2. Raydan, Marcos (1993). "On the Barzilai and Borwein choice of steplength for the gradient method". IMA Journal of Numerical Analysis. 13 (3): 321–326. doi:10.1093/imanum/13.3.321. hdl:1911/101676.
  3. Raydan, M. (1997). "The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem". SIAM Journal on Optimization. 7: 26–33.
  4. Fletcher, R. (2005). "On the Barzilai–Borwein Method". In Qi, L.; Teo, K.; Yang, X. (eds.). Optimization and Control with Applications. Applied Optimization. Vol. 96. Boston: Springer. pp. 235–256. ISBN 0-387-24254-6.
  5. Cauchy, A. (1847). "Méthode générale pour la résolution des systèmes d'équations simultanées". C. R. Acad. Sci. Paris. 25: 536–538.
  6. Akaike, H. (1959). "On a successive transformation of probability distribution and its application to the analysis of the optimum gradient method". Ann. Inst. Statist. Math. Tokyo. 11: 1–17.
  7. Grippo, L.; Lampariello, F.; Lucidi, S. (1986). "A nonmonotone line search technique for Newton's method". SIAM J. Numer. Anal. 23: 707–716.
  8. Varadhan, R.; Roland, C. (2008). "Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm". Scandinavian Journal of Statistics. 35 (2): 335–353.
  9. Dai, Y. H.; Al-Baali, M.; Yang, X. (2015). "A positive Barzilai-Borwein-like stepsize and an extension for symmetric linear systems". In Numerical Analysis and Optimization. Cham, Switzerland: Springer. pp. 59–75.
  10. Dai, Yu-Hong; Huang, Yakui; Liu, Xin-Wei (2018). "A family of spectral gradient methods for optimization". arXiv:1812.02974 [math.OC].
  11. Huang, Shuai; Wan, Zhong (2017). "A new nonmonotone spectral residual method for nonsmooth nonlinear equations". Journal of Computational and Applied Mathematics. 313: 82–101.

External links

  • Jonathan Barzilai
