
Frank–Wolfe algorithm

The Frank–Wolfe algorithm is an iterative first-order optimization algorithm for constrained convex optimization. Also known as the conditional gradient method,[1] the reduced gradient algorithm, and the convex combination algorithm, the method was originally proposed by Marguerite Frank and Philip Wolfe in 1956.[2] In each iteration, the Frank–Wolfe algorithm considers a linear approximation of the objective function, and moves towards a minimizer of this linear function (taken over the same domain).

Problem statement

Suppose $\mathcal{D}$ is a compact convex set in a vector space and $f \colon \mathcal{D} \to \mathbb{R}$ is a convex, differentiable real-valued function. The Frank–Wolfe algorithm solves the optimization problem

Minimize $f(\mathbf{x})$
subject to $\mathbf{x} \in \mathcal{D}$.

Algorithm

 
[Figure: A step of the Frank–Wolfe algorithm]

Initialization: Let $k \leftarrow 0$, and let $\mathbf{x}_0$ be any point in $\mathcal{D}$.
Step 1. Direction-finding subproblem: Find $\mathbf{s}_k$ solving
Minimize $\mathbf{s}^T \nabla f(\mathbf{x}_k)$
Subject to $\mathbf{s} \in \mathcal{D}$
(Interpretation: Minimize the linear approximation of the problem given by the first-order Taylor approximation of $f$ around $\mathbf{x}_k$, constrained to stay within $\mathcal{D}$.)
Step 2. Step size determination: Set $\alpha \leftarrow \frac{2}{k+2}$, or alternatively find $\alpha$ that minimizes $f(\mathbf{x}_k + \alpha(\mathbf{s}_k - \mathbf{x}_k))$ subject to $0 \le \alpha \le 1$.
Step 3. Update: Let $\mathbf{x}_{k+1} \leftarrow \mathbf{x}_k + \alpha(\mathbf{s}_k - \mathbf{x}_k)$, let $k \leftarrow k + 1$, and go to Step 1.
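
When a fast solver for the linear subproblem (a so-called linear minimization oracle) is available, the method takes only a few lines of code. The following Python sketch is not part of the original article and its names are illustrative; it applies the steps above to a convex quadratic over the probability simplex, where Step 1 reduces to picking the coordinate vertex with the smallest gradient entry:

```python
import numpy as np

def frank_wolfe_simplex(grad_f, x0, num_iters=1000):
    """Frank-Wolfe over the probability simplex {x : x >= 0, sum(x) = 1}.

    grad_f returns the gradient of f. The direction-finding subproblem
    min_{s in simplex} s^T grad_f(x) is solved exactly by the vertex e_i
    with i = argmin of the gradient entries.
    """
    x = x0.copy()
    for k in range(num_iters):
        g = grad_f(x)
        # Step 1: linear minimization oracle over the simplex.
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        # Step 2: open-loop step size alpha = 2 / (k + 2).
        alpha = 2.0 / (k + 2.0)
        # Step 3: convex-combination update keeps x feasible automatically.
        x = x + alpha * (s - x)
    return x

# Illustrative use: minimize f(x) = 0.5 * ||A x - b||^2 over the simplex.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x = frank_wolfe_simplex(lambda x: A.T @ (A @ x - b), np.full(5, 0.2))
print(x, x.sum())  # nonnegative entries summing to 1
```

Since each $\mathbf{s}_k$ is a vertex, after $k$ iterations the iterate is a convex combination of at most $k+1$ vertices, which is the sparsity property noted under Properties below.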

Properties

While competing methods such as gradient descent for constrained optimization require a projection step back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear optimization problem over the same set in each iteration, and automatically stays in the feasible set.

The convergence of the Frank–Wolfe algorithm is sublinear in general: the error in the objective function to the optimum is $O(1/k)$ after $k$ iterations, so long as the gradient is Lipschitz continuous with respect to some norm. The same convergence rate can also be shown if the sub-problems are only solved approximately.[3]
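
With the open-loop step size $\alpha = 2/(k+2)$, one common explicit form of this bound (following the overview paper by Jaggi listed in the Bibliography; the constant is not stated in the original text here) is:

```latex
% Primal convergence of Frank-Wolfe with step size 2/(k+2).
% C_f is the curvature constant of f over \mathcal{D}; when \nabla f is
% L-Lipschitz, it satisfies C_f \le L \cdot \operatorname{diam}(\mathcal{D})^2.
f(\mathbf{x}_k) - f(\mathbf{x}^*) \;\le\; \frac{2 C_f}{k+2} = O(1/k)
```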

The iterations of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has contributed to the popularity of the algorithm for sparse greedy optimization in machine learning and signal processing problems,[4] as well as, for example, the optimization of minimum-cost flows in transportation networks.[5]

If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a linear program.
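
As a sketch of this case (assuming SciPy is available; the polytope below is a placeholder), the direction-finding subproblem over a feasible set $\{\mathbf{s} : A\mathbf{s} \le \mathbf{b}\}$ can be handed to an off-the-shelf LP solver:

```python
import numpy as np
from scipy.optimize import linprog

def lp_oracle(grad, A_ub, b_ub):
    """Direction-finding subproblem over the polytope {s : A_ub @ s <= b_ub}:
    minimize grad^T s, a linear program (solved by HiGHS inside linprog)."""
    res = linprog(c=grad, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
    if not res.success:
        raise RuntimeError(res.message)
    return res.x  # an extreme point of the feasible set

# Placeholder polytope: the box -1 <= s_i <= 1 written as A_ub @ s <= b_ub.
A_ub = np.vstack([np.eye(3), -np.eye(3)])
b_ub = np.ones(6)
print(lp_oracle(np.array([1.0, -2.0, 0.5]), A_ub, b_ub))  # -> [-1.  1. -1.]
```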

While the worst-case convergence rate of $O(1/k)$ cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.[6]

Lower bounds on the solution value, and primal-dual analysis

Since $f$ is convex, for any two points $\mathbf{x}, \mathbf{y} \in \mathcal{D}$ we have:

$$f(\mathbf{y}) \geq f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x})$$

This also holds for the (unknown) optimal solution $\mathbf{x}^*$. That is, $f(\mathbf{x}^*) \ge f(\mathbf{x}) + (\mathbf{x}^* - \mathbf{x})^T \nabla f(\mathbf{x})$. The best lower bound with respect to a given point $\mathbf{x}$ is given by

$$\begin{aligned} f(\mathbf{x}^*) &\geq f(\mathbf{x}) + (\mathbf{x}^* - \mathbf{x})^T \nabla f(\mathbf{x}) \\ &\geq \min_{\mathbf{y} \in \mathcal{D}} \left\{ f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x}) \right\} \\ &= f(\mathbf{x}) - \mathbf{x}^T \nabla f(\mathbf{x}) + \min_{\mathbf{y} \in \mathcal{D}} \mathbf{y}^T \nabla f(\mathbf{x}) \end{aligned}$$

The latter optimization problem is solved in every iteration of the Frank–Wolfe algorithm, therefore the solution $\mathbf{s}_k$ of the direction-finding subproblem of the $k$-th iteration can be used to determine increasing lower bounds $l_k$ during each iteration by setting $l_0 = -\infty$ and

$$l_k := \max(l_{k-1},\, f(\mathbf{x}_k) + (\mathbf{s}_k - \mathbf{x}_k)^T \nabla f(\mathbf{x}_k))$$

Such lower bounds on the unknown optimal value are important in practice because they can be used as a stopping criterion, and give an efficient certificate of the approximation quality in every iteration, since always $l_k \le f(\mathbf{x}^*) \le f(\mathbf{x}_k)$.

It has been shown that this corresponding duality gap, that is the difference between $f(\mathbf{x}_k)$ and the lower bound $l_k$, decreases with the same convergence rate, i.e. $f(\mathbf{x}_k) - l_k = O(1/k)$.
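
In code, this yields a natural stopping test using only quantities the algorithm already computes: the gap $g_k = (\mathbf{x}_k - \mathbf{s}_k)^T \nabla f(\mathbf{x}_k)$ satisfies $f(\mathbf{x}_k) - f(\mathbf{x}^*) \le g_k$. A minimal Python sketch (the oracle `lmo` and the tolerance are placeholders, not from the original article):

```python
import numpy as np

def frank_wolfe_with_certificate(f, grad_f, lmo, x0, tol=1e-6, max_iters=10_000):
    """Frank-Wolfe that stops once the duality gap certifies tol-accuracy.

    lmo(g) must return argmin_{s in D} s^T g (linear minimization oracle).
    """
    x = x0.copy()
    lower = -np.inf  # running lower bound l_k on f(x*)
    for k in range(max_iters):
        g = grad_f(x)
        s = lmo(g)
        gap = (x - s) @ g               # duality gap: f(x_k) - f(x*) <= gap
        lower = max(lower, f(x) - gap)  # l_k = max(l_{k-1}, f(x_k) + (s_k - x_k)^T g)
        if gap <= tol:                  # certified approximation quality
            break
        x = x + 2.0 / (k + 2.0) * (s - x)
    return x, lower
```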

Notes

  1. Levitin, E. S.; Polyak, B. T. (1966). "Constrained minimization methods". USSR Computational Mathematics and Mathematical Physics. 6 (5): 1. doi:10.1016/0041-5553(66)90114-5.
  2. Frank, M.; Wolfe, P. (1956). "An algorithm for quadratic programming". Naval Research Logistics Quarterly. 3 (1–2): 95–110. doi:10.1002/nav.3800030109.
  3. Dunn, J. C.; Harshbarger, S. (1978). "Conditional gradient algorithms with open loop step size rules". Journal of Mathematical Analysis and Applications. 62 (2): 432. doi:10.1016/0022-247X(78)90137-3.
  4. Clarkson, K. L. (2010). "Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm". ACM Transactions on Algorithms. 6 (4): 1–30. CiteSeerX 10.1.1.145.9299. doi:10.1145/1824777.1824783.
  5. Fukushima, M. (1984). "A modified Frank-Wolfe algorithm for solving the traffic assignment problem". Transportation Research Part B: Methodological. 18 (2): 169–177. doi:10.1016/0191-2615(84)90029-8.
  6. Bertsekas, Dimitri (1999). Nonlinear Programming. Athena Scientific. p. 215. ISBN 978-1-886529-00-7.

Bibliography

  • Jaggi, Martin (2013). "Revisiting Frank–Wolfe: Projection-Free Sparse Convex Optimization". Journal of Machine Learning Research: Workshop and Conference Proceedings. 28 (1): 427–435. (Overview paper)
  • The Frank–Wolfe algorithm description
  • Nocedal, Jorge; Wright, Stephen J. (2006). Numerical Optimization (2nd ed.). Berlin, New York: Springer-Verlag. ISBN 978-0-387-30303-1.

External links

  • https://conditional-gradients.org/: a survey of Frank–Wolfe algorithms
  • Marguerite Frank giving a personal account of the history of the algorithm

See also

  • Proximal gradient methods
