
Random coordinate descent

Randomized (Block) Coordinate Descent Method is an optimization algorithm popularized by Nesterov (2010) and Richtárik and Takáč (2011). The first analysis of this method, when applied to the problem of minimizing a smooth convex function, was performed by Nesterov (2010).[1] In Nesterov's analysis, the method needs to be applied to a quadratic perturbation of the original function with an unknown scaling factor. Richtárik and Takáč (2011) give iteration complexity bounds which do not require this, i.e., the method is applied to the objective function directly. Furthermore, they generalize the setting to the problem of minimizing a composite function, i.e., the sum of a smooth convex function and a (possibly nonsmooth) convex block-separable function:

F(x) = f(x) + Ψ(x), where Ψ(x) = Σ_{i=1}^{n} Ψ_i(x^{(i)}),

x ∈ R^N is decomposed into n blocks of variables/coordinates x = (x^{(1)}, ..., x^{(n)}), and Ψ_1, ..., Ψ_n are (simple) convex functions.

Example (block decomposition): If x = (x_1, x_2, ..., x_5) ∈ R^5 and n = 3, one may choose x^{(1)} = (x_1, x_3), x^{(2)} = (x_2, x_5), and x^{(3)} = (x_4).

Example (block-separable regularizers):

  1. n = N, Ψ(x) = ||x||_1 = Σ_{i=1}^{n} |x_i|.
  2. N = N_1 + N_2 + ... + N_n, Ψ(x) = Σ_{i=1}^{n} ||x^{(i)}||_2, where x^{(i)} ∈ R^{N_i} and ||·||_2 is the standard Euclidean norm.
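Both regularizers can be evaluated block by block. The following sketch (the vector and the block partition are made-up illustrations, using the decomposition of R^5 into the blocks (x_1, x_3), (x_2, x_5), (x_4) from the earlier example) computes each:

```python
import math

# Illustrative vector in R^5 and a block partition (0-based index tuples),
# matching the example decomposition x^(1)=(x_1,x_3), x^(2)=(x_2,x_5), x^(3)=(x_4).
x = [1.0, -2.0, 3.0, 0.0, -1.0]
blocks = [(0, 2), (1, 4), (3,)]

# Case n = N: every block is a single coordinate and Psi(x) = ||x||_1.
psi_l1 = sum(abs(xi) for xi in x)

# General case: Psi(x) = sum_i ||x^(i)||_2 over the chosen blocks.
psi_block = sum(math.sqrt(sum(x[j] ** 2 for j in blk)) for blk in blocks)

print(psi_l1)     # 7.0
print(psi_block)  # sqrt(1 + 9) + sqrt(4 + 1) + 0
```

Block-separability is what makes coordinate (or block) updates cheap: changing one block only changes the corresponding term Ψ_i.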

Algorithm

Consider the optimization problem

min_{x ∈ R^n} f(x),

where f is a convex and smooth function.

Smoothness: By smoothness we mean the following: we assume the gradient of f is coordinate-wise Lipschitz continuous with constants L_1, L_2, ..., L_n. That is, we assume that

|∇_i f(x + h e_i) - ∇_i f(x)| ≤ L_i |h|

for all x ∈ R^n and h ∈ R, where ∇_i denotes the partial derivative with respect to variable x_i and e_i is the i-th standard basis vector.
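For a quadratic f(x) = (1/2) x^T A x - b^T x, the coordinate-wise Lipschitz constants are simply the diagonal entries L_i = A_ii, since ∇_i f(x) = (Ax - b)_i changes by exactly h·A_ii when x moves by h along e_i. A quick numerical check of this standard fact (the matrix, vector, and test point below are arbitrary illustrations):

```python
# For f(x) = 0.5*x^T A x - b^T x, the i-th partial derivative is (Ax - b)_i.
A = [[1.0, 0.5],
     [0.5, 1.0]]
b = [1.5, 1.5]

def grad_i(x, i):
    # i-th component of the gradient Ax - b
    return sum(A[i][j] * x[j] for j in range(len(x))) - b[i]

x, h, i = [0.3, -0.7], 0.25, 0
x_shift = x[:]
x_shift[i] += h            # move by h along the coordinate direction e_i

change = grad_i(x_shift, i) - grad_i(x, i)
print(change, h * A[i][i])  # both equal h * A_ii = 0.25
```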

Nesterov, and Richtárik and Takáč, showed that the following algorithm converges to the optimal point:

Algorithm Random Coordinate Descent Method
    Input: x_0 ∈ R^n // starting point
    Output: x
    set x := x_0
    for k := 1, ... do
        choose coordinate i ∈ {1, 2, ..., n} uniformly at random
        update x_i := x_i - (1/L_i) ∇_i f(x)
    end for
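A minimal sketch of this method, applied to a smooth quadratic f(x) = (1/2) x^T A x - b^T x (for which L_i = A_ii, as noted in the smoothness discussion; the concrete A, b, and iteration count are illustrative assumptions):

```python
import random

def random_coordinate_descent(A, b, x0, iters, seed=0):
    """Random coordinate descent on f(x) = 0.5*x^T A x - b^T x."""
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for _ in range(iters):
        i = rng.randrange(n)  # coordinate chosen uniformly at random
        # i-th partial derivative: (Ax - b)_i
        g = sum(A[i][j] * x[j] for j in range(n)) - b[i]
        # x_i := x_i - (1/L_i) * grad_i f(x), with L_i = A[i][i]
        x[i] -= g / A[i][i]
    return x

A = [[1.0, 0.5], [0.5, 1.0]]
b = [1.5, 1.5]
x = random_coordinate_descent(A, b, [0.0, 0.0], 200)
# The minimizer solves Ax = b, i.e. x* = (1, 1)
```

For a quadratic, the 1/L_i gradient step along coordinate i is in fact the exact coordinate-wise minimizer, so each iteration can only decrease f.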

Convergence rate

Since the iterates of this algorithm are random vectors, a complexity result would give a bound on the number of iterations needed for the method to output an approximate solution with high probability. It was shown in [2] that if

k ≥ (2 n R_L^2(x_0) / ε) · log( (f(x_0) - f*) / (ε ρ) ),

where R_L(x) = max_y max_{x* ∈ X*} { ||y - x*||_L : f(y) ≤ f(x) }, f* is the optimal value (f* = min_{x ∈ R^n} f(x)), ρ ∈ (0, 1) is a confidence level and ε > 0 is the target accuracy, then

Prob( f(x_k) - f* > ε ) ≤ ρ.
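To get a feel for the bound k ≥ (2 n R_L^2(x_0) / ε) log((f(x_0) - f*) / (ε ρ)) from [2], one can plug in some numbers; the values of n, R_L(x_0), the initial gap, ε, and ρ below are made up purely for illustration:

```python
import math

def iteration_bound(n, R, gap, eps, rho):
    """Iterations sufficient for Prob(f(x_k) - f* > eps) <= rho,
    per the Richtarik-Takac bound; R is the level-set radius R_L(x_0)
    and gap is the initial suboptimality f(x_0) - f*."""
    return (2.0 * n * R * R / eps) * math.log(gap / (eps * rho))

k = iteration_bound(n=100, R=1.0, gap=10.0, eps=1e-3, rho=0.1)
print(math.ceil(k))  # on the order of 2.3 million iterations
```

Note the mild (logarithmic) dependence on both the confidence level ρ and the initial gap, versus the 1/ε dependence on accuracy.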

Example on particular function

The following figure shows how x_k develops during the iterations, in principle. The problem is

f(x) = (1/2) x^T [ 1, 0.5 ; 0.5, 1 ] x - (1.5, 1.5) x,   x_0 = (0, 0)^T.
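This example can be reproduced numerically. The quadratic has minimizer x* = (1, 1) with f(x*) = -1.5 (solve Ax = b), and since each 1/L_i coordinate step exactly minimizes f along the chosen coordinate, f(x_k) decreases monotonically toward -1.5 (the seed and iteration count below are arbitrary):

```python
import random

A = [[1.0, 0.5], [0.5, 1.0]]
b = [1.5, 1.5]

def f(x):
    quad = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad - b[0] * x[0] - b[1] * x[1]

rng = random.Random(1)
x = [0.0, 0.0]
values = [f(x)]
for _ in range(100):
    i = rng.randrange(2)                       # random coordinate
    g = A[i][0] * x[0] + A[i][1] * x[1] - b[i] # grad_i f(x)
    x[i] -= g / A[i][i]                        # L_i = A[i][i] here
    values.append(f(x))

print(values[0], values[-1])  # 0.0 and approximately -1.5
```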

Extension to block coordinate setting

Blocking coordinate directions into block coordinate directions

One can naturally extend this algorithm not only to coordinates, but to blocks of coordinates. Assume that we have the space R^5. This space has 5 coordinate directions, concretely

e_1 = (1,0,0,0,0)^T, e_2 = (0,1,0,0,0)^T, e_3 = (0,0,1,0,0)^T, e_4 = (0,0,0,1,0)^T, e_5 = (0,0,0,0,1)^T,

in which the Random Coordinate Descent Method can move. However, one can group some coordinate directions into blocks, and instead of those 5 coordinate directions have 3 block coordinate directions (see image).
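The block step then moves along every coordinate of the sampled block at once. A minimal sketch of one such step (the particular partition of {e_1, ..., e_5} into three blocks, and the gradient values, are assumptions for illustration; the figure's exact grouping is not recoverable here):

```python
import random

n = 5
# A possible partition of the 5 coordinate directions into 3 blocks.
blocks = [(0, 1), (2,), (3, 4)]

rng = random.Random(0)
x = [0.0] * n
grad = [1.0, -2.0, 0.5, 3.0, -1.0]  # placeholder partial derivatives
step = 0.1                           # placeholder block stepsize

blk = blocks[rng.randrange(len(blocks))]  # sample one block uniformly
for j in blk:                             # update all coordinates in it
    x[j] -= step * grad[j]
```

In the full block method, the scalar stepsize 1/L_i is replaced by a block-wise constant reflecting the Lipschitz continuity of the block gradient.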

See also

  • Coordinate descent
  • Gradient descent
  • Mathematical optimization

References

  1. ^ Nesterov, Yurii (2010), "Efficiency of coordinate descent methods on huge-scale optimization problems", SIAM Journal on Optimization, 22 (2): 341–362, CiteSeerX 10.1.1.332.3336, doi:10.1137/100802001
  2. ^ Richtárik, Peter; Takáč, Martin (2011), "Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function", Mathematical Programming, Series A, 144 (1–2): 1–38, arXiv:1107.2848, doi:10.1007/s10107-012-0614-z

