
LogSumExp

The LogSumExp (LSE) (also called RealSoftMax[1] or multivariable softplus) function is a smooth maximum – a smooth approximation to the maximum function, mainly used by machine learning algorithms.[2] It is defined as the logarithm of the sum of the exponentials of the arguments:

$$\mathrm{LSE}(x_1, \dots, x_n) = \log\left(\exp(x_1) + \cdots + \exp(x_n)\right).$$

Properties

The LogSumExp function domain is $\mathbb{R}^n$, the real coordinate space, and its codomain is $\mathbb{R}$, the real line. It is an approximation to the maximum $\max_i x_i$ with the following bounds:

$$\max\{x_1, \dots, x_n\} \leq \mathrm{LSE}(x_1, \dots, x_n) \leq \max\{x_1, \dots, x_n\} + \log(n).$$

The first inequality is strict unless $n = 1$. The second inequality is strict unless all arguments are equal. (Proof: Let $m = \max_i x_i$. Then $\exp(m) \leq \sum_{i=1}^n \exp(x_i) \leq n \exp(m)$. Applying the logarithm to the inequality gives the result.)
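These bounds are easy to check numerically. A minimal Python sketch (the helper `lse` is ours, written directly from the definition):

```python
import numpy as np

def lse(x):
    # Log-sum-exp written directly from the definition (not numerically stabilized).
    return np.log(np.sum(np.exp(x)))

x = np.array([0.5, -1.2, 3.0, 2.7])
m = np.max(x)
# max(x) <= LSE(x) <= max(x) + log(n)
assert m <= lse(x) <= m + np.log(len(x))
print(lse(x))  # ~3.61, slightly above max(x) = 3.0
```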

In addition, we can scale the function to make the bounds tighter. Consider the function $\frac{1}{t}\mathrm{LSE}(tx_1, \dots, tx_n)$. Then

$$\max\{x_1, \dots, x_n\} < \frac{1}{t}\mathrm{LSE}(tx_1, \dots, tx_n) \leq \max\{x_1, \dots, x_n\} + \frac{\log(n)}{t}.$$
(Proof: Replace each $x_i$ with $tx_i$ for some $t > 0$ in the inequalities above, to give

$$\max\{tx_1, \dots, tx_n\} < \mathrm{LSE}(tx_1, \dots, tx_n) \leq \max\{tx_1, \dots, tx_n\} + \log(n)$$

and, since $t > 0$,

$$t \max\{x_1, \dots, x_n\} < \mathrm{LSE}(tx_1, \dots, tx_n) \leq t \max\{x_1, \dots, x_n\} + \log(n);$$

finally, dividing by $t$ gives the result.)
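The scaled form behaves like a "temperature"-controlled maximum: as $t$ grows, $\frac{1}{t}\mathrm{LSE}(tx_1, \dots, tx_n)$ approaches $\max_i x_i$. A small sketch of this, assuming SciPy's `logsumexp` (the helper name `scaled_lse` is ours):

```python
import numpy as np
from scipy.special import logsumexp

def scaled_lse(x, t):
    # (1/t) * LSE(t*x): the smooth maximum tightens as t increases.
    return logsumexp(t * x) / t

x = np.array([0.5, -1.2, 3.0, 2.7])
for t in (1.0, 10.0, 100.0):
    print(t, scaled_lse(x, t))  # 3.609..., 3.005..., 3.0000... -> max(x) = 3.0
```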

Also, if we multiply by a negative number instead, we of course find a comparison to the $\min$ function:

$$\min\{x_1, \dots, x_n\} - \frac{\log(n)}{t} \leq -\frac{1}{t}\mathrm{LSE}(-tx_1, \dots, -tx_n) < \min\{x_1, \dots, x_n\}.$$
The LogSumExp function is convex, and is strictly increasing everywhere in its domain[3] (but not strictly convex everywhere[4]).

Writing $\mathbf{x} = (x_1, \dots, x_n)$, the partial derivatives are:

$$\frac{\partial}{\partial x_i} \mathrm{LSE}(\mathbf{x}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)},$$
which means the gradient of LogSumExp is the softmax function.
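To see this identity concretely, one can compare a finite-difference gradient of LSE against softmax; a minimal check, assuming SciPy's `logsumexp` and `softmax`:

```python
import numpy as np
from scipy.special import logsumexp, softmax

x = np.array([0.5, -1.2, 3.0, 2.7])
eps = 1e-6
eye = np.eye(len(x))
# Central-difference approximation of each partial derivative of LSE.
grad = np.array([(logsumexp(x + eps * eye[i]) - logsumexp(x - eps * eye[i])) / (2 * eps)
                 for i in range(len(x))])
print(np.allclose(grad, softmax(x), atol=1e-6))  # True
```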

The convex conjugate of LogSumExp is the negative entropy.
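Explicitly (a standard convex-analysis fact, stated here in our notation): the conjugate is finite only on the probability simplex, where it equals the negative entropy,

$$\mathrm{LSE}^*(p) = \sup_{x \in \mathbb{R}^n} \left( \langle p, x \rangle - \mathrm{LSE}(x) \right) = \begin{cases} \sum_{i=1}^n p_i \log p_i, & p_i \geq 0,\ \sum_{i=1}^n p_i = 1, \\ +\infty, & \text{otherwise.} \end{cases}$$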

log-sum-exp trick for log-domain calculations

The LSE function is often encountered when the usual arithmetic computations are performed on a logarithmic scale, as in log probability.[5]

Just as multiplication in linear scale becomes simple addition in log scale, addition in linear scale becomes the LSE in log scale:

$$\mathrm{LSE}(\log(x_1), \dots, \log(x_n)) = \log(x_1 + \cdots + x_n).$$
A common purpose of using log-domain computations is to increase accuracy and avoid underflow and overflow problems when very small or very large numbers are represented directly (i.e. in a linear domain) using limited-precision floating point numbers.[6]
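For example, two log-probabilities whose linear-scale values underflow double precision entirely can still be added with NumPy's `logaddexp`, the two-argument form of LSE; a minimal sketch:

```python
import numpy as np

# Two log-probabilities whose linear-scale values underflow float64 entirely:
log_p, log_q = -800.0, -801.0
print(np.exp(log_p))                  # 0.0 -- underflows in the linear domain
log_sum = np.logaddexp(log_p, log_q)  # log(exp(-800) + exp(-801)), computed stably
print(log_sum)                        # ~ -799.687
```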

Unfortunately, the use of LSE directly in this case can again cause overflow/underflow problems. Therefore, the following equivalent must be used instead (especially when the accuracy of the above 'max' approximation is not sufficient):

$$\mathrm{LSE}(x_1, \dots, x_n) = x^* + \log\left(\exp(x_1 - x^*) + \cdots + \exp(x_n - x^*)\right),$$

where $x^* = \max\{x_1, \dots, x_n\}$.
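A minimal Python implementation of this shifted formula (the function name is ours; in practice one would call `scipy.special.logsumexp`, which applies the same shift internally):

```python
import numpy as np

def logsumexp_stable(x):
    # Max-shift trick: exp() only ever sees non-positive arguments, so it
    # cannot overflow, and the largest term contributes exactly exp(0) = 1.
    x = np.asarray(x, dtype=float)
    x_star = np.max(x)
    return x_star + np.log(np.sum(np.exp(x - x_star)))

print(logsumexp_stable([1000.0, 1000.0]))  # 1000.6931..., while exp(1000) alone overflows
```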

Many math libraries such as IT++ provide a default routine for computing LSE and use this formula internally.

A strictly convex log-sum-exp type function

LSE is convex but not strictly convex. We can define a strictly convex log-sum-exp type function[7] by adding an extra argument set to zero:

$$\mathrm{LSE}_0^+(x_1, \dots, x_n) := \mathrm{LSE}(0, x_1, \dots, x_n).$$
This function is a proper Bregman generator (strictly convex and differentiable). It is encountered in machine learning, for example, as the cumulant of the multinomial/binomial family.
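For a single argument this reduces to the softplus function $\log(1 + e^x)$, the cumulant (log-partition) function of the Bernoulli family; a quick numerical illustration, with our helper name `lse0`:

```python
import numpy as np
from scipy.special import logsumexp

def lse0(x):
    # LSE0+(x_1, ..., x_n) = LSE(0, x_1, ..., x_n): strictly convex in x.
    return logsumexp(np.concatenate(([0.0], np.asarray(x, dtype=float))))

x = 1.5
print(lse0([x]), np.log1p(np.exp(x)))  # both ~1.7014 (softplus)
```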

In tropical analysis, this is the sum in the log semiring.

See also

  * Logarithmic mean
  * Log semiring
  * Smooth maximum
  * Softmax function

References

  1. ^ Zhang, Aston; Lipton, Zack; Li, Mu; Smola, Alex. "Dive into Deep Learning, Chapter 3 Exercises". www.d2l.ai. Retrieved 27 June 2020.
  2. ^ Nielsen, Frank; Sun, Ke (2016). "Guaranteed bounds on the Kullback-Leibler divergence of univariate mixtures using piecewise log-sum-exp inequalities". Entropy. 18 (12): 442. arXiv:1606.05850. Bibcode:2016Entrp..18..442N. doi:10.3390/e18120442. S2CID 17259055.
  3. ^ El Ghaoui, Laurent (2017). Optimization Models and Applications.
  4. ^ "convex analysis - About the strictly convexity of log-sum-exp function - Mathematics Stack Exchange". stackexchange.com.
  5. ^ McElreath, Richard. Statistical Rethinking. OCLC 1107423386.
  6. ^ "Practical issues: Numeric stability". CS231n Convolutional Neural Networks for Visual Recognition.
  7. ^ Nielsen, Frank; Hadjeres, Gaetan (2018). "Monte Carlo Information Geometry: The dually flat case". arXiv:1803.07225. Bibcode:2018arXiv180307225N.

