
Huber loss

In statistics, the Huber loss is a loss function used in robust regression that is less sensitive to outliers in data than the squared error loss. A variant for classification is also sometimes used.

Definition

Figure: Huber loss (green, $\delta = 1$) and squared error loss (blue) as a function of $y - f(x)$.

The Huber loss function describes the penalty incurred by an estimation procedure $f$. Huber (1964) defines the loss function piecewise by[1]

$$
L_\delta(a) = \begin{cases}
\frac{1}{2} a^2 & \text{for } |a| \le \delta, \\
\delta \cdot \left( |a| - \frac{1}{2}\delta \right) & \text{otherwise.}
\end{cases}
$$

This function is quadratic for small values of $a$, and linear for large values, with equal values and slopes of the different sections at the two points where $|a| = \delta$. The variable $a$ often refers to the residuals, that is, to the difference between the observed and predicted values $a = y - f(x)$, so the former can be expanded to[2]

$$
L_\delta(y, f(x)) = \begin{cases}
\frac{1}{2} (y - f(x))^2 & \text{for } |y - f(x)| \le \delta, \\
\delta \cdot \left( |y - f(x)| - \frac{1}{2}\delta \right) & \text{otherwise.}
\end{cases}
$$
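For illustration, the piecewise definition above translates directly into a few lines of NumPy. This is a sketch only; the function name huber_loss, the default $\delta = 1$, and the test values are choices made here, not part of the definition.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss as defined above: quadratic for |a| <= delta,
    linear beyond, with matching values and slopes at the boundary."""
    a = y_true - y_pred                           # residuals a = y - f(x)
    quadratic = 0.5 * a**2                        # |a| <= delta branch
    linear = delta * (np.abs(a) - 0.5 * delta)    # |a| > delta branch
    return np.where(np.abs(a) <= delta, quadratic, linear)

y = np.array([0.0, 0.0, 0.0])
f_x = np.array([0.5, 1.0, 3.0])
print(huber_loss(y, f_x))  # [0.125 0.5   2.5  ]
```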

The Huber loss is the convolution of the absolute value function with the rectangular function, scaled and translated. Thus it "smooths out" the former's corner at the origin.

Figure: Comparison of the Huber loss with other loss functions used for robust regression.

Motivation

Two very commonly used loss functions are the squared loss, $L(a) = a^2$, and the absolute loss, $L(a) = |a|$. The squared loss function results in an arithmetic mean-unbiased estimator, and the absolute-value loss function results in a median-unbiased estimator (in the one-dimensional case, and a geometric median-unbiased estimator for the multi-dimensional case). The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of $a_i$'s (as in $\sum_{i=1}^{n} L(a_i)$), the sample mean is influenced too much by a few particularly large $a$-values when the distribution is heavy-tailed; in terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions.
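A small numeric sketch (the sample values here are illustrative) makes the contrast concrete: one large outlier drags the mean, the minimizer of the squared loss, far more than the median, the minimizer of the absolute loss.

```python
import numpy as np

# Five well-behaved observations plus one outlier.
data = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 100.0])

print(np.mean(data))    # ~17.53, pulled strongly toward the outlier
print(np.median(data))  # 1.05, essentially unaffected
```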

As defined above, the Huber loss function is strongly convex in a uniform neighborhood of its minimum $a = 0$; at the boundary of this uniform neighborhood, the Huber loss function has a differentiable extension to an affine function at the points $a = -\delta$ and $a = \delta$. These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) and the robustness of the median-unbiased estimator (using the absolute-value function).
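These boundary properties can be checked numerically. The sketch below (helper name and $\delta$ chosen here for illustration) evaluates the derivative of the Huber loss just inside and just outside $|a| = \delta$ and shows that the quadratic and affine pieces meet with the same slope.

```python
import numpy as np

def huber_grad(a, delta):
    # Derivative of the Huber loss: a on [-delta, delta],
    # delta * sign(a) outside that interval.
    return np.where(np.abs(a) <= delta, a, delta * np.sign(a))

delta, eps = 1.5, 1e-9
print(huber_grad(delta - eps, delta))  # ~1.5, quadratic piece
print(huber_grad(delta + eps, delta))  # 1.5, affine piece: slopes agree
```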

Pseudo-Huber loss function

The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function. It combines the best properties of L2 squared loss and L1 absolute loss by being strongly convex close to the target/minimum and less steep for extreme values. The scale at which the Pseudo-Huber loss function transitions from L2 loss for values close to the minimum to L1 loss for extreme values, and the steepness at extreme values, can be controlled by the $\delta$ value. The Pseudo-Huber loss function ensures that derivatives of all orders are continuous. It is defined as[3][4]

$$
L_\delta(a) = \delta^2 \left( \sqrt{1 + (a/\delta)^2} - 1 \right).
$$

As such, this function approximates $a^2/2$ for small values of $a$, and approximates a straight line with slope $\delta$ for large values of $a$.
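A short sketch (names and test values chosen here for illustration) confirms the two limiting behaviours of the formula above.

```python
import numpy as np

def pseudo_huber(a, delta=1.0):
    """Pseudo-Huber loss: smooth approximation of the Huber loss
    with continuous derivatives of all orders."""
    return delta**2 * (np.sqrt(1.0 + (a / delta)**2) - 1.0)

small = np.array([0.01, 0.1])
print(pseudo_huber(small))           # ~[5.0e-05 5.0e-03]
print(small**2 / 2)                  # matches a**2 / 2 near the minimum

large = np.array([100.0, 101.0])
print(np.diff(pseudo_huber(large)))  # ~[1.0]: slope ~ delta far from 0
```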

While the above is the most common form, other smooth approximations of the Huber loss function also exist.[5]

Variant for classification

For classification purposes, a variant of the Huber loss called modified Huber is sometimes used. Given a prediction $f(x)$ (a real-valued classifier score) and a true binary class label $y \in \{+1, -1\}$, the modified Huber loss is defined as[6]

$$
L(y, f(x)) = \begin{cases}
\max(0, 1 - y f(x))^2 & \text{for } y f(x) > -1, \\
-4 y f(x) & \text{otherwise.}
\end{cases}
$$

The term $\max(0, 1 - y f(x))$ is the hinge loss used by support vector machines; the quadratically smoothed hinge loss is a generalization of $L$.[6]
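As a sketch (function name and test values are illustrative), the piecewise definition above can be written with NumPy as follows; note that the two branches agree at the joint $y f(x) = -1$, where both give 4.

```python
import numpy as np

def modified_huber(y, score):
    """Modified Huber loss for labels y in {+1, -1} and a
    real-valued classifier score f(x)."""
    z = y * score                                  # margin y * f(x)
    return np.where(z > -1,
                    np.maximum(0.0, 1.0 - z)**2,   # smoothed hinge branch
                    -4.0 * z)                      # linear branch

y = np.array([1, 1, -1])
score = np.array([2.0, 0.5, 0.5])
print(modified_huber(y, score))  # [0.   0.25 2.25]
```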

Applications

The Huber loss function is used in robust statistics, M-estimation and additive modelling.[7]

See also

- Winsorizing
- Robust regression
- M-estimator
- Visual comparison of different M-estimators

References

  1. ^ Huber, Peter J. (1964). "Robust Estimation of a Location Parameter". Annals of Mathematical Statistics. 35 (1): 73–101. doi:10.1214/aoms/1177703732. JSTOR 2238020.
  2. ^ Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning. p. 349. Archived from the original on 2015-01-26. Compared to Hastie et al., the loss is scaled by a factor of ½, to be consistent with Huber's original definition given earlier.
  3. ^ Charbonnier, P.; Blanc-Féraud, L.; Aubert, G.; Barlaud, M. (1997). "Deterministic edge-preserving regularization in computed imaging". IEEE Trans. Image Process. 6 (2): 298–311. Bibcode:1997ITIP....6..298C. CiteSeerX 10.1.1.64.7521. doi:10.1109/83.551699. PMID 18282924.
  4. ^ Hartley, R.; Zisserman, A. (2003). Multiple View Geometry in Computer Vision (2nd ed.). Cambridge University Press. p. 619. ISBN 978-0-521-54051-3.
  5. ^ Lange, K. (1990). "Convergence of Image Reconstruction Algorithms with Gibbs Smoothing". IEEE Trans. Med. Imaging. 9 (4): 439–446. doi:10.1109/42.61759. PMID 18222791.
  6. ^ a b Zhang, Tong (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. ICML.
  7. ^ Friedman, J. H. (2001). "Greedy Function Approximation: A Gradient Boosting Machine". Annals of Statistics. 29 (5): 1189–1232. doi:10.1214/aos/1013203451. JSTOR 2699986.
