
Large margin nearest neighbor

Large margin nearest neighbor (LMNN)[1] classification is a statistical machine learning algorithm for metric learning. It learns a pseudometric designed for k-nearest neighbor classification. The algorithm is based on semidefinite programming, a sub-class of convex optimization.

The goal of supervised learning (more specifically classification) is to learn a decision rule that can categorize data instances into pre-defined classes. The k-nearest neighbor rule assumes a training data set of labeled instances (i.e. the classes are known). It classifies a new data instance with the class obtained from the majority vote of the k closest (labeled) training instances. Closeness is measured with a pre-defined metric. Large margin nearest neighbors is an algorithm that learns this global (pseudo-)metric in a supervised fashion to improve the classification accuracy of the k-nearest neighbor rule.
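The k-nearest neighbor rule is straightforward to express in code. The sketch below (a minimal illustration, assuming NumPy arrays with the hypothetical names X_train and y_train) classifies a query point by majority vote among its k closest training points, with the notion of closeness supplied as an arbitrary distance function; this pluggable distance is exactly the component that LMNN learns.

```python
import numpy as np

def knn_predict(x, X_train, y_train, k, dist):
    """Classify x by majority vote among its k closest training points,
    where closeness is measured by the supplied distance function."""
    d = np.array([dist(x, xi) for xi in X_train])
    nearest = np.argsort(d)[:k]                # indices of the k closest training points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]           # majority class among those neighbors
```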

Setup

The main intuition behind LMNN is to learn a pseudometric under which all data instances in the training set are surrounded by at least k instances that share the same class label. If this is achieved, the leave-one-out error (a special case of cross validation) is minimized. Let the training data consist of a data set $D = \{(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)\} \subset \mathbb{R}^d \times C$, where the set of possible class categories is $C = \{1, \dots, c\}$.

The algorithm learns a pseudometric of the type

$d(\vec{x}_i, \vec{x}_j) = (\vec{x}_i - \vec{x}_j)^\top \mathbf{M} (\vec{x}_i - \vec{x}_j)$.

For $d(\cdot, \cdot)$ to be well defined, the matrix $\mathbf{M}$ needs to be positive semi-definite. The Euclidean metric is a special case, where $\mathbf{M}$ is the identity matrix. This generalization is often (falsely[citation needed]) referred to as a Mahalanobis metric.
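As an illustration (not code from the original publication), the learned distance can be evaluated directly from $\mathbf{M}$; factoring $\mathbf{M} = \mathbf{L}^\top \mathbf{L}$ for an arbitrary real matrix $\mathbf{L}$ is one standard way to guarantee positive semi-definiteness. The names below are hypothetical.

```python
import numpy as np

def lmnn_distance(xi, xj, M):
    """Learned (squared) pseudometric (x_i - x_j)^T M (x_i - x_j)."""
    diff = xi - xj
    return float(diff @ M @ diff)

# Any M of the form L^T L is positive semi-definite, so the pseudometric is well defined.
L = np.random.randn(3, 3)
M = L.T @ L
print(lmnn_distance(np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 1.0]), M))
```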

Figure 1 illustrates the effect of the metric under varying $\mathbf{M}$. The two circles show the set of points with equal distance to the center $\vec{x}_i$. In the Euclidean case this set is a circle, whereas under the modified (Mahalanobis) metric it becomes an ellipsoid.

Figure 1: Schematic illustration of LMNN.

The algorithm distinguishes between two types of special data points: target neighbors and impostors.

Target neighbors

Target neighbors are selected before learning. Each instance $\vec{x}_i$ has exactly $k$ different target neighbors within $D$, which all share the same class label $y_i$. The target neighbors are the data points that should become nearest neighbors under the learned metric. Let us denote the set of target neighbors for a data point $\vec{x}_i$ as $N_i$.
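A minimal sketch of how target neighbors might be selected, here simply using the Euclidean metric on the input space before any learning takes place (the helper name target_neighbors is illustrative, not from the paper):

```python
import numpy as np

def target_neighbors(X, y, k):
    """For each point, return the indices of its k nearest neighbors
    that carry the same class label (selected under the Euclidean metric)."""
    N = []
    for i in range(len(X)):
        same = np.where(y == y[i])[0]
        same = same[same != i]                      # exclude the point itself
        d = np.linalg.norm(X[same] - X[i], axis=1)  # Euclidean distances to same-class points
        N.append(same[np.argsort(d)[:k]])
    return N
```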

Impostors

An impostor of a data point $\vec{x}_i$ is another data point $\vec{x}_j$ with a different class label (i.e. $y_j \neq y_i$) which is one of the nearest neighbors of $\vec{x}_i$. During learning the algorithm tries to minimize the number of impostors for all data instances in the training set.
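Continuing the sketch above (hypothetical names, arbitrary distance function), impostors of a point are the differently labeled points that intrude into its k-nearest-neighbor set:

```python
import numpy as np

def impostors(X, y, k, dist):
    """Return pairs (i, l) where x_l has a different label than x_i
    but is among the k nearest neighbors of x_i under dist."""
    pairs = []
    for i in range(len(X)):
        d = np.array([dist(X[i], X[j]) for j in range(len(X))])
        d[i] = np.inf                               # a point is not its own neighbor
        nearest = np.argsort(d)[:k]
        pairs.extend((i, l) for l in nearest if y[l] != y[i])
    return pairs
```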

Algorithm

Large margin nearest neighbors optimizes the matrix $\mathbf{M}$ with the help of semidefinite programming. The objective is twofold: for every data point $\vec{x}_i$, the target neighbors should be close and the impostors should be far away. Figure 1 shows the effect of such an optimization on an illustrative example. The learned metric causes the input vector $\vec{x}_i$ to be surrounded by training instances of the same class. If it were a test point, it would be classified correctly under the $k = 3$ nearest neighbor rule.

The first optimization goal is achieved by minimizing the average distance between instances and their target neighbors

$\sum_{i, j \in N_i} d(\vec{x}_i, \vec{x}_j)$.

The second goal is achieved by penalizing distances to impostors $\vec{x}_l$ that are less than one unit further away than target neighbors $\vec{x}_j$ (and therefore pushing them out of the local neighborhood of $\vec{x}_i$). The resulting value to be minimized can be stated as:

$\sum_{i, j \in N_i, \, l \colon y_l \neq y_i} \left[ d(\vec{x}_i, \vec{x}_j) + 1 - d(\vec{x}_i, \vec{x}_l) \right]_+$

with the hinge loss function $[\cdot]_+ = \max(\cdot, 0)$, which ensures that impostor proximity is not penalized when it falls outside the margin. The margin of exactly one unit fixes the scale of the matrix $\mathbf{M}$: any alternative choice $c > 0$ would merely rescale $\mathbf{M}$ by a factor of $1/c$.
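Putting the two terms together, a minimal NumPy sketch of the resulting cost (pull term plus hinge-penalized push term) could look as follows; the weight lam plays the role of the trade-off constant $\lambda$ appearing in the constrained formulation below, and all names are illustrative:

```python
import numpy as np

def lmnn_cost(X, y, M, N, lam):
    """Pull term plus hinge-penalized push term of the LMNN objective.
    N[i] holds the indices of the target neighbors of x_i."""
    def d(a, b):
        diff = X[a] - X[b]
        return diff @ M @ diff
    pull, push = 0.0, 0.0
    for i in range(len(X)):
        for j in N[i]:
            pull += d(i, j)                                    # keep target neighbors close
            for l in np.where(y != y[i])[0]:
                push += max(0.0, d(i, j) + 1.0 - d(i, l))      # hinge: penalize nearby impostors
    return pull + lam * push
```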

The final optimization problem becomes:

$\min_{\mathbf{M}} \sum_{i, j \in N_i} d(\vec{x}_i, \vec{x}_j) + \lambda \sum_{i, j, l} \xi_{ijl}$

subject to, $\forall_{i, j \in N_i, \, l \colon y_l \neq y_i}$:

$d(\vec{x}_i, \vec{x}_j) + 1 - d(\vec{x}_i, \vec{x}_l) \leq \xi_{ijl}$

$\xi_{ijl} \geq 0$

$\mathbf{M} \succeq 0$

The hyperparameter $\lambda > 0$ is some positive constant (typically set through cross-validation). Here the variables $\xi_{ijl}$ (together with the two types of constraints) replace the hinge-loss term in the cost function. They play a role similar to slack variables, absorbing the extent of violations of the impostor constraints. The last constraint ensures that $\mathbf{M}$ is positive semi-definite. The optimization problem is an instance of semidefinite programming (SDP). Although SDPs tend to suffer from high computational complexity, this particular SDP instance can be solved very efficiently due to the underlying geometric properties of the problem. In particular, most impostor constraints are naturally satisfied and do not need to be enforced during runtime (i.e. the set of variables $\xi_{ijl}$ is sparse). A particularly well suited solver technique is the working set method, which keeps a small set of constraints that are actively enforced and monitors the remaining (likely satisfied) constraints only occasionally to ensure correctness.
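The working-set SDP solver from the original paper is beyond the scope of a short example, but the overall idea of optimizing over positive semi-definite matrices can be conveyed with a much simpler (and slower) projected sub-gradient sketch: take a step on the cost above and project $\mathbf{M}$ back onto the positive semi-definite cone by clipping negative eigenvalues. This is a didactic simplification with hypothetical names, not the solver described in the references.

```python
import numpy as np

def lmnn_step(X, y, M, N, lam, lr):
    """One projected sub-gradient step on the LMNN cost, followed by a
    projection onto the positive semi-definite cone (eigenvalue clipping)."""
    G = np.zeros_like(M)
    for i in range(len(X)):
        for j in N[i]:
            dij = X[i] - X[j]
            G += np.outer(dij, dij)                        # gradient of the pull term
            for l in np.where(y != y[i])[0]:
                dil = X[i] - X[l]
                if dij @ M @ dij + 1.0 > dil @ M @ dil:    # active (violated) impostor triple
                    G += lam * (np.outer(dij, dij) - np.outer(dil, dil))
    M = M - lr * G
    w, V = np.linalg.eigh(M)                               # project back onto M >= 0
    return V @ np.diag(np.maximum(w, 0.0)) @ V.T
```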

Extensions and efficient solvers

LMNN was extended to multiple local metrics in a 2008 paper.[2] This extension significantly reduces the classification error, but involves a more expensive optimization problem. In their 2009 publication in the Journal of Machine Learning Research,[3] Weinberger and Saul derive an efficient solver for the semi-definite program. It can learn a metric for the MNIST handwritten digit data set in several hours, involving billions of pairwise constraints. An open source Matlab implementation is freely available at the authors' web page.

Kumar et al.[4] extended the algorithm to incorporate local invariances to multivariate polynomial transformations and improved regularization.

See also

  • Similarity learning
  • Linear discriminant analysis
  • Learning vector quantization
  • Pseudometric space
  • Nearest neighbor search
  • Cluster analysis
  • Data classification
  • Data mining
  • Machine learning
  • Pattern recognition
  • Predictive analytics
  • Dimension reduction
  • Neighbourhood components analysis

References

  1. ^ Weinberger, K. Q.; Blitzer J. C.; Saul L. K. (2006). "Distance Metric Learning for Large Margin Nearest Neighbor Classification". Advances in Neural Information Processing Systems. 18: 1473–1480.
  2. ^ Weinberger, K. Q.; Saul L. K. (2008). "Fast solvers and efficient implementations for distance metric learning" (PDF). Proceedings of International Conference on Machine Learning: 1160–1167. Archived from the original (PDF) on 2011-07-24. Retrieved 2010-07-14.
  3. ^ Weinberger, K. Q.; Saul L. K. (2009). "Distance Metric Learning for Large Margin Classification" (PDF). Journal of Machine Learning Research. 10: 207–244.
  4. ^ Kumar, M.P.; Torr P.H.S.; Zisserman A. (2007). "An Invariant Large Margin Nearest Neighbour Classifier". 2007 IEEE 11th International Conference on Computer Vision. pp. 1–8. doi:10.1109/ICCV.2007.4409041. ISBN 978-1-4244-1630-1. S2CID 1326101.

External links

  • Matlab Implementation
  • ICML 2010 Tutorial on Metric Learning
