
k q-flats

In data mining and machine learning, the k q-flats algorithm[1][2] is an iterative method that aims to partition m observations into k clusters, where each cluster is close to a q-flat and q is a given integer.

It is a generalization of the k-means algorithm, in which clusters are formed so that each cluster is close to one point, which is a 0-flat. The k q-flats algorithm gives better clustering results than the k-means algorithm for some data sets.

Description

Problem formulation

Given a set A of m observations $(a_1, a_2, \dots, a_m)$, where each observation $a_i$ is an n-dimensional real vector, the k q-flats algorithm aims to partition the m observation points by generating k q-flats that minimize the sum of the squares of the distances of each observation to its nearest q-flat.

A q-flat is a subset of $\mathbb{R}^n$ that is congruent to $\mathbb{R}^q$. For example, a 0-flat is a point, a 1-flat is a line, a 2-flat is a plane, and an $(n-1)$-flat is a hyperplane. A q-flat can be characterized by the solution set of a linear system of equations: $F = \{ x \in \mathbb{R}^n \mid x^{\top} W = \gamma \}$, where $W \in \mathbb{R}^{n \times (n-q)}$ and $\gamma \in \mathbb{R}^{1 \times (n-q)}$.
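For concreteness, here is a small worked instance (the particular W and $\gamma$ are illustrative, not from the references): the line in $\mathbb{R}^3$ through $(0, 0, 1)$ with direction $(1, 1, 0)$ is the 1-flat cut out by $n - q = 2$ equations,

$$x^{\top} W = \gamma, \qquad W = \begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad \gamma = (0, \; 1),$$

i.e., the set of points $(t, t, 1)$ satisfying $x_1 = x_2$ and $x_3 = 1$.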

Denote a partition of the index set $\{1, 2, \dots, m\}$ as $S = \{S_1, S_2, \dots, S_k\}$. The problem can be formulated as

$$\min_{F_l,\ l = 1, \dots, k \text{ are } q\text{-flats}} \; \min_{S} \; \sum_{l=1}^{k} \sum_{a_j \in S_l} \left\| a_j - P_{F_l}(a_j) \right\|_2^2, \qquad \text{(P1)}$$

where $P_{F_l}(a_j)$ is the projection of $a_j$ onto $F_l$. Note that $\left\| a_j - P_{F_l}(a_j) \right\|_2 = \operatorname{dist}(a_j, F_l)$ is the distance from $a_j$ to $F_l$.

Algorithm

The algorithm is similar to the k-means algorithm (i.e., Lloyd's algorithm) in that it alternates between cluster assignment and cluster update. Specifically, the algorithm starts with an initial set of q-flats $F_l^{(0)} = \left\{ x \in \mathbb{R}^n \mid x^{\top} W_l^{(0)} = \gamma_l^{(0)} \right\}$, $l = 1, \dots, k$, and proceeds by alternating between the following two steps:

Cluster Assignment
(given the q-flats, assign each point to its closest q-flat): the i-th cluster is updated as
$$S_i^{(t)} = \left\{ a_j \;\middle|\; \left\| a_j^{\top} W_i^{(t)} - \gamma_i^{(t)} \right\|_2 = \min_{l = 1, \dots, k} \left\| a_j^{\top} W_l^{(t)} - \gamma_l^{(t)} \right\|_2 \right\}.$$
Cluster Update
(given the cluster assignment, update the q-flats): for $l = 1, \dots, k$, let $A(l) \in \mathbb{R}^{m_l \times n}$ be the matrix whose rows are the observations $a_i$ assigned to cluster l. Set $W_l^{(t+1)}$ to be the matrix whose columns are the orthonormal eigenvectors corresponding to the $n - q$ smallest eigenvalues of $A(l)^{\top} \left( I - \frac{e e^{\top}}{m_l} \right) A(l)$, and set $\gamma_l^{(t+1)} = \frac{e^{\top} A(l)\, W_l^{(t+1)}}{m_l}$.

Stop whenever the assignments no longer change.

The cluster assignment step uses the following fact: given a q-flat $F_l = \{ x \mid x^{\top} W = \gamma \}$ and a vector a, where $W^{\top} W = I$, the distance from a to the q-flat $F_l$ is

$$\operatorname{dist}(a, F_l)^2 = \min_{x :\, x^{\top} W = \gamma} \left\| x - a \right\|_2^2 = \left\| W (W^{\top} W)^{-1} (W^{\top} a - \gamma^{\top}) \right\|_2^2 = \left\| W^{\top} a - \gamma^{\top} \right\|_2^2.$$
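The alternation above is compact enough to sketch directly. The following NumPy implementation is a minimal illustration, not code from the references; the function and parameter names (k_q_flats, n_iter) are ours, and it initializes with a random assignment rather than random flats, which is an equivalent starting point:

    import numpy as np

    def k_q_flats(A, k, q, n_iter=100, seed=0):
        """Cluster the rows of A (m x n) around k q-flats."""
        m, n = A.shape
        rng = np.random.default_rng(seed)
        labels = rng.integers(0, k, size=m)   # random initial assignment
        W = [None] * k                        # W[l]: n x (n - q), orthonormal columns
        gamma = [None] * k                    # gamma[l]: row vector of length n - q

        for _ in range(n_iter):
            # Cluster update: exact least-squares flat fit (problem (P2) below).
            for l in range(k):
                Al = A[labels == l]
                if len(Al) == 0:              # guard against empty clusters
                    Al = A
                centered = Al - Al.mean(axis=0)   # equals (I - ee'/m_l) A(l)
                # Eigenvectors for the n - q smallest eigenvalues of
                # A(l)' (I - ee'/m_l) A(l); eigh sorts eigenvalues ascending.
                _, vecs = np.linalg.eigh(centered.T @ centered)
                W[l] = vecs[:, : n - q]
                gamma[l] = Al.mean(axis=0) @ W[l]   # gamma = e' A(l) W / m_l

            # Cluster assignment: dist(a, F_l) = ||a' W_l - gamma_l||.
            dists = np.stack(
                [np.linalg.norm(A @ W[l] - gamma[l], axis=1) for l in range(k)],
                axis=1,
            )
            new_labels = dists.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break                          # assignments no longer change
            labels = new_labels
        return labels, W, gamma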

The key part of this algorithm is how to update the clusters, i.e., given m points, how to find a q-flat that minimizes the sum of the squares of the distances of each point to the q-flat. Mathematically, this problem is: given $A \in \mathbb{R}^{m \times n}$, solve the quadratic optimization problem

$$\min_{W \in \mathbb{R}^{n \times (n-q)},\ \gamma \in \mathbb{R}^{1 \times (n-q)}} \left\| A W - e \gamma \right\|_F^2 \quad \text{subject to} \quad W^{\top} W = I, \qquad \text{(P2)}$$

where $e = (1, \dots, 1)^{\top} \in \mathbb{R}^{m \times 1}$.

The problem can be solved using the method of Lagrange multipliers, and the solution is as given in the cluster update step.
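A compressed version of the argument (our paraphrase, in the notation above) runs as follows. For fixed W, setting the gradient with respect to $\gamma$ to zero gives

$$\gamma = \tfrac{1}{m} e^{\top} A W.$$

Substituting back and using the fact that $I - \frac{e e^{\top}}{m}$ is symmetric and idempotent,

$$\left\| \left( I - \tfrac{e e^{\top}}{m} \right) A W \right\|_F^2 = \operatorname{tr}\!\left( W^{\top} A^{\top} \left( I - \tfrac{e e^{\top}}{m} \right) A\, W \right).$$

By the Rayleigh–Ritz (Ky Fan) theorem, this trace is minimized over $W^{\top} W = I$ by taking the columns of W to be the eigenvectors of $A^{\top} ( I - \frac{e e^{\top}}{m} ) A$ associated with its $n - q$ smallest eigenvalues, which is exactly the cluster update.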

It can be shown that the algorithm terminates in a finite number of iterations (no more than the total number of possible assignments, which is bounded by $k^m$). In addition, the algorithm terminates at a point where the overall objective cannot be decreased either by a different assignment or by defining new cluster planes for these clusters (such a point is called "locally optimal" in the references).

This convergence result is a consequence of the fact that problem (P2) can be solved exactly. The same convergence result holds for the k-means algorithm because its cluster update problem can also be solved exactly.

Relation to other machine learning methods

k-means algorithm

The k q-flats algorithm is a generalization of the k-means algorithm; in fact, the k-means algorithm is the k 0-flats algorithm, since a point is a 0-flat. Despite this connection, the two should be used in different scenarios: the k q-flats algorithm is suited to data that lie in a few low-dimensional subspaces, whereas the k-means algorithm is preferable when the clusters have the ambient dimension. For example, if all observations lie on two lines, the k q-flats algorithm with $q = 1$ may be used; if the observations are two Gaussian clouds, the k-means algorithm may be used.
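The contrast can be made concrete with synthetic data. This snippet assumes the k_q_flats sketch from the Algorithm section and scikit-learn's KMeans; the data and the behavioral claims in the comments are illustrative:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    t = rng.uniform(-1, 1, size=(200, 1))
    line1 = t @ np.array([[1.0, 0.0]]) + 0.01 * rng.standard_normal((200, 2))
    line2 = t @ np.array([[0.0, 1.0]]) + 0.01 * rng.standard_normal((200, 2))
    X_lines = np.vstack([line1, line2])   # points on two crossing lines

    # k 1-flats can recover the two lines; k-means tends to split each
    # line in half instead, since its clusters are "round".
    labels, _, _ = k_q_flats(X_lines, k=2, q=1)

    # Two well-separated Gaussian clouds: here k-means is the natural fit.
    clouds = np.vstack([rng.standard_normal((200, 2)) + [5, 5],
                        rng.standard_normal((200, 2)) - [5, 5]])
    cloud_labels = KMeans(n_clusters=2, n_init=10).fit_predict(clouds)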

Sparse Dictionary Learning

Natural signals lie in a high-dimensional space. For example, the dimension of a 1024-by-1024 image is about $10^6$, which is far too high for most signal processing algorithms. One way to reduce the dimensionality is to find a set of basis functions such that the high-dimensional signal can be represented by only a few of them. In other words, the coefficients of the signal representation lie in a low-dimensional space, where signal processing algorithms are easier to apply. In the literature, the wavelet transform is usually used in image processing, and the Fourier transform is usually used in audio processing. The set of basis functions is usually called a dictionary.

However, it is not clear what the best dictionary is for a given signal data set. One popular approach is to learn a dictionary from the data using sparse dictionary learning, which aims to find a dictionary such that each signal can be sparsely represented by it. The optimization problem can be written as follows:

$$\min_{B, R} \left\| X - B R \right\|_F^2 \quad \text{subject to} \quad \left\| R_i \right\|_0 \le q,$$

where

  • X is a d-by-N matrix; each column of X represents a signal, and there are N signals in total.
  • B is a d-by-l matrix; each column of B represents a basis function, and there are l basis functions in the dictionary.
  • R is an l-by-N matrix; $R_i$ (the i-th column of R) holds the coefficients used to represent the i-th column of X with the dictionary B.
  • $\| v \|_0$ denotes the zero-norm of the vector v.
  • $\| V \|_F$ denotes the Frobenius norm of the matrix V.
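In practice this problem is solved approximately; for instance, scikit-learn ships a dictionary learner. A minimal sketch follows (note that scikit-learn stores signals as rows, the transpose of the d-by-N convention above, and fits an l1-regularized dictionary, so this only approximates the l0-constrained formulation):

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    X = np.random.default_rng(0).standard_normal((100, 64))  # 100 signals, d = 64
    q = 5                                  # target sparsity per signal
    dl = DictionaryLearning(
        n_components=32,                   # l = 32 atoms in the dictionary B
        transform_algorithm="omp",         # orthogonal matching pursuit for R
        transform_n_nonzero_coefs=q,       # enforce ||R_i||_0 <= q at transform time
    )
    R = dl.fit(X).transform(X)             # (100, 32): coefficients per signal
    B = dl.components_                     # (32, 64): dictionary atoms as rows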

The idea of the k q-flats algorithm is similar in nature to sparse dictionary learning. If we restrict each q-flat to be a q-dimensional subspace, the k q-flats algorithm simply finds the closest q-dimensional subspace to a given signal; sparse dictionary learning does the same thing, except for an additional constraint on the sparsity of the representation. Mathematically, the k q-flats algorithm can be shown to take the form of sparse dictionary learning with an additional block structure on R.

Let $B_k$ be a $d \times q$ matrix whose columns form a basis of the k-th flat. Then the projection of a signal x onto the k-th flat is $B_k r_k$, where $r_k$ is a q-dimensional coefficient vector. Let $B = [B_1, \dots, B_K]$ denote the concatenation of the bases of the K flats. It is easy to show that the k q-flats algorithm is the same as the following:

$$\min_{B, R} \left\| X - B R \right\|_F^2 \quad \text{subject to} \quad \left\| R_i \right\|_0 \le q \ \text{ and } R \text{ has a block structure.}$$

The block structure of R refers to the fact that each signal is assigned to exactly one flat, so that only one block of each column of R is nonzero. Comparing the two formulations, k q-flats is the same as sparse dictionary modeling with $l = K \times q$ and an additional block structure on R. Readers may refer to Szlam's paper[3] for more discussion of the relationship between the two concepts.
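The block-coding step can be written out directly. The following sketch is ours, not from the references; the function name and the simplifying assumption that the flats pass through the origin with orthonormal bases are both illustrative. It codes each signal on its nearest flat and zeroes all other blocks:

    import numpy as np

    def block_sparse_code(X, bases):
        """X: d-by-N signal matrix; bases: list of K orthonormal d-by-q matrices."""
        d, N = X.shape
        K, q = len(bases), bases[0].shape[1]
        R = np.zeros((K * q, N))
        for i in range(N):
            x = X[:, i]
            # Pick the flat whose projection B_k B_k' x is closest to x.
            errs = [np.linalg.norm(x - B @ (B.T @ x)) for B in bases]
            k = int(np.argmin(errs))
            R[k * q:(k + 1) * q, i] = bases[k].T @ x   # r_k = B_k' x
        return R                   # X is approximated by [B_1 ... B_K] @ R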

Applications and variations

Classification

Classification is a procedure that assigns an input signal to one of several classes. One example is classifying an email as spam or non-spam. Classification algorithms usually require a supervised learning stage, in which training data for each class are used to learn the characteristics of that class. In the classification stage, a new observation is assigned to a class using the characteristics that were learned.

The k q-flats algorithm can be used for classification. Suppose there are m classes in total. For each class, k flats are trained a priori on a training data set. When a new observation arrives, find the flat closest to it; the new observation is then assigned to the class of that closest flat.
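A minimal sketch of this scheme (it reuses the k_q_flats function from the Algorithm section; the names and data layout are illustrative):

    import numpy as np

    def train_class_flats(class_data, k, q):
        """class_data: dict mapping class label -> (m_c x n) array of observations."""
        flats = []
        for c, A in class_data.items():
            _, Ws, gammas = k_q_flats(A, k, q)   # k flats per class
            flats += [(W, g, c) for W, g in zip(Ws, gammas)]
        return flats

    def classify(x, flats):
        """Assign x to the class of its nearest flat."""
        dists = [np.linalg.norm(x @ W - g) for W, g, _ in flats]
        return flats[int(np.argmin(dists))][2]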

However, the classification performance can be further improved if we impose some structure on the flats. One possible choice is to require flats from different classes to be sufficiently far apart. Some researchers[4] use this idea and develop a discriminative k q-flats algorithm.

K-metrics[3]

In the k q-flats algorithm, $\| x - P_F(x) \|^2$ is used to measure the representation error, where $P_F(x)$ denotes the projection of x onto the flat F. If the data lie in a q-dimensional flat, then a single q-flat can represent them very well. On the contrary, if the data lie in a very high-dimensional space but near a common center, then the k-means algorithm is a better way to represent the data than the k q-flats algorithm, because the k-means algorithm uses $\| x - x_c \|^2$ to measure the error, where $x_c$ denotes the center. K-metrics is a generalization that uses both the idea of a flat and that of a mean. In k-metrics, the error is measured by the following Mahalanobis metric:

$$\left\| x - y \right\|_A^2 = (x - y)^{\top} A \, (x - y),$$

where A is a positive semi-definite matrix.

If A is the identity matrix, then the Mahalanobis metric is exactly the error measure used in k-means. If A is not the identity matrix, then $\| x - y \|_A^2$ favors certain directions, as the k q-flats error measure does.
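The way this measure interpolates between the two error measures is easy to check numerically (the specific matrices below are our illustrative choices, not from the references):

    import numpy as np

    def mahalanobis_sq(x, y, A):
        d = x - y
        return float(d @ A @ d)            # (x - y)' A (x - y)

    n, q = 5, 2
    x = np.random.default_rng(2).standard_normal(n)
    c = np.zeros(n)                        # cluster center

    # A = I recovers the k-means error ||x - c||^2.
    assert np.isclose(mahalanobis_sq(x, c, np.eye(n)), (x - c) @ (x - c))

    # A = I - U U' (U an orthonormal basis of a q-dim subspace through c)
    # recovers the squared distance from x to that flat, since I - U U'
    # is the idempotent projector onto the flat's orthogonal complement.
    U, _ = np.linalg.qr(np.random.default_rng(3).standard_normal((n, q)))
    A_flat = np.eye(n) - U @ U.T
    proj = c + U @ (U.T @ (x - c))
    assert np.isclose(mahalanobis_sq(x, c, A_flat), (x - proj) @ (x - proj))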

References

  1. ^ Bradley, P.S.; Mangasarian, O.L. (2000). "k-Plane Clustering". Journal of Global Optimization. 16 (1): 23–32. doi:10.1023/A:1008324625522. S2CID 913034.
  2. ^ Tseng, P. (2000). "Nearest q-Flat to m Points". Journal of Optimization Theory and Applications. 105 (1): 249–252. doi:10.1023/A:1004678431677. ISSN 0022-3239. S2CID 118142932.
  3. ^ a b Szlam, Arthur; Sapiro, Guillermo (2009-06-14). "Discriminative k -metrics". In Bottou, Léon; Littman, Michael (eds.). Proceedings of the 26th Annual International Conference on Machine Learning. ACM. pp. 1009–1016. doi:10.1145/1553374.1553503. hdl:11299/180116. ISBN 978-1-60558-516-1. S2CID 2509292.
  4. ^ Szlam, A.; Sapiro, G. (2008). "Supervised Learning via Discriminative k q-Flats" (PDF).
