fbpx
Wikipedia

Affinity propagation

In statistics and data mining, affinity propagation (AP) is a clustering algorithm based on the concept of "message passing" between data points.[1] Unlike clustering algorithms such as k-means or k-medoids, affinity propagation does not require the number of clusters to be determined or estimated before running the algorithm. Similar to k-medoids, affinity propagation finds "exemplars," members of the input set that are representative of clusters.[1]

Algorithm Edit

Let x1 through xn be a set of data points, with no assumptions made about their internal structure, and let s be a function that quantifies the similarity between any two points, such that s(i, j) > s(i, k) iff xi is more similar to xj than to xk. For this example, the negative squared distance of two data points was used i.e. for points xi and xk,  [1]

The diagonal of s (i.e.  ) is particularly important, as it represents the instance preference, meaning how likely a particular instance is to become an exemplar. When it is set to the same value for all inputs, it controls how many classes the algorithm produces. A value close to the minimum possible similarity produces fewer classes, while a value close to or larger than the maximum possible similarity produces many classes. It is typically initialized to the median similarity of all pairs of inputs.

The algorithm proceeds by alternating between two message-passing steps, which update two matrices:[1]

  • The "responsibility" matrix R has values r(i, k) that quantify how well-suited xk is to serve as the exemplar for xi, relative to other candidate exemplars for xi.
  • The "availability" matrix A contains values a(i, k) that represent how "appropriate" it would be for xi to pick xk as its exemplar, taking into account other points' preference for xk as an exemplar.

Both matrices are initialized to all zeroes, and can be viewed as log-probability tables. The algorithm then performs the following updates iteratively:

  • First, responsibility updates are sent around:  
  • Then, availability is updated per
  for   and
 .

Iterations are performed until either the cluster boundaries remain unchanged over a number of iterations, or some predetermined number (of iterations) is reached. The exemplars are extracted from the final matrices as those whose 'responsibility + availability' for themselves is positive (i.e.  ).

Applications Edit

The inventors of affinity propagation showed it is better for certain computer vision and computational biology tasks, e.g. clustering of pictures of human faces and identifying regulated transcripts, than k-means,[1] even when k-means was allowed many random restarts and initialized using PCA.[2] A study comparing affinity propagation and Markov clustering on protein interaction graph partitioning found Markov clustering to work better for that problem.[3] A semi-supervised variant has been proposed for text mining applications.[4] Another recent application was in economics, when the affinity propagation was used to find some temporal patterns in the output multipliers of the US economy between 1997 and 2017.[5]

Software Edit

  • A Java implementation is included in the ELKI data mining framework.
  • A Julia implementation of affinity propagation is contained in Julia Statistics's Clustering.jl package.
  • A Python version is part of the scikit-learn library.
  • An R implementation is available in the "apcluster" package.

References Edit

  1. ^ a b c d e Brendan J. Frey; Delbert Dueck (2007). "Clustering by passing messages between data points". Science. 315 (5814): 972–976. Bibcode:2007Sci...315..972F. CiteSeerX 10.1.1.121.3145. doi:10.1126/science.1136800. PMID 17218491. S2CID 6502291.
  2. ^ Delbert Dueck; Brendan J. Frey (2007). Non-metric affinity propagation for unsupervised image categorization. Int'l Conf. on Computer Vision. doi:10.1109/ICCV.2007.4408853.
  3. ^ James Vlasblom; Shoshana Wodak (2009). "Markov clustering versus affinity propagation for the partitioning of protein interaction graphs". BMC Bioinformatics. 10 (1): 99. doi:10.1186/1471-2105-10-99. PMC 2682798. PMID 19331680.
  4. ^ Renchu Guan; Xiaohu Shi; Maurizio Marchese; Chen Yang; Yanchun Liang (2011). "Text Clustering with Seeds Affinity Propagation". IEEE Transactions on Knowledge & Data Engineering. 23 (4): 627–637. doi:10.1109/tkde.2010.144. hdl:11572/89884. S2CID 14053903.
  5. ^ Almeida, Lucas Milanez de Lima; Balanco, Paulo Antonio de Freitas (2020-06-01). "Application of multivariate analysis as complementary instrument in studies about structural changes: An example of the multipliers in the US economy". Structural Change and Economic Dynamics. 53: 189–207. doi:10.1016/j.strueco.2020.02.006. ISSN 0954-349X. S2CID 216406772.

affinity, propagation, statistics, data, mining, affinity, propagation, clustering, algorithm, based, concept, message, passing, between, data, points, unlike, clustering, algorithms, such, means, medoids, affinity, propagation, does, require, number, clusters. In statistics and data mining affinity propagation AP is a clustering algorithm based on the concept of message passing between data points 1 Unlike clustering algorithms such as k means or k medoids affinity propagation does not require the number of clusters to be determined or estimated before running the algorithm Similar to k medoids affinity propagation finds exemplars members of the input set that are representative of clusters 1 Contents 1 Algorithm 2 Applications 3 Software 4 ReferencesAlgorithm EditLet x1 through xn be a set of data points with no assumptions made about their internal structure and let s be a function that quantifies the similarity between any two points such that s i j gt s i k iff xi is more similar to xj than to xk For this example the negative squared distance of two data points was used i e for points xi and xk s i k x i x k 2 displaystyle s i k left x i x k right 2 1 The diagonal of s i e s i i displaystyle s i i is particularly important as it represents the instance preference meaning how likely a particular instance is to become an exemplar When it is set to the same value for all inputs it controls how many classes the algorithm produces A value close to the minimum possible similarity produces fewer classes while a value close to or larger than the maximum possible similarity produces many classes It is typically initialized to the median similarity of all pairs of inputs The algorithm proceeds by alternating between two message passing steps which update two matrices 1 The responsibility matrix R has values r i k that quantify how well suited xk is to serve as the exemplar for xi relative to other candidate exemplars for xi The availability matrix A contains values a i k that represent how appropriate it would be for xi to pick xk as its exemplar taking into account other points preference for xk as an exemplar Both matrices are initialized to all zeroes and can be viewed as log probability tables The algorithm then performs the following updates iteratively First responsibility updates are sent around r i k s i k max k k a i k s i k displaystyle r i k leftarrow s i k max k neq k left a i k s i k right Then availability is updated pera i k min 0 r k k i i k max 0 r i k displaystyle a i k leftarrow min left 0 r k k sum i not in i k max 0 r i k right for i k displaystyle i neq k and a k k i k max 0 r i k displaystyle a k k leftarrow sum i neq k max 0 r i k dd Iterations are performed until either the cluster boundaries remain unchanged over a number of iterations or some predetermined number of iterations is reached The exemplars are extracted from the final matrices as those whose responsibility availability for themselves is positive i e r i i a i i gt 0 displaystyle r i i a i i gt 0 Applications EditThe inventors of affinity propagation showed it is better for certain computer vision and computational biology tasks e g clustering of pictures of human faces and identifying regulated transcripts than k means 1 even when k means was allowed many random restarts and initialized using PCA 2 A study comparing affinity propagation and Markov clustering on protein interaction graph partitioning found Markov clustering to work better for that problem 3 A semi supervised variant has been proposed for text mining applications 4 Another recent application was in economics when the affinity propagation was used to find some temporal patterns in the output multipliers of the US economy between 1997 and 2017 5 Software EditA Java implementation is included in the ELKI data mining framework A Julia implementation of affinity propagation is contained in Julia Statistics s Clustering jl package A Python version is part of the scikit learn library An R implementation is available in the apcluster package References Edit a b c d e Brendan J Frey Delbert Dueck 2007 Clustering by passing messages between data points Science 315 5814 972 976 Bibcode 2007Sci 315 972F CiteSeerX 10 1 1 121 3145 doi 10 1126 science 1136800 PMID 17218491 S2CID 6502291 Delbert Dueck Brendan J Frey 2007 Non metric affinity propagation for unsupervised image categorization Int l Conf on Computer Vision doi 10 1109 ICCV 2007 4408853 James Vlasblom Shoshana Wodak 2009 Markov clustering versus affinity propagation for the partitioning of protein interaction graphs BMC Bioinformatics 10 1 99 doi 10 1186 1471 2105 10 99 PMC 2682798 PMID 19331680 Renchu Guan Xiaohu Shi Maurizio Marchese Chen Yang Yanchun Liang 2011 Text Clustering with Seeds Affinity Propagation IEEE Transactions on Knowledge amp Data Engineering 23 4 627 637 doi 10 1109 tkde 2010 144 hdl 11572 89884 S2CID 14053903 Almeida Lucas Milanez de Lima Balanco Paulo Antonio de Freitas 2020 06 01 Application of multivariate analysis as complementary instrument in studies about structural changes An example of the multipliers in the US economy Structural Change and Economic Dynamics 53 189 207 doi 10 1016 j strueco 2020 02 006 ISSN 0954 349X S2CID 216406772 Retrieved from https en wikipedia org w index php title Affinity propagation amp oldid 1166488165, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.