
Hierarchical Dirichlet process

In statistics and machine learning, the hierarchical Dirichlet process (HDP) is a nonparametric Bayesian approach to clustering grouped data.[1][2] It uses a Dirichlet process for each group of data, with the Dirichlet processes for all groups sharing a base distribution which is itself drawn from a Dirichlet process. This method allows groups to share statistical strength via sharing of clusters across groups. The base distribution being drawn from a Dirichlet process is important, because draws from a Dirichlet process are atomic probability measures, and the atoms will appear in all group-level Dirichlet processes. Since each atom corresponds to a cluster, clusters are shared across all groups. It was developed by Yee Whye Teh, Michael I. Jordan, Matthew J. Beal and David Blei and published in the Journal of the American Statistical Association in 2006,[1] as a formalization and generalization of the infinite hidden Markov model published in 2002.[3]

Model

This model description follows Teh et al. (2006).[1] The HDP is a model for grouped data, meaning that the data items come in multiple distinct groups. For example, in a topic model words are organized into documents, with each document formed by a bag (group) of words (data items). Indexing the groups by $j = 1, \ldots, J$, suppose each group consists of data items $x_{j1}, \ldots, x_{jn}$.

The HDP is parameterized by a base distribution $H$ that governs the a priori distribution over data items, and a number of concentration parameters that govern the a priori number of clusters and the amount of sharing across groups. The $j$th group is associated with a random probability measure $G_j$ whose distribution is given by a Dirichlet process:

$$G_j \mid G_0 \sim \operatorname{DP}(\alpha_j, G_0)$$

where $\alpha_j$ is the concentration parameter associated with the group, and $G_0$ is the base distribution shared across all groups. In turn, the common base distribution is itself Dirichlet process distributed:

$$G_0 \sim \operatorname{DP}(\alpha_0, H)$$

with concentration parameter $\alpha_0$ and base distribution $H$. Finally, to relate the Dirichlet processes back to the observed data, each data item $x_{ji}$ is associated with a latent parameter $\theta_{ji}$:

$$\begin{aligned} \theta_{ji} \mid G_j &\sim G_j \\ x_{ji} \mid \theta_{ji} &\sim F(\theta_{ji}) \end{aligned}$$

The first line states that each parameter has a prior distribution given by $G_j$, while the second line states that each data item has a distribution $F(\theta_{ji})$ parameterized by its associated parameter. The resulting model is called an HDP mixture model, with the HDP referring to the hierarchically linked set of Dirichlet processes, and the mixture model referring to the way the Dirichlet processes are related to the data items.
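
The three sampling statements above fully specify the generative process, and it can be simulated once the infinite model is truncated. The following sketch is an illustration rather than code from the source: it truncates the top-level stick-breaking construction at $K$ atoms, assumes a Gaussian base distribution $H$ and a Gaussian observation model $F$, and replaces the draw $G_j \mid G_0 \sim \operatorname{DP}(\alpha_j, G_0)$ by its standard finite-dimensional analogue $\pi_j \sim \operatorname{Dirichlet}(\alpha_j \beta)$, where $\beta$ are the truncated top-level weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameters (not from the source): K truncates the infinite model.
K, J, n = 20, 3, 50           # top-level atoms, groups, items per group
alpha0, alpha = 5.0, 3.0      # top-level and group-level concentrations

# G0 ~ DP(alpha0, H), truncated stick-breaking, with H = N(0, 10^2):
v = rng.beta(1.0, alpha0, size=K)
beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
beta /= beta.sum()                        # renormalize the truncated weights
atoms = rng.normal(0.0, 10.0, size=K)     # shared atoms theta*_k ~ H

# G_j | G0 ~ DP(alpha, G0): in the truncated model, pi_j ~ Dirichlet(alpha * beta).
for j in range(J):
    pi_j = rng.dirichlet(alpha * beta)
    z = rng.choice(K, size=n, p=pi_j)     # latent cluster assignment of each item
    x = rng.normal(atoms[z], 1.0)         # x_ji ~ F(theta_ji), here F = N(theta, 1)
    print(f"group {j}: {np.unique(z).size} of the shared atoms in use")
```

Because every group's weights are a Dirichlet reweighting of the same top-level weights, all groups place their clusters on the one shared set of atoms; this is exactly the sharing mechanism made explicit below.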

To understand how the HDP implements a clustering model, and how clusters become shared across groups, recall that draws from a Dirichlet process are atomic probability measures with probability one. This means that the common base distribution $G_0$ has a form which can be written as:

$$G_0 = \sum_{k=1}^{\infty} \pi_{0k} \delta_{\theta^*_k}$$

where there are an infinite number of atoms, $\theta^*_k$, $k = 1, 2, \ldots$, assuming that the overall base distribution $H$ has infinite support. Each atom is associated with a mass $\pi_{0k}$, and the masses have to sum to one since $G_0$ is a probability measure. Since $G_0$ is itself the base distribution for the group-specific Dirichlet processes, each $G_j$ will have atoms given by the atoms of $G_0$, and can itself be written in the form:

$$G_j = \sum_{k=1}^{\infty} \pi_{jk} \delta_{\theta^*_k}$$

Thus the set of atoms is shared across all groups, with each group having its own group-specific atom masses. Relating this representation back to the observed data, we see that each data item is described by a mixture model:

$$x_{ji} \mid G_j \sim \sum_{k=1}^{\infty} \pi_{jk} F(\theta^*_k)$$

where the atoms $\theta^*_k$ play the role of the mixture component parameters and the masses $\pi_{jk}$ play the role of the mixing proportions. In conclusion, each group of data is modeled using a mixture model whose mixture components are shared across all groups but whose mixing proportions are group-specific. In clustering terms, each mixture component models a cluster of data items, clusters are shared across all groups, and each group, having its own mixing proportions, is composed of a different combination of clusters.
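
This group-specific mixture representation is straightforward to evaluate numerically. A minimal sketch, assuming Gaussian components $F(\theta^*_k) = \mathcal{N}(\theta^*_k, 1)$ purely for illustration, with two groups that share the same atoms but weight them differently:

```python
import numpy as np
from scipy.stats import norm

def group_density(x, weights, atoms, sd=1.0):
    """Mixture density sum_k pi_jk * N(x; theta*_k, sd^2) for one group."""
    return np.sum(weights[:, None] * norm.pdf(x[None, :], loc=atoms[:, None], scale=sd),
                  axis=0)

# Two groups sharing the same three atoms, with group-specific mixing proportions.
atoms = np.array([-5.0, 0.0, 4.0])
pi_1 = np.array([0.7, 0.2, 0.1])
pi_2 = np.array([0.1, 0.3, 0.6])

grid = np.linspace(-10.0, 10.0, 5)
print(group_density(grid, pi_1, atoms))   # mass concentrated near the first atom
print(group_density(grid, pi_2, atoms))   # same atoms, different proportions
```

Both densities place their modes at the same shared locations; only the proportions differ, mirroring the shared-clusters, group-specific-weights structure of the HDP.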

Applications

The HDP mixture model is a natural nonparametric generalization of Latent Dirichlet allocation, where the number of topics can be unbounded and learnt from data.[1] Here each group is a document consisting of a bag of words, each cluster is a topic, and each document is a mixture of topics. The HDP is also a core component of the infinite hidden Markov model,[3] which is a nonparametric generalization of the hidden Markov model allowing the number of states to be unbounded and learnt from data.[1][4]
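
For experimentation, the gensim library ships an HdpModel class implementing an online variational inference algorithm for the HDP topic model. A minimal usage sketch on a toy corpus follows; exact API details may vary across gensim versions.

```python
from gensim.corpora import Dictionary
from gensim.models import HdpModel

# Toy corpus: each document (a bag of words) is one "group" in HDP terms.
docs = [
    ["human", "interface", "computer"],
    ["survey", "user", "computer", "system", "response", "time"],
    ["graph", "trees", "minors", "survey"],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# The number of topics is not fixed in advance; it is inferred from the data.
hdp = HdpModel(corpus, id2word=dictionary)
for topic in hdp.print_topics(num_topics=5, num_words=4):
    print(topic)
```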

Generalizations

The HDP can be generalized in a number of directions. The Dirichlet processes can be replaced by Pitman–Yor processes and gamma processes, resulting in the hierarchical Pitman–Yor process and the hierarchical gamma process. The hierarchy can also be deeper, with multiple levels of groups arranged hierarchically; such an arrangement is exploited in the sequence memoizer, a Bayesian nonparametric model for sequences built on a multi-level hierarchy of Pitman–Yor processes. In addition, the Bayesian multi-domain learning (BMDL) model derives domain-dependent latent representations of overdispersed count data based on hierarchical negative binomial factorization, enabling accurate cancer subtyping even when the number of samples for a specific cancer type is small.[5]
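
The Pitman–Yor replacement changes only the distribution of the stick-breaking proportions: a Dirichlet process with concentration $c$ draws $V_k \sim \operatorname{Beta}(1, c)$, while a Pitman–Yor process with discount $0 \le d < 1$ draws $V_k \sim \operatorname{Beta}(1 - d, c + kd)$, producing power-law cluster-size behaviour. A sketch of the two weight constructions (illustrative only, not a sampler for the full hierarchical model):

```python
import numpy as np

def stick_breaking_weights(K, c, d=0.0, rng=None):
    """Truncated stick-breaking weights: d = 0 gives DP(c); 0 < d < 1 gives PY(d, c)."""
    rng = rng if rng is not None else np.random.default_rng()
    k = np.arange(1, K + 1)
    v = rng.beta(1.0 - d, c + k * d)                  # V_k ~ Beta(1 - d, c + k*d)
    return v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))

rng = np.random.default_rng(0)
print(stick_breaking_weights(10, c=1.0, rng=rng))          # Dirichlet process weights
print(stick_breaking_weights(10, c=1.0, d=0.5, rng=rng))   # Pitman-Yor: heavier tail
```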

See also

Chinese restaurant process

References

  1. ^ a b c d e Teh, Y. W.; Jordan, M. I.; Beal, M. J.; Blei, D. M. (2006). "Hierarchical Dirichlet Processes" (PDF). Journal of the American Statistical Association. 101 (476): 1566–1581. CiteSeerX 10.1.1.5.9094. doi:10.1198/016214506000000302. S2CID 7934949.
  2. ^ Teh, Y. W.; Jordan, M. I. (2010). Hierarchical Bayesian Nonparametric Models with Applications (PDF). Bayesian Nonparametrics. Cambridge University Press. pp. 158–207. CiteSeerX 10.1.1.157.9451. doi:10.1017/CBO9780511802478.006. ISBN 9780511802478.
  3. ^ a b Beal, M. J.; Ghahramani, Z.; Rasmussen, C. E. (2002). "The infinite hidden Markov model" (PDF). Advances in Neural Information Processing Systems. 14: 577–585. Cambridge, MA: MIT Press.
  4. ^ Fox, E. B.; Sudderth, E. B.; Jordan, M. I.; Willsky, A. S. (2011). "A sticky HDP-HMM with application to speaker diarization". The Annals of Applied Statistics. 5 (2A): 1020–1056.
  5. ^ Hajiramezanali, E.; Dadaneh, S. Z.; Karbalayghareh, A.; Zhou, Z.; Qian, X. (2018). "Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data" (PDF). 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada.
