fbpx
Wikipedia

Chinese restaurant process

In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a restaurant. Imagine a restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1, or the next table. This continues, with each customer choosing to either sit at an occupied table with a probability proportional to the number of customers already there (i.e., they are more likely to sit at a table with many customers than few), or an unoccupied table. At time n, the n customers have been partitioned among m ≤ n tables (or blocks of the partition). The results of this process are exchangeable, meaning the order in which the customers sit does not affect the probability of the final distribution. This property greatly simplifies a number of problems in population genetics, linguistic analysis, and image recognition.

The restaurant analogy first appeared in a 1985 write-up by David Aldous,[1] where it was attributed to Jim Pitman (who additionally credits Lester Dubins).[2]

Formal definition edit

For any positive integer  , let   denote the set of all partitions of the set  . The Chinese restaurant process takes values in the infinite Cartesian product  .

The value of the process at time   is a partition   of the set  , whose probability distribution is determined as follows. At time  , the trivial partition   is obtained (with probability one). At time   the element " " is either:

  1. added to one of the blocks of the partition  , where each block is chosen with probability   where   is the size of the block (i.e. number of elements), or
  2. added to the partition   as a new singleton block, with probability  .

The random partition so generated has some special properties. It is exchangeable in the sense that relabeling   does not change the distribution of the partition, and it is consistent in the sense that the law of the partition of   obtained by removing the element   from the random partition   is the same as the law of the random partition  .

The probability assigned to any particular partition (ignoring the order in which customers sit around any particular table) is

 

where   is a block in the partition   and   is the size of  .

The definition can be generalized by introducing a parameter   which modifies the probability of the new customer sitting at a new table to   and correspondingly modifies the probability of them sitting at a table of size   to  . The vanilla process introduced above can be recovered by setting  . Intuitively,   can be interpreted as the effective number of customers sitting at the first empty table.

Alternative definition edit

An equivalent, but subtly different way to define the Chinese restaurant process, is to let new customers choose companions rather than tables.[3] Customer   chooses to sit at the same table as any one of the   seated customers with probability  , or chooses to sit at a new, unoccupied table with probability  . Notice that in this formulation, the customer chooses a table without having to count table occupancies---we don't need  .

Distribution of the number of tables edit

Chinese restaurant table
Parameters

 

 
Support  
PMF  
Mean  
(see digamma function)

The Chinese restaurant table distribution (CRT) is the probability distribution on the number of tables in the Chinese restaurant process.[4] It can be understood as the sum of   independent Bernoulli random variables, each with a different parameter:

 

The probability mass function of   is given by [5]

 

where   denotes Stirling numbers of the first kind.

Generalization edit

This construction can be generalized to a model with two parameters,   &  ,[2][6] commonly called the strength (or concentration) and discount parameters respectively. At time  , the next customer to arrive finds   occupied tables and decides to sit at an empty table with probability

 

or at an occupied table   of size   with probability

 

In order for the construction to define a valid probability measure it is necessary to suppose that either   and   for some  ; or that   and  .

Under this model the probability assigned to any particular partition   of  , in terms of the Pochhammer k-symbol, is

 

where, by convention,  , and for  

 

Thus, for the case when   the partition probability can be expressed in terms of the Gamma function as

 

In the one-parameter case, where   is zero, this simplifies to

 

Or, when   is zero,

 

As before, the probability assigned to any particular partition depends only on the block sizes, so as before the random partition is exchangeable in the sense described above. The consistency property still holds, as before, by construction.

If  , the probability distribution of the random partition of the integer   thus generated is the Ewens distribution with parameter  , used in population genetics and the unified neutral theory of biodiversity.

Animation of a Chinese restaurant process with scaling parameter  . Tables are hidden once the customers of a table can not be displayed anymore; however, every table has infinitely many seats. (Recording of an interactive animation.[7])

Derivation edit

Here is one way to derive this partition probability. Let   be the random block into which the number   is added, for  . Then

 

The probability that   is any particular partition of the set   is the product of these probabilities as   runs from   to  . Now consider the size of block  : it increases by one each time we add one element into it. When the last element in block   is to be added in, the block size is  . For example, consider this sequence of choices: (generate a new block  )(join  )(join  )(join  ). In the end, block   has 4 elements and the product of the numerators in the above equation gets  . Following this logic, we obtain   as above.

Expected number of tables edit

For the one parameter case, with   and  , the number of tables is distributed according to the chinese restaurant table distribution. The expected value of this random variable, given that there are   seated customers, is[8]

 

where   is the digamma function. In the general case ( ) the expected number of occupied tables is[6]

 

however, note that the   function here is not the standard gamma function.[6]

The Indian buffet process edit

It is possible to adapt the model such that each data point is no longer uniquely associated with a class (i.e., we are no longer constructing a partition), but may be associated with any combination of the classes. This strains the restaurant-tables analogy and so is instead likened to a process in which a series of diners samples from some subset of an infinite selection of dishes on offer at a buffet. The probability that a particular diner samples a particular dish is proportional to the popularity of the dish among diners so far, and in addition the diner may sample from the untested dishes. This has been named the Indian buffet process and can be used to infer latent features in data.[9]

Applications edit

The Chinese restaurant process is closely connected to Dirichlet processes and Pólya's urn scheme, and therefore useful in applications of Bayesian statistics including nonparametric Bayesian methods. The Generalized Chinese Restaurant Process is closely related to Pitman–Yor process. These processes have been used in many applications, including modeling text, clustering biological microarray data,[10] biodiversity modelling, and image reconstruction [11][12]

See also edit

References edit

  1. ^ Aldous, D. J. (1985). "Exchangeability and related topics". École d'Été de Probabilités de Saint-Flour XIII — 1983. Lecture Notes in Mathematics. Vol. 1117. pp. 1–198. doi:10.1007/BFb0099421. ISBN 978-3-540-15203-3. The restaurant process is described on page 92.
  2. ^ a b Pitman, Jim (1995). "Exchangeable and Partially Exchangeable Random Partitions". Probability Theory and Related Fields. 102 (2): 145–158. doi:10.1007/BF01213386. MR 1337249. S2CID 16849229.
  3. ^ Blei, David M.; Frazier, Peter I. (2011). "Distance Dependent Chinese Restaurant Processes" (PDF). Journal of Machine Learning Research. 12: 2461–2488.
  4. ^ Zhou, Mingyuan; Carin, Lawrence (2012). "Negative Binomial Process Count and Mixture Modeling". IEEE Transactions on Pattern Analysis and Machine Intelligence. 37 (2): 307–20. arXiv:1209.3442. Bibcode:2012arXiv1209.3442Z. doi:10.1109/TPAMI.2013.211. PMID 26353243. S2CID 1937045.
  5. ^ Antoniak, Charles E (1974). "Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems". The Annals of Statistics. 2 (6): 1152–1174. doi:10.1214/aos/1176342871.
  6. ^ a b c Pitman, Jim (2006). . Vol. 1875. Berlin: Springer-Verlag. ISBN 9783540309901. Archived from the original on 2012-09-25. Retrieved 2011-05-11.
  7. ^ "Dirichlet Process and Dirichlet Distribution -- Polya Restaurant Scheme and Chinese Restaurant Process".
  8. ^ Xinhua Zhang, "A Very Gentle Note on the Construction of Dirichlet Process", September 2008, The Australian National University, Canberra. Online: http://users.cecs.anu.edu.au/~xzhang/pubDoc/notes/dirichlet_process.pdf April 11, 2011, at the Wayback Machine
  9. ^ Griffiths, T.L. and Ghahramani, Z. (2005) Infinite Latent Feature Models and the Indian Buffet Process 2008-10-31 at the Wayback Machine. Gatsby Unit Technical Report GCNU-TR-2005-001.
  10. ^ Qin, Zhaohui S (2006). "Clustering microarray gene expression data using weighted Chinese restaurant process". Bioinformatics. 22 (16): 1988–1997. doi:10.1093/bioinformatics/btl284. PMID 16766561.
  11. ^ White, J. T.; Ghosal, S. (2011). "Bayesian smoothing of photon‐limited images with applications in astronomy" (PDF). Journal of the Royal Statistical Society, Series B (Statistical Methodology). 73 (4): 579–599. CiteSeerX 10.1.1.308.7922. doi:10.1111/j.1467-9868.2011.00776.x. S2CID 2342134.
  12. ^ Li, M.; Ghosal, S. (2014). "Bayesian multiscale smoothing of Gaussian noised images". Bayesian Analysis. 9 (3): 733–758. doi:10.1214/14-ba871.

External links edit

chinese, restaurant, process, other, uses, chinese, restaurant, disambiguation, probability, theory, discrete, time, stochastic, process, analogous, seating, customers, tables, restaurant, imagine, restaurant, with, infinite, number, circular, tables, each, wi. For other uses see Chinese restaurant disambiguation In probability theory the Chinese restaurant process is a discrete time stochastic process analogous to seating customers at tables in a restaurant Imagine a restaurant with an infinite number of circular tables each with infinite capacity Customer 1 sits at the first table The next customer either sits at the same table as customer 1 or the next table This continues with each customer choosing to either sit at an occupied table with a probability proportional to the number of customers already there i e they are more likely to sit at a table with many customers than few or an unoccupied table At time n the n customers have been partitioned among m n tables or blocks of the partition The results of this process are exchangeable meaning the order in which the customers sit does not affect the probability of the final distribution This property greatly simplifies a number of problems in population genetics linguistic analysis and image recognition The restaurant analogy first appeared in a 1985 write up by David Aldous 1 where it was attributed to Jim Pitman who additionally credits Lester Dubins 2 Contents 1 Formal definition 1 1 Alternative definition 2 Distribution of the number of tables 3 Generalization 3 1 Derivation 3 2 Expected number of tables 3 3 The Indian buffet process 4 Applications 5 See also 6 References 7 External linksFormal definition editFor any positive integer n displaystyle n nbsp let P n displaystyle mathcal P n nbsp denote the set of all partitions of the set 1 2 3 n n displaystyle 1 2 3 n triangleq n nbsp The Chinese restaurant process takes values in the infinite Cartesian product n 1 P n displaystyle prod n geq 1 mathcal P n nbsp The value of the process at time n displaystyle n nbsp is a partition B n displaystyle B n nbsp of the set n displaystyle n nbsp whose probability distribution is determined as follows At time n 1 displaystyle n 1 nbsp the trivial partition B 1 1 displaystyle B 1 1 nbsp is obtained with probability one At time n 1 displaystyle n 1 nbsp the element n 1 displaystyle n 1 nbsp is either added to one of the blocks of the partition B n displaystyle B n nbsp where each block is chosen with probability b n 1 displaystyle b n 1 nbsp where b displaystyle b nbsp is the size of the block i e number of elements or added to the partition B n displaystyle B n nbsp as a new singleton block with probability 1 n 1 displaystyle 1 n 1 nbsp The random partition so generated has some special properties It is exchangeable in the sense that relabeling 1 n displaystyle 1 n nbsp does not change the distribution of the partition and it is consistent in the sense that the law of the partition of n 1 displaystyle n 1 nbsp obtained by removing the element n displaystyle n nbsp from the random partition B n displaystyle B n nbsp is the same as the law of the random partition B n 1 displaystyle B n 1 nbsp The probability assigned to any particular partition ignoring the order in which customers sit around any particular table is Pr B n B b B b 1 n B P n displaystyle Pr B n B frac prod b in B b 1 n qquad B in mathcal P n nbsp where b displaystyle b nbsp is a block in the partition B displaystyle B nbsp and b displaystyle b nbsp is the size of b displaystyle b nbsp The definition can be generalized by introducing a parameter 8 gt 0 displaystyle theta gt 0 nbsp which modifies the probability of the new customer sitting at a new table to 8 n 8 displaystyle frac theta n theta nbsp and correspondingly modifies the probability of them sitting at a table of size b displaystyle b nbsp to b n 8 displaystyle frac b n theta nbsp The vanilla process introduced above can be recovered by setting 8 1 displaystyle theta 1 nbsp Intuitively 8 displaystyle theta nbsp can be interpreted as the effective number of customers sitting at the first empty table Alternative definition edit An equivalent but subtly different way to define the Chinese restaurant process is to let new customers choose companions rather than tables 3 Customer n 1 displaystyle n 1 nbsp chooses to sit at the same table as any one of the n displaystyle n nbsp seated customers with probability 1 n 8 displaystyle frac 1 n theta nbsp or chooses to sit at a new unoccupied table with probability 8 n 8 displaystyle frac theta n theta nbsp Notice that in this formulation the customer chooses a table without having to count table occupancies we don t need b displaystyle b nbsp Distribution of the number of tables editChinese restaurant tableParameters8 gt 0 displaystyle theta gt 0 nbsp n 1 2 displaystyle n in 1 2 ldots nbsp Supportk 1 2 n displaystyle k in 1 2 ldots n nbsp PMFG 8 G n 8 s n k 8 k displaystyle frac Gamma theta Gamma n theta s n k theta k nbsp Mean8 ps 8 n ps 8 displaystyle theta psi theta n psi theta nbsp see digamma function The Chinese restaurant table distribution CRT is the probability distribution on the number of tables in the Chinese restaurant process 4 It can be understood as the sum of n displaystyle n nbsp independent Bernoulli random variables each with a different parameter K i 1 n b i b i Bernoulli 8 i 1 8 displaystyle begin aligned K amp sum i 1 n b i 4pt b i amp sim operatorname Bernoulli left frac theta i 1 theta right end aligned nbsp The probability mass function of K displaystyle K nbsp is given by 5 f k G 8 G n 8 s n k 8 k k 1 n displaystyle f k frac Gamma theta Gamma n theta s n k theta k quad k 1 dots n nbsp where s displaystyle s nbsp denotes Stirling numbers of the first kind Generalization editThis construction can be generalized to a model with two parameters 8 displaystyle theta nbsp amp a displaystyle alpha nbsp 2 6 commonly called the strength or concentration and discount parameters respectively At time n 1 displaystyle n 1 nbsp the next customer to arrive finds B displaystyle B nbsp occupied tables and decides to sit at an empty table with probability 8 B a n 8 displaystyle frac theta B alpha n theta nbsp or at an occupied table b displaystyle b nbsp of size b displaystyle b nbsp with probability b a n 8 displaystyle frac b alpha n theta nbsp In order for the construction to define a valid probability measure it is necessary to suppose that either a lt 0 displaystyle alpha lt 0 nbsp and 8 L a displaystyle theta L alpha nbsp for some L 1 2 displaystyle L in 1 2 nbsp or that 0 a lt 1 displaystyle 0 leq alpha lt 1 nbsp and 8 gt a displaystyle theta gt alpha nbsp Under this model the probability assigned to any particular partition B displaystyle B nbsp of n displaystyle n nbsp in terms of the Pochhammer k symbol is Pr B n B 8 a B 1 a 8 1 n 1 1 b B 1 a b 1 1 displaystyle Pr B n B frac theta alpha B 1 alpha theta 1 n 1 1 prod b in B 1 alpha b 1 1 nbsp where by convention a 0 c 1 displaystyle a 0 c 1 nbsp and for b gt 0 displaystyle b gt 0 nbsp a b c i 0 b 1 a i c a b if c 0 c b G a c b G a c otherwise displaystyle a b c prod i 0 b 1 a ic begin cases a b amp text if c 0 dfrac c b Gamma a c b Gamma a c amp text otherwise end cases nbsp Thus for the case when 8 gt 0 displaystyle theta gt 0 nbsp the partition probability can be expressed in terms of the Gamma function as Pr B n B G 8 G 8 n a B G 8 a B G 8 a b B G b a G 1 a displaystyle Pr B n B frac Gamma theta Gamma theta n dfrac alpha B Gamma theta alpha B Gamma theta alpha prod b in B dfrac Gamma b alpha Gamma 1 alpha nbsp In the one parameter case where a displaystyle alpha nbsp is zero this simplifies to Pr B n B G 8 8 B G 8 n b B G b displaystyle Pr B n B frac Gamma theta theta B Gamma theta n prod b in B Gamma b nbsp Or when 8 displaystyle theta nbsp is zero Pr B n B a B 1 G B G n b B G b a G 1 a displaystyle Pr B n B frac alpha B 1 Gamma B Gamma n prod b in B frac Gamma b alpha Gamma 1 alpha nbsp As before the probability assigned to any particular partition depends only on the block sizes so as before the random partition is exchangeable in the sense described above The consistency property still holds as before by construction If a 0 displaystyle alpha 0 nbsp the probability distribution of the random partition of the integer n displaystyle n nbsp thus generated is the Ewens distribution with parameter 8 displaystyle theta nbsp used in population genetics and the unified neutral theory of biodiversity source source source source source Animation of a Chinese restaurant process with scaling parameter 8 0 5 a 0 displaystyle theta 0 5 alpha 0 nbsp Tables are hidden once the customers of a table can not be displayed anymore however every table has infinitely many seats Recording of an interactive animation 7 Derivation edit Here is one way to derive this partition probability Let C i displaystyle C i nbsp be the random block into which the number i displaystyle i nbsp is added for i 1 2 3 displaystyle i 1 2 3 nbsp Then Pr C i c C 1 C i 1 8 B a 8 i 1 if c new block b a 8 i 1 if c b displaystyle Pr C i c mid C 1 ldots C i 1 begin cases dfrac theta B alpha theta i 1 amp text if c in text new block dfrac b alpha theta i 1 amp text if c in b end cases nbsp The probability that B n displaystyle B n nbsp is any particular partition of the set 1 n displaystyle 1 n nbsp is the product of these probabilities as i displaystyle i nbsp runs from 1 displaystyle 1 nbsp to n displaystyle n nbsp Now consider the size of block b displaystyle b nbsp it increases by one each time we add one element into it When the last element in block b displaystyle b nbsp is to be added in the block size is b 1 displaystyle b 1 nbsp For example consider this sequence of choices generate a new block b displaystyle b nbsp join b displaystyle b nbsp join b displaystyle b nbsp join b displaystyle b nbsp In the end block b displaystyle b nbsp has 4 elements and the product of the numerators in the above equation gets 8 1 2 3 displaystyle theta cdot 1 cdot 2 cdot 3 nbsp Following this logic we obtain Pr B n B displaystyle Pr B n B nbsp as above Expected number of tables edit For the one parameter case with a 0 displaystyle alpha 0 nbsp and 0 lt 8 lt displaystyle 0 lt theta lt infty nbsp the number of tables is distributed according to the chinese restaurant table distribution The expected value of this random variable given that there are n displaystyle n nbsp seated customers is 8 k 1 n 8 8 k 1 8 PS 8 n PS 8 displaystyle begin aligned sum k 1 n frac theta theta k 1 theta cdot Psi theta n Psi theta end aligned nbsp where PS 8 displaystyle Psi theta nbsp is the digamma function In the general case a gt 0 displaystyle alpha gt 0 nbsp the expected number of occupied tables is 6 G 8 n a G 8 1 a G 8 n G 8 a 8 a displaystyle begin aligned frac Gamma theta n alpha Gamma theta 1 alpha Gamma theta n Gamma theta alpha frac theta alpha end aligned nbsp however note that the G displaystyle Gamma cdot nbsp function here is not the standard gamma function 6 The Indian buffet process edit It is possible to adapt the model such that each data point is no longer uniquely associated with a class i e we are no longer constructing a partition but may be associated with any combination of the classes This strains the restaurant tables analogy and so is instead likened to a process in which a series of diners samples from some subset of an infinite selection of dishes on offer at a buffet The probability that a particular diner samples a particular dish is proportional to the popularity of the dish among diners so far and in addition the diner may sample from the untested dishes This has been named the Indian buffet process and can be used to infer latent features in data 9 Applications editThe Chinese restaurant process is closely connected to Dirichlet processes and Polya s urn scheme and therefore useful in applications of Bayesian statistics including nonparametric Bayesian methods The Generalized Chinese Restaurant Process is closely related to Pitman Yor process These processes have been used in many applications including modeling text clustering biological microarray data 10 biodiversity modelling and image reconstruction 11 12 See also editEwens sampling formula Preferential attachment Hilbert s paradox of the Grand HotelReferences edit Aldous D J 1985 Exchangeability and related topics Ecole d Ete de Probabilites de Saint Flour XIII 1983 Lecture Notes in Mathematics Vol 1117 pp 1 198 doi 10 1007 BFb0099421 ISBN 978 3 540 15203 3 The restaurant process is described on page 92 a b Pitman Jim 1995 Exchangeable and Partially Exchangeable Random Partitions Probability Theory and Related Fields 102 2 145 158 doi 10 1007 BF01213386 MR 1337249 S2CID 16849229 Blei David M Frazier Peter I 2011 Distance Dependent Chinese Restaurant Processes PDF Journal of Machine Learning Research 12 2461 2488 Zhou Mingyuan Carin Lawrence 2012 Negative Binomial Process Count and Mixture Modeling IEEE Transactions on Pattern Analysis and Machine Intelligence 37 2 307 20 arXiv 1209 3442 Bibcode 2012arXiv1209 3442Z doi 10 1109 TPAMI 2013 211 PMID 26353243 S2CID 1937045 Antoniak Charles E 1974 Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems The Annals of Statistics 2 6 1152 1174 doi 10 1214 aos 1176342871 a b c Pitman Jim 2006 Combinatorial Stochastic Processes Vol 1875 Berlin Springer Verlag ISBN 9783540309901 Archived from the original on 2012 09 25 Retrieved 2011 05 11 Dirichlet Process and Dirichlet Distribution Polya Restaurant Scheme and Chinese Restaurant Process Xinhua Zhang A Very Gentle Note on the Construction of Dirichlet Process September 2008 The Australian National University Canberra Online http users cecs anu edu au xzhang pubDoc notes dirichlet process pdf Archived April 11 2011 at the Wayback Machine Griffiths T L and Ghahramani Z 2005 Infinite Latent Feature Models and the Indian Buffet Process Archived 2008 10 31 at the Wayback Machine Gatsby Unit Technical Report GCNU TR 2005 001 Qin Zhaohui S 2006 Clustering microarray gene expression data using weighted Chinese restaurant process Bioinformatics 22 16 1988 1997 doi 10 1093 bioinformatics btl284 PMID 16766561 White J T Ghosal S 2011 Bayesian smoothing of photon limited images with applications in astronomy PDF Journal of the Royal Statistical Society Series B Statistical Methodology 73 4 579 599 CiteSeerX 10 1 1 308 7922 doi 10 1111 j 1467 9868 2011 00776 x S2CID 2342134 Li M Ghosal S 2014 Bayesian multiscale smoothing of Gaussian noised images Bayesian Analysis 9 3 733 758 doi 10 1214 14 ba871 External links editIntroduction to the Dirichlet Distribution and Related Processes by Frigyik Kapila and Gupta A talk by Michael I Jordan on the CRP http videolectures net icml05 jordan dpcrp Retrieved from https en wikipedia org w index php title Chinese restaurant process amp oldid 1188173790, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.