fbpx
Wikipedia

Yule–Simon distribution

In probability and statistics, the Yule–Simon distribution is a discrete probability distribution named after Udny Yule and Herbert A. Simon. Simon originally called it the Yule distribution.[1]

Yule–Simon
Probability mass function

Yule–Simon PMF on a log-log scale. (Note that the function is only defined at integer values of k. The connecting lines do not indicate continuity.)
Cumulative distribution function

Yule–Simon CMF. (Note that the function is only defined at integer values of k. The connecting lines do not indicate continuity.)
Parameters shape (real)
Support
PMF
CDF
Mean for
Mode
Variance for
Skewness for
Excess kurtosis for
MGF does not exist
CF

The probability mass function (pmf) of the Yule–Simon (ρ) distribution is

for integer and real , where is the beta function. Equivalently the pmf can be written in terms of the rising factorial as

where is the gamma function. Thus, if is an integer,

The parameter can be estimated using a fixed point algorithm.[2]

The probability mass function f has the property that for sufficiently large k we have

Plot of the Yule–Simon(1) distribution (red) and its asymptotic Zipf's law (blue)

This means that the tail of the Yule–Simon distribution is a realization of Zipf's law: can be used to model, for example, the relative frequency of the th most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of .

Occurrence edit

The Yule–Simon distribution arose originally as the limiting distribution of a particular model studied by Udny Yule in 1925 to analyze the growth in the number of species per genus in some higher taxa of biotic organisms.[3] The Yule model makes use of two related Yule processes, where a Yule process is defined as a continuous time birth process which starts with one or more individuals. Yule proved that when time goes to infinity, the limit distribution of the number of species in a genus selected uniformly at random has a specific form and exhibits a power-law behavior in its tail. Thirty years later, the Nobel laureate Herbert A. Simon proposed a time-discrete preferential attachment model to describe the appearance of new words in a large piece of a text. Interestingly enough, the limit distribution of the number of occurrences of each word, when the number of words diverges, coincides with that of the number of species belonging to the randomly chosen genus in the Yule model, for a specific choice of the parameters. This fact explains the designation Yule–Simon distribution that is commonly assigned to that limit distribution. In the context of random graphs, the Barabási–Albert model also exhibits an asymptotic degree distribution that equals the Yule–Simon distribution in correspondence of a specific choice of the parameters and still presents power-law characteristics for more general choices of the parameters. The same happens also for other preferential attachment random graph models.[4]

The preferential attachment process can also be studied as an urn process in which balls are added to a growing number of urns, each ball being allocated to an urn with probability linear in the number (of balls) the urn already contains.

The distribution also arises as a compound distribution, in which the parameter of a geometric distribution is treated as a function of random variable having an exponential distribution.[citation needed] Specifically, assume that   follows an exponential distribution with scale   or rate  :

 

with density

 

Then a Yule–Simon distributed variable K has the following geometric distribution conditional on W:

 

The pmf of a geometric distribution is

 

for  . The Yule–Simon pmf is then the following exponential-geometric compound distribution:

 

The maximum likelihood estimator for the parameter   given the observations   is the solution to the fixed point equation

 

where   are the rate and shape parameters of the gamma distribution prior on  .

This algorithm is derived by Garcia[2] by directly optimizing the likelihood. Roberts and Roberts[5]

generalize the algorithm to Bayesian settings with the compound geometric formulation described above. Additionally, Roberts and Roberts[5] are able to use the Expectation Maximisation (EM) framework to show convergence of the fixed point algorithm. Moreover, Roberts and Roberts[5] derive the sub-linearity of the convergence rate for the fixed point algorithm. Additionally, they use the EM formulation to give 2 alternate derivations of the standard error of the estimator from the fixed point equation. The variance of the   estimator is

 

the standard error is the square root of the quantity of this estimate divided by N.

Generalizations edit

The two-parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function. The probability mass function of the generalized Yule–Simon(ρ, α) distribution is defined as

 

with  . For   the ordinary Yule–Simon(ρ) distribution is obtained as a special case. The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail.

See also edit

Bibliography edit

  • Colin Rose and Murray D. Smith, Mathematical Statistics with Mathematica. New York: Springer, 2002, ISBN 0-387-95234-9. (See page 107, where it is called the "Yule distribution".)

References edit

  1. ^ Simon, H. A. (1955). "On a class of skew distribution functions". Biometrika. 42 (3–4): 425–440. doi:10.1093/biomet/42.3-4.425.
  2. ^ a b Garcia Garcia, Juan Manuel (2011). "A fixed-point algorithm to estimate the Yule-Simon distribution parameter". Applied Mathematics and Computation. 217 (21): 8560–8566. doi:10.1016/j.amc.2011.03.092.
  3. ^ Yule, G. U. (1924). "A Mathematical Theory of Evolution, based on the Conclusions of Dr. J. C. Willis, F.R.S". Philosophical Transactions of the Royal Society B. 213 (402–410): 21–87. doi:10.1098/rstb.1925.0002.
  4. ^ Pachon, Angelica; Polito, Federico; Sacerdote, Laura (2015). "Random Graphs Associated to Some Discrete and Continuous Time Preferential Attachment Models". Journal of Statistical Physics. 162 (6): 1608–1638. arXiv:1503.06150. doi:10.1007/s10955-016-1462-7. S2CID 119168040.
  5. ^ a b c Roberts, Lucas; Roberts, Denisa (2017). "An Expectation Maximization Framework for Preferential Attachment Models". arXiv:1710.08511 [stat.CO].

yule, simon, distribution, probability, statistics, discrete, probability, distribution, named, after, udny, yule, herbert, simon, simon, originally, called, yule, distribution, yule, simonprobability, mass, function, yule, simon, scale, note, that, function, . In probability and statistics the Yule Simon distribution is a discrete probability distribution named after Udny Yule and Herbert A Simon Simon originally called it the Yule distribution 1 Yule SimonProbability mass function Yule Simon PMF on a log log scale Note that the function is only defined at integer values of k The connecting lines do not indicate continuity Cumulative distribution function Yule Simon CMF Note that the function is only defined at integer values of k The connecting lines do not indicate continuity Parametersr gt 0 displaystyle rho gt 0 shape real Supportk 1 2 displaystyle k in 1 2 dotsc PMFr B k r 1 displaystyle rho operatorname B k rho 1 CDF1 k B k r 1 displaystyle 1 k operatorname B k rho 1 Meanr r 1 displaystyle frac rho rho 1 for r gt 1 displaystyle rho gt 1 Mode1 displaystyle 1 Variancer 2 r 1 2 r 2 displaystyle frac rho 2 rho 1 2 rho 2 for r gt 2 displaystyle rho gt 2 Skewness r 1 2 r 2 r 3 r displaystyle frac rho 1 2 sqrt rho 2 rho 3 rho for r gt 3 displaystyle rho gt 3 Excess kurtosisr 3 11 r 3 49 r 22 r 4 r 3 r displaystyle rho 3 frac 11 rho 3 49 rho 22 rho 4 rho 3 rho for r gt 4 displaystyle rho gt 4 MGFdoes not existCFr r 1 2 F 1 1 1 r 2 e i t e i t displaystyle frac rho rho 1 2 F 1 1 1 rho 2 e i t e i t The probability mass function pmf of the Yule Simon r distribution is f k r r B k r 1 displaystyle f k rho rho operatorname B k rho 1 for integer k 1 displaystyle k geq 1 and real r gt 0 displaystyle rho gt 0 where B displaystyle operatorname B is the beta function Equivalently the pmf can be written in terms of the rising factorial as f k r r G r 1 k r r 1 displaystyle f k rho frac rho Gamma rho 1 k rho underline rho 1 where G displaystyle Gamma is the gamma function Thus if r displaystyle rho is an integer f k r r r k 1 k r displaystyle f k rho frac rho rho k 1 k rho The parameter r displaystyle rho can be estimated using a fixed point algorithm 2 The probability mass function f has the property that for sufficiently large k we have f k r r G r 1 k r 1 1 k r 1 displaystyle f k rho approx frac rho Gamma rho 1 k rho 1 propto frac 1 k rho 1 Plot of the Yule Simon 1 distribution red and its asymptotic Zipf s law blue This means that the tail of the Yule Simon distribution is a realization of Zipf s law f k r displaystyle f k rho can be used to model for example the relative frequency of the k displaystyle k th most frequent word in a large collection of text which according to Zipf s law is inversely proportional to a typically small power of k displaystyle k Contents 1 Occurrence 2 Generalizations 3 See also 4 Bibliography 5 ReferencesOccurrence editThe Yule Simon distribution arose originally as the limiting distribution of a particular model studied by Udny Yule in 1925 to analyze the growth in the number of species per genus in some higher taxa of biotic organisms 3 The Yule model makes use of two related Yule processes where a Yule process is defined as a continuous time birth process which starts with one or more individuals Yule proved that when time goes to infinity the limit distribution of the number of species in a genus selected uniformly at random has a specific form and exhibits a power law behavior in its tail Thirty years later the Nobel laureate Herbert A Simon proposed a time discrete preferential attachment model to describe the appearance of new words in a large piece of a text Interestingly enough the limit distribution of the number of occurrences of each word when the number of words diverges coincides with that of the number of species belonging to the randomly chosen genus in the Yule model for a specific choice of the parameters This fact explains the designation Yule Simon distribution that is commonly assigned to that limit distribution In the context of random graphs the Barabasi Albert model also exhibits an asymptotic degree distribution that equals the Yule Simon distribution in correspondence of a specific choice of the parameters and still presents power law characteristics for more general choices of the parameters The same happens also for other preferential attachment random graph models 4 The preferential attachment process can also be studied as an urn process in which balls are added to a growing number of urns each ball being allocated to an urn with probability linear in the number of balls the urn already contains The distribution also arises as a compound distribution in which the parameter of a geometric distribution is treated as a function of random variable having an exponential distribution citation needed Specifically assume that W displaystyle W nbsp follows an exponential distribution with scale 1 r displaystyle 1 rho nbsp or rate r displaystyle rho nbsp W Exponential r displaystyle W sim operatorname Exponential rho nbsp with density h w r r exp r w displaystyle h w rho rho exp rho w nbsp Then a Yule Simon distributed variable K has the following geometric distribution conditional on W K Geometric exp W displaystyle K sim operatorname Geometric exp W nbsp The pmf of a geometric distribution is g k p p 1 p k 1 displaystyle g k p p 1 p k 1 nbsp for k 1 2 displaystyle k in 1 2 dotsc nbsp The Yule Simon pmf is then the following exponential geometric compound distribution f k r 0 g k exp w h w r d w displaystyle f k rho int 0 infty g k exp w h w rho dw nbsp The maximum likelihood estimator for the parameter r displaystyle rho nbsp given the observations k 1 k 2 k 3 k N displaystyle k 1 k 2 k 3 dots k N nbsp is the solution to the fixed point equation r t 1 N a 1 b i 1 N j 1 k i 1 r t j displaystyle rho t 1 frac N a 1 b sum i 1 N sum j 1 k i frac 1 rho t j nbsp where b 0 a 1 displaystyle b 0 a 1 nbsp are the rate and shape parameters of the gamma distribution prior on r displaystyle rho nbsp This algorithm is derived by Garcia 2 by directly optimizing the likelihood Roberts and Roberts 5 generalize the algorithm to Bayesian settings with the compound geometric formulation described above Additionally Roberts and Roberts 5 are able to use the Expectation Maximisation EM framework to show convergence of the fixed point algorithm Moreover Roberts and Roberts 5 derive the sub linearity of the convergence rate for the fixed point algorithm Additionally they use the EM formulation to give 2 alternate derivations of the standard error of the estimator from the fixed point equation The variance of the l displaystyle lambda nbsp estimator is Var l 1 N l 2 i 1 N j 1 k i 1 l j 2 displaystyle operatorname Var hat lambda frac 1 frac N hat lambda 2 sum i 1 N sum j 1 k i frac 1 hat lambda j 2 nbsp the standard error is the square root of the quantity of this estimate divided by N Generalizations editThe two parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function The probability mass function of the generalized Yule Simon r a distribution is defined as f k r a r 1 a r B 1 a k r 1 displaystyle f k rho alpha frac rho 1 alpha rho mathrm B 1 alpha k rho 1 nbsp with 0 a lt 1 displaystyle 0 leq alpha lt 1 nbsp For a 0 displaystyle alpha 0 nbsp the ordinary Yule Simon r distribution is obtained as a special case The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail See also editZeta distribution Scale free network Beta negative binomial distributionBibliography editColin Rose and Murray D Smith Mathematical Statistics with Mathematica New York Springer 2002 ISBN 0 387 95234 9 See page 107 where it is called the Yule distribution References edit Simon H A 1955 On a class of skew distribution functions Biometrika 42 3 4 425 440 doi 10 1093 biomet 42 3 4 425 a b Garcia Garcia Juan Manuel 2011 A fixed point algorithm to estimate the Yule Simon distribution parameter Applied Mathematics and Computation 217 21 8560 8566 doi 10 1016 j amc 2011 03 092 Yule G U 1924 A Mathematical Theory of Evolution based on the Conclusions of Dr J C Willis F R S Philosophical Transactions of the Royal Society B 213 402 410 21 87 doi 10 1098 rstb 1925 0002 Pachon Angelica Polito Federico Sacerdote Laura 2015 Random Graphs Associated to Some Discrete and Continuous Time Preferential Attachment Models Journal of Statistical Physics 162 6 1608 1638 arXiv 1503 06150 doi 10 1007 s10955 016 1462 7 S2CID 119168040 a b c Roberts Lucas Roberts Denisa 2017 An Expectation Maximization Framework for Preferential Attachment Models arXiv 1710 08511 stat CO Retrieved from https en wikipedia org w index php title Yule Simon distribution amp oldid 1159560958, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.