Tweedie distribution

In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal, gamma and inverse Gaussian distributions, the purely discrete scaled Poisson distribution, and the class of compound Poisson–gamma distributions which have positive mass at zero, but are otherwise continuous.[1] Tweedie distributions are a special case of exponential dispersion models and are often used as distributions for generalized linear models.[2]

The Tweedie distributions were named by Bent Jørgensen[3] after Maurice Tweedie, a statistician and medical physicist at the University of Liverpool, UK, who presented the first thorough study of these distributions in 1984.[1][4][2]

Definitions

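The (reproductive) Tweedie distributions are defined as a subfamily of (reproductive) exponential dispersion models (ED) with a special mean-variance relationship. A random variable Y is Tweedie distributed, $\mathrm{Tw}_p(\mu, \sigma^2)$, if $Y \sim \mathrm{ED}(\mu, \sigma^2)$ with mean $\mu = \operatorname{E}(Y)$, positive dispersion parameter $\sigma^2$ and

$$\operatorname{Var}(Y) = \sigma^2 \mu^p,$$
where $p \in \mathbb{R}$ is called the Tweedie power parameter. The probability distribution $P_{\theta,\sigma^2}$ on the measurable sets A is given by
$$P_{\theta,\sigma^2}(Y \in A) = \int_A \exp\left(\frac{\theta z - \kappa_p(\theta)}{\sigma^2}\right) \nu_\lambda(dz),$$
for some σ-finite measure $\nu_\lambda$. This representation uses the canonical parameter θ of an exponential dispersion model and the cumulant function
$$\kappa_p(\theta) = \begin{cases} \dfrac{\alpha-1}{\alpha}\left(\dfrac{\theta}{\alpha-1}\right)^{\alpha}, & p \neq 1, 2, \\ -\log(-\theta), & p = 2, \\ e^{\theta}, & p = 1, \end{cases}$$
where we used $\alpha = \frac{p-2}{p-1}$, or equivalently $p = \frac{\alpha-2}{\alpha-1}$.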
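The definition can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes the standard Tweedie cumulant function (power form for p ≠ 1, 2, with −log(−θ) for p = 2 and e^θ for p = 1) and uses finite differences to confirm that its first derivative gives the mean μ and its second derivative gives μ^p. The function names and the step size `h` are illustrative choices.

```python
import math

def alpha_from_p(p):
    """Index alpha = (p - 2) / (p - 1); undefined for p = 1."""
    return (p - 2.0) / (p - 1.0)

def theta_from_mu(mu, p):
    """Canonical parameter theta corresponding to a mean mu > 0."""
    if p == 1:
        return math.log(mu)
    return mu ** (1.0 - p) / (1.0 - p)

def kappa(theta, p):
    """Tweedie cumulant function kappa_p(theta)."""
    if p == 1:
        return math.exp(theta)
    if p == 2:
        return -math.log(-theta)
    a = alpha_from_p(p)
    return (a - 1.0) / a * (theta / (a - 1.0)) ** a

def mean_and_unit_variance(mu, p, h=1e-4):
    """Central finite differences for kappa'(theta) and kappa''(theta)."""
    t = theta_from_mu(mu, p)
    k_minus, k_0, k_plus = kappa(t - h, p), kappa(t, p), kappa(t + h, p)
    first = (k_plus - k_minus) / (2.0 * h)
    second = (k_plus - 2.0 * k_0 + k_minus) / h ** 2
    return first, second
```

For example, `mean_and_unit_variance(2.0, 1.5)` should return values close to 2 and 2^1.5, in line with Var(Y) = σ²μ^p for σ² = 1.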

Properties

Additive exponential dispersion models

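The models just described are in the reproductive form. An exponential dispersion model always has a dual: the additive form. If Y is reproductive, then $Z = \lambda Y$ with $\lambda = 1/\sigma^2$ is in the additive form $\mathrm{ED}^*(\theta, \lambda)$, for Tweedie $\mathrm{Tw}^*_p(\mu, \lambda)$. Additive models have the property that the distribution of the sum of independent random variables,

$$Z_+ = Z_1 + \cdots + Z_n,$$
for which $Z_i \sim \mathrm{ED}^*(\theta, \lambda_i)$ with fixed θ and various $\lambda_i$, is a member of the family of distributions with the same θ,
$$Z_+ \sim \mathrm{ED}^*(\theta, \lambda_1 + \cdots + \lambda_n).$$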

Reproductive exponential dispersion models

A second class of exponential dispersion models exists, designated by the random variable

$$Y = Z/\lambda \sim \mathrm{ED}(\mu, \sigma^2),$$
where $\sigma^2 = 1/\lambda$, known as reproductive exponential dispersion models. They have the property that for n independent random variables $Y_i \sim \mathrm{ED}(\mu, \sigma^2/w_i)$, with weighting factors $w_i$ and
$$w = \sum_{i=1}^{n} w_i,$$
a weighted average of the variables gives
$$\frac{1}{w}\sum_{i=1}^{n} w_i Y_i \sim \mathrm{ED}(\mu, \sigma^2/w).$$

For reproductive models the weighted average of independent random variables with fixed μ and σ² and various values of $w_i$ is a member of the family of distributions with the same μ and σ².

The Tweedie exponential dispersion models are both additive and reproductive; we thus have the duality transformation

$$Y \mapsto Z = Y/\sigma^2.$$

Scale invariance

A third property of the Tweedie models is that they are scale invariant: for a reproductive exponential dispersion model $\mathrm{Tw}_p(\mu, \sigma^2)$ and any positive constant c we have the property of closure under scale transformation,

$$c\,\mathrm{Tw}_p(\mu, \sigma^2) = \mathrm{Tw}_p(c\mu, c^{2-p}\sigma^2).$$
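As a quick consistency check at the level of the first two moments (a sketch that uses only the mean-variance relationship Var(Y) = σ²μ^p from the definition, not a proof of closure of the whole family), scaling Y by a constant c > 0 gives:

```latex
\operatorname{E}(cY) = c\mu, \qquad
\operatorname{Var}(cY) = c^{2}\sigma^{2}\mu^{p}
  = \left(c^{2-p}\sigma^{2}\right)(c\mu)^{p}.
```

So cY has mean cμ and dispersion parameter c^{2−p}σ², with the same power p.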

The Tweedie power variance function

To define the variance function for exponential dispersion models we make use of the mean value mapping, the relationship between the canonical parameter θ and the mean μ. It is defined by the function

$$\tau(\theta) = \kappa'(\theta) = \mu,$$
with cumulant function $\kappa(\theta)$. The variance function V(μ) is constructed from the mean value mapping,
$$V(\mu) = \tau'[\tau^{-1}(\mu)].$$

Here the minus exponent in $\tau^{-1}(\mu)$ denotes an inverse function rather than a reciprocal. The mean and variance of an additive random variable are then E(Z) = λμ and var(Z) = λV(μ).

Scale invariance implies that the variance function obeys the relationship $V(\mu) = \mu^p$.[2]

The Tweedie deviance

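The unit deviance of a reproductive Tweedie distribution is given by

$$d(y, \mu) = \begin{cases} (y - \mu)^2, & p = 0, \\ 2\left(y \log(y/\mu) + \mu - y\right), & p = 1, \\ 2\left(\log(\mu/y) + y/\mu - 1\right), & p = 2, \\ 2\left(\dfrac{\max(y, 0)^{2-p}}{(1-p)(2-p)} - \dfrac{y\,\mu^{1-p}}{1-p} + \dfrac{\mu^{2-p}}{2-p}\right), & \text{otherwise}. \end{cases}$$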
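The unit deviance is straightforward to evaluate directly. The sketch below assumes the standard piecewise form (squared error for p = 0, the Poisson and gamma deviances for p = 1 and p = 2, and the general power expression otherwise); the function name is illustrative, and the convention y·log(y/μ) = 0 at y = 0 is an assumption made for the Poisson case.

```python
import math

def tweedie_unit_deviance(y, mu, p):
    """Unit deviance d(y, mu) for power parameter p.

    Assumes mu > 0. For p >= 2 the observation y must be strictly positive.
    """
    if p == 0:
        return (y - mu) ** 2
    if p == 1:
        # y * log(y / mu) is taken as 0 at y = 0 (its limiting value).
        term = y * math.log(y / mu) if y > 0 else 0.0
        return 2.0 * (term + mu - y)
    if p == 2:
        return 2.0 * (math.log(mu / y) + y / mu - 1.0)
    return 2.0 * (
        max(y, 0.0) ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
        - y * mu ** (1.0 - p) / (1.0 - p)
        + mu ** (2.0 - p) / (2.0 - p)
    )
```

The deviance vanishes at y = μ for every p, and for 1 < p < 2 it stays finite at y = 0, which is what allows exact zeros in the compound Poisson–gamma case.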

The Tweedie cumulant generating functions

The properties of exponential dispersion models give us two differential equations.[2] The first relates the mean value mapping and the variance function to each other,

$$\frac{\partial \tau^{-1}(\mu)}{\partial \mu} = \frac{1}{V(\mu)}.$$

The second shows how the mean value mapping is related to the cumulant function,

$$\frac{\partial \kappa(\theta)}{\partial \theta} = \tau(\theta).$$

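These equations can be solved to obtain the cumulant function for different cases of the Tweedie models. A cumulant generating function (CGF) may then be obtained from the cumulant function. The additive CGF is generally specified by the equation

$$K^*(s) = \log\left[\operatorname{E}\left(e^{sZ}\right)\right] = \lambda\left[\kappa(\theta + s) - \kappa(\theta)\right],$$
and the reproductive CGF by
$$K(s) = \log\left[\operatorname{E}\left(e^{sY}\right)\right] = \lambda\left[\kappa(\theta + s/\lambda) - \kappa(\theta)\right],$$
where s is the generating function variable.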

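For the additive Tweedie models the CGFs take the form

$$K^*_p(s; \theta, \lambda) = \begin{cases} \lambda \kappa_p(\theta)\left[(1 + s/\theta)^{\alpha} - 1\right], & p \neq 1, 2, \\ -\lambda \log(1 + s/\theta), & p = 2, \\ \lambda e^{\theta}\left(e^{s} - 1\right), & p = 1, \end{cases}$$
and for the reproductive models,
$$K_p(s; \theta, \lambda) = \begin{cases} \lambda \kappa_p(\theta)\left\{\left[1 + s/(\theta\lambda)\right]^{\alpha} - 1\right\}, & p \neq 1, 2, \\ -\lambda \log\left[1 + s/(\theta\lambda)\right], & p = 2, \\ \lambda e^{\theta}\left(e^{s/\lambda} - 1\right), & p = 1. \end{cases}$$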

The additive and reproductive Tweedie models are conventionally denoted by the symbols $\mathrm{Tw}^*_p(\theta, \lambda)$ and $\mathrm{Tw}_p(\theta, \sigma^2)$, respectively.

The first and second derivatives of the CGFs, evaluated at s = 0, yield the mean and variance, respectively. One can thus confirm that for the additive models the variance relates to the mean by the power law

$$\operatorname{var}(Z) \propto \operatorname{E}(Z)^p.$$
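The power law can be confirmed numerically from the additive CGF. The sketch below assumes the closed forms of the additive Tweedie CGF (with α = (p − 2)/(p − 1) for p ≠ 1, 2) and differentiates them at s = 0 by central finite differences; the function names and the step size `h` are illustrative.

```python
import math

def additive_tweedie_cgf(s, theta, lam, p):
    """Additive Tweedie CGF K*_p(s; theta, lam); theta must be nonzero for p != 1."""
    if p == 1:
        return lam * math.exp(theta) * (math.exp(s) - 1.0)
    if p == 2:
        return -lam * math.log(1.0 + s / theta)
    a = (p - 2.0) / (p - 1.0)
    kappa = (a - 1.0) / a * (theta / (a - 1.0)) ** a
    return lam * kappa * ((1.0 + s / theta) ** a - 1.0)

def mean_var_from_cgf(theta, lam, p, h=1e-4):
    """Mean and variance as first and second derivatives of the CGF at s = 0."""
    k_minus = additive_tweedie_cgf(-h, theta, lam, p)
    k_0 = additive_tweedie_cgf(0.0, theta, lam, p)
    k_plus = additive_tweedie_cgf(h, theta, lam, p)
    mean = (k_plus - k_minus) / (2.0 * h)
    var = (k_plus - 2.0 * k_0 + k_minus) / h ** 2
    return mean, var
```

With p = 1.5, θ = −√2 (so that μ = 2) and λ = 3, the result should be close to mean λμ = 6 and variance λμ^p = 3·2^1.5, that is var(Z) = λ^{1−p} E(Z)^p.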

The Tweedie convergence theorem

The Tweedie exponential dispersion models are fundamental in statistical theory because they act as foci of convergence for a wide range of statistical processes. Jørgensen et al. proved a theorem that specifies the asymptotic behaviour of variance functions, known as the Tweedie convergence theorem.[5] In technical terms, the theorem is stated thus:[2] the unit variance function is regular of order p at zero (or infinity) provided that $V(\mu) \sim c_0 \mu^p$ as μ approaches zero (or infinity), for all real values of p and $c_0 > 0$. Then for a unit variance function regular of order p at either zero or infinity and for

$$p \notin (0, 1),$$
for any $\mu > 0$ and $\sigma^2 > 0$ we have
$$c^{-1}\,\mathrm{ED}(c\mu, \sigma^2 c^{2-p}) \rightarrow \mathrm{Tw}_p(\mu, c_0 \sigma^2)$$
as $c \downarrow 0$ or $c \to \infty$, respectively, where the convergence is through values of c such that cμ is in the domain of θ and $c^{p-2}/\sigma^2$ is in the domain of λ. The model must be infinitely divisible as $c^{2-p}$ approaches infinity.[2]

In nontechnical terms this theorem implies that any exponential dispersion model that asymptotically manifests a variance-to-mean power law is required to have a variance function that comes within the domain of attraction of a Tweedie model. Almost all distribution functions with finite cumulant generating functions qualify as exponential dispersion models and most exponential dispersion models manifest variance functions of this form. Hence many probability distributions have variance functions that express this asymptotic behaviour, and the Tweedie distributions become foci of convergence for a wide range of data types.[6]
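The notion of a variance function being "regular of order p" can be made concrete by looking at its local log-log slope. The sketch below is an illustration only: the helper `local_power` and the choice of the unit variance function μ + μ² (the negative binomial type, used here as an assumed example) are not taken from the text above.

```python
import math

def local_power(variance_function, mu, eps=1e-6):
    """Log-log slope d log V / d log mu at mu, by a central difference."""
    up, down = mu * (1.0 + eps), mu * (1.0 - eps)
    rise = math.log(variance_function(up)) - math.log(variance_function(down))
    run = math.log(up) - math.log(down)
    return rise / run

def nb_unit_variance(mu):
    """Example unit variance function mu + mu**2 (negative binomial type)."""
    return mu + mu ** 2
```

For this example the slope approaches 1 as μ tends to zero and 2 as μ tends to infinity, so the variance function behaves like that of a Poisson model (p = 1) near zero and like that of a gamma model (p = 2) at infinity.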

Related distributions

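The Tweedie distributions include a number of familiar distributions as well as some unusual ones, each being specified by the domain of the index parameter. We have the

  • extreme stable distribution, p < 0,
  • normal distribution, p = 0,
  • Poisson distribution, p = 1,
  • compound Poisson–gamma distribution, 1 < p < 2,
  • gamma distribution, p = 2,
  • positive stable distributions, 2 < p < 3,
  • inverse Gaussian distribution, p = 3,
  • positive stable distributions, p > 3, and
  • extreme stable distributions, p = ∞.

For 0 < p < 1 no Tweedie model exists. Note that "stable distributions" here means distributions actually generated by stable distributions.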
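For the compound Poisson–gamma case (1 < p < 2), a draw can be generated as a Poisson number of gamma-distributed summands. The sketch below assumes the usual parameter mapping from (μ, σ², p) to a Poisson rate and a gamma shape and scale, chosen so that the sum has mean μ and variance σ²μ^p; the function names are illustrative and the Poisson sampler is a simple one suited to small rates.

```python
import math
import random

def cpg_parameters(mu, sigma2, p):
    """Poisson rate and gamma (shape, scale) for Tw_p(mu, sigma2), 1 < p < 2."""
    if not 1.0 < p < 2.0:
        raise ValueError("compound Poisson-gamma requires 1 < p < 2")
    rate = mu ** (2.0 - p) / (sigma2 * (2.0 - p))
    shape = (2.0 - p) / (p - 1.0)
    scale = sigma2 * (p - 1.0) * mu ** (p - 1.0)
    return rate, shape, scale

def sample_poisson(rate, rng):
    """Poisson draw: count unit-rate exponential arrivals in [0, rate]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= rate:
        count += 1
        t += rng.expovariate(1.0)
    return count

def sample_cpg(mu, sigma2, p, rng):
    """One compound Poisson-gamma draw; returns 0 when the Poisson count is 0."""
    rate, shape, scale = cpg_parameters(mu, sigma2, p)
    n = sample_poisson(rate, rng)
    return sum(rng.gammavariate(shape, scale) for _ in range(n))
```

A call such as `sample_cpg(2.0, 1.0, 1.5, random.Random(0))` returns a non-negative value; exact zeros occur with probability exp(−rate), which is the positive mass at zero mentioned in the introduction.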

Occurrence and applications

The Tweedie models and Taylor’s power law

Taylor's law is an empirical law in ecology that relates the variance of the number of individuals of a species per unit area of habitat to the corresponding mean by a power-law relationship.[7] For the population count Y with mean μ and variance var(Y), Taylor's law is written

$$\operatorname{var}(Y) = a\mu^p,$$
where a and p are both positive constants. Since L. R. Taylor described this law in 1961, many different explanations have been offered for it, ranging from animal behavior,[7] a random walk model,[8] and a stochastic birth, death, immigration and emigration model,[9] to a consequence of equilibrium and non-equilibrium statistical mechanics.[10] No consensus exists as to an explanation for this model.
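In practice the constants a and p are commonly estimated from paired sample means and variances by a straight-line fit on log-log axes. The following is a minimal sketch of that fit using ordinary least squares; the function name is illustrative, and the approach assumes all means and variances are strictly positive.

```python
import math

def fit_taylor_law(means, variances):
    """Least-squares fit of log(var) = log(a) + p * log(mean); returns (a, p)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    p = sxy / sxx                      # slope: the power-law exponent
    a = math.exp(y_bar - p * x_bar)    # intercept: the scale constant
    return a, p
```

For data that follow the law exactly, for instance variances equal to 3·mean^1.6, the fit recovers a = 3 and p = 1.6 up to rounding.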

Since Taylor's law is mathematically identical to the variance-to-mean power law that characterizes the Tweedie models, it seemed reasonable to use these models and the Tweedie convergence theorem to explain the observed clustering of animals and plants associated with Taylor's law.[11][12] The majority of the observed values for the power-law exponent p have fallen in the interval (1,2) and so the Tweedie compound Poisson–gamma distribution would seem applicable. Comparison of the empirical distribution function to the theoretical compound Poisson–gamma distribution has provided a means to verify consistency of this hypothesis.[11]

Whereas conventional models for Taylor's law have tended to involve ad hoc animal behavioral or population dynamic assumptions, the Tweedie convergence theorem would imply that Taylor's law results from a general mathematical convergence effect much as how the central limit theorem governs the convergence behavior of certain types of random data. Indeed, any mathematical model, approximation or simulation that is designed to yield Taylor's law (on the basis of this theorem) is required to converge to the form of the Tweedie models.[6]

Tweedie convergence and 1/f noise

Pink noise, or 1/f noise, refers to a pattern of noise characterized by a power-law relationship between its intensities S(f) at different frequencies f,

$$S(f) \propto \frac{1}{f^{\gamma}},$$
where the dimensionless exponent γ ∈ [0,1]. It is found within a diverse number of natural processes.[13] Many different explanations for 1/f noise exist; a widely held hypothesis is based on self-organized criticality, where dynamical systems close to a critical point are thought to manifest scale-invariant spatial and/or temporal behavior.

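In this subsection a mathematical connection between 1/f noise and the Tweedie variance-to-mean power law will be described. To begin, we first need to introduce self-similar processes. For the sequence of numbers

$$Y = (Y_i : i = 0, 1, 2, \ldots, N)$$
with mean
$$\hat{\mu} = \operatorname{E}(Y_i),$$
deviations
$$y_i = Y_i - \hat{\mu},$$
variance
$$\hat{\sigma}^2 = \operatorname{E}(y_i^2),$$
and autocorrelation function
$$r(k) = \frac{\operatorname{E}(y_i\, y_{i+k})}{\operatorname{E}(y_i^2)}$$
with lag k, if the autocorrelation of this sequence has the long-range behavior
$$r(k) \sim k^{-d} L(k)$$
as k → ∞, where L(k) is a slowly varying function at large values of k, this sequence is called a self-similar process.[14]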

The method of expanding bins can be used to analyze self-similar processes. Consider a set of equal-sized non-overlapping bins that divides the original sequence of N elements into groups of m equal-sized segments (N/m is an integer) so that new reproductive sequences, based on the mean values, can be defined:

$$Y_i^{(m)} = \frac{Y_{im-m+1} + \cdots + Y_{im}}{m}.$$

The variance determined from this sequence will scale as the bin size changes such that

$$\operatorname{var}\left[Y^{(m)}\right] = \hat{\sigma}^2 m^{-d}$$
if and only if the autocorrelation has the limiting form[15]
$$\lim_{k \to \infty} \frac{r(k)}{k^{-d}} = \frac{(2-d)(1-d)}{2}.$$

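One can also construct a set of corresponding additive sequences

$$Z_i^{(m)} = m\,Y_i^{(m)},$$
based on the expanding bins,
$$Z_i^{(m)} = Y_{im-m+1} + \cdots + Y_{im}.$$

Provided the autocorrelation function exhibits the same behavior, the additive sequences will obey the relationship

$$\operatorname{var}\left[Z_i^{(m)}\right] = m^2 \operatorname{var}\left[Y^{(m)}\right] = \left(\frac{\hat{\sigma}^2}{\hat{\mu}^{2-d}}\right) \operatorname{E}\left[Z_i^{(m)}\right]^{2-d}.$$

Since $\hat{\mu}$ and $\hat{\sigma}^2$ are constants, this relationship constitutes a variance-to-mean power law, with p = 2 − d.[6][16]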
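The method of expanding bins is simple to carry out on a data sequence. The sketch below forms the additive sequences (bin sums) for several bin sizes and returns their means and variances, which can then be plotted or fitted on log-log axes to estimate p. The function names are illustrative, a trailing partial bin is dropped, and the population form of the variance (division by the number of bins) is an assumption.

```python
def additive_bins(seq, m):
    """Sums over consecutive non-overlapping bins of size m.

    Elements left over after the last complete bin are dropped.
    """
    n_bins = len(seq) // m
    return [sum(seq[i * m:(i + 1) * m]) for i in range(n_bins)]

def mean_variance_by_bin_size(seq, bin_sizes):
    """(mean, variance) of the additive sequence for each bin size m."""
    results = []
    for m in bin_sizes:
        z = additive_bins(seq, m)
        mean = sum(z) / len(z)
        var = sum((v - mean) ** 2 for v in z) / len(z)
        results.append((mean, var))
    return results
```

For the short sequence `[1, 2, 3, 4]` with bin sizes 1 and 2, the additive sequences are `[1, 2, 3, 4]` and `[3, 7]`, giving (mean, variance) pairs (2.5, 1.25) and (5.0, 4.0).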

The biconditional relationship above between the variance-to-mean power law and the power-law autocorrelation function, together with the Wiener–Khinchin theorem,[17] implies that any sequence that exhibits a variance-to-mean power law by the method of expanding bins will also manifest 1/f noise, and vice versa. Moreover, the Tweedie convergence theorem, by virtue of its central limit-like effect of generating distributions that manifest variance-to-mean power functions, will also generate processes that manifest 1/f noise.[6] The Tweedie convergence theorem thus provides an alternative explanation for the origin of 1/f noise, based on its central limit-like effect.

Much as the central limit theorem requires certain kinds of random processes to have as a focus of their convergence the Gaussian distribution and thus express white noise, the Tweedie convergence theorem requires certain non-Gaussian processes to have as a focus of convergence the Tweedie distributions that express 1/f noise.[6]

The Tweedie models and multifractality

From the properties of self-similar processes, the power-law exponent p = 2 − d is related to the Hurst exponent H and the fractal dimension D by[15]

$$D = 2 - H = 2 - p/2.$$

A one-dimensional data sequence of self-similar data may demonstrate a variance-to-mean power law with local variations in the value of p and hence in the value of D. When fractal structures manifest local variations in fractal dimension, they are said to be multifractals. Examples of data sequences that exhibit local variations in p like this include the eigenvalue deviations of the Gaussian Orthogonal and Unitary Ensembles.[6] The Tweedie compound Poisson–gamma distribution has served to model multifractality based on local variations in the Tweedie exponent α. Consequently, in conjunction with the variation of α, the Tweedie convergence theorem can be viewed as having a role in the genesis of such multifractals.

The variation of α has been found to obey the asymmetric Laplace distribution in certain cases.[18] This distribution has been shown to be a member of the family of geometric Tweedie models,[19] that manifest as limiting distributions in a convergence theorem for geometric dispersion models.

Regional organ blood flow

Regional organ blood flow has traditionally been assessed by the injection of radiolabelled polyethylene microspheres into the arterial circulation of animals, of a size such that they become entrapped within the microcirculation of organs. The organ to be assessed is then divided into equal-sized cubes and the amount of radiolabel within each cube is evaluated by liquid scintillation counting and recorded. The amount of radioactivity within each cube is taken to reflect the blood flow through that sample at the time of injection. It is possible to evaluate adjacent cubes from an organ in order to additively determine the blood flow through larger regions. Through the work of J B Bassingthwaighte and others an empirical power law has been derived between the relative dispersion of blood flow of tissue samples (RD = standard deviation/mean) of mass m relative to reference-sized samples:[20]

$$\mathrm{RD}(m) = \mathrm{RD}(m_{\mathrm{ref}}) \left(\frac{m}{m_{\mathrm{ref}}}\right)^{1 - D_s}.$$

This power law exponent $D_s$ has been called a fractal dimension. Bassingthwaighte's power law can be shown to directly relate to the variance-to-mean power law. Regional organ blood flow can thus be modelled by the Tweedie compound Poisson–gamma distribution.[21] In this model a tissue sample could be considered to contain a random (Poisson) distributed number of entrapment sites, each with gamma distributed blood flow. Blood flow at this microcirculatory level has been observed to obey a gamma distribution,[22] thus providing support for this hypothesis.
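The link between the two power laws can be sketched in one line. This derivation rests on assumptions not spelled out in the text: Bassingthwaighte's law is taken in the form RD(m) = RD(m_ref)(m/m_ref)^{1−D_s}, the mean flow through a sample is taken to be proportional to its mass m, and the additive flow Z is taken to obey var(Z) = a E(Z)^p.

```latex
\mathrm{RD}(m) = \frac{\sqrt{\operatorname{var}(Z)}}{\operatorname{E}(Z)}
  = \sqrt{a}\,\operatorname{E}(Z)^{p/2-1} \propto m^{p/2-1}
\quad\Longrightarrow\quad
\frac{\mathrm{RD}(m)}{\mathrm{RD}(m_{\mathrm{ref}})}
  = \left(\frac{m}{m_{\mathrm{ref}}}\right)^{p/2-1}.
```

Matching the exponent p/2 − 1 with 1 − D_s gives D_s = 2 − p/2 under these assumptions.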

Cancer metastasis

The "experimental cancer metastasis assay"[23] has some resemblance to the above method to measure regional blood flow. Groups of syngeneic and age matched mice are given intravenous injections of equal-sized aliquots of suspensions of cloned cancer cells and then after a set period of time their lungs are removed and the number of cancer metastases enumerated within each pair of lungs. If other groups of mice are injected with different cancer cell clones then the number of metastases per group will differ in accordance with the metastatic potentials of the clones. It has been long recognized that there can be considerable intraclonal variation in the numbers of metastases per mouse despite the best attempts to keep the experimental conditions within each clonal group uniform.[23] This variation is larger than would be expected on the basis of a Poisson distribution of numbers of metastases per mouse in each clone and when the variance of the number of metastases per mouse was plotted against the corresponding mean a power law was found.[24]

The variance-to-mean power law for metastases was found to also hold for spontaneous murine metastases[25] and for case series of human metastases.[26] Since hematogenous metastasis occurs in direct relationship to regional blood flow[27] and videomicroscopic studies indicate that the passage and entrapment of cancer cells within the circulation appears analogous to the microsphere experiments,[28] it seemed plausible to propose that the variation in numbers of hematogenous metastases could reflect heterogeneity in regional organ blood flow.[29] The blood flow model was based on the Tweedie compound Poisson–gamma distribution, a distribution governing a continuous random variable. For that reason, in the metastasis model it was assumed that blood flow was governed by that distribution and that the number of regional metastases occurred as a Poisson process for which the intensity was directly proportional to blood flow. This led to the description of the Poisson negative binomial (PNB) distribution as a discrete equivalent to the Tweedie compound Poisson–gamma distribution. The probability generating function for the PNB distribution is

$$G(s) = \exp\left[\lambda\,\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^{\alpha}\left\{\left(1 - \frac{1}{\theta} + \frac{s}{\theta}\right)^{\alpha} - 1\right\}\right].$$

The relationship between the mean and variance of the PNB distribution is then

$$\operatorname{var}(Y) = a\operatorname{E}(Y)^b + \operatorname{E}(Y),$$
which, in the range of many experimental metastasis assays, would be indistinguishable from the variance-to-mean power law. For sparse data, however, this discrete variance-to-mean relationship would behave more like that of a Poisson distribution where the variance equaled the mean.
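The extra Poisson-like term in this relationship follows from the law of total variance. As a sketch, write Y for the number of metastases and Z for the underlying Poisson intensity; the symbol Z and the constants a and b of an assumed power law var(Z) = a E(Z)^b are notation introduced here for illustration. With Y | Z Poisson distributed with mean Z:

```latex
\operatorname{E}(Y) = \operatorname{E}(Z), \qquad
\operatorname{var}(Y)
  = \operatorname{E}\left[\operatorname{var}(Y \mid Z)\right]
    + \operatorname{var}\left[\operatorname{E}(Y \mid Z)\right]
  = \operatorname{E}(Z) + \operatorname{var}(Z)
  = \operatorname{E}(Y) + a\operatorname{E}(Y)^{b}.
```

When E(Y) is small the linear term dominates and the counts look Poisson; when E(Y) is large (and b > 1) the power-law term dominates.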

Genomic structure and evolution

The local density of single nucleotide polymorphisms (SNPs) within the human genome, as well as that of genes, appears to cluster in accord with the variance-to-mean power law and the Tweedie compound Poisson–gamma distribution.[30][31] In the case of SNPs, their observed density reflects the assessment techniques, the availability of genomic sequences for analysis, and the nucleotide heterozygosity.[32] The first two factors reflect ascertainment errors inherent to the collection methods; the last reflects an intrinsic property of the genome.

In the coalescent model of population genetics each genetic locus has its own unique history. Within the evolution of a population from some species, some genetic loci could presumably be traced back to a relatively recent common ancestor whereas other loci might have more ancient genealogies. More ancient genomic segments would have had more time to accumulate SNPs and to experience recombination. R R Hudson has proposed a model where recombination could cause variation in the time to the most recent common ancestor for different genomic segments.[33] A high recombination rate could cause a chromosome to contain a large number of small segments with less correlated genealogies.

Assuming a constant background rate of mutation, the number of SNPs per genomic segment would accumulate in proportion to the time to the most recent common ancestor. Current population genetic theory would indicate that these times would be gamma distributed, on average.[34] The Tweedie compound Poisson–gamma distribution would suggest a model whereby the SNP map would consist of multiple small genomic segments, with the mean number of SNPs per segment being gamma distributed as per Hudson's model.

The distribution of genes within the human genome also demonstrated a variance-to-mean power law when the method of expanding bins was used to determine the corresponding variances and means.[31] Similarly, the number of genes per enumerative bin was found to obey a Tweedie compound Poisson–gamma distribution. This probability distribution was deemed compatible with two different biological models. The first is the microarrangement model, where the number of genes per unit genomic length was determined by the sum of a random number of smaller genomic segments derived by random breakage and reconstruction of protochromosomes. These smaller segments would be assumed to carry on average a gamma distributed number of genes.

In the alternative gene cluster model, genes would be distributed randomly within the protochromosomes. Over large evolutionary timescales there would occur tandem duplication, mutations, insertions, deletions and rearrangements that could affect the genes through a stochastic birth, death and immigration process to yield the Tweedie compound Poisson–gamma distribution.

Both these mechanisms would implicate neutral evolutionary processes that would result in regional clustering of genes.

Random matrix theory

The Gaussian unitary ensemble (GUE) consists of complex Hermitian matrices that are invariant under unitary transformations, whereas the Gaussian orthogonal ensemble (GOE) consists of real symmetric matrices invariant under orthogonal transformations. The ranked eigenvalues $E_n$ from these random matrices obey Wigner's semicircular distribution: for an N×N matrix the average density for eigenvalues of size E will be

$$\bar{\rho}(E) = \begin{cases} \sqrt{2N - E^2}/\pi, & |E| < \sqrt{2N}, \\ 0, & |E| > \sqrt{2N}, \end{cases}$$
as N → ∞. Integration of the semicircular rule provides the number of eigenvalues on average less than E,
$$\bar{\eta}(E) = \frac{1}{2\pi}\left[E\sqrt{2N - E^2} + 2N \arcsin\left(\frac{E}{\sqrt{2N}}\right) + \pi N\right].$$

The ranked eigenvalues can be unfolded, or renormalized, with the equation

$$e_n = \bar{\eta}(E_n) = \int_{-\infty}^{E_n} \bar{\rho}(E')\,dE'.$$

This removes the trend of the sequence from the fluctuating portion. If we look at the absolute value of the difference between the actual and expected cumulative number of eigenvalues

$$\left|\bar{D}_n\right| = \left|n - \bar{\eta}(E_n)\right|$$
we obtain a sequence of eigenvalue fluctuations which, using the method of expanding bins, reveals a variance-to-mean power law.[6] The eigenvalue fluctuations of both the GUE and the GOE manifest this power law with the power law exponents ranging between 1 and 2, and they similarly manifest 1/f noise spectra. These eigenvalue fluctuations also correspond to the Tweedie compound Poisson–gamma distribution and they exhibit multifractality.[6]
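The unfolding step can be sketched in a few lines. The code below assumes the integrated semicircle law on the support |E| < √(2N) and ranks the eigenvalues from n = 1; the function names and the indexing convention are illustrative choices. Generating the random matrices themselves is outside the scope of this sketch.

```python
import math

def mean_cumulative_count(E, N):
    """Average number of eigenvalues below E from the integrated semicircle law."""
    r = math.sqrt(2.0 * N)
    if E <= -r:
        return 0.0
    if E >= r:
        return float(N)
    return (
        E * math.sqrt(2.0 * N - E * E)
        + 2.0 * N * math.asin(E / r)
        + math.pi * N
    ) / (2.0 * math.pi)

def eigenvalue_fluctuations(eigenvalues, N):
    """|n - eta(E_n)| for eigenvalues ranked in ascending order, n = 1, 2, ..."""
    ranked = sorted(eigenvalues)
    return [
        abs(n - mean_cumulative_count(E, N))
        for n, E in enumerate(ranked, start=1)
    ]
```

By symmetry `mean_cumulative_count(0.0, N)` equals N/2, and the count reaches N at the upper edge of the support. The resulting fluctuation sequence is what the method of expanding bins would then be applied to.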

The distribution of prime numbers

The second Chebyshev function ψ(x) is given by

$$\psi(x) = \sum_{\hat{p}^{\,k} \le x} \log \hat{p} = \sum_{n \le x} \Lambda(n),$$
where the summation extends over all prime powers $\hat{p}^{\,k}$ not exceeding x, x runs over the positive real numbers, and $\Lambda(n)$ is the von Mangoldt function. The function ψ(x) is related to the prime-counting function π(x), and as such provides information about the distribution of prime numbers amongst the real numbers. It is asymptotic to x, a statement equivalent to the prime number theorem, and it can also be shown to be related to the zeros ρ of the Riemann zeta function located in the critical strip, where the real part of the zeta zero ρ is between 0 and 1. Then ψ expressed for x greater than one can be written:
$$\psi_0(x) = x - \sum_{\rho} \frac{x^{\rho}}{\rho} - \ln 2\pi - \frac{1}{2}\ln\left(1 - x^{-2}\right),$$
where
$$\psi_0(x) = \lim_{\varepsilon \to 0} \frac{\psi(x - \varepsilon) + \psi(x + \varepsilon)}{2}.$$

The Riemann hypothesis states that the nontrivial zeros of the Riemann zeta function all have real part 1/2. These zeta function zeros are related to the distribution of prime numbers. Schoenfeld[35] has shown that if the Riemann hypothesis is true then

$$\Delta(x) = \left|\psi(x) - x\right| < \frac{\sqrt{x}\,\log^2(x)}{8\pi}$$
for all $x > 73.2$. If we analyze the Chebyshev deviations Δ(n) on the integers n using the method of expanding bins and plot the variance versus the mean, a variance-to-mean power law can be demonstrated.[citation needed] Moreover, these deviations correspond to the Tweedie compound Poisson–gamma distribution and they exhibit 1/f noise.
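The Chebyshev deviations are easy to compute for moderate x. The sketch below evaluates ψ(x) with a simple sieve and compares |ψ(x) − x| with the bound √x·log²(x)/(8π), the form in which Schoenfeld's conditional result is usually stated; the function names are illustrative. A numerical comparison at a few values of x is only an illustration and says nothing about the Riemann hypothesis itself.

```python
import math

def chebyshev_psi(x):
    """Second Chebyshev function: sum of log p over prime powers p**k <= x."""
    n = int(math.floor(x))
    if n < 2:
        return 0.0
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    total = 0.0
    for p in range(2, n + 1):
        if is_prime[p]:
            # Sieve out multiples of p, then add log p once per power p**k <= n.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
            power = p
            while power <= n:
                total += math.log(p)
                power *= p
    return total

def chebyshev_deviation(x):
    """Delta(x) = |psi(x) - x|."""
    return abs(chebyshev_psi(x) - x)

def schoenfeld_bound(x):
    """sqrt(x) * log(x)**2 / (8 * pi), the conditional bound for x > 73.2."""
    return math.sqrt(x) * math.log(x) ** 2 / (8.0 * math.pi)
```

For example, ψ(10) sums log 2 three times (for 2, 4, 8), log 3 twice (for 3, 9), and log 5 and log 7 once each, giving log 2520. The list `[chebyshev_deviation(n) for n in range(2, N)]` is the kind of sequence to which the method of expanding bins would be applied.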

Other applications

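Applications of Tweedie distributions include:

  • actuarial studies[36][37][38][39][40][41][42]
  • assay analysis[43][44]
  • survival analysis[45][46][47]
  • ecology[11]
  • modelling of data with exact zero responses[48][49]
  • health economics[50]
  • meteorology and climatology[51]
  • fisheries[52]
  • Taylor's power law and fluctuation scaling[53]
  • self-organized criticality[54]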

References

  1. Tweedie, M.C.K. (1984). "An index which distinguishes between some important exponential families". In Ghosh, J.K.; Roy, J (eds.). Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference. Calcutta: Indian Statistical Institute. pp. 579–604. MR 0786162.
  2. Jørgensen, Bent (1997). The theory of dispersion models. Chapman & Hall. ISBN 978-0412997112.
  3. Jørgensen, B (1987). "Exponential dispersion models". Journal of the Royal Statistical Society, Series B. 49 (2): 127–162. JSTOR 2345415.
  4. Smith, C.A.B. (1997). "Obituary: Maurice Charles Kenneth Tweedie, 1919–96". Journal of the Royal Statistical Society, Series A. 160 (1): 151–154. doi:10.1111/1467-985X.00052.
  5. Jørgensen, B; Martinez, JR; Tsao, M (1994). "Asymptotic behaviour of the variance function". Scandinavian Journal of Statistics. 21: 223–243.
  6. Kendal, W. S.; Jørgensen, B. (2011). "Tweedie convergence: A mathematical basis for Taylor's power law, 1/f noise, and multifractality". Physical Review E. 84 (6): 066120. Bibcode:2011PhRvE..84f6120K. doi:10.1103/PhysRevE.84.066120. PMID 22304168.
  7. Taylor, LR (1961). "Aggregation, variance and the mean". Nature. 189 (4766): 732–735. Bibcode:1961Natur.189..732T. doi:10.1038/189732a0. S2CID 4263093.
  8. Hanski, I (1980). "Spatial patterns and movements in coprophagous beetles". Oikos. 34 (3): 293–310. Bibcode:1980Oikos..34..293H. doi:10.2307/3544289. JSTOR 3544289.
  9. Anderson, RD; Crawley, GM; Hassell, M (1982). "Variability in the abundance of animal and plant species". Nature. 296 (5854): 245–248. Bibcode:1982Natur.296..245A. doi:10.1038/296245a0. S2CID 4272853.
  10. Fronczak, A; Fronczak, P (2010). "Origins of Taylor's power law for fluctuation scaling in complex systems". Phys Rev E. 81 (6): 066112. arXiv:0909.1896. Bibcode:2010PhRvE..81f6112F. doi:10.1103/physreve.81.066112. PMID 20866483. S2CID 17435198.
  11. Kendal, WS (2002). "Spatial aggregation of the Colorado potato beetle described by an exponential dispersion model". Ecological Modelling. 151 (2–3): 261–269. doi:10.1016/s0304-3800(01)00494-x.
  12. Kendal, WS (2004). "Taylor's ecological power law as a consequence of scale invariant exponential dispersion models". Ecol Complex. 1 (3): 193–209. doi:10.1016/j.ecocom.2004.05.001.
  13. Dutta, P; Horn, PM (1981). "Low frequency fluctuations in solids: 1/f noise". Rev Mod Phys. 53 (3): 497–516. Bibcode:1981RvMP...53..497D. doi:10.1103/revmodphys.53.497.
  14. Leland, WE; Taqqu, MS; Willinger, W; Wilson, DV (1994). "On the self-similar nature of Ethernet traffic (Extended version)". IEEE/ACM Transactions on Networking. 2: 1–15. doi:10.1109/90.282603. S2CID 6011907.
  15. Tsybakov, B; Georganas, ND (1997). "On self-similar traffic in ATM queues: definitions, overflow probability bound, and cell delay distribution". IEEE/ACM Transactions on Networking. 5 (3): 397–409. CiteSeerX 10.1.1.53.5040. doi:10.1109/90.611104. S2CID 2205855.
  16. Kendal, WS (2007). "Scale invariant correlations between genes and SNPs on Human chromosome 1 reveal potential evolutionary mechanisms". J Theor Biol. 245 (2): 329–340. Bibcode:2007JThBi.245..329K. doi:10.1016/j.jtbi.2006.10.010. PMID 17137602.
  17. McQuarrie, DA (1976). Statistical mechanics. Harper & Row.
  18. Kendal, WS (2014). "Multifractality attributed to dual central limit-like convergence effects". Physica A. 401: 22–33. Bibcode:2014PhyA..401...22K. doi:10.1016/j.physa.2014.01.022.
  19. Jørgensen, B; Kokonendji, CC (2011). "Dispersion models for geometric sums". Braz J Probab Stat. 25 (3): 263–293. doi:10.1214/10-bjps136.
  20. Bassingthwaighte, JB (1989). "Fractal nature of regional myocardial blood flow heterogeneity". Circ Res. 65 (3): 578–590. doi:10.1161/01.res.65.3.578. PMC 3361973. PMID 2766485.
  21. Kendal, WS (2001). "A stochastic model for the self-similar heterogeneity of regional organ blood flow". Proc Natl Acad Sci U S A. 98 (3): 837–841. Bibcode:2001PNAS...98..837K. doi:10.1073/pnas.98.3.837. PMC 14670. PMID 11158557.
  22. Honig, CR; Feldstein, ML; Frierson, JL (1977). "Capillary lengths, anastomoses, and estimated capillary transit times in skeletal muscle". Am J Physiol Heart Circ Physiol. 233 (1): H122–H129. doi:10.1152/ajpheart.1977.233.1.h122. PMID 879328.
  23. Fidler, IJ; Kripke, M (1977). "Metastasis results from preexisting variant cells within a malignant tumor". Science. 197 (4306): 893–895. Bibcode:1977Sci...197..893F. doi:10.1126/science.887927. PMID 887927.
  24. Kendal, WS; Frost, P (1987). "Experimental metastasis: a novel application of the variance-to-mean power function". J Natl Cancer Inst. 79 (5): 1113–1115. doi:10.1093/jnci/79.5.1113. PMID 3479636.
  25. Kendal, WS (1999). "Clustering of murine lung metastases reflects fractal nonuniformity in regional lung blood flow". Invasion and Metastasis. 18 (5–6): 285–296. doi:10.1159/000024521. PMID 10729773. S2CID 46835513.
  26. Kendal, WS; Lagerwaard, FJ; Agboola, O (2000). "Characterization of the frequency distribution for human hematogenous metastases: evidence for clustering and a power variance function". Clin Exp Metastasis. 18 (3): 219–229. doi:10.1023/A:1006737100797. PMID 11315095. S2CID 25261069.
  27. Weiss, L; Bronk, J; Pickren, JW; Lane, WW (1981). "Metastatic patterns and target organ arterial blood flow". Invasion and Metastasis. 1 (2): 126–135. PMID 7188382.
  28. Chambers, AF; Groom, AC; MacDonald, IC (2002). "Dissemination and growth of cancer cells in metastatic sites". Nature Reviews Cancer. 2 (8): 563–572. doi:10.1038/nrc865. PMID 12154349. S2CID 135169.
  29. Kendal, WS (2002). "A frequency distribution for the number of hematogenous organ metastases". J Theor Biol. 217 (2): 203–218. Bibcode:2002JThBi.217..203K. doi:10.1006/jtbi.2002.3021. PMID 12202114.
  30. Kendal, WS (2003). "An exponential dispersion model for the distribution of human single nucleotide polymorphisms". Mol Biol Evol. 20 (4): 579–590. doi:10.1093/molbev/msg057. PMID 12679541.
  31. Kendal, WS (2004). "A scale invariant clustering of genes on human chromosome 7". BMC Evol Biol. 4: 3. doi:10.1186/1471-2148-4-3. PMC 373443. PMID 15040817.
  32. Sachidanandam, R; Weissman, D; Schmidt, SC; et al. (2001). "A map of human genome variation containing 1.42 million single nucleotide polymorphisms". Nature. 409 (6822): 928–933. Bibcode:2001Natur.409..928S. doi:10.1038/35057149. PMID 11237013.
  33. Hudson, RR (1991). "Gene genealogies and the coalescent process". Oxford Surveys in Evolutionary Biology. 7: 1–44.
  34. Tavare, S; Balding, DJ; Griffiths, RC; Donnelly, P (1997). "Inferring coalescent times from DNA sequence data". Genetics. 145 (2): 505–518. doi:10.1093/genetics/145.2.505. PMC 1207814. PMID 9071603.
  35. Schoenfeld, L (1976). "Sharper bounds for the Chebyshev functions θ(x) and ψ(x). II". Mathematics of Computation. 30 (134): 337–360. doi:10.1090/s0025-5718-1976-0457374-x.
  36. Haberman, S.; Renshaw, A. E. (1996). "Generalized linear models and actuarial science". The Statistician. 45 (4): 407–436. doi:10.2307/2988543. JSTOR 2988543.
  37. Renshaw, A. E. (1994). "Modelling the claims process in the presence of covariates". ASTIN Bulletin. 24: 265–286.
  38. Jørgensen, B.; Paes de Souza, M. C. (1994). "Fitting Tweedie's compound Poisson model to insurance claims data". Scand. Actuar. J. 1: 69–93. CiteSeerX 10.1.1.329.9259. doi:10.1080/03461238.1994.10413930.
  39. Haberman, S.; Renshaw, A. E. (1998). "Actuarial applications of generalized linear models". In Hand, D. J.; Jacka, S. D. (eds.). Statistics in Finance. London: Arnold.
  40. Mildenhall, S. J. (1999). "A systematic relationship between minimum bias and generalized linear models". Proceedings of the Casualty Actuarial Society. 86: 393–487.
  41. Murphy, K. P.; Brockman, M. J.; Lee, P. K. W. (2000). "Using generalized linear models to build dynamic pricing systems". Casualty Actuarial Forum, Winter 2000.
  42. Smyth, G.K.; Jørgensen, B. (2002). "Fitting Tweedie's compound Poisson model to insurance claims data: dispersion modelling". ASTIN Bulletin. 32: 143–157. doi:10.2143/ast.32.1.1020.
  43. Davidian, M (1990). "Estimation of variance functions in assays with possible unequal replication and nonnormal data". Biometrika. 77: 43–54. doi:10.1093/biomet/77.1.43.
  44. Davidian, M.; Carroll, R. J.; Smith, W. (1988). "Variance functions and the minimum detectable concentration in assays". Biometrika. 75 (3): 549–556. doi:10.1093/biomet/75.3.549.
  45. Aalen, O. O. (1992). "Modelling heterogeneity in survival analysis by the compound Poisson distribution". Ann. Appl. Probab. 2 (4): 951–972. doi:10.1214/aoap/1177005583.
  46. Hougaard, P.; Harvald, B.; Holm, N. V. (1992). "Measuring the similarities between the lifetimes of adult Danish twins born between 1881–1930". Journal of the American Statistical Association. 87 (417): 17–24. doi:10.1080/01621459.1992.10475170.
  47. Hougaard, P (1986). "Survival models for heterogeneous populations derived from stable distributions". Biometrika. 73 (2): 387–396. doi:10.1093/biomet/73.2.387.
  48. Gilchrist, R.; Drinkwater, D. (1999). "Fitting Tweedie models to data with probability of zero responses". Proceedings of the 14th International Workshop on Statistical Modelling, Graz. pp. 207–214.
  49. Smyth, G. K. (1996). "Regression analysis of quantity data with exact zeros". Proceedings of the Second Australia–Japan Workshop on Stochastic Models in Engineering, Technology and Management. Technology Management Centre, University of Queensland. pp. 572–580.
  50. Kurz, Christoph F. (2017). "Tweedie distributions for fitting semicontinuous health care utilization cost data". BMC Medical Research Methodology. 17 (171): 171. doi:10.1186/s12874-017-0445-y. PMC 5735804. PMID 29258428.
  51. Hasan, M.M.; Dunn, P.K. (2010). "Two Tweedie distributions that are near-optimal for modelling monthly rainfall in Australia". International Journal of Climatology. 31 (9): 1389–1397. doi:10.1002/joc.2162. S2CID 140135793.
  52. Candy, S. G. (2004). "Modelling catch and effort data using generalized linear models, the Tweedie distribution, random vessel effects and random stratum-by-year effects". CCAMLR Science. 11: 59–80.
  53. Kendal, WS; Jørgensen, B (2011). "Taylor's power law and fluctuation scaling explained by a central-limit-like convergence". Phys. Rev. E. 83 (6): 066115. Bibcode:2011PhRvE..83f6115K. doi:10.1103/physreve.83.066115. PMID 21797449.
  54. Kendal, WS (2015). "Self-organized criticality attributed to a central limit-like convergence effect". Physica A. 421: 141–150. Bibcode:2015PhyA..421..141K. doi:10.1016/j.physa.2014.11.035.

Further reading

  • Dunn, P.K.; Smyth, G.K. (2018). Generalized Linear Models With Examples in R. New York: Springer. doi:10.1007/978-1-4419-0118-7. ISBN 978-1-4419-0118-7. Chapter 12 is about Tweedie distributions and models.
  • Kaas, R. (2005). "Compound Poisson distribution and GLM’s – Tweedie’s distribution". In Proceedings of the Contact Forum "3rd Actuarial and Financial Mathematics Day", pages 3–12. Brussels: Royal Flemish Academy of Belgium for Science and the Arts.
  • Tweedie, M.C.K. (1956). "Some statistical properties of Inverse Gaussian distributions". Virginia J. Sci. New Series. 7: 160–165.

claims data dispersion modelling PDF ASTIN Bulletin 32 143 157 doi 10 2143 ast 32 1 1020 Davidian M 1990 Estimation of variance functions in assays with possible unequal replication and nonnormal data Biometrika 77 43 54 doi 10 1093 biomet 77 1 43 Davidian M Carroll R J Smith W 1988 Variance functions and the minimum detectable concentration in assays Biometrika 75 3 549 556 doi 10 1093 biomet 75 3 549 Aalen O O 1992 Modelling heterogeneity in survival analysis by the compound Poisson distribution Ann Appl Probab 2 4 951 972 doi 10 1214 aoap 1177005583 Hougaard P Harvald B Holm N V 1992 Measuring the similarities between the lifetimes of adult Danish twins born between 1881 1930 Journal of the American Statistical Association 87 417 17 24 doi 10 1080 01621459 1992 10475170 Hougaard P 1986 Survival models for heterogeneous populations derived from stable distributions Biometrika 73 2 387 396 doi 10 1093 biomet 73 2 387 Gilchrist R and Drinkwater D 1999 Fitting Tweedie models to data with probability of zero responses Proceedings of the 14th International Workshop on Statistical Modelling Graz pp 207 214 a b Smyth G K 1996 Regression analysis of quantity data with exact zeros Proceedings of the Second Australia Japan Workshop on Stochastic Models in Engineering Technology and Management Technology Management Centre University of Queensland 572 580 Kurz Christoph F 2017 Tweedie distributions for fitting semicontinuous health care utilization cost data BMC Medical Research Methodology 17 171 171 doi 10 1186 s12874 017 0445 y PMC 5735804 PMID 29258428 Hasan M M Dunn P K 2010 Two Tweedie distributions that are near optimal for modelling monthly rainfall in Australia International Journal of Climatology 31 9 1389 1397 doi 10 1002 joc 2162 S2CID 140135793 Candy S G 2004 Modelling catch and effort data using generalized linear models the Tweedie distribution random vessel effects and random stratum by year effects CCAMLR Science 11 59 80 Kendal WS Jorgensen B 2011 Taylor s 
power law and fluctuation scaling explained by a central limit like convergence Phys Rev E 83 6 066115 Bibcode 2011PhRvE 83f6115K doi 10 1103 physreve 83 066115 PMID 21797449 Kendal WS 2015 Self organized criticality attributed to a central limit like convergence effect Physica A 421 141 150 Bibcode 2015PhyA 421 141K doi 10 1016 j physa 2014 11 035 Further reading editDunn P K Smyth G K 2018 Generalized Linear Models With Examples in R New York Springer doi 10 1007 978 1 4419 0118 7 ISBN 978 1 4419 0118 7 Chapter 12 is about Tweedie distributions and models Kaas R 2005 Compound Poisson distribution and GLM s Tweedie s distribution In Proceedings of the Contact Forum 3rd Actuarial and Financial Mathematics Day pages 3 12 Brussels Royal Flemish Academy of Belgium for Science and the Arts Tweedie M C K 1956 Some statistical properties of Inverse Gaussian distributions Virginia J Sci New Series 7 160 165 Retrieved from https en wikipedia org w index php title Tweedie distribution amp oldid 1224590127, wikipedia, wiki, book, books, library,
