Wikipedia

Index of dispersion

In probability theory and statistics, the index of dispersion,[1] dispersion index, coefficient of dispersion, relative variance, or variance-to-mean ratio (VMR), like the coefficient of variation, is a normalized measure of the dispersion of a probability distribution: it is used to quantify whether a set of observed occurrences is clustered or dispersed compared to a standard statistical model.

It is defined as the ratio of the variance σ² to the mean μ:

$$D = \frac{\sigma^2}{\mu}.$$

It is also known as the Fano factor, though this term is sometimes reserved for windowed data (the mean and variance are computed over a subpopulation), with the index of dispersion being the special case in which the window is infinite. Data are often windowed: the VMR is frequently computed over various intervals in time or small regions in space, which may be called "windows", and the resulting statistic is called the Fano factor.

It is only defined when the mean μ is non-zero, and is generally only used for positive statistics, such as count data or time between events, or where the underlying distribution is assumed to be the exponential distribution or Poisson distribution.
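As a minimal sketch of the definition above, the ratio can be computed directly from a list of observations. The function name `index_of_dispersion` and the sample data are illustrative assumptions, and the population variance is used here; a sample-variance version would divide by n − 1 instead.

```python
from statistics import fmean, pvariance


def index_of_dispersion(values):
    """Return the variance-to-mean ratio D = sigma^2 / mu of `values`.

    Uses the population variance (an assumption of this sketch).
    """
    mu = fmean(values)
    if mu == 0:
        # The index is only defined when the mean is non-zero.
        raise ValueError("index of dispersion is undefined when the mean is zero")
    return pvariance(values) / mu


# Hypothetical counts, chosen only for illustration.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
print(index_of_dispersion(counts))  # mean 5, variance 4, so D = 0.8
```

A value below 1, as here, would suggest counts that are more regular than a Poisson model predicts, although with so few observations the estimate is very noisy.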

Terminology

In this context, the observed dataset may consist of the times of occurrence of predefined events, such as earthquakes in a given region over a given magnitude, or of the locations in geographical space of plants of a given species. Details of such occurrences are first converted into counts of the numbers of events or occurrences in each of a set of equal-sized time- or space-regions.
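The conversion from event times to counts might be sketched as follows. The helper name `counts_per_window`, its parameters, and the event times are hypothetical; the windows are assumed to be half-open intervals of equal width.

```python
def counts_per_window(event_times, start, stop, n_windows):
    """Count events in each of `n_windows` equal-width windows covering [start, stop)."""
    width = (stop - start) / n_windows
    counts = [0] * n_windows
    for t in event_times:
        if start <= t < stop:
            # Guard against floating-point rounding pushing the index past the end.
            index = min(int((t - start) / width), n_windows - 1)
            counts[index] += 1
    return counts


# Hypothetical event times (for example, days on which an earthquake was recorded).
times = [0.4, 1.1, 1.7, 2.2, 5.9, 6.0, 6.3, 9.8]
print(counts_per_window(times, start=0.0, stop=10.0, n_windows=5))  # [3, 1, 1, 2, 1]
```

The resulting list of counts is the input from which the dispersion index for counts is computed. The same idea extends to spatial data by binning locations into equal-sized regions.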

The above defines a dispersion index for counts.[2] A different definition applies for a dispersion index for intervals,[3] where the quantities treated are the lengths of the time-intervals between the events. Common usage is that "index of dispersion" means the dispersion index for counts.

Interpretation

Some distributions, most notably the Poisson distribution, have equal variance and mean, giving them VMR = 1. The geometric distribution and the negative binomial distribution have VMR > 1, while the binomial distribution has VMR < 1, and the constant random variable has VMR = 0. This yields the following table:

Distribution                      VMR            Interpretation
constant random variable          VMR = 0        not dispersed
binomial distribution             0 < VMR < 1    under-dispersed
Poisson distribution              VMR = 1
negative binomial distribution    VMR > 1        over-dispersed

This can be considered analogous to the classification of conic sections by eccentricity; see Cumulants of particular probability distributions for details.
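The VMR values quoted for these distributions follow from their standard means and variances. The short derivation below is a sketch under stated assumptions: the binomial has n trials with success probability 0 < p < 1, and the negative binomial counts failures before the r-th success with success probability p (other parametrizations give an equivalent result in different symbols).

```latex
\begin{align*}
\text{Constant } c \neq 0: \quad
  & \mu = c,\ \sigma^2 = 0
  && \Rightarrow\ D = 0 \\
\text{Binomial}(n, p): \quad
  & \mu = np,\ \sigma^2 = np(1-p)
  && \Rightarrow\ D = 1 - p < 1 \\
\text{Poisson}(\lambda): \quad
  & \mu = \lambda,\ \sigma^2 = \lambda
  && \Rightarrow\ D = 1 \\
\text{Negative binomial}(r, p): \quad
  & \mu = \frac{r(1-p)}{p},\ \sigma^2 = \frac{r(1-p)}{p^2}
  && \Rightarrow\ D = \frac{1}{p} > 1
\end{align*}
```

The geometric distribution is the case r = 1 of the negative binomial, so it also has D = 1/p > 1.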

The relevance of the index of dispersion is that it has a value of 1 when the probability distribution of the number of occurrences in an interval is a Poisson distribution. Thus the measure can be used to assess whether observed data can be modeled using a Poisson process. When the coefficient of dispersion is less than 1, a dataset is said to be "under-dispersed": this condition can relate to patterns of occurrence that are more regular than the randomness associated with a Poisson process. For instance, regular, periodic events will be under-dispersed. If the index of dispersion is larger than 1, a dataset is said to be over-dispersed.

A sample-based estimate of the dispersion index can be used to construct a formal statistical hypothesis test for the adequacy of the model that a series of counts follows a Poisson distribution.[4][5] In terms of the interval counts, over-dispersion corresponds to there being more intervals with low counts and more intervals with high counts than under a Poisson distribution; in contrast, under-dispersion is characterised by more intervals having counts close to the mean count than under a Poisson distribution.
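One common form of such a test, given here as a sketch rather than as the specific procedure of the cited sources, uses the statistic Σ(xᵢ − x̄)² / x̄, which is approximately chi-squared distributed with n − 1 degrees of freedom when the counts are independent Poisson observations. The function name below is an assumption of this example.

```python
from statistics import fmean


def poisson_dispersion_statistic(counts):
    """Return (statistic, degrees_of_freedom) for the Poisson dispersion test.

    statistic = sum((x - mean)^2) / mean, i.e. (n - 1) times the sample VMR.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("at least two counts are required")
    mean = fmean(counts)
    if mean == 0:
        raise ValueError("the statistic is undefined when the mean is zero")
    statistic = sum((x - mean) ** 2 for x in counts) / mean
    return statistic, n - 1


# Hypothetical counts, chosen only for illustration.
statistic, dof = poisson_dispersion_statistic([2, 4, 4, 4, 5, 5, 7, 9])
print(statistic, dof)  # statistic is 32 / 5 = 6.4 with 7 degrees of freedom
```

The statistic would then be compared with the quantiles of the chi-squared distribution, from a table or a statistics library: unusually large values point to over-dispersion and unusually small values to under-dispersion.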

The VMR is also a good measure of the degree of randomness of a given phenomenon. For example, this technique is commonly used in currency management.

Example

For randomly diffusing particles (Brownian motion), the distribution of the number of particles inside a given volume is Poissonian, i.e. VMR = 1. Therefore, to assess whether a given spatial pattern (assuming it can be measured) is due purely to diffusion or whether some particle-particle interaction is involved, divide the space into patches, quadrats or sample units (SU), count the number of individuals in each patch or SU, and compute the VMR. VMRs significantly higher than 1 denote a clustered distribution, in which the random walk is not enough to overcome the attractive inter-particle potential.
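The quadrat procedure can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the unit square as the study area, a 10 × 10 grid of quadrats, the function names, and the way the clustered pattern is generated (points scattered tightly around a few random centres).

```python
import random
from statistics import fmean, pvariance


def quadrat_counts(points, grid):
    """Count points of the unit square in each cell of a grid x grid lattice."""
    counts = [0] * (grid * grid)
    for x, y in points:
        col = min(int(x * grid), grid - 1)  # clamp points lying exactly on the edge
        row = min(int(y * grid), grid - 1)
        counts[row * grid + col] += 1
    return counts


def vmr(counts):
    """Variance-to-mean ratio of the quadrat counts (population variance)."""
    return pvariance(counts) / fmean(counts)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Independent, uniformly scattered points: no interaction between particles.
uniform = [(rng.random(), rng.random()) for _ in range(2000)]

# Clustered pattern: 100 points placed tightly around each of 20 random centres.
centres = [(rng.random(), rng.random()) for _ in range(20)]
clustered = [
    (min(max(rng.gauss(cx, 0.02), 0.0), 1.0), min(max(rng.gauss(cy, 0.02), 0.0), 1.0))
    for cx, cy in centres
    for _ in range(100)
]

uniform_vmr = vmr(quadrat_counts(uniform, grid=10))
clustered_vmr = vmr(quadrat_counts(clustered, grid=10))
print(uniform_vmr, clustered_vmr)
```

The uniform pattern should give a VMR close to 1, while the clustered pattern should give a value well above 1. The result depends on the quadrat size, so in practice the choice of grid deserves some care.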

History

The first to discuss the use of a test to detect deviations from a Poisson or binomial distribution appears to have been Lexis in 1877. One of the tests he developed was the Lexis ratio.

This index was first used in botany by Clapham in 1936.

Hoel studied the first four moments of its distribution.[6] He found that the approximation to the χ² statistic is reasonable if μ > 5.

Skewed distributions

For highly skewed distributions, it may be more appropriate to use a linear loss function, as opposed to a quadratic one. The analogous coefficient of dispersion in this case is the ratio of the average absolute deviation from the median to the median of the data,[7] or, in symbols:

$$CD = \frac{1}{n} \frac{\sum_j |m - x_j|}{m},$$

where n is the sample size, m is the sample median, and the sum is taken over the whole sample. Iowa, New York and South Dakota use this linear coefficient of dispersion in property tax assessment.[8][9][10]
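A minimal sketch of this linear coefficient, the average absolute deviation from the median divided by the median, is given below. The function name and the sample values are assumptions of the example.

```python
from statistics import median


def coefficient_of_dispersion(values):
    """Average absolute deviation from the median, divided by the median."""
    m = median(values)
    if m == 0:
        raise ValueError("coefficient of dispersion is undefined when the median is zero")
    n = len(values)
    return sum(abs(m - x) for x in values) / (n * m)


# Hypothetical data: median 3, absolute deviations 2, 1, 0, 1, 2.
print(coefficient_of_dispersion([1, 2, 3, 4, 5]))  # (6 / 5) / 3 = 0.4
```

Because it is built on the median and absolute deviations, this measure is less sensitive to a few extreme values than the variance-to-mean ratio.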

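For a two-sample test in which the sample sizes are large, and both samples have the same median but differ in the dispersion around it, a confidence interval for the linear coefficient of dispersion is bounded below by

$$\frac{t_a}{t_b} \exp\left(-\sqrt{z_\alpha \left(\operatorname{var}\left[\log\left(\frac{t_a}{t_b}\right)\right]\right)}\right),$$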

where t_j is the mean absolute deviation of the jth sample and z_α is the critical value of the standard normal distribution for a two-sided confidence level of 1 − α (e.g., for α = 0.05, z_α = 1.96).[7]

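See also

  • Count data
  • Harmonic mean

Similar ratios

  • Coefficient of variation, σ/μ
  • Standardized moment, μ_k/σ^k
  • Fano factor, σ_W²/μ_W (windowed VMR)
  • Signal-to-noise ratio, μ/σ (in signal processing)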

Notes

  1. Cox & Lewis (1966)
  2. Cox & Lewis (1966), p. 72
  3. Cox & Lewis (1966), p. 71
  4. Cox & Lewis (1966), p. 158
  5. Upton & Cook (2006), under index of dispersion
  6. Hoel, P. G. (1943). "On Indices of Dispersion". Annals of Mathematical Statistics. 14 (2): 155–162. doi:10.1214/aoms/1177731457. JSTOR 2235818.
  7. Bonett, DG; Seier, E (2006). "Confidence interval for a coefficient of dispersion in non-normal distributions". Biometrical Journal. 48 (1): 144–148. doi:10.1002/bimj.200410148. PMID 16544819. S2CID 33665632.
  8. "Statistical Calculation Definitions for Mass Appraisal" (PDF). Iowa.gov. Archived from the original (PDF) on 11 November 2010. Median Ratio: The ratio located midway between the highest ratio and the lowest ratio when individual ratios for a class of realty are ranked in ascending or descending order. The median ratio is most frequently used to determine the level of assessment for a given class of real estate.
  9. "Assessment equity in New York: Results from the 2010 market value survey". Archived from the original on 6 November 2012.
  10. "Summary of the Assessment Process" (PDF). state.sd.us. South Dakota Department of Revenue - Property/Special Taxes Division. Archived from the original (PDF) on 10 May 2009.

References

  • Cox, D. R.; Lewis, P. A. W. (1966). The Statistical Analysis of Series of Events. London: Methuen.
  • Upton, G.; Cook, I. (2006). Oxford Dictionary of Statistics (2nd ed.). Oxford University Press. ISBN 978-0-19-954145-4.
