Wikipedia

Wallenius' noncentral hypergeometric distribution

In probability theory and statistics, Wallenius' noncentral hypergeometric distribution (named after Kenneth Ted Wallenius) is a generalization of the hypergeometric distribution where items are sampled with bias.

Figure: Probability mass function of Wallenius' noncentral hypergeometric distribution for different values of the odds ratio ω (m1 = 80, m2 = 60, n = 100, ω = 0.1 ... 20).

This distribution can be illustrated as an urn model with bias. Assume, for example, that an urn contains m1 red balls and m2 white balls, totalling N = m1 + m2 balls. Each red ball has the weight ω1 and each white ball has the weight ω2. The odds ratio is defined as ω = ω1 / ω2. Now n balls are taken, one by one, in such a way that the probability of taking a particular ball at a particular draw is equal to its proportion of the total weight of all balls that remain in the urn at that moment. The number of red balls x1 obtained in this experiment is a random variable with Wallenius' noncentral hypergeometric distribution.
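The urn model translates directly into a simulation. The sketch below is a minimal Monte Carlo illustration, assuming two colours, integer ball counts and a positive odds ratio; red balls get weight ω and white balls weight 1, since only the ratio ω = ω1 / ω2 matters. The function name `draw_wallenius` is hypothetical, and sampling only approximates the distribution; it does not compute exact probabilities.

```python
import random


def draw_wallenius(m1: int, m2: int, n: int, omega: float, rng: random.Random) -> int:
    """Simulate one biased-urn experiment and return x1, the number of red balls taken.

    Red balls get weight omega and white balls weight 1. Choosing a colour with
    probability proportional to its remaining total weight is equivalent to
    choosing a single ball with probability proportional to its own weight.
    """
    if omega <= 0:
        raise ValueError("omega must be positive")
    if not 0 <= n <= m1 + m2:
        raise ValueError("n must lie between 0 and m1 + m2")
    red, white, x1 = m1, m2, 0
    for _ in range(n):
        red_weight = omega * red
        if rng.random() * (red_weight + white) < red_weight:
            red -= 1  # a red ball was taken
            x1 += 1
        else:
            white -= 1  # a white ball was taken
    return x1


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed for reproducibility
    samples = [draw_wallenius(80, 60, 100, 2.0, rng) for _ in range(10_000)]
    print("sample mean of x1:", sum(samples) / len(samples))
```

With ω = 1 the sample mean should approach the central hypergeometric mean n·m1/N.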

The matter is complicated by the fact that there is more than one noncentral hypergeometric distribution. Wallenius' noncentral hypergeometric distribution is obtained if balls are sampled one by one in such a way that there is competition between the balls. Fisher's noncentral hypergeometric distribution is obtained if the balls are sampled simultaneously or independently of each other. Unfortunately, both distributions are known in the literature as "the" noncentral hypergeometric distribution. It is important to be specific about which distribution is meant when using this name.

Both distributions reduce to the (central) hypergeometric distribution when the odds ratio is 1.

The difference between these two probability distributions is subtle. See the Wikipedia entry on noncentral hypergeometric distributions for a more detailed explanation.

Univariate distribution

Univariate Wallenius' Noncentral Hypergeometric Distribution
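Parameters: $m_1, m_2 \in \mathbb{N}$; $N = m_1 + m_2$; $n \in [0, N)$; $\omega \in \mathbb{R}_{+}$
Support: $x \in [x_{\min}, x_{\max}]$, with $x_{\min} = \max(0, n - m_2)$ and $x_{\max} = \min(n, m_1)$
PMF: $\binom{m_1}{x}\binom{m_2}{n-x}\int_0^1 \left(1 - t^{\omega/D}\right)^{x}\left(1 - t^{1/D}\right)^{n-x}\,\mathrm{d}t$, where $D = \omega(m_1 - x) + (m_2 - (n - x))$
Mean: approximated by the solution $\mu$ to $\frac{\mu}{m_1} + \left(1 - \frac{n - \mu}{m_2}\right)^{\omega} = 1$
Variance: $\approx \frac{N a b}{(N - 1)(m_1 b + m_2 a)}$, where $a = \mu(m_1 - \mu)$ and $b = (n - \mu)(\mu + m_2 - n)$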
 
Figure: Recursive calculation of probability f(x,n) in Wallenius' distribution. The light grey fields are possible points on the way to the final point. The arrows indicate an arbitrary trajectory.

Wallenius' distribution is particularly complicated because the probability of taking a given ball depends not only on its own weight but also on the total weight of its competitors, and the weight of the competing balls depends on the outcomes of all preceding draws.

This recursive dependency gives rise to a difference equation with a solution that is given in open form by the integral in the expression of the probability mass function in the table above.

Closed-form expressions for the probability mass function exist (Lyons, 1980), but except in degenerate cases they are of little use for practical calculations because they are numerically very unstable.

Several other calculation methods are used, including recursion, Taylor expansion and numerical integration (Fog, 2007, 2008).

The most reliable calculation method is the recursive calculation of f(x,n) from f(x,n-1) and f(x-1,n-1), using the recursion formula given below under properties. Starting with f(0,0) = 1, the probabilities of all (x,n) combinations on all possible trajectories leading to the desired point are calculated, as shown in the figure. The total number of probabilities to calculate is n(x+1) - x². Other calculation methods must be used when n and x are so large that this method becomes too inefficient.
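A minimal sketch of the recursive method, assuming n is moderate so that the grid of (x+1)(n-x+1) values fits in memory. It indexes the grid by the numbers of red and white balls taken rather than by (x, n), which is equivalent: each point is reached either by a red draw, with probability equal to the remaining red weight divided by the total remaining weight, or by a white draw, with the analogous white fraction. The function name `wnchypg_recursive` is illustrative, not taken from any library.

```python
from math import comb


def wnchypg_recursive(x: int, n: int, m1: int, m2: int, omega: float) -> float:
    """Wallenius' noncentral hypergeometric probability P(x1 = x) by recursion.

    f[i][j] holds the probability of having taken i red and j white balls after
    i + j draws, starting from f[0][0] = 1. Only points that can lie on a
    trajectory to the target (i <= x, j <= n - x) are computed.
    """
    if omega <= 0:
        raise ValueError("omega must be positive")
    if not max(0, n - m2) <= x <= min(n, m1):
        return 0.0  # outside the support
    w = n - x  # white balls in the target point
    f = [[0.0] * (w + 1) for _ in range(x + 1)]
    f[0][0] = 1.0
    for i in range(x + 1):
        for j in range(w + 1):
            if i == 0 and j == 0:
                continue
            p = 0.0
            if i > 0:
                # Last draw red, from the state with i - 1 red and j white taken.
                red_left, white_left = m1 - i + 1, m2 - j
                p += f[i - 1][j] * omega * red_left / (omega * red_left + white_left)
            if j > 0:
                # Last draw white, from the state with i red and j - 1 white taken.
                red_left, white_left = m1 - i, m2 - j + 1
                p += f[i][j - 1] * white_left / (omega * red_left + white_left)
            f[i][j] = p
    return f[x][w]


if __name__ == "__main__":
    x, n, m1, m2 = 60, 100, 80, 60
    print(wnchypg_recursive(x, n, m1, m2, 2.0))
    # With omega = 1 the result should equal the central hypergeometric probability.
    print(wnchypg_recursive(x, n, m1, m2, 1.0))
    print(comb(m1, x) * comb(m2, n - x) / comb(m1 + m2, n))
```

The loop fills exactly (x+1)(n-x+1) - 1 = n(x+1) - x² cells, matching the count stated above.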

The probability that all balls have the same color is easier to calculate. See the formula below under multivariate distribution.

No exact formula for the mean is known (short of complete enumeration of all probabilities). The equation given above is reasonably accurate. This equation can be solved for μ by Newton-Raphson iteration. The same equation can be used for estimating the odds from an experimentally obtained value of the mean.
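Newton-Raphson on the mean equation from the table can be sketched as follows, assuming m1, m2 > 0 and ω > 0. The left-hand side μ/m1 + (1 - (n - μ)/m2)^ω is increasing in μ and the root lies within the support, so this sketch keeps each Newton step inside a shrinking bracket and falls back to bisection when a step would leave it; that safeguard is an addition of this example, not part of the source method. The second function inserts the result into the approximate variance formula from the table. Both function names are hypothetical.

```python
def wallenius_mean_approx(n: int, m1: int, m2: int, omega: float, tol: float = 1e-12) -> float:
    """Approximate mean: the mu that solves mu/m1 + (1 - (n - mu)/m2)**omega = 1.

    g(mu) = mu/m1 + (1 - (n - mu)/m2)**omega - 1 is increasing in mu and changes
    sign on the support [max(0, n - m2), min(n, m1)], so Newton steps are kept
    inside a shrinking bracket and replaced by bisection when they would leave it.
    """
    if omega <= 0 or m1 <= 0 or m2 <= 0 or not 0 <= n <= m1 + m2:
        raise ValueError("need omega > 0, m1 > 0, m2 > 0 and 0 <= n <= m1 + m2")
    lo, hi = float(max(0, n - m2)), float(min(n, m1))
    mu = n * m1 / (m1 + m2)  # central (omega = 1) mean as the starting point
    for _ in range(200):
        base = max(0.0, 1.0 - (n - mu) / m2)  # clamp tiny negative rounding errors
        g = mu / m1 + base ** omega - 1.0
        if g == 0.0:
            return mu
        if g > 0.0:
            hi = mu
        else:
            lo = mu
        slope = 1.0 / m1 + (omega * base ** (omega - 1.0) / m2 if base > 0.0 else 0.0)
        delta = g / slope
        if abs(delta) <= tol * max(1.0, mu):
            return min(max(mu - delta, lo), hi)
        candidate = mu - delta
        mu = candidate if lo < candidate < hi else 0.5 * (lo + hi)
    return mu


def wallenius_variance_approx(n: int, m1: int, m2: int, omega: float) -> float:
    """Approximate variance N*a*b / ((N - 1)*(m1*b + m2*a)) with a, b from the mean."""
    N = m1 + m2
    mu = wallenius_mean_approx(n, m1, m2, omega)
    a = mu * (m1 - mu)
    b = (n - mu) * (mu + m2 - n)
    denom = (N - 1) * (m1 * b + m2 * a)
    return N * a * b / denom if denom > 0 else 0.0  # degenerate cases have zero variance


if __name__ == "__main__":
    for omega in (0.5, 1.0, 2.0):
        mu = wallenius_mean_approx(100, 80, 60, omega)
        var = wallenius_variance_approx(100, 80, 60, omega)
        print(f"omega={omega}: mean ~ {mu:.4f}, variance ~ {var:.4f}")
```

For ω = 1 both approximations reduce to the exact mean n·m1/N and variance of the central hypergeometric distribution, which makes a convenient sanity check.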

Properties of the univariate distribution

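Wallenius' distribution has fewer symmetry relations than Fisher's noncentral hypergeometric distribution. The only symmetry relates to the swapping of colors:

$$\operatorname{wnchypg}(x; n, m_1, m_2, \omega) = \operatorname{wnchypg}(n - x; n, m_2, m_1, 1/\omega)$$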

Unlike Fisher's distribution, Wallenius' distribution has no symmetry relating to the number of balls not taken.

The following recursion formula is useful for calculating probabilities:

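$$
\begin{aligned}
\operatorname{wnchypg}(x; n, m_1, m_2, \omega) ={}& \operatorname{wnchypg}(x-1; n-1, m_1, m_2, \omega)\,\frac{(m_1 - x + 1)\,\omega}{(m_1 - x + 1)\,\omega + m_2 + x - n} \\
&+ \operatorname{wnchypg}(x; n-1, m_1, m_2, \omega)\,\frac{m_2 + x - n + 1}{(m_1 - x)\,\omega + m_2 + x - n + 1}
\end{aligned}
$$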

Another recursion formula is also known:

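$$
\begin{aligned}
\operatorname{wnchypg}(x; n, m_1, m_2, \omega) ={}& \operatorname{wnchypg}(x-1; n-1, m_1 - 1, m_2, \omega)\,\frac{m_1 \omega}{m_1 \omega + m_2} \\
&+ \operatorname{wnchypg}(x; n-1, m_1, m_2 - 1, \omega)\,\frac{m_2}{m_1 \omega + m_2}
\end{aligned}
$$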
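This second formula conditions on the colour of the first ball drawn rather than the last one. A memoised sketch follows, assuming small to moderate n: the recursion depth equals n, so Python's default recursion limit caps it at a few hundred. The name `wnchypg_first_draw` is hypothetical.

```python
from functools import lru_cache
from math import comb


def wnchypg_first_draw(x: int, n: int, m1: int, m2: int, omega: float) -> float:
    """Wallenius' probability P(x1 = x) via the first-draw recursion.

    The first ball is red with probability m1*omega / (m1*omega + m2); the other
    n - 1 draws then come from an urn with one ball fewer of that colour.
    Memoisation ensures each state is evaluated only once.
    """
    if omega <= 0:
        raise ValueError("omega must be positive")

    @lru_cache(maxsize=None)
    def prob(x_: int, n_: int, r: int, w: int) -> float:
        # r and w are the red and white balls still in the urn.
        if x_ < 0 or x_ > r or n_ - x_ < 0 or n_ - x_ > w:
            return 0.0  # impossible state
        if n_ == 0:
            return 1.0
        total = omega * r + w
        red_first = omega * r / total * prob(x_ - 1, n_ - 1, r - 1, w)
        white_first = w / total * prob(x_, n_ - 1, r, w - 1)
        return red_first + white_first

    return prob(x, n, m1, m2)


if __name__ == "__main__":
    x, n, m1, m2 = 60, 100, 80, 60
    print(wnchypg_first_draw(x, n, m1, m2, 2.0))
    # omega = 1 should reproduce the central hypergeometric probability.
    print(wnchypg_first_draw(x, n, m1, m2, 1.0), comb(m1, x) * comb(m2, n - x) / comb(m1 + m2, n))
```

Because it recurses on (m1, m2) as well as (x, n), this version is convenient for small urns but uses more stack than the grid-based forward recursion.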

The probability is limited by

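$$f_1(x) \le \operatorname{wnchypg}(x; n, m_1, m_2, \omega) \le f_2(x) \quad \text{for } \omega < 1,$$
$$f_1(x) \ge \operatorname{wnchypg}(x; n, m_1, m_2, \omega) \ge f_2(x) \quad \text{for } \omega > 1, \text{ where}$$
$$f_1(x) = \binom{m_1}{x}\binom{m_2}{n-x}\frac{n!}{\left(m_1 + m_2/\omega\right)^{\underline{x}}\left(m_2 + \omega(m_1 - x)\right)^{\underline{n-x}}}$$
$$f_2(x) = \binom{m_1}{x}\binom{m_2}{n-x}\frac{n!}{\left(m_1 + (m_2 - n + x)/\omega\right)^{\underline{x}}\left(m_2 + \omega m_1\right)^{\underline{n-x}}}$$

where the underlined superscript indicates the falling factorial $a^{\underline{b}} = a(a-1)\cdots(a-b+1)$.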
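The bounds are cheap to evaluate. The sketch below transcribes f1 and f2 in Python, assuming x lies in the support and the inputs are small enough for ordinary floating point; the helper names `falling` and `wallenius_bounds` are illustrative. As a reading aid, f1 equals the binomial coefficient C(n, x) times the probability of the particular draw order in which all red balls come first, and f2 the same for all white balls first.

```python
from math import comb, factorial


def falling(a: float, b: int) -> float:
    """Falling factorial a (a - 1) ... (a - b + 1); equal to 1 when b == 0."""
    result = 1.0
    for k in range(b):
        result *= a - k
    return result


def wallenius_bounds(x: int, n: int, m1: int, m2: int, omega: float) -> tuple[float, float]:
    """Return (f1, f2) for a point x in the support.

    For omega < 1: f1 <= P(x1 = x) <= f2; for omega > 1 the inequalities reverse.
    """
    common = comb(m1, x) * comb(m2, n - x) * factorial(n)
    f1 = common / (falling(m1 + m2 / omega, x) * falling(m2 + omega * (m1 - x), n - x))
    f2 = common / (falling(m1 + (m2 - n + x) / omega, x) * falling(m2 + omega * m1, n - x))
    return f1, f2


if __name__ == "__main__":
    for omega in (0.5, 2.0):
        print(omega, wallenius_bounds(3, 6, 8, 6, omega))
```

For ω = 1 both bounds collapse to the central hypergeometric probability.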

Multivariate distribution

The distribution can be expanded to any number of colors c of balls in the urn. The multivariate distribution is used when there are more than two colors.

Multivariate Wallenius' Noncentral Hypergeometric Distribution
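Parameters: $c \in \mathbb{N}$; $\mathbf{m} = (m_1, \ldots, m_c) \in \mathbb{N}^{c}$; $N = \sum_{i=1}^{c} m_i$; $n \in [0, N)$; $\boldsymbol{\omega} = (\omega_1, \ldots, \omega_c) \in \mathbb{R}_{+}^{c}$
Support: $\mathrm{S} = \left\{ \mathbf{x} \in \mathbb{Z}_{0+}^{c} : \sum_{i=1}^{c} x_i = n \right\}$
PMF: $\left(\prod_{i=1}^{c} \binom{m_i}{x_i}\right)\int_0^1 \prod_{i=1}^{c}\left(1 - t^{\omega_i/D}\right)^{x_i}\,\mathrm{d}t$, where $D = \boldsymbol{\omega} \cdot (\mathbf{m} - \mathbf{x}) = \sum_{i=1}^{c} \omega_i (m_i - x_i)$
Mean: approximated by the solution $\mu_1, \ldots, \mu_c$ to $\left(1 - \frac{\mu_1}{m_1}\right)^{1/\omega_1} = \left(1 - \frac{\mu_2}{m_2}\right)^{1/\omega_2} = \cdots = \left(1 - \frac{\mu_c}{m_c}\right)^{1/\omega_c}$, subject to $\sum_{i=1}^{c} \mu_i = n$ and $0 \le \mu_i \le m_i$ for all $i \in [1, c]$
Variance: approximated by the variance of Fisher's noncentral hypergeometric distribution with the same mean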

The probability mass function can be calculated by various Taylor expansion methods or by numerical integration (Fog, 2008).

The probability that all balls have the same color, j, can be calculated as:

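$$\operatorname{mwnchypg}\left((0, \ldots, 0, x_j, 0, \ldots, 0); n, \mathbf{m}, \boldsymbol{\omega}\right) = \frac{m_j^{\underline{n}}}{\left(\frac{1}{\omega_j}\sum_{i=1}^{c} m_i \omega_i\right)^{\underline{n}}}$$

for $x_j = n \le m_j$, where the underlined superscript denotes the falling factorial.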
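The product form makes this probability a short loop. The sketch below evaluates it as a running product of ratios (m_j - k) / (S/ω_j - k), with S = Σ m_i ω_i, which avoids forming large intermediate falling factorials. Colours are 0-indexed, and `prob_all_same_colour` is a hypothetical name.

```python
from typing import Sequence


def prob_all_same_colour(j: int, n: int, m: Sequence[int], omega: Sequence[float]) -> float:
    """Probability that all n balls drawn have colour j (0-indexed).

    Evaluates m_j falling n divided by (S / omega_j) falling n, where
    S = sum_i m_i * omega_i, as a product of ratios.
    """
    if n > m[j]:
        return 0.0  # not enough balls of colour j
    scaled_total = sum(mi * wi for mi, wi in zip(m, omega)) / omega[j]
    p = 1.0
    for k in range(n):
        # Draw k + 1: m_j - k balls of colour j remain; total weight / omega_j is scaled_total - k.
        p *= (m[j] - k) / (scaled_total - k)
    return p


if __name__ == "__main__":
    # Equal weights: compare with the central hypergeometric value C(3,2)/C(5,2).
    print(prob_all_same_colour(0, 2, (3, 2), (1.0, 1.0)))
```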

A reasonably good approximation to the mean can be calculated using the equation given above. The equation can be solved by defining θ so that

$$\mu_i = m_i\left(1 - e^{\omega_i \theta}\right)$$

and solving

$$\sum_{i=1}^{c} \mu_i = n$$

for θ by Newton-Raphson iteration.
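A minimal sketch of the θ iteration, assuming every ω_i > 0 and 0 ≤ n < N. Since h(θ) = Σ μ_i(θ) - n is decreasing and concave in θ, Newton-Raphson started at θ = 0 (where h = -n) moves monotonically down to the root, so no safeguarding is needed. The function name `multivariate_mean_approx` is hypothetical.

```python
import math
from typing import List, Sequence


def multivariate_mean_approx(n: int, m: Sequence[int], omega: Sequence[float],
                             tol: float = 1e-12, max_iter: int = 200) -> List[float]:
    """Approximate means mu_i = m_i * (1 - exp(omega_i * theta)) with sum(mu_i) = n."""
    if any(w <= 0 for w in omega) or not 0 <= n < sum(m):
        raise ValueError("need all omega_i > 0 and 0 <= n < sum(m)")
    theta = 0.0
    for _ in range(max_iter):
        h = sum(mi * (1.0 - math.exp(wi * theta)) for mi, wi in zip(m, omega)) - n
        dh = -sum(mi * wi * math.exp(wi * theta) for mi, wi in zip(m, omega))
        step = h / dh  # Newton step; theta only ever decreases
        theta -= step
        if abs(step) <= tol * max(1.0, abs(theta)):
            break
    return [mi * (1.0 - math.exp(wi * theta)) for mi, wi in zip(m, omega)]


if __name__ == "__main__":
    print(multivariate_mean_approx(12, (10, 20, 30), (1.0, 2.0, 3.0)))
```

With two colours this reproduces the univariate mean equation given earlier, and with equal weights it returns the central means n·m_i/N.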

The equation for the mean is also useful for estimating the odds from experimentally obtained values for the mean.

No good way of calculating the variance is known. The best known method is to approximate the multivariate Wallenius distribution by a multivariate Fisher's noncentral hypergeometric distribution with the same mean, and insert the mean as calculated above in the approximate formula for the variance of the latter distribution.

Properties of the multivariate distribution

The order of the colors is arbitrary, so any colors can be swapped.

The weights can be arbitrarily scaled:

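$$\operatorname{mwnchypg}(\mathbf{x}; n, \mathbf{m}, \boldsymbol{\omega}) = \operatorname{mwnchypg}(\mathbf{x}; n, \mathbf{m}, r\boldsymbol{\omega})$$

for all $r \in \mathbb{R}_{+}$.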

Colors with zero number (mi = 0) or zero weight (ωi = 0) can be omitted from the equations.

Colors with the same weight can be joined:

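$$
\begin{aligned}
\operatorname{mwnchypg}\left(\mathbf{x}; n, \mathbf{m}, (\omega_1, \ldots, \omega_{c-1}, \omega_{c-1})\right) ={}& \operatorname{mwnchypg}\left((x_1, \ldots, x_{c-1} + x_c); n, (m_1, \ldots, m_{c-1} + m_c), (\omega_1, \ldots, \omega_{c-1})\right) \\
&\cdot \operatorname{hypg}(x_c; x_{c-1} + x_c, m_c, m_{c-1} + m_c)
\end{aligned}
$$

where $\operatorname{hypg}(x; n, m, N)$ is the (univariate, central) hypergeometric distribution probability.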
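The swapping, scaling and joining properties can be checked numerically against a direct implementation of the multivariate probability. The sketch below generalises the first-draw recursion to c colours with memoisation; its cost grows quickly with n and c, so it is only suitable for small urns. `mwnchypg` and `hypg` mirror the notation above but are hypothetical Python helpers, not a library API; n is implied by sum(x).

```python
from functools import lru_cache
from math import comb
from typing import Sequence, Tuple


def mwnchypg(x: Sequence[int], m: Sequence[int], omega: Sequence[float]) -> float:
    """Multivariate Wallenius probability of drawing exactly x, with n = sum(x).

    Conditions on the colour of the first ball: colour i is drawn with probability
    omega_i * m_i / sum_k omega_k * m_k, after which the remaining n - 1 draws must
    produce x - e_i from an urn holding m - e_i.
    """
    w = tuple(float(v) for v in omega)
    if any(v <= 0 for v in w):
        raise ValueError("all weights must be positive")

    @lru_cache(maxsize=None)
    def prob(xs: Tuple[int, ...], ms: Tuple[int, ...]) -> float:
        if any(xi < 0 or xi > mi for xi, mi in zip(xs, ms)):
            return 0.0  # impossible state
        if sum(xs) == 0:
            return 1.0
        total = sum(wi * mi for wi, mi in zip(w, ms))
        p = 0.0
        for i, xi in enumerate(xs):
            if xi > 0:
                xs_next = xs[:i] + (xi - 1,) + xs[i + 1:]
                ms_next = ms[:i] + (ms[i] - 1,) + ms[i + 1:]
                p += w[i] * ms[i] / total * prob(xs_next, ms_next)
        return p

    return prob(tuple(x), tuple(m))


def hypg(x: int, n: int, m: int, N: int) -> float:
    """Central hypergeometric probability hypg(x; n, m, N)."""
    return comb(m, x) * comb(N - m, n - x) / comb(N, n)


if __name__ == "__main__":
    # Joining colours 2 and 3, which share the weight 1.0:
    lhs = mwnchypg((1, 1, 2), (3, 2, 4), (2.0, 1.0, 1.0))
    rhs = mwnchypg((1, 3), (3, 6), (2.0, 1.0)) * hypg(2, 3, 4, 6)
    print(lhs, rhs)
```

The two printed values should agree up to floating-point rounding; multiplying all weights by a common positive factor should likewise leave `mwnchypg` unchanged.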

Complementary Wallenius' noncentral hypergeometric distribution

Figure: Probability mass function of the complementary Wallenius' noncentral hypergeometric distribution for different values of the odds ratio ω (m1 = 80, m2 = 60, n = 40, ω = 0.05 ... 10).

The balls that are not taken in the urn experiment have a distribution that is different from Wallenius' noncentral hypergeometric distribution, due to a lack of symmetry. The distribution of the balls not taken can be called the complementary Wallenius' noncentral hypergeometric distribution.

Probabilities in the complementary distribution are calculated from Wallenius' distribution by replacing n with N-n, xi with mi - xi, and ωi with 1/ωi.
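A sketch of the substitution above for two colours: the complementary probability at x is evaluated as Wallenius' probability at m1 - x, with N - n draws and odds ratio 1/ω (the count of the second colour is transformed correspondingly). It reuses a memoised first-draw recursion for the Wallenius part, which assumes N - n stays within Python's default recursion limit; both function names are hypothetical.

```python
from functools import lru_cache
from math import comb


def wnchypg(x: int, n: int, m1: int, m2: int, omega: float) -> float:
    """Wallenius' probability P(x1 = x) via the memoised first-draw recursion."""
    if omega <= 0:
        raise ValueError("omega must be positive")

    @lru_cache(maxsize=None)
    def prob(x_: int, n_: int, r: int, w: int) -> float:
        # r and w are the red and white balls still in the urn.
        if x_ < 0 or x_ > r or n_ - x_ < 0 or n_ - x_ > w:
            return 0.0
        if n_ == 0:
            return 1.0
        total = omega * r + w
        return (omega * r / total) * prob(x_ - 1, n_ - 1, r - 1, w) + (
            w / total
        ) * prob(x_, n_ - 1, r, w - 1)

    return prob(x, n, m1, m2)


def complementary_wnchypg(x: int, n: int, m1: int, m2: int, omega: float) -> float:
    """Complementary probability: Wallenius with n -> N - n, x -> m1 - x, omega -> 1/omega."""
    return wnchypg(m1 - x, m1 + m2 - n, m1, m2, 1.0 / omega)


if __name__ == "__main__":
    m1, m2, n = 80, 60, 40
    for x in (20, 25, 30):
        print(x, complementary_wnchypg(x, n, m1, m2, 2.0), wnchypg(x, n, m1, m2, 2.0))
    # With omega = 1 both reduce to the central hypergeometric probability.
    print(complementary_wnchypg(25, n, m1, m2, 1.0), comb(m1, 25) * comb(m2, n - 25) / comb(m1 + m2, n))
```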

Software available

  • WalleniusHypergeometricDistribution in Mathematica.
  • An implementation for the R programming language is available as the package named BiasedUrn. It includes univariate and multivariate probability mass functions, distribution functions, quantiles, random variable generating functions, mean and variance.
  • An implementation in C++ is available from www.agner.org.

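See also

  • Noncentral hypergeometric distributions
  • Fisher's noncentral hypergeometric distribution
  • Biased sample
  • Bias
  • Population genetics
  • Fisher's exact test
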
References

  • Chesson, J. (1976). "A non-central multivariate hypergeometric distribution arising from biased sampling with application to selective predation". Journal of Applied Probability. Vol. 13, no. 4. Applied Probability Trust. pp. 795–797. doi:10.2307/3212535. JSTOR 3212535.
  • Fog, A. (2007). "Random number theory".
  • Fog, A. (2008). "Calculation Methods for Wallenius' Noncentral Hypergeometric Distribution". Communications in Statistics - Simulation and Computation. 37 (2): 258–273. doi:10.1080/03610910701790269. S2CID 9040568.
  • Johnson, N. L.; Kemp, A. W.; Kotz, S. (2005). Univariate Discrete Distributions. Hoboken, New Jersey: Wiley and Sons.
  • Lyons, N. I. (1980). "Closed Expressions for Noncentral Hypergeometric Probabilities". Communications in Statistics - Simulation and Computation. Vol. 9, no. 3. pp. 313–314. doi:10.1080/03610918008812156.
  • Manly, B. F. J. (1974). "A Model for Certain Types of Selection Experiments". Biometrics. Vol. 30, no. 2. International Biometric Society. pp. 281–294. doi:10.2307/2529649. JSTOR 2529649.
  • Wallenius, K. T. (1963). Biased Sampling: The Non-central Hypergeometric Probability Distribution. Ph.D. thesis. Stanford University, Department of Statistics.

Retrieved from https://en.wikipedia.org/w/index.php?title=Wallenius%27_noncentral_hypergeometric_distribution&oldid=1199290146