
Geometric distribution

In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions:

  • The probability distribution of the number X of Bernoulli trials needed to get one success, supported on the set {1, 2, 3, ...};
  • The probability distribution of the number Y = X − 1 of failures before the first success, supported on the set {0, 1, 2, ...}.
Geometric

(Figures not reproduced: probability mass function and cumulative distribution function.)

  • Parameters: success probability p (real), 0 < p ≤ 1 (both versions)
  • Support: k trials, k ∈ {1, 2, 3, ...} (trials version); k failures, k ∈ {0, 1, 2, ...} (failures version)
  • PMF: (1 − p)^(k−1) p (trials); (1 − p)^k p (failures)
  • CDF: 1 − (1 − p)^⌊x⌋ for x ≥ 1, 0 for x < 1 (trials); 1 − (1 − p)^(⌊x⌋+1) for x ≥ 0, 0 for x < 0 (failures)
  • Mean: 1/p (trials); (1 − p)/p (failures)
  • Median: ⌈−1/log₂(1 − p)⌉ (trials); ⌈−1/log₂(1 − p)⌉ − 1 (failures); not unique if −1/log₂(1 − p) is an integer
  • Mode: 1 (trials); 0 (failures)
  • Variance: (1 − p)/p² (both versions)
  • Skewness: (2 − p)/√(1 − p) (both versions)
  • Excess kurtosis: 6 + p²/(1 − p) (both versions)
  • Entropy: (−(1 − p) log(1 − p) − p log p)/p (both versions)
  • MGF: p e^t / (1 − (1 − p) e^t) (trials); p / (1 − (1 − p) e^t) (failures); both for t < −ln(1 − p)
  • CF: p e^(it) / (1 − (1 − p) e^(it)) (trials); p / (1 − (1 − p) e^(it)) (failures)
  • PGF: p z / (1 − (1 − p) z) (trials); p / (1 − (1 − p) z) (failures)

Which of these is called the geometric distribution is a matter of convention and convenience.

These two different geometric distributions should not be confused with each other. Often, the name shifted geometric distribution is adopted for the former (the distribution of X); however, to avoid ambiguity, it is wise to indicate which is intended by mentioning the support explicitly.

The geometric distribution gives the probability that the first occurrence of success requires k independent trials, each with success probability p. If the probability of success on each trial is p, then the probability that the k-th trial is the first success is

Pr(X = k) = (1 − p)^(k−1) p

for k = 1, 2, 3, 4, ...

The above form of the geometric distribution is used for modeling the number of trials up to and including the first success. By contrast, the following form of the geometric distribution is used for modeling the number of failures until the first success:

Pr(Y = k) = Pr(X = k + 1) = (1 − p)^k p

for k = 0, 1, 2, 3, ...

In either case, the sequence of probabilities is a geometric sequence.

For example, suppose an ordinary die is thrown repeatedly until the first time a "1" appears. The probability distribution of the number of times it is thrown is supported on the infinite set {1, 2, 3, ...} and is a geometric distribution with p = 1/6.
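The die example can be checked with a short sketch (Python here, purely for illustration):

```python
# Pr(X = k) = (1 - p)**(k - 1) * p: first success ("1") appears on throw k.
p = 1 / 6

def pmf_trials(k, p):
    """Probability that the first success occurs on trial k (k = 1, 2, ...)."""
    return (1 - p) ** (k - 1) * p

# The probabilities form a geometric sequence with common ratio 1 - p = 5/6 ...
print(round(pmf_trials(1, p), 4))   # 0.1667
print(round(pmf_trials(2, p), 4))   # 0.1389
# ... and they sum to 1 over k = 1, 2, 3, ...
print(round(sum(pmf_trials(k, p) for k in range(1, 500)), 6))
```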

The geometric distribution is denoted by Geo(p), where 0 < p ≤ 1.[1]

Definitions

Consider a sequence of trials, where each trial has only two possible outcomes (designated failure and success). The probability of success is assumed to be the same for each trial. In such a sequence of trials, the geometric distribution is useful to model the number of failures before the first success since the experiment can have an indefinite number of trials until success, unlike the binomial distribution which has a set number of trials. The distribution gives the probability that there are zero failures before the first success, one failure before the first success, two failures before the first success, and so on.[2]

Assumptions: When is the geometric distribution an appropriate model?

The geometric distribution is an appropriate model if the following assumptions are true.[3]

  • The phenomenon being modelled is a sequence of independent trials.
  • There are only two possible outcomes for each trial, often designated success or failure.
  • The probability of success, p, is the same for every trial.

If these conditions are true, then the geometric random variable Y is the number of failures before the first success. The possible number of failures before the first success is 0, 1, 2, 3, and so on. In the graphs above, this formulation is shown on the right.

An alternative formulation is that the geometric random variable X is the total number of trials up to and including the first success, and the number of failures is X − 1. In the graphs above, this formulation is shown on the left.

Probability outcomes examples

The general formula to calculate the probability of k failures before the first success, where the probability of success is p and the probability of failure is q = 1 − p, is

Pr(Y = k) = q^k p

for k = 0, 1, 2, 3, ...

E1) A doctor is seeking an antidepressant for a newly diagnosed patient. Suppose that, of the available antidepressant drugs, the probability that any particular drug will be effective for a particular patient is p = 0.6. What is the probability that the first drug found to be effective for this patient is the first drug tried, the second drug tried, and so on? What is the expected number of drugs that will be tried to find one that is effective?

The probability that the first drug works. There are zero failures before the first success. Y = 0 failures. The probability Pr(zero failures before first success) is simply the probability that the first drug works:

Pr(Y = 0) = q^0 p = 0.4^0 × 0.6 = 1 × 0.6 = 0.6

The probability that the first drug fails, but the second drug works. There is one failure before the first success. Y = 1 failure. The probability for this sequence of events is Pr(first drug fails) × Pr(second drug succeeds), which is given by

Pr(Y = 1) = q^1 p = 0.4^1 × 0.6 = 0.4 × 0.6 = 0.24

The probability that the first drug fails, the second drug fails, but the third drug works. There are two failures before the first success. Y = 2 failures. The probability for this sequence of events is Pr(first drug fails) × Pr(second drug fails) × Pr(third drug succeeds):

Pr(Y = 2) = q^2 p = 0.4^2 × 0.6 = 0.096

E2) A newlywed couple plans to have children and will continue until the first girl. What is the probability that there are zero boys before the first girl, one boy before the first girl, two boys before the first girl, and so on?

The probability of having a girl (success) is p = 0.5 and the probability of having a boy (failure) is q = 1 − p = 0.5.

The probability of no boys before the first girl is

Pr(Y = 0) = q^0 p = 0.5^0 × 0.5 = 0.5

The probability of one boy before the first girl is

Pr(Y = 1) = q^1 p = 0.5^1 × 0.5 = 0.25

The probability of two boys before the first girl is

Pr(Y = 2) = q^2 p = 0.5^2 × 0.5 = 0.125

and so on.
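The computations in E1 and E2 follow directly from Pr(Y = k) = q^k p. A minimal sketch reproducing the numbers above (Python is used here for illustration; the article's own R example appears later):

```python
# Pr(Y = k) = q**k * p: probability of exactly k failures before the first success.
def pmf_failures(k, p):
    q = 1 - p
    return q ** k * p

# E1: drug trials with p = 0.6
print([round(pmf_failures(k, 0.6), 3) for k in range(3)])  # [0.6, 0.24, 0.096]

# E2: children until the first girl, p = 0.5
print([round(pmf_failures(k, 0.5), 3) for k in range(3)])  # [0.5, 0.25, 0.125]
```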

Properties

Moments and cumulants

The expected value for the number of independent trials to get the first success and the variance of a geometrically distributed random variable X are:

E(X) = 1/p,  var(X) = (1 − p)/p²

Similarly, the expected value and variance of the geometrically distributed random variable Y = X − 1 (see the definition of the distribution Pr(Y = k) above) are:

E(Y) = E(X) − 1 = (1 − p)/p,  var(Y) = (1 − p)/p²

Proof

Expected value of X

Consider the expected value E(X) of X as above, i.e. the average number of trials until a success. On the first trial, we either succeed with probability p, or we fail with probability 1 − p. If we fail, the mean number of remaining trials until a success is identical to the original mean, since all trials are independent. From this we get the formula:

E(X) = p · 1 + (1 − p)(1 + E(X))

which, solved for E(X), gives:

E(X) = 1/p

Expected value of Y

That the expected value of Y as above is (1 − p)/p can be seen from E(Y) = E(X − 1) = E(X) − 1 = 1/p − 1 = (1 − p)/p, which follows from the linearity of expectation, or it can be shown directly:

E(Y) = Σ_{k=0}^∞ (1 − p)^k p · k
     = p(1 − p) Σ_{k=0}^∞ k (1 − p)^(k−1)
     = p(1 − p) · [−d/dp Σ_{k=0}^∞ (1 − p)^k]
     = p(1 − p) · [−d/dp (1/p)]
     = p(1 − p)/p²
     = (1 − p)/p

The interchange of summation and differentiation is justified by the fact that convergent power series converge uniformly on compact subsets of the set of points where they converge.
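Both expectations are easy to check numerically; a small sketch, iterating the recursion E(X) = p + (1 − p)(1 + E(X)) as a fixed point and summing the series for E(Y) directly:

```python
p = 0.6

# Fixed-point iteration of E(X) = p*1 + (1 - p)*(1 + E(X)); converges to 1/p.
E_X = 0.0
for _ in range(200):
    E_X = p * 1 + (1 - p) * (1 + E_X)

# Direct summation of E(Y) = sum_k (1 - p)**k * p * k; converges to (1 - p)/p.
E_Y = sum((1 - p) ** k * p * k for k in range(500))

print(round(E_X, 6), round(E_Y, 6))  # 1.666667 0.666667
```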

Let μ = (1 − p)/p be the expected value of Y. Then the cumulants κ_n of the probability distribution of Y satisfy the recursion

κ_(n+1) = μ(μ + 1) dκ_n/dμ

Expected value examples

E3) A patient is waiting for a suitable matching kidney donor for a transplant. If the probability that a randomly selected donor is a suitable match is p = 0.1, what is the expected number of donors who will be tested before a matching donor is found?

With p = 0.1, the mean number of failures before the first success is E(Y) = (1 − p)/p =(1 − 0.1)/0.1 = 9.

For the alternative formulation, where X is the number of trials up to and including the first success, the expected value is E(X) = 1/p = 1/0.1 = 10.

For example E1 above, with p = 0.6, the mean number of failures before the first success is E(Y) = (1 − p)/p = (1 − 0.6)/0.6 ≈ 0.67.

Higher-order moments

The moments for the number of failures before the first success are given by

E(Yⁿ) = Σ_{k=0}^∞ (1 − p)^k p · kⁿ = p Li₋ₙ(1 − p),  for n ≠ 0,

where Li₋ₙ(·) is the polylogarithm function.
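As a sanity check, the series can be summed directly and compared with the closed forms for the first two moments (this avoids needing a polylogarithm implementation); a sketch:

```python
p = 0.4

def moment(n, p, terms=2000):
    """n-th moment of Y by truncated direct summation of (1-p)^k * p * k^n."""
    return sum((1 - p) ** k * p * k ** n for k in range(terms))

m1 = (1 - p) / p             # E[Y]
var = (1 - p) / p ** 2       # var(Y)
print(round(moment(1, p), 6), round(m1, 6))            # 1.5 1.5
print(round(moment(2, p), 6), round(var + m1 ** 2, 6)) # 6.0 6.0
```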

General properties

  • The probability generating functions of X and Y are, respectively,

    G_X(s) = s p / (1 − s(1 − p))  and  G_Y(s) = p / (1 − s(1 − p)),  for |s| < (1 − p)^(−1).

  • Like its continuous analogue (the exponential distribution), the geometric distribution is memoryless. That is, for every m and n,

    Pr(X > m + n | X > n) = Pr(X > m).

    The geometric distribution is the only memoryless discrete distribution. Stated with strict inequalities as above, the property holds for the version supported on {1, 2, 3, ...}; the version supported on {0, 1, 2, ...} satisfies the analogous identity with "≥" in place of ">".
  • Among all discrete probability distributions supported on {1, 2, 3, ... } with given expected value μ, the geometric distribution X with parameter p = 1/μ is the one with the largest entropy.[4]
  • The geometric distribution of the number Y of failures before the first success is infinitely divisible, i.e., for any positive integer n, there exist independent identically distributed random variables Y1, ..., Yn whose sum has the same distribution that Y has. These will not be geometrically distributed unless n = 1; they follow a negative binomial distribution.
  • The decimal digits of the geometrically distributed random variable Y are a sequence of independent (and not identically distributed) random variables.[citation needed] For example, the hundreds digit D has this probability distribution:

    Pr(D = d) = q^(100d) / (1 + q^100 + q^200 + ... + q^900),

where q = 1 − p, and similarly for the other digits, and, more generally, similarly for numeral systems with bases other than 10. When the base is 2, this shows that a geometrically distributed random variable can be written as a sum of independent random variables whose probability distributions are indecomposable.
  • Golomb coding is the optimal prefix code for the geometric discrete distribution.[5]
  • The sum of two independent Geo(p)-distributed random variables is not a geometric distribution.[1]
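The memoryless property is easy to verify numerically for the trials version X, whose tail probability is Pr(X > k) = (1 − p)^k; a minimal sketch:

```python
# Memorylessness: Pr(X > m + n | X > n) = Pr(X > m),
# using the tail formula Pr(X > k) = (1 - p)**k for the trials version.
p = 0.25

def tail(k, p):
    """Pr(X > k) for the trials-version geometric distribution."""
    return (1 - p) ** k

m, n = 3, 5
lhs = tail(m + n, p) / tail(n, p)  # conditional tail probability
rhs = tail(m, p)
print(abs(lhs - rhs) < 1e-12)  # True
```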

Related distributions

  • The geometric distribution of Y is a special case of the negative binomial distribution, with r = 1. More generally, if Y1, ..., Yr are independent geometrically distributed variables with parameter p, then the sum

    Z = Σ_{m=1}^r Y_m

follows a negative binomial distribution with parameters r and p.[6]
  • The geometric distribution is a special case of the discrete compound Poisson distribution.
  • If Y1, ..., Yr are independent geometrically distributed variables (with possibly different success parameters p_m), then their minimum

    W = min_{m ∈ {1, ..., r}} Y_m

is also geometrically distributed, with parameter p = 1 − Π_m (1 − p_m).[7]
  • Suppose 0 < r < 1, and for k = 1, 2, 3, ... the random variable X_k has a Poisson distribution with expected value r^k / k. Then

    Σ_{k=1}^∞ k X_k

has a geometric distribution taking values in the set {0, 1, 2, ...}, with expected value r/(1 − r).[citation needed]
  • The exponential distribution is the continuous analogue of the geometric distribution. If X is an exponentially distributed random variable with parameter λ, then

    Y = ⌊X⌋,

where ⌊·⌋ is the floor (or greatest integer) function, is a geometrically distributed random variable with parameter p = 1 − e^(−λ) (thus λ = −ln(1 − p)[8]) and taking values in the set {0, 1, 2, ...}. This can be used to generate geometrically distributed pseudorandom numbers by first generating exponentially distributed pseudorandom numbers from a uniform pseudorandom number generator: ⌊ln(U)/ln(1 − p)⌋ is geometrically distributed with parameter p, if U is uniformly distributed in [0, 1].
  • If p = 1/n and X is geometrically distributed with parameter p, then the distribution of X/n approaches an exponential distribution with expected value 1 as n → ∞, since

    Pr(X/n > a) = Pr(X > na) = (1 − p)^(na) = (1 − 1/n)^(na) = [(1 − 1/n)^n]^a → (e^(−1))^a = e^(−a)  as n → ∞.

More generally, if p = λ/n, where λ is a parameter, then as n → ∞ the distribution of X/n approaches an exponential distribution with rate λ:

    Pr(X > nx) = lim_{n→∞} (1 − λ/n)^(nx) = e^(−λx),

therefore the distribution function of X/n converges to 1 − e^(−λx), which is that of an exponential random variable.
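The exponential connection gives a standard way to sample the failures-version geometric distribution: take ⌊ln(U)/ln(1 − p)⌋ with U uniform on (0, 1). A minimal sketch comparing the sample mean with (1 − p)/p:

```python
import math
import random

random.seed(42)
p = 0.3

# floor(ln(U) / ln(1 - p)) is geometric on {0, 1, 2, ...} with parameter p.
samples = [math.floor(math.log(random.random()) / math.log(1 - p))
           for _ in range(100_000)]

mean = sum(samples) / len(samples)
print(round(mean, 2), round((1 - p) / p, 2))  # sample mean vs. (1 - p)/p
```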

Statistical inference

Parameter estimation

For both variants of the geometric distribution, the parameter p can be estimated by equating the expected value with the sample mean. This is the method of moments, which in this case happens to yield maximum likelihood estimates of p.[9][10]

Specifically, for the first variant let k = k1, ..., kn be a sample where ki ≥ 1 for i = 1, ..., n. Then p can be estimated as

    p̂ = ((1/n) Σ_{i=1}^n k_i)^(−1) = n / Σ_{i=1}^n k_i.

In Bayesian inference, the Beta distribution is the conjugate prior distribution for the parameter p. If this parameter is given a Beta(α, β) prior, then the posterior distribution is

    p ~ Beta(α + n, β + Σ_{i=1}^n (k_i − 1)).

The posterior mean E[p] approaches the maximum likelihood estimate p̂ as α and β approach zero.

In the alternative case, let k1, ..., kn be a sample where ki ≥ 0 for i = 1, ..., n. Then p can be estimated as

    p̂ = (1 + (1/n) Σ_{i=1}^n k_i)^(−1) = n / (Σ_{i=1}^n k_i + n).

The posterior distribution of p given a Beta(α, β) prior is[11]

    p ~ Beta(α + n, β + Σ_{i=1}^n k_i).

Again the posterior mean E[p] approaches the maximum likelihood estimate p̂ as α and β approach zero.
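A quick simulation sketch of the failures-version estimator p̂ = n / (Σ kᵢ + n), sampling via the transformation ⌊ln(U)/ln(1 − p)⌋ from the exponential connection:

```python
import math
import random

random.seed(1)
p_true = 0.4
n = 10_000

# Sample k_i = number of failures before the first success.
data = [math.floor(math.log(random.random()) / math.log(1 - p_true))
        for _ in range(n)]

# Method-of-moments / maximum likelihood estimate for the failures version.
p_hat = n / (sum(data) + n)
print(abs(p_hat - p_true) < 0.02)  # estimate is close to the true value
```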

For either estimate p̂ obtained by maximum likelihood, the bias is equal to

    b ≡ E[p̂_mle − p] = p(1 − p)/n,

which yields the bias-corrected maximum likelihood estimator

    p̂*_mle = p̂_mle − b̂.

Computational methods

Geometric distribution using R

The R function dgeom(k, prob) calculates the probability that there are k failures before the first success, where the argument "prob" is the probability of success on each trial.

For example,

dgeom(0,0.6) = 0.6

dgeom(1,0.6) = 0.24

R uses the convention that k is the number of failures, so that the number of trials up to and including the first success is k + 1.

The following R code creates a graph of the geometric distribution from Y = 0 to 10, with p = 0.6.

Y = 0:10
plot(Y, dgeom(Y, 0.6), type = "h", ylim = c(0, 1),
     main = "Geometric distribution for p=0.6",
     ylab = "Pr(Y=Y)", xlab = "Y=Number of failures before first success")

Geometric distribution using Excel

The geometric distribution, for the number of failures before the first success, is a special case of the negative binomial distribution, for the number of failures before s successes.

The Excel function NEGBINOMDIST(number_f, number_s, probability_s) calculates the probability of k = number_f failures before s = number_s successes where p = probability_s is the probability of success on each trial. For the geometric distribution, let number_s = 1 success.[12]

For example,

=NEGBINOMDIST(0, 1, 0.6) = 0.6
=NEGBINOMDIST(1, 1, 0.6) = 0.24

Like R, Excel uses the convention that k is the number of failures, so that the number of trials up to and including the first success is k + 1.
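The identity behind using NEGBINOMDIST with number_s = 1 can be checked directly: the negative binomial PMF C(k + s − 1, k) p^s (1 − p)^k reduces to the geometric PMF (1 − p)^k p when s = 1. A sketch:

```python
from math import comb

def negbinom_pmf(k, s, p):
    """Probability of k failures before the s-th success."""
    return comb(k + s - 1, k) * p ** s * (1 - p) ** k

p = 0.6
print(round(negbinom_pmf(0, 1, p), 3))  # 0.6, matching dgeom(0, 0.6)
print(round(negbinom_pmf(1, 1, p), 3))  # 0.24, matching dgeom(1, 0.6)
```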

See also

References

  1. ^ a b Dekking, Michel (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. London: Springer. pp. 48–50, 61–62, 152. ISBN 9781852338961. OCLC 262680588.
  2. ^ Holmes, Alexander; Illowsky, Barbara; Dean, Susan (29 November 2017). Introductory Business Statistics. Houston, Texas: OpenStax.
  3. ^ Raikar, Sanat Pai (31 August 2023). "Geometric distribution". Encyclopedia Britannica.
  4. ^ Park, Sung Y.; Bera, Anil K. (June 2009). "Maximum entropy autoregressive conditional heteroskedasticity model". Journal of Econometrics. 150 (2): 219–230. doi:10.1016/j.jeconom.2008.12.014.
  5. ^ Gallager, R.; van Voorhis, D. (March 1975). "Optimal source codes for geometrically distributed integer alphabets (Corresp.)". IEEE Transactions on Information Theory. 21 (2): 228–230. doi:10.1109/TIT.1975.1055357. ISSN 0018-9448.
  6. ^ Pitman, Jim. Probability (1993 edition). Springer Publishers. pp 372.
  7. ^ Ciardo, Gianfranco; Leemis, Lawrence M.; Nicol, David (1 June 1995). "On the minimum of independent geometrically distributed random variables". Statistics & Probability Letters. 23 (4): 313–326. doi:10.1016/0167-7152(94)00130-Z. hdl:2060/19940028569. S2CID 1505801.
  8. ^ "Wolfram-Alpha: Computational Knowledge Engine". www.wolframalpha.com.
  9. ^ Casella, George; Berger, Roger L. (2002). Statistical Inference (2nd ed.). pp. 312–315. ISBN 0-534-24312-6.
  10. ^ "MLE Examples: Exponential and Geometric Distributions Old Kiwi - Rhea". www.projectrhea.org. Retrieved 2019-11-17.
  11. ^ "3. Conjugate families of distributions" (PDF). Archived (PDF) from the original on 2010-04-08.
  12. ^ "3.5 Geometric Probability Distribution using Excel Spreadsheet". Statistics LibreTexts. 2021-07-24. Retrieved 2023-10-20.

External links

geometric, distribution, confused, with, hypergeometric, distribution, this, article, includes, list, general, references, lacks, sufficient, corresponding, inline, citations, please, help, improve, this, article, introducing, more, precise, citations, august,. Not to be confused with Hypergeometric distribution This article includes a list of general references but it lacks sufficient corresponding inline citations Please help to improve this article by introducing more precise citations August 2022 Learn how and when to remove this message In probability theory and statistics the geometric distribution is either one of two discrete probability distributions The probability distribution of the number X displaystyle X of Bernoulli trials needed to get one success supported on the set 1 2 3 displaystyle 1 2 3 ldots The probability distribution of the number Y X 1 displaystyle Y X 1 of failures before the first success supported on the set 0 1 2 displaystyle 0 1 2 ldots GeometricProbability mass functionCumulative distribution functionParameters0 lt p 1 displaystyle 0 lt p leq 1 success probability real 0 lt p 1 displaystyle 0 lt p leq 1 success probability real Supportk trials where k 1 2 3 displaystyle k in 1 2 3 dots k failures where k 0 1 2 3 displaystyle k in 0 1 2 3 dots PMF 1 p k 1 p displaystyle 1 p k 1 p 1 p k p displaystyle 1 p k p CDF1 1 p x displaystyle 1 1 p lfloor x rfloor for x 1 displaystyle x geq 1 0 displaystyle 0 for x lt 1 displaystyle x lt 1 1 1 p x 1 displaystyle 1 1 p lfloor x rfloor 1 for x 0 displaystyle x geq 0 0 displaystyle 0 for x lt 0 displaystyle x lt 0 Mean1 p displaystyle frac 1 p 1 p p displaystyle frac 1 p p Median 1 log 2 1 p displaystyle left lceil frac 1 log 2 1 p right rceil not unique if 1 log 2 1 p displaystyle 1 log 2 1 p is an integer 1 log 2 1 p 1 displaystyle left lceil frac 1 log 2 1 p right rceil 1 not unique if 1 log 2 1 p displaystyle 1 log 2 1 p is an integer Mode1 displaystyle 1 0 displaystyle 0 Variance1 p p 2 
displaystyle frac 1 p p 2 1 p p 2 displaystyle frac 1 p p 2 Skewness2 p 1 p displaystyle frac 2 p sqrt 1 p 2 p 1 p displaystyle frac 2 p sqrt 1 p Excess kurtosis6 p 2 1 p displaystyle 6 frac p 2 1 p 6 p 2 1 p displaystyle 6 frac p 2 1 p Entropy 1 p log 1 p p log p p displaystyle tfrac 1 p log 1 p p log p p 1 p log 1 p p log p p displaystyle tfrac 1 p log 1 p p log p p MGFp e t 1 1 p e t displaystyle frac pe t 1 1 p e t for t lt ln 1 p displaystyle t lt ln 1 p p 1 1 p e t displaystyle frac p 1 1 p e t for t lt ln 1 p displaystyle t lt ln 1 p CFp e i t 1 1 p e i t displaystyle frac pe it 1 1 p e it p 1 1 p e i t displaystyle frac p 1 1 p e it PGFp z 1 1 p z displaystyle frac pz 1 1 p z p 1 1 p z displaystyle frac p 1 1 p z Which of these is called the geometric distribution is a matter of convention and convenience These two different geometric distributions should not be confused with each other Often the name shifted geometric distribution is adopted for the former one distribution of X displaystyle X however to avoid ambiguity it is considered wise to indicate which is intended by mentioning the support explicitly The geometric distribution gives the probability that the first occurrence of success requires k displaystyle k independent trials each with success probability p displaystyle p If the probability of success on each trial is p displaystyle p then the probability that the k displaystyle k th trial is the first success is Pr X k 1 p k 1 p displaystyle Pr X k 1 p k 1 p for k 1 2 3 4 displaystyle k 1 2 3 4 dots The above form of the geometric distribution is used for modeling the number of trials up to and including the first success By contrast the following form of the geometric distribution is used for modeling the number of failures until the first success Pr Y k Pr X k 1 1 p k p displaystyle Pr Y k Pr X k 1 1 p k p for k 0 1 2 3 displaystyle k 0 1 2 3 dots In either case the sequence of probabilities is a geometric sequence For example suppose an 
ordinary die is thrown repeatedly until the first time a 1 appears The probability distribution of the number of times it is thrown is supported on the infinite set 1 2 3 displaystyle 1 2 3 dots and is a geometric distribution with p 1 6 displaystyle p 1 6 The geometric distribution is denoted by Geo p where 0 lt p 1 displaystyle 0 lt p leq 1 1 Contents 1 Definitions 1 1 Assumptions When is the geometric distribution an appropriate model 1 2 Probability outcomes examples 2 Properties 2 1 Moments and cumulants 2 2 Proof 2 2 1 Expected value of X 2 2 2 Expected value of Y 2 2 3 Expected value examples 2 2 4 Higher order moments 2 3 General properties 3 Related distributions 4 Statistical inference 4 1 Parameter estimation 5 Computational methods 5 1 Geometric distribution using R 5 2 Geometric distribution using Excel 6 See also 7 References 8 External linksDefinitions editConsider a sequence of trials where each trial has only two possible outcomes designated failure and success The probability of success is assumed to be the same for each trial In such a sequence of trials the geometric distribution is useful to model the number of failures before the first success since the experiment can have an indefinite number of trials until success unlike the binomial distribution which has a set number of trials The distribution gives the probability that there are zero failures before the first success one failure before the first success two failures before the first success and so on 2 Assumptions When is the geometric distribution an appropriate model edit The geometric distribution is an appropriate model if the following assumptions are true 3 The phenomenon being modelled is a sequence of independent trials There are only two possible outcomes for each trial often designated success or failure The probability of success p is the same for every trial If these conditions are true then the geometric random variable Y is the count of the number of failures before the 
first success The possible number of failures before the first success is 0 1 2 3 and so on In the graphs above this formulation is shown on the right An alternative formulation is that the geometric random variable X is the total number of trials up to and including the first success and the number of failures is X 1 In the graphs above this formulation is shown on the left Probability outcomes examples edit The general formula to calculate the probability of k failures before the first success where the probability of success is p and the probability of failure is q 1 p is Pr Y k q k p displaystyle Pr Y k q k p nbsp for k 0 1 2 3 E1 A doctor is seeking an antidepressant for a newly diagnosed patient Suppose that of the available anti depressant drugs the probability that any particular drug will be effective for a particular patient is p 0 6 What is the probability that the first drug found to be effective for this patient is the first drug tried the second drug tried and so on What is the expected number of drugs that will be tried to find one that is effective The probability that the first drug works There are zero failures before the first success Y 0 failures The probability Pr zero failures before first success is simply the probability that the first drug works Pr Y 0 q 0 p 0 4 0 0 6 1 0 6 0 6 displaystyle Pr Y 0 q 0 p 0 4 0 times 0 6 1 times 0 6 0 6 nbsp The probability that the first drug fails but the second drug works There is one failure before the first success Y 1 failure The probability for this sequence of events is Pr first drug fails displaystyle times nbsp p second drug succeeds which is given by Pr Y 1 q 1 p 0 4 1 0 6 0 4 0 6 0 24 displaystyle Pr Y 1 q 1 p 0 4 1 times 0 6 0 4 times 0 6 0 24 nbsp The probability that the first drug fails the second drug fails but the third drug works There are two failures before the first success Y 2 failures The probability for this sequence of events is Pr first drug fails displaystyle times nbsp p second 
drug fails displaystyle times nbsp Pr third drug is success Pr Y 2 q 2 p 0 4 2 0 6 0 096 displaystyle Pr Y 2 q 2 p 0 4 2 times 0 6 0 096 nbsp E2 A newlywed couple plans to have children and will continue until the first girl What is the probability that there are zero boys before the first girl one boy before the first girl two boys before the first girl and so on The probability of having a girl success is p 0 5 and the probability of having a boy failure is q 1 p 0 5 The probability of no boys before the first girl is Pr Y 0 q 0 p 0 5 0 0 5 1 0 5 0 5 displaystyle Pr Y 0 q 0 p 0 5 0 times 0 5 1 times 0 5 0 5 nbsp The probability of one boy before the first girl is Pr Y 1 q 1 p 0 5 1 0 5 0 5 0 5 0 25 displaystyle Pr Y 1 q 1 p 0 5 1 times 0 5 0 5 times 0 5 0 25 nbsp The probability of two boys before the first girl is Pr Y 2 q 2 p 0 5 2 0 5 0 125 displaystyle Pr Y 2 q 2 p 0 5 2 times 0 5 0 125 nbsp and so on Properties editMoments and cumulants edit The expected value for the number of independent trials to get the first success and the variance of a geometrically distributed random variable X is E X 1 p var X 1 p p 2 displaystyle operatorname E X frac 1 p qquad operatorname var X frac 1 p p 2 nbsp Similarly the expected value and variance of the geometrically distributed random variable Y X 1 See definition of distribution Pr Y k displaystyle Pr Y k nbsp is E Y E X 1 E X 1 1 p p var Y 1 p p 2 displaystyle operatorname E Y operatorname E X 1 operatorname E X 1 frac 1 p p qquad operatorname var Y frac 1 p p 2 nbsp Proof edit Expected value of X edit Consider the expected value E X displaystyle mathrm E X nbsp of X as above i e the average number of trials until a success On the first trial we either succeed with probability p displaystyle p nbsp or we fail with probability 1 p displaystyle 1 p nbsp If we fail the remaining mean number of trials until a success is identical to the original mean This follows from the fact that all trials are independent From this we 
get the formula E X p 1 1 p 1 E X displaystyle mathrm E X p cdot 1 1 p cdot 1 mathrm E X nbsp which if solved for E X displaystyle mathrm E X nbsp gives E X 1 p displaystyle mathrm E X frac 1 p nbsp Expected value of Y edit That the expected value of Y as above is 1 p p can be trivially seen from E Y E X 1 E X 1 1 p 1 1 p p displaystyle mathrm E Y mathrm E X 1 mathrm E X 1 frac 1 p 1 frac 1 p p nbsp which follows from the linearity of expectation or can be shown in the following way E Y k 0 1 p k p k p k 0 1 p k k p 1 p k 0 1 p k 1 k p 1 p d d p k 0 1 p k p 1 p d d p 1 p 1 p p displaystyle begin aligned mathrm E Y amp sum k 0 infty 1 p k p cdot k amp p sum k 0 infty 1 p k k amp p 1 p sum k 0 infty 1 p k 1 cdot k amp p 1 p left frac d dp left sum k 0 infty 1 p k right right amp p 1 p frac d dp left frac 1 p right frac 1 p p end aligned nbsp The interchange of summation and differentiation is justified by the fact that convergent power series converge uniformly on compact subsets of the set of points where they converge Let m 1 p p be the expected value of Y Then the cumulants k n displaystyle kappa n nbsp of the probability distribution of Y satisfy the recursion k n 1 m m 1 d k n d m displaystyle kappa n 1 mu mu 1 frac d kappa n d mu nbsp Expected value examples edit E3 A patient is waiting for a suitable matching kidney donor for a transplant If the probability that a randomly selected donor is a suitable match is p 0 1 what is the expected number of donors who will be tested before a matching donor is found With p 0 1 the mean number of failures before the first success is E Y 1 p p 1 0 1 0 1 9 For the alternative formulation where X is the number of trials up to and including the first success the expected value is E X 1 p 1 0 1 10 For example 1 above with p 0 6 the mean number of failures before the first success is E Y 1 p p 1 0 6 0 6 0 67 Higher order moments edit The moments for the number of failures before the first success are given by E Y n k 0 1 p k p k 
n p Li n 1 p for n 0 displaystyle begin aligned mathrm E Y n amp sum k 0 infty 1 p k p cdot k n amp p operatorname Li n 1 p amp text for n neq 0 end aligned nbsp where Li n 1 p displaystyle operatorname Li n 1 p nbsp is the polylogarithm function General properties edit The probability generating functions of X and Y are respectively G X s s p 1 s 1 p G Y s p 1 s 1 p s lt 1 p 1 displaystyle begin aligned G X s amp frac s p 1 s 1 p 10pt G Y s amp frac p 1 s 1 p quad s lt 1 p 1 end aligned nbsp dd Like its continuous analogue the exponential distribution the geometric distribution is memoryless That is the following holds for every m and n Pr X gt m n X gt n Pr X gt m displaystyle Pr X gt m n X gt n Pr X gt m nbsp dd The geometric distribution supported on 0 1 2 3 is the only memoryless discrete distribution Note that the geometric distribution supported on 1 2 is not memoryless Among all discrete probability distributions supported on 1 2 3 with given expected value m the geometric distribution X with parameter p 1 m is the one with the largest entropy 4 The geometric distribution of the number Y of failures before the first success is infinitely divisible i e for any positive integer n there exist independent identically distributed random variables Y1 Yn whose sum has the same distribution that Y has These will not be geometrically distributed unless n 1 they follow a negative binomial distribution The decimal digits of the geometrically distributed random variable Y are a sequence of independent and not identically distributed random variables citation needed For example the hundreds digit D has this probability distribution Pr D d q 100 d 1 q 100 q 200 q 900 displaystyle Pr D d q 100d over 1 q 100 q 200 cdots q 900 nbsp dd where q 1 p and similarly for the other digits and more generally similarly for numeral systems with other bases than 10 When the base is 2 this shows that a geometrically distributed random variable can be written as a sum of independent 
random variables whose probability distributions are indecomposable Golomb coding is the optimal prefix code clarification needed for the geometric discrete distribution 5 The sum of two independent Geo p distributed random variables is not a geometric distribution 1 Related distributions editThe geometric distribution Y is a special case of the negative binomial distribution with r 1 More generally if Y1 Yr are independent geometrically distributed variables with parameter p then the sum Z m 1 r Y m displaystyle Z sum m 1 r Y m nbsp dd follows a negative binomial distribution with parameters r and p 6 The geometric distribution is a special case of discrete compound Poisson distribution If Y1 Yr are independent geometrically distributed variables with possibly different success parameters pm then their minimum W min m 1 r Y m displaystyle W min m in 1 ldots r Y m nbsp dd is also geometrically distributed with parameter p 1 m 1 p m displaystyle p 1 prod m 1 p m nbsp 7 Suppose 0 lt r lt 1 and for k 1 2 3 the random variable Xk has a Poisson distribution with expected value rk k Then k 1 k X k displaystyle sum k 1 infty k X k nbsp dd has a geometric distribution taking values in the set 0 1 2 with expected value r 1 r citation needed The exponential distribution is the continuous analogue of the geometric distribution If X is an exponentially distributed random variable with parameter l then Y X displaystyle Y lfloor X rfloor nbsp dd where displaystyle lfloor quad rfloor nbsp is the floor or greatest integer function is a geometrically distributed random variable with parameter p 1 e l thus l ln 1 p 8 and taking values in the set 0 1 2 This can be used to generate geometrically distributed pseudorandom numbers by first generating exponentially distributed pseudorandom numbers from a uniform pseudorandom number generator then ln U ln 1 p displaystyle lfloor ln U ln 1 p rfloor nbsp is geometrically distributed with parameter p displaystyle p nbsp if U displaystyle U 
nbsp is uniformly distributed in 0 1 If p 1 n and X is geometrically distributed with parameter p then the distribution of X n approaches an exponential distribution with expected value 1 as n since Pr X n gt a Pr X gt n a 1 p n a 1 1 n n a 1 1 n n a e 1 a e a as n displaystyle begin aligned Pr X n gt a Pr X gt na amp 1 p na left 1 frac 1 n right na left left 1 frac 1 n right n right a amp to e 1 a e a text as n to infty end aligned nbsp dd More generally if p l n where l is a parameter then as n the distribution of X n approaches an exponential distribution with rate l Pr X gt n x lim n 1 l n n x e l x displaystyle Pr X gt nx lim n to infty 1 lambda n nx e lambda x nbsp therefore the distribution function of X n converges to 1 e l x displaystyle 1 e lambda x nbsp which is that of an exponential random variable Statistical inference editParameter estimation edit For both variants of the geometric distribution the parameter p can be estimated by equating the expected value with the sample mean This is the method of moments which in this case happens to yield maximum likelihood estimates of p 9 10 Specifically for the first variant let k k1 kn be a sample where ki 1 for i 1 n Then p can be estimated as p 1 n i 1 n k i 1 n i 1 n k i displaystyle widehat p left frac 1 n sum i 1 n k i right 1 frac n sum i 1 n k i nbsp In Bayesian inference the Beta distribution is the conjugate prior distribution for the parameter p If this parameter is given a Beta a b prior then the posterior distribution is p B e t a a n b i 1 n k i 1 displaystyle p sim mathrm Beta left alpha n beta sum i 1 n k i 1 right nbsp The posterior mean E p approaches the maximum likelihood estimate p displaystyle widehat p nbsp as a and b approach zero In the alternative case let k1 kn be a sample where ki 0 for i 1 n Then p can be estimated as p 1 1 n i 1 n k i 1 n i 1 n k i n displaystyle widehat p left 1 frac 1 n sum i 1 n k i right 1 frac n sum i 1 n k i n nbsp The posterior distribution of p given a 
For either estimate of $\widehat{p}$ using maximum likelihood, the bias is equal to

$$b \equiv \operatorname{E}\left[\,\widehat{p}_{\mathrm{mle}} - p\,\right] = \frac{p\,(1-p)}{n},$$

which yields the bias-corrected maximum likelihood estimator

$$\widehat{p}^{\,*}_{\mathrm{mle}} = \widehat{p}_{\mathrm{mle}} - \widehat{b}.$$

Computational methods

Geometric distribution using R

The R function dgeom(k, prob) calculates the probability that there are k failures before the first success, where the argument prob is the probability of success on each trial. For example,

```r
dgeom(0, 0.6)   # = 0.6
dgeom(1, 0.6)   # = 0.24
```

R uses the convention that k is the number of failures, so that the number of trials up to and including the first success is k + 1.

The following R code creates a graph of the geometric distribution from Y = 0 to 10, with p = 0.6:

```r
Y <- 0:10
plot(Y, dgeom(Y, 0.6), type = "h", ylim = c(0, 1),
     main = "Geometric distribution for p = 0.6",
     ylab = "Pr(Y = y)",
     xlab = "Y = Number of failures before first success")
```
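The values that dgeom returns follow directly from the probability mass function $\Pr(Y = k) = (1-p)^k\, p$, so they can be cross-checked without R. A minimal Python sketch (the function name `dgeom` is borrowed from R for the example):

```python
def dgeom(k, p):
    """Pr(Y = k) for the geometric distribution counting failures
    before the first success: (1 - p)**k * p."""
    return (1 - p) ** k * p

print(dgeom(0, 0.6))  # ≈ 0.6,  matching dgeom(0, 0.6) in R
print(dgeom(1, 0.6))  # ≈ 0.24, matching dgeom(1, 0.6) in R
```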
Geometric distribution using Excel

The geometric distribution, for the number of failures before the first success, is a special case of the negative binomial distribution, for the number of failures before s successes. The Excel function NEGBINOMDIST(number_f, number_s, probability_s) calculates the probability of k = number_f failures before s = number_s successes, where p = probability_s is the probability of success on each trial. For the geometric distribution, let number_s = 1 success.[12]

For example, =NEGBINOMDIST(0, 1, 0.6) = 0.6 and =NEGBINOMDIST(1, 1, 0.6) = 0.24.

Like R, Excel uses the convention that k is the number of failures, so that the number of trials up to and including the first success is k + 1.

See also

- Hypergeometric distribution
- Coupon collector's problem
- Compound Poisson distribution
- Negative binomial distribution

References

1. Dekking, Michel (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. London: Springer. pp. 48–50, 61–62, 152. ISBN 9781852338961. OCLC 262680588.
2. Holmes, Alexander; Illowsky, Barbara; Dean, Susan (29 November 2017). Introductory Business Statistics. Houston, Texas: OpenStax.
3. Raikar, Sanat Pai (31 August 2023). "Geometric distribution". Encyclopedia Britannica.
4. Park, Sung Y.; Bera, Anil K. (June 2009). "Maximum entropy autoregressive conditional heteroskedasticity model". Journal of Econometrics. 150 (2): 219–230. doi:10.1016/j.jeconom.2008.12.014.
5. Gallager, R.; van Voorhis, D. (March 1975). "Optimal source codes for geometrically distributed integer alphabets". IEEE Transactions on Information Theory. 21 (2): 228–230. doi:10.1109/TIT.1975.1055357. ISSN 0018-9448.
6. Pitman, Jim (1993). Probability. Springer. p. 372.
7. Ciardo, Gianfranco; Leemis, Lawrence M.; Nicol, David (1 June 1995). "On the minimum of independent geometrically distributed random variables". Statistics & Probability Letters. 23 (4): 313–326. doi:10.1016/0167-7152(94)00130-Z. hdl:2060/19940028569. S2CID 1505801.
8. "Wolfram Alpha: Computational Knowledge Engine". www.wolframalpha.com.
9. Casella, George; Berger, Roger L. (2002). Statistical Inference (2nd ed.). pp. 312–315. ISBN 0-534-24312-6.
10. "MLE Examples: Exponential and Geometric Distributions — Old Kiwi". www.projectrhea.org. Retrieved 2019-11-17.
11. "3. Conjugate families of distributions" (PDF). Archived (PDF) from the original on 2010-04-08.
12. "3.5: Geometric Probability Distribution using Excel Spreadsheet". Statistics LibreTexts. 2021-07-24. Retrieved 2023-10-20.

External links

- Geometric distribution on MathWorld