Wikipedia

Beta distribution

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] in terms of two positive parameters, denoted by alpha (α) and beta (β), that appear as exponents of the variable and its complement to 1, respectively, and control the shape of the distribution.

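Beta
Probability density function
Cumulative distribution function
Notation Beta(α, β)
Parameters α > 0 shape (real)
β > 0 shape (real)
Support $x \in [0, 1]$ or $x \in (0, 1)$
PDF $\dfrac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}$
where $\mathrm{B}(\alpha,\beta) = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ and $\Gamma$ is the Gamma function.
CDF $I_x(\alpha,\beta)$ (the regularized incomplete beta function)
Mean $\operatorname{E}[X] = \dfrac{\alpha}{\alpha+\beta}$
$\operatorname{E}[\ln X] = \psi(\alpha) - \psi(\alpha+\beta)$
$\operatorname{E}[X \ln X] = \dfrac{\alpha}{\alpha+\beta}\left[\psi(\alpha+1) - \psi(\alpha+\beta+1)\right]$
(see section: Geometric mean)
where $\psi$ is the digamma function
Median $I^{[-1]}_{1/2}(\alpha,\beta)$ in general; $\approx \dfrac{\alpha - \tfrac{1}{3}}{\alpha+\beta-\tfrac{2}{3}}$ for α, β > 1
Mode $\dfrac{\alpha-1}{\alpha+\beta-2}$ for α, β > 1
any value in (0, 1) for α, β = 1
{0, 1} (bimodal) for α, β < 1
0 for α ≤ 1, β > 1
1 for α > 1, β ≤ 1
Variance $\operatorname{var}[X] = \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
$\operatorname{var}[\ln X] = \psi_1(\alpha) - \psi_1(\alpha+\beta)$
(see trigamma function and see section: Geometric variance)
Skewness $\dfrac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}$
Ex. kurtosis $\dfrac{6\left[(\alpha-\beta)^2(\alpha+\beta+1) - \alpha\beta(\alpha+\beta+2)\right]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)}$
Entropy $\ln \mathrm{B}(\alpha,\beta) - (\alpha-1)\psi(\alpha) - (\beta-1)\psi(\beta) + (\alpha+\beta-2)\psi(\alpha+\beta)$
MGF $1 + \sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\dfrac{\alpha+r}{\alpha+\beta+r}\right)\dfrac{t^k}{k!}$
CF ${}_1F_1(\alpha;\ \alpha+\beta;\ i\,t)$ (see Confluent hypergeometric function)
Fisher information $\begin{bmatrix}\operatorname{var}[\ln X] & \operatorname{cov}[\ln X, \ln(1-X)] \\ \operatorname{cov}[\ln X, \ln(1-X)] & \operatorname{var}[\ln(1-X)]\end{bmatrix}$
see section: Fisher information matrix
Method of Moments $\alpha = \left(\dfrac{E[X](1-E[X])}{V[X]} - 1\right)E[X]$
$\beta = \left(\dfrac{E[X](1-E[X])}{V[X]} - 1\right)(1-E[X])$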

The beta distribution has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines. The beta distribution is a suitable model for the random behavior of percentages and proportions.

In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial and geometric distributions.

The formulation of the beta distribution discussed here is also known as the beta distribution of the first kind, whereas beta distribution of the second kind is an alternative name for the beta prime distribution. The generalization to multiple variables is called a Dirichlet distribution.

Definitions

Probability density function

 
An animation of the Beta distribution for different values of its parameters.

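The probability density function (PDF) of the beta distribution, for 0 ≤ x ≤ 1 and shape parameters α, β > 0, is a power function of the variable x and of its reflection (1 − x):

$$f(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1} = \frac{1}{\mathrm{B}(\alpha,\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}$$

where Γ(z) is the gamma function. The beta function, B(α, β) = Γ(α)Γ(β)/Γ(α + β), is a normalization constant that ensures the total probability is 1. In the above equations x is a realization (an observed value that actually occurred) of a random variable X.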
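As a quick numerical illustration of the normalization, the following minimal Python sketch evaluates the density in log space with the standard library's `math.lgamma` and checks that it integrates to roughly 1. The function names and the choice of Beta(2, 5) are illustrative assumptions, and the midpoint rule used here is only suitable when both shape parameters are at least 1 (no singularities at the ends).

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, using the open-interval convention 0 < x < 1."""
    if not (0.0 < x < 1.0):
        return 0.0
    # ln(1 / B(a, b)) = ln Gamma(a + b) - ln Gamma(a) - ln Gamma(b)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))

def midpoint_integral(f, n=20000):
    """Midpoint rule on (0, 1); it never evaluates f at the endpoints."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

total = midpoint_integral(lambda x: beta_pdf(x, 2.0, 5.0))
print(total)  # should be very close to 1
```

Working with logarithms avoids overflow in the gamma functions when the shape parameters are large.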

This definition includes both ends x = 0 and x = 1, which is consistent with definitions for other continuous distributions supported on a bounded interval which are special cases of the beta distribution, for example the arcsine distribution, and consistent with several authors, like N. L. Johnson and S. Kotz.[1][2][3][4] However, the inclusion of x = 0 and x = 1 does not work for α, β < 1; accordingly, several other authors, including W. Feller,[5][6][7] choose to exclude the ends x = 0 and x = 1, (so that the two ends are not actually part of the domain of the density function) and consider instead 0 < x < 1.

Several authors, including N. L. Johnson and S. Kotz,[1] use the symbols p and q (instead of α and β) for the shape parameters of the beta distribution, reminiscent of the symbols traditionally used for the parameters of the Bernoulli distribution, because the beta distribution approaches the Bernoulli distribution in the limit when both shape parameters α and β approach the value of zero.

In the following, a random variable X beta-distributed with parameters α and β will be denoted by:[8][9]

$$X \sim \operatorname{Beta}(\alpha, \beta)$$

Other notations for beta-distributed random variables used in the statistical literature are $X \sim \mathcal{B}e(\alpha, \beta)$[10] and $X \sim \beta_{\alpha,\beta}$.[5]

Cumulative distribution function

 
CDF for symmetric beta distribution vs. x and α = β
 
CDF for skewed beta distribution vs. x and β = 5α

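The cumulative distribution function is

$$F(x;\alpha,\beta) = \frac{\mathrm{B}(x;\alpha,\beta)}{\mathrm{B}(\alpha,\beta)} = I_x(\alpha,\beta)$$

where B(x; α, β) is the incomplete beta function and I_x(α, β) is the regularized incomplete beta function.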
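The regularized incomplete beta function is not available in Python's standard library, so the sketch below approximates the CDF by integrating the density numerically. It is an illustration under stated assumptions rather than a production routine: the function names are hypothetical, the midpoint rule is only adequate for α, β ≥ 1, and the comparison value 4x³ − 3x⁴ is the closed form obtained by integrating the Beta(3, 2) density 12u²(1 − u) by hand.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))

def beta_cdf(x, a, b, n=2000):
    """Approximate I_x(a, b) by the midpoint rule on (0, x)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / n
    return h * sum(beta_pdf((i + 0.5) * h, a, b) for i in range(n))

x = 0.3
numeric = beta_cdf(x, 3.0, 2.0)
closed_form = 4 * x**3 - 3 * x**4  # integral of 12 u^2 (1 - u) from 0 to x
print(numeric, closed_form)
```

For real work, a library implementation of the regularized incomplete beta function is the more accurate and faster choice.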

Alternative parameterizations

Two parameters

Mean and sample size

The beta distribution may also be reparameterized in terms of its mean μ (0 < μ < 1) and the sum of the two shape parameters ν = α + β > 0 ([9] p. 83). The sum ν = α + β is often referred to as the "sample size" of a beta distribution, but this interpretation needs care. Denote by αPosterior and βPosterior the shape parameters of the posterior beta distribution that results from applying Bayes' theorem to a binomial likelihood function and a prior probability. Reading the sum of the two shape parameters as the sample size, ν = αPosterior + βPosterior, is only correct for the Haldane prior probability Beta(0,0). For the Bayes (uniform) prior Beta(1,1), the correct interpretation is sample size = αPosterior + βPosterior − 2, or ν = (sample size) + 2. For sample sizes much larger than 2, the difference between these two priors becomes negligible. (See section Bayesian inference for further details.) Strictly speaking, then, ν is the "sample size" of a binomial likelihood function only when a Haldane Beta(0,0) prior is used in Bayes' theorem.

This parametrization may be useful in Bayesian parameter estimation. For example, one may administer a test to a number of individuals. If it is assumed that each person's score (0 ≤ θ ≤ 1) is drawn from a population-level Beta distribution, then an important statistic is the mean of this population-level distribution. The mean and sample size parameters are related to the shape parameters α and β via[9]

α = μν, β = (1 − μ)ν

Under this parametrization, one may place an uninformative prior probability over the mean, and a vague prior probability (such as an exponential or gamma distribution) over the positive reals for the sample size, if they are independent, and prior data and/or beliefs justify it.
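The conversion between (μ, ν) and (α, β), and the point about the prior-dependent meaning of "sample size", can be sketched in a few lines of Python. The function names and the counts (7 successes, 3 failures) are made up for illustration, and `posterior_shape` assumes the standard conjugate update for a binomial likelihood, in which the observed counts are added to the prior shape parameters.

```python
def shape_from_mean_size(mu, nu):
    """Map (mean mu, sample size nu) to the shape parameters (alpha, beta)."""
    if not (0.0 < mu < 1.0) or nu <= 0.0:
        raise ValueError("need 0 < mu < 1 and nu > 0")
    return mu * nu, (1.0 - mu) * nu

def mean_size_from_shape(alpha, beta):
    """Inverse map: (alpha, beta) to (mu, nu)."""
    nu = alpha + beta
    return alpha / nu, nu

def posterior_shape(prior_alpha, prior_beta, successes, failures):
    """Conjugate update for a binomial likelihood (assumed standard result)."""
    return prior_alpha + successes, prior_beta + failures

s, f = 7, 3  # hypothetical data: 10 trials in total
haldane = posterior_shape(0.0, 0.0, s, f)  # Haldane prior Beta(0, 0)
uniform = posterior_shape(1.0, 1.0, s, f)  # Bayes (uniform) prior Beta(1, 1)
print(sum(haldane), sum(uniform))  # nu equals n under Haldane, n + 2 under uniform
```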

Mode and concentration

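Concave beta distributions, which have α, β > 1, can be parametrized in terms of mode and "concentration". The mode, ω = (α − 1)/(α + β − 2), and concentration, κ = α + β, can be used to define the usual shape parameters as follows:[11]

$$\alpha = \omega(\kappa - 2) + 1, \qquad \beta = (1-\omega)(\kappa - 2) + 1$$

For the mode, 0 < ω < 1, to be well-defined, we need α, β > 1, or equivalently κ > 2. If instead we define the concentration as c = α + β − 2, the condition simplifies to c > 0 and the beta density at α = 1 + cω and β = 1 + c(1 − ω) can be written as:

$$f(x;\omega,c) = \frac{x^{c\omega}(1-x)^{c(1-\omega)}}{\mathrm{B}\bigl(1+c\omega,\ 1+c(1-\omega)\bigr)}$$

where c directly scales the sufficient statistics, log(x) and log(1 − x). Note also that in the limit c → 0 the distribution becomes flat.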
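A minimal Python sketch of this reparametrization follows. It assumes the usual relations ω = (α − 1)/(α + β − 2) and κ = α + β with κ > 2; the function names and the example values are illustrative only.

```python
def shape_from_mode_concentration(omega, kappa):
    """Map (mode omega, concentration kappa) to (alpha, beta); needs kappa > 2."""
    if not (0.0 < omega < 1.0) or kappa <= 2.0:
        raise ValueError("need 0 < omega < 1 and kappa > 2")
    alpha = omega * (kappa - 2.0) + 1.0
    beta = (1.0 - omega) * (kappa - 2.0) + 1.0
    return alpha, beta

def mode_concentration_from_shape(alpha, beta):
    """Inverse map, defined only for the concave case alpha, beta > 1."""
    if alpha <= 1.0 or beta <= 1.0:
        raise ValueError("need alpha > 1 and beta > 1")
    return (alpha - 1.0) / (alpha + beta - 2.0), alpha + beta

alpha, beta = shape_from_mode_concentration(0.25, 10.0)
print(alpha, beta)                                # 3.0 7.0
print(mode_concentration_from_shape(alpha, beta))  # (0.25, 10.0)
```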

Mean and variance

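Solving the coupled equations for the mean and the variance of the beta distribution (see the sections Mean and Variance) for the original parameters, one can express α and β in terms of the mean (μ) and the variance (var):

$$\nu = \alpha + \beta = \frac{\mu(1-\mu)}{\mathrm{var}} - 1, \text{ where } \nu > 0, \text{ therefore } \mathrm{var} < \mu(1-\mu)$$
$$\alpha = \mu\nu = \mu\left(\frac{\mu(1-\mu)}{\mathrm{var}} - 1\right), \text{ if } \mathrm{var} < \mu(1-\mu)$$
$$\beta = (1-\mu)\nu = (1-\mu)\left(\frac{\mu(1-\mu)}{\mathrm{var}} - 1\right), \text{ if } \mathrm{var} < \mu(1-\mu)$$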

This parametrization of the beta distribution may lead to a more intuitive understanding than the one based on the original parameters α and β. For example, by expressing the mode, skewness, excess kurtosis and differential entropy in terms of the mean and the variance:
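The mean-and-variance parametrization can be sketched in Python as follows, assuming the standard relations ν = μ(1 − μ)/var − 1, α = μν and β = (1 − μ)ν, which require var < μ(1 − μ). The function names are illustrative, and the round trip through Beta(2, 3) is only a sanity check.

```python
def shape_from_mean_variance(mu, var):
    """Map (mean, variance) to (alpha, beta); requires 0 < var < mu * (1 - mu)."""
    if not (0.0 < mu < 1.0):
        raise ValueError("need 0 < mu < 1")
    if not (0.0 < var < mu * (1.0 - mu)):
        raise ValueError("need 0 < var < mu * (1 - mu)")
    nu = mu * (1.0 - mu) / var - 1.0  # nu = alpha + beta
    return mu * nu, (1.0 - mu) * nu

def mean_variance_from_shape(alpha, beta):
    """Mean and variance of Beta(alpha, beta)."""
    nu = alpha + beta
    return alpha / nu, alpha * beta / (nu**2 * (nu + 1.0))

mu, var = mean_variance_from_shape(2.0, 3.0)  # mean 0.4, variance 0.04
alpha, beta = shape_from_mean_variance(mu, var)
print(mu, var, alpha, beta)
```

The same two relations are what the method of moments uses, with the sample mean and sample variance substituted for μ and var.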

           

Four parameters

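A beta distribution with the two shape parameters α and β is supported on the range [0,1] or (0,1). It is possible to alter the location and scale of the distribution by introducing two further parameters representing the minimum, a, and maximum, c (c > a), values of the distribution.[1] This is done with a linear transformation that replaces the non-dimensional variable x by a new variable y (with support [a,c] or (a,c)):

$$y = x(c-a) + a, \quad\text{therefore}\quad x = \frac{y-a}{c-a}$$

The probability density function of the four-parameter beta distribution is equal to the two-parameter density divided by the range (c − a), so that the total area under the density curve equals a probability of one, and with the y variable shifted and scaled as follows:

$$f(y;\alpha,\beta,a,c) = \frac{f(x;\alpha,\beta)}{c-a} = \frac{\left(\frac{y-a}{c-a}\right)^{\alpha-1}\left(\frac{c-y}{c-a}\right)^{\beta-1}}{(c-a)\,\mathrm{B}(\alpha,\beta)} = \frac{(y-a)^{\alpha-1}(c-y)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\,\mathrm{B}(\alpha,\beta)}$$

That a random variable Y is beta-distributed with four parameters α, β, a, and c will be denoted by:

$$Y \sim \operatorname{Beta}(\alpha,\beta,a,c)$$

Some measures of central location are scaled (by (c − a)) and shifted (by a), as follows:

$$\mu_Y = \mu_X(c-a) + a = \frac{\alpha}{\alpha+\beta}(c-a) + a = \frac{\alpha c + \beta a}{\alpha+\beta}$$
$$\operatorname{mode}(Y) = \operatorname{mode}(X)(c-a) + a = \frac{\alpha-1}{\alpha+\beta-2}(c-a) + a = \frac{(\alpha-1)c + (\beta-1)a}{\alpha+\beta-2}, \text{ if } \alpha, \beta > 1$$
$$\operatorname{median}(Y) = \operatorname{median}(X)(c-a) + a = I^{[-1]}_{1/2}(\alpha,\beta)\,(c-a) + a$$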

Note: the geometric mean and harmonic mean cannot be transformed by a linear transformation in the way that the mean, median and mode can.
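The location-scale transformation can be checked numerically with a short Python sketch. It assumes the four-parameter density is the two-parameter density evaluated at x = (y − a)/(c − a) and divided by (c − a), and that the mean and mode transform as a + (c − a)·(value for X). The function names and the interval [10, 30] are arbitrary illustrations.

```python
import math

def beta4_pdf(y, alpha, beta, a, c):
    """Density of the four-parameter beta distribution on (a, c)."""
    if not (a < y < c):
        return 0.0
    x = (y - a) / (c - a)  # map back to the unit interval
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    f_x = math.exp(log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log1p(-x))
    return f_x / (c - a)  # dividing by the range keeps the total area equal to 1

def beta4_mean(alpha, beta, a, c):
    return a + (c - a) * alpha / (alpha + beta)

def beta4_mode(alpha, beta, a, c):
    if alpha <= 1.0 or beta <= 1.0:
        raise ValueError("interior mode requires alpha > 1 and beta > 1")
    return a + (c - a) * (alpha - 1.0) / (alpha + beta - 2.0)

# Midpoint-rule check of the area and the mean for Beta(2, 3) on [10, 30].
a, c, n = 10.0, 30.0, 20000
h = (c - a) / n
ys = [a + (i + 0.5) * h for i in range(n)]
area = h * sum(beta4_pdf(y, 2.0, 3.0, a, c) for y in ys)
mean_numeric = h * sum(y * beta4_pdf(y, 2.0, 3.0, a, c) for y in ys)
print(area, mean_numeric, beta4_mean(2.0, 3.0, a, c))
```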

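The shape parameters of Y can be written in terms of its mean and variance as

$$\alpha = \frac{\mu_Y - a}{c-a}\left(\frac{(\mu_Y - a)(c - \mu_Y)}{\sigma_Y^2} - 1\right), \qquad \beta = \frac{c - \mu_Y}{c-a}\left(\frac{(\mu_Y - a)(c - \mu_Y)}{\sigma_Y^2} - 1\right)$$

The statistical dispersion measures are scaled by the range (c − a) (they do not need to be shifted because they are already centered on the mean), linearly for the mean deviation and quadratically for the variance:

$$(\text{mean deviation around mean})(Y) = (\text{mean deviation around mean})(X)\,(c-a) = \frac{2\alpha^\alpha\beta^\beta}{\mathrm{B}(\alpha,\beta)(\alpha+\beta)^{\alpha+\beta+1}}\,(c-a)$$
$$\operatorname{var}(Y) = \operatorname{var}(X)\,(c-a)^2 = \frac{\alpha\beta\,(c-a)^2}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

Since the skewness and excess kurtosis are non-dimensional quantities (as moments centered on the mean and normalized by the standard deviation), they are independent of the parameters a and c, and therefore equal to the expressions given above in terms of X (with support [0,1] or (0,1)):

$$\operatorname{skewness}(Y) = \operatorname{skewness}(X) = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}$$
$$\operatorname{kurtosis\ excess}(Y) = \operatorname{kurtosis\ excess}(X) = \frac{6\left[(\alpha-\beta)^2(\alpha+\beta+1) - \alpha\beta(\alpha+\beta+2)\right]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)}$$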

Properties

Measures of central tendency

Mode

The mode of a beta-distributed random variable X with α, β > 1 is the most likely value of the distribution (corresponding to the peak in the PDF), and is given by the following expression:[1]

$$\frac{\alpha-1}{\alpha+\beta-2}$$

When both parameters are less than one (α, β < 1), this is the anti-mode: the lowest point of the probability density curve.[3]

Letting α = β, the expression for the mode simplifies to 1/2, showing that for α = β > 1 the mode (resp. anti-mode when α, β < 1) is at the center of the distribution: it is symmetric in those cases. See the Shapes section in this article for a full list of mode cases, for arbitrary values of α and β. For several of these cases, the maximum value of the density function occurs at one or both ends. In some cases the (maximum) value of the density function occurring at the end is finite. For example, in the case of α = 2, β = 1 (or α = 1, β = 2), the density function becomes a right-triangle distribution which is finite at both ends. In several other cases there is a singularity at one end, where the value of the density function approaches infinity. For example, in the case α = β = 1/2, the beta distribution simplifies to become the arcsine distribution. There is debate among mathematicians about some of these cases and whether the ends (x = 0 and x = 1) can be called modes or not.[6][8] The points at issue are:

  • Whether the ends are part of the domain of the density function
  • Whether a singularity can ever be called a mode
  • Whether cases with two maxima should be called bimodal

Mode for Beta distribution for 1 ≤ α ≤ 5 and 1 ≤ β ≤ 5

Median

 
Median for Beta distribution for 0 ≤ α ≤ 5 and 0 ≤ β ≤ 5
 
(Mean–Median) for Beta distribution versus alpha and beta from 0 to 2

The median of the beta distribution is the unique real number $x = I^{[-1]}_{1/2}(\alpha,\beta)$ for which the regularized incomplete beta function $I_x(\alpha,\beta) = \tfrac{1}{2}$. There is no general closed-form expression for the median of the beta distribution for arbitrary values of α and β. Closed-form expressions for particular values of the parameters α and β follow:[citation needed]

  • For symmetric cases α = β, median = 1/2.
  • For α = 1 and β > 0, median = $1 - 2^{-1/\beta}$ (this case is the mirror-image of the power function [0,1] distribution)
  • For α > 0 and β = 1, median = $2^{-1/\alpha}$ (this case is the power function [0,1] distribution[6])
  • For α = 3 and β = 2, median = 0.6142724318676105..., the real solution to the quartic equation 1 − 8x³ + 6x⁴ = 0 that lies in [0,1].
  • For α = 2 and β = 3, median = 0.38572756813238945... = 1 − median(Beta(3, 2))

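The following are the limits with one parameter finite (non-zero) and the other approaching these limits:[citation needed]

$$\lim_{\beta\to 0}\operatorname{median} = \lim_{\alpha\to\infty}\operatorname{median} = 1, \qquad \lim_{\alpha\to 0}\operatorname{median} = \lim_{\beta\to\infty}\operatorname{median} = 0$$

A reasonable approximation of the value of the median of the beta distribution, for both α and β greater than or equal to one, is given by the formula[12]

$$\operatorname{median} \approx \frac{\alpha - \tfrac{1}{3}}{\alpha+\beta-\tfrac{2}{3}} \quad\text{for } \alpha, \beta \ge 1$$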

When α, β ≥ 1, the relative error (the absolute error divided by the median) in this approximation is less than 4%, and for both α ≥ 2 and β ≥ 2 it is less than 1%. The absolute error divided by the difference between the mean and the mode is similarly small.
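The quality of the approximation can be spot-checked against the closed-form medians listed earlier. The sketch below is a small illustration, not a proof of the error bounds: it assumes the approximation is (α − 1/3)/(α + β − 2/3), finds the Beta(3, 2) median by bisection on 1 − 8x³ + 6x⁴ = 0, and uses hypothetical helper names throughout.

```python
def median_approx(alpha, beta):
    """Approximate median of Beta(alpha, beta), intended for alpha, beta >= 1."""
    return (alpha - 1.0 / 3.0) / (alpha + beta - 2.0 / 3.0)

def bisect_root(g, lo, hi, tol=1e-14):
    """Bisection for a continuous g with g(lo) > 0 > g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Median of Beta(3, 2): root of 1 - 8x^3 + 6x^4 in [0, 1] (the quartic is decreasing there).
median_3_2 = bisect_root(lambda x: 1.0 - 8.0 * x**3 + 6.0 * x**4, 0.0, 1.0)

exact = {
    (1.0, 1.0): 0.5,
    (1.0, 2.0): 1.0 - 2.0 ** (-1.0 / 2.0),  # alpha = 1: 1 - 2^(-1/beta)
    (5.0, 1.0): 2.0 ** (-1.0 / 5.0),        # beta = 1: 2^(-1/alpha)
    (3.0, 2.0): median_3_2,
    (2.0, 3.0): 1.0 - median_3_2,
}
rel_errors = {ab: abs(median_approx(*ab) - m) / m for ab, m in exact.items()}
for ab, err in rel_errors.items():
    print(ab, f"{100 * err:.2f}%")
```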

  

Mean

 
Mean for Beta distribution for 0 ≤ α ≤ 5 and 0 ≤ β ≤ 5

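The expected value (mean) (μ) of a beta-distributed random variable X with two parameters α and β is a function of only the ratio β/α of these parameters:[1]

$$\mu = \operatorname{E}[X] = \int_0^1 x\,f(x;\alpha,\beta)\,dx = \int_0^1 x\,\frac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}\,dx = \frac{\alpha}{\alpha+\beta} = \frac{1}{1+\frac{\beta}{\alpha}}$$

Letting α = β in the above expression one obtains μ = 1/2, showing that for α = β the mean is at the center of the distribution: it is symmetric. Also, the following limits can be obtained from the above expression:

$$\lim_{\beta/\alpha\to 0}\mu = 1, \qquad \lim_{\beta/\alpha\to\infty}\mu = 0$$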

Therefore, for β/α → 0, or for α/β → ∞, the mean is located at the right end, x = 1. For these limit ratios, the beta distribution becomes a one-point degenerate distribution with a Dirac delta function spike at the right end, x = 1, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the right end, x = 1.

Similarly, for β/α → ∞, or for α/β → 0, the mean is located at the left end, x = 0. The beta distribution becomes a one-point degenerate distribution with a Dirac delta function spike at the left end, x = 0, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the left end, x = 0. Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

$$\lim_{\beta\to 0}\mu = \lim_{\alpha\to\infty}\mu = 1, \qquad \lim_{\alpha\to 0}\mu = \lim_{\beta\to\infty}\mu = 0$$

For typical unimodal distributions (with centrally located modes, inflexion points at both sides of the mode, and longer tails, that is, Beta(α, β) with α, β > 2) it is known that the sample mean (as an estimate of location) is not as robust as the sample median. The opposite is the case for uniform or "U-shaped" bimodal distributions (Beta(α, β) with α, β ≤ 1), whose modes are located at the ends of the distribution. As Mosteller and Tukey remark ([13] p. 207) "the average of the two extreme observations uses all the sample information. This illustrates how, for short-tailed distributions, the extreme observations should get more weight." By contrast, it follows that the median of "U-shaped" bimodal distributions with modes at the edge of the distribution (Beta(α, β) with α, β ≤ 1) is not robust, as the sample median drops the extreme sample observations from consideration. A practical application of this occurs, for example, for random walks: since the time of the last visit to the origin in a random walk is distributed as the arcsine distribution Beta(1/2, 1/2),[5][14] the mean of a number of realizations of a random walk is a much more robust estimator than the median (which is an inappropriate sample measure estimate in this case).

Geometric mean

 
(Mean − GeometricMean) for Beta distribution versus α and β from 0 to 2, showing the asymmetry between α and β for the geometric mean
 
Geometric means for Beta distribution Purple = G(x), Yellow = G(1 − x), smaller values α and β in front
 
Geometric means for Beta distribution. purple = G(x), yellow = G(1 − x), larger values α and β in front

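The logarithm of the geometric mean GX of a distribution with random variable X is the arithmetic mean of ln(X), or, equivalently, its expected value:

$$\ln G_X = \operatorname{E}[\ln X]$$

For a beta distribution, the expected value integral gives:

$$\operatorname{E}[\ln X] = \int_0^1 \ln x\,f(x;\alpha,\beta)\,dx = \int_0^1 \ln x\,\frac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}\,dx = \psi(\alpha) - \psi(\alpha+\beta)$$

where ψ is the digamma function.

Therefore, the geometric mean of a beta distribution with shape parameters α and β is the exponential of the difference of the digamma functions of α and α + β:

$$G_X = e^{\operatorname{E}[\ln X]} = e^{\psi(\alpha) - \psi(\alpha+\beta)}$$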

While for a beta distribution with equal shape parameters α = β, it follows that skewness = 0 and mode = mean = median = 1/2, the geometric mean is less than 1/2: 0 < GX < 1/2. The reason for this is that the logarithmic transformation strongly weights the values of X close to zero, as ln(X) strongly tends towards negative infinity as X approaches zero, while ln(X) flattens towards zero as X → 1.

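Along a line α = β, the following limits apply:

$$\lim_{\alpha=\beta\to 0} G_X = 0, \qquad \lim_{\alpha=\beta\to\infty} G_X = \tfrac{1}{2}$$

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

$$\lim_{\beta\to 0} G_X = \lim_{\alpha\to\infty} G_X = 1, \qquad \lim_{\alpha\to 0} G_X = \lim_{\beta\to\infty} G_X = 0$$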

The accompanying plot shows the difference between the mean and the geometric mean for shape parameters α and β from zero to 2. Besides the fact that the difference between them approaches zero as α and β approach infinity and that the difference becomes large for values of α and β approaching zero, one can observe an evident asymmetry of the geometric mean with respect to the shape parameters α and β. The difference between the geometric mean and the mean is larger for small values of α in relation to β than when exchanging the magnitudes of β and α.

N. L. Johnson and S. Kotz[1] suggest the logarithmic approximation to the digamma function ψ(α) ≈ ln(α − 1/2), which results in the following approximation to the geometric mean:

$$G_X \approx \frac{\alpha - \tfrac{1}{2}}{\alpha+\beta-\tfrac{1}{2}} \quad\text{if } \alpha, \beta > 1$$

Numerical values for the relative error in this approximation follow: [(α = β = 1): 9.39%]; [(α = β = 2): 1.39%]; [(α = 2, β = 3): 1.51%]; [(α = 3, β = 2): 0.44%]; [(α = β = 3): 0.51%]; [(α = β = 4): 0.26%]; [(α = 3, β = 4): 0.55%]; [(α = 4, β = 3): 0.24%].
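These relative errors can be recomputed with a short Python sketch. Since the standard library has no digamma function, the code below uses a hand-written one (recurrence ψ(x) = ψ(x + 1) − 1/x followed by the usual asymptotic series), which is an assumption of this example rather than anything prescribed by the text. It also assumes the geometric mean is exp(ψ(α) − ψ(α + β)) and that the approximation is (α − 1/2)/(α + β − 1/2).

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift x up to at least 6, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # 1/(12x^2) - 1/(120x^4) + 1/(252x^6) - 1/(240x^8) + 1/(132x^10)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252
             - inv2 * (1.0 / 240 - inv2 / 132))))
    return result + math.log(x) - 0.5 / x - series

def geometric_mean(alpha, beta):
    """G_X = exp(E[ln X]) = exp(psi(alpha) - psi(alpha + beta))."""
    return math.exp(digamma(alpha) - digamma(alpha + beta))

def geometric_mean_approx(alpha, beta):
    """Approximation based on psi(a) ~ ln(a - 1/2)."""
    return (alpha - 0.5) / (alpha + beta - 0.5)

for a, b in [(1, 1), (2, 2), (2, 3), (3, 2), (3, 3), (4, 4), (3, 4), (4, 3)]:
    g = geometric_mean(a, b)
    rel = abs(g - geometric_mean_approx(a, b)) / g
    print(f"alpha={a}, beta={b}: G_X={g:.6f}, relative error={100 * rel:.2f}%")
```

For the uniform case α = β = 1 the exact geometric mean is e⁻¹, which gives a convenient check on the digamma routine.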

Similarly, one can calculate the value of the shape parameters required for the geometric mean to equal 1/2. Given the value of the parameter β, what value of the other parameter, α, is required for the geometric mean to equal 1/2? The answer is that (for β > 1) the required value of α tends towards β + 1/2 as β → ∞. For example, all these pairs have the same geometric mean of 1/2: [β = 1, α = 1.4427], [β = 2, α = 2.46958], [β = 3, α = 3.47943], [β = 4, α = 4.48449], [β = 5, α = 5.48756], [β = 10, α = 10.4938], [β = 100, α = 100.499].

The fundamental property of the geometric mean, which can be proven to be false for any other mean, is

$$G\!\left(\frac{X_i}{Y_i}\right) = \frac{G(X_i)}{G(Y_i)}$$

This makes the geometric mean the only correct mean when averaging normalized results, that is, results that are presented as ratios to reference values.[15] This is relevant because the beta distribution is a suitable model for the random behavior of percentages and it is particularly suitable to the statistical modelling of proportions. The geometric mean plays a central role in maximum likelihood estimation; see section "Parameter estimation, maximum likelihood." When performing maximum likelihood estimation, besides the geometric mean GX based on the random variable X, another geometric mean also appears naturally: the geometric mean based on the linear transformation (1 − X), the mirror-image of X, denoted by G(1−X):

$$G_{(1-X)} = e^{\operatorname{E}[\ln(1-X)]} = e^{\psi(\beta) - \psi(\alpha+\beta)}$$

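Along a line α = β, the following limits apply:

$$\lim_{\alpha=\beta\to 0} G_{(1-X)} = 0, \qquad \lim_{\alpha=\beta\to\infty} G_{(1-X)} = \tfrac{1}{2}$$

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

$$\lim_{\beta\to 0} G_{(1-X)} = \lim_{\alpha\to\infty} G_{(1-X)} = 0, \qquad \lim_{\alpha\to 0} G_{(1-X)} = \lim_{\beta\to\infty} G_{(1-X)} = 1$$

It has the following approximate value:

$$G_{(1-X)} \approx \frac{\beta - \tfrac{1}{2}}{\alpha+\beta-\tfrac{1}{2}} \quad\text{if } \alpha, \beta > 1$$

Although both GX and G(1−X) are asymmetric, in the case that both shape parameters are equal, α = β, the geometric means are equal: GX = G(1−X). This equality follows from the following symmetry displayed between both geometric means:

$$G_X(\mathrm{B}(\alpha,\beta)) = G_{(1-X)}(\mathrm{B}(\beta,\alpha))$$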

Harmonic mean

 
Harmonic mean for beta distribution for 0 < α < 5 and 0 < β < 5
 
Harmonic mean for beta distribution versus α and β from 0 to 2
 
Harmonic means for beta distribution Purple = H(X), Yellow = H(1 − X), smaller values α and β in front
 
Harmonic Means for Beta distribution Purple = H(X), Yellow = H(1 − X), larger values α and β in front

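The inverse of the harmonic mean (HX) of a distribution with random variable X is the arithmetic mean of 1/X, or, equivalently, its expected value. Therefore, the harmonic mean (HX) of a beta distribution with shape parameters α and β is:

$$H_X = \frac{1}{\operatorname{E}\!\left[\frac{1}{X}\right]} = \frac{1}{\int_0^1 \frac{f(x;\alpha,\beta)}{x}\,dx} = \frac{\alpha-1}{\alpha+\beta-1} \quad\text{if } \alpha > 1 \text{ and } \beta > 0$$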

The harmonic mean (HX) of a Beta distribution with α < 1 is undefined, because its defining expression is not bounded in [0, 1] for shape parameter α less than unity.
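The closed form can be cross-checked against the definition using only log-gamma functions. The sketch below assumes H_X = (α − 1)/(α + β − 1) for α > 1 and uses the identity E[1/X] = B(α − 1, β)/B(α, β), which follows from lowering the exponent of x by one inside the defining integral. Function names are illustrative.

```python
import math

def log_beta(a, b):
    """ln B(a, b) via log-gamma functions."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def harmonic_mean(alpha, beta):
    """Closed form; at alpha = 1 it returns the limiting value 0."""
    if alpha < 1.0 or beta <= 0.0:
        raise ValueError("harmonic mean is undefined for alpha < 1")
    return (alpha - 1.0) / (alpha + beta - 1.0)

def harmonic_mean_from_definition(alpha, beta):
    """1 / E[1/X], with E[1/X] = B(alpha - 1, beta) / B(alpha, beta), alpha > 1."""
    expected_inverse = math.exp(log_beta(alpha - 1.0, beta) - log_beta(alpha, beta))
    return 1.0 / expected_inverse

print(harmonic_mean(3.0, 2.0), harmonic_mean_from_definition(3.0, 2.0))
print(harmonic_mean(50.0, 50.0))  # approaches 1/2 as alpha = beta grows
```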

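Letting α = β in the above expression one obtains

$$H_X = \frac{\alpha-1}{2\alpha-1}$$

showing that for α = β the harmonic mean ranges from 0, for α = β = 1, to 1/2, for α = β → ∞.

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

$$\lim_{\alpha\to 0} H_X \text{ is undefined}, \qquad \lim_{\alpha\to 1} H_X = \lim_{\beta\to\infty} H_X = 0, \qquad \lim_{\beta\to 0} H_X = \lim_{\alpha\to\infty} H_X = 1$$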

The harmonic mean plays a role in maximum likelihood estimation for the four parameter case, in addition to the geometric mean. When performing maximum likelihood estimation for the four parameter case, besides the harmonic mean HX based on the random variable X, another harmonic mean also appears naturally: the harmonic mean based on the linear transformation (1 − X), the mirror-image of X, denoted by H1−X:

$$H_{1-X} = \frac{1}{\operatorname{E}\!\left[\frac{1}{1-X}\right]} = \frac{\beta-1}{\alpha+\beta-1} \quad\text{if } \beta > 1 \text{ and } \alpha > 0$$

The harmonic mean (H(1 − X)) of a Beta distribution with β < 1 is undefined, because its defining expression is not bounded in [0, 1] for shape parameter β less than unity.

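Letting α = β in the above expression one obtains

$$H_{1-X} = \frac{\beta-1}{2\beta-1}$$

showing that for α = β the harmonic mean ranges from 0, for α = β = 1, to 1/2, for α = β → ∞.

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

$$\lim_{\beta\to 0} H_{1-X} \text{ is undefined}, \qquad \lim_{\beta\to 1} H_{1-X} = \lim_{\alpha\to\infty} H_{1-X} = 0, \qquad \lim_{\alpha\to 0} H_{1-X} = \lim_{\beta\to\infty} H_{1-X} = 1$$

Although both HX and H1−X are asymmetric, in the case that both shape parameters are equal, α = β, the harmonic means are equal: HX = H1−X. This equality follows from the following symmetry displayed between both harmonic means:

$$H_X(\mathrm{B}(\alpha,\beta)) = H_{1-X}(\mathrm{B}(\beta,\alpha)) \quad\text{if } \alpha, \beta > 1$$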

Measures of statistical dispersion

Variance

The variance (the second moment centered on the mean) of a beta-distributed random variable X with parameters α and β is:[1][16]

$$\operatorname{var}(X) = \operatorname{E}[(X-\mu)^2] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

Letting α = β in the above expression one obtains

$$\operatorname{var}(X) = \frac{1}{4(2\beta+1)}$$

showing that for α = β the variance decreases monotonically as α = β increases. The maximum variance, var(X) = 1/4,[1] is attained only in the limit α = β → 0.

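The beta distribution may also be parametrized in terms of its mean μ (0 < μ < 1) and sample size ν = α + β (ν > 0) (see subsection Mean and sample size):

$$\alpha = \mu\nu, \qquad \beta = (1-\mu)\nu, \quad\text{where } \nu = \alpha+\beta > 0$$

Using this parametrization, one can express the variance in terms of the mean μ and the sample size ν as follows:

$$\operatorname{var}(X) = \frac{\mu(1-\mu)}{1+\nu}$$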

Since ν = α + β > 0, it follows that var(X) < μ(1 − μ).

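For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

$$\operatorname{var}(X) = \frac{1}{4(1+\nu)} \quad\text{if } \mu = \tfrac{1}{2}$$

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

$$\lim_{\beta\to 0}\operatorname{var}(X) = \lim_{\alpha\to 0}\operatorname{var}(X) = \lim_{\beta\to\infty}\operatorname{var}(X) = \lim_{\alpha\to\infty}\operatorname{var}(X) = \lim_{\nu\to\infty}\operatorname{var}(X) = \lim_{\mu\to 0}\operatorname{var}(X) = \lim_{\mu\to 1}\operatorname{var}(X) = 0$$

$$\lim_{\nu\to 0}\operatorname{var}(X) = \mu(1-\mu)$$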

Geometric variance and covariance

 
log geometric variances vs. α and β
 
log geometric variances vs. α and β

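The logarithm of the geometric variance, ln(varGX), of a distribution with random variable X is the second moment of the logarithm of X centered on the logarithm of the geometric mean of X, ln(GX):

$$\ln\operatorname{var}_{GX} = \operatorname{E}\!\left[(\ln X - \ln G_X)^2\right] = \operatorname{E}\!\left[(\ln X - \operatorname{E}[\ln X])^2\right] = \operatorname{E}\!\left[(\ln X)^2\right] - (\operatorname{E}[\ln X])^2 = \operatorname{var}[\ln X]$$

and therefore, the geometric variance is:

$$\operatorname{var}_{GX} = e^{\operatorname{var}[\ln X]}$$

In the Fisher information matrix, and the curvature of the log likelihood function, the logarithm of the geometric variance of the reflected variable 1 − X and the logarithm of the geometric covariance between X and 1 − X appear:

$$\ln\operatorname{var}_{G(1-X)} = \operatorname{E}\!\left[(\ln(1-X) - \ln G_{1-X})^2\right] = \operatorname{var}[\ln(1-X)], \qquad \operatorname{var}_{G(1-X)} = e^{\operatorname{var}[\ln(1-X)]}$$
$$\ln\operatorname{cov}_{G\,X,1-X} = \operatorname{E}\!\left[(\ln X - \ln G_X)(\ln(1-X) - \ln G_{1-X})\right] = \operatorname{cov}[\ln X, \ln(1-X)]$$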

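For a beta distribution, higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions. See the section § Moments of logarithmically transformed random variables. The variance of the logarithmic variables and covariance of ln X and ln(1−X) are:

$$\operatorname{var}[\ln X] = \psi_1(\alpha) - \psi_1(\alpha+\beta)$$
$$\operatorname{var}[\ln(1-X)] = \psi_1(\beta) - \psi_1(\alpha+\beta)$$
$$\operatorname{cov}[\ln X, \ln(1-X)] = -\psi_1(\alpha+\beta)$$

where the trigamma function, denoted ψ1(α), is the second of the polygamma functions, and is defined as the derivative of the digamma function:

$$\psi_1(\alpha) = \frac{d^2 \ln\Gamma(\alpha)}{d\alpha^2} = \frac{d\,\psi(\alpha)}{d\alpha}$$

Therefore,

$$\ln\operatorname{var}_{GX} = \operatorname{var}[\ln X] = \psi_1(\alpha) - \psi_1(\alpha+\beta)$$
$$\ln\operatorname{var}_{G(1-X)} = \operatorname{var}[\ln(1-X)] = \psi_1(\beta) - \psi_1(\alpha+\beta)$$
$$\ln\operatorname{cov}_{G\,X,1-X} = \operatorname{cov}[\ln X, \ln(1-X)] = -\psi_1(\alpha+\beta)$$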

The accompanying plots show the log geometric variances and log geometric covariance versus the shape parameters α and β. The plots show that the log geometric variances and log geometric covariance are close to zero for shape parameters α and β greater than 2, and that the log geometric variances rapidly rise in value for shape parameter values α and β less than unity. The log geometric variances are positive for all values of the shape parameters. The log geometric covariance is negative for all values of the shape parameters, and it reaches large negative values for α and β less than unity.
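The sign and size of these quantities can be explored numerically. The following Python sketch assumes the standard identities var[ln X] = ψ1(α) − ψ1(α + β), var[ln(1 − X)] = ψ1(β) − ψ1(α + β) and cov[ln X, ln(1 − X)] = −ψ1(α + β). The trigamma routine is a hand-written approximation (recurrence ψ1(x) = ψ1(x + 1) + 1/x² plus an asymptotic series), not a library call, and all names are illustrative.

```python
import math

def trigamma(x):
    """psi_1(x) for x > 0: shift x up to at least 6, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)  # psi_1(x) = psi_1(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    series = inv * (1.0 + 0.5 * inv + inv2 * (1.0 / 6 - inv2 * (1.0 / 30
             - inv2 * (1.0 / 42 - inv2 / 30))))
    return result + series

def log_geometric_variance_x(alpha, beta):
    return trigamma(alpha) - trigamma(alpha + beta)

def log_geometric_variance_1mx(alpha, beta):
    return trigamma(beta) - trigamma(alpha + beta)

def log_geometric_covariance(alpha, beta):
    return -trigamma(alpha + beta)

for a, b in [(0.5, 0.5), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]:
    print(a, b, log_geometric_variance_x(a, b), log_geometric_covariance(a, b))
```

For the uniform case α = β = 1, −ln X is exponentially distributed with unit rate, so var[ln X] should come out as 1, which is a convenient sanity check.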

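Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

$$\lim_{\alpha\to 0}\ln\operatorname{var}_{GX} = \lim_{\beta\to 0}\ln\operatorname{var}_{G(1-X)} = \infty$$
$$\lim_{\beta\to 0}\ln\operatorname{var}_{GX} = \lim_{\alpha\to\infty}\ln\operatorname{var}_{GX} = \lim_{\alpha\to 0}\ln\operatorname{var}_{G(1-X)} = \lim_{\beta\to\infty}\ln\operatorname{var}_{G(1-X)} = \lim_{\alpha\to\infty}\ln\operatorname{cov}_{G\,X,1-X} = \lim_{\beta\to\infty}\ln\operatorname{cov}_{G\,X,1-X} = 0$$
$$\lim_{\beta\to\infty}\ln\operatorname{var}_{GX} = \psi_1(\alpha), \qquad \lim_{\alpha\to\infty}\ln\operatorname{var}_{G(1-X)} = \psi_1(\beta)$$
$$\lim_{\alpha\to 0}\ln\operatorname{cov}_{G\,X,1-X} = -\psi_1(\beta), \qquad \lim_{\beta\to 0}\ln\operatorname{cov}_{G\,X,1-X} = -\psi_1(\alpha)$$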

Limits with two parameters varying:

 

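Although both ln(varGX) and ln(varG(1 − X)) are asymmetric, when the shape parameters are equal, α = β, one has: ln(varGX) = ln(varG(1−X)). This equality follows from the following symmetry displayed between both log geometric variances:

$$\ln\operatorname{var}_{GX}(\mathrm{B}(\alpha,\beta)) = \ln\operatorname{var}_{G(1-X)}(\mathrm{B}(\beta,\alpha))$$

The log geometric covariance is symmetric:

$$\ln\operatorname{cov}_{G\,X,1-X}(\mathrm{B}(\alpha,\beta)) = \ln\operatorname{cov}_{G\,X,1-X}(\mathrm{B}(\beta,\alpha))$$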

Mean absolute deviation around the mean

 
Ratio of Mean Abs.Dev. to Std.Dev. for Beta distribution with α and β ranging from 0 to 5
 
Ratio of Mean Abs.Dev. to Std.Dev. for Beta distribution with mean 0 ≤ μ ≤ 1 and sample size 0 < ν ≤ 10

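The mean absolute deviation around the mean for the beta distribution with shape parameters α and β is:[6]

$$\operatorname{E}\bigl[|X - \operatorname{E}[X]|\bigr] = \frac{2\alpha^\alpha\beta^\beta}{\mathrm{B}(\alpha,\beta)(\alpha+\beta)^{\alpha+\beta+1}}$$

The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode, that is, Beta(α, β) distributions with α, β > 2, as it depends on the linear (absolute) deviations rather than the squared deviations from the mean. Therefore, the effect of very large deviations from the mean is not as heavily weighted.

Using Stirling's approximation to the Gamma function, N. L. Johnson and S. Kotz[1] derived the following approximation for values of the shape parameters greater than unity (the relative error for this approximation is only −3.5% for α = β = 1, and it decreases to zero as α → ∞, β → ∞):

$$\frac{\text{mean abs. dev. from mean}}{\text{standard deviation}} = \frac{\operatorname{E}\bigl[|X - \operatorname{E}[X]|\bigr]}{\sqrt{\operatorname{var}(X)}} \approx \sqrt{\frac{2}{\pi}}\left(1 + \frac{7}{12(\alpha+\beta)} - \frac{1}{12\alpha} - \frac{1}{12\beta}\right) \quad\text{if } \alpha, \beta > 1$$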

At the limit α → ∞, β → ∞, the ratio of the mean absolute deviation to the standard deviation (for the beta distribution) becomes equal to the ratio of the same measures for the normal distribution: √(2/π). For α = β = 1 this ratio equals √3/2, so that from α = β = 1 to α, β → ∞ the ratio decreases by 8.5%. For α = β = 0 the standard deviation is exactly equal to the mean absolute deviation around the mean. Therefore, this ratio decreases by 15% from α = β = 0 to α = β = 1, and by 25% from α = β = 0 to α, β → ∞. However, for skewed beta distributions such that α → 0 or β → 0, the ratio of the standard deviation to the mean absolute deviation approaches infinity (although each of them, individually, approaches zero) because the mean absolute deviation approaches zero faster than the standard deviation.
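The two reference values of the ratio can be reproduced with a short Python sketch. It assumes the closed form E|X − E[X]| = 2α^α β^β / (B(α, β)(α + β)^(α+β+1)) for the mean absolute deviation and the usual variance formula; everything is computed in log space so that large shape parameters do not overflow. The function name is hypothetical.

```python
import math

def mad_over_sd(alpha, beta):
    """Ratio of mean absolute deviation (around the mean) to standard deviation."""
    s = alpha + beta
    log_beta_fn = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(s)
    log_mad = (math.log(2.0) + alpha * math.log(alpha) + beta * math.log(beta)
               - log_beta_fn - (s + 1.0) * math.log(s))
    # ln sd = 0.5 * ln( alpha * beta / (s^2 * (s + 1)) )
    log_sd = 0.5 * (math.log(alpha) + math.log(beta)
                    - 2.0 * math.log(s) - math.log(s + 1.0))
    return math.exp(log_mad - log_sd)

print(mad_over_sd(1.0, 1.0), math.sqrt(3.0) / 2.0)          # uniform case
print(mad_over_sd(1e4, 1e4), math.sqrt(2.0 / math.pi))      # near the normal limit
```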

Using the parametrization in terms of mean μ and sample size ν = α + β > 0:

α = μν, β = (1−μ)ν

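one can express the mean absolute deviation around the mean in terms of the mean μ and the sample size ν as follows:

$$\operatorname{E}\bigl[|X - \operatorname{E}[X]|\bigr] = \frac{2\mu^{\mu\nu}(1-\mu)^{(1-\mu)\nu}}{\nu\,\mathrm{B}\bigl(\mu\nu, (1-\mu)\nu\bigr)}$$

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

$$\operatorname{E}\bigl[|X - \operatorname{E}[X]|\bigr] = \frac{2^{1-\nu}}{\nu\,\mathrm{B}\bigl(\tfrac{\nu}{2}, \tfrac{\nu}{2}\bigr)}$$

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

$$\lim_{\beta\to 0}\operatorname{E}\bigl[|X - \operatorname{E}[X]|\bigr] = \lim_{\alpha\to 0}\operatorname{E}\bigl[|X - \operatorname{E}[X]|\bigr] = \lim_{\beta\to\infty}\operatorname{E}\bigl[|X - \operatorname{E}[X]|\bigr] = \lim_{\alpha\to\infty}\operatorname{E}\bigl[|X - \operatorname{E}[X]|\bigr] = 0$$
$$\lim_{\mu\to 0}\operatorname{E}\bigl[|X - \operatorname{E}[X]|\bigr] = \lim_{\mu\to 1}\operatorname{E}\bigl[|X - \operatorname{E}[X]|\bigr] = \lim_{\nu\to\infty}\operatorname{E}\bigl[|X - \operatorname{E}[X]|\bigr] = 0$$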

Mean absolute difference

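The mean absolute difference for the beta distribution is:

$$\mathrm{MD} = \int_0^1\!\!\int_0^1 f(x;\alpha,\beta)\,f(y;\alpha,\beta)\,|x-y|\,dx\,dy = \left(\frac{4}{\alpha+\beta}\right)\frac{\mathrm{B}(\alpha+\beta,\alpha+\beta)}{\mathrm{B}(\alpha,\alpha)\,\mathrm{B}(\beta,\beta)}$$

The Gini coefficient for the beta distribution is half of the relative mean absolute difference:

$$\mathrm{G} = \left(\frac{2}{\alpha}\right)\frac{\mathrm{B}(\alpha+\beta,\alpha+\beta)}{\mathrm{B}(\alpha,\alpha)\,\mathrm{B}(\beta,\beta)}$$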

Skewness

 
Skewness for Beta Distribution as a function of variance and mean

The skewness (the third moment centered on the mean, normalized by the 3/2 power of the variance) of the beta distribution is[1]

$$\gamma_1 = \frac{\operatorname{E}[(X-\mu)^3]}{(\operatorname{var}(X))^{3/2}} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}$$

Letting α = β in the above expression one obtains γ1 = 0, showing once again that for α = β the distribution is symmetric and hence the skewness is zero. The skew is positive (right-tailed) for α < β and negative (left-tailed) for α > β.

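Using the parametrization in terms of mean μ and sample size ν = α + β:

$$\alpha = \mu\nu, \qquad \beta = (1-\mu)\nu, \quad\text{where } \nu = \alpha+\beta > 0$$

one can express the skewness in terms of the mean μ and the sample size ν as follows:

$$\gamma_1 = \frac{2(1-2\mu)\sqrt{1+\nu}}{(2+\nu)\sqrt{\mu(1-\mu)}}$$

The skewness can also be expressed just in terms of the variance var and the mean μ as follows:

$$\gamma_1 = \frac{2(1-2\mu)\sqrt{\mathrm{var}}}{\mu(1-\mu) + \mathrm{var}} \quad\text{if } \mathrm{var} < \mu(1-\mu)$$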

The accompanying plot of skewness as a function of variance and mean shows that maximum variance (1/4) is coupled with zero skewness and the symmetry condition (μ = 1/2), and that maximum skewness (positive or negative infinity) occurs when the mean is located at one end or the other, so that the "mass" of the probability distribution is concentrated at the ends (minimum variance).

The following expression for the square of the skewness, in terms of the sample size ν = α + β and the variance var, is useful for the method of moments estimation of four parameters:

$$(\gamma_1)^2 = \frac{4}{(2+\nu)^2}\left(\frac{1}{\mathrm{var}} - 4(1+\nu)\right)$$

This expression correctly gives a skewness of zero for α = β, since in that case (see § Variance) var = 1/(4(1 + ν)).
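The different expressions for the skewness should agree for any admissible parameters, which is easy to confirm numerically. The Python sketch below assumes the standard forms of the skewness in terms of (α, β), (μ, ν) and (μ, var), together with the squared-skewness identity in terms of ν and var; the function names and the test case Beta(2, 5) are illustrative.

```python
import math

def skewness(alpha, beta):
    return (2.0 * (beta - alpha) * math.sqrt(alpha + beta + 1.0)
            / ((alpha + beta + 2.0) * math.sqrt(alpha * beta)))

def skewness_mean_size(mu, nu):
    return (2.0 * (1.0 - 2.0 * mu) * math.sqrt(1.0 + nu)
            / ((2.0 + nu) * math.sqrt(mu * (1.0 - mu))))

def skewness_mean_variance(mu, var):
    return 2.0 * (1.0 - 2.0 * mu) * math.sqrt(var) / (mu * (1.0 - mu) + var)

def squared_skewness_size_variance(nu, var):
    return 4.0 / (2.0 + nu) ** 2 * (1.0 / var - 4.0 * (1.0 + nu))

alpha, beta = 2.0, 5.0
nu = alpha + beta
mu = alpha / nu
var = alpha * beta / (nu**2 * (nu + 1.0))

g1 = skewness(alpha, beta)
print(g1, skewness_mean_size(mu, nu), skewness_mean_variance(mu, var))
print(g1**2, squared_skewness_size_variance(nu, var))
```

In the symmetric case var = 1/(4(1 + ν)), so the bracket in the squared-skewness expression vanishes, matching the zero skewness noted above.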

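For the symmetric case (α = β), skewness = 0 over the whole range, and the following limits apply:

$$\lim_{\alpha=\beta\to 0}\gamma_1 = \lim_{\alpha=\beta\to\infty}\gamma_1 = \lim_{\nu\to 0}\gamma_1 = \lim_{\nu\to\infty}\gamma_1 = \lim_{\mu\to\frac{1}{2}}\gamma_1 = 0$$

For the asymmetric cases (α ≠ β) the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

$$\lim_{\alpha\to 0}\gamma_1 = \lim_{\mu\to 0}\gamma_1 = \infty, \qquad \lim_{\beta\to 0}\gamma_1 = \lim_{\mu\to 1}\gamma_1 = -\infty$$
$$\lim_{\alpha\to\infty}\gamma_1 = -\frac{2}{\sqrt{\beta}}, \qquad \lim_{\beta\to\infty}\gamma_1 = \frac{2}{\sqrt{\alpha}}, \qquad \lim_{\nu\to 0}\gamma_1 = \frac{1-2\mu}{\sqrt{\mu(1-\mu)}}$$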

  

Kurtosis

 
Excess Kurtosis for Beta Distribution as a function of variance and mean

The beta distribution has been applied in acoustic analysis to assess damage to gears, as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear.[17] Kurtosis has also been used to distinguish the seismic signal generated by a person's footsteps from other signals. As persons or other targets moving on the ground generate continuous signals in the form of seismic waves, one can separate different targets based on the seismic waves they generate. Kurtosis is sensitive to impulsive signals, so it is much more sensitive to the signal generated by human footsteps than to signals generated by vehicles, wind, noise, etc.[18]

Unfortunately, the notation for kurtosis has not been standardized. Kenney and Keeping[19] use the symbol γ2 for the excess kurtosis, but Abramowitz and Stegun[20] use different terminology. To prevent confusion[21] between kurtosis (the fourth moment centered on the mean, normalized by the square of the variance) and excess kurtosis, when using symbols, they will be spelled out as follows:[6][7]

$$\text{excess kurtosis} = \text{kurtosis} - 3 = \frac{\operatorname{E}[(X-\mu)^4]}{(\operatorname{var}(X))^2} - 3 = \frac{6\left[(\alpha-\beta)^2(\alpha+\beta+1) - \alpha\beta(\alpha+\beta+2)\right]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)}$$

applications 5 1 Order statistics 5 2 Subjective logic 5 3 Wavelet analysis 5 4 Population genetics 5 5 Project management task cost and schedule modeling 6 Random variate generation 7 Normal approximation to the Beta distribution 8 History 9 References 10 External linksDefinitions EditProbability density function Edit An animation of the Beta distribution for different values of its parameters The probability density function PDF of the beta distribution for 0 x 1 and shape parameters a b gt 0 is a power function of the variable x and of its reflection 1 x as follows f x a b c o n s t a n t x a 1 1 x b 1 x a 1 1 x b 1 0 1 u a 1 1 u b 1 d u G a b G a G b x a 1 1 x b 1 1 B a b x a 1 1 x b 1 displaystyle begin aligned f x alpha beta amp mathrm constant cdot x alpha 1 1 x beta 1 3pt amp frac x alpha 1 1 x beta 1 displaystyle int 0 1 u alpha 1 1 u beta 1 du 6pt amp frac Gamma alpha beta Gamma alpha Gamma beta x alpha 1 1 x beta 1 6pt amp frac 1 mathrm B alpha beta x alpha 1 1 x beta 1 end aligned where G z is the gamma function The beta function B displaystyle mathrm B is a normalization constant to ensure that the total probability is 1 In the above equations x is a realization an observed value that actually occurred of a random variable X This definition includes both ends x 0 and x 1 which is consistent with definitions for other continuous distributions supported on a bounded interval which are special cases of the beta distribution for example the arcsine distribution and consistent with several authors like N L Johnson and S Kotz 1 2 3 4 However the inclusion of x 0 and x 1 does not work for a b lt 1 accordingly several other authors including W Feller 5 6 7 choose to exclude the ends x 0 and x 1 so that the two ends are not actually part of the domain of the density function and consider instead 0 lt x lt 1 Several authors including N L Johnson and S Kotz 1 use the symbols p and q instead of a and b for the shape parameters of the beta distribution reminiscent 
of the symbols traditionally used for the parameters of the Bernoulli distribution, because the beta distribution approaches the Bernoulli distribution in the limit when both shape parameters α and β approach the value of zero.

In the following, a random variable X beta-distributed with parameters α and β will be denoted by:[8][9]

$$X \sim \operatorname{Beta}(\alpha, \beta)$$

Other notations for beta-distributed random variables used in the statistical literature are $X \sim \mathcal{B}e(\alpha, \beta)$[10] and $X \sim \beta_{\alpha, \beta}$.[5]

Cumulative distribution function

[Figure: CDF for symmetric beta distribution vs. x and α = β]
[Figure: CDF for skewed beta distribution vs. x and β = 5α]

The cumulative distribution function is

$$F(x; \alpha, \beta) = \frac{\mathrm{B}(x; \alpha, \beta)}{\mathrm{B}(\alpha, \beta)} = I_x(\alpha, \beta)$$

where $\mathrm{B}(x; \alpha, \beta)$ is the incomplete beta function and $I_x(\alpha, \beta)$ is the regularized incomplete beta function.

Alternative parameterizations

Two parameters

Mean and sample size

The beta distribution may also be reparameterized in terms of its mean μ (0 < μ < 1) and the sum of the two shape parameters ν = α + β > 0 ([9] p. 83). Denoting by $\alpha_{\text{Posterior}}$ and $\beta_{\text{Posterior}}$ the shape parameters of the posterior beta distribution resulting from applying Bayes theorem to a binomial likelihood function and a prior probability, the interpretation of the sum of both shape parameters as the sample size, $\nu = \alpha_{\text{Posterior}} + \beta_{\text{Posterior}}$, is only correct for the Haldane prior probability Beta(0, 0). Specifically, for the Bayes (uniform) prior Beta(1, 1) the correct interpretation would be sample size $= \alpha_{\text{Posterior}} + \beta_{\text{Posterior}} - 2$, or ν = (sample size) + 2. For sample size much larger than 2, the difference between these two priors becomes negligible. (See section Bayesian inference for further details.) ν = α + β is referred to as the "sample size" of a Beta distribution, but one should remember that
it is, strictly speaking, the "sample size" of a binomial likelihood function only when using a Haldane Beta(0, 0) prior in Bayes theorem.

This parametrization may be useful in Bayesian parameter estimation. For example, one may administer a test to a number of individuals. If it is assumed that each person's score (0 ≤ θ ≤ 1) is drawn from a population-level Beta distribution, then an important statistic is the mean of this population-level distribution. The mean and sample size parameters are related to the shape parameters α and β via[9]

$$\alpha = \mu\nu, \qquad \beta = (1 - \mu)\nu$$

Under this parametrization, one may place an uninformative prior probability over the mean, and a vague prior probability (such as an exponential or gamma distribution) over the positive reals for the sample size, if they are independent and prior data and/or beliefs justify it.

Mode and concentration

Concave beta distributions, which have $\alpha, \beta > 1$, can be parametrized in terms of mode and concentration. The mode, $\omega = \frac{\alpha - 1}{\alpha + \beta - 2}$, and concentration, $\kappa = \alpha + \beta$, can be used to define the usual shape parameters as follows:[11]

$$
\begin{aligned}
\alpha &= \omega(\kappa - 2) + 1 \\
\beta &= (1 - \omega)(\kappa - 2) + 1
\end{aligned}
$$

For the mode, $0 < \omega < 1$, to be well-defined, we need $\alpha, \beta > 1$, or equivalently $\kappa > 2$. If instead we define the concentration as $c = \alpha + \beta - 2$, the condition simplifies to $c > 0$ and the beta density at $\alpha = 1 + c\omega$ and $\beta = 1 + c(1 - \omega)$ can be written as:

$$f(x; \omega, c) = \frac{x^{c\omega}(1 - x)^{c(1 - \omega)}}{\mathrm{B}\bigl(1 + c\omega,\ 1 + c(1 - \omega)\bigr)}$$

where $c$ directly scales the sufficient statistics $\log(x)$ and $\log(1 - x)$. Note also that in the limit $c \to 0$, the distribution becomes flat.

Mean and variance

Solving the system of (coupled) equations given in the above sections as the equations for the mean and the variance of the beta distribution in terms of the original parameters α and β, one can express the α and β parameters in terms of the mean (μ) and the variance (var):

$$
\begin{aligned}
\nu &= \alpha + \beta = \frac{\mu(1 - \mu)}{\mathrm{var}} - 1, \text{ where } \nu = (\alpha + \beta) > 0, \text{ therefore: } \mathrm{var} < \mu(1 - \mu) \\
\alpha &= \mu\nu = \mu\left(\frac{\mu(1 - \mu)}{\mathrm{var}} - 1\right), \text{ if } \mathrm{var} < \mu(1 - \mu) \\
\beta &= (1 - \mu)\nu = (1 - \mu)\left(\frac{\mu(1 - \mu)}{\mathrm{var}} - 1\right), \text{ if } \mathrm{var} < \mu(1 - \mu)
\end{aligned}
$$

This parametrization of the beta distribution may lead to a more intuitive understanding than the one based on the original parameters α and β, for example by expressing the mode, skewness, excess kurtosis and differential entropy in terms of the mean and the variance.

Four parameters

A beta distribution with the two shape parameters α and β is supported on the range [0, 1] or (0, 1). It is possible to alter the location and scale of the distribution by introducing two further parameters representing the minimum, a, and maximum, c (c > a), values of the distribution,[1] by a linear transformation substituting the non-dimensional variable x in terms of the new variable y (with support [a, c] or (a, c)) and the parameters a and c:

$$y = x(c - a) + a, \text{ therefore } x = \frac{y - a}{c - a}$$

The probability density function of the four-parameter beta distribution is equal to the two-parameter distribution, scaled by the range (c − a) (so that the total area under the density curve equals a probability of one), and with the y variable shifted and scaled as follows:

$$f(y; \alpha, \beta, a, c) = \frac{f(x; \alpha, \beta)}{c - a} = \frac{\left(\frac{y - a}{c - a}\right)^{\alpha - 1}\left(\frac{c - y}{c - a}\right)^{\beta - 1}}{(c - a)\,\mathrm{B}(\alpha, \beta)} = \frac{(y - a)^{\alpha - 1}(c - y)^{\beta - 1}}{(c - a)^{\alpha + \beta - 1}\,\mathrm{B}(\alpha, \beta)}$$

That a random variable Y is Beta-distributed with four parameters α, β, a, and c will be denoted by:

$$Y \sim \operatorname{Beta}(\alpha, \beta, a, c)$$

Some measures of central location are scaled (by (c − a)) and shifted (by a), as follows:

$$
\begin{aligned}
\mu_Y &= \mu_X(c - a) + a = \left(\frac{\alpha}{\alpha + \beta}\right)(c - a) + a = \frac{\alpha c + \beta a}{\alpha + \beta} \\
\text{mode}(Y) &= \text{mode}(X)(c - a) + a = \left(\frac{\alpha - 1}{\alpha + \beta - 2}\right)(c - a) + a = \frac{(\alpha - 1)c + (\beta - 1)a}{\alpha + \beta - 2}, \qquad \text{if } \alpha, \beta > 1 \\
\text{median}(Y) &= \text{median}(X)(c - a) + a = \left(I_{\frac{1}{2}}^{[-1]}(\alpha, \beta)\right)(c - a) + a
\end{aligned}
$$

Note: the geometric mean and harmonic mean cannot be transformed by a linear transformation in the way that the mean, median and mode can.

The shape parameters of Y can be written in terms of its mean and variance as

$$
\begin{aligned}
\alpha &= \frac{(a - \mu_Y)(a\,c - a\,\mu_Y - c\,\mu_Y + \mu_Y^2 + \sigma_Y^2)}{\sigma_Y^2 (c - a)} \\
\beta &= -\frac{(c - \mu_Y)(a\,c - a\,\mu_Y - c\,\mu_Y + \mu_Y^2 + \sigma_Y^2)}{\sigma_Y^2 (c - a)}
\end{aligned}
$$

The statistical dispersion measures are scaled (they do not need to be shifted because they are already centered on the mean) by the range (c − a), linearly for the mean deviation and nonlinearly for the variance:

$$\text{(mean deviation around mean)}(Y) = \bigl(\text{(mean deviation around mean)}(X)\bigr)(c - a) = \frac{2\alpha^{\alpha}\beta^{\beta}}{\mathrm{B}(\alpha, \beta)(\alpha + \beta)^{\alpha + \beta + 1}}(c - a)$$

$$\text{var}(Y) = \text{var}(X)(c - a)^2 = \frac{\alpha\beta(c - a)^2}{(\alpha + \beta)^2(\alpha + \beta + 1)}$$

Since the skewness and excess
kurtosis are non-dimensional quantities (as moments centered on the mean and normalized by the standard deviation), they are independent of the parameters a and c, and therefore equal to the expressions given above in terms of X (with support [0, 1] or (0, 1)):

$$\text{skewness}(Y) = \text{skewness}(X) = \frac{2(\beta - \alpha)\sqrt{\alpha + \beta + 1}}{(\alpha + \beta + 2)\sqrt{\alpha\beta}}$$

$$\text{kurtosis excess}(Y) = \text{kurtosis excess}(X) = \frac{6\left[(\alpha - \beta)^2(\alpha + \beta + 1) - \alpha\beta(\alpha + \beta + 2)\right]}{\alpha\beta(\alpha + \beta + 2)(\alpha + \beta + 3)}$$

Properties

Measures of central tendency

Mode

The mode of a Beta distributed random variable X with α, β > 1 is the most likely value of the distribution (corresponding to the peak in the PDF), and is given by the following expression:[1]

$$\frac{\alpha - 1}{\alpha + \beta - 2}$$

When both parameters are less than one (α, β < 1), this is the anti-mode: the lowest point of the probability density curve.[3]

Letting α = β, the expression for the mode simplifies to 1/2, showing that for α = β > 1 the mode (resp. anti-mode when α, β < 1) is at the center of the distribution: it is symmetric in those cases. See the Shapes section in this article for a full list of mode cases, for arbitrary values of α and β. For several of these cases, the maximum value of the density function occurs at one or both ends. In some cases the (maximum) value of the density function occurring at the end is finite. For example, in the case of α = 2, β = 1 (or α = 1, β = 2), the density function becomes a right-triangle distribution which is finite at both ends. In several other cases there is a singularity at one end, where the value of the density function approaches infinity. For example, in the case α = β = 1/2, the Beta distribution simplifies to become the arcsine distribution. There is debate among mathematicians about some of these cases and whether the ends (x = 0 and x = 1) can be called modes or not.[6][8] The points at issue are whether the ends are part of the domain of the density function, whether a singularity can ever be called a mode, and whether cases with two maxima should be called bimodal.

[Figure: Mode for Beta distribution for 1 ≤ α ≤ 5 and 1 ≤ β ≤ 5]

Median

[Figure: Median for Beta distribution for 0 ≤ α ≤ 5 and 0 ≤ β ≤ 5]
[Figure: (Mean − Median) for Beta distribution versus alpha and beta from 0 to 2]

The median of the beta distribution is the unique real number $x = I_{\frac{1}{2}}^{[-1]}(\alpha, \beta)$ for which the regularized incomplete beta function $I_x(\alpha, \beta) = \tfrac{1}{2}$. There is no general closed-form expression for the median of the beta distribution for arbitrary values of α and β. Closed-form expressions for particular values of the parameters α and β follow:

For symmetric cases α = β, median = 1/2.
For α = 1 and β > 0, median $= 1 - 2^{-\frac{1}{\beta}}$ (this case is the mirror-image of the power function [0, 1] distribution).
For α > 0 and β = 1, median $= 2^{-\frac{1}{\alpha}}$ (this case is the power function [0, 1] distribution[6]).
For α = 3 and β = 2, median = 0.6142724318676105..., the real solution to the quartic equation $1 - 8x^3 + 6x^4 = 0$ which lies in [0, 1].
For α = 2 and β = 3, median = 0.38572756813238945... = 1 − median(Beta(3, 2)).

The following are the limits with one parameter finite (non-zero) and the other approaching these limits:

$$
\begin{aligned}
\lim_{\beta \to 0} \text{median} &= \lim_{\alpha \to \infty} \text{median} = 1, \\
\lim_{\alpha \to 0} \text{median} &= \lim_{\beta \to \infty} \text{median} = 0.
\end{aligned}
$$

A reasonable approximation of the value of the median of the beta distribution, for both α and β greater than or equal to one, is given by the formula[12]

$$\text{median} \approx \frac{\alpha - \tfrac{1}{3}}{\alpha + \beta - \tfrac{2}{3}} \text{ for } \alpha, \beta \geq 1.$$

When α, β ≥ 1, the relative error (the absolute error divided by the median) in this approximation is less than 4%, and for both α ≥ 2 and β ≥ 2 it is less than 1%. The
absolute error divided by the difference between the mean and the mode is similarly small.

Mean

[Figure: Mean for Beta distribution for 0 ≤ α ≤ 5 and 0 ≤ β ≤ 5]

The expected value (mean) (μ) of a Beta distribution random variable X with two parameters α and β is a function of only the ratio β/α of these parameters:[1]

$$
\begin{aligned}
\mu = \operatorname{E}[X] &= \int_0^1 x f(x; \alpha, \beta)\,dx \\
&= \int_0^1 x\, \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{\mathrm{B}(\alpha, \beta)}\,dx \\
&= \frac{\alpha}{\alpha + \beta} \\
&= \frac{1}{1 + \frac{\beta}{\alpha}}
\end{aligned}
$$

Letting α = β in the above expression one obtains μ = 1/2, showing that for α = β the mean is at the center of the distribution: it is symmetric. Also, the following limits can be obtained from the above expression:

$$
\begin{aligned}
\lim_{\frac{\beta}{\alpha} \to 0} \mu &= 1 \\
\lim_{\frac{\beta}{\alpha} \to \infty} \mu &= 0
\end{aligned}
$$

Therefore, for β/α → 0, or for α/β → ∞, the mean is located at the right end, x = 1. For these limit ratios, the beta distribution becomes a one-point degenerate distribution with a Dirac delta function spike at the right end, x = 1, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the right end, x = 1.

Similarly, for β/α → ∞, or for α/β → 0, the mean is located at the left end, x = 0. The beta distribution becomes a one-point degenerate distribution with a Dirac delta function spike at the left end, x = 0, with probability 1, and zero probability everywhere else. There is 100% probability (absolute certainty) concentrated at the left end, x = 0. Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

$$
\begin{aligned}
\lim_{\beta \to 0} \mu &= \lim_{\alpha \to \infty} \mu = 1 \\
\lim_{\alpha \to 0} \mu &= \lim_{\beta \to \infty} \mu = 0
\end{aligned}
$$

While for typical unimodal distributions (with centrally located modes, inflexion points at both sides of the mode, and longer tails) (with Beta(α, β) such that α, β > 2) it is known that the sample mean (as an estimate of location) is not as robust as the sample median, the opposite is the case for uniform or "U-shaped" bimodal distributions (with Beta(α, β) such that α, β ≤ 1), with the modes located at the ends of the distribution. As Mosteller and Tukey remark ([13] p. 207), "the average of the two extreme observations uses all the sample information. This illustrates how, for short-tailed distributions, the extreme observations should get more weight." By contrast, it follows that the median of "U-shaped" bimodal distributions with modes at the edge of the distribution (with Beta(α, β) such that α, β ≤ 1) is not robust, as the sample median drops the extreme sample observations from consideration. A practical application of this occurs, for example, for random walks: since the probability for the time of the last visit to the origin in a random walk is distributed as the arcsine distribution Beta(1/2, 1/2),[5][14] the mean of a number of realizations of a random walk is a much more robust estimator than the median (which is an inappropriate sample measure estimate in this case).

Geometric mean

[Figure: (Mean − GeometricMean) for Beta distribution versus α and β from 0 to 2, showing the asymmetry between α and β for the geometric mean]
[Figure: Geometric means for Beta distribution. Purple = G(x), Yellow = G(1 − x), smaller values α and β in front]
[Figure: Geometric means for Beta distribution. Purple = G(x), Yellow = G(1 − x), larger values α and β in front]

The logarithm of the geometric mean $G_X$ of a distribution with random variable X is the arithmetic mean of ln(X), or, equivalently, its expected value:

$$\ln G_X = \operatorname{E}[\ln X]$$

For a beta distribution, the expected value integral gives:

$$
\begin{aligned}
\operatorname{E}[\ln X] &= \int_0^1 \ln x\, f(x; \alpha, \beta)\,dx \\
&= \int_0^1 \ln x\, \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{\mathrm{B}(\alpha, \beta)}\,dx \\
&= \frac{1}{\mathrm{B}(\alpha, \beta)} \int_0^1 \frac{\partial\, x^{\alpha - 1}(1 - x)^{\beta - 1}}{\partial \alpha}\,dx \\
&= \frac{1}{\mathrm{B}(\alpha, \beta)} \frac{\partial}{\partial \alpha} \int_0^1 x^{\alpha - 1}(1 - x)^{\beta - 1}\,dx \\
&= \frac{1}{\mathrm{B}(\alpha, \beta)} \frac{\partial \mathrm{B}(\alpha, \beta)}{\partial \alpha} \\
&= \frac{\partial \ln \mathrm{B}(\alpha, \beta)}{\partial \alpha} \\
&= \frac{\partial \ln \Gamma(\alpha)}{\partial \alpha} - \frac{\partial \ln \Gamma(\alpha + \beta)}{\partial \alpha} \\
&= \psi(\alpha) - \psi(\alpha + \beta)
\end{aligned}
$$

where ψ is the digamma function.

Therefore, the geometric mean of a beta distribution with shape parameters α and β is the exponential of the digamma functions of α and β as follows:

$$G_X = e^{\operatorname{E}[\ln X]} = e^{\psi(\alpha) - \psi(\alpha + \beta)}$$

While for a beta distribution with equal shape parameters α = β it follows that skewness = 0 and mode = mean = median = 1/2, the geometric mean is less than 1/2: $0 < G_X < 1/2$. The reason for this is that the logarithmic transformation strongly weights the values of X close to zero, as ln(X) strongly tends towards negative infinity as X approaches zero, while ln(X) flattens towards zero as X → 1.

Along a line α = β, the following limits apply:

$$
\begin{aligned}
&\lim_{\alpha = \beta \to 0} G_X = 0 \\
&\lim_{\alpha = \beta \to \infty} G_X = \tfrac{1}{2}
\end{aligned}
$$

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

$$
\begin{aligned}
\lim_{\beta \to 0} G_X &= \lim_{\alpha \to \infty} G_X = 1 \\
\lim_{\alpha \to 0} G_X &= \lim_{\beta \to \infty} G_X = 0
\end{aligned}
$$

The accompanying plot shows the difference between the mean and the geometric mean for shape parameters α and β from zero to 2. Besides the fact that the difference between them approaches zero as α and β approach infinity and that the difference becomes large for values of α and β approaching zero, one can observe an evident asymmetry of the geometric mean with respect to the shape
parameters α and β. The difference between the geometric mean and the mean is larger for small values of α in relation to β than when exchanging the magnitudes of β and α.

N. L. Johnson and S. Kotz[1] suggest the logarithmic approximation to the digamma function ψ(α) ≈ ln(α − 1/2), which results in the following approximation to the geometric mean:

$$G_X \approx \frac{\alpha - \frac{1}{2}}{\alpha + \beta - \frac{1}{2}} \text{ if } \alpha, \beta > 1.$$

Numerical values for the relative error in this approximation follow: (α = β = 1): 9.39%; (α = β = 2): 1.29%; (α = 2, β = 3): 1.51%; (α = 3, β = 2): 0.44%; (α = β = 3): 0.51%; (α = β = 4): 0.26%; (α = 3, β = 4): 0.55%; (α = 4, β = 3): 0.24%.

Similarly, one can calculate the value of shape parameters required for the geometric mean to equal 1/2. Given the value of the parameter β, what would be the value of the other parameter, α, required for the geometric mean to equal 1/2? The answer is that (for β > 1) the value of α required tends towards β + 1/2 as β → ∞. For example, all these couples have the same geometric mean of 1/2: (β = 1, α = 1.4427), (β = 2, α = 2.46958), (β = 3, α = 3.47943), (β = 4, α = 4.48449), (β = 5, α = 5.48756), (β = 10, α = 10.4938), (β = 100, α = 100.499).

The fundamental property of the geometric mean, which can be proven to be false for any other mean, is

$$G\left(\frac{X_i}{Y_i}\right) = \frac{G(X_i)}{G(Y_i)}$$

This makes the geometric mean the only correct mean when averaging normalized results, that is, results that are presented as ratios to reference values.[15] This is relevant because the beta distribution is a suitable model for the random behavior of percentages and it is particularly suitable to the statistical modelling of proportions. The geometric mean plays a central role in maximum likelihood estimation; see section "Parameter estimation, maximum likelihood". When performing maximum likelihood estimation, besides the geometric mean $G_X$ based on the random variable X, another geometric mean also appears naturally: the geometric mean based on the linear transformation (1 − X), the mirror-image of X, denoted by $G_{(1-X)}$:

$$G_{(1-X)} = e^{\operatorname{E}[\ln(1 - X)]} = e^{\psi(\beta) - \psi(\alpha + \beta)}$$

Along a line α = β, the following limits apply:

$$
\begin{aligned}
&\lim_{\alpha = \beta \to 0} G_{(1-X)} = 0 \\
&\lim_{\alpha = \beta \to \infty} G_{(1-X)} = \tfrac{1}{2}
\end{aligned}
$$

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

$$
\begin{aligned}
\lim_{\beta \to 0} G_{(1-X)} &= \lim_{\alpha \to \infty} G_{(1-X)} = 0 \\
\lim_{\alpha \to 0} G_{(1-X)} &= \lim_{\beta \to \infty} G_{(1-X)} = 1
\end{aligned}
$$

It has the following approximate value:

$$G_{(1-X)} \approx \frac{\beta - \frac{1}{2}}{\alpha + \beta - \frac{1}{2}} \text{ if } \alpha, \beta > 1.$$

Although both $G_X$ and $G_{(1-X)}$ are asymmetric, in the case that both shape parameters are equal, α = β, the geometric means are equal: $G_X = G_{(1-X)}$. This equality follows from the following symmetry displayed between both geometric means:

$$G_X(\mathrm{B}(\alpha, \beta)) = G_{(1-X)}(\mathrm{B}(\beta, \alpha)).$$

Harmonic mean

[Figure: Harmonic mean for beta distribution for 0 < α < 5 and 0 < β < 5]
[Figure: Harmonic mean for beta distribution versus α and β from 0 to 2]
[Figure: Harmonic means for beta distribution. Purple = H(X), Yellow = H(1 − X), smaller values α and β in front]
[Figure: Harmonic means for beta distribution. Purple = H(X), Yellow = H(1 − X), larger values α and β in front]

The inverse of the harmonic mean ($H_X$) of a distribution with random variable X is the arithmetic mean of 1/X, or, equivalently, its expected value. Therefore, the harmonic mean ($H_X$) of a beta distribution with shape parameters α and β is:

$$
\begin{aligned}
H_X &= \frac{1}{\operatorname{E}\left[\frac{1}{X}\right]} \\
&= \frac{1}{\int_0^1 \frac{f(x; \alpha, \beta)}{x}\,dx} \\
&= \frac{1}{\int_0^1 \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{x\,\mathrm{B}(\alpha, \beta)}\,dx} \\
&= \frac{\alpha - 1}{\alpha + \beta - 1} \text{ if } \alpha > 1 \text{ and } \beta > 0
\end{aligned}
$$

The harmonic mean ($H_X$)
of a Beta distribution with α < 1 is undefined, because its defining expression is not bounded in [0, 1] for shape parameter α less than unity.

Letting α = β in the above expression one obtains

$$H_X = \frac{\alpha - 1}{2\alpha - 1},$$

showing that for α = β the harmonic mean ranges from 0, for α = β = 1, to 1/2, for α = β → ∞.

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

$$
\begin{aligned}
&\lim_{\alpha \to 0} H_X \text{ is undefined} \\
&\lim_{\alpha \to 1} H_X = \lim_{\beta \to \infty} H_X = 0 \\
&\lim_{\beta \to 0} H_X = \lim_{\alpha \to \infty} H_X = 1
\end{aligned}
$$

The harmonic mean plays a role in maximum likelihood estimation for the four-parameter case, in addition to the geometric mean. When performing maximum likelihood estimation for the four-parameter case, besides the harmonic mean $H_X$ based on the random variable X, another harmonic mean also appears naturally: the harmonic mean based on the linear transformation (1 − X), the mirror-image of X, denoted by $H_{1-X}$:

$$H_{1-X} = \frac{1}{\operatorname{E}\left[\frac{1}{1 - X}\right]} = \frac{\beta - 1}{\alpha + \beta - 1} \text{ if } \beta > 1 \text{ and } \alpha > 0.$$

The harmonic mean ($H_{(1-X)}$) of a Beta distribution with β < 1 is undefined, because its defining expression is not bounded in [0, 1] for shape parameter β less than unity.

Letting α = β in the above expression one obtains

$$H_{(1-X)} = \frac{\beta - 1}{2\beta - 1},$$

showing that for α = β the harmonic mean ranges from 0, for α = β = 1, to 1/2, for α = β → ∞.

Following are the limits with one parameter finite (non-zero) and the other approaching these limits:

$$
\begin{aligned}
&\lim_{\beta \to 0} H_{1-X} \text{ is undefined} \\
&\lim_{\beta \to 1} H_{1-X} = \lim_{\alpha \to \infty} H_{1-X} = 0 \\
&\lim_{\alpha \to 0} H_{1-X} = \lim_{\beta \to \infty} H_{1-X} = 1
\end{aligned}
$$

Although both $H_X$ and $H_{1-X}$ are asymmetric,
in the case that both shape parameters are equal, α = β, the harmonic means are equal: $H_X = H_{1-X}$. This equality follows from the following symmetry displayed between both harmonic means:

$$H_X(\mathrm{B}(\alpha, \beta)) = H_{1-X}(\mathrm{B}(\beta, \alpha)) \text{ if } \alpha, \beta > 1.$$

Measures of statistical dispersion

Variance

The variance (the second moment centered on the mean) of a Beta distribution random variable X with parameters α and β is:[1][16]

$$\operatorname{var}(X) = \operatorname{E}[(X - \mu)^2] = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$$

Letting α = β in the above expression one obtains

$$\operatorname{var}(X) = \frac{1}{4(2\beta + 1)},$$

showing that for α = β the variance decreases monotonically as α = β increases. Setting α = β = 0 in this expression, one finds the maximum variance var(X) = 1/4,[1] which only occurs approaching the limit, at α = β = 0.

The beta distribution may also be parametrized in terms of its mean μ (0 < μ < 1) and sample size ν = α + β (ν > 0) (see subsection Mean and sample size):

$$
\begin{aligned}
\alpha &= \mu\nu, \text{ where } \nu = (\alpha + \beta) > 0 \\
\beta &= (1 - \mu)\nu, \text{ where } \nu = (\alpha + \beta) > 0.
\end{aligned}
$$

Using this parametrization, one can express the variance in terms of the mean μ and the sample size ν as follows:

$$\operatorname{var}(X) = \frac{\mu(1 - \mu)}{1 + \nu}$$

Since ν = α + β > 0, it follows that var(X) < μ(1 − μ).

For a symmetric distribution, the mean is at the middle of the distribution, μ = 1/2, and therefore:

$$\operatorname{var}(X) = \frac{1}{4(1 + \nu)} \text{ if } \mu = \tfrac{1}{2}$$

Also, the following limits (with only the noted variable approaching the limit) can be obtained from the above expressions:

$$
\begin{aligned}
&\lim_{\beta \to 0} \operatorname{var}(X) = \lim_{\alpha \to 0} \operatorname{var}(X) = \lim_{\beta \to \infty} \operatorname{var}(X) = \lim_{\alpha \to \infty} \operatorname{var}(X) = \lim_{\nu \to \infty} \operatorname{var}(X) = \lim_{\mu \to 0} \operatorname{var}(X) = \lim_{\mu \to 1} \operatorname{var}(X) = 0 \\
&\lim_{\nu \to 0} \operatorname{var}(X) = \mu(1 - \mu)
\end{aligned}
$$

Geometric variance and covariance

[Figure: log geometric variances vs. α and β (two panels)]

The logarithm of the geometric variance, $\ln(\operatorname{var}_{GX})$, of a distribution with random variable X is the second moment of the logarithm of X centered on the geometric mean of X, $\ln(G_X)$:

$$
\begin{aligned}
\ln \operatorname{var}_{GX} &= \operatorname{E}\left[(\ln X - \ln G_X)^2\right] \\
&= \operatorname{E}\left[(\ln X - \operatorname{E}[\ln X])^2\right] \\
&= \operatorname{E}\left[(\ln X)^2\right] - (\operatorname{E}[\ln X])^2 \\
&= \operatorname{var}[\ln X]
\end{aligned}
$$

and therefore, the geometric variance is:

$$\operatorname{var}_{GX} = e^{\operatorname{var}[\ln X]}$$

In the Fisher information matrix, and the curvature of the log likelihood function, the logarithm of the geometric variance of the reflected variable 1 − X and the logarithm of the geometric covariance between X and 1 − X appear:

$$
\begin{aligned}
\ln \operatorname{var}_{G(1-X)} &= \operatorname{E}\left[(\ln(1 - X) - \ln G_{1-X})^2\right] \\
&= \operatorname{E}\left[(\ln(1 - X) - \operatorname{E}[\ln(1 - X)])^2\right] \\
&= \operatorname{E}\left[(\ln(1 - X))^2\right] - (\operatorname{E}[\ln(1 - X)])^2 \\
&= \operatorname{var}[\ln(1 - X)] \\
\operatorname{var}_{G(1-X)} &= e^{\operatorname{var}[\ln(1 - X)]} \\
\ln \operatorname{cov}_{G\,X,1-X} &= \operatorname{E}\left[(\ln X - \ln G_X)(\ln(1 - X) - \ln G_{1-X})\right] \\
&= \operatorname{E}\left[(\ln X - \operatorname{E}[\ln X])(\ln(1 - X) - \operatorname{E}[\ln(1 - X)])\right] \\
&= \operatorname{E}\left[\ln X \ln(1 - X)\right] - \operatorname{E}[\ln X]\operatorname{E}[\ln(1 - X)] \\
&= \operatorname{cov}[\ln X, \ln(1 - X)] \\
\operatorname{cov}_{G\,X,(1-X)} &= e^{\operatorname{cov}[\ln X, \ln(1 - X)]}
\end{aligned}
$$

For a beta distribution, higher order logarithmic moments can be derived by using the representation of a beta distribution as a proportion of two Gamma distributions and differentiating through the integral. They can be expressed in terms of higher order poly-gamma functions. See the section "Moments of logarithmically transformed random variables". The variance of the logarithmic variables and covariance of ln X and ln(1 − X) are:

$$\operatorname{var}[\ln X] = \psi_1(\alpha) - \psi_1(\alpha + \beta)$$

$$\operatorname{var}[\ln(1 - X)] = \psi_1(\beta) - \psi_1(\alpha + \beta)$$

$$\operatorname{cov}[\ln X, \ln(1 - X)] = -\psi_1(\alpha + \beta)$$

where the trigamma function, denoted $\psi_1(\alpha)$, is the second of the polygamma functions, and is defined as the derivative of the digamma function:

$$\psi_1(\alpha) = \frac{d^2 \ln \Gamma(\alpha)}{d\alpha^2} = \frac{d\psi(\alpha)}{d\alpha}.$$

Therefore:

$$\ln \operatorname{var}_{GX} = \operatorname{var}[\ln X] = \psi_1(\alpha) - \psi_1(\alpha + \beta)$$

$$\ln \operatorname{var}_{G(1-X)} = \operatorname{var}[\ln(1 - X)] = \psi_1(\beta) - \psi_1(\alpha + \beta)$$

$$\ln \operatorname{cov}_{GX,1-X} = \operatorname{cov}[\ln X, \ln(1 - X)] = -\psi_1(\alpha + \beta)$$

The accompanying plots show the log geometric variances and log geometric covariance versus the shape parameters α and β. The plots show that the log geometric variances and log geometric covariance are close to zero for shape parameters α and β greater than 2, and that the log geometric variances rapidly rise in value for shape parameter values α and β less than unity. The log geometric variances are positive for all values of the shape parameters. The log geometric covariance is negative for all values of the shape parameters, and it reaches large negative values for α and β less than unity.

Following are
the limits with one parameter finite non zero and the other approaching these limits lim a 0 ln var G X lim b 0 ln var G 1 X lim b 0 ln var G X lim a ln var G X lim a 0 ln var G 1 X lim b ln var G 1 X lim a ln cov G X 1 X lim b ln cov G X 1 X 0 lim b ln var G X ps 1 a lim a ln var G 1 X ps 1 b lim a 0 ln cov G X 1 X ps 1 b lim b 0 ln cov G X 1 X ps 1 a displaystyle begin aligned amp lim alpha to 0 ln operatorname var GX lim beta to 0 ln operatorname var G 1 X infty amp lim beta to 0 ln operatorname var GX lim alpha to infty ln operatorname var GX lim alpha to 0 ln operatorname var G 1 X lim beta to infty ln operatorname var G 1 X lim alpha to infty ln operatorname cov GX 1 X lim beta to infty ln operatorname cov GX 1 X 0 amp lim beta to infty ln operatorname var GX psi 1 alpha amp lim alpha to infty ln operatorname var G 1 X psi 1 beta amp lim alpha to 0 ln operatorname cov GX 1 X psi 1 beta amp lim beta to 0 ln operatorname cov GX 1 X psi 1 alpha end aligned Limits with two parameters varying lim a lim b ln var G X lim b lim a ln var G 1 X lim a lim b 0 ln cov G X 1 X lim b lim a 0 ln cov G X 1 X 0 lim a lim b 0 ln var G X lim b lim a 0 ln var G 1 X lim a 0 lim b 0 ln cov G X 1 X lim b 0 lim a 0 ln cov G X 1 X displaystyle begin aligned amp lim alpha to infty lim beta to infty ln operatorname var GX lim beta to infty lim alpha to infty ln operatorname var G 1 X lim alpha to infty lim beta to 0 ln operatorname cov GX 1 X lim beta to infty lim alpha to 0 ln operatorname cov GX 1 X 0 amp lim alpha to infty lim beta to 0 ln operatorname var GX lim beta to infty lim alpha to 0 ln operatorname var G 1 X infty amp lim alpha to 0 lim beta to 0 ln operatorname cov GX 1 X lim beta to 0 lim alpha to 0 ln operatorname cov GX 1 X infty end aligned Although both ln varGX and ln varG 1 X are asymmetric when the shape parameters are equal a b one has ln varGX ln varG 1 X This equality follows from the following symmetry displayed between both log geometric variances ln var G X B 
a b ln var G 1 X B b a displaystyle ln operatorname var GX mathrm B alpha beta ln operatorname var G 1 X mathrm B beta alpha The log geometric covariance is symmetric ln cov G X 1 X B a b ln cov G X 1 X B b a displaystyle ln operatorname cov GX 1 X mathrm B alpha beta ln operatorname cov GX 1 X mathrm B beta alpha Mean absolute deviation around the mean Edit Ratio of Mean Abs Dev to Std Dev for Beta distribution with a and b ranging from 0 to 5 Ratio of Mean Abs Dev to Std Dev for Beta distribution with mean 0 m 1 and sample size 0 lt n 10 The mean absolute deviation around the mean for the beta distribution with shape parameters a and b is 6 E X E X 2 a a b b B a b a b a b 1 displaystyle operatorname E X E X frac 2 alpha alpha beta beta mathrm B alpha beta alpha beta alpha beta 1 The mean absolute deviation around the mean is a more robust estimator of statistical dispersion than the standard deviation for beta distributions with tails and inflection points at each side of the mode Beta a b distributions with a b gt 2 as it depends on the linear absolute deviations rather than the square deviations from the mean Therefore the effect of very large deviations from the mean are not as overly weighted Using Stirling s approximation to the Gamma function N L Johnson and S Kotz 1 derived the following approximation for values of the shape parameters greater than unity the relative error for this approximation is only 3 5 for a b 1 and it decreases to zero as a b mean abs dev from mean standard deviation E X E X var X 2 p 1 7 12 a b 1 12 a 1 12 b if a b gt 1 displaystyle begin aligned frac text mean abs dev from mean text standard deviation amp frac operatorname E X E X sqrt operatorname var X amp approx sqrt frac 2 pi left 1 frac 7 12 alpha beta frac 1 12 alpha frac 1 12 beta right text if alpha beta gt 1 end aligned At the limit a b the ratio of the mean absolute deviation to the standard deviation for the beta distribution becomes equal to the ratio of the same 
measures for the normal distribution 2 p displaystyle sqrt frac 2 pi For a b 1 this ratio equals 3 2 displaystyle frac sqrt 3 2 so that from a b 1 to a b the ratio decreases by 8 5 For a b 0 the standard deviation is exactly equal to the mean absolute deviation around the mean Therefore this ratio decreases by 15 from a b 0 to a b 1 and by 25 from a b 0 to a b However for skewed beta distributions such that a 0 or b 0 the ratio of the standard deviation to the mean absolute deviation approaches infinity although each of them individually approaches zero because the mean absolute deviation approaches zero faster than the standard deviation Using the parametrization in terms of mean m and sample size n a b gt 0 a mn b 1 m none can express the mean absolute deviation around the mean in terms of the mean m and the sample size n as follows E X E X 2 m m n 1 m 1 m n n B m n 1 m n displaystyle operatorname E X E X frac 2 mu mu nu 1 mu 1 mu nu nu mathrm B mu nu 1 mu nu For a symmetric distribution the mean is at the middle of the distribution m 1 2 and therefore E X E X 2 1 n n B n 2 n 2 2 1 n G n n G n 2 2 lim n 0 lim m 1 2 E X E X 1 2 lim n lim m 1 2 E X E X 0 displaystyle begin aligned operatorname E X E X frac 2 1 nu nu mathrm B tfrac nu 2 tfrac nu 2 amp frac 2 1 nu Gamma nu nu Gamma tfrac nu 2 2 lim nu to 0 left lim mu to frac 1 2 operatorname E X E X right amp tfrac 1 2 lim nu to infty left lim mu to frac 1 2 operatorname E X E X right amp 0 end aligned Also the following limits with only the noted variable approaching the limit can be obtained from the above expressions lim b 0 E X E X lim a 0 E X E X 0 lim b E X E X lim a E X E X 0 lim m 0 E X E X lim m 1 E X E X 0 lim n 0 E X E X m 1 m lim n E X E X 0 displaystyle begin aligned lim beta to 0 operatorname E X E X amp lim alpha to 0 operatorname E X E X 0 lim beta to infty operatorname E X E X amp lim alpha to infty operatorname E X E X 0 lim mu to 0 operatorname E X E X amp lim mu to 1 operatorname E X E X 0 lim nu 
to 0 operatorname E X E X amp sqrt mu 1 mu lim nu to infty operatorname E X E X amp 0 end aligned Mean absolute difference Edit The mean absolute difference for the Beta distribution is M D 0 1 0 1 f x a b f y a b x y d x d y 4 a b B a b a b B a a B b b displaystyle mathrm MD int 0 1 int 0 1 f x alpha beta f y alpha beta x y dx dy left frac 4 alpha beta right frac B alpha beta alpha beta B alpha alpha B beta beta The Gini coefficient for the Beta distribution is half of the relative mean absolute difference G 2 a B a b a b B a a B b b displaystyle mathrm G left frac 2 alpha right frac B alpha beta alpha beta B alpha alpha B beta beta Skewness Edit Skewness for Beta Distribution as a function of variance and mean The skewness the third moment centered on the mean normalized by the 3 2 power of the variance of the beta distribution is 1 g 1 E X m 3 var X 3 2 2 b a a b 1 a b 2 a b displaystyle gamma 1 frac operatorname E X mu 3 operatorname var X 3 2 frac 2 beta alpha sqrt alpha beta 1 alpha beta 2 sqrt alpha beta Letting a b in the above expression one obtains g1 0 showing once again that for a b the distribution is symmetric and hence the skewness is zero Positive skew right tailed for a lt b negative skew left tailed for a gt b Using the parametrization in terms of mean m and sample size n a b a m n where n a b gt 0 b 1 m n where n a b gt 0 displaystyle begin aligned alpha amp mu nu text where nu alpha beta gt 0 beta amp 1 mu nu text where nu alpha beta gt 0 end aligned one can express the skewness in terms of the mean m and the sample size n as follows g 1 E X m 3 var X 3 2 2 1 2 m 1 n 2 n m 1 m displaystyle gamma 1 frac operatorname E X mu 3 operatorname var X 3 2 frac 2 1 2 mu sqrt 1 nu 2 nu sqrt mu 1 mu The skewness can also be expressed just in terms of the variance var and the mean m as follows g 1 E X m 3 var X 3 2 2 1 2 m var m 1 m var if var lt m 1 m displaystyle gamma 1 frac operatorname E X mu 3 operatorname var X 3 2 frac 2 1 2 mu sqrt text var mu 1 mu 
operatorname var text if operatorname var lt mu 1 mu The accompanying plot of skewness as a function of variance and mean shows that maximum variance 1 4 is coupled with zero skewness and the symmetry condition m 1 2 and that maximum skewness positive or negative infinity occurs when the mean is located at one end or the other so that the mass of the probability distribution is concentrated at the ends minimum variance The following expression for the square of the skewness in terms of the sample size n a b and the variance var is useful for the method of moments estimation of four parameters g 1 2 E X m 3 2 var X 3 4 2 n 2 1 var 4 1 n displaystyle gamma 1 2 frac operatorname E X mu 3 2 operatorname var X 3 frac 4 2 nu 2 bigg frac 1 text var 4 1 nu bigg This expression correctly gives a skewness of zero for a b since in that case see Variance var 1 4 1 n displaystyle operatorname var frac 1 4 1 nu For the symmetric case a b skewness 0 over the whole range and the following limits apply lim a b 0 g 1 lim a b g 1 lim n 0 g 1 lim n g 1 lim m 1 2 g 1 0 displaystyle lim alpha beta to 0 gamma 1 lim alpha beta to infty gamma 1 lim nu to 0 gamma 1 lim nu to infty gamma 1 lim mu to frac 1 2 gamma 1 0 For the asymmetric cases a b the following limits with only the noted variable approaching the limit can be obtained from the above expressions lim a 0 g 1 lim m 0 g 1 lim b 0 g 1 lim m 1 g 1 lim a g 1 2 b lim b 0 lim a g 1 lim b lim a g 1 0 lim b g 1 2 a lim a 0 lim b g 1 lim a lim b g 1 0 lim n 0 g 1 1 2 m m 1 m lim m 0 lim n 0 g 1 lim m 1 lim n 0 g 1 displaystyle begin aligned amp lim alpha to 0 gamma 1 lim mu to 0 gamma 1 infty amp lim beta to 0 gamma 1 lim mu to 1 gamma 1 infty amp lim alpha to infty gamma 1 frac 2 sqrt beta quad lim beta to 0 lim alpha to infty gamma 1 infty quad lim beta to infty lim alpha to infty gamma 1 0 amp lim beta to infty gamma 1 frac 2 sqrt alpha quad lim alpha to 0 lim beta to infty gamma 1 infty quad lim alpha to infty lim beta to infty gamma 
1 0 amp lim nu to 0 gamma 1 frac 1 2 mu sqrt mu 1 mu quad lim mu to 0 lim nu to 0 gamma 1 infty quad lim mu to 1 lim nu to 0 gamma 1 infty end aligned Kurtosis Edit Excess Kurtosis for Beta Distribution as a function of variance and mean The beta distribution has been applied in acoustic analysis to assess damage to gears as the kurtosis of the beta distribution has been reported to be a good indicator of the condition of a gear 17 Kurtosis has also been used to distinguish the seismic signal generated by a person s footsteps from other signals As persons or other targets moving on the ground generate continuous signals in the form of seismic waves one can separate different targets based on the seismic waves they generate Kurtosis is sensitive to impulsive signals so it s much more sensitive to the signal generated by human footsteps than other signals generated by vehicles winds noise etc 18 Unfortunately the notation for kurtosis has not been standardized Kenney and Keeping 19 use the symbol g2 for the excess kurtosis but Abramowitz and Stegun 20 use different terminology To prevent confusion 21 between kurtosis the fourth moment centered on the mean normalized by the square of the variance and excess kurtosis when using symbols they will be spelled out as follows 6 7 excess kurtosis kurtosis 3 E X m 4 var X 2 3 6 a 3 a 2 2 b 1 b 2 b 1 2 a b b 2 a b a b 2, wikipedia, wiki, book, books, library,
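
The closed-form expressions for the mean absolute deviation, skewness and excess kurtosis given in the preceding sections can be cross-checked numerically. The following Python sketch is an illustration only and is not taken from the cited sources. All function names are hypothetical. It assumes the standard raw-moment product formula E[X^k] = ∏ (α + r)/(α + β + r) for r = 0, ..., k − 1, and it takes the excess-kurtosis denominator to be αβ(α + β + 2)(α + β + 3). The numerical integration assumes α, β ≥ 1 so that the density stays bounded.

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b), computed through log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def raw_moment(a, b, k):
    """E[X^k] for X ~ Beta(a, b): product of (a + r) / (a + b + r), r = 0..k-1."""
    m = 1.0
    for r in range(k):
        m *= (a + r) / (a + b + r)
    return m


def central_moments(a, b):
    """Return (mean, variance, 3rd central moment, 4th central moment)."""
    m1, m2, m3, m4 = (raw_moment(a, b, k) for k in range(1, 5))
    var = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return m1, var, mu3, mu4


def skewness(a, b):
    """Closed-form skewness in terms of the shape parameters."""
    return 2 * (b - a) * math.sqrt(a + b + 1) / ((a + b + 2) * math.sqrt(a * b))


def excess_kurtosis(a, b):
    """Closed-form excess kurtosis (kurtosis minus 3)."""
    num = 6 * (a**3 - a**2 * (2 * b - 1) + b**2 * (b + 1) - 2 * a * b * (b + 2))
    den = a * b * (a + b + 2) * (a + b + 3)
    return num / den


def mean_abs_deviation(a, b):
    """Closed-form mean absolute deviation around the mean."""
    return 2 * a**a * b**b / (beta_fn(a, b) * (a + b) ** (a + b + 1))


def mean_abs_deviation_numeric(a, b, n=100_000):
    """Midpoint-rule estimate of E|X - E[X]|; assumes a, b >= 1."""
    mean = a / (a + b)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += abs(x - mean) * x ** (a - 1) * (1 - x) ** (b - 1)
    return total * h / beta_fn(a, b)


def mad_to_sd_ratio_approx(a, b):
    """Johnson-Kotz approximation to MAD / standard deviation (a, b > 1)."""
    return math.sqrt(2 / math.pi) * (
        1 + 7 / (12 * (a + b)) - 1 / (12 * a) - 1 / (12 * b)
    )


if __name__ == "__main__":
    a, b = 2.0, 5.0  # arbitrary example values
    _, var, mu3, mu4 = central_moments(a, b)
    print("skewness: closed form", skewness(a, b), "moments", mu3 / var**1.5)
    print("excess kurtosis: closed form", excess_kurtosis(a, b),
          "moments", mu4 / var**2 - 3)
    mad = mean_abs_deviation(a, b)
    print("MAD: closed form", mad, "numeric", mean_abs_deviation_numeric(a, b))
    print("MAD/SD: exact", mad / math.sqrt(var),
          "approx", mad_to_sd_ratio_approx(a, b))
```

Comparing each closed form against an independent route (central moments built from raw moments, or direct integration) is a design choice that catches sign and exponent slips in the formulas. Any shape parameters satisfying the stated assumptions can replace the example values.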
