
Bayes estimator

In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function (i.e., the posterior expected loss). Equivalently, it maximizes the posterior expectation of a utility function. An alternative way of formulating an estimator within Bayesian statistics is maximum a posteriori estimation.

Definition

Suppose an unknown parameter $\theta$ is known to have a prior distribution $\pi$. Let $\widehat{\theta} = \widehat{\theta}(x)$ be an estimator of $\theta$ (based on some measurements $x$), and let $L(\theta, \widehat{\theta})$ be a loss function, such as squared error. The Bayes risk of $\widehat{\theta}$ is defined as $E_\pi\big(L(\theta, \widehat{\theta})\big)$, where the expectation is taken over the probability distribution of $\theta$: this defines the risk function as a function of $\widehat{\theta}$. An estimator $\widehat{\theta}$ is said to be a Bayes estimator if it minimizes the Bayes risk among all estimators. Equivalently, the estimator which minimizes the posterior expected loss $E\big(L(\theta, \widehat{\theta}) \mid x\big)$ for each $x$ also minimizes the Bayes risk and therefore is a Bayes estimator.[1]

If the prior is improper then an estimator which minimizes the posterior expected loss for each $x$ is called a generalized Bayes estimator.[2]

Examples

Minimum mean square error estimation

The most common risk function used for Bayesian estimation is the mean square error (MSE), also called squared error risk. The MSE is defined by

$$\mathrm{MSE} = E\left[\big(\widehat{\theta}(x) - \theta\big)^2\right],$$

where the expectation is taken over the joint distribution of $\theta$ and $x$.

Posterior mean

Using the MSE as risk, the Bayes estimate of the unknown parameter is simply the mean of the posterior distribution,[3]

$$\widehat{\theta}(x) = E[\theta \mid x] = \int \theta \, p(\theta \mid x) \, d\theta.$$

This is known as the minimum mean square error (MMSE) estimator.
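As a concrete illustration (not part of the original article), the posterior mean can be approximated by numerical integration on a grid for a simple one-dimensional model; the normal likelihood and standard-normal prior below are assumptions chosen only to make the sketch runnable.

```python
import numpy as np

def posterior_mean(x, likelihood, prior, grid):
    """Approximate E[theta | x] by numerical integration over a grid of theta values."""
    weights = likelihood(x, grid) * prior(grid)   # unnormalized posterior p(x|theta) p(theta)
    weights = weights / weights.sum()             # normalize on the grid
    return (grid * weights).sum()                 # discrete approximation of the integral

# Assumed example model: x | theta ~ N(theta, 1), theta ~ N(0, 1).
likelihood = lambda x, t: np.exp(-0.5 * (x - t) ** 2)
prior = lambda t: np.exp(-0.5 * t ** 2)

grid = np.linspace(-10, 10, 2001)
print(posterior_mean(2.0, likelihood, prior, grid))  # ~1.0, matching the closed form x/2
```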

Bayes estimators for conjugate priors

If there is no inherent reason to prefer one prior probability distribution over another, a conjugate prior is sometimes chosen for simplicity. A conjugate prior is defined as a prior distribution belonging to some parametric family, for which the resulting posterior distribution also belongs to the same family. This is an important property, since the Bayes estimator, as well as its statistical properties (variance, confidence interval, etc.), can all be derived from the posterior distribution.

Conjugate priors are especially useful for sequential estimation, where the posterior of the current measurement is used as the prior in the next measurement. In sequential estimation, unless a conjugate prior is used, the posterior distribution typically becomes more complex with each added measurement, and the Bayes estimator cannot usually be calculated without resorting to numerical methods.

Following are some examples of conjugate priors; a short numerical sketch of these update rules appears after the list.

  • If $x|\theta$ is normal, $x|\theta \sim N(\theta, \sigma^2)$, and the prior is normal, $\theta \sim N(\mu, \tau^2)$, then the posterior is also normal and the Bayes estimator under MSE is given by
    $$\widehat{\theta}(x) = \frac{\sigma^2}{\sigma^2 + \tau^2}\,\mu + \frac{\tau^2}{\sigma^2 + \tau^2}\,x.$$
  • If $x_1, \ldots, x_n$ are iid Poisson random variables, $x_i|\theta \sim P(\theta)$, and if the prior is Gamma distributed, $\theta \sim G(a, b)$, then the posterior is also Gamma distributed, and the Bayes estimator under MSE is given by
    $$\widehat{\theta}(X) = \frac{n\overline{X} + a}{n + b}.$$
  • If $x_1, \ldots, x_n$ are iid uniformly distributed, $x_i|\theta \sim U(0, \theta)$, and if the prior is Pareto distributed, $\theta \sim Pa(\theta_0, a)$, then the posterior is also Pareto distributed, and the Bayes estimator under MSE is given by
    $$\widehat{\theta}(X) = \frac{(a+n)\max(\theta_0, x_1, \ldots, x_n)}{a+n-1}.$$
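A short numerical sketch (not from the article) of these three closed-form update rules follows; the data values and hyperparameters are assumptions chosen only for illustration.

```python
import numpy as np

def normal_bayes(x, sigma2, mu, tau2):
    """Posterior mean for x | theta ~ N(theta, sigma2) with prior theta ~ N(mu, tau2)."""
    return sigma2 / (sigma2 + tau2) * mu + tau2 / (sigma2 + tau2) * x

def poisson_gamma_bayes(xs, a, b):
    """Posterior mean for iid Poisson data with a Gamma(a, b) prior (b = rate parameter)."""
    xs = np.asarray(xs)
    return (xs.sum() + a) / (len(xs) + b)

def uniform_pareto_bayes(xs, theta0, a):
    """Posterior mean for iid U(0, theta) data with a Pareto(theta0, a) prior."""
    n = len(xs)
    return (a + n) * max(theta0, *xs) / (a + n - 1)

print(normal_bayes(x=3.0, sigma2=1.0, mu=0.0, tau2=4.0))        # shrinks 3.0 toward 0: 2.4
print(poisson_gamma_bayes([2, 4, 3], a=1.0, b=1.0))             # (9 + 1) / (3 + 1) = 2.5
print(uniform_pareto_bayes([0.8, 1.7, 1.2], theta0=1.0, a=2.0)) # 5 * 1.7 / 4 = 2.125
```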
 

Alternative risk functions

Risk functions are chosen depending on how one measures the distance between the estimate and the unknown parameter. The MSE is the most common risk function in use, primarily due to its simplicity. However, alternative risk functions are also occasionally used. The following are several examples of such alternatives. We denote the posterior generalized distribution function by $F$.

Posterior median and other quantiles

  • A "linear" loss function, with $a > 0$, which yields the posterior median as the Bayes estimate:
    $$L(\theta, \widehat{\theta}) = a|\theta - \widehat{\theta}|,$$
    $$F(\widehat{\theta}(x) \mid X) = \tfrac{1}{2}.$$
  • Another "linear" loss function, which assigns different "weights" $a, b > 0$ to overestimation and underestimation. It yields a quantile of the posterior distribution, and is a generalization of the previous loss function (a numerical sketch follows the list):
    $$L(\theta, \widehat{\theta}) = \begin{cases} a|\theta - \widehat{\theta}| & \text{for } \theta - \widehat{\theta} \geq 0, \\ b|\theta - \widehat{\theta}| & \text{for } \theta - \widehat{\theta} < 0, \end{cases}$$
    $$F(\widehat{\theta}(x) \mid X) = \frac{a}{a+b}.$$
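A minimal sketch (not from the article) of computing such posterior quantiles on a grid follows; the toy normal posterior and the quantile routine are assumptions made only for illustration.

```python
import numpy as np

def posterior_quantile(grid, posterior, q):
    """Return the q-quantile of a posterior evaluated on a grid; under the asymmetric
    linear loss above, the Bayes estimate is the quantile with q = a / (a + b)."""
    cdf = np.cumsum(posterior)
    cdf = cdf / cdf[-1]                      # normalize the discrete CDF
    return grid[np.searchsorted(cdf, q)]

# Assumed toy posterior for illustration: theta | x ~ N(1, 1), tabulated on a grid.
grid = np.linspace(-5, 7, 4001)
posterior = np.exp(-0.5 * (grid - 1.0) ** 2)

print(posterior_quantile(grid, posterior, 0.5))          # posterior median, ~1.0
print(posterior_quantile(grid, posterior, 3 / (3 + 1)))  # quantile for a=3, b=1, ~1.67
```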
 
 

Posterior mode

  • The following loss function is trickier: it yields either the posterior mode, or a point close to it, depending on the curvature and properties of the posterior distribution. Small values of the parameter $K > 0$ are recommended, in order to use the mode as an approximation ($L > 0$); a small sketch follows:
    $$L(\theta, \widehat{\theta}) = \begin{cases} 0 & \text{for } |\theta - \widehat{\theta}| < K, \\ L & \text{for } |\theta - \widehat{\theta}| \geq K. \end{cases}$$
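A small grid-search sketch (not from the article) of this loss follows; it assumes an unnormalized Beta(2, 3)-shaped toy posterior and shows that, for small $K$, the resulting estimate lands near the posterior mode (1/3 in this case).

```python
import numpy as np

def window_loss_estimate(grid, posterior, K):
    """Bayes estimate under the 0-L window loss above: choose the point whose
    +/- K neighborhood captures the most posterior mass."""
    step = grid[1] - grid[0]
    half_width = max(1, int(round(K / step)))
    mass = np.convolve(posterior, np.ones(2 * half_width + 1), mode="same")
    return grid[np.argmax(mass)]

# Assumed toy posterior: unnormalized Beta(2, 3) density, whose mode is 1/3.
grid = np.linspace(1e-6, 1 - 1e-6, 5001)
posterior = grid * (1 - grid) ** 2

print(window_loss_estimate(grid, posterior, K=0.01))  # close to 1/3
print(grid[np.argmax(posterior)])                     # the exact grid mode, for comparison
```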
 

Other loss functions can be conceived, although the mean squared error is the most widely used and validated; alternative loss functions are employed in statistics, particularly in robust statistics.

Generalized Bayes estimators

The prior distribution $p$ has thus far been assumed to be a true probability distribution, in that

$$\int p(\theta) \, d\theta = 1.$$

However, occasionally this can be a restrictive requirement. For example, there is no distribution (covering the set, R, of all real numbers) for which every real number is equally likely. Yet, in some sense, such a "distribution" seems like a natural choice for a non-informative prior, i.e., a prior distribution which does not imply a preference for any particular value of the unknown parameter. One can still define a function $p(\theta) = 1$, but this would not be a proper probability distribution since it has infinite mass,

$$\int p(\theta) \, d\theta = \infty.$$

Such measures $p(\theta)$, which are not probability distributions, are referred to as improper priors.

The use of an improper prior means that the Bayes risk is undefined (since the prior is not a probability distribution and we cannot take an expectation under it). As a consequence, it is no longer meaningful to speak of a Bayes estimator that minimizes the Bayes risk. Nevertheless, in many cases, one can define the posterior distribution

$$p(\theta \mid x) = \frac{p(x \mid \theta) \, p(\theta)}{\int p(x \mid \theta) \, p(\theta) \, d\theta}.$$

This is a definition, and not an application of Bayes' theorem, since Bayes' theorem can only be applied when all distributions are proper. However, it is not uncommon for the resulting "posterior" to be a valid probability distribution. In this case, the posterior expected loss

$$\int L(\theta, a) \, p(\theta \mid x) \, d\theta$$

is typically well-defined and finite. Recall that, for a proper prior, the Bayes estimator minimizes the posterior expected loss. When the prior is improper, an estimator which minimizes the posterior expected loss is referred to as a generalized Bayes estimator.[2]
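As a numerical aside (not from the article), the fact that an improper prior can still yield a proper posterior is easy to check; the flat prior $p(\theta) = 1$ and the $N(\theta, 1)$ likelihood below are assumptions chosen only for this sketch.

```python
import numpy as np

# With the improper flat prior p(theta) = 1 and a N(theta, 1) likelihood (assumed
# for this sketch), the unnormalized "posterior" has finite mass, so it normalizes
# to a proper N(x, 1) density; the generalized Bayes estimator under squared error
# loss is then simply x itself.
x = 2.7
grid = np.linspace(x - 10, x + 10, 4001)
dtheta = grid[1] - grid[0]

unnorm = np.exp(-0.5 * (x - grid) ** 2) * 1.0   # likelihood times flat prior
posterior = unnorm / (unnorm.sum() * dtheta)    # proper density on the grid

print((grid * posterior).sum() * dtheta)        # ~2.7: the posterior mean equals x
```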

Example

A typical example is estimation of a location parameter with a loss function of the type $L(a - \theta)$. Here $\theta$ is a location parameter, i.e., $p(x \mid \theta) = f(x - \theta)$.

It is common to use the improper prior $p(\theta) = 1$ in this case, especially when no other more subjective information is available. This yields

$$p(\theta \mid x) = \frac{p(x \mid \theta) \, p(\theta)}{p(x)} = \frac{f(x - \theta)}{p(x)},$$

so the posterior expected loss is

$$E[L(a - \theta) \mid x] = \int L(a - \theta) \, p(\theta \mid x) \, d\theta = \frac{1}{p(x)} \int L(a - \theta) \, f(x - \theta) \, d\theta.$$

The generalized Bayes estimator is the value $a(x)$ that minimizes this expression for a given $x$. This is equivalent to minimizing

$$\int L(a - \theta) \, f(x - \theta) \, d\theta \quad \text{for a given } x. \qquad (1)$$

In this case it can be shown that the generalized Bayes estimator has the form $x + a_0$, for some constant $a_0$. To see this, let $a_0$ be the value minimizing (1) when $x = 0$. Then, given a different value $x_1$, we must minimize

$$\int L(a - \theta) \, f(x_1 - \theta) \, d\theta = \int L(a - x_1 - \theta') \, f(-\theta') \, d\theta', \qquad (2)$$

where the equality follows from the substitution $\theta' = \theta - x_1$. This is identical to (1), except that $a$ has been replaced by $a - x_1$. Thus, the minimizing value satisfies $a - x_1 = a_0$, so that the optimal estimator has the form

$$a(x) = a_0 + x.$$
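The constant-offset form can be checked numerically; the sketch below (not from the article) assumes squared error loss $L(t) = t^2$ and an Exponential(1) noise density $f$, and verifies that the minimizer of (1) differs from $x$ by the same constant for every $x$.

```python
import numpy as np

# Assumptions for this sketch: L(t) = t**2 and f is the Exponential(1) density,
# so that p(x | theta) = f(x - theta).
theta_grid = np.linspace(-30.0, 30.0, 20001)   # integration grid for theta
dtheta = theta_grid[1] - theta_grid[0]
a_grid = np.linspace(-10.0, 10.0, 2001)        # candidate estimates a

def expected_loss(a, x):
    f = np.where(x - theta_grid >= 0, np.exp(-(x - theta_grid)), 0.0)  # f(x - theta)
    return ((a - theta_grid) ** 2 * f).sum() * dtheta

for x in (0.0, 1.5, 4.0):
    losses = np.array([expected_loss(a, x) for a in a_grid])
    best_a = a_grid[np.argmin(losses)]
    print(x, best_a - x)   # the offset a0 is about -1 for every x (the mean of Exp(1) is 1)
```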

 

Empirical Bayes estimators

A Bayes estimator derived through the empirical Bayes method is called an empirical Bayes estimator. Empirical Bayes methods enable the use of auxiliary empirical data, from observations of related parameters, in the development of a Bayes estimator. This is done under the assumption that the estimated parameters are obtained from a common prior. For example, if independent observations of different parameters are performed, then the estimation performance of a particular parameter can sometimes be improved by using data from other observations.

There are both parametric and non-parametric approaches to empirical Bayes estimation.[4]

Example

The following is a simple example of parametric empirical Bayes estimation. Given past observations $x_1, \ldots, x_n$ having conditional distribution $f(x_i \mid \theta_i)$, one is interested in estimating $\theta_{n+1}$ based on $x_{n+1}$. Assume that the $\theta_i$'s have a common prior $\pi$ which depends on unknown parameters. For example, suppose that $\pi$ is normal with unknown mean $\mu_\pi$ and variance $\sigma_\pi^2$. We can then use the past observations to determine the mean and variance of $\pi$ in the following way.

First, we estimate the mean $\mu_m$ and variance $\sigma_m^2$ of the marginal distribution of $x_1, \ldots, x_n$ using the maximum likelihood approach:

$$\widehat{\mu}_m = \frac{1}{n} \sum_i x_i,$$
$$\widehat{\sigma}_m^2 = \frac{1}{n} \sum_i (x_i - \widehat{\mu}_m)^2.$$

Next, we use the law of total expectation to compute $\mu_m$ and the law of total variance to compute $\sigma_m^2$ such that

$$\mu_m = E_\pi[\mu_f(\theta)],$$
$$\sigma_m^2 = E_\pi[\sigma_f^2(\theta)] + E_\pi[(\mu_f(\theta) - \mu_m)^2],$$

where $\mu_f(\theta)$ and $\sigma_f(\theta)$ are the moments of the conditional distribution $f(x_i \mid \theta_i)$, which are assumed to be known. In particular, suppose that $\mu_f(\theta) = \theta$ and that $\sigma_f^2(\theta) = K$; we then have

$$\mu_\pi = \mu_m,$$
$$\sigma_\pi^2 = \sigma_m^2 - \sigma_f^2 = \sigma_m^2 - K.$$

Finally, we obtain the estimated moments of the prior,

$$\widehat{\mu}_\pi = \widehat{\mu}_m,$$
$$\widehat{\sigma}_\pi^2 = \widehat{\sigma}_m^2 - K.$$

For example, if $x_i \mid \theta_i \sim N(\theta_i, 1)$, and if we assume a normal prior (which is a conjugate prior in this case), we conclude that $\theta_{n+1} \sim N(\widehat{\mu}_\pi, \widehat{\sigma}_\pi^2)$, from which the Bayes estimator of $\theta_{n+1}$ based on $x_{n+1}$ can be calculated.
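A minimal sketch of this recipe (not part of the article) is given below; it assumes the normal case just described, with known observation variance $K$, and the clipping of the prior-variance estimate at zero is an extra safeguard added for the code rather than something stated in the text.

```python
import numpy as np

def empirical_bayes_normal(past_x, new_x, K=1.0):
    """Parametric empirical Bayes estimate of theta_{n+1}, assuming
    x_i | theta_i ~ N(theta_i, K) with K known (K = 1 in the text's example)."""
    past_x = np.asarray(past_x)
    mu_m = past_x.mean()                      # estimated marginal mean
    sigma_m2 = past_x.var()                   # estimated marginal variance (1/n form)
    mu_pi = mu_m                              # estimated prior mean
    sigma_pi2 = max(sigma_m2 - K, 1e-12)      # estimated prior variance, clipped at zero
    # Normal-normal posterior mean for the new observation:
    return (K * mu_pi + sigma_pi2 * new_x) / (K + sigma_pi2)

rng = np.random.default_rng(0)
thetas = rng.normal(5.0, 2.0, size=500)         # simulated "true" parameters
xs = rng.normal(thetas, 1.0)                    # one noisy observation per parameter
print(empirical_bayes_normal(xs[:-1], xs[-1]))  # shrinks the last x toward roughly 5.0
```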

Properties

Admissibility

Bayes rules having finite Bayes risk are typically admissible. The following are some specific examples of admissibility theorems.

  • If a Bayes rule is unique then it is admissible.[5] For example, as stated above, under mean squared error (MSE) the Bayes rule is unique and therefore admissible.
  • If θ belongs to a discrete set, then all Bayes rules are admissible.
  • If θ belongs to a continuous (non-discrete) set, and if the risk function R(θ,δ) is continuous in θ for every δ, then all Bayes rules are admissible.

By contrast, generalized Bayes rules often have undefined Bayes risk in the case of improper priors. These rules are often inadmissible and the verification of their admissibility can be difficult. For example, the generalized Bayes estimator of a location parameter θ based on Gaussian samples (described in the "Generalized Bayes estimators" section above) is inadmissible for dimension $p > 2$; this is known as Stein's phenomenon.

Asymptotic efficiency

Let θ be an unknown random variable, and suppose that $x_1, x_2, \ldots$ are iid samples with density $f(x_i \mid \theta)$. Let $\delta_n = \delta_n(x_1, \ldots, x_n)$ be a sequence of Bayes estimators of θ based on an increasing number of measurements. We are interested in analyzing the asymptotic performance of this sequence of estimators, i.e., the performance of $\delta_n$ for large n.

To this end, it is customary to regard θ as a deterministic parameter whose true value is $\theta_0$. Under specific conditions,[6] for large samples (large values of n), the posterior density of θ is approximately normal. In other words, for large n, the effect of the prior probability on the posterior is negligible. Moreover, if $\delta_n$ is the Bayes estimator under MSE risk, then it is asymptotically unbiased and it converges in distribution to the normal distribution:

$$\sqrt{n}(\delta_n - \theta_0) \to N\left(0, \frac{1}{I(\theta_0)}\right),$$

where $I(\theta_0)$ is the Fisher information of $\theta_0$. It follows that the Bayes estimator $\delta_n$ under MSE is asymptotically efficient.

Another estimator which is asymptotically normal and efficient is the maximum likelihood estimator (MLE). The relations between the maximum likelihood and Bayes estimators can be shown in the following simple example.

Example: estimating p in a binomial distribution

Consider the estimator of θ based on a binomial sample x ~ b(θ, n), where θ denotes the probability of success. Assuming θ is distributed according to the conjugate prior, which in this case is the Beta distribution B(a, b), the posterior distribution is known to be B(a + x, b + n − x). Thus, the Bayes estimator under MSE is

$$\delta_n(x) = E[\theta \mid x] = \frac{a + x}{a + b + n}.$$

The MLE in this case is $x/n$, and so we get

$$\delta_n(x) = \frac{a + b}{a + b + n} E[\theta] + \frac{n}{a + b + n} \delta_{\mathrm{MLE}}.$$

The last equation implies that, for n → ∞, the Bayes estimator (in the described problem) is close to the MLE.
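A small numerical check (not from the article) of this relation is sketched below; the prior parameters a = b = 2 and the observed success fraction are assumptions chosen only for illustration.

```python
def bayes_binomial(x, n, a, b):
    """Posterior mean of theta for x ~ Binomial(n, theta) with a Beta(a, b) prior."""
    return (a + x) / (a + b + n)

a, b = 2.0, 2.0                            # assumed prior: Beta(2, 2), prior mean 0.5
for n in (5, 50, 5000):
    x = int(0.3 * n)                       # pretend we observed 30% successes
    mle = x / n
    prior_weight = (a + b) / (a + b + n)   # weight of the prior mean in the estimate
    blended = prior_weight * (a / (a + b)) + (1 - prior_weight) * mle
    print(n, bayes_binomial(x, n, a, b), blended, mle)  # first two columns always agree
```

For n = 5 the estimate is pulled noticeably toward the prior mean 0.5, while for n = 5000 it is essentially the MLE, matching the limit described above.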

On the other hand, when n is small, the prior information is still relevant to the decision problem and affects the estimate. To see the relative weight of the prior information, assume that a = b; in this case each measurement brings in 1 new bit of information; the formula above shows that the prior information has the same weight as a + b bits of the new information. In applications, one often knows very little about fine details of the prior distribution; in particular, there is no reason to assume that it coincides with B(a, b) exactly. In such a case, one possible interpretation of this calculation is: "there is a non-pathological prior distribution with the mean value 0.5 and the standard deviation d which gives the weight of prior information equal to 1/(4d²) − 1 bits of new information."

Another example of the same phenomenon is the case when the prior estimate and a measurement are normally distributed. If the prior is centered at $B$ with deviation $\Sigma$, and the measurement is centered at $b$ with deviation $\sigma$, then the posterior is centered at $\frac{\alpha}{\alpha+\beta}B + \frac{\beta}{\alpha+\beta}b$, with the weights in this weighted average being $\alpha = \sigma^2$, $\beta = \Sigma^2$. Moreover, the squared posterior deviation is $\Sigma^2\sigma^2/(\Sigma^2+\sigma^2)$. In other words, the prior is combined with the measurement in exactly the same way as if it were an extra measurement to take into account.

For example, if Σ = σ/2, then the deviation of 4 measurements combined together matches the deviation of the prior (assuming that errors of measurements are independent). And the weights α, β in the formula for the posterior match this: the weight of the prior is 4 times the weight of the measurement. Combining this prior with n measurements with average $v$ results in the posterior centered at $\frac{4}{4+n}B + \frac{n}{4+n}v$; in particular, the prior plays the same role as 4 measurements made in advance. In general, the prior has the weight of $(\sigma/\Sigma)^2$ measurements.
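The "prior as extra measurements" reading can be verified with a few lines of arithmetic; the concrete numbers below (Σ = 1, σ = 2, and the measurement values) are assumptions chosen only for this sketch.

```python
# Assumed values: Sigma = sigma / 2, so the prior should count as (sigma / Sigma)**2 = 4
# measurements.
Sigma, sigma = 1.0, 2.0
B = 10.0                                   # prior center
measurements = [11.0, 12.0, 9.0, 14.0, 13.0, 10.0]

n = len(measurements)
v = sum(measurements) / n                  # average of the n measurements

weight = (sigma / Sigma) ** 2              # 4.0: effective number of prior "measurements"
posterior_center = (weight * B + n * v) / (weight + n)

print(posterior_center)                          # precision-weighted combination
print((4 * B + sum(measurements)) / (4 + n))     # same number: 4 pseudo-measurements at B
```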

Compare to the example of binomial distribution: there the prior has the weight of (σ/Σ)²−1 measurements. One can see that the exact weight does depend on the details of the distribution, but when σ≫Σ, the difference becomes small.

Practical example of Bayes estimators

The Internet Movie Database uses a formula for calculating and comparing the ratings of films by its users, including their Top Rated 250 Titles, which is claimed to give "a true Bayesian estimate".[7] The following Bayesian formula was initially used to calculate a weighted average score for the Top 250, though the formula has since changed:

$$W = \frac{Rv + Cm}{v + m}$$

where:

$W$ = weighted rating
$R$ = average rating for the movie as a number from 1 to 10 (mean) = (Rating)
$v$ = number of votes/ratings for the movie = (votes)
$m$ = weight given to the prior estimate (in this case, the number of votes IMDb deemed necessary for the average rating to approach statistical validity)
$C$ = the mean vote across the whole pool (currently 7.0)

Note that W is just the weighted arithmetic mean of R and C with weight vector (v, m). As the number of ratings surpasses m, the confidence of the average rating surpasses the confidence of the mean vote for all films (C), and the weighted Bayesian rating (W) approaches a straight average (R). The closer v (the number of ratings for the film) is to zero, the closer W is to C. In simpler terms, the fewer ratings/votes cast for a film, the more its weighted rating skews towards the average across all films, while a film with many ratings/votes has a weighted rating approaching its pure arithmetic average.
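A minimal sketch of the formula (not IMDb's actual implementation) is shown below; the values used for m and C are placeholders for illustration, not IMDb's current parameters.

```python
def weighted_rating(R, v, m=25000.0, C=7.0):
    """Weighted (Bayesian) rating W = (R*v + C*m) / (v + m).
    m = 25000 and C = 7.0 are placeholder values, not IMDb's actual parameters."""
    return (R * v + C * m) / (v + m)

print(weighted_rating(R=10.0, v=3))        # a few perfect votes: W stays near C = 7.0
print(weighted_rating(R=9.2, v=500000))    # many votes: W is close to the raw average 9.2
```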

IMDb's approach ensures that a film with only a few ratings, all at 10, would not rank above The Godfather, for example, which has a 9.2 average from over 500,000 ratings.

See also

  • Recursive Bayesian estimation
  • Generalized expected utility

Notes

  1. ^ Lehmann and Casella, Theorem 4.1.1
  2. ^ a b Lehmann and Casella, Definition 4.2.9
  3. ^ Jaynes, E.T. (2007). Probability Theory: The Logic of Science (5. print. ed.). Cambridge [u.a.]: Cambridge Univ. Press. p. 172. ISBN 978-0-521-59271-0.
  4. ^ Berger (1980), section 4.5.
  5. ^ Lehmann and Casella (1998), Theorem 5.2.4.
  6. ^ Lehmann and Casella (1998), section 6.8
  7. ^ IMDb Top 250

References

  • Lehmann, E. L.; Casella, G. (1998). Theory of Point Estimation (2nd ed.). Springer. ISBN 0-387-98502-6.
  • Berger, James O. (1985). Statistical Decision Theory and Bayesian Analysis (2nd ed.). New York: Springer-Verlag. ISBN 0-387-96098-8. MR 0804611.

External links

  • "Bayesian estimator", Encyclopedia of Mathematics, EMS Press, 2001 [1994]
