Wikipedia

Jeffreys prior

In Bayesian probability, the Jeffreys prior, named after Sir Harold Jeffreys,[1] is a non-informative prior distribution for a parameter space; its density function is proportional to the square root of the determinant of the Fisher information matrix:

$$p(\vec\theta) \propto \sqrt{\det \mathcal{I}(\vec\theta)}.$$

It has the key feature that it is invariant under a change of coordinates for the parameter vector $\vec\theta$. That is, the relative probability assigned to a volume of a probability space using a Jeffreys prior will be the same regardless of the parameterization used to define the Jeffreys prior. This makes it of special interest for use with scale parameters.[2] As a concrete example, a Bernoulli distribution can be parametrized by the probability of occurrence p, or by the odds p/(1 − p). A naive uniform prior in this case is not invariant to this reparametrization, but the Jeffreys prior is.
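The Bernoulli example can be checked numerically. The sketch below is illustrative only: the two intervals, the helper names, and the midpoint-rule integrator are assumptions chosen for the demonstration. It compares the relative prior mass of two events when the prior is built in terms of p and when it is built in terms of the odds. A flat prior gives different answers in the two parametrizations, whereas the Jeffreys prior gives the same answer.

```python
import math

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def odds(p):
    """Map a probability p to the odds p / (1 - p)."""
    return p / (1.0 - p)

def jeffreys_in_p(p):
    """Unnormalized Jeffreys density for p: sqrt(I(p)) = 1 / sqrt(p (1 - p))."""
    return 1.0 / math.sqrt(p * (1.0 - p))

def jeffreys_in_odds(o):
    """Unnormalized Jeffreys density for the odds o: 1 / (sqrt(o) (1 + o))."""
    return 1.0 / (math.sqrt(o) * (1.0 + o))

def flat(_):
    """A naive constant density, used in either parametrization."""
    return 1.0

# Two events stated in terms of p, and the same events stated in terms of the odds.
A, B = (0.2, 0.5), (0.5, 0.8)
A_odds, B_odds = tuple(map(odds, A)), tuple(map(odds, B))

# Relative mass of B versus A under each prior and parametrization.
flat_ratio_p = integrate(flat, *B) / integrate(flat, *A)
flat_ratio_odds = integrate(flat, *B_odds) / integrate(flat, *A_odds)
jeffreys_ratio_p = integrate(jeffreys_in_p, *B) / integrate(jeffreys_in_p, *A)
jeffreys_ratio_odds = integrate(jeffreys_in_odds, *B_odds) / integrate(jeffreys_in_odds, *A_odds)

print(flat_ratio_p, flat_ratio_odds)          # the two flat-prior ratios disagree
print(jeffreys_ratio_p, jeffreys_ratio_odds)  # the two Jeffreys ratios agree
```

Ratios are used instead of absolute probabilities because a flat prior on the odds is improper and cannot be normalized.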

In maximum likelihood estimation of exponential family models, penalty terms based on the Jeffreys prior were shown to reduce asymptotic bias in point estimates.[3][4]

Reparameterization

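One-parameter case

If $\theta$ and $\varphi$ are two possible parametrizations of a statistical model, and $\theta$ is a continuously differentiable function of $\varphi$, we say that the prior $p_\theta(\theta)$ is "invariant" under a reparametrization if

$$p_\varphi(\varphi) = p_\theta(\theta) \left| \frac{d\theta}{d\varphi} \right|,$$

that is, if the priors $p_\theta(\theta)$ and $p_\varphi(\varphi)$ are related by the usual change of variables theorem.

Since the Fisher information transforms under reparametrization as

$$I_\varphi(\varphi) = I_\theta(\theta) \left( \frac{d\theta}{d\varphi} \right)^2,$$

defining the priors as $p_\varphi(\varphi) \propto \sqrt{I_\varphi(\varphi)}$ and $p_\theta(\theta) \propto \sqrt{I_\theta(\theta)}$ gives us the desired "invariance".[5]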
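The transformation rule for the Fisher information can be verified numerically. The following sketch assumes a Bernoulli model with success probability θ and a hypothetical second parametrization by the log-odds φ, so that θ = 1/(1 + e^(−φ)) and dθ/dφ = θ(1 − θ). The Fisher information is computed directly from its definition in each parametrization, using a central finite difference for the score.

```python
import math

def sigmoid(phi):
    """theta as a function of the log-odds phi."""
    return 1.0 / (1.0 + math.exp(-phi))

def log_pmf_theta(x, theta):
    """Bernoulli log-probability of outcome x in {0, 1}, parametrized by theta."""
    return math.log(theta) if x == 1 else math.log(1.0 - theta)

def log_pmf_phi(x, phi):
    """The same model, parametrized by the log-odds phi."""
    return log_pmf_theta(x, sigmoid(phi))

def fisher_info(log_pmf, param, outcomes=(0, 1), eps=1e-5):
    """E[(d/dparam log f)^2], with the score taken by central difference."""
    total = 0.0
    for x in outcomes:
        score = (log_pmf(x, param + eps) - log_pmf(x, param - eps)) / (2 * eps)
        total += math.exp(log_pmf(x, param)) * score ** 2
    return total

phi = 0.7
theta = sigmoid(phi)
dtheta_dphi = theta * (1.0 - theta)

info_phi = fisher_info(log_pmf_phi, phi)                             # I_phi(phi), computed directly
info_via_rule = fisher_info(log_pmf_theta, theta) * dtheta_dphi ** 2  # I_theta(theta) (dtheta/dphi)^2

print(info_phi, info_via_rule)
# Taking square roots gives sqrt(I_phi) = sqrt(I_theta) |dtheta/dphi|,
# which is exactly the change-of-variables relation between the two priors.
```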

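Multiple-parameter case

Analogous to the one-parameter case, let $\vec\theta$ and $\vec\varphi$ be two possible parametrizations of a statistical model, with $\vec\theta$ a continuously differentiable function of $\vec\varphi$. We call the prior $p_\theta(\vec\theta)$ "invariant" under reparametrization if

$$p_\varphi(\vec\varphi) = p_\theta(\vec\theta) \left| \det J \right|,$$

where $J$ is the Jacobian matrix with entries

$$J_{ij} = \frac{\partial \theta_i}{\partial \varphi_j}.$$

Since the Fisher information matrix transforms under reparametrization as

$$I_\varphi(\vec\varphi) = J^T I_\theta(\vec\theta) J,$$

we have that

$$\det I_\varphi(\vec\varphi) = \det I_\theta(\vec\theta) \, (\det J)^2,$$

and thus defining the priors as $p_\varphi(\vec\varphi) \propto \sqrt{\det I_\varphi(\vec\varphi)}$ and $p_\theta(\vec\theta) \propto \sqrt{\det I_\theta(\vec\theta)}$ gives us the desired "invariance".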
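As a small numerical illustration of the matrix identity, the sketch below takes a Gaussian model with $\vec\theta = (\mu, \sigma)$, whose Fisher information matrix is the standard result diag(1/σ², 2/σ²) (stated here as an assumption, since it is not derived in this article), and a made-up reparametrization $\vec\varphi = (a, b)$ with μ = a + b and σ = e^b. The helper functions for 2×2 matrices are written out by hand to keep the example dependency-free.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def fisher_theta(mu, sigma):
    """Fisher information of N(mu, sigma^2) in the coordinates (mu, sigma)."""
    return [[1.0 / sigma ** 2, 0.0], [0.0, 2.0 / sigma ** 2]]

def theta_of_phi(a, b):
    """Hypothetical reparametrization: mu = a + b, sigma = exp(b)."""
    return a + b, math.exp(b)

def jacobian(a, b):
    """J_ij = d theta_i / d phi_j for the map above."""
    return [[1.0, 1.0], [0.0, math.exp(b)]]

a, b = 0.3, -0.4
mu, sigma = theta_of_phi(a, b)
J = jacobian(a, b)
I_theta = fisher_theta(mu, sigma)
I_phi = matmul(matmul(transpose(J), I_theta), J)   # J^T I_theta J

prior_theta = math.sqrt(det(I_theta))   # unnormalized Jeffreys density in theta
prior_phi = math.sqrt(det(I_phi))       # unnormalized Jeffreys density in phi

print(prior_phi, prior_theta * abs(det(J)))   # the two values coincide
```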

Attributes

From a practical and mathematical standpoint, a valid reason to use this non-informative prior instead of others, like the ones obtained through a limit in conjugate families of distributions, is that the relative probability of a volume of the probability space is not dependent upon the set of parameter variables that is chosen to describe parameter space.

Sometimes the Jeffreys prior cannot be normalized, and is thus an improper prior. For example, the Jeffreys prior for the distribution mean is uniform over the entire real line in the case of a Gaussian distribution of known variance.

Use of the Jeffreys prior violates the strong version of the likelihood principle, which is accepted by many, but by no means all, statisticians. When using the Jeffreys prior, inferences about $\vec\theta$ depend not just on the probability of the observed data as a function of $\vec\theta$, but also on the universe of all possible experimental outcomes, as determined by the experimental design, because the Fisher information is computed from an expectation over the chosen universe. Accordingly, the Jeffreys prior, and hence the inferences made using it, may be different for two experiments involving the same $\vec\theta$ parameter even when the likelihood functions for the two experiments are the same, which is a violation of the strong likelihood principle.
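A standard textbook illustration of this point, not taken from the text above, compares binomial and negative binomial sampling of the same coin. The symbols $n$, $x$, $r$ and $y$ below are introduced only for this example.

```latex
% Binomial design: the number of trials n is fixed, x successes are observed.
f(x \mid \theta) \propto \theta^{x} (1-\theta)^{n-x}, \qquad
I(\theta) = \frac{n}{\theta(1-\theta)}, \qquad
p(\theta) \propto \theta^{-1/2} (1-\theta)^{-1/2}.

% Negative binomial design: sample until r successes, y failures are observed.
f(y \mid \theta) \propto \theta^{r} (1-\theta)^{y}, \qquad
I(\theta) = \frac{r}{\theta^{2}(1-\theta)}, \qquad
p(\theta) \propto \theta^{-1} (1-\theta)^{-1/2}.

% If x = r and n - x = y, the two likelihoods are proportional,
% yet the Jeffreys posteriors differ:
\theta \mid \text{binomial} \sim \operatorname{Beta}\!\left(x + \tfrac12,\; n - x + \tfrac12\right),
\qquad
\theta \mid \text{negative binomial} \sim \operatorname{Beta}\!\left(r,\; y + \tfrac12\right).
```

The data and the likelihood function are identical in the two designs; only the stopping rule, and therefore the expectation defining the Fisher information, differs.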

Minimum description length

In the minimum description length approach to statistics, the goal is to describe data as compactly as possible, where the length of a description is measured in bits of the code used. For a parametric family of distributions, one compares a code with the best code based on one of the distributions in the parameterized family. The main result is that in exponential families, asymptotically for large sample size, the code based on the distribution that is a mixture of the elements in the exponential family with the Jeffreys prior is optimal. This result holds if one restricts the parameter set to a compact subset in the interior of the full parameter space[citation needed]. If the full parameter space is used, a modified version of the result should be used.

Examples

The Jeffreys prior for a parameter (or a set of parameters) depends upon the statistical model.

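Gaussian distribution with mean parameter

For the Gaussian distribution of the real value $x$

$$f(x \mid \mu) = \frac{e^{-(x-\mu)^2 / 2\sigma^2}}{\sqrt{2\pi\sigma^2}}$$

with $\sigma$ fixed, the Jeffreys prior for the mean $\mu$ is

$$\begin{aligned}
p(\mu) &\propto \sqrt{I(\mu)} = \sqrt{\operatorname{E}\!\left[\left(\frac{d}{d\mu} \log f(x \mid \mu)\right)^2\right]} = \sqrt{\operatorname{E}\!\left[\left(\frac{x-\mu}{\sigma^2}\right)^2\right]} \\
&= \sqrt{\int_{-\infty}^{+\infty} f(x \mid \mu) \left(\frac{x-\mu}{\sigma^2}\right)^2 dx} = \sqrt{\sigma^2 / \sigma^4} \propto 1.
\end{aligned}$$

That is, the Jeffreys prior for $\mu$ does not depend upon $\mu$; it is the unnormalized uniform distribution on the real line: the distribution that is 1 (or some other fixed constant) for all points. This is an improper prior, and is, up to the choice of constant, the unique translation-invariant distribution on the reals (the Haar measure with respect to addition of reals), corresponding to the mean being a measure of location and translation-invariance corresponding to no information about location.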
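Although this prior is improper, the posterior it produces is proper as soon as there is at least one observation: with a constant prior and known σ, the posterior for μ is a Gaussian centred on the sample mean with standard deviation σ/√n. The sketch below, with made-up data and hypothetical function names, compares that closed form with a brute-force normalization of likelihood times constant prior on a grid.

```python
import math

def flat_prior_posterior(data, sigma):
    """Closed-form posterior for mu under p(mu) proportional to 1: Normal(mean, sigma^2 / n)."""
    n = len(data)
    return sum(data) / n, sigma / math.sqrt(n)

def grid_posterior_mean(data, sigma, lo, hi, steps=20_000):
    """Brute-force check: normalize likelihood x constant prior on a grid over [lo, hi]."""
    h = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps):
        mu = lo + (i + 0.5) * h
        log_lik = -sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2)
        w = math.exp(log_lik)   # the constant prior cancels in the ratio
        num += mu * w
        den += w
    return num / den

data = [4.8, 5.3, 5.1, 4.6, 5.2]   # illustrative observations
sigma = 0.5                        # assumed known standard deviation

post_mean, post_sd = flat_prior_posterior(data, sigma)
grid_mean = grid_posterior_mean(data, sigma, lo=0.0, hi=10.0)
print(post_mean, post_sd, grid_mean)
```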

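Gaussian distribution with standard deviation parameter

For the Gaussian distribution of the real value $x$

$$f(x \mid \sigma) = \frac{e^{-(x-\mu)^2 / 2\sigma^2}}{\sqrt{2\pi\sigma^2}}$$

with $\mu$ fixed, the Jeffreys prior for the standard deviation $\sigma > 0$ is

$$\begin{aligned}
p(\sigma) &\propto \sqrt{I(\sigma)} = \sqrt{\operatorname{E}\!\left[\left(\frac{d}{d\sigma} \log f(x \mid \sigma)\right)^2\right]} = \sqrt{\operatorname{E}\!\left[\left(\frac{(x-\mu)^2 - \sigma^2}{\sigma^3}\right)^2\right]} \\
&= \sqrt{\int_{-\infty}^{+\infty} f(x \mid \sigma) \left(\frac{(x-\mu)^2 - \sigma^2}{\sigma^3}\right)^2 dx} = \sqrt{\frac{2}{\sigma^2}} \propto \frac{1}{\sigma}.
\end{aligned}$$

Equivalently, the Jeffreys prior for $\log \sigma = \int d\sigma / \sigma$ is the unnormalized uniform distribution on the real line, and thus this distribution is also known as the logarithmic prior. Similarly, the Jeffreys prior for $\log \sigma^2 = 2 \log \sigma$ is also uniform. It is the unique (up to a multiple) prior (on the positive reals) that is scale-invariant (the Haar measure with respect to multiplication of positive reals), corresponding to the standard deviation being a measure of scale and scale-invariance corresponding to no information about scale. As with the uniform distribution on the reals, it is an improper prior.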
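The value I(σ) = 2/σ² can be confirmed by evaluating the integral numerically. The following is a minimal sketch using a midpoint rule over a window of ±12σ around μ; the window width, step count, and function names are arbitrary choices for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def score_sigma(x, mu, sigma):
    """d/dsigma of log f(x | sigma) = ((x - mu)^2 - sigma^2) / sigma^3."""
    return ((x - mu) ** 2 - sigma ** 2) / sigma ** 3

def fisher_info_sigma(mu, sigma, half_width=12.0, steps=50_000):
    """Midpoint-rule estimate of E[score^2] over mu +/- half_width * sigma."""
    lo = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += gaussian_pdf(x, mu, sigma) * score_sigma(x, mu, sigma) ** 2
    return total * h

# sigma^2 * I(sigma) should be close to 2 for every sigma, so sqrt(I) is proportional to 1/sigma.
results = {s: fisher_info_sigma(mu=1.0, sigma=s) for s in (0.5, 1.0, 3.0)}
for s, info in results.items():
    print(s, info, info * s ** 2)
```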

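Poisson distribution with rate parameter

For the Poisson distribution of the non-negative integer $n$,

$$f(n \mid \lambda) = e^{-\lambda} \frac{\lambda^n}{n!},$$

the Jeffreys prior for the rate parameter $\lambda \geq 0$ is

$$\begin{aligned}
p(\lambda) &\propto \sqrt{I(\lambda)} = \sqrt{\operatorname{E}\!\left[\left(\frac{d}{d\lambda} \log f(n \mid \lambda)\right)^2\right]} = \sqrt{\operatorname{E}\!\left[\left(\frac{n-\lambda}{\lambda}\right)^2\right]} \\
&= \sqrt{\sum_{n=0}^{+\infty} f(n \mid \lambda) \left(\frac{n-\lambda}{\lambda}\right)^2} = \sqrt{\frac{1}{\lambda}}.
\end{aligned}$$

Equivalently, the Jeffreys prior for $\sqrt{\lambda}$ (which is proportional to $\int d\lambda / \sqrt{\lambda}$) is the unnormalized uniform distribution on the non-negative real line.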
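As a quick check of the series, the sketch below sums the truncated expectation and compares it with 1/λ. It also shows the conjugate update implied by this prior: multiplying λ^(−1/2) by a Poisson likelihood for m observations gives a gamma posterior with shape Σn + 1/2 and rate m. The truncation point, the example counts, and the function names are assumptions made for illustration.

```python
import math

def poisson_log_pmf(n, lam):
    """log of exp(-lam) lam^n / n!, computed stably via lgamma."""
    return -lam + n * math.log(lam) - math.lgamma(n + 1)

def fisher_info_poisson(lam, n_max=500):
    """Truncated series for E[((n - lam) / lam)^2]; n_max is an illustrative cutoff."""
    return sum(
        math.exp(poisson_log_pmf(n, lam)) * ((n - lam) / lam) ** 2
        for n in range(n_max + 1)
    )

def jeffreys_posterior_gamma(counts):
    """Posterior under p(lam) proportional to lam^(-1/2): Gamma(shape, rate)."""
    return sum(counts) + 0.5, len(counts)

info = fisher_info_poisson(4.0)
print(info, 1.0 / 4.0)   # the truncated series matches 1 / lambda

shape, rate = jeffreys_posterior_gamma([3, 0, 2, 4, 1])
posterior_mean = shape / rate
print(shape, rate, posterior_mean)
```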

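Bernoulli trial

For a coin that is "heads" with probability $\gamma \in [0,1]$ and is "tails" with probability $1 - \gamma$, for a given $(H,T) \in \{(0,1), (1,0)\}$ the probability is $\gamma^H (1-\gamma)^T$. The Jeffreys prior for the parameter $\gamma$ is

$$\begin{aligned}
p(\gamma) &\propto \sqrt{I(\gamma)} = \sqrt{\operatorname{E}\!\left[\left(\frac{d}{d\gamma} \log f(x \mid \gamma)\right)^2\right]} = \sqrt{\operatorname{E}\!\left[\left(\frac{H}{\gamma} - \frac{T}{1-\gamma}\right)^2\right]} \\
&= \sqrt{\gamma \left(\frac{1}{\gamma} - \frac{0}{1-\gamma}\right)^2 + (1-\gamma) \left(\frac{0}{\gamma} - \frac{1}{1-\gamma}\right)^2} = \frac{1}{\sqrt{\gamma(1-\gamma)}}.
\end{aligned}$$

This is the arcsine distribution and is a beta distribution with $\alpha = \beta = 1/2$. Furthermore, if $\gamma = \sin^2(\theta)$ then

$$\Pr[\theta] = \Pr[\gamma] \frac{d\gamma}{d\theta} \propto \frac{1}{\sqrt{\sin^2\theta \, (1 - \sin^2\theta)}} \; 2 \sin\theta \cos\theta = 2.$$

That is, the Jeffreys prior for $\theta$ is uniform in the interval $[0, \pi/2]$. Equivalently, $\theta$ is uniform on the whole circle $[0, 2\pi]$.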
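Because the Jeffreys prior here is Beta(1/2, 1/2), updating on observed coin flips is a conjugate calculation: after h heads and t tails the posterior is Beta(h + 1/2, t + 1/2). The sketch below uses hypothetical counts and function names; it shows the resulting posterior mean next to the one obtained from a uniform Beta(1, 1) prior, and confirms that the transformed density in θ is the constant 2.

```python
import math

def jeffreys_beta_posterior(heads, tails):
    """Beta posterior parameters after updating the Beta(1/2, 1/2) prior."""
    return heads + 0.5, tails + 0.5

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

def density_in_theta(theta):
    """Unnormalized prior density after the substitution gamma = sin^2(theta)."""
    gamma = math.sin(theta) ** 2
    return (1.0 / math.sqrt(gamma * (1.0 - gamma))) * 2.0 * math.sin(theta) * math.cos(theta)

alpha, beta = jeffreys_beta_posterior(heads=7, tails=3)
jeffreys_mean = beta_mean(alpha, beta)        # (7 + 1/2) / (10 + 1)
uniform_mean = beta_mean(7 + 1.0, 3 + 1.0)    # same data under a Beta(1, 1) prior

print(jeffreys_mean, uniform_mean)
print(density_in_theta(0.3), density_in_theta(1.1))   # constant on (0, pi/2)
```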

N-sided die with biased probabilities

Similarly, for a throw of an $N$-sided die with outcome probabilities $\vec\gamma = (\gamma_1, \ldots, \gamma_N)$, each non-negative and satisfying $\sum_{i=1}^N \gamma_i = 1$, the Jeffreys prior for $\vec\gamma$ is the Dirichlet distribution with all (alpha) parameters set to one half. This amounts to using a pseudocount of one half for each possible outcome.

Equivalently, if we write $\gamma_i = \varphi_i^2$ for each $i$, then the Jeffreys prior for $\vec\varphi$ is uniform on the (N − 1)-dimensional unit sphere (i.e., it is uniform on the surface of an N-dimensional unit ball).
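The sphere description suggests a simple way to draw from this prior, sketched below under the usual construction of a uniform point on the sphere as a normalized vector of independent standard normal variates. Squaring the coordinates then gives a draw of the outcome probabilities. The sample size, seed, and function name are arbitrary; the comparison values are the mean 1/N and variance (N − 1)/(N²(N/2 + 1)) of a Dirichlet component with all parameters equal to one half.

```python
import random

def sample_die_probabilities(n_sides, rng):
    """Draw gamma by squaring the coordinates of a uniform point on the unit sphere."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n_sides)]
    norm_sq = sum(v * v for v in z)
    return [v * v / norm_sq for v in z]   # phi_i^2, which sums to 1

rng = random.Random(0)          # fixed seed so the run is reproducible
n_sides, n_samples = 3, 20_000
draws = [sample_die_probabilities(n_sides, rng) for _ in range(n_samples)]

first = [d[0] for d in draws]
sample_mean = sum(first) / n_samples
sample_var = sum((g - sample_mean) ** 2 for g in first) / n_samples

expected_mean = 1.0 / n_sides
expected_var = (n_sides - 1) / (n_sides ** 2 * (n_sides / 2 + 1))
print(sample_mean, expected_mean)
print(sample_var, expected_var)
```

For comparison, a uniform Dirichlet prior (all parameters equal to one) on a three-sided die would give a smaller component variance of 1/18, so the Jeffreys prior places more mass near the corners of the simplex.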

Generalizations

Probability-matching prior

In 1963, Welch and Peers showed that for a scalar parameter θ the Jeffreys prior is "probability-matching" in the sense that posterior predictive probabilities agree with frequentist probabilities and credible intervals of a chosen width coincide with frequentist confidence intervals.[6] In a follow-up, Peers showed that this was not true for the multi-parameter case,[7] instead leading to the notion of probability-matching priors, which are only implicitly defined as the probability distribution solving a certain partial differential equation involving the Fisher information.[8]

α-parallel prior

Using tools from information geometry, the Jeffreys prior can be generalized in pursuit of priors that encode geometric information about the statistical model while remaining invariant under a change of parameter coordinates.[9] A special case, the so-called Weyl prior, is defined as a volume form on a Weyl manifold.[10]

References

  1. ^ Jeffreys H (1946). "An invariant form for the prior probability in estimation problems". Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences. 186 (1007): 453–461. Bibcode:1946RSPSA.186..453J. doi:10.1098/rspa.1946.0056. JSTOR 97883. PMID 20998741.
  2. ^ Jaynes ET (September 1968). "Prior probabilities" (PDF). IEEE Transactions on Systems Science and Cybernetics. 4 (3): 227–241. doi:10.1109/TSSC.1968.300117.
  3. ^ Firth, David (1992). "Bias reduction, the Jeffreys prior and GLIM". In Fahrmeir, Ludwig; Francis, Brian; Gilchrist, Robert; Tutz, Gerhard (eds.). Advances in GLIM and Statistical Modelling. New York: Springer. pp. 91–100. doi:10.1007/978-1-4612-2952-0_15. ISBN 0-387-97873-9.
  4. ^ Magis, David (2015). "A Note on Weighted Likelihood and Jeffreys Modal Estimation of Proficiency Levels in Polytomous Item Response Models". Psychometrika. 80: 200–204. doi:10.1007/s11336-013-9378-5.
  5. ^ Robert CP, Chopin N, Rousseau J (2009). "Harold Jeffreys's Theory of Probability Revisited". Statistical Science. 24 (2). arXiv:0804.3173. doi:10.1214/09-STS284.
  6. ^ Welch, B. L.; Peers, H. W. (1963). "On Formulae for Confidence Points Based on Integrals of Weighted Likelihoods". Journal of the Royal Statistical Society. Series B (Methodological). 25 (2): 318–329. doi:10.1111/j.2517-6161.1963.tb00512.x.
  7. ^ Peers, H. W. (1965). "On Confidence Points and Bayesian Probability Points in the Case of Several Parameters". Journal of the Royal Statistical Society. Series B (Methodological). 27 (1): 9–16. doi:10.1111/j.2517-6161.1965.tb00581.x.
  8. ^ Scricciolo, Catia (1999). "Probability matching priors: a review". Journal of the Italian Statistical Society. 8. 83. doi:10.1007/BF03178943.
  9. ^ Takeuchi, J.; Amari, S. (2005). "α-parallel prior and its properties". IEEE Transactions on Information Theory. 51 (3): 1011–1023. doi:10.1109/TIT.2004.842703.
  10. ^ Jiang, Ruichao; Tavakoli, Javad; Zhao, Yiqiang (2020). "Weyl Prior and Bayesian Statistics". Entropy. 22 (4). 467. doi:10.3390/e22040467. PMC 7516948.

Further reading

  • Kass RE, Wasserman L (1996). "The Selection of Prior Distributions by Formal Rules". Journal of the American Statistical Association. 91 (435): 1343–1370. doi:10.1080/01621459.1996.10477003.
  • Lee, Peter M. (2012). "Jeffreys' rule". Bayesian Statistics: An Introduction (4th ed.). Wiley. pp. 96–102. ISBN 978-1-118-33257-3.
