
Cramér–Rao bound

In estimation theory and statistics, the Cramér–Rao bound (CRB) relates to estimation of a deterministic (fixed, though unknown) parameter. The result is named in honor of Harald Cramér and C. R. Rao,[1][2][3] but it was also derived independently by Maurice Fréchet,[4] Georges Darmois,[5] and by Alexander Aitken and Harold Silverstone.[6][7] It is also known as the Fréchet–Cramér–Rao or Fréchet–Darmois–Cramér–Rao lower bound. It states that the precision of any unbiased estimator is at most the Fisher information; or, equivalently, the reciprocal of the Fisher information is a lower bound on its variance.

Figure: Illustration of the Cramér–Rao bound. No unbiased estimator can estimate the (2-dimensional) parameter with less variance than the Cramér–Rao bound, illustrated here as a standard-deviation ellipse.

An unbiased estimator that achieves this bound is said to be (fully) efficient. Such a solution achieves the lowest possible mean squared error among all unbiased methods, and is, therefore, the minimum variance unbiased (MVU) estimator. However, in some cases, no unbiased technique exists which achieves the bound. This may occur either if for any unbiased estimator, there exists another with a strictly smaller variance, or if an MVU estimator exists, but its variance is strictly greater than the inverse of the Fisher information.

The Cramér–Rao bound can also be used to bound the variance of biased estimators of given bias. In some cases, a biased approach can result in both a variance and a mean squared error that are below the unbiased Cramér–Rao lower bound; see estimator bias.

A significant improvement over the Cramér–Rao lower bound was proposed by A. Bhattacharyya through a series of works, and is called the Bhattacharyya bound.[8][9][10][11]

Statement

The Cramér–Rao bound is stated in this section for several increasingly general cases, beginning with the case in which the parameter is a scalar and its estimator is unbiased. All versions of the bound require certain regularity conditions, which hold for most well-behaved distributions. These conditions are listed later in this section.

Scalar unbiased case

Suppose $\theta$ is an unknown deterministic parameter that is to be estimated from $n$ independent observations (measurements) of $x$, each from a distribution according to some probability density function $f(x;\theta)$. The variance of any unbiased estimator $\hat{\theta}$ of $\theta$ is then bounded[12] by the reciprocal of the Fisher information $I(\theta)$:

$$\operatorname{var}(\hat{\theta}) \geq \frac{1}{I(\theta)}$$

where the Fisher information $I(\theta)$ is defined by

$$I(\theta) = n \operatorname{E}_{X;\theta}\!\left[\left(\frac{\partial \ell(X;\theta)}{\partial\theta}\right)^{2}\right]$$

and $\ell(x;\theta) = \log f(x;\theta)$ is the natural logarithm of the likelihood function for a single sample $x$, and $\operatorname{E}_{x;\theta}$ denotes the expected value with respect to the density $f(x;\theta)$ of $X$. If not indicated otherwise, in what follows the expectation is taken with respect to $X$.

If $\ell(x;\theta)$ is twice differentiable and certain regularity conditions hold, then the Fisher information can also be defined as follows:[13]

$$I(\theta) = -n \operatorname{E}_{X;\theta}\!\left[\frac{\partial^{2} \ell(X;\theta)}{\partial\theta^{2}}\right]$$

The efficiency of an unbiased estimator $\hat{\theta}$ measures how close this estimator's variance comes to this lower bound; estimator efficiency is defined as

$$e(\hat{\theta}) = \frac{I(\theta)^{-1}}{\operatorname{var}(\hat{\theta})},$$

or the minimum possible variance for an unbiased estimator divided by its actual variance. The Cramér–Rao lower bound thus gives

$$e(\hat{\theta}) \leq 1.$$
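As a concrete illustration (a minimal Monte Carlo sketch, not taken from the article; the model, parameter values, and variable names are assumptions chosen for illustration), the sample mean of a normal distribution with known variance has Fisher information $I(\theta) = n/\sigma^{2}$, so its estimated efficiency should come out close to 1:

```python
import numpy as np

# Monte Carlo sketch: efficiency of the sample mean for N(theta, sigma^2) with
# known sigma^2. Here I(theta) = n/sigma^2, so the CRB is sigma^2/n and the
# sample mean (an unbiased, efficient estimator) should attain it.
rng = np.random.default_rng(0)
theta, sigma, n, trials = 2.0, 3.0, 50, 20_000    # arbitrary illustrative values

samples = rng.normal(theta, sigma, size=(trials, n))
theta_hat = samples.mean(axis=1)                  # unbiased estimator of theta

crb = sigma**2 / n                                # 1 / I(theta) for n observations
var_hat = theta_hat.var(ddof=1)                   # empirical variance of the estimator
print(f"CRB = {crb:.4f}, empirical var = {var_hat:.4f}, efficiency ~ {crb / var_hat:.3f}")
```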

General scalar case

A more general form of the bound can be obtained by considering a biased estimator $T(X)$, whose expectation is not $\theta$ but a function of this parameter, say, $\psi(\theta)$. Hence $E\{T(X)\} - \theta = \psi(\theta) - \theta$ is not generally equal to 0. In this case, the bound is given by

$$\operatorname{var}(T) \geq \frac{[\psi'(\theta)]^{2}}{I(\theta)}$$

where $\psi'(\theta)$ is the derivative of $\psi(\theta)$ (by $\theta$), and $I(\theta)$ is the Fisher information defined above.

Bound on the variance of biased estimators

Apart from being a bound on estimators of functions of the parameter, this approach can be used to derive a bound on the variance of biased estimators with a given bias, as follows.[14] Consider an estimator $\hat{\theta}$ with bias $b(\theta) = E\{\hat{\theta}\} - \theta$, and let $\psi(\theta) = b(\theta) + \theta$. By the result above, any unbiased estimator whose expectation is $\psi(\theta)$ has variance greater than or equal to $[\psi'(\theta)]^{2}/I(\theta)$. Thus, any estimator $\hat{\theta}$ whose bias is given by a function $b(\theta)$ satisfies[15]

$$\operatorname{var}\left(\hat{\theta}\right) \geq \frac{[1 + b'(\theta)]^{2}}{I(\theta)}.$$

The unbiased version of the bound is a special case of this result, with $b(\theta) = 0$.

It is trivial to have a small variance: an "estimator" that is constant has a variance of zero. But from the above equation, we find that the mean squared error of a biased estimator is bounded by

$$\operatorname{E}\left[(\hat{\theta} - \theta)^{2}\right] \geq \frac{[1 + b'(\theta)]^{2}}{I(\theta)} + b(\theta)^{2},$$

using the standard decomposition of the MSE. Note, however, that if $1 + b'(\theta) < 1$ this bound might be less than the unbiased Cramér–Rao bound $1/I(\theta)$. For instance, in the example of estimating variance below, $1 + b'(\theta) = \frac{n}{n+2} < 1$.
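For a concrete feel for this bound (a minimal sketch; it reuses the shrinkage variance estimator from the example later in this article, and the numeric values are arbitrary), the biased-estimator MSE bound can be evaluated directly and compared with the unbiased bound:

```python
# Numeric check of the biased MSE bound for T = sum((X_i - mu)^2) / (n + 2),
# whose (signed) bias is b(s2) = -2*s2/(n+2), so b'(s2) = -2/(n+2); the Fisher
# information for sigma^2 is I(s2) = n / (2*s2**2).
n, s2 = 10, 1.5                                   # sample size and true variance (arbitrary)
fisher = n / (2 * s2**2)                          # Fisher information for sigma^2
b = -2 * s2 / (n + 2)
b_prime = -2 / (n + 2)

mse_bound_biased = (1 + b_prime)**2 / fisher + b**2   # equals 2*s2**2/(n+2)
crb_unbiased = 1 / fisher                             # equals 2*s2**2/n

print(mse_bound_biased, crb_unbiased)             # the biased bound is the smaller one
```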

Multivariate case

Extending the Cramér–Rao bound to multiple parameters, define a parameter column vector

$$\boldsymbol{\theta} = [\theta_1, \theta_2, \dots, \theta_d]^{T} \in \mathbb{R}^{d}$$

with probability density function $f(x; \boldsymbol{\theta})$ which satisfies the two regularity conditions below.

The Fisher information matrix is a $d \times d$ matrix with element $I_{m,k}$ defined as

$$I_{m,k} = \operatorname{E}\left[\frac{\partial}{\partial\theta_m} \log f(x; \boldsymbol{\theta}) \, \frac{\partial}{\partial\theta_k} \log f(x; \boldsymbol{\theta})\right] = -\operatorname{E}\left[\frac{\partial^{2}}{\partial\theta_m \, \partial\theta_k} \log f(x; \boldsymbol{\theta})\right].$$

Let $\boldsymbol{T}(X)$ be an estimator of any vector function of parameters, $\boldsymbol{T}(X) = (T_1(X), \ldots, T_d(X))^{T}$, and denote its expectation vector $\operatorname{E}[\boldsymbol{T}(X)]$ by $\boldsymbol{\psi}(\boldsymbol{\theta})$. The Cramér–Rao bound then states that the covariance matrix of $\boldsymbol{T}(X)$ satisfies

$$I(\boldsymbol{\theta}) \geq \phi(\theta)^{T} \operatorname{cov}_{\boldsymbol{\theta}}\!\left(\boldsymbol{T}(X)\right)^{-1} \phi(\theta),$$
$$\operatorname{cov}_{\boldsymbol{\theta}}\!\left(\boldsymbol{T}(X)\right) \geq \phi(\theta)\, I(\boldsymbol{\theta})^{-1} \phi(\theta)^{T},$$

where

  • The matrix inequality $A \geq B$ is understood to mean that the matrix $A - B$ is positive semidefinite, and
  • $\phi(\theta) = \partial\boldsymbol{\psi}(\boldsymbol{\theta})/\partial\boldsymbol{\theta}$ is the Jacobian matrix whose $ij$ element is given by $\partial\psi_i(\boldsymbol{\theta})/\partial\theta_j$.


If $\boldsymbol{T}(X)$ is an unbiased estimator of $\boldsymbol{\theta}$ (i.e., $\boldsymbol{\psi}(\boldsymbol{\theta}) = \boldsymbol{\theta}$), then the Cramér–Rao bound reduces to

$$\operatorname{cov}_{\boldsymbol{\theta}}\!\left(\boldsymbol{T}(X)\right) \geq I(\boldsymbol{\theta})^{-1}.$$

If it is inconvenient to compute the inverse of the Fisher information matrix, then one can simply take the reciprocal of the corresponding diagonal element to find a (possibly loose) lower bound:[16]

$$\operatorname{var}_{\boldsymbol{\theta}}\!\left(T_m(X)\right) = \left[\operatorname{cov}_{\boldsymbol{\theta}}\!\left(\boldsymbol{T}(X)\right)\right]_{mm} \geq \left[I(\boldsymbol{\theta})^{-1}\right]_{mm} \geq \left(\left[I(\boldsymbol{\theta})\right]_{mm}\right)^{-1}.$$
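As an illustration of the last two displays (a minimal sketch; the bivariate-normal mean-estimation model, sample size, and correlation are assumptions chosen for illustration, not taken from the article), the full matrix bound and the looser diagonal-reciprocal bound can be compared numerically:

```python
import numpy as np

# CRB for the mean vector of a bivariate normal with known covariance C, from
# n i.i.d. observations. For this model the Fisher information matrix is
# I(theta) = n * inv(C) (a standard result for a Gaussian with known covariance).
n, rho = 50, 0.8
C = np.array([[1.0, rho],
              [rho, 1.0]])                    # known per-observation covariance
I = n * np.linalg.inv(C)                      # Fisher information matrix

crb = np.linalg.inv(I)                        # cov(T(X)) >= crb for any unbiased T
loose = 1.0 / np.diag(I)                      # reciprocal of diagonal: possibly loose bound

print("CRB diagonal       :", np.diag(crb))   # [1/n, 1/n]
print("diagonal reciprocal:", loose)          # [(1 - rho^2)/n, ...] -- smaller, i.e. looser
```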

Regularity conditions

The bound relies on two weak regularity conditions on the probability density function, $f(x;\theta)$, and the estimator $T(X)$:

  • The Fisher information is always defined; equivalently, for all $x$ such that $f(x;\theta) > 0$,
     $$\frac{\partial}{\partial\theta} \log f(x;\theta)$$
    exists, and is finite.
  • The operations of integration with respect to $x$ and differentiation with respect to $\theta$ can be interchanged in the expectation of $T$; that is,
     $$\frac{\partial}{\partial\theta}\left[\int T(x)\, f(x;\theta)\, dx\right] = \int T(x)\left[\frac{\partial}{\partial\theta} f(x;\theta)\right] dx$$
    whenever the right-hand side is finite.
    This condition can often be confirmed by using the fact that integration and differentiation can be swapped when either of the following cases hold:
    1. The function $f(x;\theta)$ has bounded support in $x$, and the bounds do not depend on $\theta$;
    2. The function $f(x;\theta)$ has infinite support, is continuously differentiable, and the integral converges uniformly for all $\theta$.

Proof

Proof for the general case based on the Chapman–Robbins bound

Proof based on [17].


First equation:

Let $\delta$ be an infinitesimal, then for any $v \in \mathbb{R}^{n}$, plugging $\theta' = \theta + \delta v$ in, we have

$$E_{\theta'}[T] - E_{\theta}[T] = v^{T}\phi(\theta)\,\delta, \qquad \chi^{2}(\mu_{\theta'}; \mu_{\theta}) = v^{T} I(\theta)\, v\, \delta^{2}.$$

Plugging this into the multivariate Chapman–Robbins bound gives $I(\theta) \geq \phi(\theta) \operatorname{Cov}_{\theta}(T)^{-1} \phi(\theta)^{T}$.

Second equation:

It suffices to prove this for the scalar case, with $h(X)$ taking values in $\mathbb{R}$, because for general $T(X)$ we can take any $v \in \mathbb{R}^{m}$ and define $h = \sum_j v_j T_j$; the scalar case then gives

$$\operatorname{Var}_{\theta}(h) = v^{T} \operatorname{Cov}_{\theta}(T)\, v \geq v^{T} \phi(\theta)\, I(\theta)^{-1} \phi(\theta)^{T} v.$$

This holds for all $v \in \mathbb{R}^{m}$, so we can conclude

$$\operatorname{Cov}_{\theta}(T) \geq \phi(\theta)\, I(\theta)^{-1} \phi(\theta)^{T}.$$

The scalar case states that $\operatorname{Var}_{\theta}(h) \geq \phi(\theta)^{T} I(\theta)^{-1} \phi(\theta)$ with $\phi(\theta) = \nabla_{\theta} E_{\theta}[h]$.

Let $\delta$ be an infinitesimal, then for any $v \in \mathbb{R}^{n}$, taking $\theta' = \theta + \delta v$ in the single-variate Chapman–Robbins bound gives

$$\operatorname{Var}_{\theta}(h) \geq \frac{\langle v, \phi(\theta)\rangle^{2}}{v^{T} I(\theta)\, v}.$$

By linear algebra, $\sup_{v \neq 0} \frac{\langle w, v\rangle^{2}}{v^{T} M v} = w^{T} M^{-1} w$ for any positive-definite matrix $M$, thus we obtain

$$\operatorname{Var}_{\theta}(h) \geq \phi(\theta)^{T} I(\theta)^{-1} \phi(\theta).$$

A standalone proof for the general scalar case

For the general scalar case:

Assume that $T = t(X)$ is an estimator with expectation $\psi(\theta)$ (based on the observations $X$), i.e. that $\operatorname{E}(T) = \psi(\theta)$. The goal is to prove that, for all $\theta$,

$$\operatorname{var}(t(X)) \geq \frac{[\psi'(\theta)]^{2}}{I(\theta)}.$$

Let $X$ be a random variable with probability density function $f(x;\theta)$. Here $T = t(X)$ is a statistic, which is used as an estimator for $\psi(\theta)$. Define $V$ as the score:

$$V = \frac{\partial}{\partial\theta} \ln f(X;\theta) = \frac{1}{f(X;\theta)} \frac{\partial}{\partial\theta} f(X;\theta)$$

where the chain rule is used in the final equality above. Then the expectation of $V$, written $\operatorname{E}(V)$, is zero. This is because:

$$\operatorname{E}(V) = \int f(x;\theta)\left[\frac{1}{f(x;\theta)} \frac{\partial}{\partial\theta} f(x;\theta)\right] dx = \frac{\partial}{\partial\theta} \int f(x;\theta)\, dx = 0$$

where the integral and partial derivative have been interchanged (justified by the second regularity condition).


If we consider the covariance $\operatorname{cov}(V, T)$ of $V$ and $T$, we have $\operatorname{cov}(V, T) = \operatorname{E}(V T)$, because $\operatorname{E}(V) = 0$. Expanding this expression we have

$$\begin{aligned}
\operatorname{cov}(V,T) &= \operatorname{E}\left[T \cdot \left(\frac{1}{f(X;\theta)} \frac{\partial}{\partial\theta} f(X;\theta)\right)\right] \\
&= \int t(x)\left[\frac{1}{f(x;\theta)} \frac{\partial}{\partial\theta} f(x;\theta)\right] f(x;\theta)\, dx \\
&= \frac{\partial}{\partial\theta}\left[\int t(x)\, f(x;\theta)\, dx\right] = \psi'(\theta)
\end{aligned}$$

again because the integration and differentiation operations commute (second condition).

The Cauchy–Schwarz inequality shows that

$$\sqrt{\operatorname{var}(T)\,\operatorname{var}(V)} \geq \left|\operatorname{cov}(V,T)\right| = \left|\psi'(\theta)\right|,$$

therefore

$$\operatorname{var}(T) \geq \frac{[\psi'(\theta)]^{2}}{\operatorname{var}(V)} = \frac{[\psi'(\theta)]^{2}}{I(\theta)},$$

which proves the proposition.

Examples

Multivariate normal distribution

For the case of a d-variate normal distribution

$$\boldsymbol{x} \sim \mathcal{N}_d\!\left(\boldsymbol{\mu}(\boldsymbol{\theta}), \boldsymbol{C}(\boldsymbol{\theta})\right),$$

the Fisher information matrix has elements[18]

$$I_{m,k} = \frac{\partial\boldsymbol{\mu}^{T}}{\partial\theta_m} \boldsymbol{C}^{-1} \frac{\partial\boldsymbol{\mu}}{\partial\theta_k} + \frac{1}{2} \operatorname{tr}\!\left(\boldsymbol{C}^{-1} \frac{\partial\boldsymbol{C}}{\partial\theta_m} \boldsymbol{C}^{-1} \frac{\partial\boldsymbol{C}}{\partial\theta_k}\right)$$

where "tr" is the trace.
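A minimal numerical sketch of this formula follows (the parametrization $\mu(\theta) = \theta_1 \boldsymbol{1}$, $C(\theta) = \theta_2 \boldsymbol{I}$ and all variable names are illustrative choices, not from the article); for that parametrization the result should be $\operatorname{diag}(d/\theta_2,\; d/(2\theta_2^{2}))$:

```python
import numpy as np

# Evaluate the Gaussian Fisher information matrix formula above, using
# finite-difference derivatives of mu(theta) and C(theta).
def mu(theta, d):
    return theta[0] * np.ones(d)              # mean: theta1 * 1

def cov(theta, d):
    return theta[1] * np.eye(d)               # covariance: theta2 * I

def fisher_matrix(theta, d, eps=1e-6):
    theta = np.asarray(theta, dtype=float)
    C_inv = np.linalg.inv(cov(theta, d))
    dmu, dC = [], []
    for m in range(len(theta)):               # central differences w.r.t. each theta_m
        tp, tm = theta.copy(), theta.copy()
        tp[m] += eps
        tm[m] -= eps
        dmu.append((mu(tp, d) - mu(tm, d)) / (2 * eps))
        dC.append((cov(tp, d) - cov(tm, d)) / (2 * eps))
    I = np.zeros((len(theta), len(theta)))
    for m in range(len(theta)):
        for k in range(len(theta)):
            I[m, k] = dmu[m] @ C_inv @ dmu[k] \
                      + 0.5 * np.trace(C_inv @ dC[m] @ C_inv @ dC[k])
    return I

print(fisher_matrix([1.0, 2.0], d=3))          # expect approximately diag(1.5, 0.375)
```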

For example, let $w[j]$ be a sample of $n$ independent observations with unknown mean $\theta$ and known variance $\sigma^{2}$:

$$w[j] \sim \mathcal{N}_n\!\left(\theta \boldsymbol{1}, \sigma^{2} \boldsymbol{I}\right).$$

Then the Fisher information is a scalar given by

$$I(\theta) = \left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right)^{T} \boldsymbol{C}^{-1} \left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right) = \sum_{i=1}^{n} \frac{1}{\sigma^{2}} = \frac{n}{\sigma^{2}},$$

and so the Cramér–Rao bound is

$$\operatorname{var}\left(\hat{\theta}\right) \geq \frac{\sigma^{2}}{n}.$$

Normal variance with known mean

Suppose X is a normally distributed random variable with known mean $\mu$ and unknown variance $\sigma^{2}$. Consider the following statistic:

$$T = \frac{\sum_{i=1}^{n}(X_i - \mu)^{2}}{n}.$$

Then T is unbiased for $\sigma^{2}$, as $\operatorname{E}(T) = \sigma^{2}$. What is the variance of T?

$$\operatorname{var}(T) = \operatorname{var}\!\left(\frac{\sum_{i=1}^{n}(X_i - \mu)^{2}}{n}\right) = \frac{\sum_{i=1}^{n}\operatorname{var}\!\left[(X_i - \mu)^{2}\right]}{n^{2}} = \frac{n \operatorname{var}\!\left[(X - \mu)^{2}\right]}{n^{2}} = \frac{1}{n}\left[\operatorname{E}\left[(X - \mu)^{4}\right] - \left(\operatorname{E}\left[(X - \mu)^{2}\right]\right)^{2}\right]$$

(the second equality follows directly from the definition of variance). The first term is the fourth moment about the mean and has value $3(\sigma^{2})^{2}$; the second is the square of the variance, or $(\sigma^{2})^{2}$. Thus

$$\operatorname{var}(T) = \frac{2(\sigma^{2})^{2}}{n}.$$

Now, what is the Fisher information in the sample? Recall that the score $V$ is defined as

$$V = \frac{\partial}{\partial\sigma^{2}} \log\left[L(\sigma^{2}, X)\right]$$

where $L$ is the likelihood function. Thus in this case,

$$\log\left[L(\sigma^{2}, X)\right] = \log\left[\frac{1}{\sqrt{2\pi\sigma^{2}}} e^{-(X-\mu)^{2}/(2\sigma^{2})}\right] = -\log\!\left(\sqrt{2\pi\sigma^{2}}\right) - \frac{(X-\mu)^{2}}{2\sigma^{2}},$$
$$V = \frac{\partial}{\partial\sigma^{2}} \log\left[L(\sigma^{2}, X)\right] = \frac{\partial}{\partial\sigma^{2}}\left[-\log\!\left(\sqrt{2\pi\sigma^{2}}\right) - \frac{(X-\mu)^{2}}{2\sigma^{2}}\right] = -\frac{1}{2\sigma^{2}} + \frac{(X-\mu)^{2}}{2(\sigma^{2})^{2}},$$

where the second equality is from elementary calculus. Thus, the information in a single observation is just minus the expectation of the derivative of $V$, or

$$I = -\operatorname{E}\left(\frac{\partial V}{\partial\sigma^{2}}\right) = -\operatorname{E}\left(-\frac{(X-\mu)^{2}}{(\sigma^{2})^{3}} + \frac{1}{2(\sigma^{2})^{2}}\right) = \frac{\sigma^{2}}{(\sigma^{2})^{3}} - \frac{1}{2(\sigma^{2})^{2}} = \frac{1}{2(\sigma^{2})^{2}}.$$

Thus the information in a sample of $n$ independent observations is just $n$ times this, or $\frac{n}{2(\sigma^{2})^{2}}$.

The Cramér–Rao bound states that

$$\operatorname{var}(T) \geq \frac{1}{I}.$$

In this case, the inequality is saturated (equality is achieved), showing that the estimator is efficient.
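This saturation can be checked empirically (a minimal Monte Carlo sketch; the parameter values are arbitrary choices, not from the article):

```python
import numpy as np

# With known mean, the unbiased estimator T = sum((X_i - mu)^2)/n should attain
# the Cramér–Rao bound 1/I = 2*(sigma2**2)/n.
rng = np.random.default_rng(2)
mu, sigma2, n, trials = 1.0, 2.0, 20, 100_000

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
T = ((x - mu)**2).mean(axis=1)                    # the unbiased variance estimator

print("empirical var(T) :", T.var(ddof=1))        # ~ 2*sigma2**2/n = 0.4
print("Cramér–Rao bound :", 2 * sigma2**2 / n)
```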

However, we can achieve a lower mean squared error using a biased estimator. The estimator

$$T = \frac{\sum_{i=1}^{n}(X_i - \mu)^{2}}{n+2}$$

obviously has a smaller variance, which is in fact

$$\operatorname{var}(T) = \frac{2n(\sigma^{2})^{2}}{(n+2)^{2}}.$$

Its bias is

$$\left(1 - \frac{n}{n+2}\right)\sigma^{2} = \frac{2\sigma^{2}}{n+2},$$

so its mean squared error is

$$\operatorname{MSE}(T) = \left(\frac{2n}{(n+2)^{2}} + \frac{4}{(n+2)^{2}}\right)(\sigma^{2})^{2} = \frac{2(\sigma^{2})^{2}}{n+2},$$

which is less than what unbiased estimators can achieve according to the Cramér–Rao bound.

When the mean is not known, the minimum mean squared error estimate of the variance of a sample from a Gaussian distribution is achieved by dividing by $n+1$, rather than $n-1$ or $n+2$.
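This can also be verified by simulation (a minimal sketch; the distribution parameters, sample size, and trial count are arbitrary assumptions):

```python
import numpy as np

# Monte Carlo check: with the mean unknown, dividing the sum of squared
# deviations from the sample mean by n+1 yields a smaller MSE than dividing
# by n-1 (the unbiased choice) or by n (the MLE).
rng = np.random.default_rng(1)
mu, sigma2, n, trials = 0.0, 2.0, 10, 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
ss = ((x - x.mean(axis=1, keepdims=True))**2).sum(axis=1)   # sum of squared deviations

for divisor in (n - 1, n, n + 1):
    mse = ((ss / divisor - sigma2)**2).mean()
    print(f"divide by {divisor:>2}: MSE ~ {mse:.4f}")        # n+1 gives the smallest
```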

See also

  • Chapman–Robbins bound
  • Kullback's inequality
  • Brascamp–Lieb inequality

References and notes

  1. ^ Cramér, Harald (1946). Mathematical Methods of Statistics. Princeton, NJ: Princeton Univ. Press. ISBN 0-691-08004-6. OCLC 185436716.
  2. ^ Rao, Calyampudi Radakrishna (1945). "Information and the accuracy attainable in the estimation of statistical parameters". Bulletin of the Calcutta Mathematical Society. 37. Calcutta Mathematical Society: 81–89. MR 0015748.
  3. ^ Rao, Calyampudi Radakrishna (1994). S. Das Gupta (ed.). Selected Papers of C. R. Rao. New York: Wiley. ISBN 978-0-470-22091-7. OCLC 174244259.
  4. ^ Fréchet, Maurice (1943). "Sur l'extension de certaines évaluations statistiques au cas de petits échantillons". Rev. Inst. Int. Statist. 11 (3/4): 182–205. doi:10.2307/1401114. JSTOR 1401114.
  5. ^ Darmois, Georges (1945). "Sur les limites de la dispersion de certaines estimations". Rev. Int. Inst. Statist. 13 (1/4): 9–15. doi:10.2307/1400974. JSTOR 1400974.
  6. ^ Aitken, A. C.; Silverstone, H. (1942). "XV.—On the Estimation of Statistical Parameters". Proceedings of the Royal Society of Edinburgh Section A: Mathematics. 61 (2): 186–194. doi:10.1017/S008045410000618X. ISSN 2053-5902. S2CID 124029876.
  7. ^ Shenton, L. R. (1970). "The so-called Cramer–Rao inequality". The American Statistician. 24 (2): 36. JSTOR 2681931.
  8. ^ Dodge, Yadolah (2003). The Oxford Dictionary of Statistical Terms. Oxford University Press. ISBN 978-0-19-920613-1.
  9. ^ Bhattacharyya, A. (1946). "On Some Analogues of the Amount of Information and Their Use in Statistical Estimation". Sankhyā. 8 (1): 1–14. JSTOR 25047921. MR 0020242.
  10. ^ Bhattacharyya, A. (1947). "On Some Analogues of the Amount of Information and Their Use in Statistical Estimation (Contd.)". Sankhyā. 8 (3): 201–218. JSTOR 25047948. MR 0023503.
  11. ^ Bhattacharyya, A. (1948). "On Some Analogues of the Amount of Information and Their Use in Statistical Estimation (Concluded)". Sankhyā. 8 (4): 315–328. JSTOR 25047897. MR 0026302.
  12. ^ Nielsen, Frank (2013). "Cramér-Rao Lower Bound and Information Geometry". Connected at Infinity II. Texts and Readings in Mathematics. Vol. 67. Hindustan Book Agency, Gurgaon. p. 18-37. arXiv:1301.3578. doi:10.1007/978-93-86279-56-9_2. ISBN 978-93-80250-51-9. S2CID 16759683.
  13. ^ Suba Rao. "Lectures on statistical inference" (PDF). Archived from the original (PDF) on 2020-09-26. Retrieved 2020-05-24.
  14. ^ "Cramér Rao Lower Bound - Navipedia". gssc.esa.int.
  15. ^ "Cramér-Rao Bound".
  16. ^ For the Bayesian case, see eqn. (11) of Bobrovsky; Mayer-Wolf; Zakai (1987). "Some classes of global Cramer–Rao bounds". Ann. Stat. 15 (4): 1421–38. doi:10.1214/aos/1176350602.
  17. ^ Polyanskiy, Yury (2017). "Lecture notes on information theory, chapter 29, ECE563 (UIUC)" (PDF). Lecture notes on information theory. Archived (PDF) from the original on 2022-05-24. Retrieved 2022-05-24.
  18. ^ Kay, S. M. (1993). Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice Hall. p. 47. ISBN 0-13-042268-1.

Further reading

  • Amemiya, Takeshi (1985). Advanced Econometrics. Cambridge: Harvard University Press. pp. 14–17. ISBN 0-674-00560-0.
  • Bos, Adriaan van den (2007). Parameter Estimation for Scientists and Engineers. Hoboken: John Wiley & Sons. pp. 45–98. ISBN 978-0-470-14781-8.
  • Kay, Steven M. (1993). Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Prentice Hall. ISBN 0-13-345711-7.. Chapter 3.
  • Shao, Jun (1998). Mathematical Statistics. New York: Springer. ISBN 0-387-98674-X.. Section 3.1.3.
  • Posterior uncertainty, asymptotic law and Cramér-Rao bound, Structural Control and Health Monitoring 25(1851):e2113 DOI: 10.1002/stc.2113

External links

  • FandPLimitTool a GUI-based software to calculate the Fisher information and Cramér-Rao lower bound with application to single-molecule microscopy.
