Fisher information metric

In information geometry, the Fisher information metric[1] is a particular Riemannian metric which can be defined on a smooth statistical manifold, i.e., a smooth manifold whose points are probability measures defined on a common probability space. It can be used to calculate the informational difference between measurements.

The metric is interesting in several aspects. By Chentsov’s theorem, the Fisher information metric on statistical models is the only Riemannian metric (up to rescaling) that is invariant under sufficient statistics.[2][3]

It can also be understood to be the infinitesimal form of the relative entropy (i.e., the Kullback–Leibler divergence); specifically, it is the Hessian of the divergence. Alternately, it can be understood as the metric induced by the flat space Euclidean metric, after appropriate changes of variable. When extended to complex projective Hilbert space, it becomes the Fubini–Study metric; when written in terms of mixed states, it is the quantum Bures metric.

Considered purely as a matrix, it is known as the Fisher information matrix. Considered as a measurement technique, where it is used to estimate hidden parameters in terms of observed random variables, it is known as the observed information.

Definition

Given a statistical manifold with coordinates $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$, one writes $p(x, \theta)$ for the probability density as a function of $\theta$. Here $x$ is drawn from the value space $R$ for a (discrete or continuous) random variable $X$. The probability is normalized by $\int_R p(x, \theta)\, dx = 1$, where $p(x, \theta)\, dx$ is the distribution of $X$.

The Fisher information metric then takes the form:

$$ g_{jk}(\theta) = -\int_R \frac{\partial^2 \log p(x,\theta)}{\partial\theta_j \, \partial\theta_k} \, p(x,\theta) \, dx. $$

The integral is performed over all values x in R. The variable $\theta$ is now a coordinate on a Riemannian manifold. The labels j and k index the local coordinate axes on the manifold.

When the probability is derived from the Gibbs measure, as it would be for any Markovian process, then $\theta$ can also be understood to be a Lagrange multiplier; Lagrange multipliers are used to enforce constraints, such as holding the expectation value of some quantity constant. If there are n constraints holding n different expectation values constant, then the dimension of the manifold is n dimensions smaller than the original space. In this case, the metric can be explicitly derived from the partition function; a derivation and discussion is presented there.

Substituting $i(x,\theta) = -\log p(x,\theta)$ from information theory, an equivalent form of the above definition is:

$$ g_{jk}(\theta) = \int_R \frac{\partial^2 i(x,\theta)}{\partial\theta_j \, \partial\theta_k} \, p(x,\theta) \, dx = \mathrm{E}\!\left[ \frac{\partial^2 i(x,\theta)}{\partial\theta_j \, \partial\theta_k} \right]. $$

To show that the equivalent form equals the above definition, note that

$$ \mathrm{E}\!\left[ \frac{\partial \log p(x,\theta)}{\partial\theta_j} \right] = 0 $$

and apply $\partial/\partial\theta_k$ on both sides. Differentiating under the integral sign gives

$$ \mathrm{E}\!\left[ \frac{\partial^2 \log p(x,\theta)}{\partial\theta_j \, \partial\theta_k} \right] + \mathrm{E}\!\left[ \frac{\partial \log p(x,\theta)}{\partial\theta_j} \, \frac{\partial \log p(x,\theta)}{\partial\theta_k} \right] = 0, $$

so the negative expected Hessian of $\log p$, and hence $g_{jk}$, equals the expected product of scores.
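
As a concrete illustration (my own, not part of the article), the two expressions above can be checked numerically for a one-parameter Bernoulli family with $p(x{=}1) = \theta$, for which the Fisher information is known to be $1/(\theta(1-\theta))$. The sketch below compares the expected product of scores with the negative expected Hessian of $\log p$, using finite differences; the family and step size are illustrative assumptions.

```python
# Minimal sketch (not from the article): compare the score form and the
# negative-Hessian form of the Fisher information for a Bernoulli(theta) family.
import numpy as np

def fisher_bernoulli(theta, eps=1e-5):
    xs = np.array([0.0, 1.0])                     # value space R = {0, 1}
    logp = lambda t: np.log(np.where(xs == 1.0, t, 1.0 - t))
    p = np.exp(logp(theta))                       # p(x, theta) for each x

    # Score form: E[(d log p / d theta)^2], via central differences.
    score = (logp(theta + eps) - logp(theta - eps)) / (2.0 * eps)
    g_score = np.sum(p * score**2)

    # Negative-Hessian form: -E[d^2 log p / d theta^2].
    hess = (logp(theta + eps) - 2.0 * logp(theta) + logp(theta - eps)) / eps**2
    g_hess = -np.sum(p * hess)
    return g_score, g_hess

print(fisher_bernoulli(0.3))   # both are close to 1/(0.3 * 0.7) = 4.7619...
```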

Relation to the Kullback–Leibler divergence

Alternatively, the metric can be obtained as the second derivative of the relative entropy or Kullback–Leibler divergence.[4] To obtain this, one considers two probability distributions $P(\theta)$ and $P(\theta_0)$, which are infinitesimally close to one another, so that

$$ P(\theta) = P(\theta_0) + \sum_j \Delta\theta_j \left. \frac{\partial P}{\partial\theta_j} \right|_{\theta_0} $$

with $\Delta\theta_j$ an infinitesimally small change of $\theta$ in the j direction. Then, since the Kullback–Leibler divergence $D_{\mathrm{KL}}[P(\theta_0) \,\|\, P(\theta)]$ has an absolute minimum of 0 when $P(\theta) = P(\theta_0)$, one has an expansion up to second order in $\theta - \theta_0$ of the form

$$ f(\theta_0, \theta) = D_{\mathrm{KL}}[P(\theta_0) \,\|\, P(\theta)] = \frac{1}{2} \sum_{jk} \Delta\theta_j \, \Delta\theta_k \, g_{jk}(\theta_0) + \mathrm{O}(\Delta\theta^3). $$

The symmetric matrix $g_{jk}$ is positive (semi) definite and is the Hessian matrix of the function $f(\theta_0, \theta)$ at the extremum point $\theta_0$. This can be thought of intuitively as: "The distance between two infinitesimally close points on a statistical differential manifold is the informational difference between them."
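
A short numerical check (my own, not from the cited reference): for the Bernoulli family used above, the second derivative of $\theta \mapsto D_{\mathrm{KL}}[P(\theta_0) \,\|\, P(\theta)]$ at $\theta = \theta_0$ should reproduce $g(\theta_0) = 1/(\theta_0(1-\theta_0))$, in line with the quadratic expansion just given.

```python
# Sketch: the Hessian of the KL divergence at theta0 recovers the Fisher metric.
import numpy as np

def kl_bernoulli(t0, t):
    return t0 * np.log(t0 / t) + (1.0 - t0) * np.log((1.0 - t0) / (1.0 - t))

theta0, h = 0.3, 1e-4
hessian = (kl_bernoulli(theta0, theta0 + h)
           - 2.0 * kl_bernoulli(theta0, theta0)
           + kl_bernoulli(theta0, theta0 - h)) / h**2
print(hessian, 1.0 / (theta0 * (1.0 - theta0)))   # both ~ 4.7619
```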

Relation to Ruppeiner geometry

The Ruppeiner metric and Weinhold metric are the Fisher information metric calculated for Gibbs distributions, such as those found in equilibrium statistical mechanics.[5][6]

Change in free entropy

The action of a curve on a Riemannian manifold is given by

$$ A = \frac{1}{2} \int_a^b \frac{\partial\theta_j}{\partial t} \, g_{jk}(\theta) \, \frac{\partial\theta_k}{\partial t} \, dt $$

The path parameter here is time t; this action can be understood to give the change in free entropy of a system as it is moved from time a to time b.[6] Specifically, one has

$$ \Delta S = (b - a) \, A $$

as the change in free entropy. This observation has resulted in practical applications in the chemical and processing industries: in order to minimize the change in free entropy of a system, one should follow the minimal geodesic path between the desired endpoints of the process. The geodesic minimizes the entropy, due to the Cauchy–Schwarz inequality, which states that the action is bounded below by the length of the curve, squared.
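
As an illustration only (my own discretization, not taken from the cited work), the action of a path through a one-parameter Bernoulli family can be evaluated with a simple trapezoidal sum; the Fisher metric of Bernoulli($\theta$) is $g(\theta) = 1/(\theta(1-\theta))$, and the path below is an arbitrary example.

```python
# Sketch: discretized action A = (1/2) * integral of (dtheta/dt) g(theta) (dtheta/dt) dt.
import numpy as np

g = lambda t: 1.0 / (t * (1.0 - t))        # Fisher metric of a Bernoulli(theta) family
theta = lambda s: 0.2 + 0.6 * s            # an example path theta(s), s in [0, 1]

s = np.linspace(0.0, 1.0, 10001)
dtheta_ds = np.gradient(theta(s), s)
integrand = dtheta_ds * g(theta(s)) * dtheta_ds
A = 0.5 * np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(s))
print(A)                                   # ~ 0.83 for this particular path
```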

Relation to the Jensen–Shannon divergence

The Fisher metric also allows the action and the curve length to be related to the Jensen–Shannon divergence.[6] Specifically, one has

$$ (b - a) \int_a^b \frac{\partial\theta_j}{\partial t} \, g_{jk} \, \frac{\partial\theta_k}{\partial t} \, dt = 8 \int_a^b dJSD $$

where the integrand dJSD is understood to be the infinitesimal change in the Jensen–Shannon divergence along the path taken. Similarly, for the curve length, one has

$$ \int_a^b \sqrt{ \frac{\partial\theta_j}{\partial t} \, g_{jk} \, \frac{\partial\theta_k}{\partial t} } \, dt = \sqrt{8} \int_a^b \sqrt{dJSD} $$

That is, the square root of the Jensen–Shannon divergence is just the Fisher metric (divided by the square root of 8).
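
A quick numerical check (my own, using a small but finite perturbation as an approximation to the infinitesimal statement above): for two nearby discrete distributions, $8 \cdot JSD(p, q)$ should approximate the squared Fisher line element $\sum_i dp_i^2 / p_i$.

```python
# Sketch: 8 * JSD(p, p + dp) ~ sum_i dp_i^2 / p_i for a small step dp.
import numpy as np

def jsd(p, q):
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.2, 0.3, 0.5])
dp = 1e-3 * np.array([1.0, -2.0, 1.0])     # perturbation with total mass zero
q = p + dp

print(8.0 * jsd(p, q), np.sum(dp**2 / p))  # agree to well under 1% at this step size
```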

As Euclidean metric

For a discrete probability space, that is, a probability space on a finite set of objects, the Fisher metric can be understood to simply be the Euclidean metric restricted to a positive orthant (e.g. "quadrant" in $R^2$) of a unit sphere, after appropriate changes of variable.[7]

Consider a flat, Euclidean space, of dimension N+1, parametrized by points $y = (y_0, \ldots, y_N)$. The metric for Euclidean space is given by

$$ h = \sum_{i=0}^N dy_i \, dy_i $$

where the $dy_i$ are 1-forms; they are the basis vectors for the cotangent space. Writing $\frac{\partial}{\partial y_j}$ as the basis vectors for the tangent space, so that

$$ dy_j\!\left( \frac{\partial}{\partial y_k} \right) = \delta_{jk}, $$

the Euclidean metric may be written as

$$ h^{\mathrm{flat}}_{jk} = h\!\left( \frac{\partial}{\partial y_j}, \frac{\partial}{\partial y_k} \right) = \delta_{jk}. $$

The superscript 'flat' is there to remind that, when written in coordinate form, this metric is with respect to the flat-space coordinate $y$.

An N-dimensional unit sphere embedded in (N + 1)-dimensional Euclidean space may be defined as

$$ \sum_{i=0}^N y_i^2 = 1. $$

This embedding induces a metric on the sphere; it is inherited directly from the Euclidean metric on the ambient space. It takes exactly the same form as the above, taking care to ensure that the coordinates are constrained to lie on the surface of the sphere. This can be done, e.g., with the technique of Lagrange multipliers.

Consider now the change of variable $p_i = y_i^2$. The sphere condition now becomes the probability normalization condition

$$ \sum_i p_i = 1 $$

while the metric becomes

$$ \begin{aligned} h &= \sum_i dy_i \, dy_i = \sum_i d\sqrt{p_i} \; d\sqrt{p_i} \\ &= \frac{1}{4} \sum_i \frac{dp_i \, dp_i}{p_i} = \frac{1}{4} \sum_i p_i \, d\log p_i \, d\log p_i \end{aligned} $$

The last can be recognized as one-fourth of the Fisher information metric. To complete the process, recall that the probabilities are parametric functions of the manifold variables $\theta$, that is, one has $p_i = p_i(\theta)$. Thus, the above induces a metric on the parameter manifold:

$$ \begin{aligned} h &= \frac{1}{4} \sum_i p_i(\theta) \, d\log p_i(\theta) \, d\log p_i(\theta) \\ &= \frac{1}{4} \sum_{jk} \sum_i p_i(\theta) \, \frac{\partial \log p_i(\theta)}{\partial\theta_j} \frac{\partial \log p_i(\theta)}{\partial\theta_k} \, d\theta_j \, d\theta_k \end{aligned} $$

or, in coordinate form, the Fisher information metric is:

$$ \begin{aligned} g_{jk}(\theta) = 4 h^{\mathrm{fisher}}_{jk} &= 4 h\!\left( \frac{\partial}{\partial\theta_j}, \frac{\partial}{\partial\theta_k} \right) \\ &= \sum_i p_i(\theta) \, \frac{\partial \log p_i(\theta)}{\partial\theta_j} \frac{\partial \log p_i(\theta)}{\partial\theta_k} \\ &= \mathrm{E}\!\left[ \frac{\partial \log p_i(\theta)}{\partial\theta_j} \frac{\partial \log p_i(\theta)}{\partial\theta_k} \right] \end{aligned} $$

where, as before,

$$ d\theta_j\!\left( \frac{\partial}{\partial\theta_k} \right) = \delta_{jk}. $$

The superscript 'fisher' is present to remind that this expression is applicable for the coordinates $\theta$; whereas the non-coordinate form is the same as the Euclidean (flat-space) metric. That is, the Fisher information metric on a statistical manifold is simply (four times) the Euclidean metric restricted to the positive orthant of the sphere, after appropriate changes of variable.
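
A small sketch of my own, illustrating the change of variable numerically: with $p_i = y_i^2$, four times the Euclidean line element $\sum_i dy_i^2$ equals the Fisher line element $\sum_i dp_i^2 / p_i$ identically (the particular point and displacement below are arbitrary choices).

```python
# Sketch: the substitution p_i = y_i^2 turns the Euclidean metric into (one-fourth of) the Fisher metric.
import numpy as np

y = np.sqrt(np.array([0.1, 0.2, 0.3, 0.4]))    # a point on the positive orthant of the sphere
dy = 1e-4 * np.array([3.0, -1.0, 2.0, -2.5])   # a small displacement

p, dp = y**2, 2.0 * y * dy                     # p_i = y_i^2  implies  dp_i = 2 y_i dy_i
print(4.0 * np.sum(dy**2), np.sum(dp**2 / p))  # identical, not merely approximately equal
```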

When the random variable $p$ is not discrete, but continuous, the argument still holds. This can be seen in one of two different ways. One way is to carefully recast all of the above steps in an infinite-dimensional space, being careful to define limits appropriately, etc., in order to make sure that all manipulations are well-defined, convergent, etc. The other way, as noted by Gromov,[7] is to use a category-theoretic approach; that is, to note that the above manipulations remain valid in the category of probabilities. Here, one should note that such a category would have the Radon–Nikodym property, that is, the Radon–Nikodym theorem holds in this category. This includes the Hilbert spaces; these are square-integrable, and in the manipulations above, this is sufficient to safely replace the sum over squares by an integral over squares.

As Fubini–Study metric

The above manipulations deriving the Fisher metric from the Euclidean metric can be extended to complex projective Hilbert spaces. In this case, one obtains the Fubini–Study metric.[8] This should perhaps be no surprise, as the Fubini–Study metric provides the means of measuring information in quantum mechanics. The Bures metric, also known as the Helstrom metric, is identical to the Fubini–Study metric,[8] although the latter is usually written in terms of pure states, as below, whereas the Bures metric is written for mixed states. By setting the phase of the complex coordinate to zero, one obtains exactly one-fourth of the Fisher information metric, exactly as above.

One begins with the same trick, of constructing a probability amplitude, written in polar coordinates, so:

$$ \psi(x;\theta) = \sqrt{p(x;\theta)} \; e^{i\alpha(x;\theta)} $$

Here, $\psi(x;\theta)$ is a complex-valued probability amplitude; $p(x;\theta)$ and $\alpha(x;\theta)$ are strictly real. The previous calculations are obtained by setting $\alpha(x;\theta) = 0$. The usual condition that probabilities lie within a simplex, namely that

$$ \int_X p(x;\theta) \, dx = 1, $$

is equivalently expressed by requiring that the squared amplitude be normalized:

$$ \int_X \vert \psi(x;\theta) \vert^2 \, dx = 1. $$

When $\psi(x;\theta)$ is real, this is the surface of a sphere.

The Fubini–Study metric, written in infinitesimal form, using quantum-mechanical bra–ket notation, is

$$ ds^2 = \frac{\langle \delta\psi \mid \delta\psi \rangle}{\langle \psi \mid \psi \rangle} - \frac{\langle \delta\psi \mid \psi \rangle \, \langle \psi \mid \delta\psi \rangle}{\langle \psi \mid \psi \rangle^2}. $$

In this notation, one has that $\langle x \mid \psi \rangle = \psi(x;\theta)$ and integration over the entire measure space X is written as

$$ \langle \phi \mid \psi \rangle = \int_X \bar{\phi}(x;\theta) \, \psi(x;\theta) \, dx. $$

The expression $\vert \delta\psi \rangle$ can be understood to be an infinitesimal variation; equivalently, it can be understood to be a 1-form in the cotangent space. Using the infinitesimal notation, the polar form of the probability above is simply

$$ \delta\psi = \left( \frac{\delta p}{2p} + i \, \delta\alpha \right) \psi. $$

Inserting the above into the Fubini–Study metric gives:

$$ \begin{aligned} ds^2 = {} & \frac{1}{4} \int_X (\delta \log p)^2 \, p \, dx \\ & {} + \int_X (\delta\alpha)^2 \, p \, dx - \left( \int_X \delta\alpha \, p \, dx \right)^2 \\ & {} + \frac{i}{2} \int_X \left( \delta \log p \; \delta\alpha - \delta\alpha \; \delta \log p \right) p \, dx \end{aligned} $$

Setting $\delta\alpha = 0$ in the above makes it clear that the first term is (one-fourth of) the Fisher information metric. The full form of the above can be made slightly clearer by changing notation to that of standard Riemannian geometry, so that the metric becomes a symmetric 2-form acting on the tangent space. The change of notation is done simply by replacing $\delta \to d$ and $ds^2 \to h$ and noting that the integrals are just expectation values; so:

$$ h = \frac{1}{4} \mathrm{E}\!\left[ (d\log p)^2 \right] + \mathrm{E}\!\left[ (d\alpha)^2 \right] - \left( \mathrm{E}\!\left[ d\alpha \right] \right)^2 + \frac{i}{2} \mathrm{E}\!\left[ \, d\log p \wedge d\alpha \, \right] $$

The imaginary term is a symplectic form; it is the Berry phase or geometric phase. In index notation, the metric is:

$$ \begin{aligned} h_{jk} = {} & h\!\left( \frac{\partial}{\partial\theta_j}, \frac{\partial}{\partial\theta_k} \right) \\ = {} & \frac{1}{4} \mathrm{E}\!\left[ \frac{\partial \log p}{\partial\theta_j} \frac{\partial \log p}{\partial\theta_k} \right] + \mathrm{E}\!\left[ \frac{\partial\alpha}{\partial\theta_j} \frac{\partial\alpha}{\partial\theta_k} \right] - \mathrm{E}\!\left[ \frac{\partial\alpha}{\partial\theta_j} \right] \mathrm{E}\!\left[ \frac{\partial\alpha}{\partial\theta_k} \right] \\ & {} + \frac{i}{2} \mathrm{E}\!\left[ \frac{\partial \log p}{\partial\theta_j} \frac{\partial\alpha}{\partial\theta_k} - \frac{\partial\alpha}{\partial\theta_j} \frac{\partial \log p}{\partial\theta_k} \right] \end{aligned} $$

Again, the first term can be clearly seen to be (one fourth of) the Fisher information metric, by setting $\alpha = 0$. Equivalently, the Fubini–Study metric can be understood as the metric on complex projective Hilbert space that is induced by the complex extension of the flat Euclidean metric. The difference between this and the Bures metric is that the Bures metric is written in terms of mixed states.
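
The reduction to the Fisher metric can be checked directly in a finite-dimensional example (my own illustration, with the phase $\alpha$ set to zero): for a real amplitude $\psi_i = \sqrt{p_i}$, the Fubini–Study line element collapses to one-fourth of the Fisher line element.

```python
# Sketch: Fubini-Study line element for a real (alpha = 0) amplitude psi = sqrt(p).
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])
dp = 1e-4 * np.array([1.0, 2.0, -1.5, -1.5])   # perturbation with sum(dp) = 0

psi = np.sqrt(p)
dpsi = dp / (2.0 * np.sqrt(p))                 # d(sqrt p) = dp / (2 sqrt p)

norm = np.dot(psi, psi)                        # <psi|psi> = 1 here
ds2 = (np.dot(dpsi, dpsi) / norm
       - np.dot(dpsi, psi) * np.dot(psi, dpsi) / norm**2)
print(ds2, 0.25 * np.sum(dp**2 / p))           # equal; the cross term vanishes since sum(dp) = 0
```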

Continuously-valued probabilities

A slightly more formal, abstract definition can be given, as follows.[9]

Let X be an orientable manifold, and let $(X, \Sigma, \mu)$ be a measure space on X. Equivalently, let $(\Omega, \mathcal{F}, P)$ be a probability space on $\Omega = X$, with sigma algebra $\mathcal{F} = \Sigma$ and probability $P = \mu$.

The statistical manifold S(X) of X is defined as the space of all measures $\mu$ on X (with the sigma-algebra $\Sigma$ held fixed). Note that this space is infinite-dimensional, and is commonly taken to be a Fréchet space. The points of S(X) are measures.

Pick a point $\mu \in S(X)$ and consider the tangent space $T_\mu S$. The Fisher information metric is then an inner product on the tangent space. With some abuse of notation, one may write this as

$$ g(\sigma_1, \sigma_2) = \int_X \frac{d\sigma_1}{d\mu} \frac{d\sigma_2}{d\mu} \, d\mu $$

Here, $\sigma_1$ and $\sigma_2$ are vectors in the tangent space; that is, $\sigma_1, \sigma_2 \in T_\mu S$. The abuse of notation is to write the tangent vectors as if they are derivatives, and to insert the extraneous d in writing the integral: the integration is meant to be carried out using the measure $\mu$ over the whole space X. This abuse of notation is, in fact, taken to be perfectly normal in measure theory; it is the standard notation for the Radon–Nikodym derivative.

In order for the integral to be well-defined, the space S(X) must have the Radon–Nikodym property, and more specifically, the tangent space is restricted to those vectors that are square-integrable. Square integrability is equivalent to saying that a Cauchy sequence converges to a finite value under the weak topology: the space contains its limit points. Note that Hilbert spaces possess this property.
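
A finite-dimensional sketch (my own illustration, not from the reference): on a finite set, measures are vectors of masses, tangent vectors can be taken as signed measures of total mass zero, and the Radon–Nikodym derivative is a componentwise ratio, so the inner product above becomes a weighted sum.

```python
# Sketch: the abstract inner product g(s1, s2) = int (ds1/dmu)(ds2/dmu) dmu on a finite set.
import numpy as np

mu = np.array([0.1, 0.2, 0.3, 0.4])                 # base probability measure
sigma1 = np.array([0.01, -0.02, 0.005, 0.005])      # tangent vectors: signed measures
sigma2 = np.array([-0.03, 0.01, 0.01, 0.01])        # with total mass zero

def fisher_inner(s1, s2, mu):
    # Radon-Nikodym derivatives d(sigma)/d(mu) are just the ratios sigma_i / mu_i.
    return np.sum((s1 / mu) * (s2 / mu) * mu)

print(fisher_inner(sigma1, sigma2, mu))
```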

This definition of the metric can be seen to be equivalent to the previous, in several steps. First, one selects a submanifold of S(X) by considering only those measures $\mu$ that are parameterized by some smoothly varying parameter $\theta$. Then, if $\theta$ is finite-dimensional, then so is the submanifold; likewise, the tangent space has the same dimension as $\theta$.

With some additional abuse of language, one notes that the exponential map provides a map from vectors in a tangent space to points in an underlying manifold. Thus, if $\sigma \in T_\mu S$ is a vector in the tangent space, then $p = \exp(\sigma)$ is the corresponding probability associated with point $p \in S(X)$ (after the parallel transport of the exponential map to $\mu$). Conversely, given a point $p \in S(X)$, the logarithm gives a point in the tangent space (roughly speaking, as again, one must transport from the origin to point $\mu$; for details, refer to original sources). Thus, one has the appearance of logarithms in the simpler definition, previously given.

See also

  • Cramér–Rao bound
  • Fisher information
  • Hellinger distance
  • Information geometry

Notes

  1. ^ Nielsen, Frank (2023). "A Simple Approximation Method for the Fisher–Rao Distance between Multivariate Normal Distributions". Entropy. 25 (4): 654. arXiv:2302.08175. Bibcode:2023Entrp..25..654N. doi:10.3390/e25040654. PMC 10137715. PMID 37190442.
  2. ^ Amari, Shun-ichi; Nagaoka, Hiroshi (2000). "Chentsov's theorem and some historical remarks". Methods of Information Geometry. New York: Oxford University Press. pp. 37–40. ISBN 0-8218-0531-2.
  3. ^ Dowty, James G. (2018). "Chentsov's theorem for exponential families". Information Geometry. 1 (1): 117–135. arXiv:1701.08895. doi:10.1007/s41884-018-0006-4. S2CID 5954036.
  4. ^ Cover, Thomas M.; Thomas, Joy A. (2006). Elements of Information Theory (2nd ed.). Hoboken: John Wiley & Sons. ISBN 0-471-24195-4.
  5. ^ Brody, Dorje; Hook, Daniel (2008). "Information geometry in vapour-liquid equilibrium". Journal of Physics A. 42 (2): 023001. arXiv:0809.1166. doi:10.1088/1751-8113/42/2/023001. S2CID 118311636.
  6. ^ a b c Crooks, Gavin E. (2009). "Measuring thermodynamic length". Physical Review Letters. 99 (10): 100602. arXiv:0706.0559. doi:10.1103/PhysRevLett.99.100602. PMID 17930381. S2CID 7527491.
  7. ^ a b Gromov, Misha (2012). "In a Search for a Structure, Part 1: On Entropy" (PDF). {{cite journal}}: Cite journal requires |journal= (help)
  8. ^ a b Facchi, Paolo; et al. (2010). "Classical and Quantum Fisher Information in the Geometrical Formulation of Quantum Mechanics". Physics Letters A. 374 (48): 4801–4803. arXiv:1009.5219. Bibcode:2010PhLA..374.4801F. doi:10.1016/j.physleta.2010.10.005. S2CID 55558124.
  9. ^ Itoh, Mitsuhiro; Shishido, Yuichi (2008). "Fisher information metric and Poisson kernels" (PDF). Differential Geometry and Its Applications. 26 (4): 347–356. doi:10.1016/j.difgeo.2007.11.027. hdl:2241/100265.

References

  • Raskutti, Garvesh; Mukherjee, Sayan (2014). "The information geometry of mirror descent". arXiv:1310.7780.
  • Feng, Edward H.; Crooks, Gavin E. (2009). "Far-from-equilibrium measurements of thermodynamic length". Physical Review E. 79 (1 Pt 1): 012104. arXiv:0807.0621. Bibcode:2009PhRvE..79a2104F. doi:10.1103/PhysRevE.79.012104. PMID 19257090. S2CID 8210246.
  • Shun'ichi Amari (1985) Differential-geometrical methods in statistics, Lecture Notes in Statistics, Springer-Verlag, Berlin.
  • Shun'ichi Amari, Hiroshi Nagaoka (2000) Methods of information geometry, Translations of mathematical monographs; v. 191, American Mathematical Society.
  • Paolo Gibilisco, Eva Riccomagno, Maria Piera Rogantin and Henry P. Wynn, (2009) Algebraic and Geometric Methods in Statistics, Cambridge U. Press, Cambridge.
