
Gibbs' inequality

In information theory, Gibbs' inequality is a statement about the information entropy of a discrete probability distribution. Several other bounds on the entropy of probability distributions are derived from Gibbs' inequality, including Fano's inequality. It was first presented by J. Willard Gibbs in the 19th century.

[Image: portrait of Josiah Willard Gibbs]

Gibbs' inequality

Suppose that

$$P = (p_1, \ldots, p_n)$$

is a discrete probability distribution. Then for any other probability distribution

$$Q = (q_1, \ldots, q_n)$$

the following inequality between positive quantities (since the $p_i$ and $q_i$ are between zero and one) holds:[1]: 68

$$-\sum_{i=1}^{n} p_i \log p_i \leq -\sum_{i=1}^{n} p_i \log q_i,$$

with equality if and only if

$$p_i = q_i$$

for all $i$. Put in words, the information entropy of a distribution P is less than or equal to its cross entropy with any other distribution Q.

The difference between the two quantities is the Kullback–Leibler divergence or relative entropy, so the inequality can also be written:[2]: 34

$$D_{\mathrm{KL}}(P \| Q) \equiv \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i} \geq 0.$$
Note that the use of base-2 logarithms is optional, and allows one to refer to the quantity on each side of the inequality as an "average surprisal" measured in bits.
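
As a concrete check (an illustration added here, not part of the original article), the statement can be verified numerically; the minimal Python sketch below computes the entropy of P, the cross entropy of P with Q, and their difference, the Kullback–Leibler divergence, in bits, for a pair of hypothetical example distributions.

    import math

    def entropy(p):
        # Shannon entropy of p in bits; zero-probability outcomes contribute nothing
        return -sum(pi * math.log2(pi) for pi in p if pi > 0)

    def cross_entropy(p, q):
        # Cross entropy H(P, Q) in bits (infinite if some q_i = 0 where p_i > 0)
        return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

    # Hypothetical example distributions, chosen only for illustration
    P = [0.5, 0.25, 0.25]
    Q = [0.4, 0.4, 0.2]

    print(entropy(P))                        # 1.5 bits
    print(cross_entropy(P, Q))               # about 1.572 bits, >= H(P) as the inequality asserts
    print(cross_entropy(P, Q) - entropy(P))  # about 0.072 bits, the KL divergence, >= 0

Setting Q equal to P makes the two quantities coincide, matching the equality condition above.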

Proof

For simplicity, we prove the statement using the natural logarithm, ln. Because

$$\log_b a = \frac{\ln a}{\ln b},$$

the particular logarithm base $b > 1$ that we choose only scales the relationship by the factor $1 / \ln b$.

Let $I$ denote the set of all $i$ for which $p_i$ is non-zero. Then, since $\ln x \leq x - 1$ for all $x > 0$, with equality if and only if $x = 1$, we have:

$$-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \geq -\sum_{i \in I} p_i \left( \frac{q_i}{p_i} - 1 \right) = -\sum_{i \in I} q_i + \sum_{i \in I} p_i = -\sum_{i \in I} q_i + 1 \geq 0.$$

The last inequality is a consequence of the $p_i$ and $q_i$ being part of a probability distribution. Specifically, the sum of all non-zero values is 1. Some non-zero $q_i$, however, may have been excluded, since the choice of indices is conditioned upon the $p_i$ being non-zero. Therefore, the sum of the $q_i$ may be less than 1.
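
The elementary bound $\ln x \leq x - 1$ is used here without proof; one standard justification, added for completeness, is to note that the function $f(x) = x - 1 - \ln x$ satisfies $f(1) = 0$ and $f'(x) = 1 - 1/x$, which is negative on $(0, 1)$ and positive on $(1, \infty)$, so that

$$x - 1 - \ln x \geq f(1) = 0 \quad \text{for all } x > 0,$$

with equality only at $x = 1$.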

So far, over the index set $I$, we have:

$$-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \geq 0,$$

or equivalently

$$-\sum_{i \in I} p_i \ln q_i \geq -\sum_{i \in I} p_i \ln p_i.$$

Both sums can be extended to all $i = 1, \ldots, n$, i.e. including $p_i = 0$, by recalling that the expression $p \ln p$ tends to 0 as $p$ tends to 0, and that $-\ln q$ tends to $\infty$ as $q$ tends to 0. We arrive at

$$-\sum_{i=1}^{n} p_i \ln q_i \geq -\sum_{i=1}^{n} p_i \ln p_i.$$

For equality to hold, we require

  1. $\frac{q_i}{p_i} = 1$ for all $i \in I$, so that the equality $\ln \frac{q_i}{p_i} = \frac{q_i}{p_i} - 1$ holds,
  2. and $\sum_{i \in I} q_i = 1$, which means $q_i = 0$ if $i \notin I$, that is, $q_i = 0$ if $p_i = 0$.

This can happen if and only if $p_i = q_i$ for $i = 1, \ldots, n$.
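
The extension step above relies on the limit $p \ln p \to 0$ as $p \to 0^{+}$; for completeness, this follows from l'Hôpital's rule:

$$\lim_{p \to 0^{+}} p \ln p = \lim_{p \to 0^{+}} \frac{\ln p}{1/p} = \lim_{p \to 0^{+}} \frac{1/p}{-1/p^{2}} = \lim_{p \to 0^{+}} (-p) = 0.$$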

Alternative proofs

The result can alternatively be proved using Jensen's inequality, the log sum inequality, or the fact that the Kullback-Leibler divergence is a form of Bregman divergence. Below we give a proof based on Jensen's inequality:

Because log is a concave function, we have that:

$$\sum_i p_i \log \frac{q_i}{p_i} \leq \log \sum_i p_i \frac{q_i}{p_i} = \log \sum_i q_i \leq 0,$$

where the first inequality is due to Jensen's inequality, and the final inequality holds for the same reason given in the proof above.

Furthermore, since $\log$ is strictly concave, by the equality condition of Jensen's inequality we get equality when

$$\frac{q_1}{p_1} = \frac{q_2}{p_2} = \cdots = \frac{q_n}{p_n}$$

and

$$\sum_i q_i = 1.$$

Suppose that this common ratio is $\sigma$; then we have that

$$1 = \sum_i q_i = \sum_i \sigma p_i = \sigma,$$

where we use the fact that $P$ and $Q$ are probability distributions. Therefore, equality holds when $P = Q$.

Corollary

The entropy of $P$ is bounded by:[1]: 68

$$H(p_1, \ldots, p_n) \leq \log n.$$

The proof is trivial: simply set $q_i = 1/n$ for all $i$.
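
As a quick numerical illustration (added here; the distribution below is hypothetical and chosen only for the example), the bound can be compared against $\log_2 n$ using the same kind of entropy helper as in the earlier sketch:

    import math

    def entropy(p):
        # Shannon entropy in bits; zero-probability outcomes contribute nothing
        return -sum(pi * math.log2(pi) for pi in p if pi > 0)

    P = [0.7, 0.1, 0.1, 0.1]      # hypothetical distribution over n = 4 outcomes
    print(entropy(P))             # about 1.357 bits
    print(math.log2(len(P)))      # 2.0 bits = log2(n), the upper bound
    print(entropy([0.25] * 4))    # 2.0 bits: equality for the uniform distribution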

See also

Information entropy
Bregman divergence
Log sum inequality

References

  1. Pierre Brémaud (6 December 2012). An Introduction to Probabilistic Modeling. Springer Science & Business Media. ISBN 978-1-4612-1046-7.
  2. David J. C. MacKay (25 September 2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press. ISBN 978-0-521-64298-9.
