fbpx
Wikipedia

Asymptotic equipartition property

In information theory, the asymptotic equipartition property (AEP) is a general property of the output samples of a stochastic source. It is fundamental to the concept of typical set used in theories of data compression.

Roughly speaking, the theorem states that although there are many series of results that may be produced by a random process, the one actually produced is most probably from a loosely defined set of outcomes that all have approximately the same chance of being the one actually realized. (This is a consequence of the law of large numbers and ergodic theory.) Although there are individual outcomes which have a higher probability than any outcome in this set, the vast number of outcomes in the set almost guarantees that the outcome will come from the set. One way of intuitively understanding the property is through Cramér's large deviation theorem, which states that the probability of a large deviation from mean decays exponentially with the number of samples. Such results are studied in large deviations theory; intuitively, it is the large deviations that would violate equipartition, but these are unlikely.

In the field of pseudorandom number generation, a candidate generator of undetermined quality whose output sequence lies too far outside the typical set by some statistical criteria is rejected as insufficiently random. Thus, although the typical set is loosely defined, practical notions arise concerning sufficient typicality.

Definition

Given a discrete-time stationary ergodic stochastic process   on the probability space  , the asymptotic equipartition property is an assertion that

 
where   or simply   denotes the entropy rate of  , which must exist for all discrete-time stationary processes including the ergodic ones. The asymptotic equipartition property is proved for finite-valued (i.e.  ) stationary ergodic stochastic processes in the Shannon–McMillan–Breiman theorem using the ergodic theory and for any i.i.d. sources directly using the law of large numbers in both the discrete-valued case (where   is simply the entropy of a symbol) and the continuous-valued case (where H is the differential entropy instead). The definition of the asymptotic equipartition property can also be extended for certain classes of continuous-time stochastic processes for which a typical set exists for long enough observation time. The convergence is proven almost sure in all cases.

Discrete-time i.i.d. sources

Given   is an i.i.d. source which may take values in the alphabet  , its time series   is i.i.d. with entropy  . The weak law of large numbers gives the asymptotic equipartition property with convergence in probability,

 
since the entropy is equal to the expectation of [1]
 

The strong law of large numbers asserts the stronger almost sure convergence,

 

Discrete-time finite-valued stationary ergodic sources

Consider a finite-valued sample space  , i.e.  , for the discrete-time stationary ergodic process   defined on the probability space  . The asymptotic equipartition property for such stochastic source is known as the Shannon–McMillan–Breiman theorem, due to Claude Shannon, Brockway McMillan, and Leo Breiman.

Proof (sketch) [2]
  • Let x denote some measurable set   for some  
  • Parameterize the joint probability by n and x as
     
  • Parameterize the conditional probability by i, k and x as
     
  • Take the limit of the conditional probability as k → ∞ and denote it as
     
  • Argue the two notions of entropy rate
     
    exist and are equal for any stationary process including the stationary ergodic process X. Denote it as H.
  • Argue that both
     
    where i is the time index, are stationary ergodic processes, whose sample means converge almost surely to some values denoted by   and   respectively.
  • Define the k-th order Markov approximation to the probability   as
     
  • Argue that   is finite from the finite-value assumption.
  • Express   in terms of the sample mean of   and show that it converges almost surely to Hk
  • Define the probability measure
     
  • Express   in terms of the sample mean of   and show that it converges almost surely to H.
  • Argue that   as k → ∞ using the stationarity of the process.
  • Argue that H = H using the Lévy's martingale convergence theorem and the finite-value assumption.
  • Show that
     
    which is finite as argued before.
  • Show that
     
    by conditioning on the infinite past   and iterating the expectation.
  • Show that
     
    using the Markov's inequality and the expectation derived previously.
  • Similarly, show that
     
    which is equivalent to
     
  • Show that limsup of
     
    are non-positive almost surely by setting α = nβ for any β > 1 and applying the Borel–Cantelli lemma.
  • Show that liminf and limsup of
     
    are lower and upper bounded almost surely by H and Hk respectively by breaking up the logarithms in the previous result.
  • Complete the proof by pointing out that the upper and lower bounds are shown previously to approach H as k → ∞.

Non-stationary discrete-time source producing independent symbols

The assumptions of stationarity/ergodicity/identical distribution of random variables is not essential for the asymptotic equipartition property to hold. Indeed, as is quite clear intuitively, the asymptotic equipartition property requires only some form of the law of large numbers to hold, which is fairly general. However, the expression needs to be suitably generalized, and the conditions need to be formulated precisely.

We assume that the source is producing independent symbols, with possibly different output statistics at each instant. We assume that the statistics of the process are known completely, that is, the marginal distribution of the process seen at each time instant is known. The joint distribution is just the product of marginals. Then, under the condition (which can be relaxed) that   for all i, for some M > 0, the following holds (AEP):

 
where
 
Proof

The proof follows from a simple application of Markov's inequality (applied to second moment of  .

 

It is obvious that the proof holds if any moment   is uniformly bounded for r > 1 (again by Markov's inequality applied to r-th moment). Q.E.D.

Even this condition is not necessary, but given a non-stationary random process, it should not be difficult to test whether the asymptotic equipartition property holds using the above method.

Applications

The asymptotic equipartition property for non-stationary discrete-time independent process leads us to (among other results) the source coding theorem for non-stationary source (with independent output symbols) and noisy-channel coding theorem for non-stationary memoryless channels.

Continuous-time stationary ergodic sources

Discrete-time functions can be interpolated to continuous-time functions. If such interpolation f is measurable, we may define the continuous-time stationary process accordingly as  . If the asymptotic equipartition property holds for the discrete-time process, as in the i.i.d. or finite-valued stationary ergodic cases shown above, it automatically holds for the continuous-time stationary process derived from it by some measurable interpolation. i.e.

 
where n corresponds to the degree of freedom in time τ. nH(X)/τ and H(X) are the entropy per unit time and per degree of freedom respectively, defined by Shannon.

An important class of such continuous-time stationary process is the bandlimited stationary ergodic process with the sample space being a subset of the continuous   functions. The asymptotic equipartition property holds if the process is white, in which case the time samples are i.i.d., or there exists T > 1/2W, where W is the nominal bandwidth, such that the T-spaced time samples take values in a finite set, in which case we have the discrete-time finite-valued stationary ergodic process.

Any time-invariant operations also preserves the asymptotic equipartition property, stationarity and ergodicity and we may easily turn a stationary process to non-stationary without losing the asymptotic equipartition property by nulling out a finite number of time samples in the process.

Category theory

A category theoretic definition for the equipartition property is given by Gromov.[3] Given a sequence of Cartesian powers   of a measure space P, this sequence admits an asymptotically equivalent sequence HN of homogeneous measure spaces (i.e. all sets have the same measure; all morphisms are invariant under the group of automorphisms, and thus factor as a morphism to the terminal object) .

The above requires a definition of asymptotic equivalence. This is given in terms of a distance function, giving how much an injective correspondence differs from an isomorphism. An injective correspondence   is a partially defined map that is a bijection; that is, it is a bijection between a subset   and  . Then define

 
where |S| denotes the measure of a set S. In what follows, the measure of P and Q are taken to be 1, so that the measure spaces are probability spaces. This distance   is commonly known as the earth mover's distance or Wasserstein metric.

Similarly, define

 
with   taken to be the counting measure on P. Thus, this definition requires that P be a finite measure space. Finally, let
 

A sequence of injective correspondences   are then asymptotically equivalent when

 

Given a homogenous space sequence HN that is asymptotically equivalent to PN, the entropy H(P) of P may be taken as

 

See also

Notes

  1. ^ Cover & Thomas (1991), p. 51.
  2. ^ Algoet & Cover (1988).
  3. ^ Misha Gromov, (2012) "In a Search for a Structure, Part 1: On Entropy". (See page 5, where the equipartition property is called the 'Bernoulli approximation theorem'.)

References

Journal articles

  • Claude E. Shannon. "A Mathematical Theory of Communication". Bell System Technical Journal, July/October 1948.
  • Algoet, Paul H.; Cover, Thomas M. (1988). "A Sandwich Proof of the Shannon-McMillan-Breiman Theorem" (PDF). The Annals of Probability. 16 (2): 899–909.
  • Sergio Verdu and Te Sun Han. "The Role of the Asymptotic Equipartition Property in Noiseless Source Coding." IEEE Transactions on Information Theory, 43(3): 847–857, 1997.

Textbooks

asymptotic, equipartition, property, information, theory, asymptotic, equipartition, property, general, property, output, samples, stochastic, source, fundamental, concept, typical, used, theories, data, compression, roughly, speaking, theorem, states, that, a. In information theory the asymptotic equipartition property AEP is a general property of the output samples of a stochastic source It is fundamental to the concept of typical set used in theories of data compression Roughly speaking the theorem states that although there are many series of results that may be produced by a random process the one actually produced is most probably from a loosely defined set of outcomes that all have approximately the same chance of being the one actually realized This is a consequence of the law of large numbers and ergodic theory Although there are individual outcomes which have a higher probability than any outcome in this set the vast number of outcomes in the set almost guarantees that the outcome will come from the set One way of intuitively understanding the property is through Cramer s large deviation theorem which states that the probability of a large deviation from mean decays exponentially with the number of samples Such results are studied in large deviations theory intuitively it is the large deviations that would violate equipartition but these are unlikely In the field of pseudorandom number generation a candidate generator of undetermined quality whose output sequence lies too far outside the typical set by some statistical criteria is rejected as insufficiently random Thus although the typical set is loosely defined practical notions arise concerning sufficient typicality Contents 1 Definition 2 Discrete time i i d sources 3 Discrete time finite valued stationary ergodic sources 4 Non stationary discrete time source producing independent symbols 4 1 Applications 5 Continuous time stationary ergodic sources 6 Category theory 7 See also 8 Notes 9 References 9 1 Journal articles 9 2 TextbooksDefinition EditGiven a discrete time stationary ergodic stochastic process X displaystyle X on the probability space W B p displaystyle Omega B p the asymptotic equipartition property is an assertion that 1 n log p X 1 X 2 X n H X as n displaystyle frac 1 n log p X 1 X 2 dots X n to H X quad text as quad n to infty where H X displaystyle H X or simply H displaystyle H denotes the entropy rate of X displaystyle X which must exist for all discrete time stationary processes including the ergodic ones The asymptotic equipartition property is proved for finite valued i e W lt displaystyle Omega lt infty stationary ergodic stochastic processes in the Shannon McMillan Breiman theorem using the ergodic theory and for any i i d sources directly using the law of large numbers in both the discrete valued case where H displaystyle H is simply the entropy of a symbol and the continuous valued case where H is the differential entropy instead The definition of the asymptotic equipartition property can also be extended for certain classes of continuous time stochastic processes for which a typical set exists for long enough observation time The convergence is proven almost sure in all cases Discrete time i i d sources EditGiven X displaystyle X is an i i d source which may take values in the alphabet X displaystyle mathcal X its time series X 1 X n displaystyle X 1 ldots X n is i i d with entropy H X displaystyle H X The weak law of large numbers gives the asymptotic equipartition property with convergence in probability lim n Pr 1 n log p X 1 X 2 X n H X gt e 0 e gt 0 displaystyle lim n to infty Pr left left frac 1 n log p X 1 X 2 ldots X n H X right gt varepsilon right 0 qquad forall varepsilon gt 0 since the entropy is equal to the expectation of 1 1 n log p X 1 X 2 X n displaystyle frac 1 n log p X 1 X 2 ldots X n The strong law of large numbers asserts the stronger almost sure convergence Pr lim n 1 n log p X 1 X 2 X n H X 1 displaystyle Pr left lim n to infty frac 1 n log p X 1 X 2 ldots X n H X right 1 Discrete time finite valued stationary ergodic sources EditConsider a finite valued sample space W displaystyle Omega i e W lt displaystyle Omega lt infty for the discrete time stationary ergodic process X X n displaystyle X X n defined on the probability space W B p displaystyle Omega B p The asymptotic equipartition property for such stochastic source is known as the Shannon McMillan Breiman theorem due to Claude Shannon Brockway McMillan and Leo Breiman Proof sketch 2 Let x denote some measurable set x X A displaystyle x X A for some A B displaystyle A in B Parameterize the joint probability by n and x as j n x p x 0 n 1 displaystyle j n x p left x 0 n 1 right Parameterize the conditional probability by i k and x as c i k x p x i x i k i 1 displaystyle c i k x p left x i mid x i k i 1 right Take the limit of the conditional probability as k and denote it as c i x p x i x i 1 displaystyle c i x p left x i mid x infty i 1 right Argue the two notions of entropy rate lim n 1 n E log j n X and lim n E log c n n X displaystyle lim n to infty frac 1 n mathrm E log j n X quad text and quad lim n to infty mathrm E log c n n X exist and are equal for any stationary process including the stationary ergodic process X Denote it as H Argue that both c i k X p X i X i k i 1 c i X p X i X i 1 displaystyle begin aligned c i k X amp left p left X i mid X i k i 1 right right c i X amp left p left X i mid X infty i 1 right right end aligned where i is the time index are stationary ergodic processes whose sample means converge almost surely to some values denoted by H k displaystyle H k and H displaystyle H infty respectively Define the k th order Markov approximation to the probability a n k x displaystyle a n k x as a n k x p X 0 k 1 i k n 1 p X i X i k i 1 j k x i k n 1 c i k x displaystyle a n k x p left X 0 k 1 right prod i k n 1 p left X i mid X i k i 1 right j k x prod i k n 1 c i k x Argue that a n k X W displaystyle a n k X Omega is finite from the finite value assumption Express 1 n log a n k X displaystyle frac 1 n log a n k X in terms of the sample mean of c i k X displaystyle c i k X and show that it converges almost surely to Hk Define the probability measure a n x p x 0 n 1 x 1 displaystyle a n x p left x 0 n 1 mid x infty 1 right Express 1 n log a n X displaystyle frac 1 n log a n X in terms of the sample mean of c i X displaystyle c i X and show that it converges almost surely to H Argue that H k H displaystyle H k searrow H as k using the stationarity of the process Argue that H H using the Levy s martingale convergence theorem and the finite value assumption Show that E a n k X j n X a n k X W displaystyle mathrm E left frac a n k X j n X right a n k X Omega which is finite as argued before Show that E j n X a n X 1 displaystyle mathrm E left frac j n X a n X right 1 by conditioning on the infinite past X 1 displaystyle X infty 1 and iterating the expectation Show that a R Pr a n k X j n X a a n k X W a displaystyle forall alpha in mathbb R Pr left frac a n k X j n X geq alpha right leq frac a n k X Omega alpha using the Markov s inequality and the expectation derived previously Similarly show that a R Pr j n X a n X a 1 a displaystyle forall alpha in mathbb R Pr left frac j n X a n X geq alpha right leq frac 1 alpha which is equivalent to a R Pr 1 n log j n X a n X 1 n log a 1 a displaystyle forall alpha in mathbb R Pr left frac 1 n log frac j n X a n X geq frac 1 n log alpha right leq frac 1 alpha Show that limsup of 1 n log a n k X j n X and 1 n log j n X a n X displaystyle frac 1 n log frac a n k X j n X quad text and quad frac 1 n log frac j n X a n X are non positive almost surely by setting a nb for any b gt 1 and applying the Borel Cantelli lemma Show that liminf and limsup of 1 n log j n X displaystyle frac 1 n log j n X are lower and upper bounded almost surely by H and Hk respectively by breaking up the logarithms in the previous result Complete the proof by pointing out that the upper and lower bounds are shown previously to approach H as k Non stationary discrete time source producing independent symbols EditThe assumptions of stationarity ergodicity identical distribution of random variables is not essential for the asymptotic equipartition property to hold Indeed as is quite clear intuitively the asymptotic equipartition property requires only some form of the law of large numbers to hold which is fairly general However the expression needs to be suitably generalized and the conditions need to be formulated precisely We assume that the source is producing independent symbols with possibly different output statistics at each instant We assume that the statistics of the process are known completely that is the marginal distribution of the process seen at each time instant is known The joint distribution is just the product of marginals Then under the condition which can be relaxed that V a r log p X i lt M displaystyle mathrm Var log p X i lt M for all i for some M gt 0 the following holds AEP lim n Pr 1 n log p X 1 X 2 X n H n X lt e 1 e gt 0 displaystyle lim n to infty Pr left left frac 1 n log p X 1 X 2 ldots X n overline H n X right lt varepsilon right 1 qquad forall varepsilon gt 0 where H n X 1 n H X 1 X 2 X n displaystyle overline H n X frac 1 n H X 1 X 2 ldots X n Proof The proof follows from a simple application of Markov s inequality applied to second moment of log p X i displaystyle log p X i Pr 1 n log p X 1 X 2 X n H X gt e 1 n 2 e 2 V a r i 1 n log p X i 2 M n e 2 0 as n displaystyle begin aligned Pr left left frac 1 n log p X 1 X 2 ldots X n overline H X right gt varepsilon right amp leq frac 1 n 2 varepsilon 2 mathrm Var left sum i 1 n left log p X i right 2 right amp leq frac M n varepsilon 2 to 0 text as n to infty end aligned It is obvious that the proof holds if any moment E log p X i r displaystyle mathrm E left log p X i r right is uniformly bounded for r gt 1 again by Markov s inequality applied to r th moment Q E D Even this condition is not necessary but given a non stationary random process it should not be difficult to test whether the asymptotic equipartition property holds using the above method Applications Edit The asymptotic equipartition property for non stationary discrete time independent process leads us to among other results the source coding theorem for non stationary source with independent output symbols and noisy channel coding theorem for non stationary memoryless channels Continuous time stationary ergodic sources EditDiscrete time functions can be interpolated to continuous time functions If such interpolation f is measurable we may define the continuous time stationary process accordingly as X f X displaystyle tilde X f circ X If the asymptotic equipartition property holds for the discrete time process as in the i i d or finite valued stationary ergodic cases shown above it automatically holds for the continuous time stationary process derived from it by some measurable interpolation i e 1 n log p X 0 t H X displaystyle frac 1 n log p tilde X 0 tau to H X where n corresponds to the degree of freedom in time t nH X t and H X are the entropy per unit time and per degree of freedom respectively defined by Shannon An important class of such continuous time stationary process is the bandlimited stationary ergodic process with the sample space being a subset of the continuous L 2 displaystyle mathcal L 2 functions The asymptotic equipartition property holds if the process is white in which case the time samples are i i d or there exists T gt 1 2W where W is the nominal bandwidth such that the T spaced time samples take values in a finite set in which case we have the discrete time finite valued stationary ergodic process Any time invariant operations also preserves the asymptotic equipartition property stationarity and ergodicity and we may easily turn a stationary process to non stationary without losing the asymptotic equipartition property by nulling out a finite number of time samples in the process Category theory EditA category theoretic definition for the equipartition property is given by Gromov 3 Given a sequence of Cartesian powers P N P P displaystyle P N P times cdots times P of a measure space P this sequence admits an asymptotically equivalent sequence HN of homogeneous measure spaces i e all sets have the same measure all morphisms are invariant under the group of automorphisms and thus factor as a morphism to the terminal object The above requires a definition of asymptotic equivalence This is given in terms of a distance function giving how much an injective correspondence differs from an isomorphism An injective correspondence p P Q displaystyle pi P to Q is a partially defined map that is a bijection that is it is a bijection between a subset P P displaystyle P subset P and Q Q displaystyle Q subset Q Then define P Q p P P Q Q displaystyle P Q pi P setminus P Q setminus Q where S denotes the measure of a set S In what follows the measure of P and Q are taken to be 1 so that the measure spaces are probability spaces This distance P Q p displaystyle P Q pi is commonly known as the earth mover s distance or Wasserstein metric Similarly define log P Q p sup p P log p log p p log min set P set Q displaystyle log P Q pi frac sup p in P log p log pi p log min left operatorname set P operatorname set Q right with set P displaystyle operatorname set P taken to be the counting measure on P Thus this definition requires that P be a finite measure space Finally let dist p P Q P Q p log P Q p displaystyle text dist pi P Q P Q pi log P Q pi A sequence of injective correspondences p N P N Q N displaystyle pi N P N to Q N are then asymptotically equivalent whendist p N P N Q N 0 as N displaystyle text dist pi N P N Q N to 0 quad text as quad N to infty Given a homogenous space sequence HN that is asymptotically equivalent to PN the entropy H P of P may be taken asH P lim N 1 N set H N displaystyle H P lim N to infty frac 1 N operatorname set H N See also EditCramer s theorem large deviations Shannon s source coding theorem Noisy channel coding theoremNotes Edit Cover amp Thomas 1991 p 51 Algoet amp Cover 1988 Misha Gromov 2012 In a Search for a Structure Part 1 On Entropy See page 5 where the equipartition property is called the Bernoulli approximation theorem References EditJournal articles Edit Claude E Shannon A Mathematical Theory of Communication Bell System Technical Journal July October 1948 Algoet Paul H Cover Thomas M 1988 A Sandwich Proof of the Shannon McMillan Breiman Theorem PDF The Annals of Probability 16 2 899 909 Sergio Verdu and Te Sun Han The Role of the Asymptotic Equipartition Property in Noiseless Source Coding IEEE Transactions on Information Theory 43 3 847 857 1997 Textbooks Edit Cover Thomas M Thomas Joy A 1991 Elements of Information Theory first ed Hoboken New Jersey Wiley ISBN 978 0 471 24195 9 MacKay David J C 2003 Information Theory Inference and Learning Algorithms Cambridge University Press ISBN 0 521 64298 1 Retrieved from https en wikipedia org w index php title Asymptotic equipartition property amp oldid 1126157228, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.