
Typical set

In information theory, the typical set is a set of sequences whose probability is close to two raised to the negative power of the entropy of their source distribution. That this set has total probability close to one is a consequence of the asymptotic equipartition property (AEP) which is a kind of law of large numbers. The notion of typicality is only concerned with the probability of a sequence and not the actual sequence itself.

This has great use in compression theory as it provides a theoretical means for compressing data, allowing us to represent any sequence \(X^n\) using \(nH(X)\) bits on average, and hence justifying the use of entropy as a measure of information from a source.

The AEP can also be proven for a large class of stationary ergodic processes, allowing the typical set to be defined in more general cases.

(Weakly) typical sequences (weak typicality, entropy typicality)

If a sequence x1, ..., xn is drawn from an independent and identically distributed (i.i.d.) random variable X defined over a finite alphabet \(\mathcal{X}\), then the typical set \(A_\varepsilon^{(n)}\) is defined as those sequences which satisfy:

\[ 2^{-n(H(X)+\varepsilon)} \leqslant p(x_1, x_2, \dots, x_n) \leqslant 2^{-n(H(X)-\varepsilon)} \]

where

\[ H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x) \]

is the information entropy of X. The probability above need only be within a factor of \(2^{n\varepsilon}\). Taking the logarithm on all sides and dividing by \(-n\), this definition can be equivalently stated as

\[ H(X) - \varepsilon \leq -\frac{1}{n} \log_2 p(x_1, x_2, \ldots, x_n) \leq H(X) + \varepsilon. \]

For an i.i.d. sequence, since

\[ p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i), \]

we further have

\[ H(X) - \varepsilon \leq -\frac{1}{n} \sum_{i=1}^{n} \log_2 p(x_i) \leq H(X) + \varepsilon. \]

By the law of large numbers, for sufficiently large n

\[ -\frac{1}{n} \sum_{i=1}^{n} \log_2 p(x_i) \rightarrow H(X). \]
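This convergence can be checked numerically. Below is a minimal Python sketch (the names `entropy`, `empirical_rate`, and `is_weakly_typical` are illustrative, not from any library) that draws a long i.i.d. sequence and verifies that its per-symbol log-probability lands within ε of the entropy:

```python
import math
import random

def entropy(p):
    """Shannon entropy H(X) in bits, for a distribution given as {symbol: prob}."""
    return -sum(q * math.log2(q) for q in p.values())

def empirical_rate(seq, p):
    """-(1/n) * sum_i log2 p(x_i) for an i.i.d. sequence."""
    return -sum(math.log2(p[x]) for x in seq) / len(seq)

def is_weakly_typical(seq, p, eps):
    """True if |-(1/n) log2 p(seq) - H(X)| <= eps."""
    return abs(empirical_rate(seq, p) - entropy(p)) <= eps

random.seed(0)
p = {0: 0.1, 1: 0.9}
n = 10_000
seq = random.choices([0, 1], weights=[p[0], p[1]], k=n)
print(entropy(p))                         # ~0.469 bits
print(is_weakly_typical(seq, p, eps=0.05))  # almost surely True at this n, by the LLN
```

By the law of large numbers the empirical rate concentrates around H(X), so a long random sequence is weakly typical with very high probability.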

Properties

An essential characteristic of the typical set is that, if one draws a large number n of independent random samples from the distribution X, the resulting sequence (x1, x2, ..., xn) is very likely to be a member of the typical set, even though the typical set comprises only a small fraction of all the possible sequences. Formally, given any \(\varepsilon > 0\), one can choose n such that:

  1. The probability of a sequence drawn from X being in \(A_\varepsilon^{(n)}\) is greater than 1 − ε, i.e. \(\Pr\left[ x^{(n)} \in A_\varepsilon^{(n)} \right] \geq 1 - \varepsilon\)
  2. \(\left| A_\varepsilon^{(n)} \right| \leqslant 2^{n(H(X)+\varepsilon)}\)
  3. \(\left| A_\varepsilon^{(n)} \right| \geqslant (1-\varepsilon)\, 2^{n(H(X)-\varepsilon)}\)
  4. If the distribution over \(\mathcal{X}\) is not uniform, then the fraction of sequences that are typical is

\[ \frac{\left| A_\varepsilon^{(n)} \right|}{\left| \mathcal{X} \right|^{n}} \equiv \frac{2^{nH(X)}}{2^{n \log_2 |\mathcal{X}|}} = 2^{-n(\log_2 |\mathcal{X}| - H(X))} \rightarrow 0 \]

as n becomes very large, since \(H(X) < \log_2 |\mathcal{X}|\), where \(|\mathcal{X}|\) is the cardinality of \(\mathcal{X}\).
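Property 4 can be made concrete with a quick calculation. The sketch below (`typical_fraction_exponent` is an illustrative name, and the binary source with H(X) = 0.469 bits is an assumed example) evaluates the per-symbol exponent \(\log_2 |\mathcal{X}| - H(X)\) and shows how fast the typical fraction shrinks:

```python
import math

def typical_fraction_exponent(H, alphabet_size):
    """Per-symbol exponent in |A_eps^(n)| / |X|^n ~ 2^(-n(log2|X| - H(X)))."""
    return math.log2(alphabet_size) - H

# Binary source with H(X) = 0.469 bits (e.g. Bernoulli with p(1) = 0.9)
gap = typical_fraction_exponent(0.469, 2)     # ~0.531 bits per symbol
for n in (10, 100, 1000):
    # Fraction of all 2^n sequences that are typical, up to the eps slack
    print(n, 2 ** (-n * gap))
```

Even at n = 100 the typical set already occupies roughly a 2^(−53) fraction of all binary sequences, despite carrying almost all of the probability.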

For a general stochastic process {X(t)} with AEP, the (weakly) typical set can be defined similarly, with \(p(x_1, x_2, \dots, x_n)\) replaced by \(p(x_0^\tau)\) (i.e. the probability of the sample limited to the time interval \([0, \tau]\)), n being the degrees of freedom of the process in the time interval, and H(X) being the entropy rate. If the process is continuous-valued, differential entropy is used instead.

Example

Counter-intuitively, the most likely sequence is often not a member of the typical set. For example, suppose that X is an i.i.d. Bernoulli random variable with p(0) = 0.1 and p(1) = 0.9. In n independent trials, since p(1) > p(0), the most likely sequence of outcomes is the sequence of all 1's, (1, 1, ..., 1). Here the entropy of X is H(X) = 0.469, while

\[ -\frac{1}{n} \log_2 p\left( x^{(n)} = (1, 1, \ldots, 1) \right) = -\frac{1}{n} \log_2 (0.9^n) = 0.152. \]

So this sequence is not in the typical set because its average logarithmic probability cannot come arbitrarily close to the entropy of the random variable X no matter how large we take the value of n.

For Bernoulli random variables, the typical set consists of sequences with average numbers of 0s and 1s in n independent trials. This is easily demonstrated: if p(1) = p and p(0) = 1 − p, then for n trials with m 1's, we have

\[ -\frac{1}{n} \log_2 p(x^{(n)}) = -\frac{1}{n} \log_2 \left( p^m (1-p)^{n-m} \right) = -\frac{m}{n} \log_2 p - \left( \frac{n-m}{n} \right) \log_2 (1-p). \]

The average number of 1's in a sequence of Bernoulli trials is m = np. Thus, we have

\[ -\frac{1}{n} \log_2 p(x^{(n)}) = -p \log_2 p - (1-p) \log_2 (1-p) = H(X). \]

For this example, if n = 10, then the typical set consists of all sequences that have a single 0 in the entire sequence. If p(0) = p(1) = 0.5, then every possible binary sequence belongs to the typical set.
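For such a small n the typical set can be enumerated by brute force. The following sketch assumes the weak-typicality definition above with ε = 0.1 (an illustrative choice; `neg_log_prob_rate` is a hypothetical helper name):

```python
import math
from itertools import product

def neg_log_prob_rate(seq, p1):
    """-(1/n) log2 p(seq) for i.i.d. Bernoulli(p1) bits."""
    n, m = len(seq), sum(seq)
    return -(m * math.log2(p1) + (n - m) * math.log2(1 - p1)) / n

p1, n, eps = 0.9, 10, 0.1
H = -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))   # ~0.469 bits

# Enumerate all 2^10 binary sequences and keep the weakly typical ones
typical = [s for s in product((0, 1), repeat=n)
           if abs(neg_log_prob_rate(s, p1) - H) <= eps]

print(len(typical))          # 10: exactly the sequences with a single 0
print((1,) * n in typical)   # False: the most likely sequence is not typical
```

With m ones the rate is 0.152 (m = 10), 0.469 (m = 9), and 0.786 (m = 8), so only the ten one-zero sequences fall within ε = 0.1 of H(X), matching the text.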

Strongly typical sequences (strong typicality, letter typicality)

If a sequence x1, ..., xn is drawn from some specified joint distribution defined over a finite or an infinite alphabet \(\mathcal{X}\), then the strongly typical set \(A_{\varepsilon,\text{strong}}^{(n)}\) is defined as the set of sequences which satisfy

\[ \left| \frac{N(x_i)}{n} - p(x_i) \right| < \frac{\varepsilon}{\left| \mathcal{X} \right|} \]

where \(N(x_i)\) is the number of occurrences of the symbol \(x_i\) in the sequence.

It can be shown that strongly typical sequences are also weakly typical (with a different constant ε), and hence the name. The two forms, however, are not equivalent. Strong typicality is often easier to work with in proving theorems for memoryless channels. However, as is apparent from the definition, this form of typicality is only defined for random variables having finite support.
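The strong-typicality condition translates directly into code. A minimal sketch (`is_strongly_typical` is an illustrative name, and the Bernoulli alphabet is an assumed example):

```python
from collections import Counter

def is_strongly_typical(seq, p, eps):
    """Check |N(a)/n - p(a)| < eps/|X| for every symbol a of the alphabet p."""
    n = len(seq)
    counts = Counter(seq)          # missing symbols count as 0 occurrences
    bound = eps / len(p)           # eps / |X|
    return all(abs(counts[a] / n - p[a]) < bound for a in p)

p = {0: 0.1, 1: 0.9}
print(is_strongly_typical((0,) + (1,) * 9, p, eps=0.1))  # True: empirical frequencies equal p
print(is_strongly_typical((1,) * 10, p, eps=0.1))        # False: symbol 0 deviates by 0.1 >= 0.05
```

Note that, unlike weak typicality, this test constrains the empirical frequency of every letter, which is why it requires a finite alphabet.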

Jointly typical sequences

Two sequences \(x^n\) and \(y^n\) are jointly ε-typical if the pair \((x^n, y^n)\) is ε-typical with respect to the joint distribution \(p(x^n, y^n) = \prod_{i=1}^{n} p(x_i, y_i)\) and both \(x^n\) and \(y^n\) are ε-typical with respect to their marginal distributions \(p(x^n)\) and \(p(y^n)\). The set of all such pairs of sequences \((x^n, y^n)\) is denoted by \(A_\varepsilon^{(n)}(X, Y)\). Jointly ε-typical n-tuple sequences are defined similarly.

Let \(\tilde{X}^n\) and \(\tilde{Y}^n\) be two independent sequences of random variables with the same marginal distributions \(p(x^n)\) and \(p(y^n)\). Then for any ε > 0, for sufficiently large n, jointly typical sequences satisfy the following properties:

  1. \(P\left[ (X^n, Y^n) \in A_\varepsilon^{(n)}(X, Y) \right] \geqslant 1 - \varepsilon\)
  2. \(\left| A_\varepsilon^{(n)}(X, Y) \right| \leqslant 2^{n(H(X,Y)+\varepsilon)}\)
  3. \(\left| A_\varepsilon^{(n)}(X, Y) \right| \geqslant (1-\varepsilon)\, 2^{n(H(X,Y)-\varepsilon)}\)
  4. \(P\left[ (\tilde{X}^n, \tilde{Y}^n) \in A_\varepsilon^{(n)}(X, Y) \right] \leqslant 2^{-n(I(X;Y)-3\varepsilon)}\)
  5. \(P\left[ (\tilde{X}^n, \tilde{Y}^n) \in A_\varepsilon^{(n)}(X, Y) \right] \geqslant (1-\varepsilon)\, 2^{-n(I(X;Y)+3\varepsilon)}\)
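The definition combines three weak-typicality checks, which a small checker makes explicit. This sketch uses hypothetical helper names, and the noiseless binary channel (Y = X with uniform input) is an assumption chosen so the example stays simple:

```python
import math

def entropy(p):
    """H in bits for a distribution {outcome: prob}."""
    return -sum(q * math.log2(q) for q in p.values())

def rate(seq, p):
    """-(1/n) log2 p(seq) under an i.i.d. model p."""
    return -sum(math.log2(p[s]) for s in seq) / len(seq)

def is_jointly_typical(xs, ys, pxy, px, py, eps):
    """x^n, y^n, and the paired sequence must each be eps-typical."""
    pairs = list(zip(xs, ys))
    return (abs(rate(xs, px) - entropy(px)) <= eps
            and abs(rate(ys, py) - entropy(py)) <= eps
            and abs(rate(pairs, pxy) - entropy(pxy)) <= eps)

# Noiseless channel: Y = X, uniform input (illustrative choice)
px = py = {0: 0.5, 1: 0.5}
pxy = {(0, 0): 0.5, (1, 1): 0.5}
xs = (0, 1, 1, 0, 1, 0, 0, 1)
print(is_jointly_typical(xs, xs, pxy, px, py, eps=0.1))  # True
```

For this channel only pairs with matching inputs and outputs have positive joint probability, so an independent \(\tilde{Y}^n\) would almost never be jointly typical with \(\tilde{X}^n\), in line with property 4.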

Applications of typicality

Typical set encoding

In information theory, typical set encoding encodes only the sequences in the typical set of a stochastic source with fixed-length block codes. Since the size of the typical set is about \(2^{nH(X)}\), only about \(nH(X)\) bits are required for the coding, while at the same time ensuring that the chance of an encoding error is limited to ε. Asymptotically, it is, by the AEP, lossless and achieves the minimum rate equal to the entropy rate of the source.
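The scheme can be sketched for a toy source. Note that at such a small n the probability of an atypical (unencodable) block is still substantial; the AEP guarantee is only asymptotic. The brute-force enumeration and all names here are illustrative:

```python
import math
from itertools import product

def typical_set(n, p, eps):
    """Brute-force the weakly typical set of an i.i.d. source (tiny n only)."""
    H = -sum(q * math.log2(q) for q in p.values())
    def rate(seq):
        return -sum(math.log2(p[s]) for s in seq) / n
    return [s for s in product(sorted(p), repeat=n) if abs(rate(s) - H) <= eps]

p, n, eps = {0: 0.1, 1: 0.9}, 10, 0.3
A = typical_set(n, p, eps)

# Fixed-length code: each typical sequence gets an index of ceil(log2 |A|) bits;
# atypical sequences are simply not encodable and count as errors.
codebook = {seq: i for i, seq in enumerate(A)}
bits_per_block = math.ceil(math.log2(len(A)))
print(len(A), bits_per_block)   # 10 typical sequences -> 4 bits instead of n = 10 raw bits
```

Indexing only the typical set is what brings the rate down toward nH(X) bits per block, at the cost of a vanishing (as n grows) probability of hitting an atypical sequence.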

Typical set decoding

In information theory, typical set decoding is used in conjunction with random coding to estimate the transmitted message as the one with a codeword that is jointly ε-typical with the observation, i.e.

\[ \hat{w} = w \iff \exists w : \left( x_1^n(w), y_1^n \right) \in A_\varepsilon^{(n)}(X, Y) \]

where \(\hat{w}\), \(x_1^n(w)\), \(y_1^n\) are the message estimate, the codeword of message \(w\), and the observation respectively. \(A_\varepsilon^{(n)}(X, Y)\) is defined with respect to the joint distribution \(p(x_1^n)\, p(y_1^n \mid x_1^n)\), where \(p(y_1^n \mid x_1^n)\) is the transition probability that characterizes the channel statistics, and \(p(x_1^n)\) is some input distribution used to generate the codewords in the random codebook.

Universal null-hypothesis testing

Universal channel code

See also

  • Asymptotic equipartition property
  • Source coding theorem
  • Noisy-channel coding theorem

