fbpx
Wikipedia

Shannon's source coding theorem

In information theory, Shannon's source coding theorem (or noiseless coding theorem) establishes the statistical limits to possible data compression for data whose source is an independent identically-distributed random variable, and the operational meaning of the Shannon entropy.

Named after Claude Shannon, the source coding theorem shows that, in the limit, as the length of a stream of independent and identically-distributed random variable (i.i.d.) data tends to infinity, it is impossible to compress such data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without it being virtually certain that information will be lost. However it is possible to get the code rate arbitrarily close to the Shannon entropy, with negligible probability of loss.

The source coding theorem for symbol codes places an upper and a lower bound on the minimal possible expected length of codewords as a function of the entropy of the input word (which is viewed as a random variable) and of the size of the target alphabet.

Note that, for data that exhibits more dependencies (whose source is not an i.i.d. random variable), the Kolmogorov complexity, which quantifies the minimal description length of an object, is more suitable to describe the limits of data compression. Shannon entropy takes into account only frequency regularities while Kolmogorov complexity takes into account all algorithmic regularities, so in general the latter is smaller. On the other hand, if an object is generated by a random process in such a way that it has only frequency regularities, entropy is close to complexity with high probability (Shen et al. 2017).[1]

Statements edit

Source coding is a mapping from (a sequence of) symbols from an information source to a sequence of alphabet symbols (usually bits) such that the source symbols can be exactly recovered from the binary bits (lossless source coding) or recovered within some distortion (lossy source coding). This is one approach to data compression.

Source coding theorem edit

In information theory, the source coding theorem (Shannon 1948)[2] informally states that (MacKay 2003, pg. 81,[3] Cover 2006, Chapter 5[4]):

N i.i.d. random variables each with entropy H(X) can be compressed into more than N H(X) bits with negligible risk of information loss, as N → ∞; but conversely, if they are compressed into fewer than N H(X) bits it is virtually certain that information will be lost.

The   coded sequence represents the compressed message in a biunivocal way, under the assumption that the decoder knows the source. From a practical point of view, this hypothesis is not always true. Consequently, when the entropy encoding is applied the transmitted message is  . Usually, the information that characterizes the source is inserted at the beginning of the transmitted message.

Source coding theorem for symbol codes edit

Let Σ1, Σ2 denote two finite alphabets and let Σ
1
and Σ
2
denote the set of all finite words from those alphabets (respectively).

Suppose that X is a random variable taking values in Σ1 and let f be a uniquely decodable code from Σ
1
to Σ
2
where 2| = a. Let S denote the random variable given by the length of codeword f (X).

If f is optimal in the sense that it has the minimal expected word length for X, then (Shannon 1948):

 

Where   denotes the expected value operator.

Proof: source coding theorem edit

Given X is an i.i.d. source, its time series X1, ..., Xn is i.i.d. with entropy H(X) in the discrete-valued case and differential entropy in the continuous-valued case. The Source coding theorem states that for any ε > 0, i.e. for any rate H(X) + ε larger than the entropy of the source, there is large enough n and an encoder that takes n i.i.d. repetition of the source, X1:n, and maps it to n(H(X) + ε) binary bits such that the source symbols X1:n are recoverable from the binary bits with probability of at least 1 − ε.

Proof of Achievability. Fix some ε > 0, and let

 

The typical set, Aε
n
, is defined as follows:

 

The asymptotic equipartition property (AEP) shows that for large enough n, the probability that a sequence generated by the source lies in the typical set, Aε
n
, as defined approaches one. In particular, for sufficiently large n,   can be made arbitrarily close to 1, and specifically, greater than   (See AEP for a proof).

The definition of typical sets implies that those sequences that lie in the typical set satisfy:

 


  • The probability of a sequence   being drawn from Aε
    n
    is greater than 1 − ε.
  •  , which follows from the left hand side (lower bound) for  .
  •  , which follows from upper bound for   and the lower bound on the total probability of the whole set Aε
    n
    .

Since   bits are enough to point to any string in this set.

The encoding algorithm: the encoder checks if the input sequence lies within the typical set; if yes, it outputs the index of the input sequence within the typical set; if not, the encoder outputs an arbitrary n(H(X) + ε) digit number. As long as the input sequence lies within the typical set (with probability at least 1 − ε), the encoder does not make any error. So, the probability of error of the encoder is bounded above by ε.

Proof of converse: the converse is proved by showing that any set of size smaller than Aε
n
(in the sense of exponent) would cover a set of probability bounded away from 1.

Proof: Source coding theorem for symbol codes edit

For 1 ≤ in let si denote the word length of each possible xi. Define  , where C is chosen so that q1 + ... + qn = 1. Then

 

where the second line follows from Gibbs' inequality and the fifth line follows from Kraft's inequality:

 

so log C ≤ 0.

For the second inequality we may set

 

so that

 

and so

 

and

 

and so by Kraft's inequality there exists a prefix-free code having those word lengths. Thus the minimal S satisfies

 

Extension to non-stationary independent sources edit

Fixed rate lossless source coding for discrete time non-stationary independent sources edit

Define typical set Aε
n
as:

 

Then, for given δ > 0, for n large enough, Pr(Aε
n
) > 1 − δ
. Now we just encode the sequences in the typical set, and usual methods in source coding show that the cardinality of this set is smaller than  . Thus, on an average, Hn(X) + ε bits suffice for encoding with probability greater than 1 − δ, where ε and δ can be made arbitrarily small, by making n larger.

See also edit

References edit

  1. ^ Shen, A. and Uspensky, V.A. and Vereshchagin, N. (2017). "Chapter 7.3. : Complexity and entropy". Kolmogorov Complexity and Algorithmic Randomness. American Mathematical Society. p. 226. ISBN 9781470431822.{{cite book}}: CS1 maint: multiple names: authors list (link)
  2. ^ C.E. Shannon, "A Mathematical Theory of Communication 2009-02-16 at the Wayback Machine", Bell System Technical Journal, vol. 27, pp. 379–423, 623-656, July, October, 1948
  3. ^ David J. C. MacKay. Information Theory, Inference, and Learning Algorithms Cambridge: Cambridge University Press, 2003. ISBN 0-521-64298-1
  4. ^ Cover, Thomas M. (2006). "Chapter 5: Data Compression". Elements of Information Theory. John Wiley & Sons. pp. 103–142. ISBN 0-471-24195-4.

shannon, source, coding, theorem, this, article, about, theory, source, coding, data, compression, term, computer, programming, source, code, information, theory, noiseless, coding, theorem, establishes, statistical, limits, possible, data, compression, data, . This article is about the theory of source coding in data compression For the term in computer programming see Source code In information theory Shannon s source coding theorem or noiseless coding theorem establishes the statistical limits to possible data compression for data whose source is an independent identically distributed random variable and the operational meaning of the Shannon entropy Named after Claude Shannon the source coding theorem shows that in the limit as the length of a stream of independent and identically distributed random variable i i d data tends to infinity it is impossible to compress such data such that the code rate average number of bits per symbol is less than the Shannon entropy of the source without it being virtually certain that information will be lost However it is possible to get the code rate arbitrarily close to the Shannon entropy with negligible probability of loss The source coding theorem for symbol codes places an upper and a lower bound on the minimal possible expected length of codewords as a function of the entropy of the input word which is viewed as a random variable and of the size of the target alphabet Note that for data that exhibits more dependencies whose source is not an i i d random variable the Kolmogorov complexity which quantifies the minimal description length of an object is more suitable to describe the limits of data compression Shannon entropy takes into account only frequency regularities while Kolmogorov complexity takes into account all algorithmic regularities so in general the latter is smaller On the other hand if an object is generated by a random process in such a way that it has only frequency regularities entropy is close to complexity with high probability Shen et al 2017 1 Contents 1 Statements 1 1 Source coding theorem 1 2 Source coding theorem for symbol codes 2 Proof source coding theorem 3 Proof Source coding theorem for symbol codes 4 Extension to non stationary independent sources 4 1 Fixed rate lossless source coding for discrete time non stationary independent sources 5 See also 6 ReferencesStatements editSource coding is a mapping from a sequence of symbols from an information source to a sequence of alphabet symbols usually bits such that the source symbols can be exactly recovered from the binary bits lossless source coding or recovered within some distortion lossy source coding This is one approach to data compression Source coding theorem edit In information theory the source coding theorem Shannon 1948 2 informally states that MacKay 2003 pg 81 3 Cover 2006 Chapter 5 4 N i i d random variables each with entropy H X can be compressed into more than N H X bits with negligible risk of information loss as N but conversely if they are compressed into fewer than N H X bits it is virtually certain that information will be lost The N H X displaystyle NH X nbsp coded sequence represents the compressed message in a biunivocal way under the assumption that the decoder knows the source From a practical point of view this hypothesis is not always true Consequently when the entropy encoding is applied the transmitted message is N H X i n f s o u r c e displaystyle NH X inf source nbsp Usually the information that characterizes the source is inserted at the beginning of the transmitted message Source coding theorem for symbol codes edit Let S1 S2 denote two finite alphabets and let S 1 and S 2 denote the set of all finite words from those alphabets respectively Suppose that X is a random variable taking values in S1 and let f be a uniquely decodable code from S 1 to S 2 where S2 a Let S denote the random variable given by the length of codeword f X If f is optimal in the sense that it has the minimal expected word length for X then Shannon 1948 H X log 2 a E S lt H X log 2 a 1 displaystyle frac H X log 2 a leq mathbb E S lt frac H X log 2 a 1 nbsp Where E displaystyle mathbb E nbsp denotes the expected value operator Proof source coding theorem editGiven X is an i i d source its time series X1 Xn is i i d with entropy H X in the discrete valued case and differential entropy in the continuous valued case The Source coding theorem states that for any e gt 0 i e for any rate H X e larger than the entropy of the source there is large enough n and an encoder that takes n i i d repetition of the source X1 n and maps it to n H X e binary bits such that the source symbols X1 n are recoverable from the binary bits with probability of at least 1 e Proof of Achievability Fix some e gt 0 and let p x 1 x n Pr X 1 x 1 X n x n displaystyle p x 1 ldots x n Pr left X 1 x 1 cdots X n x n right nbsp The typical set Aen is defined as follows A n e x 1 x n 1 n log p x 1 x n H n X lt e displaystyle A n varepsilon left x 1 cdots x n left frac 1 n log p x 1 cdots x n H n X right lt varepsilon right nbsp The asymptotic equipartition property AEP shows that for large enough n the probability that a sequence generated by the source lies in the typical set Aen as defined approaches one In particular for sufficiently large n P X 1 X 2 X n A n e displaystyle P X 1 X 2 cdots X n in A n varepsilon nbsp can be made arbitrarily close to 1 and specifically greater than 1 e displaystyle 1 varepsilon nbsp See AEP for a proof The definition of typical sets implies that those sequences that lie in the typical set satisfy 2 n H X e p x 1 x n 2 n H X e displaystyle 2 n H X varepsilon leq p left x 1 cdots x n right leq 2 n H X varepsilon nbsp The probability of a sequence X 1 X 2 X n displaystyle X 1 X 2 cdots X n nbsp being drawn from Aen is greater than 1 e A n e 2 n H X e displaystyle left A n varepsilon right leq 2 n H X varepsilon nbsp which follows from the left hand side lower bound for p x 1 x 2 x n displaystyle p x 1 x 2 cdots x n nbsp A n e 1 e 2 n H X e displaystyle left A n varepsilon right geq 1 varepsilon 2 n H X varepsilon nbsp which follows from upper bound for p x 1 x 2 x n displaystyle p x 1 x 2 cdots x n nbsp and the lower bound on the total probability of the whole set Aen Since A n e 2 n H X e n H X e displaystyle left A n varepsilon right leq 2 n H X varepsilon n H X varepsilon nbsp bits are enough to point to any string in this set The encoding algorithm the encoder checks if the input sequence lies within the typical set if yes it outputs the index of the input sequence within the typical set if not the encoder outputs an arbitrary n H X e digit number As long as the input sequence lies within the typical set with probability at least 1 e the encoder does not make any error So the probability of error of the encoder is bounded above by e Proof of converse the converse is proved by showing that any set of size smaller than Aen in the sense of exponent would cover a set of probability bounded away from 1 Proof Source coding theorem for symbol codes editFor 1 i n let si denote the word length of each possible xi Define q i a s i C displaystyle q i a s i C nbsp where C is chosen so that q1 qn 1 Then H X i 1 n p i log 2 p i i 1 n p i log 2 q i i 1 n p i log 2 a s i i 1 n p i log 2 C i 1 n p i log 2 a s i log 2 C i 1 n s i p i log 2 a E S log 2 a displaystyle begin aligned H X amp sum i 1 n p i log 2 p i amp leq sum i 1 n p i log 2 q i amp sum i 1 n p i log 2 a s i sum i 1 n p i log 2 C amp sum i 1 n p i log 2 a s i log 2 C amp leq sum i 1 n s i p i log 2 a amp mathbb E S log 2 a end aligned nbsp where the second line follows from Gibbs inequality and the fifth line follows from Kraft s inequality C i 1 n a s i 1 displaystyle C sum i 1 n a s i leq 1 nbsp so log C 0 For the second inequality we may set s i log a p i displaystyle s i lceil log a p i rceil nbsp so that log a p i s i lt log a p i 1 displaystyle log a p i leq s i lt log a p i 1 nbsp and so a s i p i displaystyle a s i leq p i nbsp and a s i p i 1 displaystyle sum a s i leq sum p i 1 nbsp and so by Kraft s inequality there exists a prefix free code having those word lengths Thus the minimal S satisfies E S p i s i lt p i log a p i 1 p i log 2 p i log 2 a 1 H X log 2 a 1 displaystyle begin aligned mathbb E S amp sum p i s i amp lt sum p i left log a p i 1 right amp sum p i frac log 2 p i log 2 a 1 amp frac H X log 2 a 1 end aligned nbsp Extension to non stationary independent sources editFixed rate lossless source coding for discrete time non stationary independent sources edit Define typical set Aen as A n e x 1 n 1 n log p X 1 X n H n X lt e displaystyle A n varepsilon left x 1 n left frac 1 n log p left X 1 cdots X n right overline H n X right lt varepsilon right nbsp Then for given d gt 0 for n large enough Pr Aen gt 1 d Now we just encode the sequences in the typical set and usual methods in source coding show that the cardinality of this set is smaller than 2 n H n X e displaystyle 2 n overline H n X varepsilon nbsp Thus on an average Hn X e bits suffice for encoding with probability greater than 1 d where e and d can be made arbitrarily small by making n larger See also editChannel coding Error exponent Noisy channel coding theoremReferences edit Shen A and Uspensky V A and Vereshchagin N 2017 Chapter 7 3 Complexity and entropy Kolmogorov Complexity and Algorithmic Randomness American Mathematical Society p 226 ISBN 9781470431822 a href Template Cite book html title Template Cite book cite book a CS1 maint multiple names authors list link C E Shannon A Mathematical Theory of Communication Archived 2009 02 16 at the Wayback Machine Bell System Technical Journal vol 27 pp 379 423 623 656 July October 1948 David J C MacKay Information Theory Inference and Learning Algorithms Cambridge Cambridge University Press 2003 ISBN 0 521 64298 1 Cover Thomas M 2006 Chapter 5 Data Compression Elements of Information Theory John Wiley amp Sons pp 103 142 ISBN 0 471 24195 4 Retrieved from https en wikipedia org w index php title Shannon 27s source coding theorem amp oldid 1182618181, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.