
Rate–distortion theory

Rate–distortion theory is a major branch of information theory which provides the theoretical foundations for lossy data compression; it addresses the problem of determining the minimal number of bits per symbol, as measured by the rate R, that should be communicated over a channel, so that the source (input signal) can be approximately reconstructed at the receiver (output signal) without exceeding an expected distortion D.

Introduction

 
Rate–distortion encoder and decoder. An encoder $f_n$ encodes a sequence $X^n$. The encoded sequence $Y^n$ is then fed to a decoder $g_n$, which outputs a sequence $\hat{X}^n$. We try to minimize the distortion between the original sequence $X^n$ and the reconstructed sequence $\hat{X}^n$.

Rate–distortion theory gives an analytical expression for how much compression can be achieved using lossy compression methods. Many of the existing audio, speech, image, and video compression techniques have transforms, quantization, and bit-rate allocation procedures that capitalize on the general shape of rate–distortion functions.

Rate–distortion theory was created by Claude Shannon in his foundational work on information theory.

In rate–distortion theory, the rate is usually understood as the number of bits per data sample to be stored or transmitted. The notion of distortion is a subject of ongoing discussion.[1] In the simplest case (which is actually used in most cases), the distortion is defined as the expected value of the square of the difference between input and output signal (i.e., the mean squared error). However, since most lossy compression techniques operate on data that will be perceived by human consumers (listening to music, watching pictures and video), the distortion measure should preferably be modeled on human perception and perhaps aesthetics: much like the use of probability in lossless compression, distortion measures can ultimately be identified with loss functions as used in Bayesian estimation and decision theory. In audio compression, perceptual models (and therefore perceptual distortion measures) are relatively well developed and routinely used in compression techniques such as MP3 or Vorbis, but they are often not easy to include in rate–distortion theory. In image and video compression, the human perception models are less well developed, and their inclusion is mostly limited to the JPEG and MPEG weighting (quantization, normalization) matrices.

Distortion functions

Distortion functions measure the cost of representing a symbol $x$ by an approximated symbol $\hat{x}$. Typical distortion functions are the Hamming distortion and the squared-error distortion.

Hamming distortion

d(x, \hat{x}) = \begin{cases} 0, & \text{if } x = \hat{x} \\ 1, & \text{if } x \neq \hat{x} \end{cases}

Squared-error distortion

d(x, \hat{x}) = (x - \hat{x})^2

Rate–distortion functions

The functions that relate the rate and distortion are found as the solution of the following minimization problem:

\inf_{Q_{Y \mid X}(y \mid x)} I_Q(Y; X) \quad \text{subject to} \quad D_Q \leq D

Here $Q_{Y \mid X}(y \mid x)$, sometimes called a test channel, is the conditional probability density function (PDF) of the communication channel output (compressed signal) $Y$ for a given input (original signal) $X$, and $I_Q(Y;X)$ is the mutual information between $Y$ and $X$ defined as

I(Y;X) = H(Y) - H(Y \mid X)

where $H(Y)$ and $H(Y \mid X)$ are the entropy of the output signal $Y$ and the conditional entropy of the output signal given the input signal, respectively:

H(Y) = -\int_{-\infty}^{\infty} P_Y(y) \log_2 (P_Y(y)) \, dy

H(Y \mid X) = -\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} Q_{Y \mid X}(y \mid x) P_X(x) \log_2 (Q_{Y \mid X}(y \mid x)) \, dx \, dy

The problem can also be formulated as a distortion–rate function, where we find the infimum over achievable distortions for a given rate constraint. The relevant expression is:

\inf_{Q_{Y \mid X}(y \mid x)} E[D_Q(X, Y)] \quad \text{subject to} \quad I_Q(Y;X) \leq R

The two formulations lead to functions which are inverses of each other.

The mutual information can be understood as a measure of the 'prior' uncertainty the receiver has about the sender's signal ($H(Y)$), diminished by the uncertainty that is left after receiving information about the sender's signal ($H(Y \mid X)$). Of course, the decrease in uncertainty is due to the communicated amount of information, which is $I(Y;X)$.

As an example, if there is no communication at all, then $H(Y \mid X) = H(Y)$ and $I(Y;X) = 0$. Alternatively, if the communication channel is perfect and the received signal $Y$ is identical to the signal $X$ at the sender, then $H(Y \mid X) = 0$ and $I(Y;X) = H(X) = H(Y)$.

In the definition of the rate–distortion function, $D_Q$ and $D$ are the distortion between $X$ and $Y$ for a given $Q_{Y \mid X}(y \mid x)$ and the prescribed maximum distortion, respectively. When we use the mean squared error as the distortion measure, we have (for amplitude-continuous signals):

D_Q = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} P_{X,Y}(x, y) \, (x - y)^2 \, dx \, dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} Q_{Y \mid X}(y \mid x) P_X(x) \, (x - y)^2 \, dx \, dy

As the above equations show, calculating a rate–distortion function requires the stochastic description of the input $X$ in terms of the PDF $P_X(x)$, and then aims at finding the conditional PDF $Q_{Y \mid X}(y \mid x)$ that minimizes the rate for a given distortion $D$. These definitions can be formulated measure-theoretically to account for discrete and mixed random variables as well.
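
For finite source and reproduction alphabets, the integrals above reduce to sums and both quantities can be evaluated directly. The following minimal Python sketch (assuming NumPy; the fair binary source, the particular test channel, and the variable names are illustrative choices, not taken from the references) computes the distortion $D_Q$ and the mutual information $I_Q(Y;X)$ for one candidate test channel:

    import numpy as np

    # Hypothetical finite-alphabet example: a fair binary source and a
    # candidate test channel Q(y|x) (rows sum to one), with Hamming distortion.
    p_x = np.array([0.5, 0.5])              # source PMF P_X(x)
    Q = np.array([[0.9, 0.1],
                  [0.1, 0.9]])              # test channel Q_{Y|X}(y|x)
    dist = np.array([[0.0, 1.0],
                     [1.0, 0.0]])           # Hamming distortion d(x, y)

    joint = p_x[:, None] * Q                # joint PMF P_X(x) Q(y|x)
    p_y = joint.sum(axis=0)                 # output marginal P_Y(y)

    D_Q = np.sum(joint * dist)              # expected distortion of this test channel
    I_Q = np.sum(joint * np.log2(Q / p_y))  # mutual information I_Q(Y;X) in bits

    print(D_Q, I_Q)                         # 0.1 and about 0.531 bits

For this particular test channel the mutual information already equals $1 - H_b(0.1) \approx 0.531$ bits, which is the value the Bernoulli example below identifies as $R(0.1)$.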

An analytical solution to this minimization problem is often difficult to obtain, except in some instances for which we next offer two of the best known examples. The rate–distortion function of any source is known to obey several fundamental properties, the most important ones being that it is a continuous, monotonically decreasing, convex (U-shaped) function; thus the shape of the function in the examples is typical (even measured rate–distortion functions in real life tend to have very similar forms).

Although analytical solutions to this problem are scarce, there are upper and lower bounds to these functions, including the famous Shannon lower bound (SLB), which, in the case of squared error and memoryless sources, states that for arbitrary sources with finite differential entropy,

R(D) \geq h(X) - h(D)

where h(D) is the differential entropy of a Gaussian random variable with variance D. This lower bound is extensible to sources with memory and other distortion measures. One important feature of the SLB is that it is asymptotically tight in the low-distortion regime for a wide class of sources, and on some occasions it actually coincides with the rate–distortion function. Shannon lower bounds can generally be found if the distortion between any two numbers can be expressed as a function of the difference between the values of these two numbers.
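
For instance, for a memoryless Gaussian source with variance $\sigma^2$ and squared-error distortion, $h(X) = \tfrac{1}{2}\log_2(2\pi e\sigma^2)$ and $h(D) = \tfrac{1}{2}\log_2(2\pi e D)$, so the bound reduces to

R(D) \geq \frac{1}{2}\log_2\left(\frac{\sigma^2}{D}\right)

which coincides with the Gaussian rate–distortion function given below, illustrating a case in which the SLB holds with equality.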

The Blahut–Arimoto algorithm, co-invented by Richard Blahut, is an elegant iterative technique for numerically obtaining rate–distortion functions of arbitrary finite input/output alphabet sources and much work has been done to extend it to more general problem instances.
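
A minimal sketch of this alternating-minimization iteration, assuming NumPy, is given below; the function name, the convergence settings, and the Bernoulli example are illustrative choices rather than the interface of any particular library. For a fixed Lagrange multiplier beta, the iteration returns one (distortion, rate) point on the curve; sweeping beta traces out the rate–distortion function.

    import numpy as np

    def blahut_arimoto(p_x, dist, beta, n_iter=200, tol=1e-10):
        # p_x  : source PMF over a finite alphabet, shape (nx,)
        # dist : distortion matrix d(x, x_hat), shape (nx, ny)
        # beta : Lagrange multiplier; larger beta trades distortion for rate
        # Returns one (D, R) point on the rate-distortion curve, R in bits.
        expd = np.exp(-beta * dist)
        q_y = np.full(dist.shape[1], 1.0 / dist.shape[1])  # reproduction marginal
        for _ in range(n_iter):
            Q = expd * q_y                      # unnormalized test channel Q(y|x)
            Q /= Q.sum(axis=1, keepdims=True)   # normalize each row
            q_new = p_x @ Q                     # marginal induced by p(x) and Q
            if np.max(np.abs(q_new - q_y)) < tol:
                q_y = q_new
                break
            q_y = q_new
        Q = expd * q_y
        Q /= Q.sum(axis=1, keepdims=True)
        joint = p_x[:, None] * Q                # joint PMF p(x) Q(y|x)
        D = float(np.sum(joint * dist))         # expected distortion
        R = float(np.sum(joint * np.log2(np.where(joint > 0, Q / q_y, 1.0))))
        return D, R

    # Example: Bernoulli(0.5) source with Hamming distortion; sweeping beta
    # traces out points on the curve R(D) = 1 - H_b(D).
    p = np.array([0.5, 0.5])
    d = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
    for beta in (1.0, 2.0, 4.0):
        print(blahut_arimoto(p, d, beta))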

When working with stationary sources with memory, it is necessary to modify the definition of the rate–distortion function and it must be understood in the sense of a limit taken over sequences of increasing lengths.

R(D) = \lim_{n \to \infty} R_n(D)

where

R_n(D) = \frac{1}{n} \inf_{Q_{Y^n \mid X^n} \in \mathcal{Q}} I(Y^n, X^n)

and

\mathcal{Q} = \{ Q_{Y^n \mid X^n}(Y^n \mid X^n, X_0) : E\left[d(X^n, Y^n)\right] \leq D \}

where superscripts denote a complete sequence up to that time and the subscript 0 indicates initial state.

Memoryless (independent) Gaussian source with squared-error distortion

If we assume that $X$ is a Gaussian random variable with variance $\sigma^2$, and if we assume that successive samples of the signal $X$ are stochastically independent (or equivalently, the source is memoryless, or the signal is uncorrelated), we find the following analytical expression for the rate–distortion function:

R(D) = \begin{cases} \frac{1}{2}\log_2(\sigma_x^2 / D), & \text{if } 0 \leq D \leq \sigma_x^2 \\ 0, & \text{if } D > \sigma_x^2 \end{cases} [2]

The following figure shows what this function looks like:

[Figure: rate–distortion function R(D) of a memoryless Gaussian source with squared-error distortion.]

Rate–distortion theory tells us that 'no compression system exists that performs outside the gray area'. The closer a practical compression system is to the red (lower) bound, the better it performs. As a general rule, this bound can only be attained by increasing the coding block length parameter. Nevertheless, even at unit blocklengths one can often find good (scalar) quantizers that operate at distances from the rate–distortion function that are practically relevant.[3]

This rate–distortion function holds only for Gaussian memoryless sources. It is known that the Gaussian source is the most "difficult" source to encode: for a given mean square error, it requires the greatest number of bits. The performance of a practical compression system working on, say, images, may well be below the $R(D)$ lower bound shown.
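
A small sketch evaluating the closed form above, assuming NumPy (the function name and the unit-variance default are illustrative):

    import numpy as np

    def gaussian_rate_distortion(D, sigma2=1.0):
        # R(D) in bits/sample for a memoryless Gaussian source of variance sigma2
        # under squared-error distortion; requires D > 0.
        if D >= sigma2:
            return 0.0
        return 0.5 * np.log2(sigma2 / D)

    # For a unit-variance source, each additional bit of rate reduces the
    # achievable mean squared error by a factor of four.
    print(gaussian_rate_distortion(0.25))    # 1.0 bit/sample
    print(gaussian_rate_distortion(0.0625))  # 2.0 bits/sample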

Memoryless (independent) Bernoulli source with Hamming distortion

The rate-distortion function of a Bernoulli random variable with Hamming distortion is given by:

R(D) = \begin{cases} H_b(p) - H_b(D), & 0 \leq D \leq \min(p, 1-p) \\ 0, & D > \min(p, 1-p) \end{cases}

where $H_b$ denotes the binary entropy function.
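
A small sketch evaluating this formula, assuming NumPy (the function names are illustrative):

    import numpy as np

    def binary_entropy(q):
        # H_b(q) in bits; H_b(0) = H_b(1) = 0 by convention.
        if q <= 0.0 or q >= 1.0:
            return 0.0
        return float(-q * np.log2(q) - (1.0 - q) * np.log2(1.0 - q))

    def bernoulli_rate_distortion(D, p=0.5):
        # R(D) in bits/symbol for a Bernoulli(p) source under Hamming distortion.
        if D >= min(p, 1.0 - p):
            return 0.0
        return binary_entropy(p) - binary_entropy(D)

    # For p = 0.5, allowing 10% of the symbols to be reconstructed incorrectly
    # drops the required rate from 1 bit to about 0.53 bits per symbol.
    print(bernoulli_rate_distortion(0.10))  # ~0.531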

Plot of the rate-distortion function for $p = 0.5$:

[Figure: rate–distortion function of a Bernoulli(0.5) source with Hamming distortion.]

Connecting rate-distortion theory to channel capacity

Suppose we want to transmit information about a source to the user with a distortion not exceeding D. Rate–distortion theory tells us that at least $R(D)$ bits/symbol of information from the source must reach the user. We also know from Shannon's channel coding theorem that if the source entropy is $H$ bits/symbol, and the channel capacity is $C$ (where $C < H$), then $H - C$ bits/symbol will be lost when transmitting this information over the given channel. For the user to have any hope of reconstructing with a maximum distortion $D$, we must impose the requirement that the information lost in transmission does not exceed the maximum tolerable loss of $H - R(D)$ bits/symbol. This means that the channel capacity must be at least as large as $R(D)$.[4]
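
As a hypothetical numerical illustration (not taken from the cited source), consider a unit-variance memoryless Gaussian source to be conveyed over a channel of capacity $C = 1$ bit/symbol. The requirement $R(D) \leq C$ with $R(D) = \tfrac{1}{2}\log_2(1/D)$ gives

D \geq 2^{-2C} = \frac{1}{4}

so no coding scheme can deliver a mean squared error below 0.25 through this channel.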

See also

  • Blahut–Arimoto algorithm
  • Data compression
  • Decorrelation
  • Rate–distortion optimization
  • Sphere packing
  • White noise

References

  1. ^ Blau, Y.; Michaeli, T. (2019). "Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff" (PDF). Proceedings of the International Conference on Machine Learning. PMLR. pp. 675–685. arXiv:1901.07821.
  2. ^ Cover & Thomas 2012, p. 310
  3. ^ Cover, Thomas M.; Thomas, Joy A. (2012) [2006]. "10. Rate Distortion Theory". Elements of Information Theory (2nd ed.). Wiley. ISBN 978-1-118-58577-1.
  4. ^ Berger, Toby (1971). Rate Distortion Theory: A Mathematical Basis for Data Compression. Prentice Hall. ISBN 978-0-13-753103-5. LCCN 75-148254. OCLC 156968.

External links

  • Marzen, Sarah; DeDeo, Simon. "PyRated: a python package for rate distortion theory". PyRated is a very simple Python package to do the most basic calculation in rate-distortion theory: the determination of the "codebook" and the transmission rate R, given a utility function (distortion matrix) and a Lagrange multiplier beta.
  • VcDemo Image and Video Compression Learning Tool.
