
Jensen's inequality

In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906,[1] building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889.[2] Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after convex transformation; it is a simple corollary that the opposite is true of concave transformations.[3]

[Figure: Jensen's inequality generalizes the statement that a secant line of a convex function lies above its graph.]
[Video: Visualizing convexity and Jensen's inequality.]

Jensen's inequality generalizes the statement that the secant line of a convex function lies above the graph of the function, which is Jensen's inequality for two points: the secant line consists of weighted means of the convex function (for t ∈ [0,1]),

$$t f(x_1) + (1 - t) f(x_2),$$

while the graph of the function is the convex function of the weighted means,

$$f(t x_1 + (1 - t) x_2).$$

Thus, Jensen's inequality is

$$f(t x_1 + (1 - t) x_2) \le t f(x_1) + (1 - t) f(x_2).$$

In the context of probability theory, it is generally stated in the following form: if X is a random variable and φ is a convex function, then

$$\varphi(\operatorname{E}[X]) \le \operatorname{E}\left[\varphi(X)\right].$$

The difference between the two sides of the inequality, $\operatorname{E}\left[\varphi(X)\right] - \varphi(\operatorname{E}[X])$, is called the Jensen gap.[4]
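As a quick numerical illustration of the probabilistic form (a minimal sketch, not from the article; NumPy, the convex function φ(x) = x², and the Exp(1) distribution are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # samples of X ~ Exp(1)

phi = np.square                    # a convex function, phi(x) = x^2
lhs = phi(x.mean())                # phi(E[X])  ~ 1.0, since E[X] = 1
rhs = phi(x).mean()                # E[phi(X)]  ~ 2.0, the second moment

print(lhs, rhs, rhs - lhs)         # the Jensen gap rhs - lhs is non-negative
```

For φ(x) = x² the Jensen gap is exactly the variance of X, which for Exp(1) equals 1.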

Statements

The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its full strength.

Finite form

For a real convex function $\varphi$, numbers $x_1, x_2, \ldots, x_n$ in its domain, and positive weights $a_i$, Jensen's inequality can be stated as:

$$\varphi\left(\frac{\sum a_i x_i}{\sum a_i}\right) \le \frac{\sum a_i \varphi(x_i)}{\sum a_i} \tag{1}$$

and the inequality is reversed if $\varphi$ is concave, which is

$$\varphi\left(\frac{\sum a_i x_i}{\sum a_i}\right) \ge \frac{\sum a_i \varphi(x_i)}{\sum a_i} \tag{2}$$

Equality holds if and only if $x_1 = x_2 = \cdots = x_n$ or $\varphi$ is linear on a domain containing $x_1, x_2, \ldots, x_n$.

As a particular case, if the weights $a_i$ are all equal, then (1) and (2) become

$$\varphi\left(\frac{\sum x_i}{n}\right) \le \frac{\sum \varphi(x_i)}{n} \tag{3}$$

$$\varphi\left(\frac{\sum x_i}{n}\right) \ge \frac{\sum \varphi(x_i)}{n} \tag{4}$$

For instance, the function log(x) is concave, so substituting $\varphi(x) = \log(x)$ in the previous formula (4) establishes the (logarithm of the) familiar arithmetic-mean/geometric-mean inequality:

$$\log\left(\frac{\sum_{i=1}^n x_i}{n}\right) \ge \frac{\sum_{i=1}^n \log(x_i)}{n}, \quad \text{or} \quad \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}.$$

A common application has x as a function of another variable (or set of variables) t, that is, $x_i = g(t_i)$. All of this carries directly over to the general continuous case: the weights $a_i$ are replaced by a non-negative integrable function f(x), such as a probability distribution, and the summations are replaced by integrals.
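The AM–GM instance above is easy to verify numerically (a minimal sketch; the sample values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0])

arith = x.mean()                        # arithmetic mean: 7.5
geom = np.exp(np.log(x).mean())         # geometric mean: (1*4*9*16)**0.25 ~ 4.899

# Concave Jensen (4) with phi = log: log(mean(x)) >= mean(log(x)),
# which exponentiates to AM >= GM.
assert np.log(arith) >= np.log(x).mean()
print(arith, geom)
```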

Measure-theoretic form

Let $(\Omega, A, \mu)$ be a probability space. Let $f : \Omega \to \mathbb{R}$ be a $\mu$-measurable function and $\varphi : \mathbb{R} \to \mathbb{R}$ be convex. Then:[5]

$$\varphi\left(\int_\Omega f\, d\mu\right) \le \int_\Omega \varphi \circ f\, d\mu.$$

In real analysis, we may require an estimate on

$$\varphi\left(\int_a^b f(x)\, dx\right),$$

where $a, b \in \mathbb{R}$, and $f : [a, b] \to \mathbb{R}$ is a non-negative Lebesgue-integrable function. In this case, the Lebesgue measure of $[a, b]$ need not be unity. However, by integration by substitution, the interval can be rescaled so that it has measure unity. Then Jensen's inequality can be applied to get[6]

$$\varphi\left(\frac{1}{b-a}\int_a^b f(x)\, dx\right) \le \frac{1}{b-a} \int_a^b \varphi(f(x))\, dx.$$
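Numerically, the rescaled form can be checked with ordinary quadrature (a minimal sketch; SciPy, the interval [0, 2], f(x) = x², and φ = exp are all arbitrary assumptions):

```python
import numpy as np
from scipy.integrate import quad

a, b = 0.0, 2.0
f = lambda x: x**2                 # non-negative integrable function on [a, b]
phi = np.exp                       # convex function

mean_f = quad(f, a, b)[0] / (b - a)                        # average of f
mean_phi_f = quad(lambda x: phi(f(x)), a, b)[0] / (b - a)  # average of phi∘f

print(phi(mean_f), mean_phi_f)     # phi(average of f) <= average of phi∘f
assert phi(mean_f) <= mean_phi_f
```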

Probabilistic form

The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let $(\Omega, \mathfrak{F}, \operatorname{P})$ be a probability space, X an integrable real-valued random variable, and φ a convex function. Then:

$$\varphi\left(\operatorname{E}[X]\right) \le \operatorname{E}\left[\varphi(X)\right].$$ [7]

In this probability setting, the measure μ is intended as a probability $\operatorname{P}$, the integral with respect to μ as an expected value $\operatorname{E}$, and the function $f$ as a random variable X.

Note that the equality holds if and only if φ is a linear function on some convex set $A$ such that $\operatorname{P}(X \in A) = 1$ (which follows by inspecting the measure-theoretical proof below).

General inequality in a probabilistic setting

More generally, let T be a real topological vector space, and X a T-valued integrable random variable. In this general setting, integrable means that there exists an element $\operatorname{E}[X]$ in T such that for any element z in the dual space of T: $\operatorname{E}\left|\langle z, X \rangle\right| < \infty$, and $\langle z, \operatorname{E}[X]\rangle = \operatorname{E}[\langle z, X \rangle]$. Then, for any measurable convex function φ and any sub-σ-algebra $\mathfrak{G}$ of $\mathfrak{F}$:

$$\varphi\left(\operatorname{E}[X \mid \mathfrak{G}]\right) \le \operatorname{E}\left[\varphi(X) \mid \mathfrak{G}\right].$$

Here $\operatorname{E}[\,\cdot \mid \mathfrak{G}]$ stands for the expectation conditioned to the σ-algebra $\mathfrak{G}$. This general statement reduces to the previous ones when the topological vector space T is the real axis, and $\mathfrak{G}$ is the trivial σ-algebra {∅, Ω} (where ∅ is the empty set, and Ω is the sample space).[8]

A sharpened and generalized form

Let X be a one-dimensional random variable with mean $\mu$ and variance $\sigma^2 \ge 0$. Let $\varphi(x)$ be a twice differentiable function, and define the function

$$h(x) \triangleq \frac{\varphi(x) - \varphi(\mu)}{(x - \mu)^2} - \frac{\varphi'(\mu)}{x - \mu}.$$

Then[9]

$$\sigma^2 \inf_x \frac{\varphi''(x)}{2} \le \sigma^2 \inf_x h(x) \le \operatorname{E}\left[\varphi(X)\right] - \varphi\left(\operatorname{E}[X]\right) \le \sigma^2 \sup_x h(x) \le \sigma^2 \sup_x \frac{\varphi''(x)}{2}.$$

In particular, when $\varphi(x)$ is convex, then $\varphi''(x) \ge 0$, and the standard form of Jensen's inequality immediately follows for the case where $\varphi(x)$ is additionally assumed to be twice differentiable.
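The sharpened bounds can be explored numerically (a minimal sketch; X ~ Uniform(0, 2) and φ = exp are arbitrary choices, and the infimum and supremum of h are approximated on a grid over the support of X):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, size=1_000_000)    # X ~ Uniform(0, 2)
mu, var = x.mean(), x.var()

phi = np.exp                                 # phi = phi' = phi'' = exp
grid = np.linspace(1e-6, 2.0, 10_001)
grid = grid[np.abs(grid - mu) > 1e-3]        # avoid the removable singularity at mu
h = (phi(grid) - phi(mu)) / (grid - mu)**2 - phi(mu) / (grid - mu)

gap = phi(x).mean() - phi(mu)                # Jensen gap E[phi(X)] - phi(E[X])
print(var * h.min(), "<=", gap, "<=", var * h.max())
```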

Proofs

Intuitive graphical proof

 
[Figure: A graphical "proof" of Jensen's inequality for the probabilistic case. The dashed curve along the X axis is the hypothetical distribution of X, while the dashed curve along the Y axis is the corresponding distribution of Y values. Note that the convex mapping Y(X) increasingly "stretches" the distribution for increasing values of X.]

[Figure: A proof without words of Jensen's inequality for n variables. Without loss of generality, the sum of the positive weights is 1. It follows that the weighted point lies in the convex hull of the original points, which lies above the function itself by the definition of convexity. The conclusion follows.[10]]

Jensen's inequality can be proved in several ways, and three different proofs corresponding to the different statements above will be offered. Before embarking on these mathematical derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where X is a real number (see figure). Assuming a hypothetical distribution of X values, one can immediately identify the position of $\operatorname{E}[X]$ and its image $\varphi(\operatorname{E}[X])$ in the graph. Noticing that for the convex mapping Y = φ(X) the corresponding distribution of Y values is increasingly "stretched up" for increasing values of X, it is easy to see that the distribution of Y is broader in the interval corresponding to X > X₀ and narrower in X < X₀ for any X₀; in particular, this is also true for $X_0 = \operatorname{E}[X]$. Consequently, in this picture the expectation of Y will always shift upwards with respect to the position of $\varphi(\operatorname{E}[X])$. A similar reasoning holds if the distribution of X covers a decreasing portion of the convex function, or both a decreasing and an increasing portion of it. This "proves" the inequality, i.e.

$$\varphi(\operatorname{E}[X]) \le \operatorname{E}[\varphi(X)] = \operatorname{E}[Y],$$

with equality when φ(X) is not strictly convex, e.g. when it is a straight line, or when X follows a degenerate distribution (i.e. is a constant).

The proofs below formalize this intuitive notion.

Proof 1 (finite form)

If λ₁ and λ₂ are two arbitrary nonnegative real numbers such that λ₁ + λ₂ = 1, then convexity of φ implies

$$\forall x_1, x_2: \qquad \varphi\left(\lambda_1 x_1 + \lambda_2 x_2\right) \le \lambda_1 \varphi(x_1) + \lambda_2 \varphi(x_2).$$

This can be generalized: if λ₁, ..., λₙ are nonnegative real numbers such that λ₁ + ... + λₙ = 1, then

$$\varphi(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n) \le \lambda_1 \varphi(x_1) + \lambda_2 \varphi(x_2) + \cdots + \lambda_n \varphi(x_n)$$

for any x₁, ..., xₙ.

The finite form of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for n = 2. Suppose the statement is true for some n, so

$$\varphi\left(\sum_{i=1}^n \lambda_i x_i\right) \le \sum_{i=1}^n \lambda_i \varphi(x_i)$$

for any λ₁, ..., λₙ such that λ₁ + ... + λₙ = 1.

One needs to prove it for n + 1. At least one of the λᵢ is strictly smaller than 1, say λₙ₊₁; therefore by the convexity inequality:

$$\begin{aligned}\varphi\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) &= \varphi\left((1 - \lambda_{n+1}) \sum_{i=1}^n \frac{\lambda_i}{1 - \lambda_{n+1}} x_i + \lambda_{n+1} x_{n+1}\right) \\ &\le (1 - \lambda_{n+1})\, \varphi\left(\sum_{i=1}^n \frac{\lambda_i}{1 - \lambda_{n+1}} x_i\right) + \lambda_{n+1}\, \varphi(x_{n+1}).\end{aligned}$$

Since λ₁ + ... + λₙ + λₙ₊₁ = 1,

$$\sum_{i=1}^n \frac{\lambda_i}{1 - \lambda_{n+1}} = 1,$$

applying the inductive hypothesis gives

$$\varphi\left(\sum_{i=1}^n \frac{\lambda_i}{1 - \lambda_{n+1}} x_i\right) \le \sum_{i=1}^n \frac{\lambda_i}{1 - \lambda_{n+1}} \varphi(x_i),$$

therefore

$$\varphi\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) \le (1 - \lambda_{n+1}) \sum_{i=1}^n \frac{\lambda_i}{1 - \lambda_{n+1}} \varphi(x_i) + \lambda_{n+1}\, \varphi(x_{n+1}) = \sum_{i=1}^{n+1} \lambda_i \varphi(x_i).$$

We deduce that the inequality is true for n + 1; by induction it follows that the result is also true for every integer n ≥ 2.

In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

$$\varphi\left(\int x\, d\mu_n(x)\right) \le \int \varphi(x)\, d\mu_n(x),$$

where μₙ is a measure given by an arbitrary convex combination of Dirac deltas:

$$\mu_n = \sum_{i=1}^n \lambda_i \delta_{x_i}.$$

Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as can be easily verified), the general statement is obtained simply by a limiting procedure.

Proof 2 (measure-theoretic form)

Let $g$ be a real-valued $\mu$-integrable function on a probability space $\Omega$, and let $\varphi$ be a convex function on the real numbers. Since $\varphi$ is convex, at each real number $x$ we have a nonempty set of subderivatives, which may be thought of as lines touching the graph of $\varphi$ at $x$, but which are below the graph of $\varphi$ at all points (support lines of the graph).

Now, if we define

$$x_0 = \int_\Omega g\, d\mu,$$

then, because of the existence of subderivatives for convex functions, we may choose $a$ and $b$ such that

$$ax + b \le \varphi(x)$$

for all real $x$, and

$$ax_0 + b = \varphi(x_0).$$

But then we have that

$$\varphi(g(\omega)) \ge a\, g(\omega) + b$$

for almost all $\omega \in \Omega$. Since we have a probability measure, the integral is monotone with $\mu(\Omega) = 1$, so that

$$\int_\Omega \varphi(g(\omega))\, d\mu \ge \int_\Omega \left(a\, g(\omega) + b\right) d\mu = a \int_\Omega g\, d\mu + b \int_\Omega d\mu = a x_0 + b = \varphi(x_0) = \varphi\left(\int_\Omega g\, d\mu\right),$$

as desired.

Proof 3 (general inequality in a probabilistic setting)

Let X be an integrable random variable that takes values in a real topological vector space T. Since $\varphi : T \to \mathbb{R}$ is convex, for any $x, y \in T$, the quantity

$$\frac{\varphi(x + \theta y) - \varphi(x)}{\theta}$$

is decreasing as θ approaches 0⁺. In particular, the subdifferential of $\varphi$ evaluated at x in the direction y is well-defined by

$$(D\varphi)(x) \cdot y = \lim_{\theta \downarrow 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta} = \inf_{\theta > 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta}.$$

The subdifferential is linear in y (this assertion is not trivial; proving it requires the Hahn–Banach theorem), and since the infimum taken in the right-hand side of the previous formula is smaller than the value of the same term for θ = 1, one gets

$$\varphi(x) \le \varphi(x + y) - (D\varphi)(x) \cdot y.$$

In particular, for an arbitrary sub-σ-algebra $\mathfrak{G}$ we can evaluate the last inequality when $x = \operatorname{E}[X \mid \mathfrak{G}]$, $y = X - \operatorname{E}[X \mid \mathfrak{G}]$ to obtain

$$\varphi\left(\operatorname{E}[X \mid \mathfrak{G}]\right) \le \varphi(X) - (D\varphi)\left(\operatorname{E}[X \mid \mathfrak{G}]\right) \cdot \left(X - \operatorname{E}[X \mid \mathfrak{G}]\right).$$

Now, if we take the expectation conditioned to $\mathfrak{G}$ on both sides of the previous expression, we get the result since:

$$\operatorname{E}\left[\left. (D\varphi)\left(\operatorname{E}[X \mid \mathfrak{G}]\right) \cdot \left(X - \operatorname{E}[X \mid \mathfrak{G}]\right) \right| \mathfrak{G}\right] = (D\varphi)\left(\operatorname{E}[X \mid \mathfrak{G}]\right) \cdot \operatorname{E}\left[\left. X - \operatorname{E}[X \mid \mathfrak{G}] \right| \mathfrak{G}\right] = 0,$$

by the linearity of the subdifferential in the y variable, and the following well-known property of the conditional expectation:

$$\operatorname{E}\left[\left. \operatorname{E}[X \mid \mathfrak{G}] \right| \mathfrak{G}\right] = \operatorname{E}[X \mid \mathfrak{G}].$$

Applications and special cases

Form involving a probability density function

Suppose Ω is a measurable subset of the real line and f(x) is a non-negative function such that

$$\int_{-\infty}^\infty f(x)\, dx = 1.$$

In probabilistic language, f is a probability density function.

Then Jensen's inequality becomes the following statement about convex integrals:

If g is any real-valued measurable function and $\varphi$ is convex over the range of g, then

$$\varphi\left(\int_{-\infty}^\infty g(x) f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(g(x)) f(x)\, dx.$$

If g(x) = x, then this form of the inequality reduces to a commonly used special case:

$$\varphi\left(\int_{-\infty}^\infty x f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(x) f(x)\, dx.$$

This is applied in Variational Bayesian methods.
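In that setting, Jensen's inequality with the concave logarithm turns an intractable log-marginal likelihood into a tractable lower bound (the evidence lower bound, or ELBO). A minimal discrete sketch, with a two-state latent variable and arbitrary made-up numbers:

```python
import numpy as np

# Joint p(x, z) for one fixed observation x and a binary latent z,
# plus a variational distribution q(z). All numbers are made up.
p_xz = np.array([0.10, 0.25])        # p(x, z=0), p(x, z=1)
q = np.array([0.5, 0.5])             # variational distribution over z

log_evidence = np.log(p_xz.sum())    # log p(x) = log sum_z p(x, z)
elbo = np.sum(q * np.log(p_xz / q))  # E_q[log(p(x, z)/q(z))] <= log p(x), by Jensen

print(elbo, "<=", log_evidence)
assert elbo <= log_evidence
```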

Example: even moments of a random variable

If $g(x) = x^{2n}$, and X is a random variable, then g is convex, as

$$\frac{d^2 g}{dx^2}(x) = 2n(2n - 1) x^{2n - 2} \ge 0 \quad \forall x \in \mathbb{R},$$

and so

$$g(\operatorname{E}[X]) = (\operatorname{E}[X])^{2n} \le \operatorname{E}\left[X^{2n}\right].$$

In particular, if some even moment 2n of X is finite, X has a finite mean. An extension of this argument shows X has finite moments of every order $\ell \in \mathbb{N}$ dividing n.
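A quick numerical check of the even-moment inequality (a minimal sketch; the normal distribution and n = 2 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=0.5, scale=1.0, size=1_000_000)
n = 2                                # g(x) = x^4

print(x.mean() ** (2 * n))           # g(E[X]) ~ 0.5**4 = 0.0625
print((x ** (2 * n)).mean())         # E[X^4]  ~ 4.56 for N(0.5, 1), much larger
```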

Alternative finite form

Let $\Omega = \{x_1, \ldots, x_n\}$, and take μ to be the counting measure on Ω; then the general form reduces to a statement about sums:

$$\varphi\left(\sum_{i=1}^n g(x_i)\, \lambda_i\right) \le \sum_{i=1}^n \varphi(g(x_i))\, \lambda_i,$$

provided that λᵢ ≥ 0 and

$$\lambda_1 + \cdots + \lambda_n = 1.$$

There is also an infinite discrete form.

Statistical physics

Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:

$$e^{\operatorname{E}[X]} \le \operatorname{E}\left[e^X\right],$$

where the expected values are with respect to some probability distribution in the random variable X.

Proof: Let $\varphi(x) = e^x$ in $\varphi\left(\operatorname{E}[X]\right) \le \operatorname{E}\left[\varphi(X)\right]$.
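A one-line numerical check (a minimal sketch with X standard normal, for which E[e^X] = e^{1/2} exactly):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1_000_000)       # X ~ N(0, 1)

print(np.exp(x.mean()))              # exp(E[X]) ~ e**0 = 1
print(np.exp(x).mean())              # E[exp(X)] ~ e**0.5 ~ 1.6487
```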

Information theory

If p(x) is the true probability density for X, and q(x) is another density, then applying Jensen's inequality for the random variable Y(X) = q(X)/p(X) and the convex function φ(y) = −log(y) gives

$$\operatorname{E}[\varphi(Y)] \ge \varphi(\operatorname{E}[Y]).$$

Therefore:

$$-D(p(x)\,\|\,q(x)) = \int p(x) \log\left(\frac{q(x)}{p(x)}\right) dx \le \log\left(\int p(x)\, \frac{q(x)}{p(x)}\, dx\right) = \log\left(\int q(x)\, dx\right) = 0,$$

a result called Gibbs' inequality.

It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities p rather than any other distribution q. The quantity that is non-negative is called the Kullback–Leibler divergence of q from p.

Since −log(x) is a strictly convex function for x > 0, it follows that equality holds when p(x) equals q(x) almost everywhere.
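The discrete analogue is easy to check directly (a minimal sketch; the two distributions are arbitrary):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])        # true distribution
q = np.array([0.4, 0.4, 0.2])        # another distribution

kl = np.sum(p * np.log(p / q))       # Kullback-Leibler divergence D(p || q)
print(kl)                            # ~0.0253: non-negative, zero only if p == q
assert kl >= 0.0
```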

Rao–Blackwell theorem

If L is a convex function and $\mathfrak{G}$ a sub-sigma-algebra, then, from the conditional version of Jensen's inequality, we get

$$L(\operatorname{E}[\delta(X) \mid \mathfrak{G}]) \le \operatorname{E}[L(\delta(X)) \mid \mathfrak{G}] \quad \Longrightarrow \quad \operatorname{E}[L(\operatorname{E}[\delta(X) \mid \mathfrak{G}])] \le \operatorname{E}[L(\delta(X))].$$

So if δ(X) is some estimator of an unobserved parameter θ given a vector of observables X, and if T(X) is a sufficient statistic for θ, then an improved estimator, in the sense of having a smaller expected loss L, can be obtained by calculating

$$\delta_1(X) = \operatorname{E}_\theta\left[\delta(X') \mid T(X') = T(X)\right],$$

the expected value of δ with respect to θ, taken over all possible vectors of observations X compatible with the same value of T(X) as that observed. Further, because T is a sufficient statistic, $\delta_1(X)$ does not depend on θ and hence is itself a statistic.

This result is known as the Rao–Blackwell theorem.
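A standard textbook illustration, not from the article (a sketch): for Bernoulli(p) trials, the naive unbiased estimator δ(X) = X₁ is improved by conditioning on the sufficient statistic T(X) = ΣXᵢ, which yields the sample mean.

```python
import numpy as np

rng = np.random.default_rng(4)
p_true, n, reps = 0.3, 10, 100_000
x = rng.binomial(1, p_true, size=(reps, n))    # Bernoulli(p) observations

delta = x[:, 0]                                # naive unbiased estimator X_1
delta1 = x.sum(axis=1) / n                     # E[X_1 | T] = T/n, Rao-Blackwellized

# Both are unbiased for p, but the conditioned estimator has much
# smaller expected squared-error loss.
print(delta.mean(), ((delta - p_true) ** 2).mean())    # ~0.30, ~0.21
print(delta1.mean(), ((delta1 - p_true) ** 2).mean())  # ~0.30, ~0.021
```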

Financial Performance Simulation

A popular method of measuring the performance of an investment is the internal rate of return (IRR), the discount rate that makes the present value of a series of uncertain future cash flows equal the initial investment. While it is tempting to estimate the expected IRR by Monte Carlo simulation, Jensen's inequality introduces a bias, due to the fact that the IRR function is a curved function while the expectation operator is a linear function.[11]
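The bias is easy to demonstrate by comparing the IRR of the expected cash flows with the expected IRR over simulated cash-flow paths (a minimal sketch; the lognormal cash-flow model, the bracketing interval, and the root-finder are all arbitrary assumptions, not the method of the cited paper):

```python
import numpy as np
from scipy.optimize import brentq

def irr(cashflows):
    """The rate r at which the net present value of the cash flows is zero."""
    npv = lambda r: sum(cf / (1 + r) ** k for k, cf in enumerate(cashflows))
    return brentq(npv, -0.99, 10.0)    # assumes a sign change in this bracket

rng = np.random.default_rng(5)
sims = [np.concatenate(([-100.0], rng.lognormal(np.log(40.0), 0.35, size=4)))
        for _ in range(5_000)]         # initial outlay, then uncertain inflows

expected_irr = np.mean([irr(cf) for cf in sims])   # E[IRR(cash flows)]
irr_of_expectation = irr(np.mean(sims, axis=0))    # IRR(E[cash flows])

print(expected_irr, irr_of_expectation)            # the two differ: Jensen bias
```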

See also

- Karamata's inequality, for a more general inequality
- Popoviciu's inequality
- Law of averages
- A proof without words of Jensen's inequality

Notes

  1. ^ Jensen, J. L. W. V. (1906). "Sur les fonctions convexes et les inégalités entre les valeurs moyennes". Acta Mathematica. 30 (1): 175–193. doi:10.1007/BF02418571.
  2. ^ Guessab, A.; Schmeisser, G. (2013). "Necessary and sufficient conditions for the validity of Jensen's inequality". Archiv der Mathematik. 100 (6): 561–570. doi:10.1007/s00013-013-0522-3. MR 3069109. S2CID 56372266.
  3. ^ Dekking, F.M.; Kraaikamp, C.; Lopuhaa, H.P.; Meester, L.E. (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. Springer Texts in Statistics. London: Springer. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.
  4. ^ Gao, Xiang; Sitharam, Meera; Roitberg, Adrian (2019). "Bounds on the Jensen Gap, and Implications for Mean-Concentrated Distributions" (PDF). The Australian Journal of Mathematical Analysis and Applications. 16 (2). arXiv:1712.05267.
  5. ^ p. 25 of Rick Durrett (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. ISBN 978-1108473682.
  6. ^ Niculescu, Constantin P. "Integral inequalities", p. 12.
  7. ^ p. 29 of Rick Durrett (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. ISBN 978-1108473682.
  8. ^ Attention: In this generality additional assumptions on the convex function and/ or the topological vector space are needed, see Example (1.3) on p. 53 in Perlman, Michael D. (1974). "Jensen's Inequality for a Convex Vector-Valued Function on an Infinite-Dimensional Space". Journal of Multivariate Analysis. 4 (1): 52–65. doi:10.1016/0047-259X(74)90005-0. hdl:11299/199167.
  9. ^ Liao, J.; Berg, A (2018). "Sharpening Jensen's Inequality". American Statistician. 73 (3): 278–281. arXiv:1707.08644. doi:10.1080/00031305.2017.1419145. S2CID 88515366.
  10. ^ Bradley, CJ (2006). Introduction to Inequalities. Leeds, United Kingdom: United Kingdom Mathematics Trust. p. 97. ISBN 978-1-906001-11-7.
  11. ^ Brown, R. J.; Klingenberg, B. (2015). "Real Estate: heavy tail modelling using Excel". Journal of Property Investment and Finance. 33 (4): 393–407. doi:10.1108/JPIF-05-2014-0033 – via JSTOR.

References

- David Chandler (1987). Introduction to Modern Statistical Mechanics. Oxford. ISBN 0-19-504277-8.
- Tristan Needham (1993). "A Visual Explanation of Jensen's Inequality". American Mathematical Monthly. 100 (8): 768–771.
- Nicola Fusco; Paolo Marcellini; Carlo Sbordone (1996). Analisi Matematica Due. Liguori. ISBN 978-88-207-2675-1.
- Walter Rudin (1987). Real and Complex Analysis. McGraw-Hill. ISBN 0-07-054234-1.
- Rick Durrett (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. p. 430. ISBN 978-1108473682.
- Sam Savage (2012). The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty (1st ed.). Wiley. ISBN 978-0471381976.

External links

- "Jensen's Operator Inequality" of Hansen and Pedersen.
- "Jensen inequality". Encyclopedia of Mathematics. EMS Press. 2001 [1994].
- Weisstein, Eric W. "Jensen's inequality". MathWorld.
- Arthur Lohwater (1982). Introduction to Inequalities. Online e-book in PDF format.
