
Law of the unconscious statistician

In probability theory and statistics, the law of the unconscious statistician, or LOTUS, is a theorem which expresses the expected value of a function g(X) of a random variable X in terms of g and the probability distribution of X.

The form of the law depends on the type of random variable X in question. If the distribution of X is discrete and one knows its probability mass function pX, then the expected value of g(X) is

$$\operatorname{E}[g(X)] = \sum_x g(x)\, p_X(x),$$

where the sum is over all possible values x of X. If instead the distribution of X is continuous with probability density function fX, then the expected value of g(X) is

$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, \mathrm{d}x.$$

Both of these special cases can be expressed in terms of the cumulative probability distribution function FX of X, with the expected value of g(X) now given by the Lebesgue–Stieltjes integral

$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, \mathrm{d}F_X(x).$$
In even greater generality, X could be a random element in any measurable space, in which case the law is given in terms of measure theory and the Lebesgue integral. In this setting, there is no need to restrict the context to probability measures, and the law becomes a general theorem of mathematical analysis on Lebesgue integration relative to a pushforward measure.
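
As a concrete illustration of the discrete form, the following minimal Python sketch (the pmf and the function g are illustrative assumptions, not taken from the article) computes E[g(X)] in two ways: via LOTUS directly from the pmf of X, and via the definition of expected value, by first deriving the distribution of g(X).

```python
# Minimal sketch of discrete LOTUS; the pmf and g are illustrative assumptions.
from collections import defaultdict

p_X = {-1: 0.25, 0: 0.25, 1: 0.5}  # probability mass function of X
g = lambda x: x ** 2               # note g maps -1 and 1 to the same value

# Via LOTUS: weight each output g(x) by the probability of the input x.
lotus = sum(g(x) * p for x, p in p_X.items())

# Via the definition: first derive the pmf of Y = g(X), then take the
# weighted average of the outputs of Y by their own probabilities.
p_Y = defaultdict(float)
for x, p in p_X.items():
    p_Y[g(x)] += p
by_definition = sum(y * p for y, p in p_Y.items())

assert abs(lotus - by_definition) < 1e-12  # both equal 0.75
```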

Etymology

This proposition is (sometimes) known as the law of the unconscious statistician because of a purported tendency to think of the identity as the very definition of the expected value, rather than (more formally) as a consequence of its true definition.[1] The naming is sometimes attributed to Sheldon Ross' textbook Introduction to Probability Models, although he removed the reference in later editions.[2] Many statistics textbooks do present the result as the definition of expected value.[3]

Joint distributions

A similar property holds for joint distributions, or equivalently, for random vectors. For discrete random variables X and Y, a function of two variables g, and joint probability mass function $p_{X,Y}(x,y)$:[4]

$$\operatorname{E}[g(X,Y)] = \sum_y \sum_x g(x,y)\, p_{X,Y}(x,y).$$

In the absolutely continuous case, with $f_{X,Y}(x,y)$ being the joint probability density function,

$$\operatorname{E}[g(X,Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y)\, f_{X,Y}(x,y)\, \mathrm{d}x\, \mathrm{d}y.$$
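
As a quick check of the discrete joint formula, here is a minimal Python sketch; the joint pmf and the choice g(x, y) = xy are illustrative assumptions.

```python
# Minimal sketch of LOTUS for a joint pmf; the table and g are illustrative.
p_XY = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}
g = lambda x, y: x * y

# E[g(X, Y)]: weight g(x, y) by the joint probability of each pair (x, y).
expectation = sum(g(x, y) * p for (x, y), p in p_XY.items())
print(expectation)  # 0.4, since only the pair (1, 1) contributes
```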

 

Special cases

A number of special cases are given here. In the simplest case, where the random variable X takes on countably many values (so that its distribution is discrete), the proof is particularly simple, and holds without modification if X is a discrete random vector or even a discrete random element.

The case of a continuous random variable is more subtle, since the proof in full generality requires delicate forms of the change-of-variables formula for integration. However, in the framework of measure theory, the discrete case generalizes straightforwardly to general (not necessarily discrete) random elements, and the case of a continuous random variable is then a special case by making use of the Radon–Nikodym theorem.

Discrete case

Suppose that X is a random variable which takes on only finitely or countably many different values x1, x2, ..., with probabilities p1, p2, .... Then for any function g of these values, the random variable g(X) has values g(x1), g(x2), ..., although some of these may coincide with each other. For example, this is the case if X can take on both values 1 and −1 and g(x) = x².

Let y1, y2, ... enumerate the possible distinct values of g(X), and for each i let Ii denote the collection of all j with g(xj) = yi. Then according to the definition of expected value of a random variable as a weighted average of possible outputs,

$$\operatorname{E}[g(X)] = \sum_i y_i\, \operatorname{P}(g(X) = y_i).$$

But this can be rewritten as

$$\sum_i y_i\, \operatorname{P}(g(X) = y_i) = \sum_i y_i\, \operatorname{P}(X = x_j \text{ for some } j \in I_i) = \sum_i y_i \sum_{j \in I_i} p_j = \sum_j g(x_j)\, p_j.$$

This equality relates the average of the outputs of g(X), weighted by the probabilities of those outputs themselves, to the same average weighted instead by the probabilities of the underlying values of X.
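
For instance, in the example above with g(x) = x², assuming P(X = 1) = P(X = −1) = 1/2 (an illustrative choice of probabilities), there is a single distinct output y1 = 1 with I1 = {1, 2}, and both weighted averages equal 1:

$$\sum_i y_i\, \operatorname{P}(g(X) = y_i) = 1 \cdot \left( \tfrac{1}{2} + \tfrac{1}{2} \right) = \tfrac{1}{2} \cdot 1^2 + \tfrac{1}{2} \cdot (-1)^2 = \sum_j g(x_j)\, p_j = 1.$$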

If X takes on only finitely many possible values, the above is fully rigorous. However, if X takes on countably many values, the last equality given does not always hold, as seen by the Riemann series theorem. Because of this, it is necessary to assume the absolute convergence of the sums in question.[5]

Continuous case

Suppose that X is a random variable whose distribution has a continuous density f. If g is a general function, then the probability that g(X) is valued in a set of real numbers K equals the probability that X is valued in g−1(K), which is given by

$$\int_{g^{-1}(K)} f(x)\, \mathrm{d}x.$$

Under various conditions on g, the change-of-variables formula for integration can be applied to relate this to an integral over K, and hence to identify the density of g(X) in terms of the density of X. In the simplest case, if g is differentiable with nowhere-vanishing derivative, then the above integral can be written as

$$\int_K f(g^{-1}(y))\, (g^{-1})'(y)\, \mathrm{d}y,$$

thereby identifying g(X) as possessing the density $f(g^{-1}(y))\,(g^{-1})'(y)$. The expected value of g(X) is then identified as

$$\int_{-\infty}^{\infty} y\, f(g^{-1}(y))\, (g^{-1})'(y)\, \mathrm{d}y = \int_{-\infty}^{\infty} g(x)\, f(x)\, \mathrm{d}x,$$

where the equality follows by another use of the change-of-variables formula for integration. This shows that the expected value of g(X) is encoded entirely by the function g and the density f of X.[6]
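
A minimal numerical sketch of this chain, assuming X uniform on (0, 1) and g(x) = e^x (both illustrative choices): then g−1(y) = log y, so g(X) has density f(log y)·(1/y) = 1/y on (1, e), and both integrals below approximate e − 1.

```python
import math

# Simple midpoint-rule integrator; adequate for this smooth 1-D sketch.
def integrate(h, a, b, n=10_000):
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

# Assumptions of this example: X ~ Uniform(0, 1), so f(x) = 1 on (0, 1),
# and g(x) = exp(x), differentiable with nowhere-vanishing derivative.
# E[g(X)] computed from the density 1/y of g(X) on (1, e):
via_density = integrate(lambda y: y * (1.0 / y), 1.0, math.e)
# E[g(X)] via LOTUS, integrating g against the density of X:
via_lotus = integrate(lambda x: math.exp(x) * 1.0, 0.0, 1.0)

print(via_density, via_lotus, math.e - 1)  # all approximately 1.71828
```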

The assumption that g is differentiable with nonvanishing derivative, which is necessary for applying the usual change-of-variables formula, excludes many typical cases, such as g(x) = x². The result still holds true in these broader settings, although the proof requires more sophisticated results from mathematical analysis such as Sard's theorem and the coarea formula. In even greater generality, using the Lebesgue theory as below, it can be found that the identity

$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, \mathrm{d}x$$

holds true whenever X has a density f (which does not have to be continuous) and whenever g is a measurable function for which g(X) has finite expected value. (Every continuous function is measurable.) Furthermore, without modification to the proof, this holds even if X is a random vector (with density) and g is a multivariable function; the integral is then taken over the multi-dimensional range of values of X.
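
A minimal Monte Carlo sketch for one such non-invertible case, assuming X standard normal and g(x) = x² (illustrative choices): the sample average of g(X) approximates ∫ g(x) f(x) dx, which here is E[X²] = 1.

```python
import random

random.seed(0)  # for reproducibility of this sketch

# Assumptions: X ~ N(0, 1) and g(x) = x**2, which is measurable but not
# invertible, so the simple change-of-variables argument above does not apply.
n = 200_000
estimate = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n

print(estimate)  # approximately 1.0 = E[X^2], the variance of N(0, 1)
```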

Measure-theoretic formulation

An abstract and general form of the result is available using the framework of measure theory and the Lebesgue integral. Here, the setting is that of a measure space (Ω, μ) and a measurable map X from Ω to a measurable space Ω'. The theorem then says that for any measurable function g on Ω' which is valued in real numbers (or even the extended real number line),

$$\int_\Omega g \circ X\, \mathrm{d}\mu = \int_{\Omega'} g\, \mathrm{d}(X_\sharp \mu)$$

(interpreted as saying, in particular, that either side of the equality exists if the other side exists). Here $X_\sharp \mu$ denotes the pushforward measure on Ω'. The 'discrete case' given above is the special case arising when X takes on only countably many values and μ is a probability measure. In fact, the discrete case (although without the restriction to probability measures) is the first step in proving the general measure-theoretic formulation, as the general version follows from it by an application of the monotone convergence theorem.[7] Without any major changes, the result can also be formulated in the setting of outer measures.[8]
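
On a finite measure space the identity can be checked by direct enumeration; a minimal Python sketch follows, in which the space, the measure μ, the map X, and the function g are all illustrative assumptions.

```python
# Minimal sketch of the pushforward identity on a finite measure space;
# Omega, mu, X, and g below are all illustrative assumptions.
mu = {"a": 0.2, "b": 0.3, "c": 0.5}  # a measure on Omega = {a, b, c}
X = {"a": 1, "b": 2, "c": 1}         # a map X : Omega -> Omega' = {1, 2}
g = {1: 10.0, 2: 20.0}               # a function g on Omega'

# Left side: integrate g∘X against mu over Omega.
lhs = sum(g[X[w]] * mu[w] for w in mu)

# Right side: build the pushforward measure X_sharp(mu) on Omega',
# then integrate g against it.
pushforward = {}
for w, m in mu.items():
    pushforward[X[w]] = pushforward.get(X[w], 0.0) + m
rhs = sum(g[y] * m for y, m in pushforward.items())

assert abs(lhs - rhs) < 1e-12  # both equal 13.0
```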

If μ is a σ-finite measure, the theory of the Radon–Nikodym derivative is applicable. In the special case that the measure $X_\sharp \mu$ is absolutely continuous relative to some background σ-finite measure ν on Ω', there is a real-valued function fX on Ω' representing the Radon–Nikodym derivative of the two measures, and then

$$\int_{\Omega'} g\, \mathrm{d}(X_\sharp \mu) = \int_{\Omega'} g\, f_X\, \mathrm{d}\nu.$$

In the further special case that Ω' is the real number line, as in the contexts discussed above, it is natural to take ν to be the Lebesgue measure, and this then recovers the 'continuous case' given above whenever μ is a probability measure. (In this special case, the condition of σ-finiteness is vacuous, since Lebesgue measure and every probability measure are trivially σ-finite.)[9]
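
Similarly, taking Ω' to be a countable set and ν the counting measure (also σ-finite) makes fX the probability mass function of X whenever μ is a probability measure, and the formula then specializes to the discrete case given above:

$$\int_{\Omega'} g\, f_X\, \mathrm{d}\nu = \sum_x g(x)\, p_X(x).$$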

References

  1. ^ DeGroot & Schervish 2014, pp. 213–214.
  2. ^ Casella & Berger 2001, Section 2.2; Ross 2019.
  3. ^ Casella & Berger 2001, Section 2.2.
  4. ^ Ross 2019.
  5. ^ Feller 1968, Section IX.2.
  6. ^ Papoulis & Pillai 2002, Chapter 5.
  7. ^ Bogachev 2007, Section 3.6; Cohn 2013, Section 2.6; Halmos 1950, Section 39.
  8. ^ Federer 1969, Section 2.4.
  9. ^ Halmos 1950, Section 39.
  • Bogachev, V. I. (2007). Measure theory. Volume I. Berlin: Springer-Verlag. doi:10.1007/978-3-540-34514-5. ISBN 978-3-540-34513-8. MR 2267655. Zbl 1120.28001.
  • Casella, George; Berger, Roger L. (2001). Statistical inference. Duxbury Advanced Series (Second edition of 1990 original ed.). Pacific Grove, CA: Duxbury. ISBN 0-534-11958-1. Zbl 0699.62001.
  • Cohn, Donald L. (2013). Measure theory. Birkhäuser Advanced Texts: Basler Lehrbücher (Second edition of 1980 original ed.). New York: Birkhäuser/Springer. doi:10.1007/978-1-4614-6956-8. ISBN 978-1-4614-6955-1. MR 3098996. Zbl 1292.28002.
  • DeGroot, Morris H.; Schervish, Mark J. (2014). Probability and statistics (Fourth edition of 1975 original ed.). Pearson Education. ISBN 0-321-50046-6. MR 0373075. Zbl 0619.62001.
  • Federer, Herbert (1969). Geometric measure theory. Die Grundlehren der mathematischen Wissenschaften. Vol. 153. Berlin–Heidelberg–New York: Springer-Verlag. doi:10.1007/978-3-642-62010-2. ISBN 978-3-540-60656-7. MR 0257325. Zbl 0176.00801.
  • Feller, William (1968). An introduction to probability theory and its applications. Volume I (Third edition of 1950 original ed.). New York–London–Sydney: John Wiley & Sons, Inc. MR 0228020. Zbl 0155.23101.
  • Halmos, Paul R. (1950). Measure theory. New York: D. Van Nostrand Co., Inc. doi:10.1007/978-1-4684-9440-2. MR 0033869. Zbl 0040.16802.
  • Papoulis, Athanasios; Pillai, S. Unnikrishna (2002). Probability, random variables, and stochastic processes (Fourth edition of 1965 original ed.). New York: McGraw-Hill. ISBN 0-07-366011-6.
  • Ross, Sheldon M. (2019). Introduction to probability models (Twelfth edition of 1972 original ed.). London: Academic Press. doi:10.1016/C2017-0-01324-1. ISBN 978-0-12-814346-9. MR 3931305. Zbl 1408.60002.
