fbpx
Wikipedia

Glivenko–Cantelli theorem

In the theory of probability, the Glivenko–Cantelli theorem (sometimes referred to as the Fundamental Theorem of Statistics), named after Valery Ivanovich Glivenko and Francesco Paolo Cantelli, describes the asymptotic behaviour of the empirical distribution function as the number of independent and identically distributed observations grows.[1] Specifically, the empirical distribution function converges uniformly to the true distribution function almost surely.

The left diagram illustrates Glivenko–Cantelli theorem for uniform distributions. The right diagram illustrates the Donsker–Skorokhod–Kolmogorov theorem
The same diagram for normal distributions

The uniform convergence of more general empirical measures becomes an important property of the Glivenko–Cantelli classes of functions or sets.[2] The Glivenko–Cantelli classes arise in Vapnik–Chervonenkis theory, with applications to machine learning. Applications can be found in econometrics making use of M-estimators.

Statement edit

Assume that   are independent and identically distributed random variables in   with common cumulative distribution function  . The empirical distribution function for   is defined by

 

where   is the indicator function of the set   For every (fixed)     is a sequence of random variables which converge to   almost surely by the strong law of large numbers. Glivenko and Cantelli strengthened this result by proving uniform convergence of   to  

Theorem

  almost surely.[3](p 265)

This theorem originates with Valery Glivenko[4] and Francesco Cantelli,[5] in 1933.

Remarks
  • If   is a stationary ergodic process, then   converges almost surely to   The Glivenko–Cantelli theorem gives a stronger mode of convergence than this in the iid case.
  • An even stronger uniform convergence result for the empirical distribution function is available in the form of an extended type of law of the iterated logarithm.[3](p 268) See asymptotic properties of the empirical distribution function for this and related results.

Proof edit

For simplicity, consider a case of continuous random variable  . Fix   such that   for  . Now for all   there exists   such that  .

 

Therefore,

 

Since   by strong law of large numbers, we can guarantee that for any positive   and any integer   such that  , we can find   such that for all  , we have  . Combined with the above result, this further implies that  , which is the definition of almost sure convergence.

Empirical measures edit

One can generalize the empirical distribution function by replacing the set   by an arbitrary set C from a class of sets   to obtain an empirical measure indexed by sets  

 

Where   is the indicator function of each set  .

Further generalization is the map induced by   on measurable real-valued functions f, which is given by

 

Then it becomes an important property of these classes whether the strong law of large numbers holds uniformly on   or  .

Glivenko–Cantelli class edit

Consider a set   with a sigma algebra of Borel subsets A and a probability measure   For a class of subsets,

 

and a class of functions

 

define random variables

 
 

where   is the empirical measure,   is the corresponding map, and

  assuming that it exists.

Definitions

  • A class   is called a Glivenko–Cantelli class (or GC class, or sometimes strong GC class) with respect to a probability measure P if
  almost surely as  
  • A class is   is a weak Glivenko-Cantelli class with respect to P if it instead satisfies the weaker condition
  in probability as  
  • A class is called a universal Glivenko–Cantelli class if it is a GC class with respect to any probability measure   on  .
  • A class is a weak uniform Glivenko–Cantelli class if the convergence occurs uniformly over all probability measures   on  : For every  ,
  as  
  • A class is a (strong) uniform Glivenko-Cantelli class if it satisfies the stronger condition that for every  ,
  as  

Glivenko–Cantelli classes of functions (as well as their uniform and universal forms) are defined similarly, replacing all instances of   with  .

The weak and strong versions of the various Glivenko-Cantelli properties often coincide under certain regularity conditions. The following definition commonly appears in such regularity conditions:

  • A class of functions   is image-admissible Suslin if there exists a Suslin space   and a surjection   such that the map   is measurable  .
  • A class of measurable sets   is image-admissible Suslin if the class of functions   is image-admissible Suslin, where   denotes the indicator function for the set  .


Theorems

The following two theorems give sufficient conditions for the weak and strong versions of the Glivenko-Cantelli property to be equivalent.

Theorem (Talagrand, 1987)[6]

Let   be a class of functions that is integrable  , and define  . Then the following are equivalent:
  •   is a weak Glivenko-Cantelli class and   is dominated by an integrable function
  •   is a Glivenko-Cantelli class


Theorem (Dudley, Giné, and Zinn, 1991)[7]

Suppose that a function class   is bounded. Also suppose that the set   is image-admissible Suslin. Then   is a weak uniform Glivenko-Cantelli class if and only if it is a strong uniform Glivenko-Cantelli class.

The following theorem is central to statistical learning of binary classification tasks.

Theorem (Vapnik and Chervonenkis, 1968)[8]

Under certain consistency conditions, a universally measurable class of sets   is a uniform Glivenko-Cantelli class if and only if it is a Vapnik–Chervonenkis class.

There exist a variety of consistency conditions for the equivalence of uniform Glivenko-Cantelli and Vapnik-Chervonenkis classes. In particular, either of the following conditions for a class   suffice:[9]

  •   is image-admissible Suslin.
  •   is universally separable: There exists a countable subset   of   such that each set   can be written as the pointwise limit of sets in  .

Examples edit

  • Let   and  . The classical Glivenko–Cantelli theorem implies that this class is a universal GC class. Furthermore, by Kolmogorov's theorem,
 , that is   is uniformly Glivenko–Cantelli class.
  • Let P be a nonatomic probability measure on S and   be a class of all finite subsets in S. Because  ,  ,  , we have that   and so   is not a GC class with respect to P.

See also edit

References edit

  1. ^ Howard G.Tucker (1959). "A Generalization of the Glivenko–Cantelli Theorem". The Annals of Mathematical Statistics. 30 (3): 828–830. doi:10.1214/aoms/1177706212. JSTOR 2237422.
  2. ^ van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press. p. 279. ISBN 978-0-521-78450-4.
  3. ^ a b van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge University Press. ISBN 978-0-521-78450-4.
  4. ^ Glivenko, V. (1933). "Sulla determinazione empirica delle leggi di probabilità". Giorn. Ist. Ital. Attuari (in Italian). 4: 92–99.
  5. ^ Cantelli, F.P. (1933). "Sulla determinazione empirica delle leggi di probabilità". Giorn. Ist. Ital. Attuari. 4: 421–424.
  6. ^ Talagrand, M. (1987). "The Glivenko-Cantelli Problem". Annals of Probability. 15: 837–870. doi:10.1214/AOP/1176992069.
  7. ^ Dudley, Richard M.; Giné, Eva; Zinn, Joel C. (1991). "Uniform and universal Glivenko-Cantelli classes". Journal of Theoretical Probability. 4: 485–510. doi:10.1007/BF01210321.
  8. ^ Vapnik, V.N.; Chervonenkis, A.Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264–280. doi:10.1137/1116025.
  9. ^ Pestov, Vladimir (2011). "PAC learnability versus VC dimension: A footnote to a basic result of statistical learning". The 2011 International Joint Conference on Neural Networks. pp. 1141–1145. arXiv:1104.2097. doi:10.1109/IJCNN.2011.6033352.

Further reading edit

glivenko, cantelli, theorem, theory, probability, sometimes, referred, fundamental, theorem, statistics, named, after, valery, ivanovich, glivenko, francesco, paolo, cantelli, describes, asymptotic, behaviour, empirical, distribution, function, number, indepen. In the theory of probability the Glivenko Cantelli theorem sometimes referred to as the Fundamental Theorem of Statistics named after Valery Ivanovich Glivenko and Francesco Paolo Cantelli describes the asymptotic behaviour of the empirical distribution function as the number of independent and identically distributed observations grows 1 Specifically the empirical distribution function converges uniformly to the true distribution function almost surely The left diagram illustrates Glivenko Cantelli theorem for uniform distributions The right diagram illustrates the Donsker Skorokhod Kolmogorov theorem The same diagram for normal distributions The uniform convergence of more general empirical measures becomes an important property of the Glivenko Cantelli classes of functions or sets 2 The Glivenko Cantelli classes arise in Vapnik Chervonenkis theory with applications to machine learning Applications can be found in econometrics making use of M estimators Contents 1 Statement 2 Proof 3 Empirical measures 4 Glivenko Cantelli class 5 Examples 6 See also 7 References 8 Further readingStatement editAssume that X 1 X 2 displaystyle X 1 X 2 dots nbsp are independent and identically distributed random variables in R displaystyle mathbb R nbsp with common cumulative distribution function F x displaystyle F x nbsp The empirical distribution function for X 1 X n displaystyle X 1 dots X n nbsp is defined by F n x 1 n i 1 n I X i x 1 n i X i x 1 i n displaystyle F n x frac 1 n sum i 1 n I X i infty x frac 1 n bigl left i mid X i leq x 1 leq i leq n right bigr nbsp where I C displaystyle I C nbsp is the indicator function of the set C displaystyle C nbsp For every fixed x displaystyle x nbsp F n x displaystyle F n x nbsp is a sequence of random variables which converge to F x displaystyle F x nbsp almost surely by the strong law of large numbers Glivenko and Cantelli strengthened this result by proving uniform convergence of F n displaystyle F n nbsp to F displaystyle F nbsp Theorem F n F sup x R F n x F x 0 displaystyle F n F infty sup x in mathbb R bigl F n x F x bigr longrightarrow 0 nbsp almost surely 3 p 265 This theorem originates with Valery Glivenko 4 and Francesco Cantelli 5 in 1933 Remarks If X n displaystyle X n nbsp is a stationary ergodic process then F n x displaystyle F n x nbsp converges almost surely to F x E 1 X 1 x displaystyle F x operatorname mathbb E bigl 1 X 1 leq x bigr nbsp The Glivenko Cantelli theorem gives a stronger mode of convergence than this in the iid case An even stronger uniform convergence result for the empirical distribution function is available in the form of an extended type of law of the iterated logarithm 3 p 268 See asymptotic properties of the empirical distribution function for this and related results Proof editFor simplicity consider a case of continuous random variable X displaystyle X nbsp Fix x 0 lt x 1 lt lt x m 1 lt x m displaystyle infty x 0 lt x 1 lt cdots lt x m 1 lt x m infty nbsp such that F x j F x j 1 1 m displaystyle F x j F x j 1 frac 1 m nbsp for j 1 m displaystyle j 1 dots m nbsp Now for all x R displaystyle x in mathbb R nbsp there exists j 1 m displaystyle j in 1 dots m nbsp such that x x j 1 x j displaystyle x in x j 1 x j nbsp F n x F x F n x j F x j 1 F n x j F x j 1 m F n x F x F n x j 1 F x j F n x j 1 F x j 1 1 m displaystyle begin aligned F n x F x amp leq F n x j F x j 1 F n x j F x j frac 1 m F n x F x amp geq F n x j 1 F x j F n x j 1 F x j 1 frac 1 m end aligned nbsp Therefore F n F sup x R F n x F x max j 1 m F n x j F x j 1 m displaystyle F n F infty sup x in mathbb R F n x F x leq max j in 1 dots m F n x j F x j frac 1 m nbsp Since max j 1 m F n x j F x j 0 a s textstyle max j in 1 dots m F n x j F x j to 0 text a s nbsp by strong law of large numbers we can guarantee that for any positive e textstyle varepsilon nbsp and any integer m textstyle m nbsp such that 1 m lt e textstyle 1 m lt varepsilon nbsp we can find N textstyle N nbsp such that for all n N displaystyle n geq N nbsp we have max j 1 m F n x j F x j e 1 m a s textstyle max j in 1 dots m F n x j F x j leq varepsilon 1 m text a s nbsp Combined with the above result this further implies that F n F e a s textstyle F n F infty leq varepsilon text a s nbsp which is the definition of almost sure convergence Empirical measures editOne can generalize the empirical distribution function by replacing the set x displaystyle infty x nbsp by an arbitrary set C from a class of sets C displaystyle mathcal C nbsp to obtain an empirical measure indexed by sets C C displaystyle C in mathcal C nbsp P n C 1 n i 1 n I C X i C C displaystyle P n C frac 1 n sum i 1 n I C X i C in mathcal C nbsp Where I C x displaystyle I C x nbsp is the indicator function of each set C displaystyle C nbsp Further generalization is the map induced by P n displaystyle P n nbsp on measurable real valued functions f which is given by f P n f S f d P n 1 n i 1 n f X i f F displaystyle f mapsto P n f int S f dP n frac 1 n sum i 1 n f X i f in mathcal F nbsp Then it becomes an important property of these classes whether the strong law of large numbers holds uniformly on F displaystyle mathcal F nbsp or C displaystyle mathcal C nbsp Glivenko Cantelli class editConsider a set S displaystyle mathcal S nbsp with a sigma algebra of Borel subsets A and a probability measure P displaystyle mathbb P nbsp For a class of subsets C C C is measurable subset of S displaystyle mathcal C subset bigl C C text is measurable subset of mathcal S bigr nbsp and a class of functions F f S R f is measurable displaystyle mathcal F subset bigl f mathcal S to mathbb R f mbox is measurable bigr nbsp define random variables P n P C sup C C P n C P C displaystyle bigl mathbb P n mathbb P bigr mathcal C sup C in mathcal C bigl mathbb P n C mathbb P C bigr nbsp P n P F sup f F P n f P f displaystyle bigl mathbb P n mathbb P bigr mathcal F sup f in mathcal F bigl mathbb P n f mathbb P f bigr nbsp where P n C displaystyle mathbb P n C nbsp is the empirical measure P n f displaystyle mathbb P n f nbsp is the corresponding map and P f S f d P displaystyle mathbb P f int mathcal S f mathrm d mathbb P nbsp assuming that it exists Definitions A class C displaystyle mathcal C nbsp is called a Glivenko Cantelli class or GC class or sometimes strong GC class with respect to a probability measure P if P n P C 0 displaystyle bigl mathbb P n mathbb P bigr mathcal C to 0 nbsp almost surely as n displaystyle n to infty nbsp dd A class is C displaystyle mathcal C nbsp is a weak Glivenko Cantelli class with respect to P if it instead satisfies the weaker condition P n P C 0 displaystyle bigl mathbb P n mathbb P bigr mathcal C to 0 nbsp in probability as n displaystyle n to infty nbsp dd A class is called a universal Glivenko Cantelli class if it is a GC class with respect to any probability measure P displaystyle mathbb P nbsp on S A displaystyle mathcal S A nbsp A class is a weak uniform Glivenko Cantelli class if the convergence occurs uniformly over all probability measures P displaystyle mathbb P nbsp on S A displaystyle mathcal S A nbsp For every e gt 0 displaystyle varepsilon gt 0 nbsp sup P P S A Pr P n P C gt e 0 displaystyle sup mathbb P in mathbb P mathcal S A Pr left bigl mathbb P n mathbb P bigr mathcal C gt varepsilon right to 0 nbsp as n displaystyle n to infty nbsp dd A class is a strong uniform Glivenko Cantelli class if it satisfies the stronger condition that for every e gt 0 displaystyle varepsilon gt 0 nbsp sup P P S A Pr sup m n P m P C gt e 0 displaystyle sup mathbb P in mathbb P mathcal S A Pr left sup m geq n bigl mathbb P m mathbb P bigr mathcal C gt varepsilon right to 0 nbsp as n displaystyle n to infty nbsp dd Glivenko Cantelli classes of functions as well as their uniform and universal forms are defined similarly replacing all instances of C displaystyle mathcal C nbsp with F displaystyle mathcal F nbsp The weak and strong versions of the various Glivenko Cantelli properties often coincide under certain regularity conditions The following definition commonly appears in such regularity conditions A class of functions F displaystyle mathcal F nbsp is image admissible Suslin if there exists a Suslin space W displaystyle Omega nbsp and a surjection T W F displaystyle T Omega rightarrow mathcal F nbsp such that the map x y T y x displaystyle x y mapsto T y x nbsp is measurable X W displaystyle mathcal X times Omega nbsp A class of measurable sets C displaystyle mathcal C nbsp is image admissible Suslin if the class of functions 1 C C C displaystyle mathbf 1 C mid C in mathcal C nbsp is image admissible Suslin where 1 C displaystyle mathbf 1 C nbsp denotes the indicator function for the set C displaystyle C nbsp TheoremsThe following two theorems give sufficient conditions for the weak and strong versions of the Glivenko Cantelli property to be equivalent Theorem Talagrand 1987 6 Let F displaystyle mathcal F nbsp be a class of functions that is integrable P displaystyle mathbb P nbsp and define F 0 f P f f F displaystyle mathcal F 0 f mathbb P f mid f in mathcal F nbsp Then the following are equivalent F displaystyle mathcal F nbsp is a weak Glivenko Cantelli class and F 0 displaystyle mathcal F 0 nbsp is dominated by an integrable function F displaystyle mathcal F nbsp is a Glivenko Cantelli class Theorem Dudley Gine and Zinn 1991 7 Suppose that a function class F displaystyle mathcal F nbsp is bounded Also suppose that the set F 0 f inf f f F displaystyle mathcal F 0 f inf f mid f in mathcal F nbsp is image admissible Suslin Then F displaystyle mathcal F nbsp is a weak uniform Glivenko Cantelli class if and only if it is a strong uniform Glivenko Cantelli class The following theorem is central to statistical learning of binary classification tasks Theorem Vapnik and Chervonenkis 1968 8 Under certain consistency conditions a universally measurable class of sets C displaystyle mathcal C nbsp is a uniform Glivenko Cantelli class if and only if it is a Vapnik Chervonenkis class There exist a variety of consistency conditions for the equivalence of uniform Glivenko Cantelli and Vapnik Chervonenkis classes In particular either of the following conditions for a class C displaystyle mathcal C nbsp suffice 9 C displaystyle mathcal C nbsp is image admissible Suslin C displaystyle mathcal C nbsp is universally separable There exists a countable subset C 0 displaystyle mathcal C 0 nbsp of C displaystyle mathcal C nbsp such that each set C C displaystyle C in mathcal C nbsp can be written as the pointwise limit of sets in C 0 displaystyle mathcal C 0 nbsp Examples editLet S R displaystyle S mathbb R nbsp and C t t R displaystyle mathcal C infty t t in mathbb R nbsp The classical Glivenko Cantelli theorem implies that this class is a universal GC class Furthermore by Kolmogorov s theorem sup P P S A P n P C n 1 2 displaystyle sup P in mathcal P S A P n P mathcal C sim n 1 2 nbsp that is C displaystyle mathcal C nbsp is uniformly Glivenko Cantelli class Let P be a nonatomic probability measure on S and C displaystyle mathcal C nbsp be a class of all finite subsets in S Because A n X 1 X n C displaystyle A n X 1 ldots X n in mathcal C nbsp P A n 0 displaystyle P A n 0 nbsp P n A n 1 displaystyle P n A n 1 nbsp we have that P n P C 1 displaystyle P n P mathcal C 1 nbsp and so C displaystyle mathcal C nbsp is not a GC class with respect to P See also editDonsker s theorem Dvoretzky Kiefer Wolfowitz inequality strengthens the Glivenko Cantelli theorem by quantifying the rate of convergence References edit Howard G Tucker 1959 A Generalization of the Glivenko Cantelli Theorem The Annals of Mathematical Statistics 30 3 828 830 doi 10 1214 aoms 1177706212 JSTOR 2237422 van der Vaart A W 1998 Asymptotic Statistics Cambridge University Press p 279 ISBN 978 0 521 78450 4 a b van der Vaart A W 1998 Asymptotic Statistics Cambridge University Press ISBN 978 0 521 78450 4 Glivenko V 1933 Sulla determinazione empirica delle leggi di probabilita Giorn Ist Ital Attuari in Italian 4 92 99 Cantelli F P 1933 Sulla determinazione empirica delle leggi di probabilita Giorn Ist Ital Attuari 4 421 424 Talagrand M 1987 The Glivenko Cantelli Problem Annals of Probability 15 837 870 doi 10 1214 AOP 1176992069 Dudley Richard M Gine Eva Zinn Joel C 1991 Uniform and universal Glivenko Cantelli classes Journal of Theoretical Probability 4 485 510 doi 10 1007 BF01210321 Vapnik V N Chervonenkis A Ya 1971 On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities Theory of Probability amp Its Applications 16 2 264 280 doi 10 1137 1116025 Pestov Vladimir 2011 PAC learnability versus VC dimension A footnote to a basic result of statistical learning The 2011 International Joint Conference on Neural Networks pp 1141 1145 arXiv 1104 2097 doi 10 1109 IJCNN 2011 6033352 Further reading editDudley R M 1999 Uniform Central Limit Theorems Cambridge University Press ISBN 0 521 46102 2 Pitman E J G 1979 The Sample Distribution Function Some Basic Theory for Statistical Inference London UK Chapman and Hall p 79 97 ISBN 0 470 26554 X Shorack G R Wellner J A 1986 Empirical Processes with Applications to Statistics Wiley ISBN 0 471 86725 X van der Vaart A W Wellner J A 1996 Weak Convergence and Empirical Processes Springer ISBN 0 387 94640 3 van der Vaart Aad W Wellner Jon A 1996 Glivenko Cantelli Theorems Springer van der Vaart Aad W Wellner Jon A 2000 Preservation Theorems for Glivenko Cantelli and Uniform Glivenko Cantelli Classes Springer Retrieved from https en wikipedia org w index php title Glivenko Cantelli theorem amp oldid 1219972505, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.