Wikipedia

Law of large numbers

In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and tends to become closer to the expected value as more trials are performed.[1]

An illustration of the law of large numbers using a particular run of rolls of a single die. As the number of rolls in this run increases, the average of the values of all the results approaches 3.5. Although each run would show a distinctive shape over a small number of throws (at the left), over a large number of rolls (to the right) the shapes would be extremely similar.

The LLN is important because it guarantees stable long-term results for the averages of some random events.[1][2] For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. Importantly, the law applies (as the name indicates) only when a large number of observations are considered. There is no principle that a small number of observations will coincide with the expected value or that a streak of one value will immediately be "balanced" by the others (see the gambler's fallacy).

The LLN only applies to the average. Therefore, while

$$\lim_{n\to\infty} \sum_{i=1}^{n} \frac{X_i}{n} = \overline{X},$$

other formulas that look similar are not verified, such as the raw deviation from "theoretical results":

$$\sum_{i=1}^{n} X_i - n \times \overline{X};$$

not only does it not converge toward zero as n increases, but it tends to increase in absolute value as n increases.

Examples

For example, a single roll of a fair, six-sided die produces one of the numbers 1, 2, 3, 4, 5, or 6, each with equal probability. Therefore, the expected value of the average of the rolls is:

$$\frac{1+2+3+4+5+6}{6} = 3.5$$

According to the law of large numbers, if a large number of six-sided dice are rolled, the average of their values (sometimes called the sample mean) will approach 3.5, with the precision increasing as more dice are rolled.
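The die example can be tried out numerically. The following is a minimal sketch, not part of the original article: the function name `running_average_of_die_rolls` and the seed are illustrative assumptions, and the exact numbers printed depend on the pseudo-random run.

```python
import random

def running_average_of_die_rolls(num_rolls, seed=0):
    """Return the running sample mean after each roll of a fair six-sided die."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    total = 0
    averages = []
    for roll_count in range(1, num_rolls + 1):
        total += rng.randint(1, 6)  # uniform on {1, ..., 6}
        averages.append(total / roll_count)
    return averages

averages = running_average_of_die_rolls(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: sample mean = {averages[n - 1]:.4f}")
```

For small n the printed means can be visibly off from 3.5; for large n they should sit close to it, which is the behaviour the law describes.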

It follows from the law of large numbers that the empirical probability of success in a series of Bernoulli trials will converge to the theoretical probability. For a Bernoulli random variable, the expected value is the theoretical probability of success, and the average of n such variables (assuming they are independent and identically distributed (i.i.d.)) is precisely the relative frequency.

For example, a fair coin toss is a Bernoulli trial. When a fair coin is flipped once, the theoretical probability that the outcome will be heads is equal to 1/2. Therefore, according to the law of large numbers, the proportion of heads in a "large" number of coin flips "should be" roughly 1/2. In particular, the proportion of heads after n flips will almost surely converge to 1/2 as n approaches infinity.

Although the proportion of heads (and tails) approaches 1/2, almost surely the absolute difference in the number of heads and tails will become large as the number of flips becomes large. That is, the probability that the absolute difference is a small number approaches zero as the number of flips becomes large. Also, almost surely the ratio of the absolute difference to the number of flips will approach zero. Intuitively, the expected difference grows, but at a slower rate than the number of flips.
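The contrast between the proportion of heads and the raw difference between heads and tails can be seen in a small simulation. This is an illustrative sketch only; `flip_summary` and its seed are hypothetical names chosen here, and a single run is not a proof of the almost-sure statements above.

```python
import random

def flip_summary(num_flips, seed=1):
    """Flip a fair coin num_flips times.

    Returns (proportion of heads, |number of heads - number of tails|).
    """
    rng = random.Random(seed)
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    tails = num_flips - heads
    return heads / num_flips, abs(heads - tails)

for n in (100, 10_000, 1_000_000):
    proportion, difference = flip_summary(n)
    print(f"n = {n:>9}: proportion of heads = {proportion:.4f}, "
          f"|heads - tails| = {difference}, ratio to n = {difference / n:.5f}")
```

In a typical run the proportion moves toward 1/2 and the ratio of the difference to n shrinks, while the difference itself does not need to shrink at all.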

Another good example of the LLN is the Monte Carlo method. These methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The larger the number of repetitions, the better the approximation tends to be. The reason that this method is important is mainly that, sometimes, it is difficult or impossible to use other approaches.[3]
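As one hedged illustration of a Monte Carlo method, the sketch below estimates π by sampling points uniformly in the unit square and counting how many fall inside the quarter disc. The choice of problem and the name `monte_carlo_pi` are assumptions made for this example, not something taken from the cited source.

```python
import math
import random

def monte_carlo_pi(num_samples, seed=2):
    """Estimate pi as 4 times the fraction of uniform points in the unit
    square that land inside the quarter disc x^2 + y^2 <= 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

for n in (100, 10_000, 200_000):
    estimate = monte_carlo_pi(n)
    print(f"n = {n:>7}: estimate = {estimate:.5f}, error = {abs(estimate - math.pi):.5f}")
```

The fraction of points inside the quarter disc is a sample mean of Bernoulli variables with success probability π/4, so the law of large numbers is what justifies the estimate.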

Limitation

The average of the results obtained from a large number of trials may fail to converge in some cases. For instance, the average of n results taken from the Cauchy distribution or some Pareto distributions (α < 1) will not converge as n becomes larger; the reason is heavy tails. The Cauchy distribution and the Pareto distribution represent two cases: the Cauchy distribution does not have an expectation,[4] whereas the expectation of the Pareto distribution (α < 1) is infinite.[5] One way to generate the Cauchy-distributed example is to take random numbers equal to the tangent of an angle uniformly distributed between −90° and +90°.[6] The median is zero, but the expected value does not exist, and indeed the average of n such variables has the same distribution as one such variable. It does not converge in probability toward zero (or any other value) as n goes to infinity.
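The Cauchy construction described above (the tangent of a uniformly distributed angle) can be sketched as follows. The function name `cauchy_samples` and the seed are illustrative assumptions; the sample mean is printed only for inspection, since by the argument above it has no value to settle on, whereas the sample median is expected to stay near zero.

```python
import math
import random
import statistics

def cauchy_samples(num_samples, seed=3):
    """Draw Cauchy-distributed values as tan(angle), with the angle uniform
    between -90 and +90 degrees (expressed in radians)."""
    rng = random.Random(seed)
    return [math.tan(rng.uniform(-math.pi / 2, math.pi / 2))
            for _ in range(num_samples)]

samples = cauchy_samples(100_001)
for n in (101, 10_001, 100_001):
    prefix = samples[:n]
    mean = sum(prefix) / n
    median = statistics.median(prefix)
    print(f"n = {n:>7}: sample mean = {mean:10.4f}, sample median = {median:8.4f}")
```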

If the trials embed a selection bias, typical in human economic or rational behaviour, the law of large numbers does not help to remove it: even if the number of trials is increased, the selection bias remains.

History

 
Diffusion is an example of the law of large numbers. Initially, there are solute molecules on the left side of a barrier (magenta line) and none on the right. The barrier is removed, and the solute diffuses to fill the whole container.
  • Top: With a single molecule, the motion appears to be quite random.
  • Middle: With more molecules, there is clearly a trend where the solute fills the container more and more uniformly, but there are also random fluctuations.
  • Bottom: With an enormous number of solute molecules (too many to see), the randomness is essentially gone: The solute appears to move smoothly and systematically from high-concentration areas to low-concentration areas. In realistic situations, chemists can describe diffusion as a deterministic macroscopic phenomenon (see Fick's laws), despite its underlying random nature.

The Italian mathematician Gerolamo Cardano (1501–1576) stated without proof that the accuracies of empirical statistics tend to improve with the number of trials.[7] This was then formalized as a law of large numbers. A special form of the LLN (for a binary random variable) was first proved by Jacob Bernoulli.[8] It took him over 20 years to develop a sufficiently rigorous mathematical proof which was published in his Ars Conjectandi (The Art of Conjecturing) in 1713. He named this his "Golden Theorem" but it became generally known as "Bernoulli's theorem". This should not be confused with Bernoulli's principle, named after Jacob Bernoulli's nephew Daniel Bernoulli. In 1837, S. D. Poisson further described it under the name "la loi des grands nombres" ("the law of large numbers").[9][10] Thereafter, it was known under both names, but the "law of large numbers" is most frequently used.

After Bernoulli and Poisson published their efforts, other mathematicians also contributed to refinement of the law, including Chebyshev,[11] Markov, Borel, Cantelli, Kolmogorov and Khinchin. Markov showed that the law can apply to a random variable that does not have a finite variance under some other weaker assumption, and Khinchin showed in 1929 that if the series consists of independent identically distributed random variables, it suffices that the expected value exists for the weak law of large numbers to be true.[12][13] These further studies have given rise to two prominent forms of the LLN. One is called the "weak" law and the other the "strong" law, in reference to two different modes of convergence of the cumulative sample means to the expected value; in particular, as explained below, the strong form implies the weak.[12]

Forms

There are two different versions of the law of large numbers, described below: the strong law of large numbers and the weak law of large numbers.[14][1] Stated for the case where $X_1, X_2, \ldots$ is an infinite sequence of independent and identically distributed (i.i.d.) Lebesgue integrable random variables with expected value $E(X_1) = E(X_2) = \cdots = \mu$, both versions of the law state that the sample average

$$\overline{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$$

converges to the expected value:

$$\overline{X}_n \to \mu \quad \text{as } n \to \infty. \qquad (1)$$

(Lebesgue integrability of Xj means that the expected value E(Xj) exists according to Lebesgue integration and is finite. It does not mean that the associated probability measure is absolutely continuous with respect to Lebesgue measure.)

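Introductory probability texts often additionally assume identical finite variance $\operatorname{Var}(X_i) = \sigma^2$ (for all $i$) and no correlation between random variables. In that case, the variance of the average of n random variables is

$$\operatorname{Var}(\overline{X}_n) = \operatorname{Var}\!\left(\tfrac{1}{n}(X_1 + \cdots + X_n)\right) = \frac{1}{n^2}\operatorname{Var}(X_1 + \cdots + X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n},$$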

which can be used to shorten and simplify the proofs. This assumption of finite variance is not necessary. Large or infinite variance will make the convergence slower, but the LLN holds anyway.[15]
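The formula for the variance of the average, σ²/n, can be checked exactly for fair dice, for which σ² = 35/12. The sketch below is an illustration under that assumption; the helper names `sum_distribution` and `variance_of_sample_mean` are hypothetical, and exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction

def sum_distribution(num_dice):
    """Exact distribution {value: probability} of the sum of num_dice fair dice."""
    dist = {0: Fraction(1)}
    for _ in range(num_dice):
        new_dist = {}
        for total, prob in dist.items():
            for face in range(1, 7):
                key = total + face
                new_dist[key] = new_dist.get(key, Fraction(0)) + prob * Fraction(1, 6)
        dist = new_dist
    return dist

def variance_of_sample_mean(num_dice):
    """Exact variance of the average of num_dice independent fair dice."""
    dist = sum_distribution(num_dice)
    mean = sum(Fraction(s, num_dice) * p for s, p in dist.items())
    return sum((Fraction(s, num_dice) - mean) ** 2 * p for s, p in dist.items())

sigma_squared = Fraction(35, 12)  # variance of a single fair die
for n in (1, 2, 5, 10):
    print(f"n = {n:>2}: Var(mean) = {variance_of_sample_mean(n)}, "
          f"sigma^2 / n = {sigma_squared / n}")
```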

Mutual independence of the random variables can be replaced by pairwise independence[16] or exchangeability[17] in both versions of the law.

The difference between the strong and the weak version is concerned with the mode of convergence being asserted. For interpretation of these modes, see Convergence of random variables.

Weak law

 
 
 
Simulation illustrating the law of large numbers. Each frame, a coin that is red on one side and blue on the other is flipped, and a dot is added in the corresponding column. A pie chart shows the proportion of red and blue so far. Notice that while the proportion varies significantly at first, it approaches 50% as the number of trials increases.

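The weak law of large numbers (also called Khinchin's law) states that the sample average converges in probability towards the expected value:[18]

$$\overline{X}_n \xrightarrow{P} \mu \quad \text{when } n \to \infty. \qquad (2)$$

That is, for any positive number ε,

$$\lim_{n\to\infty} \Pr\!\left(\,|\overline{X}_n - \mu| < \varepsilon\,\right) = 1.$$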

Interpreting this result, the weak law states that for any nonzero margin specified (ε), no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value; that is, within the margin.
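The interpretation above can be illustrated by estimating, for a fixed margin ε, how often the sample average falls outside the margin. This is a rough sketch under assumptions chosen for the example: the variables are Uniform(0, 1) (so μ = 0.5), and `deviation_probability`, the number of repetitions and the seed are hypothetical choices.

```python
import random

def deviation_probability(n, eps, repetitions=1000, seed=4):
    """Estimate P(|mean of n Uniform(0,1) draws - 0.5| >= eps) by repeating
    the n-sample experiment `repetitions` times."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(repetitions):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        if abs(sample_mean - 0.5) >= eps:
            misses += 1
    return misses / repetitions

eps = 0.05
for n in (10, 100, 1000):
    estimate = deviation_probability(n, eps)
    print(f"n = {n:>4}: estimated P(|mean - 0.5| >= {eps}) = {estimate:.3f}")
```

The estimated probability of missing the margin should drop toward zero as n grows, which is what convergence in probability asserts.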

As mentioned earlier, the weak law applies in the case of i.i.d. random variables, but it also applies in some other cases. For example, the variance may be different for each random variable in the series, keeping the expected value constant. If the variances are bounded, then the law applies, as shown by Chebyshev as early as 1867. (If the expected values change during the series, then we can simply apply the law to the average deviation from the respective expected values. The law then states that this converges in probability to zero.) In fact, Chebyshev's proof works so long as the variance of the average of the first n values goes to zero as n goes to infinity.[13] As an example, assume that each random variable in the series follows a Gaussian distribution (normal distribution) with mean zero, but with variance equal to $2n/\log(n+1)$, which is not bounded. At each stage, the average will be normally distributed (as the average of a set of normally distributed variables). The variance of the sum is equal to the sum of the variances, which is asymptotic to $n^2/\log n$. The variance of the average is therefore asymptotic to $1/\log n$ and goes to zero.

There are also examples of the weak law applying even though the expected value does not exist.

Strong law

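The strong law of large numbers (also called Kolmogorov's law) states that the sample average converges almost surely to the expected value:[19]

$$\overline{X}_n \xrightarrow{\text{a.s.}} \mu \quad \text{when } n \to \infty. \qquad (3)$$

That is,

$$\Pr\!\left(\lim_{n\to\infty} \overline{X}_n = \mu\right) = 1.$$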

What this means is that the probability that, as the number of trials n goes to infinity, the average of the observations converges to the expected value, is equal to one. The modern proof of the strong law is more complex than that of the weak law, and relies on passing to an appropriate subsequence.[15]

The strong law of large numbers can itself be seen as a special case of the pointwise ergodic theorem. This view justifies the intuitive interpretation of the expected value (for Lebesgue integration only) of a random variable when sampled repeatedly as the "long-term average".

Law 3 is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). However, the weak law is known to hold in certain conditions where the strong law does not hold, and then the convergence is only weak (in probability). See differences between the weak law and the strong law.

The strong law applies to independent identically distributed random variables having an expected value (like the weak law). This was proved by Kolmogorov in 1930. It can also apply in other cases. Kolmogorov also showed, in 1933, that if the variables are independent and identically distributed, then for the average to converge almost surely on something (this can be considered another statement of the strong law), it is necessary that they have an expected value (and then of course the average will converge almost surely on that).[20]

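If the summands are independent but not identically distributed, then

$$\overline{X}_n - \operatorname{E}\!\left[\overline{X}_n\right] \xrightarrow{\text{a.s.}} 0,$$

provided that each $X_k$ has a finite second moment and

$$\sum_{k=1}^{\infty} \frac{1}{k^2} \operatorname{Var}[X_k] < \infty.$$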

This statement is known as Kolmogorov's strong law, see e.g. Sen & Singer (1993, Theorem 2.3.10).

Differences between the weak law and the strong law

The weak law states that for a specified large n, the average $\overline{X}_n$ is likely to be near μ. Thus, it leaves open the possibility that $|\overline{X}_n - \mu| > \varepsilon$ happens an infinite number of times, although at infrequent intervals. (Not necessarily $|\overline{X}_n - \mu| \neq 0$ for all n.)

The strong law shows that this almost surely will not occur. It does not imply that with probability 1, we have that for any ε > 0 the inequality $|\overline{X}_n - \mu| < \varepsilon$ holds for all large enough n, since the convergence is not necessarily uniform on the set where it holds.[21]

The strong law does not hold in the following cases, but the weak law does.[22][23]

  1. Let X be an exponentially distributed random variable with parameter 1. The random variable $\sin(X)e^{X}X^{-1}$ has no expected value according to Lebesgue integration, but using conditional convergence and interpreting the integral as a Dirichlet integral, which is an improper Riemann integral, we can say:
     $$E\!\left(\frac{\sin(X)e^{X}}{X}\right) = \int_{0}^{\infty} \frac{\sin(x)e^{x}}{x}\,e^{-x}\,dx = \frac{\pi}{2}$$
  2. Let X be a geometrically distributed random variable with probability 0.5. The random variable $2^{X}(-1)^{X}X^{-1}$ does not have an expected value in the conventional sense because the infinite series is not absolutely convergent, but using conditional convergence, we can say:
     $$E\!\left(\frac{2^{X}(-1)^{X}}{X}\right) = \sum_{x=1}^{\infty} \frac{2^{x}(-1)^{x}}{x}\,2^{-x} = -\ln(2)$$
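  3. If the cumulative distribution function of a random variable is
     $$1 - F(x) = \frac{e}{2x\ln(x)}, \quad x \ge e; \qquad F(x) = \frac{e}{-2x\ln(-x)}, \quad x \le -e,$$
    then it has no expected value, but the weak law is true.[24][25]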
  4. Let $X_k$ be plus or minus $\sqrt{k/\log\log\log k}$ (starting at sufficiently large k so that the denominator is positive) with probability 1/2 for each.[20] The variance of $X_k$ is then $k/\log\log\log k$. Kolmogorov's strong law does not apply because the partial sum in his criterion up to k = n is asymptotic to $\log n/\log\log\log n$, and this is unbounded. If we replace the random variables with Gaussian variables having the same variances, namely $k/\log\log\log k$, then the average at any point will also be normally distributed. The width of the distribution of the average will tend toward zero (standard deviation asymptotic to $1/\sqrt{2\log\log\log n}$), but for a given ε, there is a probability, which does not go to zero with n, that the average will come back up to ε sometime after the nth trial. Since the width of the distribution of the average is not zero, it must have a positive lower bound p(ε), which means there is a probability of at least p(ε) that the average will attain ε after n trials. It will happen with probability p(ε)/2 before some m which depends on n. But even after m, there is still a probability of at least p(ε) that it will happen. (This seems to indicate that p(ε) = 1 and the average will attain ε an infinite number of times.)

Uniform law of large numbers

Suppose f(x,θ) is some function defined for θ ∈ Θ, and continuous in θ. Then for any fixed θ, the sequence {f(X1,θ), f(X2,θ), ...} will be a sequence of independent and identically distributed random variables, such that the sample mean of this sequence converges in probability to E[f(X,θ)]. This is the pointwise (in θ) convergence.

The uniform law of large numbers states the conditions under which the convergence happens uniformly in θ. If[26][27]

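  1. Θ is compact,
  2. f(x,θ) is continuous at each θ ∈ Θ for almost all x, and is a measurable function of x at each θ, and
  3. there exists a dominating function d(x) such that E[d(X)] < ∞ and
     $$\|f(x,\theta)\| \le d(x) \quad \text{for all } \theta \in \Theta,$$

then E[f(X,θ)] is continuous in θ, and

$$\sup_{\theta\in\Theta} \left\| \frac{1}{n}\sum_{i=1}^{n} f(X_i,\theta) - \operatorname{E}[f(X,\theta)] \right\| \xrightarrow{P} 0.$$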

This result is useful to derive consistency of a large class of estimators (see Extremum estimator).

Borel's law of large numbers

Borel's law of large numbers, named after Émile Borel, states that if an experiment is repeated a large number of times, independently under identical conditions, then the proportion of times that any specified event occurs approximately equals the probability of the event's occurrence on any particular trial; the larger the number of repetitions, the better the approximation tends to be. More precisely, if E denotes the event in question, p its probability of occurrence, and $N_n(E)$ the number of times E occurs in the first n trials, then with probability one,[28]

$$\frac{N_n(E)}{n} \to p \quad \text{as } n \to \infty.$$

This theorem makes rigorous the intuitive notion of probability as the long-run relative frequency of an event's occurrence. It is a special case of any of several more general laws of large numbers in probability theory.

Chebyshev's inequality. Let X be a random variable with finite expected value μ and finite non-zero variance σ². Then for any real number k > 0,

$$\Pr(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$

Proof of the weak law

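Given $X_1, X_2, \ldots$ an infinite sequence of i.i.d. random variables with finite expected value $E(X_1) = E(X_2) = \cdots = \mu < \infty$, we are interested in the convergence of the sample average

$$\overline{X}_n = \tfrac{1}{n}(X_1 + \cdots + X_n).$$

The weak law of large numbers states:

$$\overline{X}_n \xrightarrow{P} \mu \quad \text{when } n \to \infty. \qquad (2)$$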

Proof using Chebyshev's inequality assuming finite variance

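This proof uses the assumption of finite variance $\operatorname{Var}(X_i) = \sigma^2$ (for all $i$). The independence of the random variables implies no correlation between them, and we have that

$$\operatorname{Var}(\overline{X}_n) = \operatorname{Var}\!\left(\tfrac{1}{n}(X_1 + \cdots + X_n)\right) = \frac{1}{n^2}\operatorname{Var}(X_1 + \cdots + X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$

The common mean μ of the sequence is the mean of the sample average:

$$E(\overline{X}_n) = \mu.$$

Using Chebyshev's inequality on $\overline{X}_n$ results in

$$\operatorname{P}\!\left(\left|\overline{X}_n - \mu\right| \ge \varepsilon\right) \le \frac{\sigma^2}{n\varepsilon^2}.$$

This may be used to obtain the following:

$$\operatorname{P}\!\left(\left|\overline{X}_n - \mu\right| < \varepsilon\right) = 1 - \operatorname{P}\!\left(\left|\overline{X}_n - \mu\right| \ge \varepsilon\right) \ge 1 - \frac{\sigma^2}{n\varepsilon^2}.$$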

As n approaches infinity, the expression approaches 1. And by definition of convergence in probability, we have obtained

$$\overline{X}_n \xrightarrow{P} \mu \quad \text{when } n \to \infty. \qquad (2)$$
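To see how conservative the Chebyshev bound used in this proof can be, the sketch below compares σ²/(nε²) with the exact probability P(|X̄_n − 1/2| ≥ ε) for n fair coin flips, for which σ² = 1/4. The coin-flip setting and the function names are assumptions made for this illustration; rational arithmetic is used for the comparison with ε so that boundary cases are not affected by floating-point rounding.

```python
from fractions import Fraction
from math import comb

def exact_deviation_probability(n, eps):
    """Exact P(|mean of n fair coin flips - 1/2| >= eps), from the binomial pmf."""
    half = Fraction(1, 2)
    favourable = sum(comb(n, k) for k in range(n + 1)
                     if abs(Fraction(k, n) - half) >= eps)
    return favourable / 2 ** n

def chebyshev_bound(n, eps, variance=Fraction(1, 4)):
    """Upper bound variance / (n * eps^2) from the proof, capped at 1."""
    return min(1.0, float(variance / (n * eps ** 2)))

eps = Fraction(1, 20)  # margin of 0.05
for n in (10, 100, 1000, 10_000):
    print(f"n = {n:>6}: exact = {exact_deviation_probability(n, eps):.6f}, "
          f"Chebyshev bound = {chebyshev_bound(n, eps):.6f}")
```

Both columns go to zero as n grows; the bound does so only at rate 1/n, which is enough for the weak law even though the exact probability decays much faster.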

Proof using convergence of characteristic functions

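By Taylor's theorem for complex functions, the characteristic function of any random variable, X, with finite mean μ, can be written as

$$\varphi_X(t) = 1 + it\mu + o(t), \quad t \to 0.$$

All $X_1, X_2, \ldots$ have the same characteristic function, so we will simply denote this $\varphi_X$.

Among the basic properties of characteristic functions there are

$$\varphi_{\frac{1}{n}X}(t) = \varphi_X\!\left(\tfrac{t}{n}\right) \quad \text{and} \quad \varphi_{X+Y}(t) = \varphi_X(t)\,\varphi_Y(t)$$
if X and Y are independent.

These rules can be used to calculate the characteristic function of $\overline{X}_n$ in terms of $\varphi_X$:

$$\varphi_{\overline{X}_n}(t) = \left[\varphi_X\!\left(\tfrac{t}{n}\right)\right]^n = \left[1 + i\mu\tfrac{t}{n} + o\!\left(\tfrac{t}{n}\right)\right]^n \to e^{it\mu}, \quad \text{as } n \to \infty.$$

The limit $e^{it\mu}$ is the characteristic function of the constant random variable μ, and hence by the Lévy continuity theorem, $\overline{X}_n$ converges in distribution to μ:

$$\overline{X}_n \xrightarrow{\mathcal{D}} \mu \quad \text{for } n \to \infty.$$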

μ is a constant, which implies that convergence in distribution to μ and convergence in probability to μ are equivalent (see Convergence of random variables). Therefore,

$$\overline{X}_n \xrightarrow{P} \mu \quad \text{when } n \to \infty. \qquad (2)$$

This shows that the sample mean converges in probability to the derivative of the characteristic function at the origin, as long as the latter exists.
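The convergence of the characteristic function of the sample average to $e^{it\mu}$ can be checked numerically for a concrete distribution. The sketch below assumes the exponential distribution with rate 1, whose characteristic function is 1/(1 − it) and whose mean is μ = 1; that choice, the value t = 2 and the function names are illustrative assumptions.

```python
import cmath

def phi_exponential(t):
    """Characteristic function of the exponential distribution with rate 1."""
    return 1 / (1 - 1j * t)

def phi_sample_mean(t, n):
    """Characteristic function of the average of n i.i.d. copies: [phi(t/n)]^n."""
    return phi_exponential(t / n) ** n

t = 2.0
mu = 1.0  # mean of the exponential distribution with rate 1
limit = cmath.exp(1j * t * mu)
for n in (1, 10, 100, 1000, 10_000):
    gap = abs(phi_sample_mean(t, n) - limit)
    print(f"n = {n:>6}: |phi_mean(t) - exp(i t mu)| = {gap:.6f}")
```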

Proof of the strong law

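We give a relatively simple proof of the strong law under the assumptions that the $X_i$ are iid, $\operatorname{E}[X_i] =: \mu < \infty$, $\operatorname{Var}(X_i) = \sigma^2 < \infty$, and $\operatorname{E}[X_i^4] =: \tau < \infty$.

Let us first note that without loss of generality we can assume that $\mu = 0$ by centering. In this case, writing $S_n = X_1 + \cdots + X_n$, the strong law says that

$$\Pr\!\left(\lim_{n\to\infty} \overline{X}_n = 0\right) = 1,$$
or
$$\Pr\!\left(\omega : \lim_{n\to\infty} \frac{S_n(\omega)}{n} = 0\right) = 1.$$
It is equivalent to show that
$$\Pr\!\left(\omega : \lim_{n\to\infty} \frac{S_n(\omega)}{n} \neq 0\right) = 0.$$
Note that
$$\lim_{n\to\infty} \frac{S_n(\omega)}{n} \neq 0 \iff \exists\, \varepsilon > 0,\ \left|\frac{S_n(\omega)}{n}\right| \ge \varepsilon \ \text{infinitely often},$$
and thus to prove the strong law we need to show that for every $\varepsilon > 0$, we have
$$\Pr\!\left(\omega : |S_n(\omega)| \ge n\varepsilon \ \text{infinitely often}\right) = 0.$$
Define the events $A_n = \{\omega : |S_n| \ge n\varepsilon\}$, and if we can show that
$$\sum_{n=1}^{\infty} \Pr(A_n) < \infty,$$
then the Borel-Cantelli Lemma implies the result. So let us estimate $\Pr(A_n)$.

We compute

$$\operatorname{E}[S_n^4] = \operatorname{E}\!\left[\left(\sum_{i=1}^{n} X_i\right)^{4}\right] = \operatorname{E}\!\left[\sum_{1 \le i,j,k,l \le n} X_i X_j X_k X_l\right].$$
We first claim that every term of the form $X_i^3 X_j$, $X_i^2 X_j X_k$, $X_i X_j X_k X_l$, where all subscripts are distinct, must have zero expectation. This is because $\operatorname{E}[X_i^3 X_j] = \operatorname{E}[X_i^3]\operatorname{E}[X_j]$ by independence, and the last factor is zero; the other terms are handled similarly. Therefore the only terms in the sum with nonzero expectation are $\operatorname{E}[X_i^4]$ and $\operatorname{E}[X_i^2 X_j^2]$. Since the $X_i$ are identically distributed, all of these are the same, and moreover $\operatorname{E}[X_i^2 X_j^2] = (\operatorname{E}[X_i^2])^2$.

There are $n$ terms of the form $\operatorname{E}[X_i^4]$ and $3n(n-1)$ terms of the form $(\operatorname{E}[X_i^2])^2$, and so

$$\operatorname{E}[S_n^4] = n\tau + 3n(n-1)\sigma^4.$$
Note that the right-hand side is a quadratic polynomial in $n$, and as such there exists a $C > 0$ such that $\operatorname{E}[S_n^4] \le Cn^2$ for $n$ sufficiently large. By Chebyshev's inequality applied to the fourth moment (that is, Markov's inequality applied to $S_n^4$),
$$\Pr(|S_n| \ge n\varepsilon) \le \frac{1}{(n\varepsilon)^4}\operatorname{E}[S_n^4] \le \frac{C}{\varepsilon^4 n^2},$$
for $n$ sufficiently large, and therefore this series is summable. Since this holds for any $\varepsilon > 0$, we have established the Strong LLN.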
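The counting step in this proof, $\operatorname{E}[S_n^4] = n\tau + 3n(n-1)\sigma^4$ for centered variables, can be verified exactly by brute-force enumeration for small n. The sketch below assumes a hypothetical two-point distribution (value −2 with probability 1/3, value 1 with probability 2/3), chosen only because it has mean zero; the helper names are likewise illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical centered two-point distribution used only for this check.
VALUES = (Fraction(-2), Fraction(1))
PROBS = (Fraction(1, 3), Fraction(2, 3))

def moment(order):
    """E[X^order] for the distribution above."""
    return sum(p * v ** order for v, p in zip(VALUES, PROBS))

def fourth_moment_of_sum(n):
    """E[S_n^4], computed exactly by enumerating all outcomes of n i.i.d. draws."""
    total = Fraction(0)
    for outcome in product(range(len(VALUES)), repeat=n):
        prob = Fraction(1)
        s = Fraction(0)
        for index in outcome:
            prob *= PROBS[index]
            s += VALUES[index]
        total += prob * s ** 4
    return total

sigma_squared = moment(2)  # variance, since the mean is zero
tau = moment(4)            # fourth moment
for n in range(1, 9):
    lhs = fourth_moment_of_sum(n)
    rhs = n * tau + 3 * n * (n - 1) * sigma_squared ** 2
    print(f"n = {n}: E[S_n^4] = {lhs}, n*tau + 3n(n-1)*sigma^4 = {rhs}")
```

Because the arithmetic is exact, the two columns agree term for term, which is the identity the proof relies on before applying the tail bound.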


Another proof can be found in Etemadi (1981).[29]

For a proof without the added assumption of a finite fourth moment, see Section 22 of Billingsley (1979).[30]

Consequences

The law of large numbers makes it possible to recover, from a realization of the sequence, not only the expectation of an unknown distribution but also any other feature of the probability distribution.[1] By applying Borel's law of large numbers, one could easily obtain the probability mass function. For each event in the objective probability mass function, one could approximate the probability of the event's occurrence with the proportion of times that the event occurs. The larger the number of repetitions, the better the approximation. As for the continuous case, take $C = (a-h, a+h]$ for small positive h. Thus, for large n:

$$\frac{N_n(C)}{n} \approx p = P(X \in C) = \int_{a-h}^{a+h} f(x)\,dx \approx 2hf(a)$$

With this method, one can cover the whole x-axis with a grid (with grid size 2h) and obtain a bar graph which is called a histogram.
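The histogram idea can be sketched for a single grid cell: the density at a is approximated by the fraction of observations in (a − h, a + h], divided by the cell width 2h. The example below assumes standard normal data so that the estimate can be compared with a known density; the function names, seed and sample size are illustrative assumptions.

```python
import math
import random

def density_estimate(samples, a, h):
    """Estimate f(a) as N_n(C) / (2 h n), where C = (a - h, a + h]."""
    count = sum(1 for x in samples if a - h < x <= a + h)
    return count / (2 * h * len(samples))

def standard_normal_pdf(x):
    """Density of the standard normal distribution, used as the reference."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

rng = random.Random(5)
samples = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
h = 0.1
for a in (-1.0, 0.0, 1.0):
    print(f"a = {a:+.1f}: estimate = {density_estimate(samples, a, h):.4f}, "
          f"true density = {standard_normal_pdf(a):.4f}")
```

Repeating the estimate over a grid of centres spaced 2h apart gives the bar heights of a normalized histogram.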

Notes

  1. ^ a b c d Dekking, Michel (2005). A Modern Introduction to Probability and Statistics. Springer. pp. 181–190. ISBN 9781852338961.
  2. ^ Yao, Kai; Gao, Jinwu (2016). "Law of Large Numbers for Uncertain Random Variables". IEEE Transactions on Fuzzy Systems. 24 (3): 615–621. doi:10.1109/TFUZZ.2015.2466080. ISSN 1063-6706. S2CID 2238905.
  3. ^ Kroese, Dirk P.; Brereton, Tim; Taimre, Thomas; Botev, Zdravko I. (2014). "Why the Monte Carlo method is so important today". Wiley Interdisciplinary Reviews: Computational Statistics. 6 (6): 386–392. doi:10.1002/wics.1314. S2CID 18521840.
  4. ^ Dekking, Michel (2005). A Modern Introduction to Probability and Statistics. Springer. pp. 92. ISBN 9781852338961.
  5. ^ Dekking, Michel (2005). A Modern Introduction to Probability and Statistics. Springer. pp. 63. ISBN 9781852338961.
  6. ^ Pitman, E. J. G.; Williams, E. J. (1967). "Cauchy-Distributed Functions of Cauchy Variates". The Annals of Mathematical Statistics. 38 (3): 916–918. ISSN 0003-4851.
  7. ^ Mlodinow, L. (2008). The Drunkard's Walk. New York: Random House. p. 50.
  8. ^ Bernoulli, Jakob (1713). "4". Ars Conjectandi: Usum & Applicationem Praecedentis Doctrinae in Civilibus, Moralibus & Oeconomicis (in Latin). Translated by Sheynin, Oscar.
  9. ^ Poisson names the "law of large numbers" (la loi des grands nombres) in: Poisson, S. D. (1837). Probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilitiés (in French). Paris, France: Bachelier. p. 7. He attempts a two-part proof of the law on pp. 139–143 and pp. 277 ff.
  10. ^ Hacking, Ian (1983). "19th-century Cracks in the Concept of Determinism". Journal of the History of Ideas. 44 (3): 455–475. doi:10.2307/2709176. JSTOR 2709176.
  11. ^ Tchebichef, P. (1846). "Démonstration élémentaire d'une proposition générale de la théorie des probabilités". Journal für die reine und angewandte Mathematik (in French). 1846 (33): 259–267. doi:10.1515/crll.1846.33.259. S2CID 120850863.
  12. ^ a b Seneta 2013.
  13. ^ a b Yuri Prohorov. "Law of large numbers". Encyclopedia of Mathematics. EMS Press.
  14. ^ Bhattacharya, Rabi; Lin, Lizhen; Patrangenaru, Victor (2016). A Course in Mathematical Statistics and Large Sample Theory. Springer Texts in Statistics. New York, NY: Springer New York. doi:10.1007/978-1-4939-4032-5. ISBN 978-1-4939-4030-1.
  15. ^ a b "The strong law of large numbers – What's new". Terrytao.wordpress.com. 19 June 2008. Retrieved 2012-06-09.
  16. ^ Etemadi, N. Z. (1981). "An elementary proof of the strong law of large numbers". Wahrscheinlichkeitstheorie Verw Gebiete. 55 (1): 119–122. doi:10.1007/BF01013465. S2CID 122166046.
  17. ^ Kingman, J. F. C. (April 1978). "Uses of Exchangeability". The Annals of Probability. 6 (2). doi:10.1214/aop/1176995566. ISSN 0091-1798.
  18. ^ Loève 1977, Chapter 1.4, p. 14
  19. ^ Loève 1977, Chapter 17.3, p. 251
  20. ^ a b Yuri Prokhorov. "Strong law of large numbers". Encyclopedia of Mathematics.
  21. ^ Ross (2009)
  22. ^ Lehmann, Erich L.; Romano, Joseph P. (2006-03-30). Weak law converges to constant. Springer. ISBN 9780387276052.
  23. ^ Dguvl Hun Hong; Sung Ho Lee (1998). (PDF). Communications of the Korean Mathematical Society. 13 (2): 385–391. Archived from the original (PDF) on 2016-07-01. Retrieved 2014-06-28.
  24. ^ Mukherjee, Sayan. (PDF). Archived from the original (PDF) on 2013-03-09. Retrieved 2014-06-28.
  25. ^ J. Geyer, Charles. "Law of large numbers" (PDF).
  26. ^ Newey & McFadden 1994, Lemma 2.4
  27. ^ Jennrich, Robert I. (1969). "Asymptotic Properties of Non-Linear Least Squares Estimators". The Annals of Mathematical Statistics. 40 (2): 633–643. doi:10.1214/aoms/1177697731.
  28. ^ Wen, Liu (1991). "An Analytic Technique to Prove Borel's Strong Law of Large Numbers". The American Mathematical Monthly. 98 (2): 146–148. doi:10.2307/2323947. JSTOR 2323947.
  29. ^ Etemadi, Nasrollah (1981). "An elementary proof of the strong law of large numbers". Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete. Springer. 55: 119–122. doi:10.1007/BF01013465. S2CID 122166046.
  30. ^ Billingsley, Patrick (1979). Probability and Measure.

References

  • Grimmett, G. R.; Stirzaker, D. R. (1992). Probability and Random Processes (2nd ed.). Oxford: Clarendon Press. ISBN 0-19-853665-8.
  • Durrett, Richard (1995). Probability: Theory and Examples (2nd ed.). Duxbury Press.
  • Martin Jacobsen (1992). Videregående Sandsynlighedsregning [Advanced Probability Theory] (in Danish) (3rd ed.). Copenhagen: HCØ-tryk. ISBN 87-91180-71-6.
  • Loève, Michel (1977). Probability theory 1 (4th ed.). Springer.
  • Newey, Whitney K.; McFadden, Daniel (1994). "36". Large sample estimation and hypothesis testing. Handbook of econometrics. Vol. IV. Elsevier Science. pp. 2111–2245.
  • Ross, Sheldon (2009). A first course in probability (8th ed.). Prentice Hall. ISBN 978-0-13-603313-4.
  • Sen, P. K; Singer, J. M. (1993). Large sample methods in statistics. Chapman & Hall.
  • Seneta, Eugene (2013). "A Tricentenary history of the Law of Large Numbers". Bernoulli. 19 (4): 1088–1121. arXiv:1309.6488. doi:10.3150/12-BEJSP12. S2CID 88520834.

External links

  • "Law of large numbers", Encyclopedia of Mathematics, EMS Press, 2001 [1994]
  • Weisstein, Eric W. "Weak Law of Large Numbers". MathWorld.
  • Weisstein, Eric W. "Strong Law of Large Numbers". MathWorld.
  • by Yihui Xie using the R package animation
  • Apple CEO Tim Cook said something that would make statisticians cringe. "We don't believe in such laws as laws of large numbers. This is sort of, uh, old dogma, I think, that was cooked up by somebody [..]" said Tim Cook. Business Insider explained: "However, the law of large numbers has nothing to do with large companies, large revenues, or large growth rates. The law of large numbers is a fundamental concept in probability theory and statistics, tying together theoretical probabilities that we can calculate to the actual outcomes of experiments that we empirically perform."

large, numbers, confused, with, truly, large, numbers, this, article, needs, additional, citations, verification, please, help, improve, this, article, adding, citations, reliable, sources, unsourced, material, challenged, removed, find, sources, news, newspap. Not to be confused with law of truly large numbers This article needs additional citations for verification Please help improve this article by adding citations to reliable sources Unsourced material may be challenged and removed Find sources Law of large numbers news newspapers books scholar JSTOR March 2015 Learn how and when to remove this template message In probability theory the law of large numbers LLN is a theorem that describes the result of performing the same experiment a large number of times According to the law the average of the results obtained from a large number of trials should be close to the expected value and tends to become closer to the expected value as more trials are performed 1 An illustration of the law of large numbers using a particular run of rolls of a single die As the number of rolls in this run increases the average of the values of all the results approaches 3 5 Although each run would show a distinctive shape over a small number of throws at the left over a large number of rolls to the right the shapes would be extremely similar The LLN is important because it guarantees stable long term results for the averages of some random events 1 2 For example while a casino may lose money in a single spin of the roulette wheel its earnings will tend towards a predictable percentage over a large number of spins Any winning streak by a player will eventually be overcome by the parameters of the game Importantly the law applies as the name indicates only when a large number of observations are considered There is no principle that a small number of observations will coincide with the expected value or that a streak of one value will immediately be balanced by the others see the 
gambler s fallacy The LLN only applies to the average Therefore whilelim n i 1 n X i n X displaystyle lim n to infty sum i 1 n frac X i n overline X other formulas that look similar are not verified such as the raw deviation from theoretical results i 1 n X i n X displaystyle sum i 1 n X i n times overline X not only does it not converge toward zero as n increases but it tends to increase in absolute value as n increases Contents 1 Examples 2 Limitation 3 History 4 Forms 4 1 Weak law 4 2 Strong law 4 3 Differences between the weak law and the strong law 4 4 Uniform law of large numbers 4 5 Borel s law of large numbers 5 Proof of the weak law 5 1 Proof using Chebyshev s inequality assuming finite variance 5 2 Proof using convergence of characteristic functions 6 Proof of the strong law 7 Consequences 8 See also 9 Notes 10 References 11 External linksExamples EditFor example a single roll of a fair six sided die produces one of the numbers 1 2 3 4 5 or 6 each with equal probability Therefore the expected value of the average of the rolls is 1 2 3 4 5 6 6 3 5 displaystyle frac 1 2 3 4 5 6 6 3 5 nbsp According to the law of large numbers if a large number of six sided dice are rolled the average of their values sometimes called the sample mean will approach 3 5 with the precision increasing as more dice are rolled It follows from the law of large numbers that the empirical probability of success in a series of Bernoulli trials will converge to the theoretical probability For a Bernoulli random variable the expected value is the theoretical probability of success and the average of n such variables assuming they are independent and identically distributed i i d is precisely the relative frequency For example a fair coin toss is a Bernoulli trial When a fair coin is flipped once the theoretical probability that the outcome will be heads is equal to 1 2 Therefore according to the law of large numbers the proportion of heads in a large number of coin flips should be 
roughly 1 2 In particular the proportion of heads after n flips will almost surely converge to 1 2 as n approaches infinity Although the proportion of heads and tails approaches 1 2 almost surely the absolute difference in the number of heads and tails will become large as the number of flips becomes large That is the probability that the absolute difference is a small number approaches zero as the number of flips becomes large Also almost surely the ratio of the absolute difference to the number of flips will approach zero Intuitively the expected difference grows but at a slower rate than the number of flips Another good example of the LLN is the Monte Carlo method These methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results The larger the number of repetitions the better the approximation tends to be The reason that this method is important is mainly that sometimes it is difficult or impossible to use other approaches 3 Limitation EditThe average of the results obtained from a large number of trials may fail to converge in some cases For instance the average of n results taken from the Cauchy distribution or some Pareto distributions a lt 1 will not converge as n becomes larger the reason is heavy tails The Cauchy distribution and the Pareto distribution represent two cases the Cauchy distribution does not have an expectation 4 whereas the expectation of the Pareto distribution a lt 1 is infinite 5 One way to generate the Cauchy distributed example is where the random numbers equal the tangent of an angle uniformly distributed between 90 and 90 6 The median is zero but the expected value does not exist and indeed the average of n such variables have the same distribution as one such variable It does not converge in probability toward zero or any other value as n goes to infinity And if the trials embed a selection bias typical in human economic rational behaviour the law of large numbers does not 
help in solving the bias. Even if the number of trials is increased, the selection bias remains.

History

Diffusion is an example of the law of large numbers. Initially, there are solute molecules on the left side of a barrier (magenta line) and none on the right. The barrier is removed, and the solute diffuses to fill the whole container. Top: With a single molecule, the motion appears to be quite random. Middle: With more molecules, there is clearly a trend where the solute fills the container more and more uniformly, but there are also random fluctuations. Bottom: With an enormous number of solute molecules (too many to see), the randomness is essentially gone: the solute appears to move smoothly and systematically from high-concentration areas to low-concentration areas. In realistic situations, chemists can describe diffusion as a deterministic macroscopic phenomenon (see Fick's laws), despite its underlying random nature.

The Italian mathematician Gerolamo Cardano (1501–1576) stated without proof that the accuracies of empirical statistics tend to improve with the number of trials.[7] This was then formalized as a law of large numbers. A special form of the LLN (for a binary random variable) was first proved by Jacob Bernoulli.[8] It took him over 20 years to develop a sufficiently rigorous mathematical proof, which was published in his Ars Conjectandi (The Art of Conjecturing) in 1713. He named this his "Golden Theorem", but it became generally known as "Bernoulli's theorem". This should not be confused with Bernoulli's principle, named after Jacob Bernoulli's nephew Daniel Bernoulli. In 1837, S. D. Poisson further described it under the name "la loi des grands nombres" ("the law of large numbers").[9][10] Thereafter, it was known under both names, but the "law of large numbers" is most frequently used.

After Bernoulli and Poisson published their efforts, other mathematicians also contributed to refinement of the law, including Chebyshev,[11] Markov, Borel, Cantelli, Kolmogorov and Khinchin. Markov showed that the law can
apply to a random variable that does not have a finite variance under some other weaker assumption, and Khinchin showed in 1929 that if the series consists of independent identically distributed random variables, it suffices that the expected value exists for the weak law of large numbers to be true.[12][13] These further studies have given rise to two prominent forms of the LLN. One is called the "weak" law and the other the "strong" law, in reference to two different modes of convergence of the cumulative sample means to the expected value; in particular, as explained below, the strong form implies the weak.[12]

Forms

There are two different versions of the law of large numbers that are described below. They are called the strong law of large numbers and the weak law of large numbers.[14][1] Stated for the case where $X_1, X_2, \ldots$ is an infinite sequence of independent and identically distributed (i.i.d.) Lebesgue integrable random variables with expected value $\operatorname{E}(X_1) = \operatorname{E}(X_2) = \cdots = \mu$, both versions of the law state that the sample average

$$\overline{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$$

converges to the expected value:

$$\overline{X}_n \to \mu \quad \text{as } n \to \infty. \tag{1}$$

(Lebesgue integrability of $X_j$ means that the expected value $\operatorname{E}(X_j)$ exists according to Lebesgue integration and is finite. It does not mean that the associated probability measure is absolutely continuous with respect to Lebesgue measure.)

Introductory probability texts often additionally assume identical finite variance $\operatorname{Var}(X_i) = \sigma^2$ (for all $i$) and no correlation between random variables. In that case, the variance of the average of $n$ random variables is

$$\operatorname{Var}(\overline{X}_n) = \operatorname{Var}\left(\tfrac{1}{n}(X_1 + \cdots + X_n)\right) = \frac{1}{n^2}\operatorname{Var}(X_1 + \cdots + X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n},$$

which can be used to shorten and simplify the proofs. This
assumption of finite variance is not necessary. Large or infinite variance will make the convergence slower, but the LLN holds anyway.[15]

Mutual independence of the random variables can be replaced by pairwise independence[16] or exchangeability[17] in both versions of the law.

The difference between the strong and the weak version is concerned with the mode of convergence being asserted. For interpretation of these modes, see Convergence of random variables.

Weak law

Simulation illustrating the law of large numbers. Each frame, a coin that is red on one side and blue on the other is flipped, and a dot is added in the corresponding column. A pie chart shows the proportion of red and blue so far. Notice that while the proportion varies significantly at first, it approaches 50% as the number of trials increases.

The weak law of large numbers (also called Khinchin's law) states that the sample average converges in probability towards the expected value:[18]

$$\overline{X}_n \overset{P}{\rightarrow} \mu \qquad \text{when } n \to \infty. \tag{2}$$

That is, for any positive number $\varepsilon$,

$$\lim_{n \to \infty} \Pr\left(\left|\overline{X}_n - \mu\right| < \varepsilon\right) = 1.$$

Interpreting this result, the weak law states that for any nonzero margin specified ($\varepsilon$), no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value; that is, within the margin.

As mentioned earlier, the weak law applies in the case of i.i.d. random variables, but it also applies in some other cases. For example, the variance may be different for each random variable in the series, keeping the expected value constant. If the variances are bounded, then the law applies, as shown by Chebyshev as early as 1867. (If the expected values change during the series, then we can simply apply the law to the average deviation from the respective expected values. The law then states that this converges in
probability to zero.) In fact, Chebyshev's proof works so long as the variance of the average of the first $n$ values goes to zero as $n$ goes to infinity.[13] As an example, assume that each random variable in the series follows a Gaussian distribution (normal distribution) with mean zero, but with variance equal to $2n/\log(n+1)$, which is not bounded. At each stage, the average will be normally distributed (as the average of a set of normally distributed variables). The variance of the sum is equal to the sum of the variances, which is asymptotic to $n^2/\log n$. The variance of the average is therefore asymptotic to $1/\log n$ and goes to zero.

There are also examples of the weak law applying even though the expected value does not exist.

Strong law

The strong law of large numbers (also called Kolmogorov's law) states that the sample average converges almost surely to the expected value:[19]

$$\overline{X}_n \overset{\text{a.s.}}{\longrightarrow} \mu \qquad \text{when } n \to \infty. \tag{3}$$

That is,

$$\Pr\left(\lim_{n \to \infty} \overline{X}_n = \mu\right) = 1.$$

What this means is that the probability that, as the number of trials $n$ goes to infinity, the average of the observations converges to the expected value, is equal to one. The modern proof of the strong law is more complex than that of the weak law, and relies on passing to an appropriate subsequence.[15]

The strong law of large numbers can itself be seen as a special case of the pointwise ergodic theorem. This view justifies the intuitive interpretation of the expected value (for Lebesgue integration only) of a random variable when sampled repeatedly as the "long-term average".

Law 3 is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). However, the weak law is known to hold in certain conditions where the strong law does not hold, and then the
convergence is only weak (in probability). See differences between the weak law and the strong law.

The strong law applies to independent identically distributed random variables having an expected value (like the weak law). This was proved by Kolmogorov in 1930. It can also apply in other cases. Kolmogorov also showed, in 1933, that if the variables are independent and identically distributed, then for the average to converge almost surely on something (this can be considered another statement of the strong law), it is necessary that they have an expected value (and then of course the average will converge almost surely on that).[20]

If the summands are independent but not identically distributed, then

$$\overline{X}_n - \operatorname{E}\big[\overline{X}_n\big] \overset{\text{a.s.}}{\longrightarrow} 0,$$

[2] provided that each $X_k$ has a finite second moment and

$$\sum_{k=1}^{\infty} \frac{1}{k^2} \operatorname{Var}[X_k] < \infty.$$

This statement is known as Kolmogorov's strong law; see e.g. Sen & Singer (1993, Theorem 2.3.10).

Differences between the weak law and the strong law

The weak law states that for a specified large $n$, the average $\overline{X}_n$ is likely to be near $\mu$. Thus, it leaves open the possibility that $|\overline{X}_n - \mu| > \varepsilon$ happens an infinite number of times, although at infrequent intervals. (Not necessarily $|\overline{X}_n - \mu| \neq 0$ for all $n$.)

The strong law shows that this almost surely will not occur. That is, with probability 1, for any $\varepsilon > 0$ the inequality $|\overline{X}_n - \mu| < \varepsilon$ holds for all large enough $n$. How large $n$ must be can depend on the outcome, however, since the convergence is not necessarily uniform on the set where it holds.[21]

The strong law does not hold in the following cases, but the weak law does.[22][23]

Let $X$ be an exponentially distributed random variable with parameter 1. The random variable $\sin(X) e^{X} X^{-1}$
has no expected value according to Lebesgue integration, but using conditional convergence and interpreting the integral as a Dirichlet integral, which is an improper Riemann integral, we can say:

$$E\left(\frac{\sin(X) e^{X}}{X}\right) = \int_{x=0}^{\infty} \frac{\sin(x) e^{x}}{x} e^{-x}\, dx = \frac{\pi}{2}$$

Let $X$ be a geometrically distributed random variable with probability 0.5. The random variable $2^{X}(-1)^{X} X^{-1}$ does not have an expected value in the conventional sense because the infinite series is not absolutely convergent, but using conditional convergence, we can say:

$$E\left(\frac{2^{X}(-1)^{X}}{X}\right) = \sum_{x=1}^{\infty} \frac{2^{x}(-1)^{x}}{x} 2^{-x} = -\ln(2)$$

If the cumulative distribution function of a random variable is

$$\begin{cases} 1 - F(x) = \dfrac{e}{2x\ln(x)}, & x \geq e \\[1ex] F(x) = \dfrac{e}{-2x\ln(-x)}, & x \leq -e \end{cases}$$

then it has no expected value, but the weak law is true.[24][25]

Let $X_k$ be plus or minus $\sqrt{k/\log\log\log k}$ (starting at sufficiently large $k$ so that the denominator is positive) with probability 1/2 for each.[20] The variance of $X_k$ is then $k/\log\log\log k$. Kolmogorov's strong law does not apply because the partial sum in his criterion up to $k = n$ is asymptotic to $\log n/\log\log\log n$ and this is unbounded. If we replace the random variables with Gaussian variables having the same variances, namely $k/\log\log\log k$, then the average at any point will also be normally distributed. The width of the distribution of the average will tend toward zero (standard deviation asymptotic to $1/\sqrt{2\log\log\log n}$), but for a given $\varepsilon$, there is a probability, which does not go to zero with $n$, that the average sometime after the $n$th trial will come back up to $\varepsilon$. Since the width of the
distribution of the average is not zero, it must have a positive lower bound $p(\varepsilon)$, which means there is a probability of at least $p(\varepsilon)$ that the average will attain $\varepsilon$ after $n$ trials. It will happen with probability $p(\varepsilon)/2$ before some $m$ which depends on $n$. But even after $m$, there is still a probability of at least $p(\varepsilon)$ that it will happen. (This seems to indicate that $p(\varepsilon) = 1$ and the average will attain $\varepsilon$ an infinite number of times.)

Uniform law of large numbers

Suppose $f(x, \theta)$ is some function defined for $\theta \in \Theta$, and continuous in $\theta$. Then for any fixed $\theta$, the sequence $\{f(X_1, \theta), f(X_2, \theta), \ldots\}$ will be a sequence of independent and identically distributed random variables, such that the sample mean of this sequence converges in probability to $\operatorname{E}[f(X, \theta)]$. This is the pointwise (in $\theta$) convergence.

The uniform law of large numbers states the conditions under which the convergence happens uniformly in $\theta$. If[26][27]

1. $\Theta$ is compact,
2. $f(x, \theta)$ is continuous at each $\theta \in \Theta$ for almost all $x$, and is a measurable function of $x$ at each $\theta$,
3. there exists a dominating function $d(x)$ such that $\operatorname{E}[d(X)] < \infty$, and
$$\left\| f(x, \theta) \right\| \leq d(x) \quad \text{for all } \theta \in \Theta,$$

then $\operatorname{E}[f(X, \theta)]$ is continuous in $\theta$, and

$$\sup_{\theta \in \Theta} \left\| \frac{1}{n} \sum_{i=1}^{n} f(X_i, \theta) - \operatorname{E}[f(X, \theta)] \right\| \overset{P}{\rightarrow} 0.$$

This result is useful to derive consistency of a large class of estimators (see Extremum estimator).

Borel's law of large numbers

Borel's law of large numbers, named after Émile Borel, states that if an experiment is repeated a large number of times, independently under identical conditions, then the proportion of times that any specified event occurs approximately equals the probability of the event's occurrence on any particular trial; the larger the number of repetitions, the better the approximation tends to be. More precisely, if $E$ denotes the event in question, $p$ its probability of occurrence, and $N_n(E)$ the number of times $E$ occurs in the
first $n$ trials, then with probability one,[28]

$$\frac{N_n(E)}{n} \to p \quad \text{as } n \to \infty.$$

This theorem makes rigorous the intuitive notion of probability as the long-run relative frequency of an event's occurrence. It is a special case of any of several more general laws of large numbers in probability theory.

Chebyshev's inequality. Let $X$ be a random variable with finite expected value $\mu$ and finite non-zero variance $\sigma^2$. Then for any real number $k > 0$,

$$\Pr(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}.$$

Proof of the weak law

Given $X_1, X_2, \ldots$ an infinite sequence of i.i.d. random variables with finite expected value $E(X_1) = E(X_2) = \cdots = \mu < \infty$, we are interested in the convergence of the sample average

$$\overline{X}_n = \tfrac{1}{n}(X_1 + \cdots + X_n).$$

The weak law of large numbers states:

$$\overline{X}_n \overset{P}{\rightarrow} \mu \qquad \text{when } n \to \infty. \tag{2}$$

Proof using Chebyshev's inequality assuming finite variance

This proof uses the assumption of finite variance $\operatorname{Var}(X_i) = \sigma^2$ (for all $i$). The independence of the random variables implies no correlation between them, and we have that

$$\operatorname{Var}(\overline{X}_n) = \operatorname{Var}\left(\tfrac{1}{n}(X_1 + \cdots + X_n)\right) = \frac{1}{n^2}\operatorname{Var}(X_1 + \cdots + X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$

The common mean $\mu$ of the sequence is the mean of the sample average:

$$E(\overline{X}_n) = \mu.$$

Using Chebyshev's inequality on $\overline{X}_n$, whose standard deviation is $\sigma/\sqrt{n}$, with $k = \varepsilon\sqrt{n}/\sigma$, results in

$$\operatorname{P}\left(\left|\overline{X}_n - \mu\right| \geq \varepsilon\right) \leq \frac{\sigma^2}{n\varepsilon^2}.$$

This may be used to obtain the following:

$$\operatorname{P}\left(\left|\overline{X}_n - \mu\right| < \varepsilon\right) = 1 - \operatorname{P}\left(\left|\overline{X}_n - \mu\right| \geq \varepsilon\right) \geq 1 - \frac{\sigma^2}{n\varepsilon^2}.$$

As $n$ approaches infinity, the expression approaches 1. And by definition of convergence in probability, we have obtained

$$\overline{X}_n \overset{P}{\rightarrow} \mu \qquad \text{when } n \to \infty. \tag{2}$$

Proof using convergence of characteristic functions

By Taylor's theorem for complex functions, the characteristic function of any random variable $X$ with finite mean $\mu$ can be written as

$$\varphi_X(t) = 1 + it\mu + o(t), \quad t \rightarrow 0.$$

All $X_1, X_2, \ldots$ have the same characteristic function, so we will simply denote this $\varphi_X$.

Among the basic properties of characteristic functions there are

$$\varphi_{\frac{1}{n}X}(t) = \varphi_X\left(\tfrac{t}{n}\right) \quad \text{and} \quad \varphi_{X+Y}(t) = \varphi_X(t)\,\varphi_Y(t) \quad \text{if } X \text{ and } Y \text{ are independent.}$$

These rules can be used to calculate the characteristic function of $\overline{X}_n$ in terms of $\varphi_X$:

$$\varphi_{\overline{X}_n}(t) = \left[\varphi_X\left(\frac{t}{n}\right)\right]^n = \left[1 + i\mu\frac{t}{n} + o\left(\frac{t}{n}\right)\right]^n \rightarrow e^{it\mu}, \quad \text{as } n \to \infty.$$

The limit $e^{it\mu}$ is the characteristic function of the constant random variable $\mu$, and hence by the Lévy continuity theorem, $\overline{X}_n$ converges in distribution to $\mu$:

$$\overline{X}_n \overset{\mathcal{D}}{\rightarrow} \mu \qquad \text{for } n \to \infty.$$

$\mu$ is a constant, which implies that convergence in distribution to $\mu$ and convergence in probability to $\mu$ are equivalent (see Convergence of random variables). Therefore,

$$\overline{X}_n \overset{P}{\rightarrow} \mu \qquad \text{when } n \to \infty. \tag{2}$$

This shows that the sample mean converges in probability to $-i\,\varphi_X'(0)$, the derivative of the characteristic function at the origin divided by $i$, as long as that derivative exists.

Proof of the strong law

We give a
relatively simple proof of the strong law under the assumptions that the $X_i$ are i.i.d., $\mathbb{E}[X_i] =: \mu < \infty$, $\operatorname{Var}(X_i) = \sigma^2 < \infty$, and $\mathbb{E}[X_i^4] =: \tau < \infty$.

Let us first note that without loss of generality we can assume that $\mu = 0$ by centering. Write $S_n = X_1 + \cdots + X_n$ for the partial sums, so that $\overline{X}_n = S_n/n$. In this case, the strong law says that

$$\Pr\left(\lim_{n \to \infty} \overline{X}_n = 0\right) = 1,$$

or

$$\Pr\left(\omega : \lim_{n \to \infty} \frac{S_n(\omega)}{n} = 0\right) = 1.$$

It is equivalent to show that

$$\Pr\left(\omega : \lim_{n \to \infty} \frac{S_n(\omega)}{n} \neq 0\right) = 0.$$

Note that

$$\lim_{n \to \infty} \frac{S_n(\omega)}{n} \neq 0 \iff \exists \epsilon > 0,\ \left|\frac{S_n(\omega)}{n}\right| \geq \epsilon \ \text{infinitely often},$$

and thus to prove the strong law we need to show that for every $\epsilon > 0$, we have

$$\Pr\left(\omega : |S_n(\omega)| \geq n\epsilon \ \text{infinitely often}\right) = 0.$$

Define the events $A_n = \{\omega : |S_n| \geq n\epsilon\}$, and if we can show that

$$\sum_{n=1}^{\infty} \Pr(A_n) < \infty,$$

then the Borel–Cantelli Lemma implies the result. So let us estimate $\Pr(A_n)$.

We compute

$$\mathbb{E}[S_n^4] = \mathbb{E}\left[\left(\sum_{i=1}^{n} X_i\right)^4\right] = \mathbb{E}\left[\sum_{1 \leq i,j,k,l \leq n} X_i X_j X_k X_l\right].$$

We first claim that every term of the form $X_i^3 X_j$, $X_i^2 X_j X_k$, $X_i X_j X_k X_l$ where all subscripts are distinct, must have zero expectation. This is because $\mathbb{E}[X_i^3 X_j] = \mathbb{E}[X_i^3]\,\mathbb{E}[X_j]$ by independence, and the last term is
zero, and similarly for the other terms. Therefore the only terms in the sum with nonzero expectation are $\mathbb{E}[X_i^4]$ and $\mathbb{E}[X_i^2 X_j^2]$. Since the $X_i$ are identically distributed, all of these are the same, and moreover $\mathbb{E}[X_i^2 X_j^2] = (\mathbb{E}[X_i^2])^2$.

There are $n$ terms of the form $\mathbb{E}[X_i^4]$ and $3n(n-1)$ terms of the form $(\mathbb{E}[X_i^2])^2$, and so

$$\mathbb{E}[S_n^4] = n\tau + 3n(n-1)\sigma^4.$$

Note that the right-hand side is a quadratic polynomial in $n$, and as such there exists a $C > 0$ such that $\mathbb{E}[S_n^4] \leq Cn^2$ for $n$ sufficiently large. By Chebyshev's inequality (in its fourth-moment form, that is, Markov's inequality applied to $S_n^4$),

$$\Pr(|S_n| \geq n\epsilon) \leq \frac{1}{(n\epsilon)^4}\,\mathbb{E}[S_n^4] \leq \frac{C}{\epsilon^4 n^2},$$

for $n$ sufficiently large, and therefore this series is summable. Since this holds for any $\epsilon > 0$, we have established the Strong LLN.

Another proof can be found in [29].

For a proof without the added assumption of a finite fourth moment, see Section 22 of [30].

Consequences

The law of large numbers provides an expectation of an unknown distribution from a realization of the sequence, but also any feature of the probability distribution.[1] By applying Borel's law of large numbers, one could easily obtain the probability mass function. For each event in the objective probability mass function, one could approximate the probability of the event's occurrence with the proportion of times that any specified event occurs. The larger the number of repetitions, the better the approximation. As for the continuous case: $C = (a - h, a + h]$, for small positive $h$. Thus, for large $n$:

$$\frac{N_n(C)}{n} \approx p = P(X \in C) = \int_{a-h}^{a+h} f(x)\,dx \approx 2hf(a)$$

With this method, one can cover the whole x-axis with a grid (with grid size $2h$) and obtain a bar graph which is called a histogram.

See also

Asymptotic equipartition property
Central limit theorem
Infinite monkey theorem
Law of averages
Law of the iterated logarithm
Law of truly large numbers
Lindy effect
Regression toward the mean
Sortition
Strong law of small numbers

Notes

1. Dekking, Michel (2005). A Modern Introduction to Probability and Statistics. Springer. pp. 181–190. ISBN 9781852338961.
2. Yao, Kai; Gao, Jinwu (2016). "Law of Large Numbers for Uncertain Random Variables". IEEE Transactions on Fuzzy Systems. 24 (3): 615–621. doi:10.1109/TFUZZ.2015.2466080. ISSN 1063-6706. S2CID 2238905.
3. Kroese, Dirk P.; Brereton, Tim; Taimre, Thomas; Botev, Zdravko I. (2014). "Why the Monte Carlo method is so important today". Wiley Interdisciplinary Reviews: Computational Statistics. 6 (6): 386–392. doi:10.1002/wics.1314. S2CID 18521840.
4. Dekking, Michel (2005). A Modern Introduction to Probability and Statistics. Springer. p. 92. ISBN 9781852338961.
5. Dekking, Michel (2005). A Modern Introduction to Probability and Statistics. Springer. p. 63. ISBN 9781852338961.
6. Pitman, E. J. G.; Williams, E. J. (1967). "Cauchy-Distributed Functions of Cauchy Variates". The Annals of Mathematical Statistics. 38 (3): 916–918. ISSN 0003-4851.
7. Mlodinow, L. (2008). The Drunkard's Walk. New York: Random House. p. 50.
8. Bernoulli, Jakob (1713). "4". Ars Conjectandi: Usum & Applicationem Praecedentis Doctrinae in Civilibus, Moralibus & Oeconomicis (in Latin). Translated by Sheynin, Oscar.
9. Poisson names the "law of large numbers" (la loi des grands nombres) in: Poisson, S. D. (1837). Probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités (in French). Paris, France: Bachelier. p. 7. He attempts a two-part proof of the law on pp. 139–143 and pp. 277 ff.
10. Hacking, Ian (1983). "19th-century Cracks in the Concept of Determinism".
Journal of the History of Ideas. 44 (3): 455–475. doi:10.2307/2709176. JSTOR 2709176.
11. Tchebichef, P. (1846). "Démonstration élémentaire d'une proposition générale de la théorie des probabilités". Journal für die reine und angewandte Mathematik (in French). 1846 (33): 259–267. doi:10.1515/crll.1846.33.259. S2CID 120850863.
12. Seneta 2013.
13. Yuri Prohorov. "Law of large numbers". Encyclopedia of Mathematics. EMS Press.
14. Bhattacharya, Rabi; Lin, Lizhen; Patrangenaru, Victor (2016). A Course in Mathematical Statistics and Large Sample Theory. Springer Texts in Statistics. New York, NY: Springer New York. doi:10.1007/978-1-4939-4032-5. ISBN 978-1-4939-4030-1.
15. "The strong law of large numbers – What's new". Terrytao.wordpress.com. 19 June 2008. Retrieved 2012-06-09.
16. Etemadi, N. Z. (1981). "An elementary proof of the strong law of large numbers". Wahrscheinlichkeitstheorie Verw Gebiete. 55 (1): 119–122. doi:10.1007/BF01013465. S2CID 122166046.
17. Kingman, J. F. C. (April 1978). "Uses of Exchangeability". The Annals of Probability. 6 (2). doi:10.1214/aop/1176995566. ISSN 0091-1798.
18. Loève 1977, Chapter 1.4, p. 14.
19. Loève 1977, Chapter 17.3, p. 251.
20. Yuri Prokhorov. "Strong law of large numbers". Encyclopedia of Mathematics.
21. Ross (2009).
22. Lehmann, Erich L.; Romano, Joseph P. (2006-03-30). Weak law converges to constant. Springer. ISBN 9780387276052.
23. Dguvl Hun Hong; Sung Ho Lee (1998). "A Note on the Weak Law of Large Numbers for Exchangeable Random Variables" (PDF). Communications of the Korean Mathematical Society. 13 (2): 385–391. Archived from the original (PDF) on 2016-07-01. Retrieved 2014-06-28.
24. Mukherjee, Sayan. "Law of large numbers" (PDF). Archived from the original (PDF) on 2013-03-09. Retrieved 2014-06-28.
25. J. Geyer, Charles. "Law of large numbers" (PDF).
26. Newey & McFadden 1994, Lemma 2.4.
27. Jennrich, Robert I. (1969). "Asymptotic Properties of Non-Linear Least Squares Estimators". The Annals of Mathematical Statistics. 40 (2): 633–643. doi:10.1214/aoms/1177697731.
28. Wen, Liu (1991). "An Analytic Technique to Prove Borel's Strong Law of Large Numbers". The American Mathematical Monthly. 98 (2): 146–148. doi:10.2307/2323947. JSTOR 2323947.
29. Etemadi, Nasrollah (1981). "An elementary proof of the strong law of large numbers". Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete. Springer. 55: 119–122. doi:10.1007/BF01013465. S2CID 122166046.
30. Billingsley, Patrick (1979). Probability and Measure.

References

Grimmett, G. R.; Stirzaker, D. R. (1992). Probability and Random Processes (2nd ed.). Oxford: Clarendon Press. ISBN 0-19-853665-8.
Durrett, Richard (1995). Probability: Theory and Examples (2nd ed.). Duxbury Press.
Martin Jacobsen (1992). Videregående Sandsynlighedsregning [Advanced Probability Theory] (in Danish) (3rd ed.). Copenhagen: HCØ-tryk. ISBN 87-91180-71-6.
Loève, Michel (1977). Probability theory 1 (4th ed.). Springer.
Newey, Whitney K.; McFadden, Daniel (1994). "36". Large sample estimation and hypothesis testing. Handbook of econometrics. Vol. IV. Elsevier Science. pp. 2111–2245.
Ross, Sheldon (2009). A first course in probability (8th ed.). Prentice Hall. ISBN 978-0-13-603313-4.
Sen, P. K.; Singer, J. M. (1993). Large sample methods in statistics. Chapman & Hall.
Seneta, Eugene (2013). "A Tricentenary history of the Law of Large Numbers". Bernoulli. 19 (4): 1088–1121. arXiv:1309.6488. doi:10.3150/12-BEJSP12. S2CID 88520834.

External links

"Law of large numbers", Encyclopedia of Mathematics, EMS Press, 2001 [1994]
Weisstein, Eric W. "Weak Law of Large Numbers". MathWorld.
Weisstein, Eric W. "Strong Law of Large Numbers". MathWorld.
Animations for the Law of Large Numbers by Yihui Xie using the R package animation
Apple CEO Tim Cook said something that would make statisticians cringe. "We don't believe in such laws as laws of large numbers. This is sort of, uh, old dogma, I think, that was cooked up by somebody," said Tim Cook. "However, the law of large numbers has nothing to do with large companies, large revenues, or large growth rates. The law of large numbers is a fundamental concept in probability theory and statistics, tying together theoretical probabilities that we can calculate to the actual outcomes of experiments that we empirically perform," explained Business Insider.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Law_of_large_numbers&oldid=1180107455"
