
Degrees of freedom (statistics)

In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.[1]

Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom. In general, the degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself. For example, if the variance is to be estimated from a random sample of N independent scores, then the degrees of freedom is equal to the number of independent scores (N) minus the number of parameters estimated as intermediate steps (one, namely, the sample mean) and is therefore equal to N − 1.[2]
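As a concrete illustration, here is a minimal Python/NumPy sketch (the data and variable names are invented for illustration): the unbiased sample variance divides by N − 1 precisely because one degree of freedom is spent on the intermediate sample mean.

```python
# Estimating a variance from N scores "costs" one degree of freedom for the
# sample mean, which is why the unbiased estimator divides by N - 1 (ddof=1).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=20)   # N = 20 independent scores

N = x.size
sample_mean = x.mean()                          # one parameter estimated first
residual_ss = np.sum((x - sample_mean) ** 2)

variance_unbiased = residual_ss / (N - 1)       # N - 1 degrees of freedom
print(variance_unbiased, np.var(x, ddof=1))     # identical values
```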

Mathematically, degrees of freedom is the number of dimensions of the domain of a random vector, or essentially the number of "free" components (how many components need to be known before the vector is fully determined).

The term is most often used in the context of linear models (linear regression, analysis of variance), where certain random vectors are constrained to lie in linear subspaces, and the number of degrees of freedom is the dimension of the subspace. The degrees of freedom are also commonly associated with the squared lengths (or "sum of squares" of the coordinates) of such vectors, and the parameters of chi-squared and other distributions that arise in associated statistical testing problems.

While introductory textbooks may introduce degrees of freedom as distribution parameters or through hypothesis testing, it is the underlying geometry that defines degrees of freedom, and is critical to a proper understanding of the concept.

History

Although the basic concept of degrees of freedom was recognized as early as 1821 in the work of German astronomer and mathematician Carl Friedrich Gauss,[3] its modern definition and usage was first elaborated by English statistician William Sealy Gosset in his 1908 Biometrika article "The Probable Error of a Mean", published under the pen name "Student".[4] While Gosset did not actually use the term 'degrees of freedom', he explained the concept in the course of developing what became known as Student's t-distribution. The term itself was popularized by English statistician and biologist Ronald Fisher, beginning with his 1922 work on chi squares.[5]

Notation

In equations, the typical symbol for degrees of freedom is ν (lowercase Greek letter nu). In text and tables, the abbreviation "d.f." is commonly used. R. A. Fisher used n to symbolize degrees of freedom but modern usage typically reserves n for sample size.

Of random vectors

Geometrically, the degrees of freedom can be interpreted as the dimension of certain vector subspaces. As a starting point, suppose that we have a sample of independent normally distributed observations,

$$X_1, \dots, X_n.$$

This can be represented as an n-dimensional random vector:

$$\begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}.$$

Since this random vector can lie anywhere in n-dimensional space, it has n degrees of freedom.

Now, let $\bar{X}$ be the sample mean. The random vector can be decomposed as the sum of the sample mean plus a vector of residuals:

$$\begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix} = \bar{X} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} + \begin{pmatrix} X_1 - \bar{X} \\ \vdots \\ X_n - \bar{X} \end{pmatrix}$$

The first vector on the right-hand side is constrained to be a multiple of the vector of 1's, and the only free quantity is $\bar{X}$. It therefore has 1 degree of freedom.

The second vector is constrained by the relation $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$. The first n − 1 components of this vector can be anything. However, once you know the first n − 1 components, the constraint tells you the value of the nth component. Therefore, this vector has n − 1 degrees of freedom.

Mathematically, the first vector is the orthogonal projection of the data vector onto the subspace spanned by the vector of 1's. The 1 degree of freedom is the dimension of this subspace. The second residual vector is the least-squares projection onto the (n − 1)-dimensional orthogonal complement of this subspace, and has n − 1 degrees of freedom.
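The decomposition can be checked numerically; the following Python/NumPy sketch (illustrative only) verifies that the residual vector satisfies the single linear constraint and is orthogonal to the mean component.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
X = rng.normal(size=n)               # an n-dimensional random vector: n df

mean_part = np.full(n, X.mean())     # multiple of the vector of 1's: 1 df
residual = X - mean_part             # residual vector: n - 1 df

print(np.allclose(mean_part + residual, X))   # the decomposition is exact
print(np.isclose(residual.sum(), 0.0))        # the single linear constraint
print(np.isclose(mean_part @ residual, 0.0))  # the two components are orthogonal
```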

In statistical testing applications, often one is not directly interested in the component vectors, but rather in their squared lengths. In the example above, the residual sum-of-squares is

$$\sum_{i=1}^{n}(X_i - \bar{X})^2 = \left\| \begin{pmatrix} X_1 - \bar{X} \\ \vdots \\ X_n - \bar{X} \end{pmatrix} \right\|^2.$$

If the data points $X_i$ are normally distributed with mean 0 and variance $\sigma^2$, then the residual sum of squares has a scaled chi-squared distribution (scaled by the factor $\sigma^2$), with n − 1 degrees of freedom. The degrees-of-freedom, here a parameter of the distribution, can still be interpreted as the dimension of an underlying vector subspace.

Likewise, the one-sample t-test statistic,

$$\frac{\sqrt{n}\,(\bar{X} - \mu_0)}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 / (n-1)}}$$

follows a Student's t distribution with n − 1 degrees of freedom when the hypothesized mean $\mu_0$ is correct. Again, the degrees of freedom arise from the residual vector in the denominator.
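A short Python sketch (using NumPy and SciPy; the sample is simulated purely for illustration) showing that the statistic above matches the standard one-sample t-test with n − 1 degrees of freedom:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, mu0 = 12, 5.0
X = rng.normal(loc=mu0, scale=1.5, size=n)

xbar = X.mean()
s2 = np.sum((X - xbar) ** 2) / (n - 1)           # uses the n-1 df residual vector
t_manual = np.sqrt(n) * (xbar - mu0) / np.sqrt(s2)

t_scipy, p = stats.ttest_1samp(X, popmean=mu0)
print(np.isclose(t_manual, t_scipy))             # same statistic
print(stats.t.sf(abs(t_manual), df=n - 1) * 2)   # two-sided p-value, n-1 df
```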

In structural equation models

When the results of structural equation models (SEM) are presented, they generally include one or more indices of overall model fit, the most common of which is a χ2 statistic. This forms the basis for other indices that are commonly reported. Although it is these other statistics that are most commonly interpreted, the degrees of freedom of the χ2 are essential to understanding model fit as well as the nature of the model itself.

Degrees of freedom in SEM are computed as a difference between the number of unique pieces of information that are used as input into the analysis, sometimes called knowns, and the number of parameters that are uniquely estimated, sometimes called unknowns. For example, in a one-factor confirmatory factor analysis with 4 items, there are 10 knowns (the six unique covariances among the four items and the four item variances) and 8 unknowns (4 factor loadings and 4 error variances) for 2 degrees of freedom. Degrees of freedom are important to the understanding of model fit if for no other reason than that, all else being equal, the fewer degrees of freedom, the better indices such as χ2 will be.
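The knowns-minus-unknowns bookkeeping can be expressed as a small helper function; this Python sketch is illustrative only and is not taken from any SEM package.

```python
# df = unique (co)variances of the observed variables - freely estimated parameters.
def sem_degrees_of_freedom(n_observed, n_free_parameters):
    knowns = n_observed * (n_observed + 1) // 2   # p(p+1)/2 unique variances/covariances
    return knowns - n_free_parameters

# One-factor CFA with 4 items: 4 loadings + 4 error variances = 8 unknowns.
print(sem_degrees_of_freedom(n_observed=4, n_free_parameters=8))  # -> 2
```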

It has been shown that degrees of freedom can be used by readers of papers that contain SEMs to determine if the authors of those papers are in fact reporting the correct model fit statistics. In the organizational sciences, for example, nearly half of papers published in top journals report degrees of freedom that are inconsistent with the models described in those papers, leaving the reader to wonder which models were actually tested.[6]

Of residuals

A common way to think of degrees of freedom is as the number of independent pieces of information available to estimate another piece of information. More concretely, the number of degrees of freedom is the number of independent observations in a sample of data that are available to estimate a parameter of the population from which that sample is drawn. For example, if we have two observations, when calculating the mean we have two independent observations; however, when calculating the variance, we have only one independent observation, since the two observations are equally distant from the sample mean.

In fitting statistical models to data, the vectors of residuals are constrained to lie in a space of smaller dimension than the number of components in the vector. That smaller dimension is the number of degrees of freedom for error, also called residual degrees of freedom.

Example

Perhaps the simplest example is this. Suppose

$$X_1, \dots, X_n$$

are random variables each with expected value μ, and let

$$\overline{X}_n = \frac{X_1 + \cdots + X_n}{n}$$

be the "sample mean." Then the quantities

$$X_i - \overline{X}_n$$

are residuals that may be considered estimates of the errors Xi − μ. The sum of the residuals (unlike the sum of the errors) is necessarily 0. If one knows the values of any n − 1 of the residuals, one can thus find the last one. That means they are constrained to lie in a space of dimension n − 1. One says that there are n − 1 degrees of freedom for errors.

An example which is only slightly less simple is that of least squares estimation of a and b in the model

$$Y_i = a + bx_i + e_i \quad \text{for } i = 1, \dots, n,$$

where xi is given, but ei and hence Yi are random. Let $\widehat{a}$ and $\widehat{b}$ be the least-squares estimates of a and b. Then the residuals

$$\widehat{e}_i = y_i - \widehat{a} - \widehat{b}x_i$$

are constrained to lie within the space defined by the two equations

$$\widehat{e}_1 + \cdots + \widehat{e}_n = 0,$$
$$x_1\widehat{e}_1 + \cdots + x_n\widehat{e}_n = 0.$$

One says that there are n − 2 degrees of freedom for error.

Notationally, the capital letter Y is used in specifying the model, while lower-case y is used in the definition of the residuals; that is because the former are hypothesized random variables and the latter are actual data.

We can generalise this to multiple regression involving p parameters and covariates (e.g., p − 1 predictors and one mean, i.e. the intercept in the regression), in which case the cost in degrees of freedom of the fit is p, leaving n − p degrees of freedom for errors.
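A Python/NumPy sketch (simulated data, illustrative names) showing that the ordinary least-squares residual vector satisfies p linear constraints and therefore carries n − p degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 3                                   # p = intercept + 2 predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# The residuals satisfy one linear constraint per column of X, so they carry
# n - p degrees of freedom; the usual error-variance estimate divides by n - p.
print(np.allclose(X.T @ residuals, 0.0))       # X'e = 0  (p constraints)
sigma2_hat = residuals @ residuals / (n - p)
print(sigma2_hat)
```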

In linear models

The demonstration of the t and chi-squared distributions for one-sample problems above is the simplest example where degrees-of-freedom arise. However, similar geometry and vector decompositions underlie much of the theory of linear models, including linear regression and analysis of variance. An explicit example based on comparison of three means is presented here; the geometry of linear models is discussed in more complete detail by Christensen (2002).[7]

Suppose independent observations are made for three populations, $X_1, \ldots, X_n$, $Y_1, \ldots, Y_n$ and $Z_1, \ldots, Z_n$. The restriction to three groups and equal sample sizes simplifies notation, but the ideas are easily generalized.

The observations can be decomposed as

$$\begin{aligned} X_i &= \bar{M} + (\bar{X} - \bar{M}) + (X_i - \bar{X}) \\ Y_i &= \bar{M} + (\bar{Y} - \bar{M}) + (Y_i - \bar{Y}) \\ Z_i &= \bar{M} + (\bar{Z} - \bar{M}) + (Z_i - \bar{Z}) \end{aligned}$$

where $\bar{X}, \bar{Y}, \bar{Z}$ are the means of the individual samples, and $\bar{M} = (\bar{X} + \bar{Y} + \bar{Z})/3$ is the mean of all 3n observations. In vector notation this decomposition can be written as

$$\begin{pmatrix} X_1 \\ \vdots \\ X_n \\ Y_1 \\ \vdots \\ Y_n \\ Z_1 \\ \vdots \\ Z_n \end{pmatrix} = \bar{M} \begin{pmatrix} 1 \\ \vdots \\ 1 \\ 1 \\ \vdots \\ 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} + \begin{pmatrix} \bar{X} - \bar{M} \\ \vdots \\ \bar{X} - \bar{M} \\ \bar{Y} - \bar{M} \\ \vdots \\ \bar{Y} - \bar{M} \\ \bar{Z} - \bar{M} \\ \vdots \\ \bar{Z} - \bar{M} \end{pmatrix} + \begin{pmatrix} X_1 - \bar{X} \\ \vdots \\ X_n - \bar{X} \\ Y_1 - \bar{Y} \\ \vdots \\ Y_n - \bar{Y} \\ Z_1 - \bar{Z} \\ \vdots \\ Z_n - \bar{Z} \end{pmatrix}$$

The observation vector, on the left-hand side, has 3n degrees of freedom. On the right-hand side, the first vector has one degree of freedom (or dimension) for the overall mean. The second vector depends on three random variables, $\bar{X} - \bar{M}$, $\bar{Y} - \bar{M}$ and $\bar{Z} - \bar{M}$. However, these must sum to 0 and so are constrained; the vector therefore must lie in a 2-dimensional subspace, and has 2 degrees of freedom. The remaining 3n − 3 degrees of freedom are in the residual vector (made up of n − 1 degrees of freedom within each of the populations).

In analysis of variance (ANOVA)

In statistical testing problems, one usually is not interested in the component vectors themselves, but rather in their squared lengths, or Sum of Squares. The degrees of freedom associated with a sum-of-squares is the degrees-of-freedom of the corresponding component vectors.

The three-population example above is an example of one-way Analysis of Variance. The model, or treatment, sum-of-squares is the squared length of the second vector,

$$\text{SST} = n(\bar{X} - \bar{M})^2 + n(\bar{Y} - \bar{M})^2 + n(\bar{Z} - \bar{M})^2$$

with 2 degrees of freedom. The residual, or error, sum-of-squares is

$$\text{SSE} = \sum_{i=1}^{n}(X_i - \bar{X})^2 + \sum_{i=1}^{n}(Y_i - \bar{Y})^2 + \sum_{i=1}^{n}(Z_i - \bar{Z})^2$$

with 3(n−1) degrees of freedom. Of course, introductory books on ANOVA usually state formulae without showing the vectors, but it is this underlying geometry that gives rise to SS formulae, and shows how to unambiguously determine the degrees of freedom in any given situation.

Under the null hypothesis of no difference between population means (and assuming that standard ANOVA regularity assumptions are satisfied) the sums of squares have scaled chi-squared distributions, with the corresponding degrees of freedom. The F-test statistic is the ratio, after scaling by the degrees of freedom. If there is no difference between population means this ratio follows an F-distribution with 2 and 3n − 3 degrees of freedom.
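A Python sketch (NumPy and SciPy assumed; the data are simulated for illustration) computing the treatment and error sums of squares above and reproducing the F statistic with 2 and 3n − 3 degrees of freedom:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 8                                            # equal sample size per group
X, Y, Z = (rng.normal(loc=0.0, size=n) for _ in range(3))

M = np.concatenate([X, Y, Z]).mean()             # grand mean of all 3n observations
sst = n * sum((g.mean() - M) ** 2 for g in (X, Y, Z))       # treatment SS, 2 df
sse = sum(((g - g.mean()) ** 2).sum() for g in (X, Y, Z))   # error SS, 3(n-1) df
F_manual = (sst / 2) / (sse / (3 * n - 3))

F_scipy, p = stats.f_oneway(X, Y, Z)
print(np.isclose(F_manual, F_scipy))             # same statistic
print(stats.f.sf(F_manual, dfn=2, dfd=3 * n - 3))  # p-value from F(2, 3n-3)
```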

In some complicated settings, such as unbalanced split-plot designs, the sums-of-squares no longer have scaled chi-squared distributions. Comparison of sum-of-squares with degrees-of-freedom is no longer meaningful, and software may report certain fractional 'degrees of freedom' in these cases. Such numbers have no genuine degrees-of-freedom interpretation, but are simply providing an approximate chi-squared distribution for the corresponding sum-of-squares. The details of such approximations are beyond the scope of this page.

In probability distributions

Several commonly encountered statistical distributions (Student's t, chi-squared, F) have parameters that are commonly referred to as degrees of freedom. This terminology simply reflects that in many applications where these distributions occur, the parameter corresponds to the degrees of freedom of an underlying random vector, as in the preceding ANOVA example. Another simple example is: if $X_i;\ i = 1, \ldots, n$ are independent normal $(\mu, \sigma^2)$ random variables, the statistic

$$\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2}$$

follows a chi-squared distribution with n − 1 degrees of freedom. Here, the degrees of freedom arise from the residual sum-of-squares in the numerator, and in turn the n − 1 degrees of freedom of the underlying residual vector $\{X_i - \bar{X}\}$.
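A small simulation sketch in Python (NumPy/SciPy assumed; parameters chosen arbitrarily) suggesting that this statistic behaves like a chi-squared variable with n − 1 degrees of freedom:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, sigma, reps = 10, 2.0, 50_000

X = rng.normal(loc=3.0, scale=sigma, size=(reps, n))
stat = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) / sigma**2

# Chi-squared with n-1 degrees of freedom has mean n-1 and variance 2(n-1).
print(stat.mean(), n - 1)
print(stat.var(), 2 * (n - 1))
# Kolmogorov-Smirnov comparison against chi2(n-1):
print(stats.kstest(stat, stats.chi2(df=n - 1).cdf).pvalue)
```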

In the application of these distributions to linear models, the degrees of freedom parameters can take only integer values. The underlying families of distributions allow fractional values for the degrees-of-freedom parameters, which can arise in more sophisticated uses. One set of examples is problems where chi-squared approximations based on effective degrees of freedom are used. In other applications, such as modelling heavy-tailed data, a t or F-distribution may be used as an empirical model. In these cases, there is no particular degrees of freedom interpretation to the distribution parameters, even though the terminology may continue to be used.

In non-standard regression

Many non-standard regression methods, including regularized least squares (e.g., ridge regression), linear smoothers, smoothing splines, and semiparametric regression, are not based on ordinary least squares projections, but rather on regularized (generalized and/or penalized) least-squares, and so degrees of freedom defined in terms of dimensionality is generally not useful for these procedures. However, these procedures are still linear in the observations, and the fitted values of the regression can be expressed in the form

$$\hat{y} = Hy,$$

where $\hat{y}$ is the vector of fitted values at each of the original covariate values from the fitted model, y is the original vector of responses, and H is the hat matrix or, more generally, smoother matrix.

For statistical inference, sums-of-squares can still be formed: the model sum-of-squares is $\|Hy\|^2$; the residual sum-of-squares is $\|y - Hy\|^2$. However, because H does not correspond to an ordinary least-squares fit (i.e. is not an orthogonal projection), these sums-of-squares no longer have (scaled, non-central) chi-squared distributions, and dimensionally defined degrees-of-freedom are not useful.

The effective degrees of freedom of the fit can be defined in various ways to implement goodness-of-fit tests, cross-validation, and other statistical inference procedures. Here one can distinguish between regression effective degrees of freedom and residual effective degrees of freedom.

Regression effective degrees of freedom

For the regression effective degrees of freedom, appropriate definitions can include the trace of the hat matrix,[8] tr(H), the trace of the quadratic form of the hat matrix, tr(H'H), the form tr(2H − HH'), or the Satterthwaite approximation, tr(H'H)²/tr(H'HH'H).[9] In the case of linear regression, the hat matrix H is X(X'X)−1X', and all these definitions reduce to the usual degrees of freedom. Notice that

$$\operatorname{tr}(H) = \sum_i h_{ii} = \sum_i \frac{\partial \hat{y}_i}{\partial y_i},$$

the regression (not residual) degrees of freedom in linear models are "the sum of the sensitivities of the fitted values with respect to the observed response values",[10] i.e. the sum of leverage scores.
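For ordinary least squares these definitions all reduce to the familiar count; the following Python/NumPy sketch (with an illustrative design matrix) checks that tr(H) equals the number of fitted parameters:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 25, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Ordinary least-squares hat matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.solve(X.T @ X, X.T)

print(np.trace(H))                 # = p, the regression degrees of freedom
print(np.trace(H.T @ H))           # tr(H'H) = p as well for a projection
print(np.allclose(H, H @ H))       # idempotent: H is an orthogonal projection
```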

One way to help to conceptualize this is to consider a simple smoothing matrix like a Gaussian blur, used to mitigate data noise. In contrast to a simple linear or polynomial fit, computing the effective degrees of freedom of the smoothing function is not straightforward. In these cases, it is important to estimate the degrees of freedom permitted by the $H$ matrix so that the residual degrees of freedom can then be used to estimate statistical tests such as $\chi^2$.

Residual effective degrees of freedom

There are corresponding definitions of residual effective degrees-of-freedom (redf), with H replaced by I − H. For example, if the goal is to estimate error variance, the redf would be defined as tr((I − H)'(I − H)), and the unbiased estimate is (with $\hat{r} = y - Hy$),

$$\hat{\sigma}^2 = \frac{\|\hat{r}\|^2}{\operatorname{tr}\left((I - H)'(I - H)\right)},$$

or:[11][12][13][14]

$$\hat{\sigma}^2 = \frac{\|\hat{r}\|^2}{n - \operatorname{tr}(2H - HH')} = \frac{\|\hat{r}\|^2}{n - 2\operatorname{tr}(H) + \operatorname{tr}(HH')}$$
$$\hat{\sigma}^2 \approx \frac{\|\hat{r}\|^2}{n - 1.25\operatorname{tr}(H) + 0.5}$$

The last approximation above[12] reduces the computational cost from O(n²) to only O(n). In general the numerator would be the objective function being minimized; e.g., if the hat matrix includes an observation covariance matrix, Σ, then $\|\hat{r}\|^2$ becomes $\hat{r}'\Sigma^{-1}\hat{r}$.
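A Python/NumPy sketch comparing the residual effective degrees-of-freedom expressions above. The Gaussian-kernel smoother used for H is an ad-hoc assumption made purely for illustration, not a method prescribed by the article; any linear smoother matrix would do.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
x = np.sort(rng.uniform(0, 10, size=n))
y = np.sin(x) + rng.normal(scale=0.3, size=n)

# Ad-hoc Gaussian-kernel smoother matrix, rows normalized to sum to 1.
bandwidth = 0.5
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
H = K / K.sum(axis=1, keepdims=True)

r = y - H @ y                                            # residual vector r = y - Hy
I = np.eye(n)

redf_exact = np.trace((I - H).T @ (I - H))               # tr((I-H)'(I-H))
redf_expand = n - 2 * np.trace(H) + np.trace(H.T @ H)    # n - 2 tr(H) + tr(HH')
redf_approx = n - 1.25 * np.trace(H) + 0.5               # cheaper approximation

print(redf_exact, redf_expand, redf_approx)              # first two agree exactly
print("error variance estimate:", (r @ r) / redf_exact)
```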

General

Note that unlike in the original case, non-integer degrees of freedom are allowed, though the value must usually still be constrained between 0 and n.[15]

Consider, as an example, the k-nearest neighbour smoother, which is the average of the k nearest measured values to the given point. Then, at each of the n measured points, the weight of the original value on the linear combination that makes up the predicted value is just 1/k. Thus, the trace of the hat matrix is n/k. Thus the smooth costs n/k effective degrees of freedom.
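This can be verified directly; the following Python/NumPy sketch (simulated points, illustrative only) builds a k-nearest-neighbour smoother matrix and checks that its trace is n/k:

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 20, 4
x = np.sort(rng.uniform(0, 1, size=n))

# k-nearest-neighbour smoother: each fitted value is the average of the k
# measured values closest (in x) to that point, including the point itself.
H = np.zeros((n, n))
for i in range(n):
    nearest = np.argsort(np.abs(x - x[i]))[:k]
    H[i, nearest] = 1.0 / k

print(np.trace(H))        # each diagonal entry is 1/k, so tr(H) = n/k
print(n / k)
```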

As another example, consider the existence of nearly duplicated observations. Naive application of the classical formula, n − p, would lead to over-estimation of the residual degrees of freedom, as if each observation were independent. More realistically, though, the hat matrix H = X(X' Σ−1 X)−1X' Σ−1 would involve an observation covariance matrix Σ indicating the non-zero correlation among observations.

The more general formulation of effective degrees of freedom would result in a more realistic estimate for, e.g., the error variance σ², which in turn scales the unknown parameters' a posteriori standard deviation; the degrees of freedom will also affect the expansion factor necessary to produce an error ellipse for a given confidence level.

Other formulations

Similar concepts are the equivalent degrees of freedom in non-parametric regression,[16] the degree of freedom of signal in atmospheric studies,[17][18] and the non-integer degree of freedom in geodesy.[19][20]

The residual sum-of-squares $\|y - Hy\|^2$ has a generalized chi-squared distribution, and the theory associated with this distribution[21] provides an alternative route to the answers provided above.[further explanation needed]

See also

  • Bessel's correction
  • Chi-squared per degree of freedom
  • Pooled degrees of freedom
  • Replication (statistics)
  • Sample size
  • Statistical model
  • Variance
References

  1. ^ "Degrees of Freedom". Glossary of Statistical Terms. Animated Software. Retrieved 2008-08-21.
  2. ^ Lane, David M. "Degrees of Freedom". HyperStat Online. Statistics Solutions. Retrieved 2008-08-21.
  3. ^ Walker, H. M. (April 1940). "Degrees of Freedom" (PDF). Journal of Educational Psychology. 31 (4): 253–269. doi:10.1037/h0054588.
  4. ^ Student (March 1908). "The Probable Error of a Mean". Biometrika. 6 (1): 1–25. doi:10.2307/2331554. JSTOR 2331554.
  5. ^ Fisher, R. A. (January 1922). "On the Interpretation of χ2 from Contingency Tables, and the Calculation of P". Journal of the Royal Statistical Society. 85 (1): 87–94. doi:10.2307/2340521. JSTOR 2340521.
  6. ^ Cortina, J. M., Green, J. P., Keeler, K. R., & Vandenberg, R. J. (2017). Degrees of freedom in SEM: Are we testing the models that we claim to test?. Organizational Research Methods, 20(3), 350-378.
  7. ^ Christensen, Ronald (2002). Plane Answers to Complex Questions: The Theory of Linear Models (Third ed.). New York: Springer. ISBN 0-387-95361-2.
  8. ^ Trevor Hastie, Robert Tibshirani, Jerome H. Friedman (2009), The elements of statistical learning: data mining, inference, and prediction, 2nd ed., 746 p. ISBN 978-0-387-84857-0, doi:10.1007/978-0-387-84858-7, [1] (eq.(5.16))
  9. ^ Fox, J.; Sage Publications, inc; SAGE. (2000). Nonparametric Simple Regression: Smoothing Scatterplots. Nonparametric Simple Regression: Smoothing Scatterplots. SAGE Publications. p. 58. ISBN 978-0-7619-1585-0. Retrieved 2020-08-28. {{cite book}}: |first2= has generic name (help)
  10. ^ Ye, J. (1998), "On Measuring and Correcting the Effects of Data Mining and Model Selection", Journal of the American Statistical Association, 93 (441), 120–131. JSTOR 2669609 (eq.(7))
  11. ^ Clive Loader (1999), Local regression and likelihood, ISBN 978-0-387-98775-0, doi:10.1007/b98858, (eq.(2.18), p. 30)
  12. ^ a b Trevor Hastie, Robert Tibshirani (1990), Generalized additive models, CRC Press, (p. 54) and (eq.(B.1), p. 305))
  13. ^ Simon N. Wood (2006), Generalized additive models: an introduction with R, CRC Press, (eq.(4,14), p. 172)
  14. ^ David Ruppert, M. P. Wand, R. J. Carroll (2003), Semiparametric Regression, Cambridge University Press (eq.(3.28), p. 82)
  15. ^ James S. Hodges (2014) Richly Parameterized Linear Models, CRC Press. [2]
  16. ^ Peter J. Green, B. W. Silverman (1994), Nonparametric regression and generalized linear models: a roughness penalty approach, CRC Press (eq.(3.15), p. 37)
  17. ^ Clive D. Rodgers (2000), Inverse methods for atmospheric sounding: theory and practice, World Scientific (eq.(2.56), p. 31)
  18. ^ Adrian Doicu, Thomas Trautmann, Franz Schreier (2010), Numerical Regularization for Atmospheric Inverse Problems, Springer (eq.(4.26), p. 114)
  19. ^ D. Dong, T. A. Herring and R. W. King (1997), Estimating regional deformation from a combination of space and terrestrial geodetic data, J. Geodesy, 72 (4), 200–214, doi:10.1007/s001900050161 (eq.(27), p. 205)
  20. ^ H. Theil (1963), "On the Use of Incomplete Prior Information in Regression Analysis", Journal of the American Statistical Association, 58 (302), 401–414 JSTOR 2283275 (eq.(5.19)–(5.20))
  21. ^ Jones, D.A. (1983) "Statistical analysis of empirical models fitted by optimisation", Biometrika, 70 (1), 67–88

Further reading

  • Bowers, David (1982). Statistics for Economists. London: Macmillan. pp. 175–178. ISBN 0-333-30110-2.
  • Eisenhauer, J. G. (2008). "Degrees of Freedom". Teaching Statistics. 30 (3): 75–78. doi:10.1111/j.1467-9639.2008.00324.x.
  • Good, I. J. (1973). "What Are Degrees of Freedom?". The American Statistician. 27 (5): 227–228. doi:10.1080/00031305.1973.10479042. JSTOR 3087407.
  • Walker, H. W. (1940). "Degrees of Freedom". Journal of Educational Psychology. 31 (4): 253–269. doi:10.1037/h0054588.

External links

  • Yu, Chong-ho (1997) Illustrating degrees of freedom in terms of sample size and dimensionality
  • Dallal, G. E. (2003) Degrees of Freedom
