
Generalized least squares

Not to be confused with the generalized linear model.

In statistics, generalized least squares (GLS) is a technique for estimating the unknown parameters in a linear regression model when the residuals are correlated. In these cases, ordinary least squares and weighted least squares can be statistically inefficient, or even give misleading inferences. GLS was first described by Alexander Aitken in 1936.[1]

Method outline

In standard linear regression models we observe data $y_i, x_{ij}$ ($i = 1, \dots, n$; $j = 2, \dots, k$) on $n$ statistical units. The response values are placed in a vector $\mathbf{y} = (y_1, \dots, y_n)^{\mathsf{T}}$, and the predictor values are placed in the design matrix $\mathbf{X} = (\mathbf{x}_1^{\mathsf{T}}, \dots, \mathbf{x}_n^{\mathsf{T}})^{\mathsf{T}}$, where $\mathbf{x}_i = (1, x_{i2}, \dots, x_{ik})$ is a vector of the $k$ predictor variables (including a constant) for the $i$th unit. The model forces the conditional mean of $\mathbf{y}$ given $\mathbf{X}$ to be a linear function of $\mathbf{X}$, and assumes the conditional variance of the error term given $\mathbf{X}$ is a known nonsingular covariance matrix $\boldsymbol{\Omega}$. This is usually written as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \operatorname{E}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = 0, \quad \operatorname{Cov}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = \boldsymbol{\Omega}.$$

Here $\boldsymbol{\beta} \in \mathbb{R}^k$ is a vector of unknown constants (known as "regression coefficients") that must be estimated from the data.

Suppose $\mathbf{b}$ is a candidate estimate for $\boldsymbol{\beta}$. Then the residual vector for $\mathbf{b}$ will be $\mathbf{y} - \mathbf{X}\mathbf{b}$. The generalized least squares method estimates $\boldsymbol{\beta}$ by minimizing the squared Mahalanobis length of this residual vector:

$$\begin{aligned} \hat{\boldsymbol{\beta}} &= \operatorname*{argmin}_{\mathbf{b}} \,(\mathbf{y} - \mathbf{X}\mathbf{b})^{\mathsf{T}} \boldsymbol{\Omega}^{-1} (\mathbf{y} - \mathbf{X}\mathbf{b}) \\ &= \operatorname*{argmin}_{\mathbf{b}} \,\mathbf{y}^{\mathsf{T}}\boldsymbol{\Omega}^{-1}\mathbf{y} + (\mathbf{X}\mathbf{b})^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{X}\mathbf{b} - \mathbf{y}^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{X}\mathbf{b} - (\mathbf{X}\mathbf{b})^{\mathsf{T}} \boldsymbol{\Omega}^{-1} \mathbf{y}, \end{aligned}$$

where the last two terms evaluate to scalars and, since $\boldsymbol{\Omega}^{-1}$ is symmetric, are equal to each other, resulting in

$$\hat{\boldsymbol{\beta}} = \operatorname*{argmin}_{\mathbf{b}} \,\mathbf{y}^{\mathsf{T}}\boldsymbol{\Omega}^{-1}\mathbf{y} + \mathbf{b}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\boldsymbol{\Omega}^{-1}\mathbf{X}\mathbf{b} - 2\,\mathbf{b}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\boldsymbol{\Omega}^{-1}\mathbf{y}.$$

This objective is a quadratic form in $\mathbf{b}$.

Taking the gradient of this quadratic form with respect to $\mathbf{b}$ and equating it to zero (when $\mathbf{b} = \hat{\boldsymbol{\beta}}$) gives

$$2\,\mathbf{X}^{\mathsf{T}}\boldsymbol{\Omega}^{-1}\mathbf{X}\hat{\boldsymbol{\beta}} - 2\,\mathbf{X}^{\mathsf{T}}\boldsymbol{\Omega}^{-1}\mathbf{y} = 0.$$

Therefore, the minimizer of the objective function can be computed explicitly:

$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^{\mathsf{T}}\boldsymbol{\Omega}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathsf{T}}\boldsymbol{\Omega}^{-1}\mathbf{y}.$$

The quantity $\boldsymbol{\Omega}^{-1}$ is known as the precision matrix, a generalization of the diagonal weight matrix used in weighted least squares.
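To make the formula concrete, here is a minimal NumPy sketch (not from the article): it generates data with an AR(1)-style covariance, chosen purely as an illustrative known $\boldsymbol{\Omega}$, and evaluates the explicit GLS formula. All variable names and parameter values are assumptions for the example.

```python
# Minimal GLS sketch, assuming an illustrative AR(1)-style Omega.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])

# Hypothetical known covariance: Omega[i, j] = rho^|i - j|.
rho = 0.6
Omega = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
eps = rng.multivariate_normal(np.zeros(n), Omega)
y = X @ beta_true + eps

# beta_hat = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y, using linear
# solves instead of explicit matrix inverses for numerical stability.
Oinv_X = np.linalg.solve(Omega, X)
Oinv_y = np.linalg.solve(Omega, y)
beta_hat = np.linalg.solve(X.T @ Oinv_X, X.T @ Oinv_y)
print(beta_hat)
```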

Properties

The GLS estimator is unbiased, consistent, efficient, and asymptotically normal with $\operatorname{E}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = \boldsymbol{\beta}$ and $\operatorname{Cov}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = (\mathbf{X}^{\mathsf{T}}\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}$. GLS is equivalent to applying ordinary least squares to a linearly transformed version of the data. To see this, factor $\boldsymbol{\Omega} = \mathbf{C}\mathbf{C}^{\mathsf{T}}$, for instance using the Cholesky decomposition. Then if we pre-multiply both sides of the equation $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ by $\mathbf{C}^{-1}$, we get an equivalent linear model $\mathbf{y}^{*} = \mathbf{X}^{*}\boldsymbol{\beta} + \boldsymbol{\varepsilon}^{*}$, where $\mathbf{y}^{*} = \mathbf{C}^{-1}\mathbf{y}$, $\mathbf{X}^{*} = \mathbf{C}^{-1}\mathbf{X}$, and $\boldsymbol{\varepsilon}^{*} = \mathbf{C}^{-1}\boldsymbol{\varepsilon}$. In this model $\operatorname{Var}[\boldsymbol{\varepsilon}^{*} \mid \mathbf{X}] = \mathbf{C}^{-1}\boldsymbol{\Omega}(\mathbf{C}^{-1})^{\mathsf{T}} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix. Thus we can efficiently estimate $\boldsymbol{\beta}$ by applying ordinary least squares (OLS) to the transformed data, which requires minimizing

$$(\mathbf{y}^{*} - \mathbf{X}^{*}\boldsymbol{\beta})^{\mathsf{T}}(\mathbf{y}^{*} - \mathbf{X}^{*}\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\mathsf{T}}\boldsymbol{\Omega}^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}).$$

This has the effect of standardizing the scale of the errors and “de-correlating” them. Since OLS is applied to data with homoscedastic errors, the Gauss–Markov theorem applies, and therefore the GLS estimate is the best linear unbiased estimator for β.
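Continuing the sketch above, the whitening equivalence can be checked numerically; this assumes the arrays Omega, X, y, and beta_hat defined in the previous block.

```python
# Whitening sketch: GLS equals OLS on the transformed data.
import numpy as np

C = np.linalg.cholesky(Omega)      # Omega = C @ C.T, C lower triangular
y_star = np.linalg.solve(C, y)     # y* = C^{-1} y
X_star = np.linalg.solve(C, X)     # X* = C^{-1} X

# OLS on the whitened model recovers the GLS estimate
# (agreement up to floating-point error).
beta_ols_star, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
assert np.allclose(beta_ols_star, beta_hat)
```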

Weighted least squares

A special case of GLS called weighted least squares (WLS) occurs when all the off-diagonal entries of Ω are 0. This situation arises when the variances of the observed values are unequal (i.e. heteroscedasticity is present), but no correlations exist among the errors. The weight for unit i is proportional to the reciprocal of the variance of the response for unit i.[2]
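As a brief illustration of the WLS special case, reusing X, n, and beta_true from the first sketch: the variance vector sigma2 below is a hypothetical, assumed-known choice.

```python
# WLS sketch: diagonal Omega, weights proportional to 1/variance.
import numpy as np

rng = np.random.default_rng(1)
sigma2 = np.linspace(0.5, 3.0, n)            # assumed known, unequal variances
y_het = X @ beta_true + rng.normal(size=n) * np.sqrt(sigma2)

w = 1.0 / sigma2                             # weight_i ∝ 1 / Var(y_i)
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y_het))
print(beta_wls)
```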

Feasible generalized least squares

If the covariance of the errors $\Omega$ is unknown, one can get a consistent estimate of $\Omega$, say $\widehat{\Omega}$,[3] using an implementable version of GLS known as the feasible generalized least squares (FGLS) estimator. In FGLS, modeling proceeds in two stages: (1) the model is estimated by OLS or another consistent (but inefficient) estimator, and the residuals are used to build a consistent estimator of the errors' covariance matrix (to do so, one often needs to impose additional constraints on the model; for example, if the errors follow a time series process, a statistician generally needs some theoretical assumptions on this process to ensure that a consistent estimator is available); and (2) using the consistent estimator of the covariance matrix of the errors, one implements GLS ideas.

Whereas GLS is more efficient than OLS under heteroscedasticity (also spelled heteroskedasticity) or autocorrelation, this is not true for FGLS. The feasible estimator is asymptotically more efficient, provided the errors' covariance matrix is consistently estimated, but for a small or medium sized sample it can actually be less efficient than OLS. This is why some authors prefer to use OLS and reformulate their inferences by simply considering an alternative estimator for the variance of the estimator that is robust to heteroscedasticity or serial autocorrelation. But for large samples FGLS is preferred over OLS under heteroskedasticity or serial correlation.[3][4] A cautionary note is that the FGLS estimator is not always consistent. One case in which FGLS might be inconsistent is if there are individual-specific fixed effects.[5]

In general this estimator has different properties than GLS. For large samples (i.e., asymptotically) all properties are (under appropriate conditions) common with respect to GLS, but for finite samples the properties of FGLS estimators are unknown: they vary dramatically with each particular model, and as a general rule their exact distributions cannot be derived analytically. For finite samples, FGLS may be even less efficient than OLS in some cases. Thus, while GLS can be made feasible, it is not always wise to apply this method when the sample is small. A method sometimes used to improve the accuracy of the estimators in finite samples is to iterate, i.e., to take the residuals from FGLS to update the errors' covariance estimator and then update the FGLS estimate, applying the same idea iteratively until the estimators vary by less than some tolerance. But this method does not necessarily improve the efficiency of the estimator very much if the original sample was small. A reasonable option when samples are not too large is to apply OLS but discard the classical variance estimator

$$\sigma^2 (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}$$

(which is inconsistent in this framework) and use a HAC (Heteroskedasticity and Autocorrelation Consistent) estimator instead. For example, in an autocorrelation context we can use the Bartlett estimator (often known as the Newey–West estimator, since these authors popularized its use among econometricians in their 1987 Econometrica article), and in a heteroskedastic context we can use the Eicker–White estimator. This approach is much safer, and it is the appropriate path to take unless the sample is large, where "large" is sometimes a slippery issue (e.g. if the error distribution is asymmetric the required sample can be much larger).
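For the HAC route just described, one commonly used implementation is statsmodels; the snippet below is a sketch reusing y and X from the earlier example, and the maxlags value is an illustrative choice, not a recommendation.

```python
# OLS point estimates with HAC (Newey-West) standard errors.
import statsmodels.api as sm

res = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(res.params)  # OLS coefficients
print(res.bse)     # HAC standard errors
```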

The ordinary least squares (OLS) estimator is calculated as usual by

$$\widehat{\boldsymbol{\beta}}_{\text{OLS}} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y},$$

and estimates of the residuals $\widehat{u}_j = (\mathbf{y} - \mathbf{X}\widehat{\boldsymbol{\beta}}_{\text{OLS}})_j$ are constructed.

For simplicity, consider the model for heteroscedastic and non-autocorrelated errors. Assume that the variance-covariance matrix $\Omega$ of the error vector is diagonal, or equivalently that errors from distinct observations are uncorrelated. Then each diagonal entry may be estimated from the fitted residuals $\widehat{u}_j$, so $\widehat{\Omega}_{\text{OLS}}$ may be constructed by

$$\widehat{\Omega}_{\text{OLS}} = \operatorname{diag}\left(\widehat{\sigma}_1^2, \widehat{\sigma}_2^2, \dots, \widehat{\sigma}_n^2\right).$$

It is important to notice that the squared residuals cannot be used directly in the previous expression: each one is based on a single observation, so they do not converge to the corresponding error variances. Instead, we need a consistent estimator of the error variances, obtained for instance from a parametric heteroskedasticity model or a nonparametric estimator. Once this step is fulfilled, we can proceed:

Estimate $\widehat{\boldsymbol{\beta}}_{FGLS1}$ from $\widehat{\Omega}_{\text{OLS}}$ using[4] weighted least squares:

$$\widehat{\boldsymbol{\beta}}_{FGLS1} = \left(\mathbf{X}^{\mathsf{T}}\widehat{\Omega}_{\text{OLS}}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathsf{T}}\widehat{\Omega}_{\text{OLS}}^{-1}\mathbf{y}.$$

The procedure can be iterated. The first iteration is given by

$$\widehat{u}_{FGLS1} = \mathbf{y} - \mathbf{X}\widehat{\boldsymbol{\beta}}_{FGLS1},$$
$$\widehat{\Omega}_{FGLS1} = \operatorname{diag}\left(\widehat{\sigma}_{FGLS1,1}^2, \widehat{\sigma}_{FGLS1,2}^2, \dots, \widehat{\sigma}_{FGLS1,n}^2\right),$$
$$\widehat{\boldsymbol{\beta}}_{FGLS2} = \left(\mathbf{X}^{\mathsf{T}}\widehat{\Omega}_{FGLS1}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathsf{T}}\widehat{\Omega}_{FGLS1}^{-1}\mathbf{y}.$$

This estimation of $\widehat{\Omega}$ can be iterated to convergence.
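The two-stage procedure and its iteration can be summarized in a short sketch, reusing X and y_het from the WLS example above. The skedastic model $\sigma_i^2 = \exp(\mathbf{x}_i^{\mathsf{T}}\gamma)$ used to estimate the variances is purely an illustrative assumption (one of the parametric heteroskedasticity models the text alludes to), not part of the article.

```python
# Iterated FGLS sketch for heteroscedastic, uncorrelated errors.
import numpy as np

def fgls(X, y, n_iter=5, tol=1e-8):
    # Stage 1: OLS gives a consistent (but inefficient) starting estimate.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(n_iter):
        u = y - X @ beta                  # fitted residuals
        # Assumed variance model log(sigma_i^2) = x_i' gamma, estimated by
        # regressing log squared residuals on X (small constant avoids log 0).
        gamma, *_ = np.linalg.lstsq(X, np.log(u**2 + 1e-12), rcond=None)
        sigma2 = np.exp(X @ gamma)        # estimated error variances
        # Stage 2: weighted least squares with the estimated variances.
        w = 1.0 / sigma2
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new                   # iterate to convergence
    return beta, sigma2

beta_fgls, sigma2_hat = fgls(X, y_het)
print(beta_fgls)
```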

Under regularity conditions, the FGLS estimator (or any of its iterations, if we iterate a finite number of times) is asymptotically distributed as

$$\sqrt{n}\left(\hat{\boldsymbol{\beta}}_{FGLS} - \boldsymbol{\beta}\right) \xrightarrow{d} \mathcal{N}(0, V),$$

where $n$ is the sample size and

$$V = \operatorname*{p\text{-}lim}\left(\mathbf{X}^{\mathsf{T}}\Omega^{-1}\mathbf{X}/n\right),$$

where p-lim denotes the limit in probability.

See also

  • Confidence region
  • Effective degrees of freedom
  • Prais–Winsten estimation

References

  1. ^ Aitken, A. C. (1936). "On Least-squares and Linear Combinations of Observations". Proceedings of the Royal Society of Edinburgh. 55: 42–48. doi:10.1017/S0370164600014346.
  2. ^ Strutz, T. (2016). Data Fitting and Uncertainty (A practical introduction to weighted least squares and beyond). Springer Vieweg, chapter 3. ISBN 978-3-658-11455-8.
  3. ^ a b Baltagi, B. H. (2008). Econometrics (4th ed.). New York: Springer.
  4. ^ a b Greene, W. H. (2003). Econometric Analysis (5th ed.). Upper Saddle River, NJ: Prentice Hall.
  5. ^ Hansen, Christian B. (2007). "Generalized Least Squares Inference in Panel and Multilevel Models with Serial Correlation and Fixed Effects". Journal of Econometrics. 140 (2): 670–694. doi:10.1016/j.jeconom.2006.07.011.

Further reading

  • Amemiya, Takeshi (1985). "Generalized Least Squares Theory". Advanced Econometrics. Harvard University Press. ISBN 0-674-00560-0.
  • Johnston, John (1972). "Generalized Least-squares". Econometric Methods (Second ed.). New York: McGraw-Hill. pp. 208–242.
  • Kmenta, Jan (1986). "Generalized Linear Regression Model and Its Applications". Elements of Econometrics (Second ed.). New York: Macmillan. pp. 607–650. ISBN 0-472-10886-7.
  • Beck, Nathaniel; Katz, Jonathan N. (September 1995). "What To Do (and Not to Do) with Time-Series Cross-Section Data". American Political Science Review. 89 (3): 634–647. doi:10.2307/2082979. ISSN 1537-5943. JSTOR 2082979. S2CID 63222945.
