
Gauss–Markov theorem

In statistics, the Gauss–Markov theorem (or simply Gauss theorem for some authors)[1] states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero.[2] The errors do not need to be normal for the theorem to apply, nor do they need to be independent and identically distributed (only uncorrelated with mean zero and homoscedastic with finite variance).

The requirement for unbiasedness cannot be dropped, since biased estimators exist with lower variance and mean squared error. For example, the James–Stein estimator (which also drops linearity) and ridge regression typically outperform ordinary least squares. In fact, ordinary least squares is rarely even an admissible estimator, as Stein's phenomenon shows: when estimating more than two unknown parameters, ordinary least squares will always perform worse (in mean squared error) than Stein's estimator.

Moreover, the Gauss-Markov theorem does not apply when considering more principled loss functions, such as the assigned likelihood or Kullback–Leibler divergence, except in the limited case of normally-distributed errors.

As a result of these discoveries, statisticians typically motivate ordinary least squares by the principle of maximum likelihood instead, or by considering it as a kind of approximate Bayesian inference.

The theorem is named after Carl Friedrich Gauss and Andrey Markov. Gauss provided the original proof,[3] which was later substantially generalized by Markov.[4]

Scalar Case Statement

Suppose we are given two random variable vectors, $X, Y \in \mathbb{R}^k$.

Suppose we want to find the best linear estimator of $Y$ given $X$, of the form

$$\hat{Y} = \alpha X + \mu,$$

where the parameters $\alpha \in \mathbb{R}$ and $\mu \in \mathbb{R}$ are chosen so that $\hat{Y}$ is the best linear estimator of $Y$ given $X$.

Such an estimator $\hat{Y}$ would share the linear properties of $Y$: $\sigma_{\hat{Y}} = \sigma_{Y}$ and $\mu_{\hat{Y}} = \mu_{Y}$.

Therefore, if the vector $X$ has mean $\mu_x$ and standard deviation $\sigma_x$, the best linear estimator would be

$$\hat{Y} = \sigma_y \frac{X - \mu_x}{\sigma_x} + \mu_y,$$

since it has the same mean and variance as $Y$.
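As a minimal numerical sketch of this construction (Python with NumPy; the distribution parameters and sample size are arbitrary illustrative choices, not part of the statement):

import numpy as np

rng = np.random.default_rng(0)

# Illustrative population parameters (arbitrary choices).
mu_x, sigma_x = 2.0, 3.0
mu_y, sigma_y = -1.0, 0.5

X = rng.normal(mu_x, sigma_x, size=100_000)

# Rescale and shift X so the result has the same mean and
# standard deviation as Y, as in the formula above.
Y_hat = sigma_y * (X - mu_x) / sigma_x + mu_y

print(Y_hat.mean(), Y_hat.std())   # approximately mu_y and sigma_y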

Statement

Suppose we have, in matrix notation, the linear relationship

$$y = X\beta + \varepsilon, \qquad y, \varepsilon \in \mathbb{R}^n,\ \beta \in \mathbb{R}^K,\ X \in \mathbb{R}^{n \times K},$$

expanding to

$$y_i = \sum_{j=1}^{K} \beta_j X_{ij} + \varepsilon_i \qquad \forall i = 1, 2, \ldots, n,$$

where $\beta_j$ are non-random but unobservable parameters, $X_{ij}$ are non-random and observable (called the "explanatory variables"), and $\varepsilon_i$ are random, so that the $y_i$ are random. The random variables $\varepsilon_i$ are called the "disturbance", "noise" or simply "error" (to be contrasted with "residual" later in the article; see errors and residuals in statistics). To include a constant in the model above, one can introduce the constant as an additional parameter $\beta_{K+1}$ with a newly introduced last column of $X$ equal to unity, i.e., $X_{i(K+1)} = 1$ for all $i$. Note that although the sample responses $y_i$ are observable, the statements and arguments below, including assumptions and proofs, are made under the condition of knowing $X_{ij}$ but not $y_i$.

The Gauss–Markov assumptions concern the set of error random variables, $\varepsilon_i$:

  • They have mean zero: $\operatorname{E}[\varepsilon_i] = 0$.
  • They are homoscedastic, that is, all have the same finite variance: $\operatorname{Var}(\varepsilon_i) = \sigma^2 < \infty$ for all $i$; and
  • Distinct error terms are uncorrelated: $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$.

A linear estimator of $\beta_j$ is a linear combination

$$\widehat{\beta}_j = c_{1j} y_1 + \cdots + c_{nj} y_n$$

in which the coefficients $c_{ij}$ are not allowed to depend on the underlying coefficients $\beta_j$, since those are not observable, but are allowed to depend on the values $X_{ij}$, since these data are observable. (The dependence of the coefficients on each $X_{ij}$ is typically nonlinear; the estimator is linear in each $y_i$ and hence in each random $\varepsilon_i$, which is why this is "linear" regression.) The estimator is said to be unbiased if and only if

$$\operatorname{E}\left[\widehat{\beta}_j\right] = \beta_j$$

regardless of the values of $X_{ij}$. Now, let $\sum_{j=1}^{K} \lambda_j \beta_j$ be some linear combination of the coefficients. Then the mean squared error of the corresponding estimation is

$$\operatorname{E}\left[\left(\sum_{j=1}^{K} \lambda_j\left(\widehat{\beta}_j - \beta_j\right)\right)^2\right];$$

in other words, it is the expectation of the square of the weighted sum (across parameters) of the differences between the estimators and the corresponding parameters to be estimated. (Since we are considering the case in which all the parameter estimates are unbiased, this mean squared error is the same as the variance of the linear combination.) The best linear unbiased estimator (BLUE) of the vector $\beta$ of parameters $\beta_j$ is the one with the smallest mean squared error for every vector $\lambda$ of linear combination parameters. This is equivalent to the condition that

$$\operatorname{Var}\left(\widetilde{\beta}\right) - \operatorname{Var}\left(\widehat{\beta}\right)$$

is a positive semi-definite matrix for every other linear unbiased estimator $\widetilde{\beta}$.

The ordinary least squares estimator (OLS) is the function

$$\widehat{\beta} = (X^{\mathsf T} X)^{-1} X^{\mathsf T} y$$

of $y$ and $X$ (where $X^{\mathsf T}$ denotes the transpose of $X$) that minimizes the sum of squares of residuals (misprediction amounts):

$$\sum_{i=1}^{n}\left(y_i - \widehat{y}_i\right)^2 = \sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{K} \widehat{\beta}_j X_{ij}\right)^2.$$

The theorem now states that the OLS estimator is a best linear unbiased estimator (BLUE).

The main idea of the proof is that the least-squares estimator is uncorrelated with every linear unbiased estimator of zero, i.e., with every linear combination $a_1 y_1 + \cdots + a_n y_n$ whose coefficients do not depend upon the unobservable $\beta$ but whose expected value is always zero.
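As a concrete illustration of the estimator defined above, the following sketch (Python with NumPy; the sample size, true coefficients, and noise level are arbitrary illustrative choices) computes $\widehat{\beta} = (X^{\mathsf T}X)^{-1}X^{\mathsf T}y$ on simulated data and checks it against a generic least-squares solver:

import numpy as np

rng = np.random.default_rng(42)

n, K = 200, 3
X = rng.normal(size=(n, K))
beta_true = np.array([1.5, -2.0, 0.7])      # illustrative values
eps = rng.normal(scale=0.5, size=n)         # mean zero, homoscedastic, uncorrelated
y = X @ beta_true + eps

# Closed-form OLS estimator: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimate from a numerically stable least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))    # True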

Remark

Proof that the OLS estimator indeed minimizes the sum of squares of residuals may proceed as follows, by calculating the Hessian matrix and showing that it is positive definite.

The MSE function we want to minimize is

$$f(\beta_0, \beta_1, \ldots, \beta_p) = \sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip}\right)^2$$

for a multiple regression model with $p$ variables. The first derivative is

$$\frac{d}{d\boldsymbol{\beta}} f = -2 X^{\mathsf T}\left(\mathbf{y} - X\boldsymbol{\beta}\right) = -2\begin{bmatrix} \sum_{i=1}^{n}\left(y_i - \cdots - \beta_p x_{ip}\right) \\ \sum_{i=1}^{n} x_{i1}\left(y_i - \cdots - \beta_p x_{ip}\right) \\ \vdots \\ \sum_{i=1}^{n} x_{ip}\left(y_i - \cdots - \beta_p x_{ip}\right) \end{bmatrix} = \mathbf{0}_{p+1},$$

where $X$ is the design matrix

$$X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix} \in \mathbb{R}^{n \times (p+1)}, \qquad n \geq p+1.$$

The Hessian matrix of second derivatives is

$$\mathcal{H} = 2\begin{bmatrix} n & \sum_{i=1}^{n} x_{i1} & \cdots & \sum_{i=1}^{n} x_{ip} \\ \sum_{i=1}^{n} x_{i1} & \sum_{i=1}^{n} x_{i1}^2 & \cdots & \sum_{i=1}^{n} x_{i1} x_{ip} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} x_{ip} & \sum_{i=1}^{n} x_{ip} x_{i1} & \cdots & \sum_{i=1}^{n} x_{ip}^2 \end{bmatrix} = 2 X^{\mathsf T} X.$$

Assuming the columns of $X$ are linearly independent so that $X^{\mathsf T} X$ is invertible, write $X = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_{p+1} \end{bmatrix}$; then

$$k_1 \mathbf{v}_1 + \cdots + k_{p+1} \mathbf{v}_{p+1} = \mathbf{0} \iff k_1 = \cdots = k_{p+1} = 0.$$

Now let $\mathbf{k} = (k_1, \ldots, k_{p+1})^{\mathsf T} \in \mathbb{R}^{(p+1) \times 1}$ be an eigenvector of $\mathcal{H}$. Then

$$\mathbf{k} \neq \mathbf{0} \implies \left\| k_1 \mathbf{v}_1 + \cdots + k_{p+1} \mathbf{v}_{p+1} \right\|^2 > 0.$$

In terms of vector multiplication, this means

$$\mathbf{k}^{\mathsf T} \mathcal{H}\, \mathbf{k} = 2\begin{bmatrix} k_1 & \cdots & k_{p+1} \end{bmatrix}\begin{bmatrix} \mathbf{v}_1^{\mathsf T} \\ \vdots \\ \mathbf{v}_{p+1}^{\mathsf T} \end{bmatrix}\begin{bmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_{p+1} \end{bmatrix}\begin{bmatrix} k_1 \\ \vdots \\ k_{p+1} \end{bmatrix} = 2\left\| k_1 \mathbf{v}_1 + \cdots + k_{p+1} \mathbf{v}_{p+1} \right\|^2 = \lambda\, \mathbf{k}^{\mathsf T} \mathbf{k} > 0,$$

where $\lambda$ is the eigenvalue corresponding to $\mathbf{k}$. Moreover,

$$\mathbf{k}^{\mathsf T} \mathbf{k} = \sum_{i=1}^{p+1} k_i^2 > 0 \implies \lambda > 0.$$

Finally, as the eigenvector $\mathbf{k}$ was arbitrary, all eigenvalues of $\mathcal{H}$ are positive, therefore $\mathcal{H}$ is positive definite. Thus,

$$\widehat{\boldsymbol{\beta}} = \left(X^{\mathsf T} X\right)^{-1} X^{\mathsf T} Y$$

is indeed a global minimizer.

Or, just see that $\mathbf{v}^{\mathsf T} X^{\mathsf T} X \mathbf{v} = \| X\mathbf{v} \|^2 \geq 0$ for all vectors $\mathbf{v}$, so the Hessian is positive definite whenever $X$ has full column rank (since then $\| X\mathbf{v} \|^2 = 0$ only for $\mathbf{v} = \mathbf{0}$).
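The positive definiteness of the Hessian can also be checked numerically. The sketch below (NumPy; the simulated design matrix and its dimensions are illustrative assumptions) forms $\mathcal{H} = 2X^{\mathsf T}X$ and verifies that all of its eigenvalues are positive:

import numpy as np

rng = np.random.default_rng(1)

n, p = 50, 4
# Design matrix with an intercept column; a random matrix like this
# has full column rank with probability one.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

H = 2 * X.T @ X                     # Hessian of the residual sum of squares
eigvals = np.linalg.eigvalsh(H)     # symmetric matrix -> real eigenvalues

print(eigvals)
print(np.all(eigvals > 0))          # True: H is positive definite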

Proof

Let $\tilde{\beta} = Cy$ be another linear estimator of $\beta$ with $C = (X^{\mathsf T} X)^{-1} X^{\mathsf T} + D$, where $D$ is a $K \times n$ non-zero matrix. As we're restricting to unbiased estimators, minimum mean squared error implies minimum variance. The goal is therefore to show that such an estimator has a variance no smaller than that of $\widehat{\beta}$, the OLS estimator. We calculate:

$$\begin{aligned} \operatorname{E}\left[\tilde{\beta}\right] &= \operatorname{E}[Cy] \\ &= \operatorname{E}\left[\left((X^{\mathsf T} X)^{-1} X^{\mathsf T} + D\right)(X\beta + \varepsilon)\right] \\ &= \left((X^{\mathsf T} X)^{-1} X^{\mathsf T} + D\right) X\beta + \left((X^{\mathsf T} X)^{-1} X^{\mathsf T} + D\right)\operatorname{E}[\varepsilon] \\ &= \left((X^{\mathsf T} X)^{-1} X^{\mathsf T} + D\right) X\beta && (\operatorname{E}[\varepsilon] = 0) \\ &= (X^{\mathsf T} X)^{-1} X^{\mathsf T} X\beta + DX\beta \\ &= (I_K + DX)\beta. \end{aligned}$$

Therefore, since $\beta$ is unobservable, $\tilde{\beta}$ is unbiased if and only if $DX = 0$. Then:

$$\begin{aligned} \operatorname{Var}\left(\tilde{\beta}\right) &= \operatorname{Var}(Cy) \\ &= C\operatorname{Var}(y)C^{\mathsf T} \\ &= \sigma^2 C C^{\mathsf T} \\ &= \sigma^2\left((X^{\mathsf T} X)^{-1} X^{\mathsf T} + D\right)\left(X(X^{\mathsf T} X)^{-1} + D^{\mathsf T}\right) \\ &= \sigma^2\left((X^{\mathsf T} X)^{-1} X^{\mathsf T} X (X^{\mathsf T} X)^{-1} + (X^{\mathsf T} X)^{-1} X^{\mathsf T} D^{\mathsf T} + D X (X^{\mathsf T} X)^{-1} + D D^{\mathsf T}\right) \\ &= \sigma^2 (X^{\mathsf T} X)^{-1} + \sigma^2 (X^{\mathsf T} X)^{-1} (DX)^{\mathsf T} + \sigma^2 DX (X^{\mathsf T} X)^{-1} + \sigma^2 D D^{\mathsf T} \\ &= \sigma^2 (X^{\mathsf T} X)^{-1} + \sigma^2 D D^{\mathsf T} && (DX = 0) \\ &= \operatorname{Var}\left(\widehat{\beta}\right) + \sigma^2 D D^{\mathsf T} && \left(\sigma^2 (X^{\mathsf T} X)^{-1} = \operatorname{Var}\left(\widehat{\beta}\right)\right). \end{aligned}$$

Since $D D^{\mathsf T}$ is a positive semidefinite matrix, $\operatorname{Var}\left(\tilde{\beta}\right)$ exceeds $\operatorname{Var}\left(\widehat{\beta}\right)$ by a positive semidefinite matrix.
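The conclusion can be illustrated by simulation. The sketch below (NumPy; normal errors are used only for convenience, since the theorem does not require normality, and the dimensions and true coefficients are arbitrary) builds an alternative unbiased linear estimator with $DX = 0$ and compares its sampling covariance with that of OLS:

import numpy as np

rng = np.random.default_rng(7)

n, K = 30, 2
X = rng.normal(size=(n, K))
beta = np.array([1.0, -1.0])        # illustrative true coefficients
sigma = 1.0

XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T               # projection onto the column space of X

# Any D of the form M (I - P) satisfies D X = 0, so C = (X'X)^{-1}X' + D
# defines another linear *unbiased* estimator.
M = rng.normal(size=(K, n))
D = M @ (np.eye(n) - P)
C = XtX_inv @ X.T + D

reps = 20_000
ols_draws, alt_draws = [], []
for _ in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    ols_draws.append(XtX_inv @ X.T @ y)
    alt_draws.append(C @ y)

var_ols = np.cov(np.array(ols_draws).T)
var_alt = np.cov(np.array(alt_draws).T)

# The difference estimates sigma^2 D D', which is positive semidefinite,
# so its eigenvalues should be nonnegative up to simulation noise.
print(np.linalg.eigvalsh(var_alt - var_ols))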

Remarks on the proof

As stated before, the condition that $\operatorname{Var}\left(\tilde{\beta}\right) - \operatorname{Var}\left(\widehat{\beta}\right)$ is a positive semidefinite matrix is equivalent to the property that the best linear unbiased estimator of $\ell^{\mathsf T}\beta$ is $\ell^{\mathsf T}\widehat{\beta}$ (best in the sense that it has minimum variance). To see this, let $\ell^{\mathsf T}\tilde{\beta}$ be another linear unbiased estimator of $\ell^{\mathsf T}\beta$. Then

$$\begin{aligned} \operatorname{Var}\left(\ell^{\mathsf T}\tilde{\beta}\right) &= \ell^{\mathsf T}\operatorname{Var}\left(\tilde{\beta}\right)\ell \\ &= \sigma^2 \ell^{\mathsf T}(X^{\mathsf T} X)^{-1}\ell + \ell^{\mathsf T} D D^{\mathsf T}\ell \\ &= \operatorname{Var}\left(\ell^{\mathsf T}\widehat{\beta}\right) + (D^{\mathsf T}\ell)^{\mathsf T}(D^{\mathsf T}\ell) && \left(\sigma^2 \ell^{\mathsf T}(X^{\mathsf T} X)^{-1}\ell = \operatorname{Var}\left(\ell^{\mathsf T}\widehat{\beta}\right)\right) \\ &= \operatorname{Var}\left(\ell^{\mathsf T}\widehat{\beta}\right) + \left\| D^{\mathsf T}\ell \right\|^2 \\ &\geq \operatorname{Var}\left(\ell^{\mathsf T}\widehat{\beta}\right). \end{aligned}$$

Moreover, equality holds if and only if $D^{\mathsf T}\ell = 0$. We calculate

$$\begin{aligned} \ell^{\mathsf T}\tilde{\beta} &= \ell^{\mathsf T}\left(\left((X^{\mathsf T} X)^{-1} X^{\mathsf T} + D\right) Y\right) && \text{(from above)} \\ &= \ell^{\mathsf T}(X^{\mathsf T} X)^{-1} X^{\mathsf T} Y + \ell^{\mathsf T} D Y \\ &= \ell^{\mathsf T}\widehat{\beta} + (D^{\mathsf T}\ell)^{\mathsf T} Y \\ &= \ell^{\mathsf T}\widehat{\beta} && (D^{\mathsf T}\ell = 0). \end{aligned}$$

This proves that equality holds if and only if $\ell^{\mathsf T}\tilde{\beta} = \ell^{\mathsf T}\widehat{\beta}$, which gives the uniqueness of the OLS estimator as a BLUE.

Generalized least squares estimator

The generalized least squares (GLS) estimator, developed by Aitken,[5] extends the Gauss–Markov theorem to the case where the error vector has a non-scalar covariance matrix.[6] The Aitken estimator is also a BLUE.
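A minimal sketch of the GLS (Aitken) estimator, assuming the error covariance matrix is known; the AR(1)-style covariance, dimensions, and coefficients below are illustrative assumptions only:

import numpy as np

rng = np.random.default_rng(3)

n, K = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, 2.0])                      # illustrative true coefficients

# Illustrative non-scalar covariance: AR(1)-type correlation structure.
rho = 0.8
Omega = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

eps = rng.multivariate_normal(np.zeros(n), Omega)
y = X @ beta + eps

Omega_inv = np.linalg.inv(Omega)

# GLS (Aitken) estimator: (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
# OLS for comparison: still unbiased here, but no longer efficient.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_gls, beta_ols)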

Gauss–Markov theorem as stated in econometrics

In most treatments of OLS, the regressors (parameters of interest) in the design matrix $\mathbf{X}$ are assumed to be fixed in repeated samples. This assumption is considered inappropriate for a predominantly nonexperimental science like econometrics.[7] Instead, the assumptions of the Gauss–Markov theorem are stated conditional on $\mathbf{X}$.

Linearity

The dependent variable is assumed to be a linear function of the variables specified in the model. The specification must be linear in its parameters. This does not mean that there must be a linear relationship between the independent and dependent variables. The independent variables can take non-linear forms as long as the parameters are linear. The equation $y = \beta_0 + \beta_1 x^2$ qualifies as linear, while $y = \beta_0 + \beta_1^2 x$ can be transformed to be linear by replacing $\beta_1^2$ with another parameter, say $\gamma$. An equation with a parameter dependent on an independent variable does not qualify as linear, for example $y = \beta_0 + \beta_1(x)\cdot x$, where $\beta_1(x)$ is a function of $x$.

Data transformations are often used to convert an equation into a linear form. For example, the Cobb–Douglas function, often used in economics, is nonlinear:

$$Y = A L^{\alpha} K^{1-\alpha} e^{\varepsilon},$$

But it can be expressed in linear form by taking the natural logarithm of both sides:[8]

$$\ln Y = \ln A + \alpha \ln L + (1 - \alpha)\ln K + \varepsilon = \beta_0 + \beta_1 \ln L + \beta_2 \ln K + \varepsilon.$$

This assumption also covers specification issues: assuming that the proper functional form has been selected and there are no omitted variables.

One should be aware, however, that the parameters that minimize the residuals of the transformed equation do not necessarily minimize the residuals of the original equation.
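For illustration, a log-linearized Cobb–Douglas relationship can be fitted by OLS on the transformed variables. The sketch below simulates data under the assumed functional form above; the parameter values, input ranges, and noise level are arbitrary illustrative choices:

import numpy as np

rng = np.random.default_rng(5)

n = 500
L = rng.uniform(1.0, 10.0, size=n)       # labor input (illustrative)
K = rng.uniform(1.0, 10.0, size=n)       # capital input (illustrative)
A, alpha = 2.0, 0.3                      # illustrative technology parameters
eps = rng.normal(scale=0.1, size=n)

Y = A * L**alpha * K**(1 - alpha) * np.exp(eps)

# Regress ln Y on a constant, ln L and ln K: linear in the parameters.
Z = np.column_stack([np.ones(n), np.log(L), np.log(K)])
b = np.linalg.solve(Z.T @ Z, Z.T @ np.log(Y))

print(b)    # approximately [ln A, alpha, 1 - alpha]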

Strict exogeneity

For all $n$ observations, the expectation of the error term, conditional on the regressors, is zero:[9]

$$\operatorname{E}[\varepsilon_i \mid \mathbf{X}] = \operatorname{E}[\varepsilon_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_n] = 0,$$

where $\mathbf{x}_i = \begin{bmatrix} x_{i1} & x_{i2} & \cdots & x_{ik} \end{bmatrix}^{\mathsf T}$ is the data vector of regressors for the ith observation, and consequently $\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^{\mathsf T} & \mathbf{x}_2^{\mathsf T} & \cdots & \mathbf{x}_n^{\mathsf T} \end{bmatrix}^{\mathsf T}$ is the data matrix or design matrix.

Geometrically, this assumption implies that $\mathbf{x}_i$ and $\varepsilon_i$ are orthogonal to each other, so that their inner product (i.e., their cross moment) is zero:

$$\operatorname{E}[\mathbf{x}_j \cdot \varepsilon_i] = \begin{bmatrix} \operatorname{E}[x_{j1}\varepsilon_i] \\ \operatorname{E}[x_{j2}\varepsilon_i] \\ \vdots \\ \operatorname{E}[x_{jk}\varepsilon_i] \end{bmatrix} = \mathbf{0} \quad \text{for all } i, j = 1, \ldots, n.$$

This assumption is violated if the explanatory variables are measured with error, or are endogenous.[10] Endogeneity can be the result of simultaneity, where causality flows back and forth between both the dependent and independent variable. Instrumental variable techniques are commonly used to address this problem.

Full rank

The sample data matrix $\mathbf{X}$ must have full column rank:

$$\operatorname{rank}(\mathbf{X}) = k.$$

Otherwise $\mathbf{X}^{\mathsf T}\mathbf{X}$ is not invertible and the OLS estimator cannot be computed.

A violation of this assumption is perfect multicollinearity, i.e. some explanatory variables are linearly dependent. One scenario in which this occurs is called the "dummy variable trap", which arises when a base dummy variable is not omitted, resulting in perfect correlation between the dummy variables and the constant term.[11]

Multicollinearity (as long as it is not "perfect") can be present, resulting in less efficient, but still unbiased, estimates. The estimates will be less precise and highly sensitive to particular sets of data.[12] Multicollinearity can be detected from the condition number or the variance inflation factor, among other tests, as in the sketch below.
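The sketch below (NumPy only; the deliberately near-collinear regressors are an illustrative assumption) computes both diagnostics directly from the design matrix:

import numpy as np

rng = np.random.default_rng(11)

n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)        # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])

# Condition number of the design matrix (large values signal multicollinearity).
print(np.linalg.cond(X))

# Variance inflation factor of each non-constant regressor:
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the others.
def vif(X, j):
    y = X[:, j]
    Z = np.delete(X, j, axis=1)                 # remaining columns, incl. constant
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

print([vif(X, j) for j in (1, 2)])              # large VIFs for x1 and x2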

Spherical errors

The outer product of the error vector must be spherical:

$$\operatorname{E}[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^{\mathsf T} \mid \mathbf{X}] = \operatorname{Var}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} = \sigma^2\mathbf{I}, \quad \text{with } \sigma^2 > 0.$$

This implies the error term has uniform variance (homoscedasticity) and no serial correlation.[13] If this assumption is violated, OLS is still unbiased, but inefficient. The term "spherical errors" comes from the multivariate normal distribution: if $\operatorname{Var}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = \sigma^2\mathbf{I}$ in the multivariate normal density, then the equation $f(\varepsilon) = c$ describes a ball centered at μ with radius σ in n-dimensional space.[14]

Heteroskedasticity occurs when the variance of the error term is related to an independent variable. For example, in a regression of food expenditure on income, the error is related to income: low-income people generally spend a similar amount on food, while high-income people may spend a very large amount or as little as low-income people spend. Heteroskedasticity can also be caused by changes in measurement practices. For example, as statistical offices improve their data, measurement error decreases, so the error term declines over time.

This assumption is also violated when there is autocorrelation. Autocorrelation can be visualized on a data plot when a given observation is more likely to lie above the fitted regression line if adjacent observations also lie above it. Autocorrelation is common in time series data, where a series may exhibit "inertia": a dependent variable takes a while to fully absorb a shock. Spatial autocorrelation can also occur, since neighboring geographic areas are likely to have similar errors. Autocorrelation may be the result of misspecification, such as choosing the wrong functional form. In these cases, correcting the specification is one possible way to deal with autocorrelation.

In the presence of non-spherical errors, the generalized least squares estimator can be shown to be BLUE.[6]

See also

  • Independent and identically distributed random variables
  • Linear regression
  • Measurement uncertainty

Other unbiased statistics

  • Best linear unbiased prediction (BLUP)
  • Minimum-variance unbiased estimator (MVUE)

References

  1. ^ See chapter 7 of Johnson, R.A.; Wichern, D.W. (2002). Applied Multivariate Statistical Analysis. Vol. 5. Prentice Hall.
  2. ^ Theil, Henri (1971). "Best Linear Unbiased Estimation and Prediction". Principles of Econometrics. New York: John Wiley & Sons. pp. 119–124. ISBN 0-471-85845-5.
  3. ^ Plackett, R. L. (1949). "A Historical Note on the Method of Least Squares". Biometrika. 36 (3/4): 458–460. doi:10.2307/2332682.
  4. ^ David, F. N.; Neyman, J. (1938). "Extension of the Markoff theorem on least squares". Statistical Research Memoirs. 2: 105–116. OCLC 4025782.
  5. ^ Aitken, A. C. (1935). "On Least Squares and Linear Combinations of Observations". Proceedings of the Royal Society of Edinburgh. 55: 42–48. doi:10.1017/S0370164600014346.
  6. ^ a b Huang, David S. (1970). Regression and Econometric Methods. New York: John Wiley & Sons. pp. 127–147. ISBN 0-471-41754-8.
  7. ^ Hayashi, Fumio (2000). Econometrics. Princeton University Press. p. 13. ISBN 0-691-01018-8.
  8. ^ Walters, A. A. (1970). An Introduction to Econometrics. New York: W. W. Norton. p. 275. ISBN 0-393-09931-8.
  9. ^ Hayashi, Fumio (2000). Econometrics. Princeton University Press. p. 7. ISBN 0-691-01018-8.
  10. ^ Johnston, John (1972). Econometric Methods (Second ed.). New York: McGraw-Hill. pp. 267–291. ISBN 0-07-032679-7.
  11. ^ Wooldridge, Jeffrey (2012). Introductory Econometrics (Fifth international ed.). South-Western. p. 220. ISBN 978-1-111-53439-4.
  12. ^ Johnston, John (1972). Econometric Methods (Second ed.). New York: McGraw-Hill. pp. 159–168. ISBN 0-07-032679-7.
  13. ^ Hayashi, Fumio (2000). Econometrics. Princeton University Press. p. 10. ISBN 0-691-01018-8.
  14. ^ Ramanathan, Ramu (1993). "Nonspherical Disturbances". Statistical Methods in Econometrics. Academic Press. pp. 330–351. ISBN 0-12-576830-3.

Further reading

  • Davidson, James (2000). "Statistical Analysis of the Regression Model". Econometric Theory. Oxford: Blackwell. pp. 17–36. ISBN 0-631-17837-6.
  • Goldberger, Arthur (1991). "Classical Regression". A Course in Econometrics. Cambridge: Harvard University Press. pp. 160–169. ISBN 0-674-17544-1.
  • Theil, Henri (1971). "Least Squares and the Standard Linear Model". Principles of Econometrics. New York: John Wiley & Sons. pp. 101–162. ISBN 0-471-85845-5.

External links

  • Earliest Known Uses of Some of the Words of Mathematics: G (brief history and explanation of the name)
  • Proof of the Gauss–Markov theorem for multiple linear regression (makes use of matrix algebra)
  • A Proof of the Gauss–Markov theorem using geometry
