Wikipedia

Mallows's Cp

In statistics, Mallows's Cp,[1][2] named for Colin Lingwood Mallows, is used to assess the fit of a regression model that has been estimated using ordinary least squares. It is applied in the context of model selection, where a number of predictor variables are available for predicting some outcome, and the goal is to find the best model involving a subset of these predictors. A small value of Cp means that the model is relatively precise.

Mallows's Cp has been shown to be equivalent to the Akaike information criterion in the special case of Gaussian linear regression.[3]

Definition and properties

Mallows's Cp addresses the issue of overfitting, in which model selection statistics such as the residual sum of squares always get smaller as more variables are added to a model. Thus, if we aim to select the model giving the smallest residual sum of squares, the model including all variables would always be selected. Instead, the Cp statistic calculated on a sample of data estimates the sum squared prediction error (SSPE) as its population target

$E \sum_i \left( \hat{Y}_i - E(Y_i \mid X_i) \right)^2 / \sigma^2$

where $\hat{Y}_i$ is the fitted value from the regression model for the ith case, $E(Y_i \mid X_i)$ is the expected value for the ith case, and $\sigma^2$ is the error variance (assumed constant across the cases). The mean squared prediction error (MSPE) will not automatically get smaller as more variables are added. The optimum model under this criterion is a compromise influenced by the sample size, the effect sizes of the different predictors, and the degree of collinearity between them.

If P regressors are selected from a set of K > P, the Cp statistic for that particular set of regressors is defined as:

$C_p = \frac{SSE_p}{S^2} - N + 2(P+1)$

where

  • $SSE_p = \sum_{i=1}^N (Y_i - \hat{Y}_{pi})^2$ is the error sum of squares for the model with P regressors,
  • $\hat{Y}_{pi}$ is the predicted value of the ith observation of Y from the P regressors,
  • $S^2$ is an estimate of the residual variance after regression on the complete set of K regressors, which can be computed as $\frac{1}{N-K} \sum_{i=1}^N (Y_i - \hat{Y}_i)^2$,[4]
  • and N is the sample size.
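As an illustration (not part of the original article), the definition above can be computed directly with ordinary least squares. The following sketch assumes NumPy; the function name `mallows_cp` and the toy data are my own, and $S^2$ is taken from the full model with the denominator counting all fitted parameters, intercept included.

```python
import numpy as np

def mallows_cp(X_full, X_sub, y):
    """Cp = SSE_p / S^2 - N + 2*(P + 1), following the definition above."""
    n = len(y)

    def fit_sse(X):
        A = np.column_stack([np.ones(n), X])       # prepend an intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return float(r @ r), A.shape[1]            # SSE and parameter count

    sse_p, _ = fit_sse(X_sub)
    sse_k, k_params = fit_sse(X_full)
    s2 = sse_k / (n - k_params)                    # S^2 from the complete K-regressor model
    p = X_sub.shape[1]                             # P = number of regressors in the subset
    return sse_p / s2 - n + 2 * (p + 1)

# Toy data: 3 candidate regressors, only the first two matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=100)

cp_full = mallows_cp(X, X, y)   # for the full model, Cp = P + 1 by construction
```

For the full model the statistic reduces algebraically to P + 1 regardless of the data, which makes a quick sanity check on an implementation.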

Alternative definition

Given a linear model such as:

$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon$

where:

  • $\beta_0, \ldots, \beta_p$ are coefficients for the predictor variables $X_1, \ldots, X_p$
  • $\varepsilon$ represents the error term

An alternate version of Cp can also be defined as:[5]

$C_p = \frac{1}{n} \left( \mathrm{RSS} + 2p\hat{\sigma}^2 \right)$

where

  • RSS is the residual sum of squares on a training set of data
  • p is the number of predictors
  • and $\hat{\sigma}^2$ refers to an estimate of the variance associated with each response in the linear model (estimated on a model containing all predictors)

Note that this version of Cp does not give the same values as the earlier version. However, since the two definitions differ only by an increasing affine transformation (when $\hat{\sigma}^2 = S^2$), the model with the smallest Cp under this definition is also the model with the smallest Cp under the earlier definition.
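A minimal sketch of the alternative form, assuming NumPy (the helper name `cp_alt` and the toy data are illustrative): RSS comes from the candidate subset, while $\hat{\sigma}^2$ is estimated from the model containing all predictors.

```python
import numpy as np

def cp_alt(X_full, X_sub, y):
    """Alternative form: Cp = (RSS + 2*p*sigma_hat^2) / n."""
    n = len(y)

    def fit_sse(X):
        A = np.column_stack([np.ones(n), X])        # prepend an intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return float(r @ r), A.shape[1]

    rss, _ = fit_sse(X_sub)
    sse_full, k_params = fit_sse(X_full)
    sigma2_hat = sse_full / (n - k_params)          # variance estimate from all predictors
    p = X_sub.shape[1]
    return (rss + 2 * p * sigma2_hat) / n

# A subset containing the true predictor should score lower than one that omits it.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + rng.normal(size=100)
cp_good = cp_alt(X, X[:, [0]], y)
cp_bad = cp_alt(X, X[:, [1]], y)
```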

Limitations

The Cp criterion suffers from two main limitations:[6]

  1. the Cp approximation is only valid for large sample size;
  2. the Cp cannot handle complex collections of models as in the variable selection (or feature selection) problem.[6]

Practical use

The Cp statistic is often used as a stopping rule for various forms of stepwise regression. Mallows proposed the statistic as a criterion for selecting among many alternative subset regressions. Under a model not suffering from appreciable lack of fit (bias), Cp has expectation nearly equal to P; otherwise the expectation is roughly P plus a positive bias term. Nevertheless, even though it has expectation greater than or equal to P, there is nothing to prevent Cp < P or even Cp < 0 in extreme cases. It is suggested that one should choose a subset that has Cp approaching P,[7] from above, for a list of subsets ordered by increasing P. In practice, the positive bias can be adjusted for by selecting a model from the ordered list of subsets, such that Cp < 2P.
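The selection procedure described above can be sketched as an exhaustive search over subsets, scoring each with Cp (this example is not from the article; it assumes NumPy, the toy data and names are illustrative, and the same $S^2$ convention as earlier applies):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 4
X = rng.normal(size=(n, k))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=n)   # only X0 and X2 matter

def fit_sse(Z):
    A = np.column_stack([np.ones(n), Z])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r), A.shape[1]

sse_full, k_params = fit_sse(X)
s2 = sse_full / (n - k_params)                 # S^2 from the complete model

scores = []
for p in range(1, k + 1):
    for cols in itertools.combinations(range(k), p):
        sse_p, _ = fit_sse(X[:, list(cols)])
        cp = sse_p / s2 - n + 2 * (p + 1)
        scores.append((cp, cols))

best_cp, best_cols = min(scores)               # smallest Cp across all subsets
# A well-fitting subset should have Cp near P and include both true predictors.
```

In line with the guidance above, one would inspect the ordered list of subsets rather than trusting the single minimizer blindly, since spurious variables can occasionally lower Cp by chance.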

Since the sample-based Cp statistic is an estimate of the MSPE, using Cp for model selection does not completely guard against overfitting. For instance, it is possible that the selected model will be one in which the sample Cp was a particularly severe underestimate of the MSPE.

Model selection statistics such as Cp are generally not used blindly, but rather information about the field of application, the intended use of the model, and any known biases in the data are taken into account in the process of model selection.

See also

  • Goodness of fit
  • Regression analysis
  • Coefficient of determination

References

  1. ^ Mallows, C. L. (1973). "Some Comments on CP". Technometrics. 15 (4): 661–675. doi:10.2307/1267380. JSTOR 1267380.
  2. ^ Gilmour, Steven G. (1996). "The interpretation of Mallows's Cp-statistic". Journal of the Royal Statistical Society, Series D. 45 (1): 49–56. JSTOR 2348411.
  3. ^ Boisbunon, Aurélie; Canu, Stephane; Fourdrinier, Dominique; Strawderman, William; Wells, Martin T. (2013). "AIC, Cp and estimators of loss for elliptically symmetric distributions". arXiv:1308.2766 [math.ST].
  4. ^ Mallows, C. L. (1973). "Some Comments on CP". Technometrics. 15 (4): 661–675. doi:10.2307/1267380. JSTOR 1267380.
  5. ^ James, Gareth; Witten; Hastie; Tibshirani (2013-06-24). An Introduction to Statistical Learning. Springer. ISBN 978-1-4614-7138-7.
  6. ^ a b Giraud, C. (2015), Introduction to high-dimensional statistics, Chapman & Hall/CRC, ISBN 9781482237948
  7. ^ Daniel, C.; Wood, F. (1980). Fitting Equations to Data (Rev. ed.). New York: Wiley & Sons, Inc.

Further reading

  • Chow, Gregory C. (1983). Econometrics. New York: McGraw-Hill. pp. 291–293. ISBN 978-0-07-010847-9.
  • Hocking, R. R. (1976). "The analysis and selection of variables in linear regression". Biometrics. 32 (1): 1–50. CiteSeerX 10.1.1.472.4742. doi:10.2307/2529336. JSTOR 2529336.
  • Judge, George G.; Griffiths, William E.; Hill, R. Carter; Lee, Tsoung-Chao (1980). The Theory and Practice of Econometrics. New York: Wiley. pp. 417–423. ISBN 978-0-471-05938-7.
