
Working–Hotelling procedure

In statistics, particularly regression analysis, the Working–Hotelling procedure, named after Holbrook Working and Harold Hotelling, is a method of simultaneous estimation in linear regression models. One of the first developments in simultaneous inference, it was devised by Working and Hotelling for the simple linear regression model in 1929.[1] It provides a confidence region for multiple mean responses, that is, it gives the upper and lower bounds of more than one value of a dependent variable at several levels of the independent variables at a certain confidence level. The resulting confidence bands are known as the Working–Hotelling–Scheffé confidence bands.

Like the closely related Scheffé's method in the analysis of variance, which considers all possible contrasts, the Working–Hotelling procedure considers all possible values of the independent variables; that is, in a particular regression model, the probability that all the Working–Hotelling confidence intervals cover the true value of the mean response is the confidence coefficient. As such, when only a small subset of the possible values of the independent variable is considered, it is more conservative and yields wider intervals than competitors like the Bonferroni correction at the same level of confidence. It outperforms the Bonferroni correction as more values are considered.

Statement

Simple linear regression

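Consider a simple linear regression model $Y = \beta_0 + \beta_1 X + \varepsilon$, where $Y$ is the response variable and $X$ the explanatory variable, and let $b_0$ and $b_1$ be the least-squares estimates of $\beta_0$ and $\beta_1$ respectively. Then the least-squares estimate of the mean response $E(Y_i)$ at the level $X = x_i$ is $\hat{Y}_i = b_0 + b_1 x_i$. It can then be shown, assuming that the errors independently and identically follow the normal distribution, that a $1-\alpha$ confidence interval of the mean response at a certain level of $X$ is as follows:

$$\hat{y}_i \in \left[ b_0 + b_1 x_i \pm t_{\alpha/2,\,\text{df}=n-2} \sqrt{ \left( \frac{1}{n-2} \sum_{j=1}^{n} e_j^{2} \right) \cdot \left( \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2} \right) } \right]$$

where $\frac{1}{n-2} \sum_{j=1}^{n} e_j^{2}$ is the mean squared error and $t_{\alpha/2,\,\text{df}=n-2}$ denotes the upper $\frac{\alpha}{2}$th percentile of Student's t-distribution with $n-2$ degrees of freedom.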

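However, when several mean responses are estimated with such intervals, the joint confidence level, that is, the probability that all of the intervals cover their targets, declines rapidly. To fix the confidence coefficient at $1-\alpha$, the Working–Hotelling approach employs an F-statistic:[2][3]

$$\hat{y}_i \in \left[ b_0 + b_1 x_i \pm W \sqrt{ \left( \frac{1}{n-2} \sum_{j=1}^{n} e_j^{2} \right) \cdot \left( \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2} \right) } \right]$$

where $W^2 = 2F_{\alpha,\,\text{df}=2,\,n-2}$ and $F$ denotes the upper $\alpha$th percentile of the F-distribution with $(2, n-2)$ degrees of freedom. The confidence level is $1-\alpha$ over all values of $X$, i.e. $x_i \in \mathbb{R}$.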
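As a rough illustration of why the joint confidence level declines, suppose $g$ separate intervals are each built at level $1-\alpha$ (the symbol $g$ is introduced here only for this illustration). The Bonferroni inequality then guarantees no more than the following lower bound on their joint coverage:

$$P(\text{all } g \text{ intervals cover}) \;\ge\; 1 - g\alpha, \qquad \text{e.g. } g = 5,\ \alpha = 0.05 \;\Rightarrow\; 1 - 5(0.05) = 0.75 .$$

The Working–Hotelling multiplier $W$ avoids this erosion because it does not depend on how many levels of $X$ are examined.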

Multiple linear regression

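The Working–Hotelling confidence bands can be easily generalised to multiple linear regression. Consider a general linear model as defined in the linear regressions article, that is,

$$\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} \mathbf{x}_1^{\rm T} \\ \mathbf{x}_2^{\rm T} \\ \vdots \\ \mathbf{x}_n^{\rm T} \end{pmatrix} = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ x_{21} & \cdots & x_{2p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$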

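Again, it can be shown that the least-squares estimate of the mean response $E(Y_i) = \mathbf{x}_i^{\rm T} \boldsymbol{\beta}$ is $\hat{Y}_i = \mathbf{x}_i^{\rm T} \mathbf{b}$, where $\mathbf{b}$ consists of the least-squares estimates of the entries in $\boldsymbol{\beta}$, i.e. $\mathbf{b} = (\mathbf{X}^{\rm T} \mathbf{X})^{-1} \mathbf{X}^{\rm T} \mathbf{Y}$. Likewise, it can be shown that a $1-\alpha$ confidence interval for a single mean response estimate is as follows:[4]

$$\hat{y}_i \in \left[ \mathbf{x}_i^{\rm T} \mathbf{b} \pm t_{\alpha/2,\,\text{df}=n-p} \sqrt{ \operatorname{MSE} \, \mathbf{x}_i^{\rm T} (\mathbf{X}^{\rm T} \mathbf{X})^{-1} \mathbf{x}_i } \right]$$

where $\operatorname{MSE}$ is the observed value of the mean squared error $(\mathbf{Y}^{\rm T} \mathbf{Y} - \mathbf{b}^{\rm T} \mathbf{X}^{\rm T} \mathbf{Y}) / (n-p)$.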

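The Working–Hotelling approach to multiple estimations is similar to that of simple linear regression; only the multiplier changes, with the number of parameters $p$ taking the place of 2 both as the factor and as the numerator degrees of freedom:[3]

$$\hat{y}_i \in \left[ \mathbf{x}_i^{\rm T} \mathbf{b} \pm W \sqrt{ \operatorname{MSE} \, \mathbf{x}_i^{\rm T} (\mathbf{X}^{\rm T} \mathbf{X})^{-1} \mathbf{x}_i } \right]$$

where $W^2 = pF_{\alpha,\,\text{df}=p,\,n-p}$.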
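To see how the matrix form connects back to the simple linear regression formula, the quadratic form under the square root can be worked out for $p = 2$. This is a sketch that assumes the design matrix has a column of ones followed by the single regressor, so that $\mathbf{x}_i = (1, x_i)^{\rm T}$; the shorthand $S_{xx} = \sum_{j=1}^{n} (x_j - \bar{x})^2$ is introduced here for brevity.

$$\mathbf{X}^{\rm T} \mathbf{X} = \begin{pmatrix} n & \sum_j x_j \\ \sum_j x_j & \sum_j x_j^2 \end{pmatrix}, \qquad (\mathbf{X}^{\rm T} \mathbf{X})^{-1} = \frac{1}{n S_{xx}} \begin{pmatrix} \sum_j x_j^2 & -n\bar{x} \\ -n\bar{x} & n \end{pmatrix}$$

$$\mathbf{x}_i^{\rm T} (\mathbf{X}^{\rm T} \mathbf{X})^{-1} \mathbf{x}_i = \frac{\sum_j x_j^2 - 2 n \bar{x} x_i + n x_i^2}{n S_{xx}} = \frac{S_{xx} + n (x_i - \bar{x})^2}{n S_{xx}} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}$$

The second equality uses $\sum_j x_j^2 = S_{xx} + n\bar{x}^2$. With $p = 2$ the error degrees of freedom are $n - 2$, so the matrix expression reproduces the simple linear regression band term by term.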

Graphical representation

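In the simple linear regression case, Working–Hotelling–Scheffé confidence bands, drawn by connecting the upper and lower limits of the mean response at every level, take the shape of hyperbolas. In drawing, they are sometimes approximated by the Graybill–Bowden confidence bands, which are made of straight-line segments and hence easier to graph:[2]

$$\beta_0 + \beta_1 (x_i - \bar{x}) \in \left[ b_0 + b_1 (x_i - \bar{x}) \pm m_{\alpha,\,2,\,\text{df}=n-2} \cdot s \cdot \left( \frac{1}{\sqrt{n}} + \frac{|x_i - \bar{x}|}{\sqrt{\sum_{j=1}^{n} (x_j - \bar{x})^2}} \right) \right]$$

where $s$ is the square root of the mean squared error and $m_{\alpha,\,2,\,\text{df}=n-2}$ denotes the upper $\alpha$th percentile of the Studentized maximum modulus distribution with two means and $n-2$ degrees of freedom.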

 
The simple linear regression model with a Working–Hotelling confidence band.

Numerical example

This example uses the same data as the ordinary least squares article:

Height (m) 1.47 1.50 1.52 1.55 1.57 1.60 1.63 1.65 1.68 1.70 1.73 1.75 1.78 1.80 1.83
Weight (kg) 52.21 53.12 54.48 55.84 57.20 58.57 59.93 61.29 63.11 64.47 66.28 68.10 69.92 72.19 74.46

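A simple linear regression model is fitted to these data. The values of $b_0$ and $b_1$ have been found to be −39.06 and 61.27 respectively. The goal is to estimate the mean mass of women given their heights at the 95% confidence level. The value of $W$ was found to be $\sqrt{2F_{0.95,\,\text{df}=2,\,15-2}} = 2.758828$, so that $W^2 \approx 7.611$. It was also found that $\bar{x} = 1.651$, $\sum_{j=1}^{n} e_j^{2} = 7.490558$, $\operatorname{MSE} = 0.5761968$ and $\sum_{j=1}^{n} (x_j - \bar{x})^2 \approx 0.1827$. Then, to estimate the mean mass of all women of a particular height, the following Working–Hotelling–Scheffé band has been derived:

$$\hat{y}_i \in \left[ -39.06 + 61.27 x_i \pm 2.758828 \sqrt{ 0.5761968 \cdot \left( \frac{1}{15} + \frac{(x_i - 1.651)^2}{0.1827} \right) } \right]$$

This is the band plotted in the figure above.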
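The quantities in this example can be recomputed with a short script. What follows is a minimal sketch in plain Python, not a reference implementation: the function names (`fit_simple_ols`, `f_upper_quantile_df2`, `working_hotelling_band`) are hypothetical, and the F quantile uses a closed form that is valid only because the numerator degrees of freedom equal 2 in simple linear regression.

```python
import math

# Height (m) and weight (kg) data from the table above.
heights = [1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
           1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83]
weights = [52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
           63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46]


def fit_simple_ols(x, y):
    """Return (b0, b1, x_bar, sxx, mse) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)  # two estimated parameters
    return b0, b1, x_bar, sxx, mse


def f_upper_quantile_df2(alpha, d2):
    """Upper-alpha point of F(2, d2).

    Assumption: numerator df is exactly 2, where the survival function is
    (1 + 2x/d2)^(-d2/2) and can be inverted in closed form.
    """
    return (d2 / 2) * (alpha ** (-2 / d2) - 1)


def working_hotelling_band(x, y, x0, alpha=0.05):
    """Lower and upper Working-Hotelling limits for the mean response at x0."""
    n = len(x)
    b0, b1, x_bar, sxx, mse = fit_simple_ols(x, y)
    w = math.sqrt(2 * f_upper_quantile_df2(alpha, n - 2))
    half_width = w * math.sqrt(mse * (1 / n + (x0 - x_bar) ** 2 / sxx))
    centre = b0 + b1 * x0
    return centre - half_width, centre + half_width


b0, b1, x_bar, sxx, mse = fit_simple_ols(heights, weights)
w = math.sqrt(2 * f_upper_quantile_df2(0.05, len(heights) - 2))
print(f"b0={b0:.2f}  b1={b1:.2f}  MSE={mse:.4f}  Sxx={sxx:.4f}  W={w:.4f}")

# Band limits at a few illustrative heights.
for h in (1.50, 1.65, 1.80):
    lower, upper = working_hotelling_band(heights, weights, h)
    print(f"height {h:.2f} m: [{lower:.2f}, {upper:.2f}] kg")
```

Because the same multiplier is used at every height, the limits printed for different heights hold simultaneously at the stated confidence level.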

Comparison with other methods

 
Bonferroni bands for the same linear regression model, based on estimating the response variable given the observed values of X. The confidence bands are noticeably tighter.

The Working–Hotelling approach may give tighter or looser confidence limits compared to the Bonferroni correction. In general, for small families of statements, the Bonferroni bounds may be tighter, but when the number of estimated values increases, the Working–Hotelling procedure will yield narrower limits. This is because the confidence level of Working–Hotelling–Scheffé bounds is exactly $1-\alpha$ when all values of the independent variables, i.e. $x_i \in \mathbb{R}$, are considered. Alternatively, from an algebraic perspective, the critical value $\pm W$ remains constant as the number of estimates increases, whereas the corresponding value in Bonferroni estimates, $\pm t_{1-\alpha/(2g),\,\text{df}=n-p}$, grows without bound as the number $g$ of estimates increases. Therefore, the Working–Hotelling method is more suited for large-scale comparisons, whereas Bonferroni is preferred if only a few mean responses are to be estimated. In practice, both critical values are usually computed first and the narrower interval chosen.[4]
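The trade-off between the two critical values can be tabulated directly. The sketch below is illustrative only: it assumes the simple linear regression setting of the numerical example (n = 15, so 13 error degrees of freedom), the function names are hypothetical, and the Student t quantile is obtained by Simpson integration and bisection purely to keep the code free of third-party dependencies; a statistics library would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """CDF for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_upper_quantile(tail, df):
    """Value exceeded with probability `tail`, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > tail:
            lo = mid  # mid is still too small
        else:
            hi = mid
    return (lo + hi) / 2


def bonferroni_multiplier(g, alpha, df):
    """Bonferroni critical value for g two-sided intervals."""
    return t_upper_quantile(alpha / (2 * g), df)


def working_hotelling_multiplier(alpha, n):
    """W = sqrt(2 F) for simple linear regression.

    Assumption: numerator df is 2, so the F quantile has the closed form
    F = (d/2) * (alpha**(-2/d) - 1) with d = n - 2.
    """
    d = n - 2
    return math.sqrt(d * (alpha ** (-2 / d) - 1))


alpha, n = 0.05, 15
w = working_hotelling_multiplier(alpha, n)
for g in range(1, 7):
    b = bonferroni_multiplier(g, alpha, n - 2)
    narrower = "Bonferroni" if b < w else "Working-Hotelling"
    print(f"g={g}: Bonferroni={b:.4f}  W={w:.4f}  narrower: {narrower}")
```

The Working–Hotelling column stays fixed while the Bonferroni column increases with g, which is the algebraic point made above.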

Another alternative to the Working–Hotelling–Scheffé band is the Gafarian band, which is used when a confidence band is needed that maintains equal widths at all levels.[5]

The Working–Hotelling procedure is based on the same principles as Scheffé's method, which gives family confidence intervals for all possible contrasts.[6] Their proofs are almost identical.[5] This is because both methods estimate linear combinations of mean response at all factor levels. However, the Working–Hotelling procedure does not deal with contrasts but with different levels of the independent variable, so there is no requirement that the coefficients of the parameters sum up to zero. Therefore, it has one more degree of freedom.[6]

See also

  • Multiple comparisons

Footnotes

  1. ^ Miller (1966), p. 1
  2. ^ a b Miller (2014)
  3. ^ a b Neter, Wasserman and Kutner, pp. 163–165
  4. ^ a b Neter, Wasserman and Kutner, pp. 244–245
  5. ^ a b Miller (1966), pp. 123–127
  6. ^ a b Westfall, Tobias and Wolfinger, pp. 277–280

Bibliography

  • Graybill, Franklin A.; Bowden, David C. (1967-06-01). "Linear Segment Confidence Bands for Simple Linear Models". Journal of the American Statistical Association. 62 (318): 403–408. doi:10.1080/01621459.1967.10482917. ISSN 0162-1459.
  • Miller, Rupert G. (1966). Simultaneous Statistical Inference. New York: Springer-Verlag. ISBN 978-1-4613-8124-2.
  • Miller, R. (2014). "Multiple Comparisons I". Encyclopedia of Statistical Sciences. doi:10.1002/0471667196. hdl:11693/51057. ISBN 9780471667193.
  • Neter, John; Wasserman, William; Kutner, Michael (1990). Applied Linear Statistical Models. Tokyo: Richard D Irwin, Inc. ISBN 978-0-256-08338-5.
  • Westfall, Peter H; Tobias, R D; Wolfinger, Russell Dean (2011). Multiple comparisons and multiple tests using SAS. Cary, N.C.: SAS Pub. ISBN 9781607648857.
  • Working, Holbrook; Hotelling, Harold (1929-03-01). "Applications of the Theory of Error to the Interpretation of Trends". Journal of the American Statistical Association. 24 (165A): 73–85. doi:10.1080/01621459.1929.10506274. ISSN 0162-1459.
