
Deming regression

In statistics, Deming regression, named after W. Edwards Deming, is an errors-in-variables model that tries to find the line of best fit for a two-dimensional data set. It differs from simple linear regression in that it accounts for errors in observations on both the x-axis and the y-axis. It is a special case of total least squares, which allows for any number of predictors and a more complicated error structure.

Deming regression. The red lines show the error in both x and y. This is different from the traditional least squares method, which measures error parallel to the y axis. The case shown, with deviations measured perpendicularly, arises when errors in x and y have equal variances.

Deming regression is equivalent to the maximum likelihood estimation of an errors-in-variables model in which the errors for the two variables are assumed to be independent and normally distributed, and the ratio of their variances, denoted δ, is known.[1] In practice, this ratio might be estimated from related data sources; however, the regression procedure takes no account of possible errors in estimating this ratio.

The Deming regression is only slightly more difficult to compute than the simple linear regression. Most statistical software packages used in clinical chemistry offer Deming regression.

The model was originally introduced by Adcock (1878), who considered the case δ = 1, and then more generally by Kummell (1879) with arbitrary δ. However, their ideas remained largely unnoticed for more than 50 years, until they were revived by Koopmans (1936) and later propagated even more by Deming (1943). The latter book became so popular in clinical chemistry and related fields that the method was even dubbed Deming regression in those fields.[2]

Specification

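Assume that the available data $(y_i, x_i)$ are measured observations of the "true" values $(y_i^*, x_i^*)$, which lie on the regression line:

$$
\begin{aligned}
y_i &= y_i^* + \varepsilon_i, \\
x_i &= x_i^* + \eta_i,
\end{aligned}
$$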

where errors ε and η are independent and the ratio of their variances is assumed to be known:

$$
\delta = \frac{\sigma_\varepsilon^2}{\sigma_\eta^2}.
$$

In practice, the error variances of $x$ and $y$ are often unknown, which complicates the estimation of $\delta$. Note that when the measurement method for $x$ and $y$ is the same, these variances are likely to be equal, so $\delta = 1$ for this case.

We seek to find the line of "best fit"

$$
y^* = \beta_0 + \beta_1 x^*,
$$

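such that the weighted sum of squared residuals of the model is minimized:[3]

$$
SSR = \sum_{i=1}^{n} \left( \frac{\varepsilon_i^2}{\sigma_\varepsilon^2} + \frac{\eta_i^2}{\sigma_\eta^2} \right)
    = \frac{1}{\sigma_\varepsilon^2} \sum_{i=1}^{n} \Big( (y_i - \beta_0 - \beta_1 x_i^*)^2 + \delta\,(x_i - x_i^*)^2 \Big)
    \;\to\; \min_{\beta_0,\,\beta_1,\,x_1^*,\ldots,x_n^*} SSR
$$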

See Jensen (2007) for a full derivation.
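
As a hedged sketch of one step in such a derivation (the shorthand $r_i$ and $d_i$ is introduced here for illustration and does not come from the source): for fixed $\beta_0$ and $\beta_1$, each $x_i^*$ appears only in the $i$-th term of the sum, so it can be minimized out term by term.

$$
\begin{aligned}
r_i &= y_i - \beta_0 - \beta_1 x_i, \qquad d_i = x_i^* - x_i, \\
(y_i - \beta_0 - \beta_1 x_i^*)^2 + \delta\,(x_i - x_i^*)^2 &= (r_i - \beta_1 d_i)^2 + \delta\, d_i^2, \\
-2\beta_1 (r_i - \beta_1 d_i) + 2\delta\, d_i = 0 \;&\Longrightarrow\; d_i = \frac{\beta_1}{\beta_1^2 + \delta}\, r_i, \\
\min_{x_i^*} \Big[ (r_i - \beta_1 d_i)^2 + \delta\, d_i^2 \Big] &= \frac{\delta\, r_i^2}{\beta_1^2 + \delta}.
\end{aligned}
$$

Substituting back leaves a criterion in $\beta_0$ and $\beta_1$ only:

$$
SSR = \frac{\delta}{\sigma_\varepsilon^2} \sum_{i=1}^{n} \frac{(y_i - \beta_0 - \beta_1 x_i)^2}{\beta_1^2 + \delta}.
$$

With $\delta = 1$, each term is proportional to $r_i^2 / (1 + \beta_1^2)$, the squared perpendicular distance from the point $(x_i, y_i)$ to the line.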

Solution

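The solution can be expressed in terms of the second-degree sample moments. That is, we first calculate the following quantities (all sums go from i = 1 to n):

$$
\begin{aligned}
\overline{x} &= \tfrac{1}{n} \sum x_i, \qquad \overline{y} = \tfrac{1}{n} \sum y_i, \\
s_{xx} &= \tfrac{1}{n} \sum (x_i - \overline{x})^2 = \overline{x^2} - \overline{x}^2, \\
s_{xy} &= \tfrac{1}{n} \sum (x_i - \overline{x})(y_i - \overline{y}) = \overline{xy} - \overline{x}\,\overline{y}, \\
s_{yy} &= \tfrac{1}{n} \sum (y_i - \overline{y})^2 = \overline{y^2} - \overline{y}^2.
\end{aligned}
$$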

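Finally, the least-squares estimates of the model's parameters will be[4]

$$
\begin{aligned}
\hat\beta_1 &= \frac{s_{yy} - \delta s_{xx} + \sqrt{(s_{yy} - \delta s_{xx})^2 + 4 \delta s_{xy}^2}}{2 s_{xy}}, \\
\hat\beta_0 &= \overline{y} - \hat\beta_1 \overline{x}, \\
\hat{x}_i^* &= x_i + \frac{\hat\beta_1}{\hat\beta_1^2 + \delta} \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right).
\end{aligned}
$$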
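
The closed-form estimates above can be sketched in a few lines of Python. This is a minimal illustration using NumPy, not a reference implementation: the function name `deming_fit`, its argument names, and the choice to raise an error when the sample covariance is zero are assumptions made here.

```python
import numpy as np

def deming_fit(x, y, delta=1.0):
    """Closed-form Deming estimates; delta = var(eps) / var(eta) is assumed known."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Second-degree sample moments (all averages use 1/n).
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.mean((x - x_bar) ** 2)
    s_xy = np.mean((x - x_bar) * (y - y_bar))
    s_yy = np.mean((y - y_bar) ** 2)

    # The slope formula divides by s_xy, so it is undefined when s_xy is zero.
    # Raising here is a choice made for this sketch.
    if s_xy == 0:
        raise ValueError("s_xy is zero; the slope formula is undefined")

    diff = s_yy - delta * s_xx
    beta1 = (diff + np.sqrt(diff ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    beta0 = y_bar - beta1 * x_bar

    # Estimated "true" x-values for each observation.
    x_star = x + beta1 / (beta1 ** 2 + delta) * (y - beta0 - beta1 * x)
    return beta0, beta1, x_star

# Usage: points lying exactly on y = 1 + 2x should give back that line.
beta0, beta1, x_star = deming_fit([0, 1, 2, 3, 4], [1, 3, 5, 7, 9], delta=1.0)
print(beta0, beta1)
```

For data lying exactly on a line, the residuals are zero, so the fitted `x_star` values coincide with the observed `x` values up to rounding.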

Orthogonal regression

For the case of equal error variances, i.e., when $\delta = 1$, Deming regression becomes orthogonal regression: it minimizes the sum of squared perpendicular distances from the data points to the regression line. In this case, denote each observation as a point $z_j = x_j + i y_j$ in the complex plane (i.e., the point $(x_j, y_j)$, where $i$ is the imaginary unit). Denote as $S = \sum (z_j - \overline{z})^2$ the sum of the squared differences of the data points from the centroid $\overline{z} = \tfrac{1}{n} \sum z_j$ (also denoted in complex coordinates), which is the point whose horizontal and vertical locations are the averages of those of the data points. Then:[5]

  • If $S = 0$, then every line through the centroid is a line of best orthogonal fit.
  • If $S \neq 0$, the orthogonal regression line goes through the centroid and is parallel to the vector from the origin to $\sqrt{S}$.
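
The two cases above can be illustrated with a short Python sketch using NumPy's complex arithmetic. The function name `orthogonal_direction` and the tolerance used to decide whether S is zero are assumptions made for this example; in particular, the default tolerance of `np.isclose` is absolute and may need adjusting for data on a very small or very large scale.

```python
import numpy as np

def orthogonal_direction(x, y):
    """Return (centroid, direction) of the orthogonal regression line.

    Both values are complex numbers; direction is None when S is
    (numerically) zero, i.e. every line through the centroid fits equally well.
    """
    z = np.asarray(x, dtype=float) + 1j * np.asarray(y, dtype=float)
    z_bar = z.mean()                  # centroid in complex coordinates
    S = np.sum((z - z_bar) ** 2)      # complex squares, not squared moduli

    if np.isclose(abs(S), 0.0):       # tolerance is an assumption of this sketch
        return z_bar, None
    return z_bar, np.sqrt(S)          # principal complex square root

# Usage: three points on the line y = x.
centroid, direction = orthogonal_direction([0, 1, 2], [0, 1, 2])
print(centroid, direction)
```

Either complex square root of S gives the same line, since the two roots differ only in sign. When the real part of the direction is nonzero, the slope of the line is the ratio of its imaginary part to its real part.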

A trigonometric representation of the orthogonal regression line was given by Coolidge in 1913.[6]

Application

In the case of three non-collinear points in the plane, the triangle with these points as its vertices has a unique Steiner inellipse that is tangent to the triangle's sides at their midpoints. The major axis of this ellipse falls on the orthogonal regression line for the three vertices.[7] A biological cell's intrinsic noise can be quantified by applying Deming regression to the observed behavior of a two-reporter synthetic biological circuit.[8]
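
The Steiner inellipse statement above can be checked numerically for a specific triangle. The sketch below relies on Marden's theorem, which is not stated in the text above: the foci of the Steiner inellipse are the roots of the derivative of the cubic polynomial whose roots are the vertices, taken as complex numbers. The function name `steiner_foci` and the example triangle are illustrative assumptions.

```python
import numpy as np

def steiner_foci(z1, z2, z3):
    """Foci of the Steiner inellipse via Marden's theorem."""
    coeffs = np.poly([z1, z2, z3])        # cubic with the vertices as roots
    return np.roots(np.polyder(coeffs))   # roots of its derivative

# Example triangle with vertices (0, 0), (4, 0), (0, 2).
vertices = np.array([0 + 0j, 4 + 0j, 0 + 2j])

# The major axis of the ellipse passes through both foci.
f1, f2 = steiner_foci(*vertices)
axis = f1 - f2

# Direction of the orthogonal regression line: sqrt(S) for S = sum (z_j - z_bar)^2.
S = np.sum((vertices - vertices.mean()) ** 2)
direction = np.sqrt(S)

# Two plane vectors are parallel when their 2-D cross product vanishes.
cross = axis.real * direction.imag - axis.imag * direction.real
print(abs(cross))
```

A value of `abs(cross)` close to zero indicates that the major axis and the orthogonal regression line are parallel; both also pass through the centroid, which is the midpoint of the two foci.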

When humans are asked to draw a linear regression on a scatterplot by guessing, their answers are closer to orthogonal regression than to ordinary least squares regression.[9]

York regression

The York regression extends Deming regression by allowing correlated errors in x and y.[10]

See also

  • Line fitting
  • Regression dilution

References

Notes
  1. ^ Linnet 1993.
  2. ^ Cornbleet & Gochman 1979.
  3. ^ Fuller 1987, Ch. 1.3.3.
  4. ^ Glaister 2001.
  5. ^ Minda & Phelps 2008, Theorem 2.3.
  6. ^ Coolidge 1913.
  7. ^ Minda & Phelps 2008, Corollary 2.4.
  8. ^ Quarton 2020.
  9. ^ Ciccione, Lorenzo; Dehaene, Stanislas (August 2021). "Can humans perform mental regression on a graph? Accuracy and bias in the perception of scatterplots". Cognitive Psychology. 128: 101406. doi:10.1016/j.cogpsych.2021.101406.
  10. ^ York, D.; Evensen, N. M.; Martínez, M. L.; Delgado, J. D. B. (2004). "Unified equations for the slope, intercept, and standard errors of the best straight line". American Journal of Physics. 72: 367–375. doi:10.1119/1.1632486.
Bibliography
  • Adcock, R. J. (1878). "A problem in least squares". The Analyst. 5 (2): 53–54. doi:10.2307/2635758. JSTOR 2635758.
  • Coolidge, J. L. (1913). "Two geometrical applications of the mathematics of least squares". The American Mathematical Monthly. 20 (6): 187–190. doi:10.2307/2973072. JSTOR 2973072.
  • Cornbleet, P.J.; Gochman, N. (1979). "Incorrect Least–Squares Regression Coefficients". Clinical Chemistry. 25 (3): 432–438. doi:10.1093/clinchem/25.3.432. PMID 262186.
  • Deming, W. E. (1943). Statistical adjustment of data. Wiley, NY (Dover Publications edition, 1985). ISBN 0-486-64685-8.
  • Fuller, Wayne A. (1987). Measurement error models. John Wiley & Sons, Inc. ISBN 0-471-86187-1.
  • Glaister, P. (2001). "Least squares revisited". The Mathematical Gazette. 85: 104–107. doi:10.2307/3620485. JSTOR 3620485. S2CID 125949467.
  • Jensen, Anders Christian (2007). "Deming regression, MethComp package" (PDF). Gentofte, Denmark: Steno Diabetes Center.
  • Koopmans, T. C. (1936). Linear regression analysis of economic time series. De Erven F. Bohn, Haarlem, Netherlands.
  • Kummell, C. H. (1879). "Reduction of observation equations which contain more than one observed quantity". The Analyst. 6 (4): 97–105. doi:10.2307/2635646. JSTOR 2635646.
  • Linnet, K. (1993). "Evaluation of regression procedures for method comparison studies". Clinical Chemistry. 39 (3): 424–432. doi:10.1093/clinchem/39.3.424. PMID 8448852.
  • Minda, D.; Phelps, S. (2008). "Triangles, ellipses, and cubic polynomials". American Mathematical Monthly. 115 (8): 679–689. doi:10.1080/00029890.2008.11920581. MR 2456092. S2CID 15049234.
  • Quarton, T. G. (2020). "Uncoupling gene expression noise along the central dogma using genome engineered human cell lines". Nucleic Acids Research. 48 (16): 9406–9413. doi:10.1093/nar/gkaa668. PMC 7498316. PMID 32810265.
