
Root mean square deviation

The root mean square deviation (RMSD) or root mean square error (RMSE) is either one of two closely related and frequently used measures of the differences between true or predicted values on the one hand and observed values or an estimator on the other.

RMSD of a sample

The RMSD of a sample is the quadratic mean of the differences between the observed values and the predicted ones. These deviations are called residuals when the calculations are performed over the data sample that was used for estimation (and are therefore always in reference to an estimate), and are called errors (or prediction errors) when computed out-of-sample (that is, on the full set, referencing a true value rather than an estimate). The RMSD aggregates the magnitudes of the prediction errors for the various data points into a single measure of predictive power. RMSD is a measure of accuracy used to compare the forecasting errors of different models for a particular dataset, and not between datasets, because it is scale-dependent.[1]

RMSD is always non-negative, and a value of 0 (almost never achieved in practice) would indicate a perfect fit to the data. In general, a lower RMSD is better than a higher one. However, comparisons across different types of data would be invalid because the measure is dependent on the scale of the numbers used.

RMSD is the square root of the average of squared errors. The effect of each error on RMSD is proportional to the size of the squared error; thus larger errors have a disproportionately large effect on RMSD. Consequently, RMSD is sensitive to outliers.[2][3]
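
The sensitivity to outliers can be illustrated with a small sketch. The error values below are hypothetical, and the helper names `rmsd` and `squared_error_share` are chosen only for this illustration.

```python
import math

def rmsd(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def squared_error_share(errors):
    """Fraction of the total squared error contributed by each error."""
    total = sum(e * e for e in errors)
    return [e * e / total for e in errors]

# Hypothetical errors: four small ones and one outlier.
errors = [1.0, -1.0, 1.0, -1.0, 10.0]

print(rmsd(errors))                 # sqrt(104 / 5)
print(squared_error_share(errors))  # the outlier accounts for 100/104 of the total
```

In this example the single outlier is ten times larger than the other errors but contributes one hundred times as much to the sum under the square root.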

Formulas

Estimator

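The RMSD of an estimator $\hat{\theta}$ with respect to an estimated parameter $\theta$ is defined as the square root of the mean squared error:

$$\operatorname{RMSD}(\hat{\theta}) = \sqrt{\operatorname{MSE}(\hat{\theta})} = \sqrt{\operatorname{E}\left((\hat{\theta} - \theta)^2\right)}$$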

For an unbiased estimator, the RMSD is the square root of the variance, known as the standard deviation.
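
This follows from the standard decomposition of the mean squared error into variance and squared bias. As a short derivation, writing $\hat{\theta}$ for the estimator and $\theta$ for the parameter, and adding and subtracting $\operatorname{E}(\hat{\theta})$ inside the square (the cross term has zero expectation):

$$
\begin{aligned}
\operatorname{MSE}(\hat{\theta})
  &= \operatorname{E}\left((\hat{\theta} - \theta)^2\right) \\
  &= \operatorname{E}\left((\hat{\theta} - \operatorname{E}(\hat{\theta}))^2\right) + \left(\operatorname{E}(\hat{\theta}) - \theta\right)^2 \\
  &= \operatorname{Var}(\hat{\theta}) + \operatorname{Bias}(\hat{\theta})^2
\end{aligned}
$$

When the bias $\operatorname{E}(\hat{\theta}) - \theta$ is zero, the second term vanishes and $\operatorname{RMSD}(\hat{\theta}) = \sqrt{\operatorname{Var}(\hat{\theta})}$.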

Samples

If $X_1, \ldots, X_n$ is a sample of a population with true mean value $x_0$, then the RMSD of the sample is

$$\operatorname{RMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - x_0)^2}.$$

The RMSD of predicted values $\hat{y}_t$ for times $t$ of a regression's dependent variable $y_t$, with variables observed over $T$ times, is computed for $T$ different predictions as the square root of the mean of the squares of the deviations:

$$\operatorname{RMSD} = \sqrt{\frac{\sum_{t=1}^{T} (y_t - \hat{y}_t)^2}{T}}$$

(For regressions on cross-sectional data, the subscript t is replaced by i and T is replaced by n.)
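
As a minimal sketch of the regression formula above, the function below computes the RMSD from paired observed and predicted values. The function name `rmsd` and the data are hypothetical and chosen only for illustration.

```python
import math

def rmsd(observed, predicted):
    """RMSD between observed values y_t and predicted values y-hat_t."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical data: T = 4 observations and the matching predictions.
y = [3.0, 5.0, 2.5, 7.0]
y_hat = [2.5, 5.0, 4.0, 8.0]

print(rmsd(y, y_hat))  # sqrt((0.25 + 0 + 2.25 + 1) / 4) = sqrt(0.875)
```

Because the deviations are squared, swapping the two arguments gives the same value.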

In some disciplines, the RMSD is used to compare differences between two things that may vary, neither of which is accepted as the "standard". For example, when measuring the average difference between two time series $x_{1,t}$ and $x_{2,t}$, the formula becomes

$$\operatorname{RMSD} = \sqrt{\frac{\sum_{t=1}^{T} (x_{1,t} - x_{2,t})^2}{T}}$$

Normalization

Normalizing the RMSD facilitates the comparison between datasets or models with different scales. Though there is no consistent means of normalization in the literature, common choices are the mean or the range (defined as the maximum value minus the minimum value) of the measured data:[4]

$$\mathrm{NRMSD} = \frac{\mathrm{RMSD}}{y_{\max} - y_{\min}} \quad \text{or} \quad \mathrm{NRMSD} = \frac{\mathrm{RMSD}}{\bar{y}}.$$

This value is commonly referred to as the normalized root mean square deviation or error (NRMSD or NRMSE), and is often expressed as a percentage, where lower values indicate less residual variance. It is also called the coefficient of variation or percent RMS. In many cases, especially for smaller samples, the sample range is likely to be affected by the size of the sample, which would hamper comparisons.
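
The effect of normalization can be sketched as follows, assuming the range and the mean are taken from the observed (measured) data. The helper names and the data are hypothetical. The same measurements are expressed in two units to show that the RMSD changes with the scale while both normalized values do not.

```python
import math

def rmsd(observed, predicted):
    """Square root of the mean squared difference between paired values."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def nrmsd_range(observed, predicted):
    """RMSD divided by the range (max - min) of the observed data."""
    spread = max(observed) - min(observed)
    if spread == 0:
        raise ValueError("range of the observed data is zero")
    return rmsd(observed, predicted) / spread

def nrmsd_mean(observed, predicted):
    """RMSD divided by the mean of the observed data."""
    mean = sum(observed) / len(observed)
    if mean == 0:
        raise ValueError("mean of the observed data is zero")
    return rmsd(observed, predicted) / mean

# Hypothetical measurements in metres, and the same values in centimetres.
y_m = [2.0, 4.0, 6.0, 8.0]
y_hat_m = [2.5, 3.5, 6.5, 7.5]
y_cm = [100.0 * v for v in y_m]
y_hat_cm = [100.0 * v for v in y_hat_m]

print(rmsd(y_m, y_hat_m), rmsd(y_cm, y_hat_cm))                # differ by a factor of 100
print(nrmsd_range(y_m, y_hat_m), nrmsd_range(y_cm, y_hat_cm))  # agree up to rounding
print(nrmsd_mean(y_m, y_hat_m), nrmsd_mean(y_cm, y_hat_cm))    # agree up to rounding
```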

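Another possible method to make the RMSD a more useful comparison measure is to divide the RMSD by the interquartile range (IQR). Dividing the RMSD by the IQR makes the normalized value less sensitive to extreme values in the target variable.

$$\mathrm{RMSDIQR} = \frac{\mathrm{RMSD}}{\mathrm{IQR}} \quad \text{where} \quad \mathrm{IQR} = Q_3 - Q_1$$

with $Q_1 = \text{CDF}^{-1}(0.25)$ and $Q_3 = \text{CDF}^{-1}(0.75)$, where $\text{CDF}^{-1}$ is the quantile function.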
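
A minimal sketch of the IQR normalization follows. It estimates the quartiles from the sample with Python's `statistics.quantiles` using the "inclusive" method; this is one of several sample-quantile conventions and is an assumption of the example, as are the helper names and the data.

```python
import math
import statistics

def rmsd(observed, predicted):
    """Square root of the mean squared difference between paired values."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def iqr(values):
    """Interquartile range Q3 - Q1, with quartiles estimated from the sample."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def rmsd_iqr(observed, predicted):
    """RMSD divided by the interquartile range of the observed data."""
    spread = iqr(observed)
    if spread == 0:
        raise ValueError("interquartile range of the observed data is zero")
    return rmsd(observed, predicted) / spread

# Hypothetical target values, and a copy whose largest value is replaced by an extreme one.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
with_outlier = observed[:-1] + [100.0]
predicted = [y + 1.0 for y in observed]

print(iqr(observed), max(observed) - min(observed))              # IQR and range
print(iqr(with_outlier), max(with_outlier) - min(with_outlier))  # range grows, IQR does not
print(rmsd_iqr(observed, predicted))
```

In this example the extreme value changes the range from 8 to 99 while the IQR stays the same, which is why the IQR-normalized value is less affected by it.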

When normalizing by the mean value of the measurements, the term coefficient of variation of the RMSD, CV(RMSD), may be used to avoid ambiguity.[5] This is analogous to the coefficient of variation with the RMSD taking the place of the standard deviation.

$$\mathrm{CV(RMSD)} = \frac{\mathrm{RMSD}}{\bar{y}}$$

Mean absolute error

Some researchers have recommended the use of the mean absolute error (MAE) instead of the root mean square deviation. MAE is the average of the absolute values of the errors, which makes it easier to interpret than the square root of the average of squared errors. Furthermore, each error influences MAE in direct proportion to the absolute value of the error, which is not the case for RMSD.[2]
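
The difference in how the two measures weight individual errors can be sketched with two hypothetical error sets that have the same total absolute error. The helper names `mae` and `rmsd` are illustrative only.

```python
import math

def mae(errors):
    """Mean of the absolute values of the errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmsd(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Both sets have a total absolute error of 8.
even = [2.0, 2.0, 2.0, 2.0]      # error spread evenly
uneven = [0.0, 0.0, 0.0, 8.0]    # error concentrated in one point

print(mae(even), rmsd(even))      # 2.0 2.0
print(mae(uneven), rmsd(uneven))  # 2.0 4.0
```

MAE is the same for both sets, while RMSD doubles when the error is concentrated in a single point. For any set of errors, RMSD is at least as large as MAE, with equality only when all errors have the same magnitude.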

Applications

  • In meteorology, the RMSD is used to see how effectively a mathematical model predicts the behavior of the atmosphere.
  • In bioinformatics, the root mean square deviation of atomic positions is the measure of the average distance between the atoms of superimposed proteins.
  • In structure-based drug design, the RMSD is a measure of the difference between the crystal conformation of a ligand and a docking prediction.
  • In economics, the RMSD is used to determine whether an economic model fits economic indicators. Some experts have argued that RMSD is less reliable than Relative Absolute Error.[6]
  • In experimental psychology, the RMSD is used to assess how well mathematical or computational models of behavior explain the empirically observed behavior.
  • In GIS, the RMSD is one measure used to assess the accuracy of spatial analysis and remote sensing.
  • In hydrogeology, RMSD and NRMSD are used to evaluate the calibration of a groundwater model.[7]
  • In imaging science, the RMSD is part of the peak signal-to-noise ratio, a measure used to assess how well a method to reconstruct an image performs relative to the original image.
  • In computational neuroscience, the RMSD is used to assess how well a system learns a given model.[8]
  • In protein nuclear magnetic resonance spectroscopy, the RMSD is used as a measure to estimate the quality of the obtained bundle of structures.
  • Submissions for the Netflix Prize were judged using the RMSD from the test dataset's undisclosed "true" values.
  • In the simulation of energy consumption of buildings, the RMSE and CV(RMSE) are used to calibrate models to measured building performance.[9]
  • In X-ray crystallography, RMSD (and RMSZ) is used to measure the deviation of the molecular internal coordinates from the restraints library values.
  • In control theory, the RMSE is used as a quality measure to evaluate the performance of a state observer.[10]
  • In fluid dynamics, normalized root mean square deviation (NRMSD), coefficient of variation (CV), and percent RMS are used to quantify the uniformity of flow behavior such as velocity profile, temperature distribution, or gas species concentration. The value is compared to industry standards to optimize the design of flow and thermal equipment and processes.
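
The imaging-science item in the list above refers to the peak signal-to-noise ratio (PSNR). A minimal sketch follows, assuming the common definition PSNR = 20 log10(MAX / RMSD) in decibels, where MAX is the largest possible pixel value. The function name `psnr`, the default of 255 (8-bit images), and the flat pixel lists are assumptions made for this illustration.

```python
import math

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB, computed from the RMSD of the pixel values."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    if not original:
        raise ValueError("at least one pixel is required")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no error to compare against
    return 20.0 * math.log10(max_value / math.sqrt(mse))

# Hypothetical flattened 2x2 images with 8-bit pixel values.
original = [52.0, 55.0, 61.0, 59.0]
reconstructed = [54.0, 55.0, 60.0, 57.0]

print(psnr(original, reconstructed))
```

A lower RMSD between the original and the reconstruction gives a higher PSNR.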

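See also

  • Root mean square
  • Mean absolute error
  • Average absolute deviation
  • Mean signed deviation
  • Mean squared deviation
  • Squared deviations
  • Errors and residuals in statistics
  • Coefficient of variation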

References

  1. Hyndman, Rob J.; Koehler, Anne B. (2006). "Another look at measures of forecast accuracy". International Journal of Forecasting. 22 (4): 679–688. CiteSeerX 10.1.1.154.9771. doi:10.1016/j.ijforecast.2006.03.001. S2CID 15947215.
  2. Pontius, Robert; Thontteh, Olufunmilayo; Chen, Hao (2008). "Components of information for multiple resolution comparison between maps that share a real variable" (PDF). Environmental and Ecological Statistics. 15 (2): 111–142. Bibcode:2008EnvES..15..111P. doi:10.1007/s10651-007-0043-y. S2CID 21427573.
  3. Willmott, Cort; Matsuura, Kenji (2006). "On the use of dimensioned measures of error to evaluate the performance of spatial interpolators". International Journal of Geographical Information Science. 20 (1): 89–102. Bibcode:2006IJGIS..20...89W. doi:10.1080/13658810500286976. S2CID 15407960.
  4. "Coastal Inlets Research Program (CIRP) Wiki - Statistics". Retrieved 4 February 2015.
  5. "FAQ: What is the coefficient of variation?". Retrieved 19 February 2019.
  6. Armstrong, J. Scott; Collopy, Fred (1992). "Error Measures For Generalizing About Forecasting Methods: Empirical Comparisons" (PDF). International Journal of Forecasting. 8 (1): 69–80. CiteSeerX 10.1.1.423.508. doi:10.1016/0169-2070(92)90008-w. S2CID 11034360.
  7. Anderson, M.P.; Woessner, W.W. (1992). Applied Groundwater Modeling: Simulation of Flow and Advective Transport (2nd ed.). Academic Press.
  8. Ensemble Neural Network Model
  9. ANSI/BPI-2400-S-2012: Standard Practice for Standardized Qualification of Whole-House Energy Savings Predictions by Calibration to Energy Use History
  10. https://kalman-filter.com/root-mean-square-error
