Wikipedia

Missing data

In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data.

Missing data can occur because of nonresponse: no information is provided for one or more items or for a whole unit ("subject"). Some items are more likely to generate a nonresponse than others, for example items about private subjects such as income. Attrition is a type of missingness that can occur in longitudinal studies, for instance a study of development in which a measurement is repeated after a certain period of time. Missingness occurs when participants drop out before the study ends and one or more measurements are missing.

Data often are missing in research in economics, sociology, and political science because governments or private entities choose not to, or fail to, report critical statistics,[1] or because the information is not available. Sometimes missing values are caused by the researcher—for example, when data collection is done improperly or mistakes are made in data entry.[2]

Missingness is classified into three types, with different impacts on the validity of conclusions from research: missing completely at random, missing at random, and missing not at random. Missing data can be handled in a similar way to censored data.

Types

Understanding the reasons why data are missing is important for handling the remaining data correctly. If values are missing completely at random, the data sample is likely still representative of the population. But if the values are missing systematically, analysis may be biased. For example, in a study of the relation between IQ and income, if participants with an above-average IQ tend to skip the question ‘What is your salary?’, analyses that do not take into account this missing at random (MAR) pattern (see below) may falsely fail to find a positive association between IQ and salary. Because of these problems, methodologists routinely advise researchers to design studies to minimize the occurrence of missing values.[2] Graphical models can be used to describe the missing data mechanism in detail.[3][4]

 
Figure: probability distributions of the estimates of the expected intensity of depression in the population, for samples of 60 cases. The true population is assumed to follow a standard normal distribution, and the non-response probability is a logistic function of the intensity of depression. The more data are missing (MNAR), the more biased the estimates are: the intensity of depression in the population is underestimated.
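The setting described in the figure can be reproduced approximately with a short simulation. The following is a minimal sketch, not the procedure used to draw the original graph: the function name `simulate_mnar_means`, the `strength` parameter controlling how steeply the logistic non-response probability rises with depression intensity, and the number of repetitions are all assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_mnar_means(strength, n_cases=60, n_reps=2000, seed=0):
    """Average of the observed-case mean over many simulated samples.

    Depression intensity x is drawn from a standard normal distribution.
    Each x is missing with probability 1 / (1 + exp(-strength * x)), so
    with strength > 0 high intensities are more likely to be missing.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_reps):
        observed = []
        for _ in range(n_cases):
            x = rng.gauss(0.0, 1.0)
            p_missing = 1.0 / (1.0 + math.exp(-strength * x))
            if rng.random() >= p_missing:
                observed.append(x)
        if observed:  # skip the rare sample with no observed value
            means.append(statistics.fmean(observed))
    return statistics.fmean(means)


# strength = 0 makes the non-response probability a constant 0.5 (no MNAR).
no_dependence = simulate_mnar_means(strength=0.0)
strong_dependence = simulate_mnar_means(strength=2.0)
print(f"average estimate, strength 0: {no_dependence:+.3f}")
print(f"average estimate, strength 2: {strong_dependence:+.3f}")
```

The true population mean is 0 in both runs. When missingness does not depend on the value itself, the estimates centre on 0; when it does, they are shifted downward, which corresponds to underestimating the intensity of depression.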

Missing completely at random

Values in a data set are missing completely at random (MCAR) if the events that lead to any particular data-item being missing are independent both of observable variables and of unobservable parameters of interest, and occur entirely at random.[5] When data are MCAR, the analysis performed on the data is unbiased; however, data are rarely MCAR.

In the case of MCAR, the missingness of data is unrelated to any study variable: thus, the participants with completely observed data are in effect a random sample of all the participants assigned a particular intervention. With MCAR, the random assignment of treatments is assumed to be preserved, but that is usually an unrealistically strong assumption in practice.[6]

Missing at random

Missing at random (MAR) occurs when the missingness is not random, but can be fully accounted for by variables for which there is complete information.[7] Since MAR is an assumption that is impossible to verify statistically, we must rely on its substantive reasonableness.[8] An example is that males are less likely to fill in a depression survey, but this has nothing to do with their level of depression after accounting for maleness. Depending on the analysis method, these data can still induce parameter bias in analyses due to the contingent emptiness of cells (for example, the cell for males with very high depression may have zero entries). However, if the parameter is estimated with full information maximum likelihood, MAR will provide asymptotically unbiased estimates.[citation needed]
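The depression-survey example can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a recommended analysis: the group means (1.0 for men, 0.0 for women), the response rates (0.3 and 0.9) and all variable names are hypothetical values chosen so that the effect is visible. Because sex is recorded for everyone, averaging within each sex and then weighting by the share of each sex in the full sample removes the bias of the naive respondent average.

```python
import random
import statistics

rng = random.Random(1)
rows = []  # (is_male, depression score, or None if the survey was not filled in)
for _ in range(20000):
    is_male = rng.random() < 0.5
    # Hypothetical population: mean score 1.0 for men and 0.0 for women.
    score = rng.gauss(1.0 if is_male else 0.0, 1.0)
    # MAR: the response probability depends only on sex, which is always observed.
    p_respond = 0.3 if is_male else 0.9
    rows.append((is_male, score if rng.random() < p_respond else None))

true_mean = 0.5  # 0.5 * 1.0 + 0.5 * 0.0 by construction

# Naive estimate: average over respondents only.
naive_mean = statistics.fmean(s for _, s in rows if s is not None)

# Adjusted estimate: average within each sex, weighted by the sex's sample share.
adjusted_mean = 0.0
for sex in (True, False):
    group = [s for m, s in rows if m == sex]
    observed = [s for s in group if s is not None]
    adjusted_mean += len(group) / len(rows) * statistics.fmean(observed)

print(f"true {true_mean}, naive {naive_mean:.3f}, adjusted {adjusted_mean:.3f}")
```

The naive average is pulled toward the group that responds more often, while the adjusted average stays close to the true value. The adjustment works only because the variable driving the missingness is fully observed, which is exactly what MAR assumes.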

Missing not at random

Missing not at random (MNAR) (also known as nonignorable nonresponse) is data that is neither MAR nor MCAR (i.e. the value of the variable that's missing is related to the reason it's missing).[5] To extend the previous example, this would occur if men failed to fill in a depression survey because of their level of depression.

Samuelson and Spirer (1992) discussed how missing and/or distorted data about demographics, law enforcement, and health could be indicators of patterns of human rights violations. They gave several fairly well documented examples.[9]

Techniques of dealing with missing data

Missing data reduces the representativeness of the sample and can therefore distort inferences about the population. Generally speaking, there are three main approaches to handling missing data: (1) imputation, where values are filled in in place of the missing data; (2) omission, where samples with invalid data are discarded from further analysis; and (3) analysis, by directly applying methods unaffected by the missing values. One systematic review addressing the prevention and handling of missing data for patient-centered outcomes research identified 10 standards as necessary for the prevention and handling of missing data. These include standards for study design, study conduct, analysis, and reporting.[10]

In some practical applications, the experimenters can control the level of missingness and prevent missing values before gathering the data. For example, in computer questionnaires, it is often not possible to skip a question: a question has to be answered, otherwise one cannot continue to the next. Missing values due to the participant are thus eliminated by this type of questionnaire, though this method may not be permitted by an ethics board overseeing the research. In survey research, it is common to make multiple efforts to contact each individual in the sample, often sending letters to attempt to persuade those who have decided not to participate to change their minds.[11]: 161–187  However, such techniques can either help or hurt in terms of reducing the negative inferential effects of missing data, because the kind of people who are willing to be persuaded to participate after initially refusing or not being home are likely to be significantly different from the kinds of people who will still refuse or remain unreachable after additional effort.[11]: 188–198 

In situations where missing values are likely to occur, the researcher is often advised to plan to use methods of data analysis that are robust to missingness. An analysis is robust when we are confident that mild to moderate violations of the technique's key assumptions will produce little or no bias or distortion in the conclusions drawn about the population.

Imputation

Some data analysis techniques are not robust to missingness and require the missing data to be "filled in", or imputed. Rubin (1987) argued that repeating imputation even a few times (5 or fewer) enormously improves the quality of estimation.[2] For many practical purposes, 2 or 3 imputations capture most of the relative efficiency that could be captured with a larger number of imputations. However, too small a number of imputations can lead to a substantial loss of statistical power, and some scholars now recommend 20 to 100 or more.[12] Any multiply-imputed data analysis must be repeated for each of the imputed data sets and, in some cases, the relevant statistics must be combined in a relatively complicated way.[2] Multiple imputation is not conducted in some disciplines, owing to a lack of training or to misconceptions about the method.[13] Methods such as listwise deletion have been used to handle missing data instead, but they have been found to introduce additional bias.[14] A beginner's guide provides step-by-step instructions on how to impute data.[15]
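The combination step for multiply-imputed analyses is commonly carried out with the pooling formulas known as Rubin's rules, and the claim about a small number of imputations rests on the relative efficiency 1 / (1 + γ/m), where γ is the fraction of missing information and m the number of imputations. The sketch below illustrates both; the function names and the five estimates and variances are hypothetical and do not come from any real study.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # average sampling variance
    between = statistics.variance(estimates)   # variance across imputations
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return pooled, total


def relative_efficiency(missing_fraction, m):
    """Efficiency of m imputations relative to an infinite number."""
    return 1 / (1 + missing_fraction / m)


# Hypothetical results from analysing m = 5 imputed data sets.
estimates = [2.1, 1.9, 2.3, 2.0, 2.2]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]
pooled, total = pool_rubin(estimates, variances)
print(f"pooled estimate {pooled:.2f}, standard error {total ** 0.5:.3f}")

for m in (3, 5, 20):
    print(m, round(relative_efficiency(0.3, m), 3))
```

The efficiency figures change little as m grows, which is the sense in which a few imputations capture most of the relative efficiency; the later recommendations of 20 or more imputations concern statistical power rather than this quantity.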

The expectation-maximization algorithm is an approach in which values of the statistics which would be computed if a complete dataset were available are estimated (imputed), taking into account the pattern of missing data. In this approach, values for individual missing data-items are not usually imputed.
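As an illustration of estimating statistics rather than individual values, the following sketch applies expectation-maximization to a bivariate normal model in which x is always observed and y is sometimes missing. It is a simplified example under assumed conditions: the function name `em_bivariate_normal`, the simulated data and the missingness rule (y is usually missing when x is positive) are all hypothetical. The E-step replaces the sums of y, y² and xy by their expected values given x, and the M-step recomputes the mean, variance and covariance from those sums.

```python
import random
import statistics


def em_bivariate_normal(xs, ys, n_iter=200):
    """Estimate (mean_y, var_y, cov_xy) when some entries of ys are None."""
    n = len(xs)
    mean_x = statistics.fmean(xs)
    var_x = statistics.fmean((x - mean_x) ** 2 for x in xs)
    seen = [y for y in ys if y is not None]
    mean_y, var_y, cov_xy = statistics.fmean(seen), statistics.pvariance(seen), 0.0
    for _ in range(n_iter):
        slope = cov_xy / var_x
        resid_var = var_y - slope * cov_xy  # variance of y given x
        sum_y = sum_yy = sum_xy = 0.0
        for x, y in zip(xs, ys):
            if y is None:  # E-step: expected y and y^2 given x
                ey = mean_y + slope * (x - mean_x)
                eyy = ey * ey + resid_var
            else:
                ey, eyy = y, y * y
            sum_y += ey
            sum_yy += eyy
            sum_xy += x * ey
        # M-step: update the parameters from the expected sums.
        mean_y = sum_y / n
        var_y = sum_yy / n - mean_y ** 2
        cov_xy = sum_xy / n - mean_x * mean_y
    return mean_y, var_y, cov_xy


rng = random.Random(4)
xs, ys = [], []
for _ in range(5000):
    x = rng.gauss(0.0, 1.0)
    y = x + rng.gauss(0.0, 1.0)  # true mean of y is 0, true covariance is 1
    missing = x > 0 and rng.random() < 0.8  # assumed MAR mechanism
    xs.append(x)
    ys.append(None if missing else y)

em_mean, em_var, em_cov = em_bivariate_normal(xs, ys)
cc_mean = statistics.fmean(y for y in ys if y is not None)
print(f"complete-case mean of y {cc_mean:+.3f}, EM estimate {em_mean:+.3f}")
```

No individual missing y is ever filled in and kept; only the summary statistics are updated, and the resulting estimate of the mean is much closer to the true value than the complete-case average.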

Interpolation

In the mathematical field of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points.
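A minimal sketch of linear interpolation applied to gaps in a series is given below. It assumes equally spaced observations, and the function name `interpolate_linear` is chosen for illustration only. Gaps at the start or end of the series are left untouched, because interpolation only constructs points within the range of the known data.

```python
def interpolate_linear(values):
    """Fill interior None gaps by linear interpolation between neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


series = [1.0, None, None, 4.0, None, 6.0]
print(interpolate_linear(series))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```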

In the comparison of two paired samples with missing data, a test statistic that uses all available data without the need for imputation is the partially overlapping samples t-test.[16] This is valid under normality and assuming MCAR.

Partial deletion

Methods which involve reducing the data available to a dataset having no missing values include:

  • Listwise deletion/casewise deletion
  • Pairwise deletion
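The difference between listwise deletion and pairwise deletion can be sketched as follows. The small data set and the helper names `listwise` and `pairwise` are hypothetical. Listwise deletion discards every record with any missing value, whereas pairwise deletion discards a record only from those analyses that involve one of its missing variables.

```python
rows = [
    {"age": 34, "income": 52.0, "score": 7.0},
    {"age": 51, "income": None, "score": 5.0},
    {"age": 29, "income": 48.0, "score": None},
    {"age": 45, "income": 61.0, "score": 6.0},
]


def listwise(rows):
    """Keep only records with no missing value in any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise(rows, a, b):
    """Keep records in which both variables of the current analysis are observed."""
    return [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]


print(len(listwise(rows)))                   # records usable in every analysis
print(len(pairwise(rows, "age", "income")))  # records usable for age vs income
print(len(pairwise(rows, "age", "score")))   # records usable for age vs score
```

Pairwise deletion retains more data for each individual analysis, at the cost of basing different analyses on different subsets of records.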

Full analysis

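Methods which take full account of all information available, without the distortion resulting from using imputed values as if they were actually observed:

  • Generative approaches: the expectation-maximization algorithm; full information maximum likelihood estimation
  • Discriminative approaches: max-margin classification of data with absent features[17][18]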

Partial identification methods may also be used.[19]

Model-based techniques

Model-based techniques, often using graphs, offer additional tools for testing missing data types (MCAR, MAR, MNAR) and for estimating parameters under missing data conditions. For example, a test for refuting MAR/MCAR reads as follows:

For any three variables X, Y, and Z where Z is fully observed and X and Y are partially observed, the data should satisfy: $X \perp\!\!\!\perp R_y \mid (R_x, Z)$, where $R_x$ and $R_y$ are the missingness indicators of X and Y.

In words, the observed portion of X should be independent of the missingness status of Y, conditional on every value of Z. Failure to satisfy this condition indicates that the problem belongs to the MNAR category.[20]
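The condition can be checked informally on discrete data by comparing, within each value of Z, the distribution of the observed X values between records where Y is observed and records where Y is missing. The sketch below is an assumption-laden illustration, not the procedure of the cited paper: the names `max_gap` and `simulate`, the binary variables and all probabilities are hypothetical, and a real analysis would use a formal statistical test rather than a raw difference in proportions.

```python
import random
from collections import defaultdict


def max_gap(rows):
    """Largest difference, over z, between P(X=1) when Y is observed and when
    Y is missing, using only rows in which X itself is observed.

    Each row is (x, y, z); x or y is None when missing.
    """
    counts = defaultdict(lambda: [0, 0])  # (z, y_missing) -> [n, n with x == 1]
    for x, y, z in rows:
        if x is None:
            continue  # only the observed portion of X can be used
        cell = counts[(z, y is None)]
        cell[0] += 1
        cell[1] += x
    gaps = []
    for z in {key[0] for key in counts}:
        seen, miss = counts[(z, False)], counts[(z, True)]
        if seen[0] and miss[0]:
            gaps.append(abs(seen[1] / seen[0] - miss[1] / miss[0]))
    return max(gaps)


def simulate(y_missing_depends_on_x, n=20000, seed=2):
    """Binary X, Y, Z; the missingness of Y depends on Z and optionally on X."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)
        x = int(rng.random() < 0.3 + 0.4 * z)
        y = int(rng.random() < 0.5)
        p_miss_y = 0.2 + 0.3 * z + (0.4 * x if y_missing_depends_on_x else 0.0)
        y_obs = None if rng.random() < p_miss_y else y
        x_obs = None if rng.random() < 0.3 else x
        rows.append((x_obs, y_obs, z))
    return rows


print(f"condition holds:    gap = {max_gap(simulate(False)):.3f}")
print(f"condition violated: gap = {max_gap(simulate(True)):.3f}")
```

A gap near zero is consistent with the condition, whereas a large gap refutes it for the data at hand.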

(Remark: These tests are necessary for variable-based MAR which is a slight variation of event-based MAR.[21][22][23])

When data fall into the MNAR category, techniques are available for consistently estimating parameters when certain conditions hold in the model.[3] For example, if Y explains the reason for missingness in X and Y itself has missing values, the joint probability distribution of X and Y can still be estimated if the missingness of Y is random. The estimand in this case will be:

$$P(X, Y) = P(X \mid Y)\,P(Y) = P(X \mid Y, R_x = 0, R_y = 0)\,P(Y \mid R_y = 0)$$

where $R_x = 0$ and $R_y = 0$ denote the observed portions of their respective variables.

Different model structures may yield different estimands and different procedures of estimation whenever consistent estimation is possible. The preceding estimand calls for first estimating $P(X \mid Y)$ from complete data and multiplying it by $P(Y)$ estimated from cases in which Y is observed regardless of the status of X. Moreover, in order to obtain a consistent estimate it is crucial that the first term be $P(X \mid Y)$ as opposed to $P(Y \mid X)$.
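The two-step estimation described above can be sketched for binary variables. The example assumes a hypothetical model in which Y determines the probability that X is missing and Y itself is missing purely at random; the probabilities and the function names `estimand` and `complete_case` are chosen for illustration only. The conditional distribution of X given Y is taken from complete records, and the distribution of Y from all records in which Y is observed.

```python
import random
from collections import Counter

rng = random.Random(3)
rows = []
for _ in range(50000):
    y = int(rng.random() < 0.4)
    x = int(rng.random() < (0.8 if y else 0.2))
    x_obs = None if rng.random() < (0.7 if y else 0.1) else x  # depends on Y
    y_obs = None if rng.random() < 0.3 else y                  # purely random
    rows.append((x_obs, y_obs))

complete = Counter((x, y) for x, y in rows if x is not None and y is not None)
y_seen = Counter(y for _, y in rows if y is not None)


def estimand(x, y):
    """P(X=x | Y=y, X and Y both observed) * P(Y=y | Y observed)."""
    p_x_given_y = complete[(x, y)] / (complete[(0, y)] + complete[(1, y)])
    p_y = y_seen[y] / sum(y_seen.values())
    return p_x_given_y * p_y


def complete_case(x, y):
    """Naive estimate of P(X=x, Y=y) from fully observed records only."""
    return complete[(x, y)] / sum(complete.values())


print(f"true P(X=1, Y=1) = {0.4 * 0.8:.3f}")
print(f"estimand         = {estimand(1, 1):.3f}")
print(f"complete cases   = {complete_case(1, 1):.3f}")
```

In this setting the complete-case estimate is badly biased, because records with Y = 1 are much more likely to lose X, while the factorised estimand stays close to the true joint probability.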

In many cases model-based techniques permit the model structure to undergo refutation tests.[23] Any model which implies the independence between a partially observed variable X and the missingness indicator of another variable Y (i.e. $R_y$), conditional on $R_x$, can be submitted to the following refutation test: $X \perp\!\!\!\perp R_y \mid R_x = 0$.

Finally, the estimands that emerge from these techniques are derived in closed form and do not require iterative procedures such as Expectation Maximization that are susceptible to local optima.[24]

A special class of problems appears when the probability of missingness depends on time. For example, in trauma databases the probability of losing data about the trauma outcome depends on the day after the trauma. In these cases various non-stationary Markov chain models are applied.[25]

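See also

  • Censoring
  • Expectation-maximization algorithm
  • Imputation
  • Indicator variable
  • Inverse probability weighting
  • Latent variable
  • Matrix completion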

References

  1. ^ Messner SF (1992). "Exploring the Consequences of Erratic Data Reporting for Cross-National Research on Homicide". Journal of Quantitative Criminology. 8 (2): 155–173. doi:10.1007/bf01066742. S2CID 133325281.
  2. ^ a b c d Hand, David J.; Adèr, Herman J.; Mellenbergh, Gideon J. (2008). Advising on Research Methods: A Consultant's Companion. Huizen, Netherlands: Johannes van Kessel. pp. 305–332. ISBN 978-90-79418-01-5.
  3. ^ a b Mohan, Karthika; Pearl, Judea; Tian, Jin (2013). "Graphical Models for Inference with Missing Data". Advances in Neural Information Processing Systems 26. pp. 1277–1285.
  4. ^ Karvanen, Juha (2015). "Study design in causal models". Scandinavian Journal of Statistics. 42 (2): 361–377. arXiv:1211.2958. doi:10.1111/sjos.12110. S2CID 53642701.
  5. ^ a b Polit DF, Beck CT (2012). Nursing Research: Generating and Assessing Evidence for Nursing Practice, 9th ed. Philadelphia, USA: Wolters Kluwer Health, Lippincott Williams & Wilkins.
  6. ^ Deng (2012-10-05). "On Biostatistics and Clinical Trials". Archived from the original on 15 March 2016. Retrieved 13 May 2016.
  7. ^ "Home". Archived from the original on 2015-09-10. Retrieved 2015-08-01.
  8. ^ Little, Roderick J. A.; Rubin, Donald B. (2002), Statistical Analysis with Missing Data (2nd ed.), Wiley.
  9. ^ Samuelson, Douglas A.; Spirer, Herbert F. (1992-12-31), "Chapter 3. Use of Incomplete and Distorted Data in Inference About Human Rights Violations", Human Rights and Statistics, University of Pennsylvania Press, pp. 62–78, doi:10.9783/9781512802863-006, ISBN 9781512802863, retrieved 2022-08-18
  10. ^ Li, Tianjing; Hutfless, Susan; Scharfstein, Daniel O.; Daniels, Michael J.; Hogan, Joseph W.; Little, Roderick J.A.; Roy, Jason A.; Law, Andrew H.; Dickersin, Kay (2014). "Standards should be applied in the prevention and handling of missing data for patient-centered outcomes research: a systematic review and expert consensus". Journal of Clinical Epidemiology. 67 (1): 15–32. doi:10.1016/j.jclinepi.2013.08.013. PMC 4631258. PMID 24262770.
  11. ^ a b Stoop, I.; Billiet, J.; Koch, A.; Fitzgerald, R. (2010). Reducing Survey Nonresponse: Lessons Learned from the European Social Survey. Oxford: Wiley-Blackwell. ISBN 978-0-470-51669-0.
  12. ^ Graham J.W.; Olchowski A.E.; Gilreath T.D. (2007). "How Many Imputations Are Really Needed? Some Practical Clarifications of Multiple Imputation Theory". Prevention Science. 8 (3): 208–213. CiteSeerX 10.1.1.595.7125. doi:10.1007/s11121-007-0070-9. PMID 17549635. S2CID 24566076.
  13. ^ van Ginkel, Joost R.; Linting, Marielle; Rippe, Ralph C. A.; van der Voort, Anja (2020-05-03). "Rebutting Existing Misconceptions About Multiple Imputation as a Method for Handling Missing Data". Journal of Personality Assessment. 102 (3): 297–308. doi:10.1080/00223891.2018.1530680. ISSN 0022-3891. PMID 30657714.
  14. ^ van Buuren, S. (2018). Flexible imputation of missing data (2nd ed.). CRC Press.
  15. ^ Woods, Adrienne D.; Gerasimova, Daria; Van Dusen, Ben; Nissen, Jayson; Bainter, Sierra; Uzdavines, Alex; Davis‐Kean, Pamela E.; Halvorson, Max; King, Kevin M.; Logan, Jessica A. R.; Xu, Menglin; Vasilev, Martin R.; Clay, James M.; Moreau, David; Joyal‐Desmarais, Keven (2023-02-23). "Best practices for addressing missing data through multiple imputation". Infant and Child Development. doi:10.1002/icd.2407. ISSN 1522-7227.
  16. ^ Derrick, B; Russ, B; Toher, D; White, P (2017). "Test Statistics for the Comparison of Means for Two Samples That Include Both Paired and Independent Observations". Journal of Modern Applied Statistical Methods. 16 (1): 137–157. doi:10.22237/jmasm/1493597280.
  17. ^ Chechik, Gal; Heitz, Geremy; Elidan, Gal; Abbeel, Pieter; Koller, Daphne (2008-06-01). "Max-margin Classification of incomplete data" (PDF). Neural Information Processing Systems: 233–240.
  18. ^ Chechik, Gal; Heitz, Geremy; Elidan, Gal; Abbeel, Pieter; Koller, Daphne (2008-06-01). "Max-margin Classification of Data with Absent Features". The Journal of Machine Learning Research. 9: 1–21. ISSN 1532-4435.
  19. ^ Tamer, Elie (2010). "Partial Identification in Econometrics". Annual Review of Economics. 2 (1): 167–195. doi:10.1146/annurev.economics.050708.143401.
  20. ^ Mohan, Karthika; Pearl, Judea (2014). "On the testability of models with missing data". Proceedings of AISTAT-2014, Forthcoming.
  21. ^ Darwiche, Adnan (2009). Modeling and Reasoning with Bayesian Networks. Cambridge University Press.
  22. ^ Potthoff, R.F.; Tudor, G.E.; Pieper, K.S.; Hasselblad, V. (2006). "Can one assess whether missing data are missing at random in medical studies?". Statistical Methods in Medical Research. 15 (3): 213–234. doi:10.1191/0962280206sm448oa. PMID 16768297. S2CID 12882831.
  23. ^ a b Pearl, Judea; Mohan, Karthika (2013). Recoverability and Testability of Missing data: Introduction and Summary of Results (PDF) (Technical report). UCLA Computer Science Department, R-417.
  24. ^ Mohan, K.; Van den Broeck, G.; Choi, A.; Pearl, J. (2014). "An Efficient Method for Bayesian Network Parameter Learning from Incomplete Data". Presented at Causal Modeling and Machine Learning Workshop, ICML-2014.
  25. ^ Mirkes, E.M.; Coats, T.J.; Levesley, J.; Gorban, A.N. (2016). "Handling missing data in large healthcare dataset: A case study of unknown trauma outcomes". Computers in Biology and Medicine. 75: 203–216. arXiv:1604.00627. Bibcode:2016arXiv160400627M. doi:10.1016/j.compbiomed.2016.06.004. PMID 27318570. S2CID 5874067. Archived from the original on 2016-08-05.

Further reading

  • Acock AC (2005), "Working with missing values", Journal of Marriage and Family, 67 (4): 1012–28, doi:10.1111/j.1741-3737.2005.00191.x, archived from the original on 2013-01-05
  • Allison, Paul D. (2001), Missing Data, SAGE Publishing
  • Bouza-Herrera, Carlos N. (2013), Handling Missing Data in Ranked Set Sampling, Springer
  • Enders, Craig K. (2010), Applied Missing Data Analysis, Guilford Press
  • Graham, John W. (2012), Missing Data, Springer
  • Molenberghs, Geert; Fitzmaurice, Garrett; Kenward, Michael G.; Tsiatis, Anastasios; Verbeke, Geert, eds. (2015), Handbook of Missing Data Methodology, Chapman & Hall
  • Raghunathan, Trivellore (2016), Missing Data Analysis in Practice, Chapman & Hall
  • Little, Roderick J. A.; Rubin, Donald B. (2002), Statistical Analysis with Missing Data (2nd ed.), Wiley
  • Tsiatis, Anastasios A. (2006), Semiparametric Theory and Missing Data, Springer
  • Van den Broeck J, Cunningham SA, Eeckels R, Herbst K (2005), "Data cleaning: detecting, diagnosing, and editing data abnormalities", PLOS Medicine, 2 (10): e267, doi:10.1371/journal.pmed.0020267, PMC 1198040, PMID 16138788, S2CID 5667073
  • Zarate LE, Nogueira BM, Santos TR, Song MA (2006). "Techniques for Missing Value Recovering in Imbalanced Databases: Application in a marketing database with massive missing data". IEEE International Conference on Systems, Man and Cybernetics, 2006. SMC '06. Vol. 3. pp. 2658–2664. doi:10.1109/ICSMC.2006.385265.

External links

Background

  • Missing Data, Department of Medical Statistics, London School of Hygiene & Tropical Medicine
  • Spatial and temporal Trend Analysis of Long Term rainfall records in data-poor catchments with missing data, a case study of Lower Shire floodplain in Malawi for the period 1953–2010.
  • R-miss-tastic, A unified platform for missing values methods and workflows.
  • Missing values-envision

Software

  • Mplus
  • PROC MI and PROC MIANALYZE - SAS
  • SPSS

Ranked Set Sampling Springer Enders Craig K 2010 Applied Missing Data Analysis Guilford Press Graham John W 2012 Missing Data Springer Molenberghs Geert Fitzmaurice Garrett Kenward Michael G Tsiatis Anastasios Verbeke Geert eds 2015 Handbook of Missing Data Methodology Chapman amp Hall Raghunathan Trivellore 2016 Missing Data Analysis in Practice Chapman amp Hall Little Roderick J A Rubin Donald B 2002 Statistical Analysis with Missing Data 2nd ed Wiley Tsiatis Anastasios A 2006 Semiparametric Theory and Missing Data Springer Van den Broeck J Cunningham SA Eeckels R Herbst K 2005 Data cleaning detecting diagnosing and editing data abnormalities PLOS Medicine 2 10 e267 doi 10 1371 journal pmed 0020267 PMC 1198040 PMID 16138788 S2CID 5667073 Zarate LE Nogueira BM Santos TR Song MA 2006 Techniques for Missing Value Recovering in Imbalanced Databases Application in a marketing database with massive missing data IEEE International Conference on Systems Man and Cybernetics 2006 SMC 06 Vol 3 pp 2658 2664 doi 10 1109 ICSMC 2006 385265 External links EditBackground Edit Missing Data Department of Medical Statistics London School of Hygiene amp Tropical Medicine Spatial and temporal Trend Analysis of Long Term rainfall records in data poor catchments with missing data a case study of Lower Shire floodplain in Malawi for the period 1953 2010 R miss tastic A unified platform for missing values methods and workflows Missing values envisionSoftware Edit Mplus PROC MI and PROC MIANALYZE SAS SPSS Retrieved from https en wikipedia org w index php title Missing data amp oldid 1142134154, wikipedia, wiki, book, books, library,
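Worked example for the model-based estimand P(X, Y) = P(X | Y, R_x = 0, R_y = 0) P(Y | R_y = 0) discussed above. The following is a minimal sketch, not a reference implementation: it simulates two binary variables in which Y drives the missingness of X and Y itself is missing at random, then compares the closed-form estimand with a naive complete-case estimate. All probabilities, the seed, and the function names (`simulate`, `recover_joint`, `complete_case_joint`) are assumptions made for illustration and do not come from the article.

```python
import random
from collections import Counter

# Hypothetical simulation parameters (illustrative assumptions only).
P_Y1 = 0.4                         # P(Y = 1)
P_X1_GIVEN_Y = {0: 0.3, 1: 0.8}    # P(X = 1 | Y = y)
P_RX1_GIVEN_Y = {0: 0.1, 1: 0.6}   # P(X missing | Y = y): Y explains missingness in X
P_RY1 = 0.2                        # P(Y missing), independent of X and Y

# True joint distribution P(X = x, Y = y), used only for comparison.
TRUE_JOINT = {
    (x, y): (P_X1_GIVEN_Y[y] if x else 1 - P_X1_GIVEN_Y[y])
    * (P_Y1 if y else 1 - P_Y1)
    for x in (0, 1)
    for y in (0, 1)
}


def simulate(n, seed=0):
    """Return n rows (x, y); a missing value is stored as None."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = int(rng.random() < P_Y1)
        x = int(rng.random() < P_X1_GIVEN_Y[y])
        x_missing = rng.random() < P_RX1_GIVEN_Y[y]
        y_missing = rng.random() < P_RY1
        rows.append((None if x_missing else x, None if y_missing else y))
    return rows


def recover_joint(rows):
    """Estimate P(X, Y) as P(X | Y, R_x=0, R_y=0) * P(Y | R_y=0)."""
    # P(Y | R_y = 0): every row where Y is observed, whatever the status of X.
    y_seen = [y for _, y in rows if y is not None]
    # P(X | Y, R_x = 0, R_y = 0): rows where both X and Y are observed.
    complete = [(x, y) for x, y in rows if x is not None and y is not None]
    joint = {}
    for y_val in (0, 1):
        p_y = y_seen.count(y_val) / len(y_seen)
        xs = [x for x, y in complete if y == y_val]
        for x_val in (0, 1):
            joint[(x_val, y_val)] = xs.count(x_val) / len(xs) * p_y
    return joint


def complete_case_joint(rows):
    """Naive estimate: relative frequencies among fully observed rows only."""
    complete = [(x, y) for x, y in rows if x is not None and y is not None]
    counts = Counter(complete)
    return {key: counts[key] / len(complete) for key in TRUE_JOINT}


if __name__ == "__main__":
    data = simulate(200_000, seed=1)
    recovered = recover_joint(data)
    naive = complete_case_joint(data)
    for key in sorted(TRUE_JOINT):
        print(key, "true", round(TRUE_JOINT[key], 3),
              "recovered", round(recovered[key], 3),
              "complete-case", round(naive[key], 3))
```

In this simulated setting the complete-case estimate is expected to under-represent rows with Y = 1, because X is dropped more often there, whereas the factorized estimand takes P(Y) from all rows in which Y is observed. Swapping the first factor for P(Y | X) would not give the same guarantee, since the missingness of X depends on Y.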
