Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study

doi:10.1186/1471-2288-10-7

Comparative Study

BMC Med Res Methodol. 2010 Jan 19;10:7.

Andrea Marshall¹, Douglas G Altman, Patrick Royston, Roger L Holder

PMID: 20085642
PMCID: PMC2824146
DOI: 10.1186/1471-2288-10-7

Andrea Marshall et al. BMC Med Res Methodol. 2010.

Affiliation

¹ Centre for Statistics in Medicine, University of Oxford, Oxford, UK. andrea.marshall@warwick.ac.uk

Abstract

Background: There is no consensus on the most appropriate approach to handling missing covariate data within prognostic modelling studies. Therefore, a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model.

Methods: Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five proportions of incomplete cases, ranging from 5% to 75%, were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) a data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted, and appropriate estimates of the regression coefficients and model performance measures were obtained.
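The three missingness mechanisms differ only in what the probability of a value being missing depends on. The minimal Python sketch below makes that distinction concrete for a single covariate `x` alongside a fully observed covariate `z`. It assumes NumPy and a hypothetical helper `impose_missing`, and its logistic-type weights (with an arbitrary slope of 2) are illustrative assumptions, not the selection models used in the study.

```python
import numpy as np


def impose_missing(x, z, mechanism, prop, rng):
    """Return a copy of x with round(prop * len(x)) values set to NaN.

    x: covariate to make incomplete; z: a fully observed covariate.
    The weighting scheme is an illustrative assumption, not the study's.
    """
    x = np.asarray(x, dtype=float).copy()
    n = x.size
    k = int(round(prop * n))
    if mechanism == "MCAR":
        # Every case is equally likely to be missing, whatever the data.
        weights = np.ones(n)
    elif mechanism in ("MAR", "MNAR"):
        # MAR: driven by the observed z; MNAR: driven by x's own values.
        score = np.asarray(z, dtype=float) if mechanism == "MAR" else x
        s = (score - score.mean()) / score.std()
        weights = 1.0 / (1.0 + np.exp(-2.0 * s))  # logistic weight, slope 2 is arbitrary
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    # Sample exactly k distinct cases, favouring those with high weights.
    idx = rng.choice(n, size=k, replace=False, p=weights / weights.sum())
    x[idx] = np.nan
    return x


rng = np.random.default_rng(1)
z = rng.standard_normal(1000)
x = 0.5 * z + rng.standard_normal(1000)
x_mar = impose_missing(x, z, "MAR", 0.25, rng)
print(np.isnan(x_mar).mean())  # 0.25
```

For a given proportion, the three mechanisms produce the same number of missing values. What changes is which cases go missing: a random subset under MCAR, cases with larger `z` under MAR, and cases with larger `x` itself under MNAR.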

Results: Performing a CC analysis produced unbiased regression estimates but inflated standard errors, which affected the significance of the covariates in the model with 25% or more missingness. Using SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, MICE-PMM generally produced the least biased estimates, better coverage for the incomplete covariates and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more of the cases had missing data imposed with an MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates, i.e. MNAR, estimates were biased for all MI approaches with more than 10% incomplete cases.
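The coverage gap between SI and MI comes from how the variance is computed. Multiple imputation results are usually combined with Rubin's rules. The total variance under these rules adds a between-imputation component to the average within-imputation variance, whereas a single imputed dataset supplies only the within part. The sketch below pools five estimates of one regression coefficient. The helper `pool_rubin` and the numbers are hypothetical and are not taken from the study.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine m completed-data results for one coefficient using Rubin's rules.

    estimates: per-imputation point estimates (e.g. Cox log hazard ratios)
    variances: the matching squared standard errors
    Returns (pooled estimate, total variance, within variance, between variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)     # pooled point estimate
    w = statistics.fmean(variances)         # average within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance (m - 1 divisor)
    t = w + (1 + 1 / m) * b                 # total variance
    return q_bar, t, w, b


# Hypothetical results from five imputed datasets.
q, t, w, b = pool_rubin([0.50, 0.56, 0.47, 0.53, 0.54], [0.010] * 5)
print(round(math.sqrt(w), 3), round(math.sqrt(t), 3))  # 0.1 0.107
```

The pooled standard error (about 0.107) is larger than the single-dataset value of 0.1. Confidence intervals built from the smaller value are too narrow, and this is how underestimated variability turns into poor coverage. A fuller implementation would also use Rubin's degrees of freedom to obtain a t reference distribution for the interval.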

Conclusion: The results from this simulation study suggest that MICE-PMM may be the preferred MI approach, provided that fewer than 50% of the cases have missing data and the missing data are not MNAR.
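Predictive mean matching is the univariate step that MICE-PMM repeats for each incomplete variable. The sketch below shows a simplified single draw of that step. It is written for illustration and is not the R implementation evaluated in the study. The helper name `pmm_impute`, the donor pool size `k=5`, and the ordinary least squares model with a normal/chi-squared parameter draw are all assumptions.

```python
import numpy as np


def pmm_impute(y, X, rng, k=5):
    """One predictive mean matching draw for an incomplete continuous variable.

    y: 1-D array with NaN marking missing values; X: complete predictors
    (1-D or n x p). Each missing value is replaced by the observed value of a
    donor chosen at random from the k closest predicted means (k is arbitrary).
    """
    y = np.asarray(y, dtype=float).copy()
    X1 = np.column_stack([np.ones(len(y)), X])   # add an intercept column
    obs = ~np.isnan(y)
    Xo, yo = X1[obs], y[obs]
    # Ordinary least squares fit on the observed cases.
    beta_hat, *_ = np.linalg.lstsq(Xo, yo, rcond=None)
    resid = yo - Xo @ beta_hat
    df = Xo.shape[0] - Xo.shape[1]
    # Draw sigma^2 and beta from their approximate posterior so that
    # repeated calls reflect parameter uncertainty, as proper MI requires.
    sigma2 = resid @ resid / rng.chisquare(df)
    cov = sigma2 * np.linalg.inv(Xo.T @ Xo)
    beta_star = rng.multivariate_normal(beta_hat, cov)
    # Observed cases use the fitted coefficients, missing cases the drawn
    # ones (one common PMM variant).
    pred_obs = Xo @ beta_hat
    pred_mis = X1[~obs] @ beta_star
    donors = []
    for p in pred_mis:
        nearest = np.argsort(np.abs(pred_obs - p))[:k]
        donors.append(yo[rng.choice(nearest)])
    y[~obs] = donors
    return y
```

Every imputed value is copied from an observed donor, so imputations stay within the observed range and follow the observed, possibly skewed, distribution rather than a normal model. A full MICE procedure cycles this step over all incomplete variables for several iterations and repeats the whole process to create m completed datasets.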

Figures

**Figure 1**
**a: Distribution of the covariates for the German breast cancer dataset; b: Distribution of the transformed continuous covariates in the German breast cancer dataset**.

**Figure 2**
**Regression coefficient estimates for different missing data methods for increasing percentage of MAR missingness**.

**Figure 3**
**Average standard error (SE) estimates for different missing data methods for increasing percentage of MAR missingness**.

**Figure 4**
**Coverage of the regression coefficient estimates for different missing data methods for increasing percentage of MAR missingness**.

**Figure 5**
**Significance of the covariates in the prognostic model for different missing data methods and increasing percentage of MAR missingness**.

**Figure 6**
**Model performance measures for different missing data methods for increasing percentage of MAR missingness**. a) Likelihood ratio test, b) Nagelkerke R² statistic, c) Prognostic separation D statistic and d) Predicted 2-year survival from the Cox model.

**Figure 7**
**Comparison of the regression coefficient estimates for the different MI methods after imposing MAR and MNAR mechanisms**.

**Figure 8**
**Comparison of coverage estimates for the different MI methods after imposing MAR and MNAR mechanisms**.
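Panel b of Figure 6 uses Nagelkerke's R² statistic, which rescales the likelihood ratio statistic shown in panel a. The sketch below assumes that the log partial likelihoods of the null and fitted Cox models are already available. It takes n as the number of subjects, which is itself an assumption because the choice of n for censored data is not standardised. The helper name `nagelkerke_r2` is hypothetical.

```python
import math


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke's R^2 from the null and fitted-model log-likelihoods.

    For a Cox model these are log partial likelihoods (beta = 0 and fitted
    beta); using n = number of subjects is an assumption for censored data.
    """
    lr = 2.0 * (loglik_model - loglik_null)         # likelihood ratio statistic
    r2_cs = 1.0 - math.exp(-lr / n)                  # Cox-Snell (generalised) R^2
    r2_max = 1.0 - math.exp(2.0 * loglik_null / n)   # largest attainable Cox-Snell R^2
    return r2_cs / r2_max


print(round(nagelkerke_r2(-1000.0, -950.0, 500), 3))  # 0.185
```

Dividing by the largest attainable Cox-Snell value rescales the statistic so that it can reach 1. It equals 0 when the fitted model is no better than the null model.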

Cited by

External Validation of a Referral Rule for Axial Spondyloarthritis in Primary Care Patients with Chronic Low Back Pain.
van Hoeven L, Vergouwe Y, de Buck PD, Luime JJ, Hazes JM, Weel AE. PLoS One. 2015 Jul 22;10(7):e0131963. doi: 10.1371/journal.pone.0131963. PMID: 26200904.
Seven-day mortality can be predicted in medical patients by blood pressure, age, respiratory rate, loss of independence, and peripheral oxygen saturation (the PARIS score): a prospective cohort study with external validation.
Brabrand M, Lassen AT, Knudsen T, Hallas J. PLoS One. 2015 Apr 13;10(4):e0122480. doi: 10.1371/journal.pone.0122480. PMID: 25867881.
Survival analysis of gastric cancer patients with incomplete data.
Moghimbeigi A, Tapak L, Roshanaei G, Mahjub H. J Gastric Cancer. 2014 Dec;14(4):259-65. doi: 10.5230/jgc.2014.14.4.259. PMID: 25580358.
Utility of neuron-specific enolase in traumatic brain injury; relations to S100B levels, outcome, and extracranial injury severity.
Thelin EP, Jeppsson E, Frostell A, Svensson M, Mondello S, Bellander BM, Nelson DW. Crit Care. 2016 Sep 8;20:285. doi: 10.1186/s13054-016-1450-y. PMID: 27604350.
Recovery of information from multiple imputation: a simulation study.
Lee KJ, Carlin JB. Emerg Themes Epidemiol. 2012 Jun 13;9(1):3. doi: 10.1186/1742-7622-9-3. PMID: 22695083.
