Comparing methods for estimation of heterogeneous treatment effects using observational data from health care databases
- PMID: 29862536
- DOI: 10.1002/sim.7820
Abstract
There is growing interest in using routinely collected data from health care databases to study the safety and effectiveness of therapies in "real-world" conditions, because such data can provide evidence complementary to that from randomized controlled trials. Causal inference from health care databases is challenging because the data are typically noisy, high dimensional, and most importantly, observational. Such inference requires methods that can estimate heterogeneous treatment effects while controlling for confounding in high dimensions. Bayesian additive regression trees, causal forests, causal boosting, and causal multivariate adaptive regression splines are off-the-shelf methods that have shown good performance for estimation of heterogeneous treatment effects in observational studies of continuous outcomes. However, it is not clear how these methods would perform in health care database studies where outcomes are often binary and rare and data structures are complex. In this study, we evaluate these methods in simulation studies that recapitulate key characteristics of comparative effectiveness studies. We focus on the conditional average effect of a binary treatment on a binary outcome, using the conditional risk difference as the estimand. To emulate health care database studies, we propose a simulation design where real covariate and treatment assignment data are used and only outcomes are simulated based on nonparametric models of the real outcomes. We apply this design to 4 published observational studies that used records from 2 major health care databases in the United States. Our results suggest that Bayesian additive regression trees and causal boosting consistently provide low bias in conditional risk difference estimates in the context of health care database studies.
Keywords: health care databases; heterogeneous treatment effects; machine learning; propensity score; simulation.
Copyright © 2018 John Wiley & Sons, Ltd.
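The simulation design described in the abstract (covariates and treatment assignment held fixed, only outcomes simulated from a known model, so that the true conditional risk difference is available for measuring bias) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two binary covariates and the treatment assignment are synthetic stand-ins for real database records, the logistic `risk` function is a hypothetical outcome model rather than the nonparametric models used in the study, and the estimator is a naive within-stratum difference in observed risk, not any of the four methods the paper evaluates.

```python
import math
import random


def expit(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def risk(x, w):
    """Hypothetical outcome model P(Y=1 | X=x, W=w) (an assumption, not the paper's).

    The baseline risk is low (a fairly rare outcome) and the treatment
    effect depends on x[0], so the effect is heterogeneous.
    """
    return expit(-3.0 + 0.8 * x[1] + w * (0.5 + 1.0 * x[0]))


def true_crd(x):
    """True conditional risk difference at x under the simulated model."""
    return risk(x, 1) - risk(x, 0)


def estimate_crd(X, W, Y):
    """Naive estimate: treated minus control event rate within each covariate stratum."""
    tallies = {}  # (stratum, treatment) -> [number of subjects, number of events]
    for x, w, y in zip(X, W, Y):
        cell = tallies.setdefault((x, w), [0, 0])
        cell[0] += 1
        cell[1] += y
    strata = sorted({x for x, _ in tallies})
    return {
        x: tallies[(x, 1)][1] / tallies[(x, 1)][0]
        - tallies[(x, 0)][1] / tallies[(x, 0)][0]
        for x in strata
    }


rng = random.Random(0)
n = 40000

# Stand-in for the real covariates: two binary covariates per subject.
X = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n)]
# Stand-in for the real treatment assignment; it depends on X, so it is confounded.
W = [int(rng.random() < expit(-0.5 + 0.7 * x[0] + 0.5 * x[1])) for x in X]

# Only this step is "simulated" in the design: binary outcomes from the known model.
Y = [int(rng.random() < risk(x, w)) for x, w in zip(X, W)]

estimates = estimate_crd(X, W, Y)
bias = {x: estimates[x] - true_crd(x) for x in estimates}

for x in sorted(bias):
    print(f"x={x}  true CRD={true_crd(x):.3f}  "
          f"estimate={estimates[x]:.3f}  bias={bias[x]:+.3f}")
```

Because the outcome model is known, the bias of any estimator can be computed directly at each covariate value. Exact stratification works here only because the sketch has four strata; with high-dimensional covariates it is infeasible, which is the setting that motivates the flexible methods compared in the study.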