New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes

Ying-Qi Zhao et al. J Am Stat Assoc. 2015;110(510):583-598. doi: 10.1080/01621459.2014.937488.

Abstract

Dynamic treatment regimes (DTRs) are sequential decision rules for individual patients that can adapt over time to an evolving illness. The goal is to accommodate heterogeneity among patients and find the DTR that will produce the best long-term outcome if implemented. We introduce two new statistical learning methods for estimating the optimal DTR, termed backward outcome weighted learning (BOWL) and simultaneous outcome weighted learning (SOWL). These approaches convert individualized treatment selection into either a sequential or a simultaneous classification problem, and can thus be applied by modifying existing machine learning techniques. The proposed methods are based on directly maximizing, over all DTRs, a nonparametric estimator of the expected long-term outcome; this is fundamentally different from regression-based methods, for example Q-learning, which attempt such maximization indirectly and rely heavily on the correctness of postulated regression models. We prove that the resulting rules are consistent and provide finite-sample bounds for the errors incurred by the estimated rules. Simulation results suggest that the proposed methods produce superior DTRs compared with Q-learning, especially in small samples. We illustrate the methods using data from a clinical trial for smoking cessation.
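
To make the "treatment selection as weighted classification" idea concrete, the following Python snippet sketches a single-stage version of outcome weighted learning, the building block behind BOWL and SOWL. This is an illustrative toy example under assumptions of my own (simulated data, a randomized trial with known propensity 0.5, a linear SVM via scikit-learn), not the authors' implementation: each subject is weighted by outcome over propensity, and a standard SVM is fit to the treatments actually received.

    # Toy single-stage sketch of outcome weighted learning (OWL).
    # Assumption-laden illustration, not the paper's code.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n = 200
    X = rng.normal(size=(n, 3))            # baseline covariates
    A = rng.choice([-1, 1], size=n)        # randomized treatment, P(A = 1) = 0.5
    # Hypothetical outcome (larger is better); the optimal rule is sign(X[:, 0]).
    R = 1.0 + A * X[:, 0] + 0.5 * rng.normal(size=n)

    propensity = 0.5                       # known in a randomized trial
    # Shift outcomes to be nonnegative; with a known constant propensity
    # this leaves the value-maximizing rule unchanged.
    weights = (R - R.min()) / propensity

    # Weighted classification: the "labels" are the treatments received,
    # so subjects with large weighted outcomes dominate the fit.
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, A, sample_weight=weights)

    d_hat = clf.predict(X)                 # estimated treatment rule on the data
    print("agreement with sign(X1):", np.mean(d_hat == np.sign(X[:, 0])))

Backward (BOWL) estimation would repeat such a weighted classification stage by stage, moving from the last stage to the first; SOWL instead solves the stages jointly.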

Keywords: Classification; Dynamic treatment regimes; Personalized medicine; Q-learning; Reinforcement learning; Risk bound; Support vector machine.

Figures

Figure 1
Left panel: the nonsmooth indicator function 1(Z1 > 0, Z2 > 0); Right panel: the smooth concave surrogate min(Z1 − 1, Z2 − 1, 0) + 1 (a small numeric check of this surrogate appears after the figure captions).

Figure 2
Smoothed Histograms of Values of Estimated DTRs for Scenario 1. The optimal value is V* = 6.695.

Figure 3
Smoothed Histograms of Values of Estimated DTRs for Scenario 2. The optimal value is V* = 3.667.

Figure 4
Smoothed Histograms of Values of Estimated DTRs for Scenario 3. The optimal value is V* = 20.

Figure 5
Selected Percentages of Two-Stage Treatments Using Different Methods. Note: the estimated values under the different methods are 0.835 for Q-learning (QL), 0.863 for L2Q-learning (L2QL), 0.933 for A-learning (AL), 1.096 for BOWL, 1.019 for IOWL, and 0.999 for SOWL. A Stage 1 treatment coded 1 denotes a highly tailored story and −1 the opposite; a Stage 2 treatment coded 1 denotes receiving treatment and −1 not.
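
The surrogate in Figure 1 can be checked numerically. The short Python sketch below (my own illustration, not code from the paper) evaluates both the indicator 1(Z1 > 0, Z2 > 0) and the concave surrogate min(Z1 − 1, Z2 − 1, 0) + 1 on a small grid: the surrogate equals the indicator when both margins are at least 1 and bounds it from below elsewhere, which is what makes it usable as a maximization target in place of the discontinuous indicator.

    # Numeric check of the Figure 1 surrogate (illustration only).
    import numpy as np

    def indicator(z1, z2):
        # Discontinuous target: 1 only when both arguments are positive.
        return ((z1 > 0) & (z2 > 0)).astype(float)

    def surrogate(z1, z2):
        # Concave surrogate: min(z1 - 1, z2 - 1, 0) + 1.
        return np.minimum(np.minimum(z1 - 1.0, z2 - 1.0), 0.0) + 1.0

    z = np.array([-1.0, 0.5, 2.0])
    Z1, Z2 = np.meshgrid(z, z)
    print(indicator(Z1, Z2))   # 1 only where both arguments are positive
    print(surrogate(Z1, Z2))   # never exceeds the indicator; equals 1 when z1, z2 >= 1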

References

    1. Bartlett PL, Jordan MI, McAuliffe JD. Convexity, classification, and risk bounds. J Am Stat Assoc. 2006;101(473):138–156.
    2. Bellman R. Dynamic Programming. Princeton, NJ: Princeton University Press; 1957.
    3. Blanchard G, Bousquet O, Massart P. Statistical performance of support vector machines. Ann Stat. 2008;36:489–531.
    4. Blatt D, Murphy SA, Zhu J. A-learning for approximate planning. 2004. Unpublished manuscript.
    5. Bradley PS, Mangasarian OL. Feature selection via concave minimization and support vector machines. In: Proc 15th International Conf on Machine Learning. San Francisco, CA: Morgan Kaufmann; 1998.
