Trials. 2020 Feb 10;21(1):156.
doi: 10.1186/s13063-020-4076-y.

Machine learning analysis plans for randomised controlled trials: detecting treatment effect heterogeneity with strict control of type I error

James A Watson et al. Trials.

Abstract

Background: Retrospective exploratory analyses of randomised controlled trials (RCTs) seeking to identify treatment effect heterogeneity (TEH) are prone to bias and false positives. Yet the desire to learn all we can from exhaustive data measurements on trial participants motivates the inclusion of such analyses within RCTs. Moreover, widespread advances in machine learning (ML) methods hold potential to utilise such data to identify subjects exhibiting heterogeneous treatment response.

Methods: We present a novel analysis strategy for detecting TEH in randomised data using ML methods, whilst ensuring proper control of the false positive discovery rate. Our approach uses random data partitioning with statistical or ML-based prediction on held-out data. This method can test for both crossover TEH (switch in optimal treatment) and non-crossover TEH (systematic variation in benefit across patients). The former is done via a two-sample hypothesis test measuring overall predictive performance. The latter is done via 'stacking' the ML predictors alongside a classical statistical model to formally test the added benefit of the ML algorithm. An adaptation of recent statistical theory allows for the construction of a valid aggregate p value. This testing strategy is independent of the choice of ML method.
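
The repeated-partitioning idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `split_p_value` is a hypothetical user-supplied function that fits any model on one half of the data and returns a p value computed on the held-out half, and the combination rule shown is quantile aggregation of split-sample p values with a fixed quantile `gamma`, which is assumed here and may differ from the aggregate p value the paper constructs. For brevity the sketch predicts in one direction only per split, rather than the twofold cross-prediction described in the paper.

```python
import math
import random


def aggregate_p_values(p_values, gamma=0.5):
    """Combine p values from repeated random splits (assumed rule).

    Returns min(1, q), where q is the empirical gamma-quantile of the
    scaled values p / gamma. Dividing by gamma is the price paid for
    choosing a quantile instead of a single split.
    """
    if not p_values:
        raise ValueError("need at least one p value")
    if not 0 < gamma <= 1:
        raise ValueError("gamma must lie in (0, 1]")
    scaled = sorted(p / gamma for p in p_values)
    # Smallest scaled value with at least a fraction gamma of values at or below it.
    k = max(1, math.ceil(gamma * len(scaled)))
    return min(1.0, scaled[k - 1])


def repeated_split_test(records, split_p_value, n_splits=1000, gamma=0.5, seed=0):
    """Run a held-out test on many random half/half partitions and aggregate.

    split_p_value(train, held_out) is a hypothetical callable: it may fit
    any statistical or ML model on `train`, but must compute its p value
    using `held_out` only.
    """
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_splits):
        shuffled = list(records)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, held_out = shuffled[:half], shuffled[half:]
        p_values.append(split_p_value(train, held_out))
    return aggregate_p_values(p_values, gamma)
```

Because the model is only ever evaluated on data it did not see during fitting, the choice of ML method inside `split_p_value` does not affect the validity of each per-split p value; fixing `seed` in advance keeps the set of partitions reproducible.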

Results: We demonstrate our approach with a re-analysis of the SEAQUAMAT trial, which compared quinine to artesunate for the treatment of severe malaria in Asian adults. We find no evidence for any subgroup who would benefit from a change in treatment from the current standard of care, artesunate, but strong evidence for significant TEH within the artesunate treatment group. In particular, we find that artesunate provides a differential benefit to patients with high numbers of circulating ring stage parasites.

Conclusions: ML analysis plans using computational notebooks (documents linked to a programming language that capture the model parameter settings, data processing choices, and evaluation criteria) along with version control can improve the robustness and transparency of RCT exploratory analyses. A data-partitioning algorithm allows researchers to apply the latest ML techniques safe in the knowledge that any declared associations are statistically significant at a user-defined level.

Keywords: Heterogeneous treatment effects; Machine learning; Randomised trials; Subgroup statistical analysis plan.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Illustrative example of hypothesis testing in exploratory subgroup discovery using 1000 iterations of twofold cross-prediction. The example considers a primary RCT with two arms where a null hypothesis of ‘no improvement from the experimental treatment’ is not rejected; i.e. there is no significant evidence of the experimental treatment providing improvement over the standard of care. Each random division results in a corresponding p value against the null hypothesis of no benefitting subgroup. The p values are then aggregated for the overall test (Eq. 1)
Fig. 2
Graphical visualisation and validation of treatment heterogeneity defined by non-crossover interactions in the SEAQUAMAT trial. Panels a and b show the univariate relationships to the individual predicted treatment effect for total parasite biomass and base deficit, respectively. The thick blue lines show spline fits to the data. Panel c shows the cumulative distribution of the p values for the added benefit of the ML model obtained by repeated data-splitting and stacking of the standard model alongside the ML model. Significance (at the 5% level) is obtained if the black line crosses above the red boundary. Panel d summarises the overall non-crossover interaction found by the random forest model with a pruned regression tree model fitted to the individual treatment effects. The leaves of the tree in panel d show the mean treatment effect (difference in mortality between artesunate and quinine)
