Supplementary code for "Predicting onset, progression, and clinical subtypes of Parkinson disease using machine learning"
Faraz Faghri1,2, Sayed Hadi Hashemi2, Hampton Leonard1, Sonja W. Scholz3,4, Roy H. Campbell2, Mike A. Nalls1,5, Andrew B. Singleton1
1 - Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, 20892, USA
2 - Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, IL, 61820, USA
3 - Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
4 - Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
5 - Data Tecnica International, Glen Echo, MD, 20812, USA
The clinical manifestations of Parkinson disease are characterized by considerable heterogeneity in age at onset, disease duration, rate of progression, and constellation of motor versus nonmotor features. Due to these variable presentations, counseling of patients about their individual risks and prognosis is limited. There is an unmet need for predictive tests that allow early detection and characterization of distinct disease subgroups as well as improved, individualized predictions of the disease course. The emergence of machine learning to detect hidden patterns in complex, multi-dimensional datasets provides unparalleled opportunities to address this critical need. The development of such predictive models could have radical implications for clinical trial design, allocation of healthcare resources, patient counseling, and ultimately treatment. Here, we describe the implementation of machine learning techniques for the identification of distinct subgroups in Parkinson disease. Using machine learning on comprehensive, longitudinal clinical data from the Parkinson Disease Progression Marker Initiative (PPMI), we are able to delineate three distinct disease subgroups with highly predictable progression rates, corresponding to slow, moderate and fast progressors. We achieved highly accurate projections of disease progression three years from initial diagnosis (area under the curve of 0.956). We also validated our model in an independent cohort from the Parkinson Disease Biomarker Program (PDBP) and demonstrate robust prediction of subtypes in this replication step (area under the curve 0.787). These data-driven techniques enable clinicians to deconstruct the clinical heterogeneity within their patient cohorts. This knowledge could have immediate clinical implications by improving the detection of significant clinical outcomes that might have been masked by cohort heterogeneity. We anticipate that machine learning models, such as the ones described in this article, will improve patient counseling, clinical trial design and ultimately individualized clinical care.
To accompany the manuscript, and to help with replication and extension of our work, we have made the code publicly available under GPLv3 (https://github.com/ffaghri1/PD-progression-ML)