Machine learning-based reproducible prediction of type 2 diabetes subtypes

Diabetologia. 2024 Nov;67(11):2446-2458.

doi: 10.1007/s00125-024-06248-8. Epub 2024 Aug 21.

Hayato Tanabe¹,², Masahiro Sato¹, Akimitsu Miyake³, Yoshinori Shimajiri⁴, Takafumi Ojima³,⁵, Akira Narita⁶, Haruka Saito¹, Kenichi Tanaka⁷, Hiroaki Masuzaki⁸, Junichiro J Kazama⁷, Hideki Katagiri², Gen Tamiya³,⁶, Eiryo Kawakami⁹,¹⁰, Michio Shimabukuro¹¹

Affiliations

¹ Department of Diabetes, Endocrinology, and Metabolism, Fukushima Medical University School of Medicine, Fukushima, Japan.
² Department of Diabetes, Metabolism and Endocrinology, Tohoku University Graduate School of Medicine, Miyagi, Japan.
³ Department of AI and Innovative Medicine, Tohoku University School of Medicine, Miyagi, Japan.
⁴ Shimajiri Kinsermae Diabetes Care Clinic, Okinawa, Japan.
⁵ Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka, Japan.
⁶ Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan.
⁷ Department of Nephrology and Hypertension, Fukushima Medical University School of Medicine, Fukushima, Japan.
⁸ Division of Endocrinology and Metabolism, Second Department of Internal Medicine, University of the Ryukyus Graduate School of Medicine, Okinawa, Japan.
⁹ Department of Artificial Intelligence Medicine, Graduate School of Medicine, Chiba University, Chiba, Japan. eiryo.kawakami@chiba-u.jp.
¹⁰ Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters, RIKEN, Yokohama, Japan. eiryo.kawakami@chiba-u.jp.
¹¹ Department of Diabetes, Endocrinology, and Metabolism, Fukushima Medical University School of Medicine, Fukushima, Japan. mshimabukuro-ur@umin.ac.jp.

PMID: 39168869
PMCID: PMC11519166
DOI: 10.1007/s00125-024-06248-8

Hayato Tanabe et al. Diabetologia. 2024 Nov.

Abstract

Aims/hypothesis: Clustering-based subclassification of type 2 diabetes, which reflects pathophysiology and genetic predisposition, is a promising approach for providing personalised and effective therapeutic strategies. Ahlqvist's classification is currently the most vigorously validated method because of its superior ability to predict diabetes complications, but it lacks strong consistency over time and requires HOMA2 indices, which are not routinely available in clinical practice or standard cohort studies. We developed a machine learning (ML) model to classify individuals with type 2 diabetes into Ahlqvist's subtypes consistently over time.

Methods: The Cohort 1 dataset comprised 619 Japanese individuals with type 2 diabetes, who were divided into training and test sets for ML models in a 7:3 ratio. The Cohort 2 dataset, comprising 597 individuals with type 2 diabetes, was used for external validation. Participants were pre-labelled (T2D_kmeans) by unsupervised k-means clustering based on Ahlqvist's variables (age at diagnosis, BMI, HbA1c, HOMA2-B and HOMA2-IR) into four subtypes: severe insulin-deficient diabetes (SIDD), severe insulin-resistant diabetes (SIRD), mild obesity-related diabetes (MOD) and mild age-related diabetes (MARD). We adopted 15 variables for a multiclass classification random forest (RF) algorithm to predict type 2 diabetes subtypes (T2D_RF15). The proximity matrix computed by RF was visualised using a uniform manifold approximation and projection. Finally, we used a putative subset with missing insulin-related variables to test the predictive performance in the validation cohort, the consistency of subtypes over time and the ability to predict diabetes complications.
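As an illustration of the pre-labelling step only, the following is a minimal standard-library sketch of k-means (Lloyd's algorithm) on five standardised variables. It is not the authors' implementation: the data are synthetic, the group centres and spreads are made-up numbers rather than study values, and the function names `zscore_columns` and `kmeans` are hypothetical.

```python
import math
import random
import statistics


def zscore_columns(rows):
    """Standardise each column to mean 0 and SD 1 so no variable dominates the distance.

    Assumes every column has non-zero spread (true for the synthetic data below).
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(rows, k, n_iter=100, seed=0):
    """Lloyd's algorithm: assign rows to the nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)  # initialise from k distinct observations
    labels = []
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(row, centroids[j])) for row in rows
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels


# Synthetic participants. Column order mirrors the abstract's clustering variables:
# age at diagnosis, BMI, HbA1c, HOMA2-B, HOMA2-IR. All numbers are illustrative.
rng = random.Random(42)
centres = [
    (50, 23, 11.0, 30, 1.0),
    (62, 29, 7.0, 110, 4.0),
    (45, 33, 7.5, 80, 2.0),
    (68, 23, 6.8, 70, 1.0),
]
spreads = (5, 2, 0.7, 10, 0.3)
participants = [
    [rng.gauss(c, s) for c, s in zip(centre, spreads)]
    for centre in centres
    for _ in range(50)
]

labels = kmeans(zscore_columns(participants), k=4)
```

The integer labels returned here are arbitrary cluster indices. Mapping them to named subtypes such as SIDD or SIRD would require inspecting each cluster's centroid, a step this sketch does not attempt.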

Results: T2D_RF15 demonstrated a 94% accuracy for predicting T2D_kmeans type 2 diabetes subtypes (AUCs ≥0.99 and F1 score [an indicator calculated by harmonic mean from precision and recall] ≥0.9) and retained the predictive performance in the external validation cohort (86.3%). T2D_RF15 showed an accuracy of 82.9% for detecting T2D_kmeans, also in a putative subset with missing insulin-related variables, when used with an imputation algorithm. In Kaplan-Meier analysis, the diabetes clusters of T2D_RF15 demonstrated distinct accumulation risks of diabetic retinopathy in SIDD and that of chronic kidney disease in SIRD during a median observation period of 11.6 (4.5-18.3) years, similarly to the subtypes using T2D_kmeans. The predictive accuracy was improved after excluding individuals with low predictive probability, who were categorised as an 'undecidable' cluster. T2D_RF15, after excluding undecidable individuals, showed higher consistency (100% for SIDD, 68.6% for SIRD, 94.4% for MOD and 97.9% for MARD) than T2D_kmeans.
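The quantities reported above can be made concrete with a short standard-library sketch. It shows the F1 score as the harmonic mean of precision and recall, one possible rule for an 'undecidable' cluster, and a per-subtype consistency measure. The probability threshold, the exclusion rule inside `consistency`, and all function names are assumptions for illustration. The abstract does not state the cut-off or the exact consistency definition the authors used.

```python
UNDECIDABLE = "undecidable"


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (defined as 0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def assign_subtype(class_probabilities, threshold):
    """Return the most probable subtype, or 'undecidable' if its probability is below threshold.

    class_probabilities maps subtype name -> predicted probability, e.g. the
    per-class vote fractions a random forest classifier would output.
    """
    subtype, probability = max(class_probabilities.items(), key=lambda kv: kv[1])
    return subtype if probability >= threshold else UNDECIDABLE


def consistency(baseline, follow_up):
    """Share of participants per baseline subtype who keep the same label at follow-up.

    Assumption: a participant labelled undecidable at either visit is excluded.
    """
    kept, total = {}, {}
    for before, after in zip(baseline, follow_up):
        if UNDECIDABLE in (before, after):
            continue
        total[before] = total.get(before, 0) + 1
        kept[before] = kept.get(before, 0) + (before == after)
    return {subtype: kept[subtype] / total[subtype] for subtype in total}


# Hypothetical predicted probabilities for one participant.
probs = {"SIDD": 0.55, "SIRD": 0.05, "MOD": 0.30, "MARD": 0.10}
strict_label = assign_subtype(probs, threshold=0.6)   # top class falls below 0.6
lenient_label = assign_subtype(probs, threshold=0.5)  # top class clears 0.5

# Hypothetical labels for four participants at baseline and follow-up.
baseline = ["SIDD", "SIDD", "MOD", "MARD"]
follow_up = ["SIDD", "MOD", "MOD", UNDECIDABLE]
per_subtype = consistency(baseline, follow_up)
```

Raising the threshold moves more participants into the undecidable group and leaves only higher-confidence assignments in the named subtypes, which is the trade-off the Results describe.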

Conclusions/interpretation: The new ML model for predicting Ahlqvist's subtypes of type 2 diabetes has great potential for application in clinical practice and cohort studies because it can classify individuals with missing HOMA2 indices and predict glycaemic control, diabetic complications and treatment outcomes with long-term consistency by using readily available variables. Future studies are needed to assess whether our approach is applicable to research and/or clinical practice in multiethnic populations.

Keywords: Clustering; Diabetes subtypes; Machine learning; Random forest; Type 2 diabetes.

Figures

**Fig. 1**
Predictive performance of type 2 diabetes subtypes using an RF algorithm based on 15 features (T2D_RF15) for estimating T2D_kmeans in the test dataset of Cohort 1. (a) ROC curve showing the diagnostic performance of T2D_RF15, the RF model using 15 Boruta-selected features, to predict T2D_kmeans. (b) Feature importance of the 15 Boruta-selected variables fed into T2D_RF15. ALT, alanine aminotransferase; FPG, fasting plasma glucose; γGT, γ-glutamyl transferase; HDL-C, HDL-cholesterol; TG, triacylglycerols

**Fig. 2**
Predictive performance of type 2 diabetes subtypes using an RF algorithm based on 15 features (T2D_RF15) for estimating T2D_kmeans in the external validation dataset of Cohort 2. ROC curves showing the diagnostic ability of T2D_RF15 to predict the subtypes pre-labelled by k-means clustering (T2D_kmeans) were calculated in the original Cohort 2 dataset (a) and in a putative Cohort 2 dataset with missing insulin-related variables (b)

**Fig. 3**
Proximity matrix representing the similarity between participants calculated using the RF. (a) Two-dimensional visualisation of the proximity matrix between all participants included in the training and test data. Colours indicate differences in subtype assignment using k-means clustering (T2D_kmeans). (b) Proximity matrix with embedded labels for type 2 diabetes subtypes predicted by the RF algorithm based on 15 variables (T2D_RF15). Participants with low predictive probability were newly defined as the undecidable cluster. (c) Proximity matrix with T2D_RF15 labels embedded after excluding participants in the undecidable cluster; the remaining participants could be clearly divided into four clusters

**Fig. 4**
Sankey diagram showing the subtype redistribution and migration pattern of the study participants in Cohort 1 from baseline to 5 year follow-up. (a) Type 2 diabetes subtypes labelled by k-means clustering (T2D_kmeans). (b) Type 2 diabetes subtypes predicted by an RF algorithm based on 15 variables (T2D_RF15). (c) Migration pattern of T2D_RF15 excluding the undecidable cluster. (d) Type 2 diabetes subtypes predicted by an RF algorithm based on 15 variables from the dataset where insulin-related variables have been imputed (T2D_RF15)

**Fig. 5**
Kaplan–Meier curves for the cumulative incidence of retinopathy (a), CKD (eGFR <60 ml/min per 1.73 m²) (b), proteinuria (c) and coronary artery disease (d) in type 2 diabetes subtypes predicted by RF based on 15 variables (T2D_RF15) in the putative dataset in Cohort 1 with missing insulin-related variables
