Front Bioinform. 2023 May 9:3:1131021.
doi: 10.3389/fbinf.2023.1131021. eCollection 2023.

Multimodal AI for prediction of distant metastasis in carcinoma patients

Isaac Olatunji et al. Front Bioinform. 2023.

Abstract

Metastasis is directly related to death in almost all cancer cases, yet much about this process remains to be understood. Despite advances in radiological investigation techniques, not all cases of Distant Metastasis (DM) are diagnosed at initial clinical presentation, and there are currently no standard biomarkers of metastasis. Early, accurate diagnosis of DM is nevertheless crucial for clinical decision making and for planning appropriate management strategies. Previous work has had little success in predicting DM from either clinical, genomic, radiology, or histopathology data. In this work we attempt a multimodal approach to predicting the presence of DM in cancer patients by combining gene expression data, clinical data, and histopathology images. We tested a novel combination of the Random Forest (RF) algorithm with an optimization technique for gene selection, and investigated whether gene expression patterns in the primary tissues of three cancer types with DM (Bladder Carcinoma, Pancreatic Adenocarcinoma, and Head and Neck Squamous Carcinoma) are similar or different. Gene expression biomarkers of DM identified by our proposed method outperformed Differentially Expressed Genes (DEGs) identified by the DESeq2 software package in predicting the presence or absence of DM. Genes involved in DM tend to be specific to a cancer type rather than general across all cancers. Our results also indicate that multimodal data is more predictive of metastasis than any of the three unimodal data types tested, and that genomic data provides the highest contribution by a wide margin. The results re-emphasize the importance of sufficient image data when a weakly supervised training technique is used. Code is available at: https://github.com/rit-cui-lab/Multimodal-AI-for-Prediction-of-Distant-Metastasis-in-Carcinoma-Patients.

Keywords: cancer; deep learning; gene expression; histopathology; machine learning; metastasis; multimodal.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Workflow for analysis of genomic data. On the right, gene expression datasets of HNSC, PAAD, and BLCA are separately passed through the (RF + Optimization) gene selection algorithm and a parallel DGE analysis. Separate ML models are trained on the highest ranked selected genes and on the DEGs to predict DM samples. On the top left, the same processes are carried out on the ALL3 dataset (the combined dataset of HNSC, PAAD, and BLCA). On the lower left, genes selected from the ALL3 dataset are used as features to train new models on the dataset of each cancer type (i.e., the other groups: HNSC, PAAD, BLCA). Cancer images source: National Cancer Institute.
FIGURE 2
Binary classifiers trained on the same patients’ records but with different data types, for comparison between unimodal and multimodal data models. (A) Additional Dense layers are trained on DenseNet121 and KimiaNet CNNs for feature extraction from histopathology images. Classifiers are trained on image features to predict presence or absence of DM. (B) Classifiers are trained on only clinical variables to predict presence or absence of DM. (C) Classifiers are trained on genes selected by the (RF + Optimization) algorithm to predict presence or absence of DM. (D) Image features are fused with selected genes and clinical features to train multimodal models for prediction of DM.
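The fusion step in panel (D) can be illustrated with a minimal sketch. It assumes early fusion by simple concatenation of per-patient feature vectors, which the caption does not confirm; the function name `fuse_features`, the patient IDs, and all feature values are hypothetical.

```python
from typing import Dict, List


def fuse_features(*modalities: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Concatenate per-patient feature vectors from several modalities.

    Assumption: fusion is plain concatenation. Only patients present in
    every modality are kept, so all fused vectors share the same length
    and the same column order.
    """
    if not modalities:
        raise ValueError("at least one modality is required")
    shared_ids = set(modalities[0])
    for modality in modalities[1:]:
        shared_ids &= set(modality)

    fused: Dict[str, List[float]] = {}
    for patient_id in sorted(shared_ids):
        vector: List[float] = []
        for modality in modalities:
            vector.extend(modality[patient_id])
        fused[patient_id] = vector
    return fused


# Hypothetical toy values; "P3" has image features only and is dropped.
image_features = {"P1": [0.12, 0.80], "P2": [0.55, 0.31], "P3": [0.90, 0.10]}
gene_features = {"P1": [5.1, 2.3, 0.0], "P2": [4.8, 1.9, 0.7]}
clinical_features = {"P1": [63.0, 1.0], "P2": [71.0, 0.0]}

fused = fuse_features(image_features, gene_features, clinical_features)
```

Restricting the result to patients shared by all modalities mirrors the caption's point that the classifiers are compared on the same patients' records.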
FIGURE 3
Functional classification of selected genes in (A) BLCA, (B) PAAD, and (C) HNSC.
FIGURE 4
(A, E, I, M) Five-fold cross-validation ROC curves and mean ROC curve when the 15 highest ranked genes selected by the (RF + Optimization) method from each of the groups are used to predict presence or absence of DM. These predictions outperform the others within the same study group. (B, F, J, N) Five-fold cross-validation ROC curves and mean ROC curve when the 15 highest ranked DEGs (adjusted p-value = 0.05) are used to predict presence or absence of DM. (C, G, K) Five-fold cross-validation ROC curves and mean ROC curve when the 15 highest ranked genes selected by the (RF + Optimization) method from the ALL3 group are used to predict presence or absence of DM in the other (BLCA, PAAD, HNSC) study groups. (D, H, L) Five-fold cross-validation ROC curves and mean ROC curve when the union of the 15 highest ranked genes selected by the (RF + Optimization) method from each of the BLCA, PAAD, and HNSC groups (total = 45) is used to predict presence or absence of DM in each cancer type.
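The evaluation protocol behind these panels can be sketched as follows. This is only an illustration of stratified five-fold cross-validation with a per-fold AUC and their mean: the nearest-centroid scorer stands in for the paper's classifiers, the data is synthetic, and every name here is hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, preserving the class ratio."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def centroid(rows: List[List[float]]) -> List[float]:
    return [mean(column) for column in zip(*rows)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def cross_validated_auc(x: List[List[float]], y: List[int], k: int = 5) -> List[float]:
    """Fit a toy nearest-centroid scorer on k-1 folds, score the held-out fold."""
    aucs = []
    for held_out in stratified_folds(y, k):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        pos_c = centroid([x[i] for i in train if y[i] == 1])
        neg_c = centroid([x[i] for i in train if y[i] == 0])
        # Higher score means closer to the positive (DM) centroid.
        scores = [sq_dist(x[i], neg_c) - sq_dist(x[i], pos_c) for i in held_out]
        aucs.append(roc_auc([y[i] for i in held_out], scores))
    return aucs


# Synthetic two-feature data: 20 positives and 20 negatives.
rng = random.Random(42)
x = [[rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)] for _ in range(20)]
x += [[rng.gauss(-1.0, 0.5), rng.gauss(-1.0, 0.5)] for _ in range(20)]
y = [1] * 20 + [0] * 20

fold_aucs = cross_validated_auc(x, y, k=5)
mean_auc = mean(fold_aucs)
```

Stratifying the folds keeps both classes in every held-out set, which matters when DM-positive samples are the minority.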
FIGURE 5
There was no overlap among the genes selected using the (RF + Optimization) method in the three cancer types (BLCA, PAAD, and HNSC). Six genes selected in PAAD and seven genes selected in BLCA were also present in the list of genes selected from the ALL3 group.
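An overlap check of this kind reduces to set intersections. The sketch below uses placeholder gene names and toy list sizes, not the genes or counts reported in the paper.

```python
from itertools import combinations

# Placeholder gene names (hypothetical); real lists would hold the
# highest ranked genes selected for each study group.
selected = {
    "BLCA": {"GENE_B1", "GENE_B2", "GENE_B3"},
    "PAAD": {"GENE_P1", "GENE_P2", "GENE_P3"},
    "HNSC": {"GENE_H1", "GENE_H2", "GENE_H3"},
}
all3 = {"GENE_B1", "GENE_P2", "GENE_P3", "GENE_X1"}

# Genes shared by each pair of cancer types.
pairwise_overlap = {
    (a, b): selected[a] & selected[b] for a, b in combinations(selected, 2)
}

# Genes each cancer type shares with the combined ALL3 selection.
shared_with_all3 = {name: genes & all3 for name, genes in selected.items()}
```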
FIGURE 6
(A) The SVM classifier clearly shows the superiority of the multimodal (Clinical + Genomic + Image) model in the ALL3 dataset. (B) The performance of the (Clinical + Genomic + Image) model appears to be on par with the bimodal models when an MLP classifier is used. (C-E) Comparison of the prediction metrics of the best performing bimodal (Image + Genomic) model in each group with those of the unimodal models. Results from the bimodal model are clearly better in the (C) ALL3 dataset than in the unimodal (D) BLCA dataset and (E) PAAD dataset.
