Transl Cancer Res. 2023 May 31;12(5):1254-1269.
doi: 10.21037/tcr-22-2700. Epub 2023 Apr 10.

Construction of diagnostic and prognostic models based on gene signatures of nasopharyngeal carcinoma by machine learning methods

Yiren Wang et al. Transl Cancer Res. 2023.

Abstract

Background: Diagnostic models based on gene signatures of nasopharyngeal carcinoma (NPC) were constructed by random forest (RF) and artificial neural network (ANN) algorithms. Least absolute shrinkage and selection operator (Lasso)-Cox regression was used to select and build prognostic models based on gene signatures. This study contributes to the early diagnosis and treatment, prognosis, and molecular mechanisms associated with NPC.

Methods: Two gene expression datasets were downloaded from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) associated with NPC were identified by differential expression analysis. Significant DEGs were then selected by an RF algorithm, and an ANN was used to construct a diagnostic model for NPC. The performance of the diagnostic model was evaluated by area under the curve (AUC) values on a validation set. Lasso-Cox regression was used to identify gene signatures associated with prognosis. Overall survival (OS) and disease-free survival (DFS) prediction models were constructed and validated using data from The Cancer Genome Atlas (TCGA) database and the International Cancer Genome Consortium (ICGC) database.
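As an illustration of how a Lasso-Cox gene signature is typically turned into a prognostic score, the following minimal sketch computes a linear risk score and splits a cohort into high- and low-risk groups. The gene names, coefficients, expression values, and the median cutoff are all hypothetical assumptions chosen for illustration; the abstract does not state the coefficients or the cutoff rule the authors used.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores):
    """Label a sample 'high' if its score is above the cohort median, else 'low'.

    The median cutoff is an assumption; other cutoffs are equally possible.
    """
    cutoff = median(scores.values())
    return {sample: ("high" if s > cutoff else "low") for sample, s in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
cohort = {
    "s1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "s2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "s3": {"GENE_A": 0.0, "GENE_B": 4.0},
    "s4": {"GENE_A": 6.0, "GENE_B": 4.0},
}

scores = {sample: risk_score(expr, coefficients) for sample, expr in cohort.items()}
groups = split_by_median(scores)
```

In this toy cohort the positive coefficient raises the score as GENE_A expression increases, while the negative coefficient lowers it as GENE_B expression increases.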

Results: A total of 582 DEGs associated with NPC were identified, of which 14 significant genes were selected by the RF algorithm. A diagnostic model for NPC was constructed using an ANN, and its performance was confirmed on the training set [AUC =0.947; 95% confidence interval (CI): 0.911-0.969] and on the validation set (AUC =0.864; 95% CI: 0.828-0.901). A 24-gene signature associated with prognosis was identified by Lasso-Cox regression, and prediction models for OS and DFS of NPC were constructed on the training set. Finally, the predictive ability of these models was confirmed on the validation set.
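For readers unfamiliar with the AUC values reported above, the sketch below shows one standard way to compute an AUC: as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half. The labels and scores are made-up toy data, not values from the study, and this is not necessarily the procedure or software the authors used.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 for the positive class (e.g. NPC), 0 for the negative class (control).
    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
example_auc = auc(labels, scores)
```

An AUC of 0.5 corresponds to scores that carry no information about the class, and 1.0 to perfect separation.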

Conclusions: Several potential gene signatures associated with NPC were identified, and a high-performance predictive model for early diagnosis of NPC and a prognostic prediction model with robust performance were successfully developed. The results of this study provide valuable references for early diagnosis, screening, treatment and molecular mechanism research of NPC in the future.

Keywords: Nasopharyngeal carcinoma (NPC); bioinformatics; diagnostic model; disease markers; machine learning.

Conflict of interest statement

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-22-2700/coif). The authors have no conflicts of interest to declare.

Figures

Figure 1
Flowchart of this study. GEO, Gene Expression Omnibus; ICGC, International Cancer Genome Consortium; NPC, nasopharyngeal carcinoma; TCGA, The Cancer Genome Atlas.
Figure 2
Heat map and volcano plot of DEGs. (A) Heat map of DEGs between NPC patients and controls; group1 is the control group and group2 is the NPC patient group. Red indicates up-regulated gene expression and blue indicates down-regulated gene expression; a darker color indicates higher or lower expression. (B) Volcano plot of DEGs; red indicates up-regulated gene expression and blue indicates down-regulated gene expression. DEGs, differentially expressed genes; NPC, nasopharyngeal carcinoma.
Figure 3
Identification of key genes in NPC using RF. (A) Effect of number of decision trees on error rate, X-axis is the number of decision trees and Y-axis is the error rate. (B) Output of Gini coefficient method in RF model, X-axis is the importance index and Y-axis is the gene name. (C) Visual heat map of gene expression of 14 key genes between two groups of samples, group1 is normal control group and group2 is NPC group. NPC, nasopharyngeal carcinoma; RF, random forest.
Figure 4
Visualization of the ANN diagnostic prediction model. Neural network topology with 14 input nodes (the key genes), 5 hidden nodes, and 2 output nodes (1= NPC group/0= control group). ANN, artificial neural network; NPC, nasopharyngeal carcinoma.
Figure 5
ROC curves of ANN diagnostic NPC. (A) AUC validation results of ANN model on the training dataset. (B) AUC validation results of ANN model on the validation dataset. ANN, artificial neural network; AUC, area under the curve; CI, confidence interval; FPR, false positive rate; NPC, nasopharyngeal carcinoma; TPR, true positive rate.
Figure 6
Lasso regression analysis results. Lasso regression analysis and partial likelihood deviance for the Lasso regression. Lasso, least absolute shrinkage and selection operator.
Figure 7
Kaplan-Meier survival analysis, risk score analysis and time-dependent ROC analysis of 24-gene signatures in the OS training dataset. 1-year, AUC =0.751, 95% CI: 0.698–0.804; 3-year, AUC =0.769, 95% CI: 0.716–0.823; 5-year, AUC =0.731, 95% CI: 0.649–0.813. AUC, area under the curve; CI, confidence interval; HR, hazard ratio; OS, overall survival; ROC, receiver operating characteristic.
Figure 8
Kaplan-Meier survival analysis, risk score analysis and time-dependent ROC analysis of 3 gene signatures in the DFS training dataset. 1-year, AUC =0.718, 95% CI: 0.608–0.827; 3-year, AUC =0.753, 95% CI: 0.648–0.857; 5-year, AUC =0.632, 95% CI: 0.438–0.825. AUC, area under the curve; CI, confidence interval; DFS, disease-free survival; HR, hazard ratio; ROC, receiver operating characteristic.
Figure 9
Kaplan-Meier survival analysis, risk score analysis and time-dependent ROC analysis of 24-gene signatures in the OS validation dataset. 1-year, AUC =0.719, 95% CI: 0.674–0.764; 3-year, AUC =0.746, 95% CI: 0.699–0.792; 5-year, AUC =0.709, 95% CI: 0.640–0.778. AUC, area under the curve; CI, confidence interval; HR, hazard ratio; OS, overall survival; ROC, receiver operating characteristic.
Figure 10
Kaplan-Meier survival analysis, risk score analysis and time-dependent ROC analysis of 3 gene signatures in the DFS validation dataset. 1-year, AUC =0.682, 95% CI: 0.530–0.834; 3-year, AUC =0.679, 95% CI: 0.515–0.843; 5-year, AUC =0.697, 95% CI: 0.423–0.972. AUC, area under the curve; CI, confidence interval; DFS, disease-free survival; HR, hazard ratio; ROC, receiver operating characteristic.
Figure 11
Mutation information for the 24 prognostic genes, from the cBioPortal website.
Figure 12
Two distinct KEGG pathways in the gene expression matrix are enriched in the high- and low-risk groups. KEGG, Kyoto Encyclopedia of Genes and Genomes.
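
The Kaplan-Meier curves referred to in Figures 7 to 10 estimate survival probability over time from follow-up data that may be censored. The following minimal sketch of the product-limit estimator uses made-up follow-up times and event indicators; it is meant only to show how such a curve is computed, and it is not the authors' code or data.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up duration for each patient.
    events: 1 if the event (e.g. death or relapse) was observed, 0 if censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events observed exactly at time t.
        observed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - observed / at_risk
        curve.append((t, survival))
    return curve


# Toy follow-up data for illustration only (times in arbitrary units).
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Censored patients (event indicator 0) never cause a drop in the curve, but they do count in the at-risk denominator until their follow-up ends.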
