World J Gastrointest Oncol. 2023 Jul 15;15(7):1215-1226.
doi: 10.4251/wjgo.v15.i7.1215.

Integrated analysis of single-cell and bulk RNA-seq establishes a novel signature for prediction in gastric cancer

Fei Wen et al. World J Gastrointest Oncol.

Abstract

Background: Single-cell sequencing technology makes it possible to analyze changes in specific cell types during disease progression. However, previous single-cell sequencing studies of gastric cancer (GC) have largely focused on immune and stromal cells, and the alterations that occur in gastric epithelial cells during the development of GC remain to be elucidated.

Aim: To create a GC prediction model based on single-cell and bulk RNA sequencing (bulk RNA-seq) data.

Methods: We integrated three single-cell RNA sequencing (scRNA-seq) datasets and ten bulk RNA-seq datasets. The analysis focused on determining cell proportions and identifying differentially expressed genes (DEGs). Specifically, we performed differential expression analysis between epithelial cells from GC tissues and those from normal gastric tissues (NAGs), and used both single-cell and bulk RNA-seq data to establish a prediction model for GC. We then validated the accuracy of the GC prediction model in bulk RNA-seq data. We also used Kaplan-Meier plots to assess the correlation between the genes in the prediction model and the prognosis of GC.

Results: Analysis of scRNA-seq data from a total of 70707 cells from GC tissue, NAG, and chronic gastritis tissue identified 10 cell types, and DEGs between GC and normal epithelial cells were screened. After the DEGs between GC and normal gastric samples were determined from bulk RNA-seq data, GC predictive classifiers were constructed using the least absolute shrinkage and selection operator (LASSO) and random forest methods. The LASSO classifier performed well in both validation and model verification using The Cancer Genome Atlas and Genotype-Tissue Expression (GTEx) datasets [area under the curve (AUC)_min = 0.988, AUC_1se = 0.994], and the random forest model also performed well on the validation set (AUC = 0.92). The genes TIMP1, PLOD3, CKS2, TYMP, TNFRSF10B, CPNE1, GDF15, BCAP31, and CLDN7 had high importance values in multiple GC predictive models, and KM-PLOTTER analysis showed their relevance to GC prognosis, suggesting their potential for use in GC diagnosis and treatment.
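
The "min" and "1se" labels on the two LASSO AUC values suggest the common convention of choosing two penalty strengths from a cross-validation curve: the value with the lowest mean error, and the largest value whose error lies within one standard error of that minimum. The abstract does not state which software or settings were used, so the following is only a minimal sketch of that convention together with a rank-based AUC calculation. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Pick two penalties from cross-validation results (illustrative only).

    lambdas: candidate penalty strengths
    cv_mean: mean cross-validated error for each penalty
    cv_se:   standard error of that mean for each penalty
    """
    # Index of the penalty with the lowest mean cross-validated error.
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    lambda_min = lambdas[best]
    # One-standard-error rule: the strongest penalty whose error is still
    # within one standard error of the minimum (a sparser model).
    threshold = cv_mean[best] + cv_se[best]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambda_min, lambda_1se


def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy cross-validation curve (hypothetical values).
    lams = [0.001, 0.01, 0.05, 0.1, 0.5]
    means = [0.30, 0.22, 0.20, 0.22, 0.60]
    ses = [0.03, 0.03, 0.03, 0.03, 0.05]
    print(select_lambdas(lams, means, ses))
    # Toy labels (1 = GC, 0 = normal) and classifier scores.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The one-standard-error choice trades a small amount of cross-validated accuracy for a model with fewer genes, which is one reason two LASSO models may be reported side by side.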

Conclusion: A predictive classifier was established based on the analysis of RNA-seq data, and the genes in it are expected to serve as auxiliary markers in the clinical diagnosis of GC.

Keywords: Gastric cancer; Least absolute shrinkage and selection operator; Prediction model; Random forest; Single-cell RNA sequencing.

Conflict of interest statement

All the authors report having no relevant conflicts of interest for this article.

Figures

Figure 1
Workflow of the study. Bulk RNA-seq: Bulk RNA sequencing; DEGs: Differentially expressed genes; LASSO: Least absolute shrinkage and selection operator; scRNA-seq: Single-cell RNA sequencing; GEO: Gene Expression Omnibus.
Figure 2
Analysis results of single-cell sequencing. A: UMAP of integrated samples, color-coded by cell clusters; B: UMAP of integrated samples, color-coded by cell types; C: Violin plot of expression of typical marker genes in different cell types; D: Heatmap of expression of typical marker genes in different cell clusters; E: Bar plot showing the proportion of each cell type in different tissues [normal gastric tissue (NAG), chronic atrophic gastritis (CAG), gastric cancer (GC)]; F: Bar plot showing the proportion of each cell type in different tissues [NAG, CAG, intestinal metaplasia (IM), intestinal GC, mixed GC, diffuse GC].
Figure 3
Analysis results of bulk RNA sequencing. A: Principal component analysis (PCA) before COMBAT (presented by dataset); B: PCA after COMBAT (presented by dataset); C: PCA after COMBAT (by pathology type); D: Volcano plot showing differentially expressed genes with top 20 genes labeled according to ‘-log10 (P value)’; E: Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of highly expressed genes in gastric cancer tissues.
Figure 4
Protein interaction analysis of the differentially expressed genes.
Figure 5
Gastric cancer prediction model constructed by least absolute shrinkage and selection operator (LASSO). A: Plot showing 'prob_min' and 'prob_1se' selected to construct the LASSO model; B: Plot showing the predictive efficiency of the LASSO model; C: Importance values of genes in the 'prob_min' model; D: Importance values of genes in the 'prob_1se' model; E: Area under the curve (AUC) analyses depicting the predictive efficiency of the LASSO model in The Cancer Genome Atlas and Genotype-Tissue Expression datasets.
Figure 6
Gastric cancer prediction model constructed by random forest. A: Feature selection of the gastric cancer prediction model based on random forest; B: Accuracy of randomly selected predictors across repeated cross-validation; C: Importance values of genes in the random forest model; D: Area under the curve (AUC) value of the random forest prediction model.
Figure 7
Kaplan–Meier plots evaluating the association between gene expression and gastric cancer survival. A: TIMP1; B: PLOD3; C: CKS2; D: TYMP; E: TNFRSF10B; F: CPNE1; G: GDF15; H: BCAP31; I: CLDN7. HR: Hazard ratio.
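
For readers unfamiliar with the curves in Figure 7, a Kaplan–Meier estimate multiplies, at each observed event time, the fraction of at-risk patients who survive that time. The study used the KM-PLOTTER tool; the code below is not that tool, only a minimal sketch of the estimator itself. The function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up duration for each patient
    events: 1 if the event (death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)   # patients still under observation
    survival = 1.0        # running product of conditional survival fractions
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Toy data: five patients, two of them censored (hypothetical values).
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

In an expression-based comparison such as Figure 7, one curve would be estimated for each group (for example, high and low expression of a gene), and the hazard ratio would come from a separate regression model that this sketch does not cover.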
