Sci Rep. 2022 May 23;12(1):8643.
doi: 10.1038/s41598-022-12199-0.

Development and validation of an RNA-seq-based transcriptomic risk score for asthma

Xuan Cao et al. Sci Rep. 2022.

Abstract

Recent progress in RNA sequencing (RNA-seq) allows us to explore whole-genome gene expression profiles and to develop predictive models for disease risk. The objective of this study was to develop and validate an RNA-seq-based transcriptomic risk score (RSRS) for disease risk prediction that can simultaneously accommodate demographic information. We analyzed RNA-seq gene expression data from 441 asthmatic and 254 non-asthmatic samples. Logistic least absolute shrinkage and selection operator (Lasso) regression analysis in the training set identified 73 differentially expressed genes (DEGs) to form a weighted RSRS that discriminated asthmatics from healthy subjects with an area under the curve (AUC) of 0.80 in the testing set after adjustment for age and gender. The 73-gene RSRS was validated in three independent RNA-seq datasets and achieved AUCs of 0.70, 0.77 and 0.60, respectively. To explore the biological and molecular functions of the 73 genes in the asthma phenotype, we performed pathway enrichment analysis and found that these genes were significantly (p < 0.0001) enriched for DNA replication, recombination, and repair; cell-to-cell signaling and interaction; and eumelanin biosynthesis and developmental disorder. Further in silico analyses of the 73 genes using the Connectivity Map identified drugs (mepacrine, dactolisib) and genetic perturbagens (PAK1, GSR, RBM15 and TNFRSF12A) that could potentially be repurposed for treating asthma. These findings show the promise of RNA-seq risk scores for stratifying and predicting disease risk.
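
The general idea of a Lasso-weighted risk score evaluated by AUC can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the age and gender adjustment is omitted, and the function names (`fit_l1_logistic`, `risk_score`, `auc`), the penalty `lam`, and the step size `lr` are all assumptions chosen for illustration. The fit uses a simple proximal-gradient loop rather than any particular statistics package.

```python
import numpy as np

def fit_l1_logistic(X, y, lam=0.05, lr=0.1, n_iter=2000):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    The intercept b is not penalized. Hypothetical helper for illustration.
    """
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        # Gradient step on the average logistic loss.
        w = w - lr * (X.T @ (prob - y) / n)
        b = b - lr * np.mean(prob - y)
        # Soft-thresholding step: shrinks small weights exactly to zero.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w, b

def risk_score(X, w):
    """Weighted sum of expression values over the selected genes."""
    return X @ w

def auc(scores, labels):
    """AUC from the rank-sum identity (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Synthetic stand-in for normalized expression: 400 samples, 50 genes,
# of which only the first 5 carry signal (an assumption of this toy example).
rng = np.random.default_rng(0)
n_samples, n_genes = 400, 50
X = rng.normal(size=(n_samples, n_genes))
true_w = np.zeros(n_genes)
true_w[:5] = 1.0
y = (rng.random(n_samples) < 1.0 / (1.0 + np.exp(-(X @ true_w)))).astype(int)

# Train/test split, fit on the training set, score the held-out set.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]
w, b = fit_l1_logistic(X_train, y_train.astype(float))
test_auc = auc(risk_score(X_test, w), y_test)
n_selected = int(np.count_nonzero(w))
print(f"selected genes: {n_selected}, test AUC: {test_auc:.2f}")
```

Genes with a nonzero weight after fitting play the role of the selected gene set, and the weights themselves define the score. Covariates such as age and gender could be added as extra, unpenalized columns, which this sketch leaves out.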

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Study workflow for constructing the RSRS containing the steps of data acquisition and analysis. (a) Public data collection, processing and initial data analysis; (b) feature selection pipeline including DEG analysis and gene selection; (c) RSRS formulation and model validation in the testing set and independent cohorts.
Figure 2
Initial quality checking of the RNA-seq data based on the training set. (a) Volcano plot of − log10 adjusted p values (on the y-axis) versus log2 fold changes (on the x-axis) using the training set. The blue points correspond to the 73 RSRS genes. (b) MA plot, which is a scatter plot of log2 fold changes (on the y-axis) versus the mean of normalized counts (on the x-axis), where points were colored red if the genes were selected. Points which fall out of the window were plotted as open triangles pointing either up or down.
Figure 3
The heat map of the pair-wise Pearson correlation among the normalized and log-transformed gene expression values of 73 genes.
Figure 4
Density plots showing the asthma risk based on RSRS for different groups. (a) Training set; (b) testing set.
Figure 5
ROC curves for the asthma prediction performance of RSRS. (a) In the testing set, compared with risk scores based on the DEG list ranked by fold change (FC) with p < 0.05, and on the top 10, 50, and 100 genes ranked by p value; (b) in the independent cohort GSE118761 (AUC = 0.70); (c) in the independent cohort GSE38003 (AUC = 0.77); (d) in the independent cohort GSE85567 (AUC = 0.60).
Figure 6
The overlap between the 73 RSRS genes and the asthma genome-wide association study catalog. The sector width for each SNP is proportional to the − log10 (adjusted p value) corresponding to that SNP.
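
The quantities plotted in Figures 2 and 3 can be sketched in a few lines. This is a hedged illustration only: the source does not state which multiple-testing adjustment was used, so the Benjamini-Hochberg procedure here is an assumption, and the helper names (`bh_adjust`, `volcano_coordinates`, `gene_correlation`) and toy inputs are hypothetical.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values (assumed adjustment method)."""
    p = np.asarray(pvalues, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

def volcano_coordinates(log2_fc, pvalues):
    """x = log2 fold change, y = -log10 adjusted p value."""
    return np.asarray(log2_fc, dtype=float), -np.log10(bh_adjust(pvalues))

def gene_correlation(expr):
    """Pairwise Pearson correlation between genes.

    expr is a samples x genes matrix of normalized, log-transformed values.
    """
    return np.corrcoef(expr, rowvar=False)

# Toy inputs: four genes with made-up fold changes and raw p values.
log2_fc = np.array([1.2, -0.3, 0.8, -1.5])
pvals = np.array([0.01, 0.04, 0.03, 0.005])
x, y = volcano_coordinates(log2_fc, pvals)

# Toy expression matrix: 20 samples x 4 genes.
rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 4))
corr = gene_correlation(expr)
```

The arrays `x` and `y` are the coordinates of a volcano plot, and `corr` is the matrix a correlation heat map would display.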
