Brief Bioinform. 2021 Sep 2;22(5):bbab142.
doi: 10.1093/bib/bbab142.

Deep learning of gene relationships from single cell time-course expression data

Ye Yuan et al. Brief Bioinform. 2021.

Abstract

Time-course gene-expression data have been widely used to infer regulatory and signaling relationships between genes. Most of the widely used methods for such analysis were developed for bulk expression data. Single cell RNA-Seq (scRNA-Seq) data offer several advantages, including the large number of expression profiles available and the ability to focus on individual cells rather than averages. However, the data also raise new computational challenges. Using a novel encoding for scRNA-Seq expression data, we develop deep learning methods for interaction prediction from time-course data. Our methods use a supervised framework which represents the data as a 3D tensor and trains convolutional and recurrent neural networks to predict interactions. We tested our time-course deep learning (TDL) models on five different time-series scRNA-Seq datasets. As we show, TDL can accurately identify causal and regulatory gene-gene interactions and can also be used to assign new functions to genes. TDL improves on prior methods for the above tasks and can be generally applied to new time-series scRNA-Seq data.

Keywords: deep learning; single cell RNA-Seq; time-course data.

Figures

Figure 1
TDL model architecture. To infer gene interactions (top left), we first convert time-course single cell expression data to a 3D tensor, which we term NEPDF. Each 2D slice of the NEPDF captures the co-expression of a pair of genes at one of the profiled time points, and the 3D NEPDF represents their co-expression over time. The 3D NEPDF is then used as input to a TDL model. The model is trained using labeled positive and negative pairs. The figure shows the convolutional LSTM architecture, which is one of the two TDL models we tested. This model consists of an LSTM layer, followed by a dense layer which concatenates all convolutional hidden states from the LSTM layer, and then a final output (classification) layer. See Figure S1, available online at https://academic.oup.com/bib, for the other TDL architecture we tested, 3D CNN.
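To make the tensor encoding concrete, the following is a minimal sketch of how a 3D co-expression tensor of this kind could be assembled: one normalized 2D histogram per time point, stacked along a time axis. The function name `nepdf_3d`, the log transform, the bin count and the use of shared bin edges are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def nepdf_3d(expr_a, expr_b, time_labels, bins=8):
    """Stack one normalized 2D co-expression histogram per time point.

    expr_a, expr_b: expression values of two genes, one entry per cell.
    time_labels: the time point at which each cell was profiled.
    Returns an array of shape (n_time_points, bins, bins).
    NOTE: hypothetical helper; preprocessing choices here are assumptions.
    """
    expr_a = np.log1p(np.asarray(expr_a, dtype=float))
    expr_b = np.log1p(np.asarray(expr_b, dtype=float))
    time_labels = np.asarray(time_labels)

    # Shared bin edges keep slices from different time points comparable.
    edges_a = np.linspace(expr_a.min(), expr_a.max() + 1e-9, bins + 1)
    edges_b = np.linspace(expr_b.min(), expr_b.max() + 1e-9, bins + 1)

    slices = []
    for t in np.unique(time_labels):  # np.unique returns sorted time points
        mask = time_labels == t
        hist, _, _ = np.histogram2d(expr_a[mask], expr_b[mask],
                                    bins=[edges_a, edges_b])
        slices.append(hist / hist.sum())  # each slice sums to 1
    return np.stack(slices)

# Toy usage with simulated counts for two correlated genes at three time points.
rng = np.random.default_rng(0)
gene_a = rng.poisson(5, size=300)
gene_b = gene_a + rng.poisson(2, size=300)
times = np.repeat([0, 12, 24], 100)
tensor = nepdf_3d(gene_a, gene_b, times, bins=8)
print(tensor.shape)  # (3, 8, 8)
```

Each slice is a 2D probability mass over binned expression of the two genes, so a tensor of this shape could be passed to a 3D convolutional or convolutional recurrent network with time as the sequence axis.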
Figure 2
Causality prediction. (A) AUROC of Granger causality (with and without pseudo-time ordering), CNNC, 3D CNN and conv-LSTM on TF-target causality prediction tasks in the mESC1, mESC2, hESC1 and hESC2 datasets, respectively. (B–E) Average gene pair expression over time of four pairs that were correctly predicted as gene1 → gene2 (top) and gene2 → gene1 (bottom) by TDL models in the hESC2 dataset. (F–H) Average gene expression over time (F), the stereoscopic surface (upper)/heatmap (bottom) of the NEPDF used by CNNC (G), and time-course NEPDFs along with down-sampled time points used by TDL (H) for a pair that was correctly predicted as positive by TDL but wrongly predicted as negative by CNNC in the hESC1 dataset.
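Since the comparisons in this figure are reported as AUROC, a small worked example of that metric may help. The sketch below computes AUROC through the rank-sum (Mann-Whitney U) identity using only the standard library; the function name `auroc` and the toy labels and scores are illustrative and are not data from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 0/1 ground truth (1 = positive gene pair).
    scores: predicted scores, higher meaning more likely positive.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy example: three of the four positive/negative pairings are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUROC of 0.5 corresponds to random ranking of positive and negative pairs, and 1.0 to a perfect ranking.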
Figure 3
TF target prediction. (A) AUROC of dynGENIE3, dynGENIE3 with pseudo-time ordering by Monocle3, Pearson correlation (PC), mutual information (MI), CNNC, 3D CNN and conv-LSTM on TF-target prediction in the mESC1, mESC2, hESC1 and hESC2 datasets, respectively. (B–E) Average gene pair expression over time for typical samples that were correctly predicted as interacting and non-interacting gene pairs by TDL models in the hESC2 dataset. (F–H) Average gene pair expression over time (F), the stereoscopic surface (upper)/heatmap (bottom) of the NEPDF used by CNNC (G), and time-course NEPDFs along with down-sampled time points used by TDL (H) for a sample that was correctly predicted as positive by TDL but wrongly predicted as negative by CNNC in the hESC1 dataset.
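The two unsupervised baselines named in panel A, Pearson correlation and mutual information, can each be sketched as a score for a single gene pair. The version below is only an illustration: the helper names, the use of the absolute correlation, and the histogram-based MI estimate with 8 bins are assumptions, and the paper's exact baseline settings are not given in this summary.

```python
import numpy as np

def pearson_score(x, y):
    """Absolute Pearson correlation between two expression vectors (assumed scoring)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(abs(np.corrcoef(x, y)[0, 1]))

def mutual_information(x, y, bins=8):
    """Histogram-based mutual information estimate in nats (bin count is an assumption)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    expected = p_x @ p_y                    # joint distribution under independence
    nonzero = p_xy > 0                      # skip empty cells to avoid log(0)
    return float(np.sum(p_xy[nonzero] * np.log(p_xy[nonzero] / expected[nonzero])))

# Toy usage: a dependent pair should score higher than an independent pair.
rng = np.random.default_rng(0)
tf = rng.normal(size=500)
target = 2.0 * tf + rng.normal(scale=0.5, size=500)
unrelated = rng.normal(size=500)
print(pearson_score(tf, target) > pearson_score(tf, unrelated))
print(mutual_information(tf, target) > mutual_information(tf, unrelated))
```

Both scores treat all cells as one pooled sample and ignore the order of time points, which is the information the time-course NEPDF is designed to retain.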
Figure 4
Function assignment. (A–D) AUROC of CNNC, 3D CNN and conv-LSTM on the function prediction task for cell cycle, rhythm, immune and proliferation genes, respectively. (E, F) The 2D NEPDF used by CNNC and the 3D NEPDF used by conv-LSTM for a positive pair which both CNNC and conv-LSTM correctly classified. (G, H) 2D and 3D NEPDFs for a pair that was correctly classified by conv-LSTM but incorrectly classified by CNNC.
