Nucleic Acids Res. 2016 Jul 27;44(13):e117.
doi: 10.1093/nar/gkw430. Epub 2016 May 13.

TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis

Zhicheng Ji et al. Nucleic Acids Res. 2016.

Abstract

When analyzing single-cell RNA-seq data, constructing a pseudo-temporal path to order cells based on the gradual transition of their transcriptomes is a useful way to study gene expression dynamics in a heterogeneous cell population. Currently, a limited number of computational tools are available for this task, and quantitative methods for comparing different tools are lacking. Tools for Single Cell Analysis (TSCAN) is a software tool developed to better support in silico pseudo-Time reconstruction in Single-Cell RNA-seq ANalysis. TSCAN uses a cluster-based minimum spanning tree (MST) approach to order cells. Cells are first grouped into clusters and an MST is then constructed to connect cluster centers. Pseudo-time is obtained by projecting each cell onto the tree, and the ordered sequence of cells can be used to study dynamic changes of gene expression along the pseudo-time. Clustering cells before MST construction reduces the complexity of the tree space. This often leads to improved cell ordering. It also allows users to conveniently adjust the ordering based on prior knowledge. TSCAN has a graphical user interface (GUI) to support data visualization and user interaction. Furthermore, quantitative measures are developed to objectively evaluate and compare different pseudo-time reconstruction methods. TSCAN is available at https://github.com/zji90/TSCAN and as a Bioconductor package.
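The cluster-then-MST idea described in the abstract can be sketched in a few lines. This is a minimal illustration in Python, not the TSCAN implementation (TSCAN is an R/Bioconductor package). It assumes that cells are already reduced to low-dimensional coordinates and already assigned to clusters, and it uses a simplified projection rule: each cell is projected onto the edge leaving its cluster, or onto the entering edge for the last cluster on the path. All function names, the toy data and the chosen path are hypothetical.

```python
import math

def cluster_centers(cells, labels):
    """Mean coordinate of the cells in each cluster."""
    groups = {}
    for cell, label in zip(cells, labels):
        groups.setdefault(label, []).append(cell)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in groups.items()
    }

def minimum_spanning_tree(centers):
    """Prim's algorithm over cluster centers with Euclidean edge weights."""
    nodes = sorted(centers)
    in_tree = [nodes[0]]
    edges = []
    while len(in_tree) < len(nodes):
        # Cheapest edge from the current tree to a node outside it.
        u, v = min(
            ((a, b) for a in in_tree for b in nodes if b not in in_tree),
            key=lambda edge: math.dist(centers[edge[0]], centers[edge[1]]),
        )
        edges.append((u, v))
        in_tree.append(v)
    return edges

def order_cells(cells, labels, centers, path):
    """Return cell indices ordered along `path` (at least two cluster labels).

    Simplifying assumption: cells are sorted first by the position of their
    cluster on the path, then by their scalar projection onto one tree edge.
    """
    keyed = []
    for index, (cell, label) in enumerate(zip(cells, labels)):
        if label not in path:
            continue
        position = path.index(label)
        start = min(position, len(path) - 2)
        a, b = centers[path[start]], centers[path[start + 1]]
        direction = [q - p for p, q in zip(a, b)]
        t = sum((c - p) * d for c, p, d in zip(cell, a, direction)) / sum(
            d * d for d in direction
        )
        keyed.append((position, t, index))
    return [index for _, _, index in sorted(keyed)]

# Toy data: six cells in two dimensions, three clusters, scrambled input order.
cells = [(0.1, 2.3), (0.0, 0.0), (0.1, 1.2), (0.2, 0.1), (0.0, 2.0), (0.0, 1.0)]
labels = [3, 1, 2, 1, 3, 2]

centers = cluster_centers(cells, labels)
tree = minimum_spanning_tree(centers)
pseudotime_order = order_cells(cells, labels, centers, path=[1, 2, 3])
```

Because the tree connects only a handful of cluster centers rather than every cell, a single noisy cell shifts a center slightly instead of rewiring the tree, which is the motivation given above for clustering before MST construction.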

Figures

Figure 1.
TSCAN overview. (A, B) A toy example illustrating a limitation of cell-based MST. Here cells (blue circles) are placed in a two-dimensional space, and the true biological time runs top-down. An MST that connects cells is not unique. Both (A) and (B) are possible solutions. (B) is more consistent with the truth. However, in reality, random measurement noise may shift the cell labeled by ‘*’ away from other cells, as indicated by the arrow and dashed lines. As a result, (B) is no longer an MST. The MST in (A), on the other hand, does not reflect the true order of cells. (C) The true time axis can be found if one first groups similar cells into clusters and then constructs an MST to connect cluster centers. (D) TSCAN first constructs a cluster-based MST (five clusters of cells encoded by different colors are shown as an example; numbers indicate cluster centers). The tree can have multiple paths (e.g. 1-2-3-4 or 1-2-3-5). TSCAN orders cells along each path by projecting each cell onto the tree edge. (E) The number of principal components to retain is determined by finding the best piecewise linear fit consisting of two lines (dashed).
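The two-line piecewise fit mentioned in panel (E) can be illustrated as follows. This is a sketch under stated assumptions, not TSCAN's own procedure: it assumes the fitted quantity is a per-component variance (or standard deviation) curve, that the two lines are fitted independently by least squares without a continuity constraint, that each segment has at least two points, and that the number of components to retain equals the size of the first segment. The function names and the example values are hypothetical.

```python
def line_sse(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def choose_num_components(values):
    """Breakpoint k that minimizes the total error of a two-line fit.

    Components 1..k form the first segment and k+1..n the second.
    """
    n = len(values)
    xs = list(range(1, n + 1))
    best_k, best_error = None, float("inf")
    for k in range(2, n - 1):  # both segments keep at least two points
        error = line_sse(xs[:k], values[:k]) + line_sse(xs[k:], values[k:])
        if error < best_error:
            best_k, best_error = k, error
    return best_k

# Made-up scree-like curve: a steep drop over three components, then a flat tail.
component_variance = [10.0, 6.0, 2.0, 1.0, 0.9, 0.8, 0.7, 0.6]
num_components = choose_num_components(component_variance)
```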
Figure 2.
TSCAN graphical user interface. Left panel contains function menus and tools for setting parameters. Right panel displays data and results. The top scatter plot shows the MST constructed for the LPS data (see Results). Cells (dots) are displayed based on their first two principal components. Clusters of cells are indicated by different colors. Numbers are cluster centers. Expression level of a marker gene BCL3 is shown for each cell. Larger marker size means higher expression. The bottom plot shows the average BCL3 expression for each tree node, standardized across all nodes to have zero mean and unit standard deviation.
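The standardization used for the per-node averages in this caption is an ordinary z-score. A minimal sketch, assuming the sample standard deviation is meant (the caption does not say whether the sample or population form is used) and using made-up node averages:

```python
import statistics

def standardize(values):
    """Shift and scale values to zero mean and unit (sample) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample SD, not population SD
    return [(value - mean) / sd for value in values]

# Hypothetical average expression of one marker gene in five tree nodes.
node_means = [2.0, 4.0, 4.0, 6.0, 9.0]
standardized = standardize(node_means)
```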
Figure 3.
TSCAN analysis in HSMM data set using 518 a priori chosen genes for pseudo-time reconstruction. (A) MST reported by TSCAN is shown in the three-dimensional space spanned by the first three PCs of [formula image]. (B) Users can display cells and MST in chosen PCs (e.g. PC1 and PC2). (C) Mean expression level of ENO3 in each cluster. (D) Mean expression level of SPHK1 in each cluster. Values in (C) and (D) are both standardized across all clusters to have zero mean and unit SD.
Figure 4.
Evaluation results for different methods in HSMM data set where pseudo-time was constructed based on 518 a priori chosen genes. (A) POS score. (B) Robustness measured by the average similarity score from 100 independent perturbations. The heat map shows robustness of each method in each perturbation scheme. Cell Perturb: cell-level perturbation. Expr Perturb: expression-level perturbation. (C) Mean rank of gold standard genes. (D) Number of detected gold standard genes among top differential genes.
Figure 5.
Demonstration of GUI and TSCAN analysis of HSMM data using all genes for pseudo-time reconstruction. (A) MST constructed by TSCAN using all genes. (B) Users can choose a marker gene in GUI to visualize its expression. (C) Users can define a path by specifying the clusters to include and their ordering. (D) The average expression of SPHK1 in each cluster. (E) The average expression of ENO3 in each cluster.
Figure 6.
Evaluation results for different methods in HSMM data where pseudo-time was constructed using all genes. (A) POS score. (B) Robustness measured by the average similarity score from 100 independent perturbations. (C) Mean rank of gold standard genes. (D) Number of detected gold standard genes among top differential genes.
Figure 7.
MEF2C and MYH2 expression patterns in HSMM data set where pseudo-time was constructed using all genes. The expression of each gene in each cell is plotted as a function of cell order on the pseudo-time axis. The solid curves are the fitted GAM function. The dashed curve is the GAM fit for ENO3, the marker gene used to determine the path direction.
Figure 8.
Evaluation results for different methods in LPS data set. (A) POS score. (B) Robustness measured by the average similarity score from 100 independent perturbations. (C) Mean rank of gold standard genes. (D) Number of detected gold standard genes among top differential genes.
Figure 9.
STAT2 expression patterns in LPS data set. STAT2 expression in each cell is plotted as a function of cell order on the pseudo-time axis. The orange curve is the fitted GAM function.
Figure 10.
Evaluation results for different methods in qNSC data set. (A) Robustness measured by the average similarity score from 100 independent perturbations. (B) Mean rank of gold standard genes. (C) Number of detected gold standard genes among top differential genes.

