IEEE Trans Vis Comput Graph. 2023 Jan;29(1):591-601.
doi: 10.1109/TVCG.2022.3209408. Epub 2022 Dec 20.

Polyphony: an Interactive Transfer Learning Framework for Single-Cell Data Analysis

Furui Cheng et al. IEEE Trans Vis Comput Graph. 2023 Jan.

Abstract

Reference-based cell-type annotation can significantly reduce time and effort in single-cell analysis by transferring labels from a previously-annotated dataset to a new dataset. However, label transfer by end-to-end computational methods is challenging due to the entanglement of technical (e.g., from different sequencing batches or techniques) and biological (e.g., from different cellular microenvironments) variations, only the first of which must be removed. To address this issue, we propose Polyphony, an interactive transfer learning (ITL) framework, to complement biologists' knowledge with advanced computational methods. Polyphony is motivated and guided by domain experts' needs for a controllable, interactive, and algorithm-assisted annotation process, identified through interviews with seven biologists. We introduce anchors, i.e., analogous cell populations across datasets, as a paradigm to explain the computational process and collect user feedback for model improvement. We further design a set of visualizations and interactions to empower users to add, delete, or modify anchors, resulting in refined cell type annotations. The effectiveness of this approach is demonstrated through quantitative experiments, two hypothetical use cases, and interviews with two biologists. The results show that our anchor-based ITL method takes advantage of both human and machine intelligence in annotating massive single-cell datasets.

Figures

Fig. 1.
The interface of Polyphony contains three views: the comparison view (A), the anchor set view (B), and the marker view (C). The comparison view provides an overview of the joint embedding space, and offers users interactions to inspect (A1), delete (A2), and add (A3) anchors. The anchor set view orders the anchors in a table, supporting inspection and comparison of different anchors (B1-2). The marker view shows the significant genes (C1) for the query and reference cells from a focal anchor.
Fig. 2.
(A) Single-cell transcriptomics data is represented by a matrix that records the amount of RNA corresponding to each gene (row) detected in each cell (column). Batch effects appear in data collected from different studies and must be removed to allow an integrative analysis (B, C).
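The matrix layout described in this caption can be illustrated with a small sketch. The gene names, counts, and batch labels below are invented for illustration, and per-batch mean-centering is only a naive stand-in for batch-effect removal, not the method used by Polyphony.

```python
import numpy as np

# Toy expression matrix: rows are genes, columns are cells (the layout in Fig. 2A).
# All values and gene names are illustrative assumptions.
genes = ["CD3E", "MS4A1", "LYZ"]
counts = np.array([
    [5.0, 6.0, 9.0, 10.0],
    [1.0, 0.0, 5.0, 4.0],
    [2.0, 3.0, 6.0, 7.0],
])
# Hypothetical study (batch) label for each cell, i.e. each column.
batch = np.array(["study_a", "study_a", "study_b", "study_b"])


def center_per_batch(matrix, batch_labels):
    """Subtract each batch's per-gene mean (a naive stand-in for batch correction)."""
    corrected = matrix.copy()
    for label in np.unique(batch_labels):
        cols = batch_labels == label
        corrected[:, cols] -= matrix[:, cols].mean(axis=1, keepdims=True)
    return corrected


corrected = center_per_batch(counts, batch)
```

Centering of this kind would also erase any genuine biological difference between the two studies, which is the entanglement of technical and biological variation that the abstract describes.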
Fig. 3.
The proposed interactive transfer learning framework includes four key steps: anchor recommendation, user feedback, model fine-tuning, and embedding updating.
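The four steps named in this caption can be sketched as a single pass of a loop. This is a toy under stated assumptions, not the authors' implementation: every function name is hypothetical, clusters are reduced to 2-D centroids, anchors are recommended by nearest centroid, and "fine-tuning" is replaced by estimating one global offset from the anchors the user kept.

```python
import numpy as np


def recommend_anchors(query_centroids, ref_centroids):
    """Step 1: pair each query cluster with its nearest reference cluster."""
    diffs = query_centroids[:, None, :] - ref_centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return [(q, int(r)) for q, r in enumerate(dists.argmin(axis=1))]


def apply_feedback(anchors, rejected):
    """Step 2: keep only the anchors the user did not delete."""
    return [a for a in anchors if a not in rejected]


def fine_tune(query_centroids, ref_centroids, anchors):
    """Step 3 (stand-in): estimate one global offset from the confirmed anchors."""
    if not anchors:
        return np.zeros(query_centroids.shape[1])
    offsets = [ref_centroids[r] - query_centroids[q] for q, r in anchors]
    return np.mean(offsets, axis=0)


def update_embedding(query_centroids, offset):
    """Step 4: move the query embedding by the estimated offset."""
    return query_centroids + offset


# Invented centroids: two reference clusters, three query clusters.
ref = np.array([[0.0, 0.0], [10.0, 0.0]])
query = np.array([[1.0, 1.0], [11.0, 1.0], [4.0, 8.0]])

anchors = recommend_anchors(query, ref)
confirmed = apply_feedback(anchors, rejected={(2, 0)})  # user deletes one anchor
offset = fine_tune(query, ref, confirmed)
updated = update_embedding(query, offset)
```

In this toy, the third query cluster has no good reference match, so the rejected anchor keeps it from distorting the offset. That loosely mirrors how a user might set aside a population missing from the reference dataset.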
Fig. 4.
(A) Visualization designs in the comparison view. We use different encodings for the query (C) and the reference (D) datasets. The anchor annotations (B) encode the gene expression distances between the query cells and the reference cells, helping users to better understand the integration quality.
Fig. 5.
(A) The marker view groups significantly differentially-expressed genes into three columns. (B) The glyph design enables the comparison of gene significance and gene ranking simultaneously.
Fig. 6.
The joint embedding space before (A) and after (B) the integration guided by user-specified anchors.
Fig. 7.
Integrating and discovering unknown cell populations from the PBMC dataset. The user first gains an initial impression of the two datasets (A) and inspects some low-quality anchors (B1), which potentially contain unknown cell populations. Then, the user tries to improve the integration quality by confirming high-quality anchors containing familiar cell types (B2). The updated embedding fuses the two datasets well (C). An exception is an anchor with a wide border in its anchor annotation (C1), corresponding to a previously-marked anchor (D1). After checking its marker genes, the user confirms that these anchor cells belong to pDC, a cell population missing from the reference dataset.

