Abstract
Single-cell transcriptomic analysis is widely used to study human tumors. However, it remains challenging to distinguish normal cell types in the tumor microenvironment from malignant cells and to resolve clonal substructure within the tumor. To address these challenges, we developed an integrative Bayesian segmentation approach called copy number karyotyping of aneuploid tumors (CopyKAT) to estimate genomic copy number profiles at an average genomic resolution of 5 Mb from read depth in high-throughput single-cell RNA sequencing (scRNA-seq) data. We applied CopyKAT to analyze 46,501 single cells from 21 tumors, including triple-negative breast cancer, pancreatic ductal adenocarcinoma, anaplastic thyroid cancer, invasive ductal carcinoma and glioblastoma, to accurately (98%) distinguish cancer cells from normal cell types. In three breast tumors, CopyKAT resolved clonal subpopulations that differed in the expression of cancer genes, such as KRAS, and signatures, including epithelial-to-mesenchymal transition, DNA repair, apoptosis and hypoxia. These data show that CopyKAT can aid in the analysis of scRNA-seq data in a variety of solid human tumors.
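The general idea of reading copy number from expression depth can be illustrated with a toy simulation. The sketch below is not the CopyKAT algorithm, which the abstract describes as an integrative Bayesian segmentation approach. It only shows, under simplifying assumptions, why averaging expression over windows of neighboring genes and comparing against normal reference cells can expose a gained region. Everything in it is hypothetical and chosen for illustration: the simulated Poisson counts, the 1.5-fold gain, the window of 25 genes, the Freeman-Tukey transform as one common variance-stabilizing choice, and the function names.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def freeman_tukey(count):
    """Variance-stabilizing transform for count data: sqrt(x) + sqrt(x + 1)."""
    return math.sqrt(count) + math.sqrt(count + 1)


def simulate_cell(rng, n_genes, gained=None, base_rate=10.0, gain_factor=1.5):
    """Simulate counts for genes in genomic order.

    Genes whose index falls in `gained` (a range) are expressed at
    `gain_factor` times the base rate, mimicking a 3-copy region.
    """
    return [
        poisson(rng, base_rate * (gain_factor if gained and g in gained else 1.0))
        for g in range(n_genes)
    ]


def binned_profile(counts, reference_means, window=25):
    """Relative expression per window of adjacent genes, median-centered."""
    rel = [freeman_tukey(c) - r for c, r in zip(counts, reference_means)]
    bins = [statistics.fmean(rel[i:i + window]) for i in range(0, len(rel), window)]
    center = statistics.median(bins)
    return [b - center for b in bins]


def aneuploidy_score(profile):
    """Spread of the binned profile; flat (diploid-like) profiles score low."""
    return statistics.pstdev(profile)


rng = random.Random(0)
N_GENES = 400
GAINED = range(100, 200)  # hypothetical gained region (bins 4 to 7)

# Per-gene baseline from simulated normal reference cells.
reference_cells = [simulate_cell(rng, N_GENES) for _ in range(20)]
reference_means = [
    statistics.fmean([freeman_tukey(cell[g]) for cell in reference_cells])
    for g in range(N_GENES)
]

normal_cells = [simulate_cell(rng, N_GENES) for _ in range(10)]
tumor_cells = [simulate_cell(rng, N_GENES, gained=GAINED) for _ in range(10)]

normal_scores = [aneuploidy_score(binned_profile(c, reference_means)) for c in normal_cells]
tumor_scores = [aneuploidy_score(binned_profile(c, reference_means)) for c in tumor_cells]

print("mean score, normal cells:", round(statistics.fmean(normal_scores), 3))
print("mean score, tumor cells: ", round(statistics.fmean(tumor_scores), 3))
```

In this toy setting the simulated tumor cells show elevated values in the bins covering the gained region and a larger profile spread than the simulated normal cells. Real scRNA-seq data are far sparser and are confounded by cell-type-specific expression, which is the kind of difficulty a dedicated method such as CopyKAT is meant to address.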
Data availability
scRNA-seq data from this study were deposited in the Gene Expression Omnibus (GEO; GSE148673).
Code availability
Software is available at GitHub (https://github.com/navinlabcode/copykat).
Acknowledgements
This work was supported by grants to N.E.N. from the American Cancer Society (129098-RSG-16-092-01-TBG), the National Cancer Institute (RO1CA240526, RO1CA236864), the Emerson Collective Cancer Research Fund (20200619153514) and the CPRIT Single Cell Genomics Center (RP180684). N.E.N. is an AAAS Wachtel Scholar, AAAS Fellow, Andrew Sabin Family Fellow and Jack & Beverly Randall Innovator. This study was supported by the MD Anderson Breast Cancer Moonshot Program. This study was supported by the MD Anderson Sequencing Core Facility Grant (CA016672). This project was also supported by a Susan Komen Postdoctoral Fellowship to R.G. (PDF17487910). Other grant support includes the Anaplastic Thyroid Cancer Research Fund (S.Y.L. and J.R.W.) and an institutional multi-investigator research program grant to S.Y.L.
Author information
Contributions
R.G. and N.E.N. designed the research project. R.G. developed and implemented the computational methods with contributions from N.E.N., Y.Y., A.D., F.W. and K.C. M.H. preprocessed the data. S.F.S. and S.M. provided clinical samples. J.R.W. and S.Y.L. collected thyroid tumor samples. S.B., Y.C.H., Y.L., A.S., T.K. and E.S. performed single-cell sequencing experiments. R.G. and N.E.N. wrote the manuscript with input from all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Biotechnology thanks Elana Fertig, Jan Korbel and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Figs. 1–4 and Tables 1 and 2.
About this article
Cite this article
Gao, R., Bai, S., Henderson, Y.C. et al. Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes. Nat Biotechnol 39, 599–608 (2021). https://doi.org/10.1038/s41587-020-00795-2
DOI: https://doi.org/10.1038/s41587-020-00795-2