Cancer Genet. 2020 Oct;248-249:34-38.
doi: 10.1016/j.cancergen.2020.09.005. Epub 2020 Oct 2.

CytoGPS: A large-scale karyotype analysis of CML data


Zachary B Abrams et al. Cancer Genet. 2020 Oct.

Abstract

Karyotyping, the practice of visually examining and recording chromosomal abnormalities, is commonly used to diagnose diseases of genetic origin, including cancers. Karyotypes are recorded as text written in the International System for Human Cytogenetic Nomenclature (ISCN). Downstream analysis of karyotypes is conducted manually, due to the visual nature of analysis and the linguistic structure of the ISCN. The ISCN has not been computer-readable and, as such, prevents the full potential of these genomic data from being realized. In response, we developed CytoGPS, a platform to analyze large volumes of cytogenetic data using a Loss-Gain-Fusion model that converts the human-readable ISCN karyotypes into a machine-readable binary format. As proof of principle, we applied CytoGPS to cytogenetic data from the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer, a National Cancer Institute hosted database of over 69,000 karyotypes of human cancers. Using the Jaccard coefficient to determine similarity between karyotypes structured as binary vectors, we were able to identify novel patterns from 4,968 Mitelman CML karyotypes, such as the co-occurrence of trisomy 19 and 21. The CytoGPS platform unlocks the potential for large-scale, comparative analysis of cytogenetic data. This methodological platform is freely available at CytoGPS.org.
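The similarity measure named in the abstract can be illustrated with a minimal sketch. It assumes each karyotype has already been converted to a fixed-length binary vector. The toy vectors and their bit positions below are hypothetical and are not taken from CytoGPS or the Mitelman database, and the function names are illustrative rather than part of any CytoGPS API.

```python
def jaccard_similarity(a, b):
    """Jaccard coefficient between two equal-length binary vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Positions set in both vectors (intersection) and in at least one (union).
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumed convention: two all-zero vectors are treated as identical.
    return 1.0 if either == 0 else both / either


def jaccard_distance(a, b):
    """Distance form, as would be used for clustering or projection."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical toy vectors: each bit marks one event (loss, gain, or fusion)
# at one made-up chromosomal region.
k1 = [0, 1, 0, 0, 1, 0, 0, 0, 1]
k2 = [0, 1, 0, 0, 0, 0, 0, 0, 1]
k3 = [1, 0, 0, 0, 0, 0, 0, 0, 0]

print(jaccard_similarity(k1, k2))  # two shared events out of three distinct events
print(jaccard_similarity(k1, k3))  # no shared events
```

Because the Jaccard coefficient ignores positions that are zero in both vectors, the many regions that are normal in both karyotypes do not inflate the similarity, which suits sparse binary data of this kind.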

Keywords: Bioinformatics; Chronic myeloid leukemia; CytoGPS; Cytogenetics; Data science; Karyotypes.


Figures

Figure 1: t-SNE projection of the CML LGF karyotype data using the Jaccard distance demonstrates approximately 28 distinguishable clusters.
Each cluster is represented by an "eye-like" structure and is defined largely by a single abnormality or a set of abnormalities. Karyotypes that contain additional abnormalities fan out from the center.
Figure 2: Heat map of recurrent aberrations by cluster.
This heat map shows the relationship between the most common cytogenetic abnormalities (y-axis) and the individual clusters (x-axis). It illustrates that different clusters are cytogenetically defined by certain commonly recurring abnormalities.
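A table of the kind summarized in such a heat map could be computed as sketched below: for each cluster, the fraction of its karyotypes that carry each abnormality. This is an illustrative sketch, not the authors' procedure. The abnormality names, cluster labels, and data are hypothetical.

```python
from collections import defaultdict


def cluster_frequencies(vectors, labels, names):
    """Return {cluster: {abnormality: fraction of karyotypes carrying it}}."""
    counts = defaultdict(lambda: [0] * len(names))
    sizes = defaultdict(int)
    for vec, lab in zip(vectors, labels):
        sizes[lab] += 1
        for i, bit in enumerate(vec):
            if bit:
                counts[lab][i] += 1
    return {
        lab: {names[i]: counts[lab][i] / sizes[lab] for i in range(len(names))}
        for lab in sizes
    }


# Hypothetical abnormality names and toy binary karyotype vectors.
names = ["gain_chr8", "gain_chr19", "gain_chr21"]
vectors = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
]
labels = ["A", "A", "B", "B"]  # hypothetical cluster assignments

freq = cluster_frequencies(vectors, labels, names)
for cluster, row in sorted(freq.items()):
    print(cluster, row)
```

Each inner dictionary corresponds to one column of the heat map (one cluster), with one entry per abnormality on the y-axis.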


