Brief Bioinform. 2022 Nov 19;23(6):bbac422.
doi: 10.1093/bib/bbac422.

CReSIL: accurate identification of extrachromosomal circular DNA from long-read sequences

Visanu Wanchai et al. Brief Bioinform.

Abstract

Extrachromosomal circular DNA (eccDNA) of chromosomal origin is found in many eukaryotic species and cell types, including cancer cells, where eccDNAs carrying oncogenes drive tumorigenesis. Most studies of eccDNA rely on short-read sequencing for identification. However, short-read sequencing cannot resolve the complexity of genomic repeats, which can cause eccDNA products to be missed. Long-read sequencing technologies offer an alternative for constructing complete eccDNA maps. We present a software suite, Construction-based Rolling-circle-amplification for eccDNA Sequence Identification and Location (CReSIL), to identify and characterize eccDNA from long-read sequences. On simulated data, CReSIL identifies eccDNA with a minimum F1 score of 0.98, outperforming the other bioinformatic tools evaluated. CReSIL provides genomic annotation features that can be used to infer eccDNA function, as well as Circos visualization for investigating eccDNA architecture. We demonstrate CReSIL's capability on several long-read sequencing datasets, including datasets enriched for eccDNA and whole-genome datasets from cells containing large eccDNA products. In conclusion, the CReSIL software suite is a versatile tool for investigating complex and simple eccDNA in eukaryotic cells.

Keywords: CRESIL; bioinformatic tool; eccDNA; long-read sequence.
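
For readers unfamiliar with the F1 score quoted in the abstract, the following is a minimal sketch of how the metric is computed from detection counts. The function name and the example counts are illustrative assumptions and are not taken from the paper or from the CReSIL code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall.

    tp: eccDNA correctly detected, fp: spurious detections,
    fn: true eccDNA that were missed. Returns 0.0 when there are
    no true positives, since precision and recall are then 0 or undefined.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = f1_score(tp=98, fp=2, fn=2)
print(round(example, 2))
```

Because F1 combines precision and recall, a tool scores highly only if it both finds most true eccDNA and reports few false ones.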

PubMed Disclaimer

Conflict of interest statement

None of the authors have any competing interests.

Figures

Figure 1
EccDNA identification by long-read sequencing. Step 0. The experimental workflow begins with purified genomic DNA; chromosomal DNA (blue), mitochondrial DNA (magenta), eccDNA (red), restriction enzyme (green), CRISPR-Cas9 (cyan), and exonuclease V (yellow). Step 1. A read (left panel) can be aligned on regions of 2 chromosomes (green and red) of the reference genome (blue) with breakpoint reads (dashed line) linking the aligned chromosomal regions and trimmed unmapped portions (gray). Self-read dot plots (right panels) showing read regions that align to the reference genome (blue ovals); reads without CTCs (left dot plot) and with CTCs (right dot plot). Step 2. Merged regions R and S with the reference sequences (green and red bars) and the breakpoint event (dashed lines) that links the two regions; arrows for non-breakpoint reads (light blue), breakpoint reads (blue), and reads with CTCs (magenta) point in the direction of the aligned orientation on the plus strand of the reference sequence (arrows above the bars) or the minus strand (arrows below the bars). Step 3. The information of the read alignments was converted to directed graphs. Step 4. The reads were assembled using our developed regions/linkages algorithm; the assembled sequences were polished, variants were identified, and eccDNA was annotated with genomic features such as exon, repeat, and CpG. Step 5. Circos visualization to present the eccDNA architecture.
Figure 2
Evaluation of eccDNA detection performance of CReSIL and its comparison with the other tools. A) Frequency polygon plot showing the size distribution of simulated eccDNA of true positive (TP, orange line) and true negative (TN, green line) datasets. B) Donut plots showing the percent distribution of the number of chromosomal regions of the individual simulated eccDNA datasets. C) Bar plots showing the number of simulated eccDNA that contain CTC reads (magenta) and non-CTC reads (dark blue) across different sequencing depths. D) Stacked bar plots showing the number of eccDNA detected for the true positive dataset by different tools. The eccDNA detection results are classified into five categories: CircularT (red) = circular sequences with 95% reciprocal overlaps and 90% identity with the simulated eccDNA sequences, CircularF (orange) = circular sequences without the criteria, LinearT (dark gray) = linear sequences with the criteria, LinearF (light gray) = linear sequences without the criteria, and not detected (black) = not detected by the tool; the number of true positive eccDNA is marked by the blue dashed line. E) Stacked bar plots presenting the number of eccDNA detected for the true negative dataset. See panel D for the color code; the number of true negative eccDNA is marked by the blue dashed line. F) Point and line plots showing the performance (F1 score) of eccDNA detection of the individual tools across different sequencing depths.
Figure 3
Identification of eccDNA from long-read sequencing of eccDNA enrichment samples derived from human and mouse cells. A) Stacked bar plot showing the fraction of non-CTC reads (dark blue) and CTCs (magenta) across five human cell samples and six mouse cell samples; the total number of high-quality reads after the CReSIL trimming step (yellow), in units of one million reads; * = datasets prepared by primer-free-based RCA; the rest were prepared by primer-based RCA; # = sample prepared for sequencing without debranching. B) Box-and-whisker plots showing the fraction of reads used for eccDNA identification that was kept after trimming of normal (dark blue) and CTC (magenta) reads. C) Bar plots summarizing the number of eccDNA in human (top panel) and mouse (bottom panel) datasets. D) Violin boxplots showing the distributions of the length of identified eccDNA. E) UpSet plots showing overlaps (right panels); only overlap counts above 100 are shown. F) Donut charts showing the percentages of eccDNA harboring repeats, genes, and CpG islands. G) Stacked bar plots showing the frequency of different classes of repeats harbored in identified eccDNA. H) Bar plots showing the frequency of CpG, exon, intron, 3’UTR, and 5’UTR harbored in identified eccDNA.
Figure 4
Identification of eccDNA from long-read sequencing of eccDNA enrichment samples derived from human and mouse cells. A) Hexagonal bin plots showing no correlation between the length of the identified eccDNA and their coverage depth. B) Bar plots showing the fraction of the identified eccDNA containing variants (SNVs and/or INDELs) based on a quality cut-off of 20. C) Bar plots showing the distribution of deletions, insertions, and SNVs of the identified eccDNA. D) Circos plots visualizing the architecture of three selected human eccDNA datasets. E) Circos plots visualizing the architecture of three selected mouse eccDNA datasets (see Figure 1.5 for lane annotation; dataset information (bold), eccDNA name (italics), coverage depth (underline)).
Figure 5
Identification of eccDNA from WGLS of yeast, human, and mouse cells. A) Scheme showing the CReSIL extension workflow to identify eccDNA from a WGLS dataset; the sequencing depth on a small window of focal amplified regions (blue bars) was higher than the background (gray), and the identified focal regions (horizontal lines) are shown with the breakpoint reads generating linkages between the regions (dark blue curved lines); also shown are the number of breakpoint reads (thicker lines = more reads) and the number of breakpoint reads at that location (higher vertical lines = more reads). B) Dot-line plot of F1 scores showing the performance of CReSIL in identifying eccDNA from WGLS synthetic data generated by mixing 20× of human genome reads with different coverages of simulated eccDNA reads of the true positive and true negative sets. C) Circos plot showing the identified yeast eccDNA of the known circular rDNA. D) Histogram showing the distribution of normal reads (dark blue), breakpoint reads (red), and CTC reads (magenta) with the size of identified eccDNA of rDNA (vertical dashed line). E) Bar plots showing the number of identified eccDNA from the WGLS datasets. Green represents circular, and gray represents non-circular. F) Circos plots showing examples of identified eccDNA with high coverage depth on centromere regions of human (left) and mouse (right) datasets. All are satellite repeats. G) Circos plots showing examples of identified eccDNA containing gene(s) of human (left) and mouse (right) datasets (see Figure 1.5 for lane annotation; dataset information (bold), the eccDNA name (italic), and coverage depth (underline)).
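
The classification criteria given in the Figure 2 caption (panel D) can be illustrated with a short sketch. It is not CReSIL's implementation: the function names are hypothetical, it handles only a single-region eccDNA on one chromosome with half-open coordinates, it takes sequence identity as a precomputed value, and it assumes that "95% reciprocal overlaps and 90% identity" mean at least those values.

```python
def reciprocal_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Smaller of the two overlap fractions for half-open intervals [start, end).

    Both intervals are assumed to lie on the same chromosome.
    """
    len_a = a_end - a_start
    len_b = b_end - b_start
    if len_a <= 0 or len_b <= 0:
        return 0.0
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    return min(overlap / len_a, overlap / len_b)


def classify_call(is_circular: bool, overlap_fraction: float, identity: float,
                  min_overlap: float = 0.95, min_identity: float = 0.90) -> str:
    """Assign one of the four detected categories named in the caption."""
    meets_criteria = overlap_fraction >= min_overlap and identity >= min_identity
    if is_circular:
        return "CircularT" if meets_criteria else "CircularF"
    return "LinearT" if meets_criteria else "LinearF"


# Hypothetical simulated eccDNA at 1000-2000 and a detected call at 1020-2010.
fraction = reciprocal_overlap(1000, 2000, 1020, 2010)
print(classify_call(is_circular=True, overlap_fraction=fraction, identity=0.93))
```

Taking the minimum of the two fractions makes the test reciprocal: a short call nested inside a much longer simulated eccDNA, or the reverse, does not pass.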
