Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads

Ryan R Wick¹, Louise M Judd¹, Claire L Gorrie¹, Kathryn E Holt¹

Affiliation

¹ Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Victoria, Australia.

PLoS Comput Biol. 2017 Jun 8;13(6):e1005595. doi: 10.1371/journal.pcbi.1005595. eCollection 2017 Jun.

PMID: 28594827
PMCID: PMC5481147
DOI: 10.1371/journal.pcbi.1005595

Abstract

The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long reads that can produce complete genome assemblies, but the sequencing is more expensive and error-prone. There is significant interest in combining data from these complementary sequencing technologies to generate more accurate "hybrid" assemblies. However, few tools exist that truly leverage the benefits of both types of data, namely the accuracy of short reads and the structural resolving power of long reads. Here we present Unicycler, a new tool for assembling bacterial genomes from a combination of short and long reads, which produces assemblies that are accurate, complete and cost-effective. Unicycler builds an initial assembly graph from short reads using the de novo assembler SPAdes and then simplifies the graph using information from short and long reads. Unicycler uses a novel semi-global aligner to align long reads to the assembly graph. Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler.
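To make the idea of semi-global alignment concrete, the sketch below scores two sequences with a textbook dynamic-programming recurrence in which unaligned sequence at either end is not penalised, so the alignment only stops where one of the two sequences ends. This is an illustration only and is not Unicycler's aligner, which aligns long reads to an assembly graph; the function name `semi_global_score` and the scoring values are assumptions chosen for the example.

```python
def semi_global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best alignment score of a and b with free end gaps on both sequences.

    Hypothetical helper for illustration; the scoring scheme is an assumption.
    """
    n, m = len(a), len(b)
    # Row 0 and column 0 stay at zero: leading overhangs cost nothing.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    # Trailing overhangs are free too, so the alignment may end anywhere
    # in the last row (a is exhausted) or the last column (b is exhausted).
    best_last_row = max(score[n])
    best_last_col = max(row[m] for row in score)
    return max(best_last_row, best_last_col)


# A short read contained in a longer sequence aligns end to end.
print(semi_global_score("ACGT", "TTACGTTT"))
# A suffix of one sequence overlapping a prefix of the other.
print(semi_global_score("AAACGT", "CGTTTT"))
```

A local aligner would be free to clip a read wherever the score drops, whereas this formulation forces the alignment to continue until one sequence runs out, which is the property that matters when a read has to span a repeat.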

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Key steps in the Unicycler pipeline.**

**Fig 2. Simulated short-read assemblies: Errors.**
Misassembly and small-error (mismatches and indels) rates for assemblies of simulated short-read sets, summarising results across all reference genomes and replicate tests (total 360 per assembler).

**Fig 3. Simulated short-read assemblies: NGA50.**
NGA50 for assemblies of simulated short-read sets, summarising results across all reference genomes and replicate tests (total 360 per assembler).

**Fig 4. Simulated hybrid assemblies: Errors.**
Error rates for hybrid assemblies of simulated short-read and long-read sets, summarising results across all reference genomes and replicate tests (total 2520 per assembler).

**Fig 5. Simulated hybrid assemblies: NGA50 against long-read depth.**
Mean NGA50 values for hybrid assemblies of simulated read sets. Mean values were calculated across all read lengths, read accuracies and replicate tests for each reference genome (210 hybrid-read sets each); the top panel shows mean values for all 12 reference genomes (2520 hybrid-read sets). Horizontal dashed lines indicate the N50 size of the reference genome. For the bacterial genomes, this is the size of their only chromosome; for *Saccharomyces*, it is the size of chromosome XIII, an intermediate-sized replicon in the genome.

**Fig 6. Simulated hybrid assemblies: Read length and accuracy.**
NGA50 values segregated by read length and read accuracy. These plots summarise results across all reference genomes and replicate tests, but only include the tests of 8x long-read depth. For read lengths, the p-value is from a two-tailed t-test. For read accuracies, the p-value is from a one-way ANOVA test.

**Fig 7. *E. coli* K-12 assemblies: NGA50 against long-read depth.**
Mean NGA50 values for hybrid assemblies of real *E. coli* read sets, summarised across 20 replicate tests at each depth. Top panel shows mean values for all six long-read sets.

**Fig 8. *Klebsiella pneumoniae* INF274 assembler comparison.**
Final assemblies of *Klebsiella pneumoniae* INF274 produced by Unicycler, SPAdes, HGAP and Canu. The contigs/graph of the assembly are shown on the left, coloured by replicon. The read depth plot of plasmid 1’s contig is shown on the right. Low read depth at the ends of the contig is indicative of start-end overlap.

**Fig 9. *Klebsiella pneumoniae* INF125 ONT assemblies over sequencing time.**
Assembly metrics of *K. pneumoniae* INF125 produced by Unicycler, SPAdes, npScarf and miniasm over a four-hour period of sequencing. Miniasm assemblies contain error rates comparable to that of the raw reads and are therefore excluded from the error rate plots.
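The figure captions above rely on N50-style contiguity metrics, including the N50 of each reference genome in Fig 5. As a minimal sketch, N50 is the length of the sequence at which the running total of lengths, sorted from longest to shortest, first reaches half of the overall total. The function name `n50` and the example lengths are assumptions for illustration. NGA50 additionally requires aligning contigs to a reference and splitting them at misassemblies, which this sketch does not attempt.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration, not taken from the paper's code.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that carries the running total
        # to at least half of the combined length.
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# A bacterial genome with one chromosome and two small plasmids
# (made-up lengths): the N50 is the chromosome size.
print(n50([5_000_000, 200_000, 50_000]))
# A fragmented assembly with five contigs.
print(n50([100, 80, 60, 40, 20]))
```

This is why, for a bacterial reference with a single chromosome, the N50 is simply the chromosome length: the chromosome alone already accounts for more than half of the genome.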

Cited by

Shen X, Zhai J, Li Y, Gan Y, Liang X, Yu H, Zhang L, Irwin DM, Shen Y, Chen W. Identification of Haemoproteus infection in an imported grey crowned crane (Balearica regulorum) in China. Parasitol Res. 2024 Oct 11;123(10):349. doi: 10.1007/s00436-024-08373-0. PMID: 39392533.
He X, Tang J, He S, Huang X. Analysis of risk factors and different treatments for infections caused by carbapenem-resistant Acinetobacter baumannii in Shaanxi, China. BMC Infect Dis. 2024 Oct 9;24(1):1130. doi: 10.1186/s12879-024-10036-5. PMID: 39385067.
Bessenay A, Bisio H, Belmudes L, Couté Y, Bertaux L, Claverie JM, Abergel C, Jeudy S, Legendre M. Complex transcriptional regulations of a hyperparasitic quadripartite system in giant viruses infecting protists. Nat Commun. 2024 Oct 9;15(1):8608. doi: 10.1038/s41467-024-52906-1. PMID: 39384766.
Xue Y, Xie Y, Cao X, Zhang L. The marine environmental microbiome mediates physiological outcomes in host nematodes. BMC Biol. 2024 Oct 8;22(1):224. doi: 10.1186/s12915-024-02021-w. PMID: 39379910.
El Aila NA, Al Laham NA, Doijad SP, Imirzalioglu C, Mraheil MA. First report of carbapenems encoding multidrug-resistant gram-negative bacteria from a pediatric hospital in Gaza Strip, Palestine. BMC Microbiol. 2024 Oct 9;24(1):393. doi: 10.1186/s12866-024-03550-8. PMID: 39379824.

Grants and funding

This work was funded by the NHMRC of Australia (project #1043822 and Fellowship #1061409 to KEH). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
