The present and future of de novo whole-genome assembly
- PMID: 27742661
- DOI: 10.1093/bib/bbw096
Abstract
Since the advent of next-generation sequencing (NGS) technology, various de novo assembly algorithms based on the de Bruijn graph have been developed to construct chromosome-level sequences. However, numerous technical and computational challenges in de novo assembly remain, although many ideas and heuristics have been proposed to address them in both experimental and computational settings. In this review, we categorize de novo assemblers by the type of de Bruijn graph they use (Hamiltonian or Eulerian) and discuss the challenges of de novo assembly for short NGS reads in terms of computational complexity and assembly ambiguity. Then, we discuss how the limitations of short reads can be overcome by using single-molecule sequencing platforms that generate long reads of up to several kilobases. Indeed, long-read assembly has brought about a paradigm shift in whole-genome assembly, in terms of both algorithms and supporting steps. We also summarize (i) hybrid assemblies using both short and long reads and (ii) overlap-based assemblies for long reads, and discuss their challenges and future prospects. This review provides guidelines for determining the optimal approach for a given input data type, computational budget or genome.
Keywords: de Bruijn graph; de novo assembly algorithms; next-generation sequencing; single-molecule sequencing.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
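The abstract contrasts Eulerian and Hamiltonian graph formulations. As a toy illustration of the Eulerian idea only (not the method of any specific assembler reviewed here), the Python sketch below treats each distinct k-mer as an edge between its prefix and suffix (k-1)-mers and spells the sequence from a path that uses every edge once (Hierholzer's algorithm). It assumes error-free reads, a single linear sequence, and no repeats of length k-1 or more; the function names `build_de_bruijn`, `eulerian_path` and `spell`, and the example reads, are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each distinct k-mer is one edge."""
    # Collapse duplicate k-mers from overlapping reads into one edge each.
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make traversal order reproducible
        graph[kmer[:-1]].append(kmer[1:])  # edge: prefix (k-1)-mer -> suffix (k-1)-mer
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph has an Eulerian path."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for u, vs in graph.items():
        out_deg[u] += len(vs)
        for v in vs:
            in_deg[v] += 1
    nodes = sorted(set(out_deg) | set(in_deg))
    # A path (not a cycle) starts where out-degree exceeds in-degree by one.
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), None)
    if start is None:
        start = next(iter(graph))  # Eulerian cycle: any node with an out-edge works
    remaining = {u: list(vs) for u, vs in graph.items()}  # edges not yet used
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def spell(path):
    """Glue consecutive (k-1)-mers, which overlap by k-2 characters, into one string."""
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    reads = ["ATGCGTA", "CGTACCT", "ACCTGA"]  # hypothetical error-free reads
    graph = build_de_bruijn(reads, k=4)
    print(spell(eulerian_path(graph)))  # → ATGCGTACCTGA
```

When the genome contains repeats at least k-1 bases long, the graph can admit several Eulerian paths (and deduplication collapses repeated k-mers), so the graph alone no longer determines a unique sequence; this is one source of the assembly ambiguity discussed in the abstract.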
Similar articles
- Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches. BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):507. doi: 10.1186/s12864-016-2895-8. PMID: 27556636.
- FastEtch: A Fast Sketch-Based Assembler for Genomes. IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1091-1106. doi: 10.1109/TCBB.2017.2737999. Epub 2017 Sep 11. PMID: 28910776.
- Assembly of long error-prone reads using de Bruijn graphs. Proc Natl Acad Sci U S A. 2016 Dec 27;113(52):E8396-E8405. doi: 10.1073/pnas.1604560113. Epub 2016 Dec 12. PMID: 27956617.
- Advancements in long-read genome sequencing technologies and algorithms. Genomics. 2024 May;116(3):110842. doi: 10.1016/j.ygeno.2024.110842. Epub 2024 Apr 11. PMID: 38608738. Review.
- The bioinformatics tools for the genome assembly and analysis based on third-generation sequencing. Brief Funct Genomics. 2019 Feb 14;18(1):1-12. doi: 10.1093/bfgp/ely037. PMID: 30462154. Review.
Cited by
- Mosaicism in Human Health and Disease. Annu Rev Genet. 2020 Nov 23;54:487-510. doi: 10.1146/annurev-genet-041720-093403. Epub 2020 Sep 11. PMID: 32916079. Review.
- Preparation of Mammalian Nascent RNA for Long Read Sequencing. Curr Protoc Mol Biol. 2020 Dec;133(1):e128. doi: 10.1002/cpmb.128. PMID: 33085989.
- Hybrid-hybrid correction of errors in long reads with HERO. Genome Biol. 2023 Dec 1;24(1):275. doi: 10.1186/s13059-023-03112-7. PMID: 38041098.
- Highly Continuous Genome Assembly of Eurasian Perch (Perca fluviatilis) Using Linked-Read Sequencing. G3 (Bethesda). 2018 Dec 10;8(12):3737-3743. doi: 10.1534/g3.118.200768. PMID: 30355765.
- App-SpaM: phylogenetic placement of short reads without sequence alignment. Bioinform Adv. 2021 Oct 13;1(1):vbab027. doi: 10.1093/bioadv/vbab027. eCollection 2021. PMID: 36700102.