Assemblathon 1: a competitive assessment of de novo short read assembly methods
- PMID: 21926179
- PMCID: PMC3227110
- DOI: 10.1101/gr.126599.111
Abstract
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that, within this benchmark, (1) it is possible to assemble the genome to a high level of coverage and accuracy, and (2) large differences exist between the assemblies, suggesting room for further improvement in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies, is now public and freely available from http://www.assemblathon.org/.
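Contiguity is one of the properties assessed above. As a minimal sketch, assuming contiguity is summarized with NG50 (the length L such that contigs of length L or longer together cover at least half of the known genome size), the statistic can be computed as follows. The function name `ng50` is hypothetical, and this is not the Assemblathon evaluation code.

```python
def ng50(contig_lengths, genome_size):
    """Hypothetical helper: return the NG50 of an assembly.

    Contigs are taken longest-first; the length of the contig at which the
    running total first reaches half of genome_size is returned. If the whole
    assembly covers less than half the genome, 0 is returned.
    """
    target = genome_size / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0


if __name__ == "__main__":
    lengths = [100, 80, 50, 30, 20]
    # NG50 against an assumed 400 bp reference genome.
    print(ng50(lengths, 400))
    # Passing the assembly's own total length gives the conventional N50.
    print(ng50(lengths, sum(lengths)))
```

Normalizing by the true genome size, which is known in a simulated benchmark, rather than by total assembly length (as N50 does) keeps an assembly from scoring better simply by omitting hard-to-assemble sequence. With the lengths above, NG50 against a 400 bp genome is 50, whereas N50 over the 280 bp actually assembled is 80.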
Grants and funding
- F31 HG000064/HG/NHGRI NIH HHS/United States
- R01 HG003474/HG/NHGRI NIH HHS/United States
- HHMI/Howard Hughes Medical Institute/United States
- U41HG004568/HG/NHGRI NIH HHS/United States
- U01HG004695/HG/NHGRI NIH HHS/United States
- 1U24CA143858-01/CA/NCI NIH HHS/United States
- U41 HG004568/HG/NHGRI NIH HHS/United States
- U24 CA143858/CA/NCI NIH HHS/United States
- K22 HG000064/HG/NHGRI NIH HHS/United States
- P41HG002371/HG/NHGRI NIH HHS/United States
- U54 HG004555/HG/NHGRI NIH HHS/United States
- P41 HG002371/HG/NHGRI NIH HHS/United States
- HG00064/HG/NHGRI NIH HHS/United States
- R21 AA022707/AA/NIAAA NIH HHS/United States
- U01 HG004695/HG/NHGRI NIH HHS/United States
- U54HG004555/HG/NHGRI NIH HHS/United States