BUSCO: Assessing Genomic Data Quality and Beyond
- PMID: 34936221
- DOI: 10.1002/cpz1.323
Abstract
Evaluating the quality of genomic "data products" such as genome assemblies or gene sets is critically important for recognizing possible issues and correcting them during the generation of new data. It is equally essential for guiding subsequent or comparative analyses that use existing data, because correctly interpreting the results requires knowing the quality level and reliability of the inputs. Using datasets of near-universal single-copy orthologs derived from OrthoDB, BUSCO estimates the completeness and redundancy of genomic data, providing biologically meaningful metrics based on expected gene content. These metrics complement technical contiguity measures such as the number of contigs or scaffolds and N50 values. Here, we describe the use of the BUSCO tool suite to assess data types ranging from genome assemblies of single isolates, assembled transcriptomes, and annotated gene sets to metagenome-assembled genomes whose taxonomic origin is unknown. BUSCO is the only tool capable of assessing all these types of sequences from both eukaryotic and prokaryotic species. The protocols detail the various BUSCO running modes and the novel workflows introduced in versions 4 and 5, including batch analysis of multiple inputs, the auto-lineage workflow for running assessments without specifying a dataset, and a workflow for evaluating (large) eukaryotic genomes. The protocols further cover BUSCO setup, guidelines for interpreting the results, and BUSCO "plugin" workflows that use BUSCO results for common genomics operations, such as building phylogenomic trees and visualizing synteny.
© 2021 The Authors. Current Protocols published by Wiley Periodicals LLC.
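The abstract contrasts BUSCO's gene-content metrics with contiguity measures such as N50. As a minimal illustrative sketch (not BUSCO's own code), the Python below computes N50 from a list of sequence lengths and condenses per-gene outcomes into a one-line summary modelled on the notation BUSCO reports: C (complete) is the sum of S (complete, single-copy) and D (complete, duplicated), followed by F (fragmented), M (missing), and n (number of BUSCO genes searched). It assumes every gene has already been assigned a status. The status labels and function names are hypothetical, and BUSCO's actual rules for assigning statuses are not reproduced here.

```python
from collections import Counter

# Hypothetical status labels for this sketch; BUSCO's own output files use their own wording.
CATEGORIES = ("single", "duplicated", "fragmented", "missing")


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):  # longest sequences first
        running += length
        if running >= half:
            return length


def busco_summary(status_by_gene):
    """Condense per-gene statuses into a C:[S,D],F,M,n style string.

    status_by_gene maps each BUSCO gene ID to one of CATEGORIES.
    """
    n = len(status_by_gene)
    if n == 0:
        raise ValueError("no genes given")
    counts = Counter(status_by_gene.values())
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected statuses: {sorted(unknown)}")
    # Percentage of the n searched genes falling into each category.
    pct = {cat: 100 * counts[cat] / n for cat in CATEGORIES}
    complete = pct["single"] + pct["duplicated"]  # C = S + D
    return (
        f"C:{complete:.1f}%[S:{pct['single']:.1f}%,D:{pct['duplicated']:.1f}%],"
        f"F:{pct['fragmented']:.1f}%,M:{pct['missing']:.1f}%,n:{n}"
    )
```

For example, lengths of 100, 200, 300, and 400 bp give an N50 of 300, because 400 + 300 already covers half of the 1,000 bp total. Ten genes with seven single-copy, one duplicated, one fragmented, and one missing yield `C:80.0%[S:70.0%,D:10.0%],F:10.0%,M:10.0%,n:10`. The D value is the redundancy signal: expected single-copy genes found in several copies, which may reflect, for example, retained haplotypes or genuine duplications.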
- Basic Protocol 1: Assessing an input sequence with a BUSCO dataset specified manually
- Basic Protocol 2: Assessing an input sequence with a dataset automatically selected by BUSCO
- Basic Protocol 3: Assessing multiple inputs
- Alternate Protocol: Decreasing analysis runtime when assessing a large number of small genomes with BUSCO auto-lineage workflow and Snakemake
- Support Protocol 1: BUSCO setup
- Support Protocol 2: Visualizing BUSCO results
- Support Protocol 3: Building phylogenomic trees
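When several inputs are assessed (Basic Protocol 3), comparing their one-line summaries is a common step before plotting them (Support Protocol 2). The sketch below is a hedged illustration and not part of BUSCO: it assumes a summary string of the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` has already been collected for each input (how it is collected is left out), parses it with a regular expression, and ranks inputs by completeness and then by duplication. The function names and the ranking rule are hypothetical choices; summaries from BUSCO versions that print additional fields would need a different pattern.

```python
import re

# One-line summary notation assumed by this sketch, e.g.
# "C:80.0%[S:70.0%,D:10.0%],F:10.0%,M:10.0%,n:10"
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_summary(line):
    """Return C, S, D, F and M as percentages (floats) and n as an int."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a recognised summary line: {line!r}")
    values = {key: float(val) for key, val in match.groupdict().items() if key != "n"}
    values["n"] = int(match["n"])
    return values


def rank_inputs(summaries):
    """Order input names by completeness (highest first), breaking ties
    by duplication (lowest first).

    summaries maps each input name to its one-line summary string.
    """
    parsed = {name: parse_summary(line) for name, line in summaries.items()}
    # Percentages are only comparable within one dataset. Differing n values
    # mean different datasets were used; an equal n does not prove the same one was.
    if len({vals["n"] for vals in parsed.values()}) > 1:
        raise ValueError("inputs were scored against datasets of different sizes")
    return sorted(parsed, key=lambda name: (-parsed[name]["C"], parsed[name]["D"]))
```

Refusing to rank inputs with different n values is a deliberate design choice: completeness percentages computed against different lineage datasets are not directly comparable, so mixing them in one ranking would be misleading.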
Keywords: gene content completeness; genomes; phylogenomics; quality assessment; single-copy orthologs.