BUSCO: Assessing Genomic Data Quality and Beyond
- PMID: 34936221
- DOI: 10.1002/cpz1.323
Abstract
Evaluating the quality of genomic "data products" such as genome assemblies or gene sets is critical for recognizing possible issues and correcting them while new data are being generated. It is equally essential for guiding subsequent or comparative analyses with existing data, because correct interpretation of the results requires knowing the quality and reliability of the inputs. Using datasets of near-universal single-copy orthologs derived from OrthoDB, BUSCO estimates the completeness and redundancy of genomic data by providing biologically meaningful metrics based on expected gene content. These metrics complement technical ones such as contiguity measures (e.g., number of contigs/scaffolds and N50 values). Here, we describe the use of the BUSCO tool suite to assess different data types, ranging from genome assemblies of single isolates, assembled transcriptomes, and annotated gene sets to metagenome-assembled genomes where the taxonomic origin of the species is unknown. BUSCO is the only tool capable of assessing all these types of sequences from both eukaryotic and prokaryotic species. The protocols detail the various BUSCO running modes and the novel workflows introduced in versions 4 and 5, including batch analysis of multiple inputs, the auto-lineage workflow to run assessments without specifying a dataset, and a workflow for the evaluation of (large) eukaryotic genomes. The protocols further cover the BUSCO setup, guidelines for interpreting the results, and BUSCO "plugin" workflows for performing common operations in genomics using BUSCO results, such as building phylogenomic trees and visualizing syntenies. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC.
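To make the contiguity measure mentioned in the abstract concrete, the following is a minimal sketch of the usual N50 definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name `n50` and the example lengths are illustrative assumptions, not part of BUSCO or of the protocols.

```python
def n50(lengths):
    """Return the N50 of a collection of contig/scaffold lengths.

    Illustrative helper (not part of BUSCO): contigs are sorted from
    longest to shortest and summed until at least half of the total
    assembly length is covered; the length reached at that point is N50.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([80, 70, 50, 40, 30, 20]))
```

N50 describes only how the sequence is split into pieces; it says nothing about whether the expected genes are present, which is why the abstract presents gene-content metrics as complementary to it.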
Basic Protocol 1: Assessing an input sequence with a BUSCO dataset specified manually
Basic Protocol 2: Assessing an input sequence with a dataset automatically selected by BUSCO
Basic Protocol 3: Assessing multiple inputs
Alternate Protocol: Decreasing analysis runtime when assessing a large number of small genomes with BUSCO auto-lineage workflow and Snakemake
Support Protocol 1: BUSCO setup
Support Protocol 2: Visualizing BUSCO results
Support Protocol 3: Building phylogenomic trees
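The difference between specifying a dataset manually (Basic Protocol 1) and letting BUSCO select one (Basic Protocol 2) can be sketched as a small command builder. This is a hedged illustration, not the protocols' own code: the helper `busco_command` and the file names are hypothetical, and the flags (`-i`, `-o`, `-m`, `-l`, `-c`, `--auto-lineage`) reflect the BUSCO v5 command line as commonly documented; check `busco --help` for your installed version.

```python
import shlex

# Running modes assumed from the BUSCO v5 command line.
VALID_MODES = {"genome", "transcriptome", "proteins"}


def busco_command(input_path, out_name, mode, lineage=None, cpus=1):
    """Build an argument list for a BUSCO run (hypothetical helper).

    If `lineage` is given, the dataset is specified manually with -l;
    otherwise --auto-lineage asks BUSCO to select a dataset itself.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = ["busco", "-i", input_path, "-o", out_name,
           "-m", mode, "-c", str(cpus)]
    if lineage is None:
        cmd.append("--auto-lineage")
    else:
        cmd += ["-l", lineage]
    return cmd


if __name__ == "__main__":
    # Hypothetical input file and output names, for illustration only.
    manual = busco_command("assembly.fna", "run_manual", "genome",
                           lineage="bacteria_odb10", cpus=4)
    auto = busco_command("assembly.fna", "run_auto", "genome")
    print(shlex.join(manual))
    print(shlex.join(auto))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns; the printed form is only for inspection.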
Keywords: gene content completeness; genomes; phylogenomics; quality assessment; single-copy orthologs.