Curr Protoc. 2021 Dec;1(12):e323.
doi: 10.1002/cpz1.323.

BUSCO: Assessing Genomic Data Quality and Beyond

Mosè Manni et al. Curr Protoc. 2021 Dec.

Abstract

Evaluation of the quality of genomic "data products" such as genome assemblies or gene sets is of critical importance in order to recognize possible issues and correct them during the generation of new data. It is equally essential to guide subsequent or comparative analyses with existing data, as the correct interpretation of the results necessarily requires knowledge about the quality level and reliability of the inputs. Using datasets of near universal single-copy orthologs derived from OrthoDB, BUSCO can estimate the completeness and redundancy of genomic data by providing biologically meaningful metrics based on expected gene content. These can complement technical metrics such as contiguity measures (e.g., number of contigs/scaffolds, and N50 values). Here, we describe the use of the BUSCO tool suite to assess different data types that can range from genome assemblies of single isolates and assembled transcriptomes and annotated gene sets to metagenome-assembled genomes where the taxonomic origin of the species is unknown. BUSCO is the only tool capable of assessing all these types of sequences from both eukaryotic and prokaryotic species. The protocols detail the various BUSCO running modes and the novel workflows introduced in versions 4 and 5, including the batch analysis on multiple inputs, the auto-lineage workflow to run assessments without specifying a dataset, and a workflow for the evaluation of (large) eukaryotic genomes. The protocols further cover the BUSCO setup, guidelines to interpret the results, and BUSCO "plugin" workflows for performing common operations in genomics using BUSCO results, such as building phylogenomic trees and visualizing syntenies. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: Assessing an input sequence with a BUSCO dataset specified manually
Basic Protocol 2: Assessing an input sequence with a dataset automatically selected by BUSCO
Basic Protocol 3: Assessing multiple inputs
Alternate Protocol: Decreasing analysis runtime when assessing a large number of small genomes with BUSCO auto-lineage workflow and Snakemake
Support Protocol 1: BUSCO setup
Support Protocol 2: Visualizing BUSCO results
Support Protocol 3: Building phylogenomic trees
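
The abstract contrasts technical contiguity metrics such as N50 with gene-content metrics such as completeness and redundancy. As a rough illustration of the difference, the Python sketch below computes N50 from a list of contig lengths and converts ortholog counts into percentages. It is not BUSCO code: the function names `n50` and `busco_style_summary` and all input values are hypothetical, and the four count categories (single-copy, duplicated, fragmented, missing) are assumed here as the way expected orthologs are classified.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def busco_style_summary(single, duplicated, fragmented, missing):
    """Turn ortholog counts into percentages (illustrative only).

    Completeness is the share of expected orthologs found complete
    (single-copy plus duplicated); the duplicated share is a proxy
    for redundancy.
    """
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("the dataset must contain at least one ortholog")

    def pct(count):
        return 100.0 * count / total

    return {
        "complete": pct(single + duplicated),
        "single_copy": pct(single),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
    }


if __name__ == "__main__":
    # Hypothetical contig lengths and ortholog counts, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
    print(busco_style_summary(single=90, duplicated=5, fragmented=3, missing=2))
```

An assembly can score well on one kind of metric and poorly on the other, for example a highly contiguous assembly with many missing orthologs, which is why the two are treated as complementary.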

Keywords: gene content completeness; genomes; phylogenomics; quality assessment; single-copy orthologs.
