Mol Biol Evol. 2018 Mar 1;35(3):543-548.
doi: 10.1093/molbev/msx319.

BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics

Robert M Waterhouse et al. Mol Biol Evol. 2018.

Abstract

Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluation of completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness of genomic data sets in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). The latest software release implements a complete refactoring of the code to make it more flexible and extendable to facilitate high-throughput assessments. The original six lineage assessment data sets have been updated with improved species sampling, 34 new subsets have been built for vertebrates, arthropods, fungi, and prokaryotes that greatly enhance resolution, and data sets are now also available for nematodes, protists, and plants. Here, we present BUSCO v3 with example analyses that highlight the wide-ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.

Keywords: bioinformatics; evolution; metagenomics; transcriptomics.

Figures

Fig. 1
BUSCO completeness assessments for genomics data quality control. Assessments of initial, intermediate, and latest versions of the (a) honeybee and (b) chicken genomes and their annotated gene sets with the Metazoa, Hymenoptera, and Aves lineage data sets. Bar charts produced with the BUSCO plotting tool show proportions classified as complete (C, blues), complete single-copy (S, light blue), complete duplicated (D, dark blue), fragmented (F, yellow), and missing (M, red).
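
The classification scheme in this caption can be illustrated with a minimal Python sketch that turns per-ortholog statuses into the C/S/D/F/M proportions. It is not BUSCO's own code: the function names, the status labels, and the one-line summary layout are assumptions chosen for illustration, and the input counts are toy values rather than results from the paper.

```python
from collections import Counter

# Assumed status labels, one per BUSCO group (illustrative, not read from BUSCO output).
VALID = {"Complete", "Duplicated", "Fragmented", "Missing"}


def summarize_busco(statuses):
    """Return percentages for C, S, D, F, M and the total n.

    Here "Complete" means complete single-copy (S) and "Duplicated" means
    complete duplicated (D); C is their sum, as in the figure legend.
    """
    unknown = set(statuses) - VALID
    if unknown:
        raise ValueError(f"unexpected status labels: {sorted(unknown)}")
    n = len(statuses)
    if n == 0:
        raise ValueError("no BUSCO groups supplied")
    counts = Counter(statuses)
    single = counts["Complete"]
    duplicated = counts["Duplicated"]

    def pct(x):
        return round(100.0 * x / n, 1)

    return {
        "C": pct(single + duplicated),
        "S": pct(single),
        "D": pct(duplicated),
        "F": pct(counts["Fragmented"]),
        "M": pct(counts["Missing"]),
        "n": n,
    }


def format_summary(summary):
    """Render the proportions as a compact one-line string (assumed layout)."""
    return "C:{C}%[S:{S}%,D:{D}%],F:{F}%,M:{M}%,n:{n}".format(**summary)


# Toy input: 100 hypothetical BUSCO groups.
toy = ["Complete"] * 90 + ["Duplicated"] * 4 + ["Fragmented"] * 3 + ["Missing"] * 3
print(format_summary(summarize_busco(toy)))
```

Comparing such summaries across successive assembly or annotation versions is the kind of quality-control comparison the bar charts show.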
Fig. 2
BUSCO-trained ab initio gene prediction with Augustus. When no pretrained parameter set is available, for example, for (a) the centipede, BUSCO-trained predictions are substantially better than using Augustus parameters from another arthropod (fly). Where species-specific-trained parameter sets are available, BUSCO-trained predictions are almost as good, for example, (b) tomato, just as good, for example, (c) fruit fly, or even better, for example, (d) Tribolium beetle. Performance was assessed by computing the percent sequence length match of the ab initio gene models to the official gene set annotations for each species (Materials and Methods).
Fig. 3
Genome and transcriptome BUSCO assessments to identify universal single-copy markers for phylogenomics studies. The phylogeny was generated using the Euarchontoglires results to identify complete single-copy orthologs found in all species for building the superalignment used for maximum likelihood tree reconstruction (Materials and Methods). Mammalia and Metazoa results produced identical tree topologies. Bars below the BUSCO results show how the sizes of the assessment data sets influence the superalignment lengths and the analysis runtimes. The tree was rooted with the rabbit; all nodes have 100% bootstrap support, and branch lengths are in substitutions per site (s.s.).
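
The marker-selection step described in this caption, keeping only orthologs that are complete and single-copy in every species, amounts to a set intersection. The Python sketch below shows that idea only. The function name, the input layout (species mapped to per-ortholog status), the status labels, and the IDs are hypothetical, and it does not reproduce the alignment or tree-building steps of the paper's Materials and Methods.

```python
def shared_single_copy(results):
    """Return sorted ortholog IDs that are complete single-copy in all species.

    results: dict mapping species name -> dict of ortholog ID -> status,
    where the status "Complete" is assumed to mean complete single-copy.
    """
    per_species = [
        {busco_id for busco_id, status in table.items() if status == "Complete"}
        for table in results.values()
    ]
    if not per_species:
        return []
    return sorted(set.intersection(*per_species))


# Toy input with hypothetical species tables and ortholog IDs.
toy_results = {
    "species_a": {"B1": "Complete", "B2": "Complete", "B3": "Duplicated"},
    "species_b": {"B1": "Complete", "B2": "Fragmented", "B3": "Complete"},
    "species_c": {"B1": "Complete", "B2": "Complete", "B3": "Missing"},
}
print(shared_single_copy(toy_results))
```

Only the markers that survive this filter would be aligned and concatenated, which is why a larger assessment data set tends to give a longer superalignment and a longer runtime.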
