Viruses. 2024 Jul 25;16(8):1191.
doi: 10.3390/v16081191.

VOGDB-Database of Virus Orthologous Groups

Lovro Trgovec-Greif et al. Viruses.

Abstract

Computational models of homologous protein groups are essential in sequence bioinformatics. Because viruses are diverse and evolve rapidly, grouping protein sequences from virus genomes is particularly challenging. The low sequence similarity between homologous viral genes requires specific approaches for sequence- and structure-based clustering. Furthermore, the annotation of virus genomes in public databases is less consistent and less up to date than that of many cellular genomes. To tackle these problems, we have developed VOGDB, a database of virus orthologous groups. VOGDB is a multi-layer database that progressively clusters viral genes into groups connected by increasingly remote similarity. The first layer is based on pairwise sequence similarities, the second layer on sequence profile alignments, and the third layer uses predicted protein structures to detect the most remote similarities. VOGDB groups allow more sensitive homology searches of novel genes and increase the chance of predicting annotations or inferring phylogeny. VOGDB uses all virus genomes from RefSeq and partially reannotates them, and it is updated with every RefSeq release. A unique feature of VOGDB is the inclusion of both prokaryotic and eukaryotic viruses in the same clustering process, which makes it possible to explore ancient evolutionary relationships between the two groups. VOGDB is freely available at vogdb.org under the CC BY 4.0 license.
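To illustrate how one layer might turn similarity hits into groups, here is a minimal Python sketch that assumes single-linkage clustering: any two items connected by a chain of hits at or above a threshold end up in the same group. The hit tuples, the `min_score` threshold, and the names `UnionFind` and `cluster_layer` are hypothetical; the abstract does not state which clustering algorithm VOGDB actually uses at any layer.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, items: Iterable[Hashable]) -> None:
        self.parent: Dict[Hashable, Hashable] = {x: x for x in items}
        self.size: Dict[Hashable, int] = {x: 1 for x in self.parent}

    def find(self, x: Hashable) -> Hashable:
        # Walk to the root, pointing each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_layer(items: Iterable[str],
                  hits: Iterable[Tuple[str, str, float]],
                  min_score: float) -> List[List[str]]:
    """Hypothetical single-linkage step: items joined by a chain of hits
    scoring >= min_score share a group. Every id in hits must be in items."""
    uf = UnionFind(items)
    for a, b, score in hits:
        if score >= min_score:
            uf.union(a, b)
    groups: Dict[Hashable, List[str]] = {}
    for x in uf.parent:
        groups.setdefault(uf.find(x), []).append(x)
    # Sort members and groups so the output is deterministic.
    return sorted(sorted(g) for g in groups.values())


proteins = ["p1", "p2", "p3", "p4", "p5"]
hits = [("p1", "p2", 0.9), ("p2", "p3", 0.6), ("p4", "p5", 0.2)]
print(cluster_layer(proteins, hits, min_score=0.5))
# [['p1', 'p2', 'p3'], ['p4'], ['p5']]
```

Because the function only needs identifiers and scored pairs, the same call could in principle be reused one layer up, with group identifiers as items and group-to-group hits (for example, from profile comparisons) as edges.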

Keywords: comparative genomics; genome analysis; genome annotation; orthologous groups; protein families; virus genomes.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Schema of the layered structure of the database. Different tools were used to create the clusters in each layer. Clusters in each subsequent layer are built from the clusters of the previous layer and are connected by more remote similarity.
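Since each layer is built from whole clusters of the layer below, a protein's membership at any layer can be derived by chaining the layer-to-layer mappings. A small sketch under that nesting assumption; the dictionary layout, the group names, and `compose_layers` are hypothetical and do not reflect VOGDB's actual file format.

```python
from typing import Dict, List


def compose_layers(protein_to_group: Dict[str, str],
                   *layer_maps: Dict[str, str]) -> List[Dict[str, str]]:
    """Return protein -> group assignments for every layer.

    protein_to_group assigns each protein to a first-layer group; each
    following map must assign every group of the previous layer to exactly
    one group of the next layer (a missing group raises KeyError).
    """
    assignments = [dict(protein_to_group)]
    for layer_map in layer_maps:
        previous = assignments[-1]
        # Each protein inherits the next-layer group of its current group.
        assignments.append({p: layer_map[g] for p, g in previous.items()})
    return assignments


layer1 = {"p1": "A", "p2": "A", "p3": "B", "p4": "C"}
layer2 = {"A": "X", "B": "X", "C": "Y"}
layer3 = {"X": "Z", "Y": "Z"}
for depth, assignment in enumerate(compose_layers(layer1, layer2, layer3), start=1):
    print(depth, assignment)
```

Keeping every layer as an explicit mapping makes the nesting easy to check: two proteins that share a group at one layer necessarily share a group at every layer above it.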
Figure 2
Number of groups per layer in different size bins. Each size bin covers a range of protein counts per group. This distribution, with many small clusters and few large ones, is also observed in similar databases.
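The size bins in Figure 2 amount to counting groups whose protein count falls within each range. A minimal sketch of that binning; the bin edges below are hypothetical, since the caption does not give the boundaries used, and `size_bin_counts` is an illustrative name.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Sequence


def size_bin_counts(group_sizes: Sequence[int], edges: Sequence[int]) -> Dict[str, int]:
    """Count groups per size bin.

    edges are ascending lower bounds, e.g. [2, 5, 10] gives the bins
    "2-4", "5-9" and ">=10"; groups smaller than edges[0] are skipped.
    """
    labels = [f"{lo}-{hi - 1}" for lo, hi in zip(edges, edges[1:])] + [f">={edges[-1]}"]
    counts: Counter = Counter()
    for size in group_sizes:
        i = bisect_right(edges, size) - 1  # index of the last edge <= size
        if i >= 0:
            counts[labels[i]] += 1
    # Report every bin, including empty ones (Counter returns 0 for missing keys).
    return {label: counts[label] for label in labels}


print(size_bin_counts([2, 2, 3, 4, 5, 7, 12, 40, 1], edges=[2, 5, 10]))
# {'2-4': 4, '5-9': 2, '>=10': 2}
```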
Figure 3
Homogeneity of functional annotations and protein structure classifications in VOGDB layers compared with a random model. (a–f) The groups from each layer are placed into size bins based on the number of proteins with functional and structural annotation. The random model is created by randomly redistributing the functional and structural annotation labels among the proteins carrying the respective annotation, 1000 times, and calculating the overall homogeneity for each redistribution. The results show that groups from VOGDB layers are significantly more homogeneous than random in terms of SwissProt keywords and of structural classifications based on SCOPe superfamilies (Kolmogorov–Smirnov test, p < ).
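The random model described for Figure 3 is a label-permutation test. A minimal sketch follows, assuming (a) each annotated protein carries a single label, whereas SwissProt keywords can be several per protein, and (b) group homogeneity is the fraction of annotated proteins sharing the group's most common label, averaged over groups for the overall score. The caption does not define the homogeneity measure, so both points are assumptions, as are the function names and the toy labels.

```python
import random
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence


def group_homogeneity(labels: Sequence[str]) -> float:
    """Assumed measure: share of annotated proteins carrying the most common label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def overall_homogeneity(groups: Dict[str, List[str]], annotation: Dict[str, str]) -> float:
    """Mean homogeneity over groups that contain at least one annotated protein."""
    scores = []
    for members in groups.values():
        labels = [annotation[p] for p in members if p in annotation]
        if labels:
            scores.append(group_homogeneity(labels))
    return mean(scores)  # raises StatisticsError if no group is annotated


def permutation_null(groups: Dict[str, List[str]], annotation: Dict[str, str],
                     n_rounds: int = 1000, seed: int = 0) -> List[float]:
    """Overall homogeneity after shuffling labels among annotated proteins."""
    rng = random.Random(seed)
    proteins = list(annotation)
    labels = list(annotation.values())
    null = []
    for _ in range(n_rounds):
        rng.shuffle(labels)  # each round is a fresh random redistribution
        null.append(overall_homogeneity(groups, dict(zip(proteins, labels))))
    return null


groups = {"G1": ["a", "b", "c"], "G2": ["d", "e", "f"]}
annotation = {"a": "capsid", "b": "capsid", "c": "capsid",
              "d": "lysin", "e": "lysin", "f": "lysin"}
observed = overall_homogeneity(groups, annotation)  # 1.0 for this toy example
null = permutation_null(groups, annotation)
print(observed, mean(null))
```

Observed and permuted homogeneity values could then be compared with a two-sample Kolmogorov–Smirnov test such as `scipy.stats.ks_2samp`; the caption does not say exactly which distributions entered the test.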
Figure 4
Homogeneity of SwissProt keywords (a) and SCOPe superfamilies (b) for layers from VOGDB and other databases of orthologous/homologous groups: pVOG (phage orthologous groups), PHROG (phage remote orthologous groups) and COG (prokaryotic orthologous groups). The databases are split into size bins according to the number of proteins with a functional or structural annotation. Bins containing fewer than 3 proteins are not shown. The results show that the function- and structure-based homogeneity of the VOGDB layers is in the same range as in other similar databases.
