Brief Bioinform. 2011 Sep;12(5):449-62.
doi: 10.1093/bib/bbr042. Epub 2011 Aug 27.

Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium

Pascale Gaudet et al. Brief Bioinform. 2011 Sep.

Abstract

The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations need to be based on predictions. To make these inferences as accurate as possible, the GO Consortium's Reference Genome Project uses an explicit evolutionary framework to infer, in a semi-automated manner, annotations for proteins from a broad set of genomes on the basis of experimental annotations. Most components in the pipeline, such as selection of sequences, building multiple sequence alignments and phylogenetic trees, retrieving experimental annotations and depositing inferred annotations, are fully automated. However, the most crucial step in our pipeline relies on software-assisted curation by an expert biologist. The curation tool, Phylogenetic Annotation and INference Tool (PAINT), helps curators to infer annotations among members of a protein family. PAINT allows curators to make precise assertions as to when functions were gained and lost during evolution and to record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions. In this article, we describe how we use PAINT to infer protein function in a phylogenetic context, with emphasis on its strengths, limitations and guidelines. We also discuss specific examples showing how PAINT annotations compare with those generated by other widely used homology-based methods.

Figures

Figure 1:
The concept of PAINT. This example presents a MutS homolog family showing experimental evidence for ‘GO term’. (A) Primary experimentally based annotations to one term or any of its ancestors (light green labels) are used to infer that the most recent common ancestor (MRCA) of all those proteins also had that function. The curator notes this by dragging the term onto the node of the MRCA (orange box). (B) Subsequently, PAINT propagates this annotation forward to the other descendant leaves (blue labels).
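The two steps in this caption can be illustrated with a minimal Python sketch. It is not PAINT's implementation: the tree, the node names and the helper functions are all hypothetical, and the curator's judgment is reduced to "annotate the MRCA of the experimentally annotated leaves".

```python
# Hypothetical toy family tree stored as child -> parent (None marks the root).
# Node names are illustrative only; they do not come from the MutS family.
PARENT = {
    "root": None,
    "anc_A": "root",
    "leaf4": "root",
    "leaf1": "anc_A",
    "anc_C": "anc_A",
    "leaf2": "anc_C",
    "leaf3": "anc_C",
}


def path_to_root(node):
    """Return the node followed by all of its ancestors, nearest first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def mrca(nodes):
    """Most recent common ancestor of the given nodes."""
    nodes = list(nodes)
    common = set(path_to_root(nodes[0]))
    for n in nodes[1:]:
        common &= set(path_to_root(n))
    # Walking upward from one node, the first shared node is the MRCA.
    for candidate in path_to_root(nodes[0]):
        if candidate in common:
            return candidate


def descendant_leaves(node):
    """All leaves below a node (a leaf returns itself)."""
    children = [c for c, p in PARENT.items() if p == node]
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(descendant_leaves(child))
    return leaves


# Step (A): leaves carrying an experimental annotation to the term.
experimental = {"leaf1", "leaf2"}
ancestor = mrca(experimental)

# Step (B): propagate forward to the remaining leaves under that ancestor.
inferred = sorted(set(descendant_leaves(ancestor)) - experimental)
```

In this toy tree the annotation lands on `anc_A` and is inherited by `leaf3`, while `leaf4`, which branches off above `anc_A`, receives nothing.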
Figure 2:
Gain of function. The MRCA of all eukaryotic MSH2 orthologs (leftmost orange circle) already likely functioned in DNA repair (inherited from LUCA, data not shown) and maintenance of DNA repeats. The gene was then coopted in the animal MRCA for a role in apoptosis, and later, in the vertebrate MRCA for a role in somatic hypermutation of immunoglobulin genes. Inferences for ancestral genes (orange circles) are based on experimental GO annotations for the genes shown in green, which are inferred by inheritance for descendants including uncharacterized genes in extant organisms shown in blue. Thus, the ortholog in Bos taurus, for example, will be annotated by PAINT with different functions than the ortholog in Saccharomyces cerevisiae.
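One way to picture why the two orthologs end up with different annotations is to treat each leaf as inheriting every term gained on its path back to the root. The sketch below assumes a heavily simplified tree with only the ancestors named in the caption; the data structures are hypothetical, and DNA repair is attached to the eukaryotic MRCA for brevity even though the caption states it was inherited from LUCA.

```python
# Simplified, hypothetical tree (child -> parent); only the nodes named in
# the caption are included, and the real family has many more branches.
PARENT = {
    "eukaryote_MRCA": None,
    "Saccharomyces cerevisiae": "eukaryote_MRCA",
    "animal_MRCA": "eukaryote_MRCA",
    "vertebrate_MRCA": "animal_MRCA",
    "Bos taurus": "vertebrate_MRCA",
}

# Terms asserted at each ancestral node, paraphrased from the caption.
# Assumption: DNA repair is placed here although it predates this node.
GAINED = {
    "eukaryote_MRCA": {"DNA repair", "maintenance of DNA repeats"},
    "animal_MRCA": {"apoptosis"},
    "vertebrate_MRCA": {"somatic hypermutation of immunoglobulin genes"},
}


def inherited_terms(node):
    """Union of all terms gained at the node or at any of its ancestors."""
    terms = set()
    while node is not None:
        terms |= GAINED.get(node, set())
        node = PARENT[node]
    return terms


yeast_terms = inherited_terms("Saccharomyces cerevisiae")
cow_terms = inherited_terms("Bos taurus")
```

Here `cow_terms` contains all four terms, whereas `yeast_terms` contains only the two asserted at the eukaryotic MRCA.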
Figure 3:
Loss of Function. The active site residues of PGM1 relatives have been annotated in the CDD database based on the 3D protein structure for PGM from Paramecium tetraurelia. In PAINT, the biocurator used the integrated multiple sequence alignment viewer to determine that key active site residues are mutated in all of the vertebrate PGM5 orthologs, suggesting that phosphoglucomutase activity was lost shortly after duplication. The biocurator correspondingly annotated the vertebrate ancestor of PGM5 with ‘NOT phosphoglucomutase activity’, which PAINT then propagated to all vertebrate orthologs of PGM5.
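The effect of a NOT annotation on propagation can be sketched as a top-down traversal in which a loss recorded at an ancestor overrides the inherited term for everything below it. This is an illustrative sketch only: the tree shape, the node names and the `propagate` function are hypothetical and are not taken from PAINT.

```python
# Hypothetical, simplified tree (parent -> children) for the PGM example.
CHILDREN = {
    "PGM_ancestor": ["PGM1_ancestor", "PGM5_vertebrate_ancestor"],
    "PGM1_ancestor": ["PGM1_ortholog_a", "PGM1_ortholog_b"],
    "PGM5_vertebrate_ancestor": ["PGM5_ortholog_a", "PGM5_ortholog_b"],
}

TERM = "phosphoglucomutase activity"
GAIN_NODES = {"PGM_ancestor"}              # ancestor annotated with the term
LOSS_NODES = {"PGM5_vertebrate_ancestor"}  # ancestor annotated with NOT


def propagate(node, state=None, out=None):
    """Carry the annotation state down the tree and record it at the leaves."""
    if out is None:
        out = {}
    if node in GAIN_NODES:
        state = TERM
    if node in LOSS_NODES:
        state = "NOT " + TERM
    children = CHILDREN.get(node, [])
    # Leaves with no inherited state are left unannotated.
    if not children and state is not None:
        out[node] = state
    for child in children:
        propagate(child, state, out)
    return out


leaf_annotations = propagate("PGM_ancestor")
```

In this toy tree the PGM1 leaves inherit the positive annotation, while both PGM5 leaves inherit ‘NOT phosphoglucomutase activity’ from their vertebrate ancestor.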
Figure 4:
General workflow for annotation of functional evolution events using PAINT. Step 1: The curator uses experiment-based annotations to give an initial hypothesis that the function first appeared in the MRCA of all genes with a related experiment-based annotation. Step 2: The curator decides which ancestor is most appropriate for annotation: either the initially hypothesized MRCA (Option A); an earlier ancestor (Option B), meaning that the MRCA from Step 1 likely inherited its annotation from an earlier ancestor; or more recent ancestor(s) (Option C), meaning that there was homoplasy and the MRCA from Step 1 is not where the function first appeared.
Figure 5:
A simplified phylogeny of the SOD family (PTHR10003). The gene present in the last universal common ancestor (LUCA) was duplicated in the ancestor of eukaryotes (square node). The descendant of the duplication that shows the least divergence from its ancestor also retained SOD activity, which was lost in the CCS clade.
