Review
Curr Opin Struct Biol. 2015 Jun;32:33-8.
doi: 10.1016/j.sbi.2015.01.007. Epub 2015 Feb 10.

Template-based prediction of protein function

Donald Petrey et al. Curr Opin Struct Biol. 2015 Jun.

Abstract

We discuss recent approaches for structure-based protein function annotation. We focus on template-based methods where the function of a query protein is deduced from that of a template for which both the structure and function are known. We describe the different ways of identifying a template. These are typically based on sequence analysis but new methods based on purely structural similarity are also being developed that allow function annotation based on structural relationships that cannot be recognized by sequence. The growing number of available structures of known function, improved homology modeling techniques and new developments in the use of structure allow template-based methods to be applied on a proteome-wide scale and in many different biological contexts. This progress significantly expands the range of applicability of structural information in function annotation to a level that previously was only achievable by sequence comparison.

Figures

Figure 1. Function annotation using a template library
The structure of a query protein (A) is used to scan a library of templates with known function (B). Templates can be proteins with various binding partners, including other proteins (green), peptides (teal), RNA/DNA (brown) or small molecules (red star). For each complex in the library, the query, template and binding partner are placed in the same coordinate system by superposing the template and query based on global or local similarity (C, dotted line). An interaction model is then created, which defines the parameters used to determine whether the query has functional properties similar to those of the template. These parameters can range from an estimate of the physical interaction energy derived from residue interactions (D, yellow lines) in a 3-dimensional model of the interface, to properties such as sequence conservation and covariation in the interface, to other features used as input to machine learning approaches.
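
The library scan described in this caption can be illustrated with a minimal sketch. It assumes that the rigid-body transform superposing each template onto the query has already been computed by an external structural alignment step, and it substitutes a simple contact count for the interaction model. The function names, the dictionary layout of a library entry and the 4.5 Å cutoff are all hypothetical choices made for illustration and are not taken from the review.

```python
import math

def apply_transform(points, rotation, translation):
    """Apply a rigid-body transform (3x3 rotation, then translation) to 3D points."""
    moved = []
    for p in points:
        moved.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return moved

def count_contacts(query_atoms, partner_atoms, cutoff=4.5):
    """Count query atoms within `cutoff` angstroms of any partner atom.

    A crude stand-in for an interaction model (assumption, not the
    energy- or conservation-based scores mentioned in the caption).
    """
    return sum(
        1 for q in query_atoms
        if any(math.dist(q, p) <= cutoff for p in partner_atoms)
    )

def scan_library(query_atoms, library, cutoff=4.5):
    """Score every template complex against the query and rank the hits.

    Each library entry is a dict (hypothetical layout) with:
      "function"      - annotation carried by the template
      "partner_atoms" - binding-partner coordinates in the template frame
      "rotation", "translation" - precomputed template-to-query superposition
    """
    hits = []
    for entry in library:
        # Place the template's binding partner in the query coordinate system.
        partner_in_query_frame = apply_transform(
            entry["partner_atoms"], entry["rotation"], entry["translation"]
        )
        score = count_contacts(query_atoms, partner_in_query_frame, cutoff)
        hits.append((score, entry["function"]))
    # Highest-scoring templates first.
    return sorted(hits, reverse=True)
```

In practice the contact count would be replaced by whichever interaction model is appropriate, for example an interface energy estimate or a learned score; the overall flow of superposing, transferring the partner and scoring stays the same.
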
Figure 2. Using a machine learning classifier for protein function annotation
Blue panel: Proteins that share a functional relationship are collected, where the relationship can be specific (e.g., two proteins carrying out the same enzymatic reaction) or general (e.g., involved in protein-protein interaction). A vector of features (x,y…) is calculated for each structure in the collection, where x and y are numerical quantifications of some property of the structure (e.g., x may be the number of residues in the largest hydrophobic patch, and y might be the average degree of evolutionary conservation of those residues). A machine learning classifier takes this training set of feature vectors and attempts to identify patterns, i.e., the numerical values that are more likely to be associated with the function. This is typically done by comparing the feature vectors to those calculated for a collection of proteins known not to have the function and quantifying any difference using statistical measures. Red panel: In annotation, the same set of features is calculated for a protein whose function is unknown, and a confidence score for whether the protein has the given function is calculated based on its similarity to the patterns found for the training set.
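
The training and annotation steps in this caption can be sketched with a small Gaussian naive Bayes classifier. This is one possible choice of classifier, picked here for brevity; the review does not prescribe it. The feature values in the usage note (hydrophobic patch size, average conservation) are invented for illustration, and all function names are hypothetical.

```python
import math
from statistics import mean, pvariance

def fit_gaussian(vectors, min_var=1e-6):
    """Per-feature (mean, variance) for one class of feature vectors."""
    columns = list(zip(*vectors))
    # Floor the variance so a constant feature does not cause division by zero.
    return [(mean(c), max(pvariance(c), min_var)) for c in columns]

def log_likelihood(vector, params):
    """Log-density of a vector under independent per-feature Gaussians."""
    total = 0.0
    for x, (mu, var) in zip(vector, params):
        total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return total

def train(positives, negatives):
    """Blue panel: learn feature patterns for proteins with and without the function."""
    return {"pos": fit_gaussian(positives), "neg": fit_gaussian(negatives)}

def confidence(model, vector, prior_pos=0.5):
    """Red panel: posterior probability that a protein has the function."""
    log_pos = math.log(prior_pos) + log_likelihood(vector, model["pos"])
    log_neg = math.log(1.0 - prior_pos) + log_likelihood(vector, model["neg"])
    diff = log_neg - log_pos
    if diff > 700:  # avoid overflow in math.exp for very unlikely positives
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))
```

As a usage example, with x as the number of residues in the largest hydrophobic patch and y as the average conservation of those residues, `model = train([(20, 0.9), (22, 0.8), (18, 0.85)], [(5, 0.3), (7, 0.4), (6, 0.2)])` followed by `confidence(model, (21, 0.88))` returns a score close to 1, while `confidence(model, (6, 0.3))` returns a score close to 0.
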
