Review
Protein Sci. 2013 Apr;22(4):359-66.
doi: 10.1002/pro.2225. Epub 2013 Feb 21.

Toward a "structural BLAST": using structural relationships to infer function

Fabian Dey et al. Protein Sci. 2013 Apr.

Abstract

We outline a set of strategies to infer protein function from structure. The overall approach depends on extensive use of homology modeling, the exploitation of a wide range of global and local geometric relationships between protein structures, and the use of machine learning techniques. The combination of modeling with broad searches of protein structure space defines a "structural BLAST" approach to infer function with high genomic coverage. We describe applications to the prediction of protein-protein and protein-ligand interactions. In the context of protein-protein interactions, our structure-based prediction algorithm, PrePPI, has comparable accuracy to high-throughput experiments. An essential feature of PrePPI is the use of Bayesian methods to combine structure-derived information with non-structural evidence (e.g., co-expression) to assign a likelihood to each predicted interaction. This, combined with a structural BLAST approach, significantly expands the range of applications of protein structure in the annotation of protein function, including systems-level biological applications where it has previously played little role.
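The idea of combining several evidence sources into one likelihood can be illustrated with a minimal sketch. It assumes that each source contributes a likelihood ratio (LR, the probability of the evidence among true interactions divided by its probability among non-interactions) and that the sources are conditionally independent, so the ratios multiply. That naive Bayes assumption, the function names, and the numbers are illustrative choices; the abstract does not spell out how PrePPI combines its evidence.

```python
import math


def combine_likelihood_ratios(lrs):
    """Multiply per-source LRs (assumes conditionally independent evidence)."""
    return math.prod(lrs)


def posterior_probability(prior_prob, combined_lr):
    """Turn a prior probability and a combined LR into a posterior probability."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * combined_lr
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical values: one structure-derived LR and one co-expression LR.
structural_lr = 50.0
coexpression_lr = 4.0
combined = combine_likelihood_ratios([structural_lr, coexpression_lr])

# Hypothetical prior: 1 in 1000 protein pairs interact.
posterior = posterior_probability(0.001, combined)
print(combined, round(posterior, 3))
```

Working in odds keeps the update to a single multiplication, and a combined LR of 1 leaves the prior unchanged.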

Figures

Figure 1
Exploring the protein universe using structural BLAST. To infer function of a given query protein (purple), the universe of structures (left) is searched broadly for proteins that are structurally similar to the query and each one is placed in the query coordinate system (curved arrows) by superposing the protein backbones (shown schematically within the surface of each protein). In the specific application shown in the figure, binding partners (small molecules, other proteins, nucleic acids, etc.) of the structural neighbors (shown as pink, yellow, magenta, and orange circles) are also transformed into the coordinate system of the query (middle panel). The query is then analyzed to find surface residues that show a statistical propensity to contact ligands from structural neighbors. This information is represented as a heat map on the surface of the query (right panel). Regions where the query is likely to bind other molecules are shown in red and unlikely regions are shown in blue.
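The final step of the caption, scoring query surface residues by how often they contact ligands transferred from structural neighbors, might look roughly like the sketch below. It assumes the ligand coordinates have already been transformed into the query coordinate system, and it uses a plain contact frequency with a hypothetical 4.5 Å cutoff as a stand-in for the statistical propensity, whose exact definition is not given here. All names and data are illustrative.

```python
import math


def contact_frequency(residue_coords, neighbor_ligands, cutoff=4.5):
    """Fraction of neighbor ligands that come within `cutoff` of each residue.

    residue_coords: dict mapping residue id -> (x, y, z) of a representative atom.
    neighbor_ligands: list of ligands, each a list of (x, y, z) atom positions
        already placed in the query coordinate system.
    """
    if not neighbor_ligands:
        raise ValueError("at least one neighbor ligand is required")
    freqs = {}
    for res_id, res_xyz in residue_coords.items():
        # Count ligands with at least one atom close to this residue.
        hits = sum(
            any(math.dist(res_xyz, atom) <= cutoff for atom in ligand)
            for ligand in neighbor_ligands
        )
        freqs[res_id] = hits / len(neighbor_ligands)
    return freqs


# Toy data: two residues and three transferred ligands.
residues = {"A10": (0.0, 0.0, 0.0), "A55": (20.0, 0.0, 0.0)}
ligands = [
    [(1.0, 1.0, 1.0)],
    [(3.0, 0.0, 0.0), (19.0, 0.0, 0.0)],
    [(40.0, 0.0, 0.0)],
]
scores = contact_frequency(residues, ligands)
print(scores)
```

Values near 1 would correspond to the red regions of the heat map and values near 0 to the blue regions.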
Figure 2
Conservation of protein–ligand interfaces. The figure shows the fraction of query proteins in the LigASite benchmark with a Z-score above the value shown on the x-axis. The blue line corresponds to results where structural neighbors of each query are chosen from a 60% non-redundant (with respect to sequence) pool of ligand-containing proteins which we compiled. Structural neighbors are obtained using a PSD cutoff of 0.8 that is large enough to detect geometric similarities even between proteins in different SCOP folds. The remaining lines show results obtained when structural neighbors from the same SCOP family (green), superfamily (purple), and fold (red) as the query were excluded from the set of structural neighbors (proteins without SCOP annotations were also excluded). Since some queries do not have many structural neighbors, Z-scores in this figure are calculated only for 146 proteins (out of a total of 337 in the LigASite benchmark) that had ≥5 neighbors in each set.
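The curves in this figure plot, for each threshold on the x-axis, the fraction of query proteins whose Z-score exceeds it. A minimal sketch of that calculation follows; the Z-scores are invented for illustration, and how each Z-score is derived is not specified in the caption.

```python
def fraction_above(z_scores, thresholds):
    """For each threshold, return the fraction of Z-scores strictly above it."""
    if not z_scores:
        raise ValueError("z_scores must not be empty")
    n = len(z_scores)
    return [sum(z > t for z in z_scores) / n for t in thresholds]


# Hypothetical Z-scores for four query proteins.
example_z = [0.5, 1.2, 2.5, 3.1]
curve = fraction_above(example_z, [0, 1, 2, 3])
print(curve)
```

Excluding structural neighbors from the same SCOP family, superfamily, or fold would change the input Z-scores and therefore shift the curve.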
Figure 3
Structural alignment can reveal ligand shape similarity. An NAD(P)-binding domain (purple, SCOP domain d1hyua1, fold classification c.3) and a nucleotide-binding domain (yellow, SCOP domain d1djna3, fold classification c.4) are structurally superimposed and shown in ribbon representation as a stereo pair. An FAD molecule from d1hyua1 (red) and ADP from d1djna3 (cyan) are shown in stick representation. There is a clear similarity in ligand shape and binding mode even though only some secondary structure elements overlap (regions of either structure with no structurally equivalent region in the other are shown as transparent).
Figure 4
Learning with Bayesian statistics. Learning in the Bayesian approach is carried out by examining reliable true positive (TP) and true negative (TN) reference sets. For example, TP might be a set of protein–protein interactions that have been experimentally validated multiple times, and TN might be a set of protein pairs that are known not to interact. The degree to which having property X indicates membership in TP is quantified by the likelihood ratio (LR), calculated as the fraction of objects in the TP set that have property X (P(X|TP)) divided by the fraction in the TN set with property X (P(X|TN)). An LR > 1 indicates that an object with property X is more likely to be in the TP set than in the TN set.
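The likelihood ratio defined in this caption can be computed directly from the two reference sets. The sketch below is illustrative: the function name, the boolean encoding of "has property X", and the toy reference sets are assumptions, and returning infinity when no TN member has the property is one possible convention rather than anything stated in the source.

```python
def likelihood_ratio(tp_has_x, tn_has_x):
    """LR = P(X|TP) / P(X|TN), from one boolean flag per reference-set member."""
    if not tp_has_x or not tn_has_x:
        raise ValueError("both reference sets must be non-empty")
    p_x_given_tp = sum(tp_has_x) / len(tp_has_x)
    p_x_given_tn = sum(tn_has_x) / len(tn_has_x)
    if p_x_given_tn == 0:
        # Property never seen in TN: the ratio is unbounded (assumed convention).
        return float("inf")
    return p_x_given_tp / p_x_given_tn


# Toy reference sets: 8 of 10 TP members have property X, 1 of 10 TN members do.
tp_flags = [True] * 8 + [False] * 2
tn_flags = [True] * 1 + [False] * 9
lr = likelihood_ratio(tp_flags, tn_flags)
print(lr)
```

In practice, small reference sets give noisy estimates of both fractions, so some form of binning or smoothing is commonly used before taking the ratio.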
Figure 5
Evaluating protein interaction models. The figure shows an interaction model of a complex formed between two query proteins A and B (light green and light blue) that is derived from a known template complex present in the PDB (dark green and dark blue) using PrePPI's structural BLAST algorithm. The interaction model is obtained by superimposing the structure of each query protein on its respective structural neighbor in the template complex. The model is evaluated with a set of simple empirical scores that depend on (a) the quality of the structural alignment of the queries with their templates, (b) how well residues in the modeled interface (highlighted in yellow) overlap the template interface (highlighted in gray), and (c) how well predicted interfacial residues in both query proteins (highlighted in red) overlap with interfacial residues in the template.
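Scores (b) and (c) both measure overlap between two sets of interface residues. One simple way to express such an overlap is sketched below, under the assumption that query residues have already been mapped onto template residue positions through the structural alignment. The function name, the normalization by template interface size, and the residue labels are hypothetical; the caption does not give the actual scoring formulas.

```python
def interface_overlap(model_interface, template_interface):
    """Fraction of template interface positions also present in the model interface.

    Both arguments are iterables of residue positions expressed in the
    template's numbering (assumed to come from the structural alignment).
    """
    model = set(model_interface)
    template = set(template_interface)
    if not template:
        return 0.0
    return len(model & template) / len(template)


# Hypothetical interface positions for one chain of the complex.
template_positions = {"A12", "A15", "A40", "A41"}
model_positions = {"A12", "A15", "A41", "A77"}
overlap = interface_overlap(model_positions, template_positions)
print(overlap)
```

The same function could be applied to score (c) by passing the predicted interfacial residues as the first argument.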
Figure 6
Expanding the number of reliable predictions using remote relationships. The graphs show the numbers of high-confidence (LR>600) protein–protein interaction predictions that use close (blue, PSD<0.2), intermediate (red, 0.2
