Proc Natl Acad Sci U S A. 2009 Oct 13;106(41):17377-82.
doi: 10.1073/pnas.0907971106. Epub 2009 Sep 24.

Structural relationships among proteins with different global topologies and their implications for function annotation strategies

Donald Petrey et al. Proc Natl Acad Sci U S A. 2009.

Abstract

It has become increasingly apparent that geometric relationships often exist between regions of two proteins that have quite different global topologies or folds. In this article, we examine whether such relationships can be used to infer a functional connection between the two proteins in question. We find, by considering a number of examples involving metal and cation binding, sugar binding, and aromatic group binding, that geometrically similar protein fragments can share related functions, even if they have been classified as belonging to different folds and topologies. Thus, the use of classifications inevitably limits the number of functional inferences that can be obtained from the comparative analysis of protein structures. In contrast, the development of interactive computational tools that recognize the "continuous" nature of protein structure/function space, by increasing the number of potentially meaningful relationships that are considered, may offer a dramatic enhancement in the ability to extract information from protein structure databases. We introduce the MarkUs server, which embodies this strategy and is designed for a user interested in developing and validating specific functional hypotheses.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Alignment of Spo0F with structurally similar proteins. (A) Backbone of Spo0F (PDB entry 1F51, chain E). (B) Backbone of AlaD (1EB3). (C) Backbone of the iron ABC transporter (1Y4T, chain A). The colored regions indicate the structurally similar subset of SSEs shared by 1F51, 1EB3 or 1Y4T. (D) Structural alignment of 1F51E (red), 1EB3 (blue) and 1Y4TA (green). Metals associated with each protein shown as spheres. Only regions that structurally align to 1F51 from either 1EB3 or 1Y4T are shown (also see Fig. 2D). (E) Structure-based sequence alignment of residues 1204–1212 in 1F51 to the structurally equivalent regions of 1EB3 and 1Y4T. Residues in color correspond to metal chelating residues using the coloring in D. Structure-based sequence alignments and rigid-body transformations that relate the proteins discussed in all figures are provided in SI Appendix.
Fig. 2.
A generic cation binding fragment. (A) Multiple structure alignment with Ska of Spo0F (red), acetylcholinesterase (blue, PDB entry 2ACE), spermidine synthase (yellow, 3B7P), and a UDP-glycosyl transferase from M. truncatula (green, 2ACW). Only the structurally equivalent residues as determined by the structure alignment are shown in worm representation. (B) Cationic moieties from structural neighbors of Spo0F are shown in sphere representation, colored as in A. These correspond to acetylcholine from 2ACE, a spermine from 3B7P, and a histidine side chain from 2ACW. The strand containing the conserved acidic residue is shown in wire representation and the residue itself is shown at the top of this strand in stick representation. The magnesium contained in 1F51 is shown as a sphere and the positively charged amino group from ligands and H22 from 2ACW are also shown as spheres. (C) Structure-based sequence alignment of the strand shown in B with the conserved acidic residue shown in red. (D) The set of SSEs common to all of the proteins shown in this figure and to the metal binding proteins shown in Fig. 1.
Fig. 3.
A carbohydrate binding fragment. (A–C) The structures of three carbohydrate binding proteins: the VP8 domain of the capsid protein from the CRW-8 strain of porcine rotavirus (PDB entry 2I2S, magenta and gray ribbon representation) (A), garlic lectin (green and gray, 1KJ1) (B), and protein RSC2107 from Ralstonia solanacearum (red and gray, 2BT9) (C). Colored regions are structurally conserved among the three proteins. Cocrystallized ligands are shown in stick representation. (D) The conserved substructure present in all three of the proteins shown in A–C. The structurally equivalent strands from each protein (i.e., each strand that aligns to a strand from 2I2S based on a structure-based sequence alignment) are colored identically. The largest rmsd between any neighbor and 2I2S was 4.4 Å. (E) The conserved substructure of 2I2S shown in magenta. Carbohydrate ligands from structural neighbors of 2I2S are shown in stick representation and colored according to the fold to which the protein belongs using the color code of 3A. Two sialic acids cocrystallized with 2I2S are shown as blue sticks. The ligands and PDB files from which they are derived are provided in SI Appendix.
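The rmsd quoted in the Fig. 3 caption measures how far apart structurally equivalent positions lie after superposition. The caption does not say which atoms were used or how the fit was performed, so the following is only a minimal sketch: it assumes the two coordinate lists are already paired by a structure-based alignment and already superimposed, and the function name `rmsd` and the toy coordinates are illustrative, not taken from the paper.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) points.

    Assumption: points are already matched one-to-one by a structure
    alignment and already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example with made-up coordinates (in Å), not data from any PDB entry.
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
neighbor = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(query, neighbor))
```

In practice the superposition itself would come from the alignment program; this sketch only covers the final distance calculation.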
Fig. 4.
Identifying a potential ligand that binds to a protein of unknown function. (A) Molecular surface of protein TM1055 highlighting a cleft identified by the program SCREEN (28) as the most likely ligand binding site on the protein surface, colored by solvent accessibility (42). (B) The structure of TM1055 (PDB entry 1RCU) shown as an orange worm. Four ligands from structural neighbors of TM1055 are shown as colored sticks, oriented in the coordinate system of TM1055 by transforming the coordinates of the ligands according to the transformation that relates the structural neighbor to TM1055. These are a tyrosine from a tyrosyl-tRNA synthetase (red, 1WQ4), an AMP from M. methylotrophus electron transfer flavoprotein (green, chain C), an S-adenosylhomocysteine from a methyltransferase (yellow, 9MHT), and a CoA from formyl-CoA transferase (blue, 1VGR). (C) The molecular surface of TM1055 with the aromatic moieties of the ligands from B as magenta sticks. Each aromatic moiety occupies an approximately equivalent position in the cavity identified in A. (D) The set of nine structurally equivalent SSEs shared by TM1055 and all four structural neighbors.
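The ligand placement described in Fig. 4B amounts to applying a rigid-body transformation (a rotation followed by a translation) to the ligand coordinates of each structural neighbor. A minimal sketch of that step is given below. The function name `apply_rigid_transform`, the rotation matrix, the translation vector, and the coordinates are all hypothetical placeholders; real values would come from the structure alignment that relates a neighbor to TM1055.

```python
def apply_rigid_transform(points, rotation, translation):
    """Map each point p to R*p + t.

    points      : list of (x, y, z) tuples (e.g. ligand atom coordinates)
    rotation    : 3x3 rotation matrix as nested lists (assumed orthonormal)
    translation : (tx, ty, tz)
    """
    transformed = []
    for x, y, z in points:
        transformed.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)
        ))
    return transformed


# Hypothetical transformation: 90 degree rotation about the z axis plus a shift.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = (1.0, 2.0, 3.0)

# Made-up ligand atom position in the neighbor's coordinate frame.
ligand_atoms = [(1.0, 0.0, 0.0)]
print(apply_rigid_transform(ligand_atoms, R, t))  # [(1.0, 3.0, 3.0)]
```

Once all neighbor ligands are expressed in the query's coordinate frame, their positions can be compared directly, which is what allows the aromatic moieties in Fig. 4C to be seen occupying approximately the same region of the cavity.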
Fig. 5.
Representative web page of the MarkUs protein function annotation server highlighting a subset of MarkUs functionalities used in the analysis of ligand binding sites of the structural neighbors of the VP8 domain discussed in Fig. 3. The “annotation map” (A) allows the visualization and analysis of functional data from different sources. The gray lines represent the sequences of a query protein (first line) and its structural neighbors. The magenta rectangles indicate functional residues, in this case, ligand-contacting residues as determined from cocrystallized proteins/ligands available in the PDB. The shaded rectangle indicates a structurally conserved region containing residues that bind ligands in “site 1” of the VP8 domain (see Fig. 3). The figure clearly indicates the conservation, both in overall location and in certain cases individual ligand contacting residues between folds (the last two lines represent β-prism and β-propeller proteins). Hovering the mouse over different areas of the annotation map will display “tool tips” (B) that provide additional functional details. For example, hovering the mouse over the icons in the area to the left of each sequence (C) provides details about each individual protein, including protein name from UniProt, source organism, EC class, and full GO tree. Other types of information can be displayed as well. In this annotation map, hovering over the magenta rectangles would display the identity of the residue/ligand pair. By clicking on the GO annotation within these tool tips (B, underlined), the set of proteins displayed can be restricted to those that share that particular annotation (“sugar binding” in this figure). (D and E) The information displayed on the annotation map can be changed using the controls to the left. In this case, all ligand contacts (excluding solvent) are displayed but this can be restricted to ligands of a certain type based on the ChEBI ligand classification (D). 
The menu (E) at the top left can be used to display a wide array of other structural and functional properties including UniProt sequence features, sequence conservation, protein–protein interactions, SNPs, and secondary structure. (F) Colored boxes in this region indicate residues lining cavities identified by the program SCREEN colored by conservation (dark red, highly conserved; blue, conserved; and black, unconserved).

References

    1. Yang AS, Honig B. An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance. J Mol Biol. 2000;301:665–678.
    2. Shindyalov IN, Bourne PE. An alternative view of protein fold space. Prot: Struct Func Gen. 2000;38:247–260.
    3. Kihara D, Skolnick J. The PDB is a covering set of small protein structures. J Mol Biol. 2003;334:793–802.
    4. Szustakowski JD, Kasif S, Weng Z. Less is more: Towards an optimal universal description of protein folds. Bioinformatics. 2005;21:ii66–71.
    5. Friedberg I, Godzik A. Connecting the protein structure universe by using sparse recurring fragments. Structure. 2005;13:1213–1224.
