Proteins. 2005 Dec 1;61(4):741-7.
doi: 10.1002/prot.20661.

Structural characterization of proteins using residue environments

Sean D Mooney et al. Proteins. 2005.

Abstract

A primary challenge for structural genomics is the automated functional characterization of protein structures. We have developed a sequence-independent method called S-BLEST (Structure-Based Local Environment Search Tool) for the annotation of previously uncharacterized protein structures. S-BLEST encodes the local environment of an amino acid as a vector of structural property values. It has been applied to all amino acids in a nonredundant database of protein structures to generate a searchable structural resource. Given a query amino acid from an experimentally determined or modeled structure, S-BLEST quickly identifies similar amino acid environments using a K-nearest neighbor search. In addition, the method estimates the statistical significance of each result. We validated S-BLEST on X-ray crystal structures from the ASTRAL 40 nonredundant dataset. We then applied it to 86 crystallographically determined proteins in the Protein Data Bank (PDB) with unknown function and with no significant sequence neighbors in the PDB. S-BLEST was able to associate 20 proteins with at least one local structural neighbor and identify the amino acid environments that are most similar between those neighbors.
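
The search step described in the abstract can be illustrated with a minimal sketch. It is not the S-BLEST implementation: the function names, the toy three-component vectors, the use of Euclidean distance, and the z-score defined against the query's own distance distribution are all assumptions made for illustration.

```python
import heapq
import math
import statistics


def euclidean(a, b):
    # Assumed distance measure; the abstract does not specify one.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_environments(query, database, k=3):
    """Return the k closest entries as (label, distance, z_score) tuples.

    The z-score compares each hit's distance with the mean and standard
    deviation of the query's distances to every database entry, so strongly
    negative values mark unusually close environments (an assumed scheme).
    """
    distances = [(label, euclidean(query, vec)) for label, vec in database.items()]
    background = [d for _, d in distances]
    mean = statistics.fmean(background)
    stdev = statistics.pstdev(background)
    hits = heapq.nsmallest(k, distances, key=lambda item: item[1])
    return [
        (label, d, (d - mean) / stdev if stdev > 0 else 0.0)
        for label, d in hits
    ]


# Hypothetical environment vectors (not real structural property values).
database = {
    "envA": [0.0, 0.0, 0.0],
    "envB": [1.0, 0.0, 0.0],
    "envC": [5.0, 5.0, 5.0],
    "envD": [6.0, 5.0, 4.0],
}
hits = nearest_environments([0.1, 0.0, 0.0], database, k=2)
for label, distance, z in hits:
    print(label, round(distance, 3), round(z, 3))
```

A brute-force scan like this is linear in the database size; a production search over every residue in a structure database would more plausibly use a spatial index, which is omitted here for brevity.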

Figures

Fig. 1
The relationship between average PPV and a given threshold S-BLEST z-score. The proteins used were 100 random members of random SCOP families in ASTRAL 40 v1.65.
Fig. 2
Illustration of the background distributions used to calculate the z-score: distance histograms for the first nine residue environments of PDB entry 2TRX:A with respect to the ASTRAL 40 v1.65 dataset.
Fig. 3
Illustration of classifying residues within MAP kinase, 1DI9. A: The line plot below indicates the AUC at each position along the chain. The arrows indicate the locations in the sequence with AUC above 0.90. These classifying residues are shown in green on the structure. 1DI9 illustrates the underlying reason for classification. The good classifiers form a core that is surrounded by the ATP binding site (in yellow), the peptide binding channel (in gray), and the residues that, when phosphorylated, activate the enzyme (in red). Additionally, LYS53 directly interacts with the ATP ligand. Interestingly, residues ASN82, VAL83, and LYS165 form another environment that classifies the function well. They are directly behind the peptide binding channel and are in close proximity to the ATP binding site. B: ROC of the ranked chains output by the congruence approach. Of the 27 members in our dataset, the first 25 chains ranked were true positives, whereas the method failed to recognize 1KOA and 1FMK as structurally similar (AUC is 0.935).
Fig. 4
Illustration of the hit results from the 86 structures with unknown function. A: As an example of a hit that is a true positive, 1VGY:A is matched with 1LFW:A with a z-score of -6.36. The best matching residues are ARG97 from 1VGY paired with ARG115 from 1LFW, HIS68 with HIS87, ASP70 with ASP89, GLY98 with GLY112, and GLU136 with GLU154. These residues are highlighted in yellow in the figure. B: An interesting hit of questionable significance is 1B3U:A, which is matched to the query 1OYZ:A with a slightly below-threshold z-score of -5.21. The hit is interesting because the proteins are clearly structurally related, and the best residue matches occur between secondary structural elements and are often observed “bridging” those elements. C: An example of a possible unknown hit, between 1LJO:A and 1B34:A with a z-score of -5.64. Although the proteins share the same fold, their functional relationship is not known.
Fig. 5
Characterization of a hit (1VGY:A). A: Top hits were associated with a common SCOP family. We then calculated the area under an ROC curve for each residue in that structure, quantifying how well each protein classifies the SCOP family the hits were associated with. The line plot below indicates the AUC at each position along the chain. The arrows indicate the locations in the sequence with AUC above 0.90. We highlight these locations in yellow on the structure. These hits fall into a predicted active site and are localized to a single region. B: The ROC for the congruence approach is shown. Of the five true positives in our dataset, three were the top hits, the fourth was in position five, and the fifth was ranked 65th overall (AUC is 0.995).
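
The AUC values quoted in the captions for Figs. 3 and 5 summarize how well a ranked list separates true positives from everything else. The sketch below shows one standard way to compute such a value from a ranking, as the fraction of (positive, negative) pairs in which the positive is ranked first. The function name and the toy ranking are hypothetical, ties are not handled, and the figures' own numbers are not reproduced because the full rankings are not given here.

```python
def roc_auc(ranked_labels):
    """Area under the ROC curve for a ranked list, best-scoring item first.

    ranked_labels holds True for a true positive and False otherwise.
    The result equals the fraction of (positive, negative) pairs in which
    the positive appears earlier in the ranking than the negative.
    """
    positives = sum(1 for label in ranked_labels if label)
    negatives = len(ranked_labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative")

    negatives_seen = 0
    misordered_pairs = 0
    for is_positive in ranked_labels:
        if is_positive:
            # Every negative already seen outranks this positive.
            misordered_pairs += negatives_seen
        else:
            negatives_seen += 1
    return 1.0 - misordered_pairs / (positives * negatives)


# Toy ranking: three positives, one of them ranked below a single negative.
toy_ranking = [True, True, False, True, False, False]
auc = roc_auc(toy_ranking)
perfect = roc_auc([True, True, False, False])
print(auc, perfect)
```

Applying the same function to the ranking obtained from each residue's environment would give a per-position AUC profile of the kind plotted along the chain in panel A of those figures.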
