Biochemistry. 2007 Nov 27;46(47):13468-77.
doi: 10.1021/bi7012273. Epub 2007 Nov 1.

Mining alpha-helix-forming molecular recognition features with cross species sequence alignments

Yugong Cheng et al. Biochemistry. 2007.

Abstract

Previously described algorithms for mining alpha-helix-forming molecular recognition elements (MoREs) (Oldfield, C. J., Cheng, Y., Cortese, M. S., Brown, C. J., Uversky, V. N., and Dunker, A. K. (2005) Comparing and combining predictors of mostly disordered proteins, Biochemistry 44, 1989-2000), also known as molecular recognition features (MoRFs) (Mohan, A., Oldfield, C. J., Radivojac, P., Vacic, V., Cortese, M. S., Dunker, A. K., and Uversky, V. N. (2006) Analysis of Molecular Recognition Features (MoRFs), J. Mol. Biol. 362, 1043-1059), revealed that regions undergoing disorder-to-order transition are involved in many molecular recognition events and are crucial for protein-protein interactions. However, these algorithms were developed using a training data set of limited size. Here we propose to improve the prediction algorithms by (1) including additional alpha-MoRF examples and their cross-species homologues in the positive training set, (2) carefully extracting monomer structure chains from the Protein Data Bank (PDB) as the negative training set, (3) including attributes from recently developed disorder predictors, secondary structure predictions, and amino acid indices, and (4) constructing neural-network-based predictors and performing validation. More than 50 regions that undergo disorder-to-order transition, identified in the PDB, were included in a new positive training set together with a set of corresponding cross-species homologues of each structure-based example. More than 1500 attributes, including disorder predictions, secondary structure predictions, and amino acid indices, were evaluated by the conditional probability method. The top attributes, including VSL2 and VL3 disorder predictions and several physicochemical propensities of amino acid residues, were used to develop the feed-forward neural networks. The sensitivity, specificity, and accuracy of the resulting predictor, alpha-MoRF-PredII, were 0.87 ± 0.10, 0.87 ± 0.11, and 0.87 ± 0.08, respectively, over 10 cross-validations. We present the results of these analyses and validation examples to discuss the potential improvement of the alpha-MoRF-PredII prediction accuracy.
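The reported sensitivity, specificity, and accuracy can be illustrated with a minimal sketch. It assumes a predictor that emits one score per example and a fixed decision threshold of 0.5; the function names and the toy data are hypothetical and are not taken from alpha-MoRF-PredII. The second helper shows how per-fold values could be summarized as a mean and sample standard deviation, which is one plausible reading of the "±" figures above.

```python
from statistics import mean, stdev


def confusion_counts(scores, labels, threshold=0.5):
    """Count TP, FP, TN, FN for scores thresholded at `threshold` (assumed value)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(scores, labels, threshold=0.5):
    """Sensitivity (Sn), specificity (Sp), and accuracy (Acc) at one threshold."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    return {
        "Sn": ratio(tp, tp + fn),
        "Sp": ratio(tn, tn + fp),
        "Acc": ratio(tp + tn, tp + fp + tn + fn),
    }


def summarize_folds(values):
    """Mean and sample standard deviation of one metric across folds."""
    return mean(values), stdev(values)


# Toy example: three positives (label 1) and three negatives (label 0).
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_metrics = evaluate(toy_scores, toy_labels)
```

Calling `summarize_folds` on the list of per-fold sensitivities (and likewise for the other metrics) gives a value pair comparable in form to those quoted in the abstract.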

Figures

Figure 1
Analysis of 4E-BP1 disorder propensity by different predictors of intrinsic disorder. The plots produced by different predictors are grouped by their overall appearance and shape: A. PONDR® VLXT (red curve), RONN (blue curve) and IUPred (green curve). B. VL3 (black line) and VL2 (red line). C. VSL2B (black line), VSL2P (red line) and VL3BA (green line). D. DisPro (black line), DRIPPRED (red line) and DISOPRED (green line). Pink bar at the top of panel A indicates the region involved in binding of the eukaryotic initiation factor 4E (eIF4E).
Figure 2
Analysis of p53 disorder propensity by different predictors of intrinsic disorder. The plots produced by different predictors are grouped by their overall appearance and shape: Top panel. PONDR® VLXT (red curve), RONN (blue curve), IUPred (green curve), and DRIPPRED (pink line). Middle panel. VL3 (black line), VL2 (red line), VSL2B (green line), VSL2P (yellow line), and VL3BA (blue line). Bottom panel. DisPro (black line) and DISOPRED (red line). Dark green and dark red bars at the top of the top panel indicate the regions involved in binding of Mdm2 and S100B(ββ), respectively.
Figure 3
Analysis of RNase E disorder propensity by different predictors of intrinsic disorder. The plots produced by different predictors are grouped by their overall appearance and shape: A. PONDR® VLXT (red curve). B. VL3 (black line), VL2 (red line), RONN (yellow curve), and IUPred (green curve). C. VSL2B (black line), VSL2P (red line) and VL3BA (green line). D. DisPro (black line), DRIPPRED (red line) and DISOPRED (green line). Bars at the top of panel A indicate RISP regions responsible for RNase E interaction with different binding partners: A (residues 565-585), protein–RNA interaction site; B (residues 633-712), self-recognition region; C (residues 839-850), enolase binding site; and D (residues 1021-1061), PNPase binding site.
Figure 4. Feature selection
Mahalanobis distances for feature combinations of various sizes, as selected by Branch and Bound (circles), Forward Selection (diamonds), and AR (squares).
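As an illustration of how a Mahalanobis distance can drive forward selection, the sketch below greedily adds the feature that most increases the distance between the positive and negative class means. It is a minimal example under stated assumptions: a two-class distance with a pooled covariance matrix, hypothetical function names, and made-up toy data. The criterion actually used in the paper, and the Branch and Bound and AR procedures, are not reproduced here.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(pos, neg):
    """Pooled within-class covariance of two samples (assumed model)."""
    d = len(pos[0])
    cov = [[0.0] * d for _ in range(d)]
    for rows in (pos, neg):
        mu = mean_vector(rows)
        for r in rows:
            for i in range(d):
                for j in range(d):
                    cov[i][j] += (r[i] - mu[i]) * (r[j] - mu[j])
    dof = len(pos) + len(neg) - 2
    return [[c / dof for c in row] for row in cov]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis(pos, neg, features):
    """Mahalanobis distance between class means, restricted to `features`."""
    p = [[r[f] for f in features] for r in pos]
    q = [[r[f] for f in features] for r in neg]
    diff = [a - b for a, b in zip(mean_vector(p), mean_vector(q))]
    weights = solve(pooled_covariance(p, q), diff)
    squared = sum(d * w for d, w in zip(diff, weights))
    return max(0.0, squared) ** 0.5


def forward_selection(pos, neg, k):
    """Greedily pick k feature indices that maximize the distance."""
    chosen, remaining = [], list(range(len(pos[0])))
    while len(chosen) < k:
        best = max(remaining, key=lambda f: mahalanobis(pos, neg, chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data: feature 0 separates the classes, features 1 and 2 barely do.
toy_pos = [[2.0, 0.1, 1.0], [2.2, 0.5, 0.0], [1.8, 0.3, 0.4], [2.1, 0.0, 0.7]]
toy_neg = [[0.0, 0.2, 0.9], [0.3, 0.4, 0.1], [-0.2, 0.0, 0.5], [0.1, 0.6, 0.6]]
```

With a shared covariance estimate, adding a feature cannot decrease this distance, so a plot of distance against the number of selected features rises and then flattens once the informative features are in.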
Figure 5. Neural networks training
A: Results of the 10 cross-validations of neural networks constructed using the top six attribute combination from Forward Selection. B: Evaluation parameters from A plotted for various threshold values. PPV: positive predictive value; NPV: negative predictive value; Sn: sensitivity; Sp: specificity; Acc: accuracy (Table 1).
Figure 6. ROC curves
Receiver operating characteristic (ROC) curves for different constructions of neural networks. Top6-AR, Top10-AR: the top 6 or top 10 attribute combination from AR was used for neural network construction, respectively; Top6-BB, Top10-BB: the top 6 or top 10 attribute combination from Branch and Bound was used, respectively; Top6-FD, Top10-FD: the top 6 or top 10 attribute combination from Forward Selection was used, respectively.
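For readers unfamiliar with how such curves are built, here is a minimal sketch of a ROC computation from predictor scores and binary labels, with the area under the curve obtained by the trapezoidal rule. The function names and toy data are hypothetical and the sketch is not tied to the networks compared in the figure.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, threshold high to low."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy example: one negative (score 0.7) outranks one positive (score 0.4).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc(roc_points(toy_scores, toy_labels))
```

In the toy data eight of the nine positive-negative pairs are ranked correctly, so the area is 8/9; a curve closer to the top-left corner corresponds to a larger area.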
Figure 7. α-MoRF predictions across genomes of three Kingdoms
A: Fractions of proteins in 9 eukaryotic, 57 bacterial, and 16 archaeal genomes predicted to contain α-MoRF by the previous method (open bars) and the present method (closed bars). B: Frequency of α-MoRF in 9 eukaryotic, 57 bacterial, and 16 archaeal genomes by the previous method (open bars) and the present method (closed bars). In both panels, the error bars indicate the 95% confidence interval over 1000 resamplings.
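A confidence interval over 1000 resamplings can be sketched with a percentile bootstrap. The caption does not say which resampling scheme was used or what unit was resampled, so the following is an assumption: it resamples per-protein flags (1 if the protein is predicted to contain an α-MoRF, 0 otherwise) with replacement and takes the 2.5th and 97.5th percentiles of the resampled fractions. The function name, seed, and toy data are hypothetical.

```python
import random


def bootstrap_ci(flags, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the fraction of positive flags."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(flags)
    fractions = sorted(
        sum(rng.choices(flags, k=n)) / n for _ in range(n_resamples)
    )
    lower = fractions[int((alpha / 2) * n_resamples)]
    upper = fractions[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy example: 30 of 100 proteins flagged as containing a predicted α-MoRF.
toy_flags = [1] * 30 + [0] * 70
toy_interval = bootstrap_ci(toy_flags)
```

The interval narrows as the number of proteins grows, which is why error bars for large genomes would be expected to be tighter than those for small ones under this scheme.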
