BMC Bioinformatics. 2004 Apr 16;5:38.
doi: 10.1186/1471-2105-5-38.

Predicting co-complexed protein pairs using genomic and proteomic data integration

Lan V Zhang et al. BMC Bioinformatics. 2004.

Abstract

Background: Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS methods have substantial false-positive rates. Aside from high-throughput interaction screens, other gene- or protein-pair characteristics may also be informative of physical interaction. It is therefore desirable to integrate multiple datasets and exploit their differing predictive value to predict co-complexed relationships more accurately.

Results: Using a supervised machine learning approach, the probabilistic decision tree, we integrated high-throughput protein interaction datasets and other gene- and protein-pair characteristics to predict co-complexed pairs (CCPs) of proteins. Our predictions proved more sensitive and specific than predictions based on Y2H or APMS methods alone or in combination. Among the top predictions not annotated as CCPs in our reference set (obtained from the MIPS complex catalogue), a significant fraction was found to physically interact according to a separate database (YPD, Yeast Proteome Database), and the remaining predictions may represent unknown CCPs.

Conclusions: We demonstrated that the probabilistic decision tree approach can be successfully used to predict co-complexed protein (CCP) pairs from other characteristics. Our top-scoring CCP predictions provide testable hypotheses for experimental validation.
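
As a rough illustration of the approach summarized in the abstract, the sketch below grows a small decision tree over binary pair attributes and scores a protein pair by the fraction of CCPs in the leaf it reaches. It is a minimal sketch under stated assumptions, not the authors' implementation: the information-gain split criterion, the `min_size` stopping rule, the attribute names (`apms`, `y2h`, `coexpr`) and the toy data are all hypothetical.

```python
import math
from collections import namedtuple

# A leaf stores CCP / non-CCP counts; an internal node stores the attribute tested.
Leaf = namedtuple("Leaf", "pos neg")
Node = namedtuple("Node", "attr present absent")

def entropy(pos, neg):
    """Binary entropy (bits) of a node holding pos CCPs and neg non-CCPs."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def build(pairs, attrs, min_size=2):
    """pairs: list of (set_of_attributes, is_ccp). Split criterion is an assumption."""
    pos = sum(1 for _, y in pairs if y)
    neg = len(pairs) - pos
    if pos == 0 or neg == 0 or len(pairs) < min_size or not attrs:
        return Leaf(pos, neg)
    best = None
    for a in attrs:
        have = [p for p in pairs if a in p[0]]
        lack = [p for p in pairs if a not in p[0]]
        if not have or not lack:
            continue
        hp = sum(1 for _, y in have if y)
        lp = sum(1 for _, y in lack if y)
        cond = (len(have) * entropy(hp, len(have) - hp)
                + len(lack) * entropy(lp, len(lack) - lp)) / len(pairs)
        gain = entropy(pos, neg) - cond
        if best is None or gain > best[0]:
            best = (gain, a, have, lack)
    if best is None or best[0] <= 0:
        return Leaf(pos, neg)
    _, a, have, lack = best
    rest = [x for x in attrs if x != a]
    return Node(a, build(have, rest, min_size), build(lack, rest, min_size))

def score(tree, attribute_set):
    """Probability-like score: fraction of CCPs in the leaf reached by the pair."""
    while isinstance(tree, Node):
        tree = tree.present if tree.attr in attribute_set else tree.absent
    total = tree.pos + tree.neg
    return tree.pos / total if total else 0.0

# Hypothetical toy training set: (attributes present for the pair, is it a CCP?)
toy = [
    ({"apms", "coexpr"}, True), ({"apms"}, True), ({"apms", "y2h"}, True),
    ({"y2h"}, False), ({"coexpr"}, False), (set(), False),
    ({"apms"}, False), (set(), False),
]
tree = build(toy, ["apms", "y2h", "coexpr"])
print(score(tree, {"apms"}), score(tree, {"y2h"}))
```

Because each leaf reports a fraction rather than a hard class label, the predictions can be ranked and thresholded at any score.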


Figures

Figure 1
Decision tree constructed using all protein pairs. Each leaf node is labeled with the numbers of CCPs and non-CCPs associated with it, while each internal node is labeled with the attribute (j) used for subsequent partitioning (see Table 4 or Supplementary Information for descriptions of the attributes). Two edges originate from each internal node, labeled "+" or "-," corresponding to the daughter nodes that have or do not have attribute j, respectively. Nodes with percentages of CCPs higher than that of the root node are colored red, while those with lower CCP percentages are blue. The color saturation depends on the relative entropy compared with the root node. The arrowhead size of an edge from a given node approximately represents the fraction of protein pairs in the parent node assigned to the corresponding daughter node.
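
The caption does not give the formula behind the color saturation. One plausible reading, sketched here as an assumption, is the Kullback-Leibler divergence between the CCP/non-CCP proportions at a node and those at the root. The function names and the handling of a node whose CCP fraction equals the root's are hypothetical.

```python
import math

def relative_entropy(node_ccp, node_non, root_ccp, root_non):
    """KL divergence (bits) of the node's CCP proportion from the root's.

    Assumes the root contains at least one CCP and one non-CCP.
    """
    p = node_ccp / (node_ccp + node_non)
    q = root_ccp / (root_ccp + root_non)
    d = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:  # terms with zero probability contribute nothing
            d += a * math.log2(a / b)
    return d

def node_hue(node_ccp, node_non, root_ccp, root_non):
    """Red if the node is enriched for CCPs relative to the root, blue if depleted."""
    p = node_ccp / (node_ccp + node_non)
    q = root_ccp / (root_ccp + root_non)
    if p > q:
        return "red"
    if p < q:
        return "blue"
    return "neutral"  # assumption: the caption does not cover the equal case
```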
Figure 2
ROC curves for predictions based on: all attributes (black), all attributes except the category "high-throughput screens of interaction" (yellow), all attributes except the category "correlated mRNA expression" (green), all attributes except the category "same transcriptional regulator" (red), all attributes except the category "sequence homology" (blue) and all attributes together with the categories "same subcellular localization (MIPS)", "same function (MIPS)" and "same protein class (MIPS)" (grey). The expected ROC curve for random guesses is the diagonal where true-positive rate equals false-positive rate (black dotted line). A-C show the same ROC curve at different resolutions.
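
For readers unfamiliar with how such curves are obtained, the sketch below sweeps a threshold over prediction scores and records the false-positive and true-positive rates at each step. It is a generic ROC construction, not the authors' code; the function names and the trapezoidal area calculation are assumptions added for illustration.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) points, one per distinct score.

    labels are 1 for CCPs and 0 for non-CCPs; both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all pairs tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve; 0.5 corresponds to random guessing."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A predictor that assigns every pair the same score yields only the two end points, which is the diagonal described in the caption.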
Figure 3
A: Decision tree predictions compared with four high-throughput datasets and their simple combinations. B and C: Decision tree predictions compared with two APMS studies: TAP (B) and HMS-PCI (C), respectively. Only protein pairs covered by each respective study (using the "spoke" model [30]) were considered. Black solid line: decision tree predictions using all attributes; blue solid line: decision tree predictions using only high-throughput interaction datasets; grey solid line: decision tree predictions using all attributes together with the categories "same function" and "same protein class"; black dotted line: expected performance of random guesses.
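
The "spoke" model mentioned in this caption, and the "matrix" model that appears in the Figure 5 caption, are two common ways of turning an affinity purification (one bait plus the prey proteins it pulls down) into protein pairs. The sketch below shows the usual definitions; the function names and the example protein labels are hypothetical.

```python
from itertools import combinations

def spoke_pairs(bait, preys):
    """Spoke model: pair the bait with each prey it pulled down."""
    return {frozenset((bait, p)) for p in preys if p != bait}

def matrix_pairs(bait, preys):
    """Matrix model: pair every protein in the purification with every other."""
    members = set(preys) | {bait}
    return {frozenset(c) for c in combinations(sorted(members), 2)}

# Hypothetical purification: bait "A" recovers preys "B" and "C".
spoke = spoke_pairs("A", ["B", "C"])
matrix = matrix_pairs("A", ["B", "C"])
print(len(spoke), len(matrix))
```

The spoke pairs are always a subset of the matrix pairs, so the matrix model covers more candidate pairs at the cost of including prey-prey pairs that were never directly tested.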
Figure 4
The rRNA processing complex with candidate members predicted by the decision tree. Red circles represent members of the complex annotated in MIPS. Green and yellow circles are proteins found to be co-complexed with the MIPS complex members by the decision tree with a score higher than 0.5. The yellow ones are verified in YPD while the green ones are not. The width of each edge is proportional to the decision tree score of the corresponding protein pair. Edges with scores lower than 0.1 as well as edges between the MIPS complex members are not shown.
Figure 5
Correlation between scores from decision tree predictions and the fractions verified by YPD. For each of the four datasets (TAP spoke, TAP matrix, HMS-PCI spoke and HMS-PCI matrix), we plotted, for each score interval, the fraction of its protein pairs that are also annotated in YPD.
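
The quantity plotted in this figure can be computed by binning pairs by score and taking, within each bin, the share that also appears in the verification set. The sketch below is one way to do that; the interval boundaries, function name and example pairs are assumptions, since the caption does not specify them.

```python
def verified_fraction_by_interval(scored_pairs, verified, edges):
    """Fraction of pairs in each score interval that are also in `verified`.

    scored_pairs: dict mapping a pair to its decision tree score.
    verified: set of pairs annotated in the verification database.
    edges: ascending interval boundaries; intervals are [lo, hi), last one closed.
    """
    n_bins = len(edges) - 1
    counts = [[0, 0] for _ in range(n_bins)]  # [pairs in bin, verified pairs in bin]
    for pair, s in scored_pairs.items():
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if edges[i] <= s < edges[i + 1] or (is_last and s == edges[-1]):
                counts[i][0] += 1
                counts[i][1] += pair in verified
                break
    return [(edges[i], edges[i + 1], (v / n if n else None))
            for i, (n, v) in enumerate(counts)]

# Hypothetical scores and verification set.
scored = {("a", "b"): 0.05, ("a", "c"): 0.6, ("b", "c"): 0.9, ("c", "d"): 1.0}
verified = {("b", "c"), ("c", "d")}
fractions = verified_fraction_by_interval(scored, verified, [0.0, 0.5, 1.0])
```

Empty intervals return `None` rather than zero, so that a missing data point is not mistaken for a verified fraction of zero.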

References

    1. Claverie JM. Gene number. What if there are only 30,000 human genes? Science. 2001;291:1255–1257. doi: 10.1126/science.1058969.
    2. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M, Fields S, Rothberg JM. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000;403:623–627. doi: 10.1038/35001009.
    3. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci U S A. 2001;98:4569–4574. doi: 10.1073/pnas.061034498.
    4. Ito T, Tashiro K, Muta S, Ozawa R, Chiba T, Nishizawa M, Yamamoto K, Kuhara S, Sakaki Y. Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc Natl Acad Sci U S A. 2000;97:1143–1147. doi: 10.1073/pnas.97.3.1143.
    5. von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P. Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002;417:399–403. doi: 10.1038/nature750.
