BMC Bioinformatics. 2023 Feb 22;24(1):59.
doi: 10.1186/s12859-023-05178-3.

Normalized L3-based link prediction in protein-protein interaction networks

Ho Yin Yuen et al. BMC Bioinformatics.

Abstract

Background: Protein-protein interaction (PPI) data are an important type of data used in functional genomics. However, high-throughput experiments are often insufficient to complete the PPI interactomes of different organisms. Computational techniques are therefore used to infer the missing data. Link prediction is one such approach: it uses the structure of the network of PPIs known so far to identify non-edges whose addition would make the network more plausible under some underlying assumptions. Recently, a new idea called the L3 principle introduced a biological motivation into PPI link prediction, yielding predictors that outperform general-purpose link predictors for complex networks. Interestingly, the L3 principle can be interpreted in another way, which allows other signatures of PPI networks to be characterized for PPI prediction. This alternative interpretation uncovers candidate PPIs that current L3-based link predictors may not fully capture, meaning that these predictors underutilize the L3 principle.

Results: In this article, we propose a formulation of link predictors that we call NormalizedL3 (L3N), which addresses certain elements missing from L3 predictors from the perspective of network modeling. Our computational validations show that the L3N predictors find missing PPIs more accurately (in terms of true positives among the predicted PPIs) than previously proposed methods on several datasets from the literature, including BioGRID, STRING, MINT, and HuRI, at the cost of more computation time in some cases. In addition, we found that L3-based link predictors (including L3N) ranked a different pool of PPIs higher than general-purpose link predictors did. This suggests that different types of PPIs can be predicted based on different topological assumptions, and that even better PPI link predictors may be obtained in the future through improved network modeling.

Keywords: Complex Network; Graph Theory; L3 Principle; Link Prediction; Network Modeling; Protein–Protein Interaction.
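The abstract does not spell out how an L3 predictor scores a candidate pair. A common formulation is the degree-normalized score of the original L3 work (Kovács et al., 2019): it sums over every path x-u-v-y of length three and divides each path's contribution by sqrt(k_u k_v), where k is node degree. The sketch below assumes that formulation. It is not the L3N predictor proposed in this article, whose formulas this page does not reproduce, and the function name `l3_scores` and edge-list input are illustrative.

```python
from itertools import combinations
from math import sqrt


def l3_scores(edges):
    """Degree-normalized L3 score for every non-adjacent node pair.

    Assumes the Kovacs et al. (2019) form:
        score(x, y) = sum over paths x-u-v-y of 1 / sqrt(deg(u) * deg(v))
    Pairs with no length-3 path (score 0) are omitted.
    """
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    scores = {}
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # already an edge; only non-edges are candidates
        total = 0.0
        for u in adj[x]:
            for v in adj[u]:
                if v in adj[y]:  # x-u-v-y is a path of length 3
                    total += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
        if total > 0:
            scores[(x, y)] = total
    return scores


# Toy chain x-a-b-y: the only length-3 path links x and y.
print(l3_scores([("x", "a"), ("a", "b"), ("b", "y")]))  # {('x', 'y'): 0.5}
```

The double loop is quadratic in the number of nodes and is meant only to make the scoring rule explicit; sparse-matrix products are the usual route on real interactomes.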

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Principles behind link prediction in PPI networks. (a) and (b) illustrate the conditions under which the common-neighbors (CN) approach and the L3 principle, respectively, would predict that an edge between the two non-adjacent nodes x and y is missing. (c) A graphical representation of a physical PPI between protein x and protein y. (d) Using the abstraction in (c), if the PPIs are arranged as shown on the left, we can infer the existence of a PPI between protein x and protein y, as shown on the right.
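Panels (a) and (b) contrast two kinds of evidence for a missing edge. The minimal sketch below, with hypothetical helper names, counts each kind on two toy graphs: CN counts the partners x and y share (paths of length 2), while the L3 principle counts paths of length 3. It uses raw counts and deliberately omits the degree normalization that practical L3 predictors apply.

```python
def adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def common_neighbors(adj, x, y):
    """CN evidence: number of partners shared by x and y (paths of length 2)."""
    return len(adj[x] & adj[y])


def l3_path_count(adj, x, y):
    """Raw L3 evidence: number of paths x-u-v-y (assumes x, y not adjacent)."""
    return sum(1 for u in adj[x] for v in adj[u] if v in adj[y])


# x and y bind the same partners: strong CN signal, no L3 signal.
shared = adjacency([("x", "a"), ("x", "b"), ("y", "a"), ("y", "b")])
print(common_neighbors(shared, "x", "y"), l3_path_count(shared, "x", "y"))  # 2 0

# x and y sit at the ends of a chain x-a-b-y: no CN signal, one L3 path.
chain = adjacency([("x", "a"), ("a", "b"), ("b", "y")])
print(common_neighbors(chain, "x", "y"), l3_path_count(chain, "x", "y"))  # 0 1
```

The two toy graphs are chosen so that each measure fires where the other is silent, which is the kind of divergence the abstract describes between L3-based and general-purpose predictors.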
Fig. 2
By modeling an ideal L3 graph, we illustrate the conditions that would lead the L3 principle to assign the maximum possible score to $P_{xy}$ (relative to the $P_{xy}$ obtained in non-ideal L3 graphs). a An example of an ideal L3 graph with the four L3-elements x, U, V, and y. b Three measures of how well a (possibly non-ideal) L3 graph fits the L3 principle, based on its L3-elements.
Fig. 3
The idea behind Formula (6). a An L3 graph with L3-elements x, U, V, and y. b1–b3 Each of the six parts of Formula (6) corresponds to one of the six conditions used to measure how close the graph is to being an ideal L3 graph. Here, the Simple Ratio $f_1$ from Sect. 4.1.1 has been selected as the similarity metric. c Combining all six parts yields Formula (6) for the score $P_{xy}^{\mathrm{L3N}(f_1)}$.
Fig. 4
Changes in scores for different link predictors when an ideal L3 graph is modified by: a removing compatible edges; and b adding incompatible edges. The shaded regions denote the spread (minimum to maximum values) among repeated simulations, and the solid lines denote the medians. The AUC bar charts correspond to the respective plots. In b, a Savitzky-Golay filter with a polynomial of degree 3 and a window size of 21 was applied to smooth the curves.
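In practice the smoothing in panel b would typically be a call such as `scipy.signal.savgol_filter(y, window_length=21, polyorder=3)`. The NumPy-only sketch below spells out what that filter computes: a cubic fitted by least squares over each 21-sample window, keeping the fitted value at the sample of interest. Near the ends the window is shifted inward, which is intended to mirror SciPy's default `mode="interp"`; `savgol_smooth` is a hypothetical name, not the authors' code.

```python
import numpy as np


def savgol_smooth(y, window=21, polyorder=3):
    """Savitzky-Golay smoothing via a local least-squares polynomial fit per sample.

    Near the ends the window is shifted inward so it always holds `window`
    samples (intended to mirror SciPy's default mode="interp").
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if window % 2 == 0 or window > n or polyorder >= window:
        raise ValueError("window must be odd, <= len(y), and > polyorder")
    half = window // 2
    out = np.empty(n)
    for i in range(n):
        # Center the window on i, clamped to stay inside the array.
        start = min(max(i - half, 0), n - window)
        idx = np.arange(start, start + window)
        # Fit in offsets relative to i so the constant term is the value at i.
        coeffs = np.polyfit(idx - i, y[idx], polyorder)
        out[i] = coeffs[-1]
    return out


x = np.linspace(0, 1, 60)
noisy = np.sin(2 * np.pi * x) + np.random.default_rng(0).normal(0, 0.1, x.size)
smooth = savgol_smooth(noisy)  # window 21, cubic, as in the caption
```

Because each fit is a cubic, the filter leaves any cubic trend unchanged while averaging out fluctuations, which is why it smooths curves without flattening their shape as much as a plain moving average would.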
Fig. 5
Precision-recall (PR) curves of the link predictors computed on the datasets used in the study [18] under the same methodology (50% of the PPIs removed, computations repeated 10 times, shaded regions indicating the standard deviations, PR calculated until recall reaches 10%). The accompanying bar charts show the predictors’ PR AUC values (the larger, the better).
Fig. 6
a Precision-recall (PR) curves and their AUC values (PR AUCs) for the link predictors, computed with 50% of the PPIs removed from the datasets. The solid lines show the medians and the shaded regions indicate the spread (minimum to maximum values). The accompanying bar charts show the predictors’ PR AUC values (the larger, the better). b In these datasets, 5%, 10%, 15%, 20%, or 25% of the PPIs are replaced with negative (non-) PPIs. For each of these datasets, the PR AUCs computed by a link predictor are extracted at each data point and interpolated as a dotted line. The mean relative change in PR AUC with respect to the change in the ratio of negative PPIs is computed and denoted “mean relative ΔPR AUC” (the lower, the worse).
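Panel b perturbs each dataset by swapping a share of true PPIs for negative (non-) PPIs. The caption does not say how the negatives are drawn; the sketch below assumes they are node pairs sampled uniformly at random from the non-edges of the same network, using rejection sampling, which is fast for sparse networks. `replace_with_negatives` is a hypothetical helper, not the article's code.

```python
import random


def replace_with_negatives(edges, fraction, seed=None):
    """Swap `fraction` of the edges for uniformly sampled non-edges.

    Edges are treated as undirected and canonicalized as sorted tuples.
    """
    rng = random.Random(seed)
    edges = sorted({tuple(sorted(e)) for e in edges})
    edge_set = set(edges)
    nodes = sorted({n for e in edges for n in e})
    n_swap = round(fraction * len(edges))
    n_pairs = len(nodes) * (len(nodes) - 1) // 2
    if n_pairs - len(edges) < n_swap:
        raise ValueError("not enough non-edges to sample from")

    removed = set(rng.sample(edges, n_swap))
    negatives = set()
    # Rejection sampling: draw random node pairs until enough non-edges are found.
    while len(negatives) < n_swap:
        pair = tuple(sorted(rng.sample(nodes, 2)))
        if pair not in edge_set:
            negatives.add(pair)
    return [e for e in edges if e not in removed] + sorted(negatives)


# A 20-node ring; swapping 25% of its edges for negatives replaces 5 of them.
ring = [(i, (i + 1) % 20) for i in range(20)]
perturbed = replace_with_negatives(ring, 0.25, seed=0)
```

Calling it once per ratio (0.05 to 0.25) on the same input yields one perturbed dataset per data point in panel b.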
Fig. 7
Precision-recall (PR) curves of the link predictors using the same experimental setup as in Fig. 6a, except that here the datasets are human datasets: a–c primary human datasets and d a human reference interactome. Specifically, 50% of the PPIs in each dataset are removed uniformly at random, ten times, to generate ten sample datasets. The shaded regions illustrate the spread (minimum to maximum values), the solid lines show the medians, and the accompanying bar charts show the PR AUC values (the larger, the better).
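The sampling protocol described here (remove 50% of the PPIs uniformly at random, ten times) can be sketched as follows. Each sample keeps part of the PPIs as the observed network and holds out the rest as the positives a predictor should recover; the `fraction` parameter also covers the 40% to 10% removal rates in Fig. 8. `removal_samples` and its `seed` argument are illustrative, and the article's own sampling code may differ.

```python
import random


def removal_samples(edges, fraction=0.5, repeats=10, seed=0):
    """Return `repeats` (kept, held_out) splits with `fraction` of edges held out."""
    rng = random.Random(seed)
    edges = list(edges)
    n_removed = round(fraction * len(edges))
    samples = []
    for _ in range(repeats):
        shuffled = edges[:]
        rng.shuffle(shuffled)  # uniform random permutation
        held_out = shuffled[:n_removed]  # PPIs hidden from the predictor
        kept = shuffled[n_removed:]  # network the predictor sees
        samples.append((kept, held_out))
    return samples


splits = removal_samples([(i, i + 1) for i in range(100)])
```

Shuffling once per repetition and cutting the list gives every subset of the required size the same probability, which matches "uniformly at random".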
Fig. 8
How the PR AUC of the link predictors changes as the percentage of PPIs removed from the datasets decreases. The dotted curves are interpolations of the data points (50%, 40%, 30%, 20%, and 10%). The bar charts show the AUCs of the PR AUCs, i.e., the total area under each dotted curve (the larger, the better).
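The "AUC of the PR AUCs" is an area under a curve through five points. The caption does not state the interpolation scheme; the sketch below assumes straight-line (piecewise-linear) interpolation, for which the area is exactly the trapezoidal sum. `area_under_points` is a hypothetical name, and the PR AUC values in the usage lines are placeholders, not results from the article.

```python
def area_under_points(xs, ys):
    """Trapezoidal area under a piecewise-linear curve through (x, y) points."""
    pts = sorted(zip(xs, ys))  # order by x so the removal percentages can come in any order
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )


removed_pct = [50, 40, 30, 20, 10]
placeholder_pr_aucs = [0.02, 0.03, 0.05, 0.06, 0.08]  # illustrative values only
print(area_under_points(removed_pct, placeholder_pr_aucs))
```

If the dotted curves were produced with a smoother interpolant (for example a spline), the resulting area would differ slightly from this trapezoidal value.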
Fig. 9
a The mean STRING confidence score across all sample sizes, and b the moving means of the STRING confidence scores for sample size 50%. The shaded regions in b illustrate the spread (minimum to maximum values) of the STRING confidence scores.
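Panel b plots moving means of STRING confidence scores along the ranked predictions. The window length used in the article is not given on this page, so the sketch below assumes a simple window of `window` consecutive predictions over a list `scores` already ordered by prediction rank; how pairs absent from STRING are scored is left to the caller. `moving_mean` is a hypothetical name.

```python
def moving_mean(values, window):
    """Mean of each run of `window` consecutive values, via a running sum."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the new value, drop the oldest.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


print(moving_mean([1, 2, 3, 4, 5], 2))  # [1.5, 2.5, 3.5, 4.5]
```

The running sum keeps the cost linear in the number of predictions, which matters when the ranked list is long.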

References

    1. Costa-Silva J, Domingues D, Lopes FM. RNA-Seq differential expression analysis: an extended review and a software tool. PLoS ONE. 2017;12(12):1–18. doi: 10.1371/journal.pone.0190152.
    2. Stuart T, Satija R. Integrative single-cell analysis. Nat Rev Genet. 2019;20(5):257–272. doi: 10.1038/s41576-019-0093-7.
    3. Eisenberg D, Marcotte EM, Xenarios I, Yeates TO. Protein function in the post-genomic era. Nature. 2000;405(6788):823–826. doi: 10.1038/35015694.
    4. Sanchez C, Lachaize C, Janody F, Bellon B, Röder L, Euzenat J, Rechenmann F, Jacq B. Grasping at molecular interactions and genetic networks in Drosophila melanogaster using FlyNets, an Internet database. Nucleic Acids Res. 1999;27(1):89–94. doi: 10.1093/nar/27.1.89.
    5. Cusick ME, Klitgord N, Vidal M, Hill DE. Interactome: gateway into systems biology. Hum Mol Genet. 2005;14(suppl_2):171–181. doi: 10.1093/hmg/ddi335.
