Commun Biol. 2024 Oct 26;7(1):1400.
doi: 10.1038/s42003-024-07066-9.

Graph masked self-distillation learning for prediction of mutation impact on protein-protein interactions

Yuan Zhang et al. Commun Biol.

Abstract

Assessing mutation impact on the binding affinity change (ΔΔG) of protein-protein interactions (PPIs) plays a crucial role in unraveling structural-functional intricacies of proteins and developing innovative protein designs. In this study, we present a deep learning framework, PIANO, for improved prediction of ΔΔG in PPIs. The PIANO framework leverages a graph masked self-distillation scheme for protein structural geometric representation pre-training, which effectively captures the structural context representations surrounding mutation sites, and makes predictions using a multi-branch network consisting of multiple encoders for amino acids, atoms, and protein sequences. Extensive experiments demonstrated its superior prediction performance and the capability of the pre-trained encoder to capture meaningful representations. Compared to previous methods, PIANO can be widely applied to both holo complex structures and apo monomer structures. Moreover, we illustrated the practical applicability of PIANO in highlighting pathogenic mutations and crucial proteins, and in distinguishing de novo mutations in disease cases and controls in PPI systems. Overall, PIANO offers a powerful deep learning tool, which may provide valuable insights into the study of drug design, therapeutic intervention, and protein engineering.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. The framework of PIANO.
a The graph masked self-distillation learning module for pre-training, which captures the intricate structural patterns in the structural context of an amino acid. Both encoders were configured with the graph transformer and both decoders with the GCN with ARMA filters. The mask and re-mask strategies were applied before the encoders and decoders, respectively. This module was trained by minimizing a loss function that sums the feature mask reconstruction loss and the graph similarity comparison loss (distillation loss). b The multi-branch network module for ΔΔG prediction, in which the student encoder pre-trained in (a), the GCN with ARMA filters, and the CNN were employed for residue, atom, and protein sequence embeddings, respectively. c Illustration of a graph structure. d Implementation details of the graph transformer. Q, K, and V are the query, key, and value matrices, respectively. e Implementation details of the GCN with ARMA filters. L̂ is the modified Laplacian matrix; W and H are learnable weight matrices. f The self-attention pooling on atom-level embeddings.
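
To make the roles of Q, K, and V in panel d concrete, here is a minimal sketch of scaled dot-product attention restricted to graph edges. It is not PIANO's implementation: the caption does not specify the graph transformer variant, so the single head, the added self-loops, the function name `graph_attention`, and the toy inputs are all assumptions made for illustration.

```python
import math

def graph_attention(Q, K, V, neighbors):
    """Scaled dot-product attention in which node i attends only to
    itself and its graph neighbors (single head; an illustrative
    assumption, not the paper's exact layer)."""
    d_k = len(K[0])
    out = []
    for i, q in enumerate(Q):
        # Attend over the node itself plus its neighbors (self-loop assumed).
        idx = sorted(set(neighbors[i]) | {i})
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d_k)
                  for j in idx]
        # Numerically stable softmax over the allowed nodes only.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the value vectors of the attended nodes.
        out.append([sum(w * V[j][c] for w, j in zip(weights, idx))
                    for c in range(len(V[0]))])
    return out

# Hypothetical toy graph: nodes 0 and 1 are connected, node 2 is isolated.
neighbors = {0: [1], 1: [0], 2: []}
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
updated = graph_attention(Q, K, V, neighbors)
```

In this sketch an isolated node simply keeps its own value vector, because the softmax runs over a single entry.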
Fig. 2. Performance evaluation.
The performance comparison of PIANO with other advanced methods is shown in PCC (a) and RMSE (b). c–f The distributions of experimental versus predicted ΔΔGs derived from PIANO (c), DGCddG (d), MpbPPI (e), and GeoPPI (f). The impact factor evaluation through ablation experiments is shown in PCC (g) and RMSE (h).
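
The two metrics named in this caption, the Pearson correlation coefficient (PCC) and the root-mean-square error (RMSE), can be computed as in the sketch below. The function names and the ΔΔG values are hypothetical and are not taken from the paper.

```python
import math

def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical experimental and predicted ΔΔG values (kcal/mol).
experimental = [0.5, 1.2, -0.3, 2.1, 0.0]
predicted = [0.7, 1.0, 0.1, 1.8, -0.2]
print(f"PCC = {pearson_cc(experimental, predicted):.3f}")
print(f"RMSE = {rmse(experimental, predicted):.3f}")
```

A higher PCC and a lower RMSE both indicate closer agreement between predicted and experimental ΔΔGs, which is why the two are usually reported together.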
Fig. 3. Comparison of distributions of experimental and predicted ΔΔGs associated with different amino acid types.
a The average ΔΔGs. b The variance of ΔΔGs.
Fig. 4. Visualizations of feature representations.
The t-SNE visualizations of feature representations for different amino acid types, obtained from the graph masked self-distillation pre-training module before training and after training, are shown in (a, b), respectively. The corresponding visualizations for each property group that the amino acids belong to are shown in (c, d), respectively.
Fig. 5. The PIANO-predicted ΔΔGs are significantly correlated with both mutation pathogenicity and gene essentiality.
a Distributions of PIANO-predicted ΔΔGs for pathogenic (n = 509), benign (n = 287) and VUS sites (n = 3,806). The center lines in the boxplots indicate the median, and the bounds of the boxes indicate the 25th and 75th percentiles. b The odds ratios (ORs) of pathogenic variants versus benign variants for different PIANO-predicted ΔΔG percentile bins. Amino acids with higher PIANO-predicted ΔΔGs are enriched for pathogenic variants, while those with lower ΔΔGs are depleted of pathogenic variants. Error bars indicate 95% confidence intervals of the ORs. The horizontal dashed line represents OR = 1. c The percentage of PIANO-predicted ΔΔGs for pathogenic and benign variants in different percentile bins. d The crystal structure of Protocadherin-19 (PDB: 6VFU), for which the PIANO-predicted ΔΔGs of all known mutations fall in the highest percentile bin. The mutations are shown as circles. e PIANO-predicted ΔΔG distributions for mutations in four protein groups encoded by genes with different functional annotations. The center lines in the boxplots indicate the median, and the bounds of the boxes indicate the 25th and 75th percentiles.
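
As an illustration of the enrichment analysis in panel b, the sketch below computes an odds ratio and a 95% confidence interval from a 2×2 table. The caption does not say how the intervals were obtained, so the Wald (log-OR) interval used here is an assumption, and the function name and counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: pathogenic variants inside the percentile bin
    b: benign variants inside the bin
    c: pathogenic variants outside the bin
    d: benign variants outside the bin
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts for one ΔΔG percentile bin (not from the paper).
or_value, ci_low, ci_high = odds_ratio_ci(a=40, b=10, c=60, d=90)
print(f"OR = {or_value:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

An interval that lies entirely above 1 suggests enrichment of pathogenic variants in the bin, and one entirely below 1 suggests depletion.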
Fig. 6. PIANO-predicted ΔΔGs help interpret de novo missense mutations from neurodevelopmental disorders.
a The distributions of PIANO-predicted ΔΔGs for de novo missense mutations from neurodevelopmental disorder probands (case, n = 643) and from unaffected siblings of autism spectrum disorder probands (control, n = 129). The center lines in boxplot graphs indicate median, the bounds of boxes indicate 25th and 75th percentiles. b Case variant enrichment for different methods at the 80th percentile. Error bars indicate 95% confidence intervals of ORs.
