Incorporating physics to overcome data scarcity in predictive modeling of protein function: a case study of BK channels

doi:10.1101/2023.06.24.546384

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Jun 26:2023.06.24.546384.

Affiliations

¹ Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, USA.
² Department of Biomedical Engineering, Center for the Investigation of Membrane Excitability Disorders, Cardiac Bioelectricity and Arrhythmia Center, Washington University in St. Louis, St. Louis, Missouri, USA.
³ Department of Biology, Boston College, Chestnut Hill, Massachusetts, USA.

PMID: 37425916
PMCID: PMC10327070
DOI: 10.1101/2023.06.24.546384

Erik Nordquist et al. bioRxiv. 2023.

Update in

Incorporating physics to overcome data scarcity in predictive modeling of protein function: A case study of BK channels.
Nordquist E, Zhang G, Barethiya S, Ji N, White KM, Han L, Jia Z, Shi J, Cui J, Chen J. PLoS Comput Biol. 2023 Sep 15;19(9):e1011460. doi: 10.1371/journal.pcbi.1011460. eCollection 2023 Sep. PMID: 37713443. Free PMC article.

Abstract

Machine learning has played transformative roles in numerous chemical and biophysical problems, such as protein folding, where large amounts of data exist. Nonetheless, many important problems remain challenging for data-driven machine learning approaches because data are scarce. One approach to overcoming data scarcity is to incorporate physical principles, for example through molecular modeling and simulation. Here, we focus on the big potassium (BK) channels, which play important roles in cardiovascular and neural systems. Many mutants of BK channels are associated with various neurological and cardiovascular diseases, but their molecular effects are unknown. The voltage gating properties of BK channels have been characterized experimentally for 473 site-specific mutations over the last three decades; yet these functional data by themselves remain far too sparse to derive a predictive model of BK channel voltage gating. Using physics-based modeling, we quantify the energetic effects of all single mutations on both the open and closed states of the channel. Together with dynamic properties derived from atomistic simulations, these physical descriptors allow the training of random forest models that can reproduce unseen experimentally measured shifts in gating voltage, ΔV_1/2, with an RMSE of ∼32 mV and a correlation coefficient of R ∼ 0.7. Importantly, the model appears capable of uncovering nontrivial physical principles underlying the gating of the channel, including a central role of hydrophobic gating. The model was further evaluated using four novel mutations of L235 and V236 on the S5 helix; mutations of these two residues are predicted to have opposing effects on V_1/2, suggesting a key role of S5 in mediating voltage sensor-pore coupling. The measured ΔV_1/2 values agree quantitatively with the predictions for all four mutations, with a high correlation of R = 0.92 and RMSE = 18 mV. Therefore, the model can capture nontrivial voltage gating properties in regions where few mutations are known. The success of predictive modeling of BK voltage gating demonstrates the potential of combining physics and statistical learning for overcoming data scarcity in nontrivial protein function prediction.

Author summary: Deep machine learning has brought many exciting breakthroughs in chemistry, physics and biology. These models require large amounts of training data and struggle when data are scarce. The latter is true for predictive modeling of the function of complex proteins such as ion channels, where only hundreds of mutational data points may be available. Using the big potassium (BK) channel as a biologically important model system, we demonstrate that a reliable predictive model of its voltage gating property can be derived from only 473 mutational data points by incorporating physics-derived features, which include dynamic properties from molecular dynamics simulations and energetic quantities from Rosetta mutation calculations. We show that the final random forest model captures key trends and hotspots in mutational effects on BK voltage gating, such as the important role of pore hydrophobicity. A particularly curious prediction is that mutations of two adjacent residues on the S5 helix would always have opposite effects on the gating voltage, which was confirmed by experimental characterization of four novel mutations. The current work demonstrates the importance and effectiveness of incorporating physics in predictive modeling of protein function with scarce data.
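The two accuracy figures quoted in the abstract, RMSE and the correlation coefficient R, can be computed from paired predicted and measured ΔV_1/2 values as in the minimal sketch below. It assumes that R denotes the Pearson correlation coefficient, which the text does not state explicitly; the function names and the numbers in the example are illustrative and are not data from the study.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between paired values (same units, e.g. mV)."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between paired values."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need at least two paired values")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)


# Made-up shifts in mV, for illustration only (NOT values from the paper).
measured_dv = [-40.0, -10.0, 5.0, 30.0, 60.0]
predicted_dv = [-25.0, -20.0, 15.0, 20.0, 45.0]
print(f"RMSE = {rmse(predicted_dv, measured_dv):.1f} mV")
print(f"R    = {pearson_r(predicted_dv, measured_dv):.2f}")
```

Reporting both numbers is useful because they capture different things: R measures how well the predictions track the trend, while RMSE measures the typical absolute error in mV.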

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

**Figure 1. Overview of the BK channel structure, voltage gating, and mutations.**
A) Cryo-EM structure of the Ca²⁺-bound human BK channel (PDB: 6V38 [58]) embedded in a lipid bilayer. The protein is drawn in cartoon style, with the PGD colored in green, VSD in red, RCK1 in purple and RCK2 in blue. The Ca²⁺-binding sites are shown in yellow with bound Ca²⁺ ions shown as orange spheres. Bound K⁺ ions in the selectivity filter are shown as gold spheres. The lipid aliphatic chains are drawn as gray bonds, with the polar head groups as dark gray spheres. This snapshot was taken from an MD-equilibrated simulation. B) Normalized ionic conductance-voltage (G-V) curves measured for the WT and V236W mutant BK channels. Dashed lines plot the Boltzmann fits for each curve (see Methods). The black arrows mark the WT V_1/2 as well as the shift (ΔV_1/2) for V236W with respect to the WT. C) All residues with a mutation in the dataset (see Methods) drawn as different-colored van der Waals spheres. The rest of the BK channel is drawn as a transparent black cartoon.
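The Boltzmann fit in panel B can be illustrated with a minimal sketch. The exact functional form used in the Methods is not reproduced on this page, so the sketch assumes the common two-parameter form G/G_max = 1 / (1 + exp(−(V − V_1/2)/k)) with a slope factor k, and it swaps the usual nonlinear least-squares fit for a simple linear regression on the logit of G/G_max. The G-V points are synthetic, and the function names, parameter values, and the sign convention (mutant minus WT) are assumptions.

```python
import math


def boltzmann(v, v_half, slope):
    """Assumed Boltzmann form: normalized conductance G/G_max at voltage v (mV)."""
    return 1.0 / (1.0 + math.exp(-(v - v_half) / slope))


def fit_boltzmann(voltages, g_norm, eps=1e-9):
    """Estimate (v_half, slope) by linear regression on logit(G/G_max).

    logit(g) = (V - v_half) / slope is a straight line in V, so ordinary
    least squares gives the slope factor and half-activation voltage.
    This is a stand-in for a proper nonlinear fit and weights noisy
    points near 0 or 1 heavily.
    """
    xs, ys = [], []
    for v, g in zip(voltages, g_norm):
        g = min(max(g, eps), 1.0 - eps)  # keep the logit finite
        xs.append(float(v))
        ys.append(math.log(g / (1.0 - g)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                # a = 1 / slope
    b = mean_y - a * mean_x      # b = -v_half / slope
    return -b / a, 1.0 / a


# Synthetic, noise-free G-V curves (hypothetical parameters, not measured data).
voltages = list(range(60, 301, 20))
wt_curve = [boltzmann(v, 180.0, 20.0) for v in voltages]
mutant_curve = [boltzmann(v, 130.0, 20.0) for v in voltages]

wt_v_half, _ = fit_boltzmann(voltages, wt_curve)
mut_v_half, _ = fit_boltzmann(voltages, mutant_curve)
delta_v_half = mut_v_half - wt_v_half  # assumed convention: mutant minus WT
print(f"V_1/2 (WT) = {wt_v_half:.1f} mV, V_1/2 (mutant) = {mut_v_half:.1f} mV")
print(f"Delta V_1/2 = {delta_v_half:.1f} mV")
```

For real, noisy recordings a nonlinear least-squares fit of the Boltzmann function directly to G/G_max would normally be preferred over the logit regression used here.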

**Figure 2. Overview of key physics-based descriptors.**
A) Total Rosetta ΔΔΔG scores as a function of residue number (ResID). B) Rosetta dispersion (fa_atr) ΔΔΔG. C) Rosetta solvation (fa_sol) ΔΔΔG. D) C_α-C_α covariance matrix within the monomer in the closed state, averaged across 4 monomers, derived from atomistic MD simulation in explicit solvent and membrane. E) Row of covariance matrix in **(D)** corresponding to the pore-lining residue A316.
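A covariance descriptor of the kind shown in panels D and E can be sketched as follows. The sketch assumes the common definition C_ij = ⟨Δr_i · Δr_j⟩, where Δr_i is the displacement of the i-th C_α atom from its mean position over the trajectory, and it assumes the frames have already been aligned to a reference. Whether the study normalizes the matrix is not stated here, so no normalization is applied; all names and coordinates are illustrative.

```python
def ca_covariance(frames):
    """Covariance matrix C[i][j] = <dr_i . dr_j> over trajectory frames.

    frames: list of frames; each frame is a list of (x, y, z) tuples,
    one per C-alpha atom, assumed to be pre-aligned.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    # Mean position of each atom over the trajectory.
    mean = [
        [sum(frame[i][d] for frame in frames) / n_frames for d in range(3)]
        for i in range(n_atoms)
    ]
    cov = [[0.0] * n_atoms for _ in range(n_atoms)]
    for frame in frames:
        disp = [
            [frame[i][d] - mean[i][d] for d in range(3)] for i in range(n_atoms)
        ]
        for i in range(n_atoms):
            for j in range(n_atoms):
                cov[i][j] += sum(disp[i][d] * disp[j][d] for d in range(3))
    return [[value / n_frames for value in row] for row in cov]


def average_matrices(matrices):
    """Element-wise average, e.g. over the matrices of several monomers."""
    n = len(matrices)
    size = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / n for j in range(size)]
        for i in range(size)
    ]


# Toy two-atom "trajectories" for two hypothetical monomers.
monomer_a = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]]
monomer_b = [[(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
averaged = average_matrices([ca_covariance(monomer_a), ca_covariance(monomer_b)])
row_for_atom_0 = averaged[0]  # analogous to taking one residue's row, as in panel E
print(row_for_atom_0)
```

Taking a single row of the averaged matrix gives a per-residue profile of how strongly every other residue moves together with the chosen one, which is the kind of slice shown for A316 in panel E.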

**Figure 3. Results of training and validation in 5 random train/test data splits.**
Correlations of predicted and true ΔV_1/2 for 5-fold cross-validation on 80% of the dataset (blue) and independent test validation on the remaining 20% (orange). The dashed lines indicate trends for training and test, and the solid line marks x = y. The blue points show the performance on the training dataset, with overall R = 0.97 – 0.98 and RMSE = 16 – 17 mV. The orange points show the independent test set, with R = 0.54 – 0.80 and RMSE = 30 – 35 mV.
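The evaluation protocol described in this caption, repeated random 80/20 splits with 5-fold cross-validation inside each training portion, can be sketched with index bookkeeping alone. The sketch assumes plain random shuffling; whether the study stratifies the splits or groups mutations by residue is not stated here. The function names and seeds are hypothetical, and model training is left out.

```python
import random


def train_test_split_indices(n_samples, test_fraction, seed):
    """Shuffle sample indices and hold out a fraction as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


n_mutations = 473  # size of the mutation dataset quoted in the abstract
for seed in range(5):  # five independent random splits
    train_idx, test_idx = train_test_split_indices(n_mutations, 0.2, seed)
    for cv_train, cv_val in k_fold_indices(train_idx, 5):
        # A model would be fitted on cv_train and scored on cv_val here.
        assert not set(cv_train) & set(cv_val)
    # The held-out test_idx is only used once, after model selection.
    print(f"split {seed}: {len(train_idx)} training, {len(test_idx)} test mutations")
```

Keeping the 20% test set out of the cross-validation loop is what makes the test-set R and RMSE an estimate of performance on unseen mutations.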

**Figure 4. Maximum experimental (Expt) and predicted (RF) ΔV_1/2 for mutations at each position.**
For each residue position, the maximum shift was selected from available experimental mutants or predicted values of all possible mutations. Only two opposing monomers are shown for clarity.
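The per-position selection described in this caption can be sketched as a simple reduction. The sketch assumes that "maximum shift" means the shift of largest magnitude, with its sign kept; the caption does not say whether magnitude or signed value is used. The positions and values below are made up.

```python
def max_shift_per_position(shifts):
    """Return, for each residue position, the shift of largest magnitude.

    shifts: dict mapping (position, mutant_residue) -> shift in mV.
    """
    best = {}
    for (position, _mutant_residue), shift in shifts.items():
        if position not in best or abs(shift) > abs(best[position]):
            best[position] = shift
    return best


# Hypothetical positions and shifts in mV (NOT values from the paper).
example_shifts = {
    (100, "A"): -30.0,
    (100, "W"): 12.0,
    (101, "A"): 20.0,
    (101, "W"): 45.0,
}
print(max_shift_per_position(example_shifts))
```

The same function applies to either source of values: the sparse set of experimentally measured mutants or the full set of predictions for all possible substitutions at each position.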

**Figure 5. Experimental (Expt) and predicted (RF) ΔV_1/2 of N-, K-, and V-scanning mapped onto the TM structure of BK channels.**
The VSD and the PGD components (S5, S6, and the selectivity filter) are labeled. The two domains are shown facing one another as they would in the structure (related by a 90° rotation); they are not mirror images of each other.

**Figure 6. S5 helix residues L235 and V236 and neighboring residues.**
A) Zoomed-in view of the PGD of two monomers, with L235 and V236 labeled and colored in red and blue bonds, respectively. The PGD helices S6 and S5, as well as the contacting VSD helix S4, are labeled. B) Predicted ΔV_1/2 of all mutations of L235 (red) and V236 (blue), arranged by increasing magnitude of predictions for L235X. Note that WT “mutations”, L235L and V236V, reflect the inherent uncertainty of the RF model prediction.

**Figure 7. Correlation of experimental and predicted ΔV_1/2 for four novel L235 and V236 mutations.**
A) Current traces for the WT and four mutant channels. B) Normalized conductance (G/G_max) versus voltage (V) curves for the WT and four mutants. Dashed lines denote the Boltzmann fits for each curve (see Methods). C) Correlation between measured and predicted ΔV_1/2. Error bars report the predicted RF error and the propagated error from the experimental fitting, respectively. The dashed red line represents the best linear fit with R = 0.92, and the gray line plots y = x.

References

1. Hie BL, Yang KK. Adaptive machine learning for protein engineering. Curr Opin Struct Biol. 2022 Feb 1;72:145–52.
2. Butler KT, Davies DW, Cartwright H, Isayev O, Walsh A. Machine learning for molecular and materials science. Nature. 2018 Jul;559(7715):547–55.
3. Wang Y, Lamim Ribeiro JM, Tiwary P. Machine learning approaches for analyzing and enhancing molecular dynamics simulations. Curr Opin Struct Biol. 2020 Apr;61:139–45.
4. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Aug;596(7873):583–9.
5. Tunyasuvunakool K, Adler J, Wu Z, Green T, Zielinski M, Žídek A, et al. Highly accurate protein structure prediction for the human proteome. Nature. 2021 Aug;596(7873):590–6.

Grants and funding

R01 HL142301/HL/NHLBI NIH HHS/United States
