Rescaling protein-protein interactions improves Martini 3 for flexible proteins in solution

Thomasen, F. Emil; Skaalum, Tórur; Kumar, Ashutosh; Srinivasan, Sriraksha; Vanni, Stefano; Lindorff-Larsen, Kresten

doi:10.1038/s41467-024-50647-9

Article
Open access
Published: 05 August 2024

Nature Communications volume 15, Article number: 6645 (2024)

Abstract

Multidomain proteins with flexible linkers and disordered regions play important roles in many cellular processes, but characterizing their conformational ensembles is difficult. We have previously shown that the coarse-grained model Martini 3 produces overly compact ensembles in solution, which may in part be remedied by strengthening protein–water interactions. Here, we show that decreasing the strength of protein–protein interactions leads to improved agreement with experimental data on a wide set of systems. We show that the ‘symmetry’ between rescaling protein–water and protein–protein interactions breaks down when studying interactions with or within membranes: rescaling protein–protein interactions better preserves the binding specificity of proteins with lipid membranes, whereas rescaling protein–water interactions preserves oligomerization of transmembrane helices. We conclude that decreasing the strength of protein–protein interactions improves the accuracy of Martini 3 for intrinsically disordered proteins (IDPs) and multidomain proteins, both in solution and in the presence of a lipid membrane.

Introduction

Intrinsically disordered proteins (IDPs), folded proteins with long disordered tails, and multidomain proteins with folded domains connected by flexible linkers are characterized by a high level of conformational dynamics. Molecular dynamics (MD) simulations provide a valuable tool for studying IDPs and multidomain proteins, as they can be used to determine full conformational ensembles at atomic resolution¹. However, two central challenges must be overcome for MD simulations to provide a useful description of such systems: the force field describing all the bonded and non-bonded interactions between atoms in the system must be sufficiently accurate, and the conformational space of the protein must be sufficiently sampled².

One way to address the challenge of sufficient sampling is to use coarse-grained (CG) MD simulations in which groups of atoms are represented as single beads³. Martini is a widely used CG model in which 2–4 non-hydrogen atoms are represented by a single bead^4,5. An attractive aspect of Martini is its modular structure and high degree of transferability, which allows the simulation of complex systems containing several different classes of biomolecules. The current version of Martini, Martini 3, shows improvements over previous versions in areas such as molecular packing, transmembrane helix interactions, protein aggregation, and DNA base pairing⁶.

We have previously shown that Martini 3 simulations of IDPs produce overly compact conformational ensembles, resulting in poor agreement with small-angle X-ray scattering (SAXS) and paramagnetic relaxation enhancement (PRE) experiments⁷. Using an approach inspired by previous work on assessing and rebalancing non-bonded interactions in Martini^{8,9,10,11,12,13,14,15,16,17} and atomistic force fields¹⁸, we found that agreement with SAXS and PRE data could be significantly improved by uniformly increasing the strength of non-bonded Lennard-Jones interactions between protein and water beads by ~10%⁷. This was also shown to be the case for three multidomain proteins, hnRNPA1, hisSUMO-hnRNPA1, and TIA1; however, due to the small sample size and the similarity between these three proteins, it remains an open question whether the approach generalizes to other multidomain proteins.

Our previous work was concerned with the properties of proteins in aqueous solution in the absence of other classes of biomolecules. Intuitively, increasing the strength of protein-water interactions should affect the affinity between proteins and other biomolecules. As a prototypical example, one would expect that increasing protein-water interactions would decrease the affinity of proteins for lipid membranes, since the interaction is tuned by the relative affinity of proteins for water versus the membrane environment. The extent to which our previously described force field modification affects protein-membrane interactions, however, remains unclear. There is increasing evidence that IDPs and intrinsically disordered regions (IDRs) play important physiological roles at lipid membranes^{19,20,21,22,23}, and so it is important to better understand how force field changes that improve the description of IDPs in solution affect their interactions with membranes. In this context, it is important to note that unmodified Martini 3 has been quite successful at reproducing the specific membrane interactions for peripheral membrane proteins, as we have previously shown^24,25.

For previous versions of Martini, problems with overestimated protein-protein interactions have been corrected either by increasing the strength of interactions between protein and water beads^10,11,13,17 or by decreasing the strength of interactions between protein beads^8,9,14. We hypothesize that, for proteins in solution, the two force field corrections have similar effects, simply rebalancing the relative energies associated with hydration versus self-interaction. However, in mixed systems, for example with proteins, water, and membranes, we might expect clearer differences between these approaches. For example, decreasing the strength of protein-protein interactions may better retain the affinity between proteins and other molecules as originally parameterized, while increased protein-water interactions may lower this affinity (Fig. 1). It remains an open question whether specificity in protein-membrane interactions is retained when protein-water interactions are increased, and whether rescaling protein-protein interactions provides equivalent or improved agreement with experimental observations, both in comparison with unmodified Martini 3 and Martini 3 with rescaled protein-water interactions. We note that Martini 3 has already been shown to provide good agreement with free energies of dimerization for transmembrane proteins⁶, so the major focus of this work is to rebalance the interactions of proteins in solution to improve the agreement with experiments.

**Fig. 1: Expected effects of proposed force field modifications.**

Here, we expand upon our previous work to address these questions. First, we have expanded the set of multidomain proteins to include 15 proteins for which SAXS data have previously been collected (Fig. 2). Using this five-times larger set of proteins, we show that, as was the case for IDPs, increasing the strength of protein-water interactions by 10% improves the agreement with SAXS data. We further show that decreasing the strength of non-bonded interactions between protein beads by 12% leads to a comparable improvement in agreement with SAXS and PRE data for IDPs and multidomain proteins in solution, while better preserving the specificity of protein-membrane interactions for peripheral membrane proteins. In contrast, we find that rescaling protein-protein interactions decreases the propensity of transmembrane helices to dimerize, whereas this propensity is mostly unchanged when rescaling protein-water interactions.

**Fig. 2: Starting structures for simulations of multidomain proteins.**

Results

Analysis of an expanded set of multidomain proteins

Previously, we tested Martini 3 using a set of three multidomain proteins, TIA1, hnRNPA1, and hisSUMO-hnRNPA1, for which SAXS data have been measured^17,26. Given the similarity of the three proteins (all three are RNA-binding proteins, and two of them differ only by a hisSUMO-tag), we wished to expand the set of proteins with mixed regions of order and disorder to include a wider range of sizes and domain architectures. We searched the literature for such proteins with reported SAXS data and identified 12 proteins that we added to our set (Fig. 2): the tri-helix bundle of the m-domain and the C2 domain of myosin-binding protein C (MyBP-C_MTHB-C2)²⁷; the C5, C6, and C7 domains of myosin-binding protein C (MyBP-C_C5-C6-C7)²⁸; linear di-, tri-, and tetraubiquitin (Ubq₂, Ubq₃, Ubq₄)²⁹; the two fluorescent proteins mTurquoise2 and mNeonGreen connected by a linker region with the insertion of 0, 8, 16, 24, 32, or 48 GS repeats (mTurq-GS_X-mNeon)³⁰; and Galectin-3 (Gal-3)³¹. Apart from Gal-3, these proteins all contain at least two distinct folded domains, connected by linkers of different lengths and composition; three proteins (Gal-3, hnRNPA1, and hisSUMO-hnRNPA1) also contain an IDR attached to a folded domain. Collectively, we will refer to this set as multidomain proteins, though we note that Gal-3 only contains a single folded domain.

We have previously shown that Martini 3 produces conformational ensembles that are more compact than found experimentally for a set of 12 IDPs and for the three multidomain proteins TIA1, hnRNPA1, and hisSUMO-hnRNPA1, and that rescaling ϵ in the Lennard-Jones potential between all protein and water beads by a factor λ_PW = 1.10 resulted in more expanded ensembles that substantially improved the agreement with SAXS data⁷. Using our much larger set of multidomain proteins, we examined whether Martini 3 generally produces overly compact conformational ensembles of multidomain proteins, and whether our modified force field with rescaled protein-water interactions generalizes to the expanded set of proteins. We ran Martini 3 simulations of the 12 new multidomain proteins with unmodified Martini 3 and with λ_PW = 1.10 and calculated SAXS intensities from the simulations. We found that, on average across the 15 proteins, increasing the strength of protein-water interactions by λ_PW = 1.10 substantially improved the direct agreement with the experimental SAXS data, as quantified by the reduced χ², $\chi_r^2$ (Fig. 3). For only one of the 15 proteins, MyBP-C_MTHB-C2, did the modified force field reduce the agreement with the SAXS data. This result shows that our previously proposed modification of protein-water interactions in Martini 3, which was optimized to improve the global dimensions of IDPs, also provides a general improvement in the global dimensions of multidomain proteins.
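The reduced χ² compares each calculated SAXS profile with the experimental intensities and their errors. The minimal Python sketch below assumes a common convention in which a scale factor and a constant background are first fitted by weighted least squares, after which the mean squared error-normalized residual is reported (dividing by the number of points N rather than N minus the number of fitted parameters). The function name and this exact normalization are assumptions, not necessarily the procedure used in this work.

```python
import numpy as np


def reduced_chi2(i_exp, sigma, i_calc):
    """Reduced chi-squared between experimental and calculated SAXS curves.

    Assumed convention (a sketch): fit I_exp ≈ a * I_calc + b by weighted
    least squares, then average the squared error-normalized residuals.
    """
    i_exp = np.asarray(i_exp, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    i_calc = np.asarray(i_calc, dtype=float)

    w = 1.0 / sigma
    # Weighted design matrix: one column for the scale (a), one for the offset (b)
    design = np.column_stack([i_calc * w, w])
    (a, b), *_ = np.linalg.lstsq(design, i_exp * w, rcond=None)

    residuals = (i_exp - (a * i_calc + b)) / sigma
    return float(np.mean(residuals ** 2))


# Usage with synthetic data: a hypothetical Guinier-like profile in arbitrary units
q = np.linspace(0.01, 0.3, 100)
i_calc = np.exp(-(q * 30.0) ** 2 / 3.0)     # hypothetical calculated curve
sigma = 0.01 * np.ones_like(q)
i_exp = 2.0 * i_calc + 0.05                 # same shape, only scaled and offset
print(reduced_chi2(i_exp, sigma, i_calc))   # close to zero
```

Because the scale and offset are fitted, only differences in the shape of the curve contribute to $\chi_r^2$, which is what makes it sensitive to the global dimensions of the ensemble.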

**Fig. 3: Agreement between simulations and SAXS data for multidomain proteins.**

Rescaling protein-protein interactions

Inspired by previous work on earlier versions of the Martini force field^8,9,14, we next examined whether rescaling protein-protein interactions instead of protein-water interactions would provide a similar or further improvement in the agreement with the experimental data. To do so, we ran Martini 3 simulations for the set of 12 IDPs with SAXS data available that we had studied previously⁷ and the expanded set of 15 multidomain proteins. In these simulations, we rescaled ϵ in the Lennard-Jones potential between all protein beads by a factor λ_PP. We scanned different values of this parameter and found that λ_PP = 0.88 provides the best agreement with experiments (Supplementary Fig. 1). This level of rescaling of protein-protein interactions (λ_PP = 0.88) improves the agreement with the experimental data to a degree comparable to rescaling protein-water interactions by λ_PW = 1.10, for both multidomain proteins (Fig. 3) and IDPs (Fig. 4).
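Rescaling by λ multiplies the well depth ϵ of the Lennard-Jones potential V(r) = 4ϵ[(σ/r)¹² − (σ/r)⁶] for a chosen subset of bead pairs while leaving σ unchanged. The Python sketch below applies λ_PP to protein-protein pairs and λ_PW to protein-water pairs in a hypothetical pair table. The bead-type names, the well depths, and the dictionary layout are illustrative assumptions, not the Martini 3 parameter files. Applying this to a real topology presumes that protein beads carry their own type labels, since Martini bead types are generic building blocks shared with other molecule classes.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def rescale_epsilon(
    eps: Dict[Pair, float],
    protein_types: Set[str],
    water_types: Set[str],
    lambda_pp: float = 1.0,
    lambda_pw: float = 1.0,
) -> Dict[Pair, float]:
    """Return a copy of the pair table with selected well depths rescaled."""
    scaled = {}
    for (a, b), e in eps.items():
        a_prot, b_prot = a in protein_types, b in protein_types
        a_wat, b_wat = a in water_types, b in water_types
        if a_prot and b_prot:
            scaled[(a, b)] = lambda_pp * e      # protein-protein pair
        elif (a_prot and b_wat) or (a_wat and b_prot):
            scaled[(a, b)] = lambda_pw * e      # protein-water pair
        else:
            scaled[(a, b)] = e                  # all other pairs untouched
    return scaled


def lennard_jones(r: float, sigma: float, eps: float) -> float:
    """12-6 Lennard-Jones energy; its minimum, at r = 2**(1/6) * sigma, equals -eps."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


# Hypothetical bead types and well depths (kJ/mol), for illustration only
table = {("PBB", "PBB"): 4.0, ("PBB", "W"): 3.0, ("W", "W"): 4.65, ("PSC", "LIP"): 2.5}
pp = rescale_epsilon(table, {"PBB", "PSC"}, {"W"}, lambda_pp=0.88)
pw = rescale_epsilon(table, {"PBB", "PSC"}, {"W"}, lambda_pw=1.10)
print(pp[("PBB", "PBB")], pw[("PBB", "W")])
```

Because σ is unchanged, the position of the potential minimum stays fixed and only its depth scales with λ. Note that the protein-lipid pair ("PSC", "LIP") is left untouched by both modifications, which is the starting point for the different membrane behaviour discussed later.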

**Fig. 4: Agreement between simulations and SAXS or PRE data for IDPs.**

To further test the effect of rescaling protein-protein interactions by λ_PP = 0.88 and compare it with the approach of rescaling protein-water interactions, we ran simulations of five IDPs with intramolecular PRE data available: the low-complexity domain (LCD) of hnRNPA2³², the LCD of FUS³³, α-synuclein³⁴, full-length tau (hTau40)³⁵, and osteopontin (OPN)³⁶, and calculated PRE data from the simulations (Fig. 4e). Again, λ_PP = 0.88 provided a level of agreement with the PRE data comparable to that we previously found⁷ using λ_PW = 1.10. Specifically, the agreement with the PRE data improved for all proteins except the hnRNPA2 LCD (Fig. 4e).
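PRE intensity ratios can be estimated from spin-label distances in each simulation frame. The sketch below is a simplified illustration, assuming the Solomon-Bloembergen approximation with ⟨r⁻⁶⟩ averaging over frames, a single rotational correlation time τ_c, and the commonly quoted constant K = 1.23 × 10⁻³² cm⁶ s⁻². The diamagnetic transverse relaxation rate (r2_dia), the delay (t_delay), and the spectrometer frequency are hypothetical placeholders, and the PRE calculations in this work may use a more detailed spin-label model.

```python
import numpy as np

# 1.23e-32 cm^6 s^-2 expressed in nm^6 s^-2 (1 cm^6 = 1e42 nm^6)
K_NM6 = 1.23e10


def pre_intensity_ratio(r_nm, tau_c, r2_dia=10.0, t_delay=0.01, larmor_mhz=600.0):
    """Estimate I_para / I_dia for each residue.

    r_nm: array (n_frames, n_residues) of spin-label to amide-proton distances in nm.
    tau_c: rotational correlation time in seconds.
    r2_dia (s^-1), t_delay (s) and larmor_mhz are hypothetical defaults.
    """
    r = np.asarray(r_nm, dtype=float)
    omega_h = 2.0 * np.pi * larmor_mhz * 1e6                 # proton Larmor frequency, rad/s
    spectral = 4.0 * tau_c + 3.0 * tau_c / (1.0 + (omega_h * tau_c) ** 2)
    gamma2 = K_NM6 * np.mean(r ** -6, axis=0) * spectral     # PRE rate per residue, s^-1
    return r2_dia * np.exp(-gamma2 * t_delay) / (r2_dia + gamma2)


# Usage: three residues at 1.5, 2.5 and 5.0 nm in every frame of a toy trajectory
distances = np.tile([1.5, 2.5, 5.0], (100, 1))
ratios = pre_intensity_ratio(distances, tau_c=4e-9)
print(ratios)
```

Because the ratio depends on τ_c as well as on the distances, one simple option is to scan τ_c on a grid and keep the value that minimizes $\chi_r^2$ against experiment; a fitted τ_c can then absorb part of any force field error.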

To further characterize the symmetry between rescaling protein-water and protein-protein interactions, we compared the ensembles produced with the two rescaling approaches, and with unmodified Martini 3, using the distribution of R_g (Supplementary Fig. 2-3) and a principal component analysis (PCA) based on the pairwise distances between backbone beads (Supplementary Fig. 4-5 and Supplementary Table 1-2). This analysis confirmed that the two rescaling approaches produce ensembles that are more similar to each other than either is to unmodified Martini 3. We conclude that decreasing the strength of protein-protein interactions by λ_PP = 0.88 provides an equally good alternative to rescaling protein-water interactions for IDPs and multidomain proteins in solution.
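The R_g distributions and the PCA on pairwise backbone distances can be sketched with NumPy alone. In the sketch below, equal bead masses are assumed for R_g, the feature vector for each frame is the upper triangle of its backbone-bead distance matrix, and the PCA is fitted on the pooled frames from all ensembles so that every force field is projected onto the same axes. The function names and these choices are assumptions rather than the exact analysis behind the Supplementary Tables.

```python
import numpy as np


def radius_of_gyration(xyz):
    """R_g per frame for coordinates of shape (n_frames, n_beads, 3), equal masses."""
    centered = xyz - xyz.mean(axis=1, keepdims=True)
    return np.sqrt((centered ** 2).sum(axis=2).mean(axis=1))


def pairwise_distance_features(xyz):
    """Upper-triangle bead-bead distances per frame: (n_frames, n*(n-1)/2)."""
    diff = xyz[:, :, None, :] - xyz[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.triu_indices(xyz.shape[1], k=1)
    return dist[:, i, j]


def fit_pca(features, n_components=2):
    """PCA via SVD; returns (mean, components, explained variance ratio)."""
    mean = features.mean(axis=0)
    _, s, vt = np.linalg.svd(features - mean, full_matrices=False)
    var = s ** 2
    return mean, vt[:n_components], var[:n_components] / var.sum()


# Usage with random toy "ensembles" standing in for two force fields
rng = np.random.default_rng(0)
ens_a = rng.normal(size=(200, 20, 3))
ens_b = 1.2 * rng.normal(size=(200, 20, 3))     # a more expanded toy ensemble
feats = [pairwise_distance_features(e) for e in (ens_a, ens_b)]
mean, comps, ratio = fit_pca(np.vstack(feats))
proj_a, proj_b = [(f - mean) @ comps.T for f in feats]
print(radius_of_gyration(ens_a).mean(), radius_of_gyration(ens_b).mean())
```

Fitting the PCA on the pooled data is the design choice that makes the projections comparable: overlapping clouds of projected frames indicate similar ensembles.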

Protein self-association in solution

The observation that multidomain proteins in solution are too compact in Martini 3 simulations suggests that interactions between folded protein domains may be overestimated, at least at the high effective concentration within a single chain. To explore this further, we examined the effect of rescaling protein-protein interactions on the interactions between folded proteins in trans. To this end, we ran MD simulations of two protein systems that should undergo transient homodimerization, ubiquitin and villin HP36, which we also used in our previous work⁷. Ubiquitin self-associates with a K_d of 4.9 ± 0.3 mM based on NMR chemical shift perturbations³⁷, and villin HP36 self-associates with a K_d > 1.5 mM based on NMR diffusion measurements³⁸. We ran MD simulations of two copies of each protein with λ_PP = 0.88 and calculated the fraction of the time that the proteins were bound (Fig. 5a). For both proteins, λ_PP = 0.88 resulted in decreased self-association, and again we found that λ_PP = 0.88 gave results comparable to our previously published simulations⁷ with λ_PW = 1.10. Comparing the simulations with the expected fraction bound based on the experimentally determined K_d values, we found that ubiquitin self-association is likely slightly overestimated with unmodified Martini 3 and slightly underestimated with λ_PW = 1.10 and λ_PP = 0.88. For villin HP36, all three force fields gave rise to a fraction bound within the expected range. While the overestimated compaction of multidomain proteins suggests that interactions between folded domains may be too strong in Martini 3, our results on the self-association of ubiquitin and villin HP36 do not provide a clear indication that this is the case.
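Comparing a simulated fraction bound with a K_d requires the fraction bound expected at the protein concentration of the simulation box. The Python sketch below uses ideal mass action for a homodimer, 2A ⇌ A₂ with K_d = [A]²/[A₂], and solves the resulting quadratic for the dimer concentration. It ignores finite-size effects specific to a box containing only two copies, and the 15 nm box edge in the example is hypothetical, so the expression may differ from the one used in this work.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def box_concentration(n_copies, box_edge_nm):
    """Molar concentration of n_copies molecules in a cubic box."""
    volume_l = (box_edge_nm * 1e-9) ** 3 * 1e3   # m^3 -> L
    return n_copies / (AVOGADRO * volume_l)


def fraction_bound_homodimer(c_total, kd):
    """Fraction of monomers in dimers for 2A <-> A2, K_d = [A]^2 / [A2] (molar units)."""
    # Mass balance c_total = [A] + 2[A2] gives 4x^2 - (4c + K_d) x + c^2 = 0 for x = [A2];
    # the smaller root is the physical one (x <= c/2).
    disc = kd * kd + 8.0 * c_total * kd
    dimer = ((4.0 * c_total + kd) - math.sqrt(disc)) / 8.0
    return 2.0 * dimer / c_total


# Usage: two ubiquitin copies in a hypothetical 15 nm box, K_d = 4.9 mM
c = box_concentration(2, 15.0)
print(c * 1e3, "mM", fraction_bound_homodimer(c, 4.9e-3))
```

For villin HP36, where only a lower bound (K_d > 1.5 mM) is available, the same function evaluated at 1.5 mM gives an upper bound on the expected fraction bound, because the fraction bound decreases monotonically with K_d.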

To further investigate the effect of rescaling protein-protein interactions on protein self-association, we performed simulations of four IDP systems, which we also used in our previous work⁷. Specifically, we ran simulations with λ_PP = 0.88 of two copies of α-synuclein, hTau40, or p15PAF, which should not self-associate under the given conditions based on PRE^34,35 or size-exclusion chromatography-multiangle laser-light scattering (SEC-MALLS) data³⁹, as well as two copies of the FUS LCD, which should transiently interact under the given conditions based on PRE data³³. We then calculated the fraction of time that the proteins were bound in the simulations (Fig. 5a). Again, λ_PP = 0.88 gave results comparable to our previously published simulations with λ_PW = 1.10. The results show that unmodified Martini 3 overestimates the self-association of IDPs, and that both rescaling approaches result in lowered self-association and therefore better agreement with experiments. However, none of the force fields gives rise to a clear distinction between the FUS LCD and the three IDPs that should not self-associate, suggesting that Martini 3 does not properly capture specificity in IDP-IDP interactions.

To investigate further how well specific interactions between copies of the FUS LCD were captured, we calculated intermolecular PRE data from our simulations for direct comparison with the experimental PRE data³³ (Fig. 5b-c). Simulations with λ_PP = 0.88 and λ_PW = 1.10 produce similar PREs when calculated with the spin-labels at residue 16 and residue 142, but we observed some discrepancy between the two force fields for PREs calculated with the spin-label at residue 86. These differences could be due to a true difference between the force fields, but may also be due to lack of convergence of the protein-protein contacts, as the bound state is not highly populated in the simulations. Both λ_PP = 0.88 and λ_PW = 1.10 show a slight improvement over unmodified Martini 3 based on the $\chi_r^2$ to the experimental PRE data, but none of the force fields fully captures the variation in interactions across the sequence. For example, based on the PRE data with the spin-label at residue 16, interactions with the N-terminal region seem to be underestimated with the rescaled force fields, while based on the PRE data with the spin-label at residue 86, interactions with the central region seem to be overestimated with the unmodified force field. The interpretation of the results is complicated by the fact that the rotational correlation time, τ_c, providing the best fit to the experimental data is lower for the unmodified force field (1 ns) than for λ_PW = 1.10 (8 ns) and λ_PP = 0.88 (9 ns), suggesting that the fit of τ_c absorbs some of the true difference between the force fields. Overall, the comparison with intermolecular PRE data for the FUS LCD is consistent with an improvement in the overall strength of IDP-IDP interactions, but a remaining lack of interaction specificity with the rescaled force fields. The results also show that rescaling protein-protein interactions gives as good or better agreement with the intermolecular PRE data when compared with our previous approach of rescaling protein-water interactions.

Rescaling protein-water interactions for backbone beads only

While the overall agreement with SAXS experiments was improved for almost all proteins when rescaling protein-protein or protein-water interactions, some proteins were still too expanded or too compact with respect to the experimental R_g, suggesting that some sequence-specific effects on compaction were not fully captured. We reasoned that sequence-specific effects on the ensemble properties might be better captured if we rescaled only the interactions between the protein backbone and water; this approach could lead to the desired expansion of the proteins while retaining the interactions of the amino acid side chains as originally parameterized. We therefore performed simulations of our set of IDPs and multidomain proteins in which we rescaled ϵ in the Lennard-Jones potential between all protein backbone and water beads by a factor λ_PW-BB, scanning different values of this parameter, and found λ_PW-BB = 1.22 to provide the best agreement with experiments (Supplementary Fig. 1). However, the simulations of the IDPs and multidomain proteins with λ_PW-BB = 1.22 yielded agreement with experiments similar to, rather than better than, that obtained when rescaling all protein-water interactions or protein-protein interactions (Supplementary Fig. 6–7).

We also compared the ensembles produced with λ_PW-BB = 1.22 to the other force fields using R_g distributions (Supplementary Fig. 2-3) and analyses of the pairwise distances between backbone beads (Supplementary Fig. 4-5 and Supplementary Table 1-2). The results show that ensembles produced with λ_PW-BB = 1.22, λ_PW = 1.10, and λ_PP = 0.88 have a comparable level of similarity to unmodified Martini 3. The ensembles produced with λ_PW = 1.10 and λ_PP = 0.88 are slightly more similar to each other than to λ_PW-BB = 1.22, possibly due to the rebalancing of backbone versus side chain interactions in the λ_PW-BB = 1.22 force field. Given that rescaling of only protein backbone-water interactions did not show any substantial improvement in the agreement with the experimental data or higher similarity to unmodified Martini 3 with respect to the previous approaches, and that the strong interactions between the protein backbone and water may have undesirable effects on the behaviour of the hydration shell, we decided not to pursue this approach further.

Amino acid side chain analogues

We wished to further investigate the symmetry between rescaling protein-water and protein-protein interactions using simulations of oil/water partitioning, as this was a central approach in the original parameterization of non-bonded interactions in Martini. Inspired by the initial parameterization of Martini proteins, we performed simulations of the cyclohexane/water partitioning of amino acid side chain analogues⁵. As rescaling protein-protein interactions should not substantially affect the interactions of amino acids with water or cyclohexane, we ran a single set of simulations to represent both unmodified Martini 3 and λ_PP = 0.88, as well as a set of simulations with protein-water interactions rescaled by λ_PW = 1.10. We calculated the transfer free energy from cyclohexane to water, ΔG_CHEX-W, from our simulations and compared them with experimentally determined ΔG_CHEX-W-values (Supplementary Fig. 8-9)^5,40. The results show that rescaling protein-water interactions by λ_PW = 1.10 slightly increases partitioning to the water phase, as would be expected, but the effect is small when compared with the overall discrepancy between simulation and experiment. The two rescaling approaches also provide comparable Pearson correlations with the experimental ΔG_CHEX-W-values (r_Pearson = 0.92 ± 0.04 and r_Pearson = 0.94 ± 0.03 for λ_PP = 0.88 and λ_PW = 1.10 respectively). We conclude that the results from the oil/water partitioning simulations do not clearly favour one rescaling approach over the other. However, the results illustrate that changes in the non-bonded interactions which have a very modest effect on small molecule partitioning may have a much larger effect on protein-protein interactions and the ensemble properties of flexible proteins, highlighting the importance of a direct comparison with experiments that report on protein structure.
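The sign convention for ΔG_CHEX-W can be made concrete. If the equilibrium concentrations of a side chain analogue in the two phases of a biphasic box are known, the transfer free energy follows from the partition coefficient as ΔG_CHEX-W = −RT ln(c_W / c_CHEX), so a negative value means that the analogue prefers water. How ΔG_CHEX-W was computed in this work is not described here (free energy perturbation or thermodynamic integration are other common routes), so the sketch below only illustrates the relation; the function name and the temperature are assumptions.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def transfer_free_energy(c_water, c_chex, temperature=298.15):
    """ΔG for transfer from cyclohexane to water (kJ/mol) from a partition coefficient."""
    return -R_KJ * temperature * math.log(c_water / c_chex)


# A hydrophobic analogue enriched 20-fold in cyclohexane has a positive ΔG_CHEX-W
print(transfer_free_energy(c_water=0.05, c_chex=1.0))
```

On this logarithmic scale, a 10-fold shift in partitioning corresponds to only about 5.7 kJ/mol at room temperature, which helps explain why small changes in ϵ can barely move ΔG_CHEX-W while still noticeably altering the many cumulative contacts in a flexible protein.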

Simulations of the dimerization of side chain analogues have previously been used to shed light on similarities and differences across force fields⁴¹. We therefore also performed simulations of the self-association of Phe-Phe, Tyr-Phe, Tyr-Tyr, Lys-Asp, and Arg-Asp side chain analogues. Here, λ_PP = 0.88 and λ_PW = 1.10 both result in a small decrease in self-association when compared with unmodified Martini 3 (Supplementary Fig. 10). The two rescaling approaches also give comparable free energy profiles along the center-of-mass (COM) distance, despite the fact that λ_PP = 0.88 rebalances the Coulomb and Lennard-Jones contributions to the Lys-Asp and Arg-Asp interactions. Comparison with experimentally measured affinities⁴² shows that Martini 3 correctly ranks Arg-Asp interactions as stronger than Lys-Asp, and this behaviour is preserved with both λ_PP = 0.88 and λ_PW = 1.10. The ranking of Tyr-Tyr, Tyr-Phe, and Phe-Phe is consistent with previous analyses of Martini⁴¹ and shows Phe-Phe to be the strongest in all three versions of Martini 3. In contrast, previous analyses of experimental data on IDPs suggest that Tyr-Tyr interactions should be stronger than Tyr-Phe and Phe-Phe^43,44, and measurements of vapour pressure show that benzene-phenol interactions are stronger than benzene-benzene interactions⁴⁵. These results suggest that a rebalancing of aromatic-aromatic interactions in Martini 3 may be necessary to better capture sequence-specific effects in IDPs and multidomain proteins. Additionally, the self-association of Phe-Phe, Tyr-Phe, Lys-Asp, and Arg-Asp side chain analogues is slightly overestimated with all three versions of the force field when compared with the experimentally measured affinities.
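Free energy profiles along the COM distance are commonly obtained by Boltzmann inversion of the distance histogram, with a 4πr² Jacobian so that an ideal, non-interacting pair gives a flat profile. The sketch below assumes unbiased sampling, a plain histogram estimate, and a reference zero taken from the outermost sampled bins. It is valid only up to roughly half the periodic box length, and it is not necessarily how the profiles in Supplementary Fig. 10 were computed.

```python
import numpy as np

KB_KJ = 8.314462618e-3  # Boltzmann constant in molar units, kJ mol^-1 K^-1


def pmf_from_distances(r_nm, r_max_nm, temperature=300.0, bins=30, n_ref=3):
    """Free energy profile F(r) = -kT ln[P(r) / (4 pi r^2)], zeroed at large r."""
    r = np.asarray(r_nm, dtype=float)
    counts, edges = np.histogram(r, bins=bins, range=(0.0, r_max_nm))
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        # Empty bins give log(0) = -inf, i.e. an infinite free energy
        pmf = -KB_KJ * temperature * np.log(counts / (4.0 * np.pi * centers ** 2))
    finite = np.isfinite(pmf)
    reference = pmf[finite][-n_ref:].mean()   # average over the outermost sampled bins
    return centers, pmf - reference


# Sanity check: points uniform in a sphere have no interaction, so F(r) is flat
rng = np.random.default_rng(1)
r_uniform = 3.0 * rng.random(200_000) ** (1.0 / 3.0)
centers, pmf = pmf_from_distances(r_uniform, r_max_nm=3.0)
print(np.abs(pmf[10:]).max())
```

Without the Jacobian, even a non-interacting pair would show an apparent attraction at large r simply because more volume is available there; the flat-profile check guards against that artefact.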

Protein-membrane interactions

In the simulations described above, we found that the effects of increasing protein-water interactions or decreasing protein-protein interactions were very similar. However, we hypothesized that these two force field modifications could have substantially different effects on systems in which proteins interact with classes of molecules other than protein and water. We expected that increased protein-water interactions would result in lower affinity for other molecules, which bind in competition with solvation, while decreased protein-protein interactions would not affect the affinity to the same extent, barring any effects of altering the conformational ensemble.

To examine the effect of rescaling the Lennard-Jones interaction parameters on the affinity of proteins for different biomolecules, we chose to investigate protein interactions with lipid membranes. We had two main motivations for this choice: first, protein-membrane interactions have been thoroughly characterized using Martini^24,46,47; second, Martini has been particularly focused on lipid membranes and protein-membrane interactions since its early development^9,48,49.

We therefore performed simulations of peripheral proteins in the presence of lipid bilayers, using both unmodified Martini 3 and the two modified versions, λ_PP = 0.88 and λ_PW = 1.10, following a protocol we have previously described²⁴. In short, we ran unbiased MD simulations starting with the protein at a minimum distance of 3 nm from the bilayer. Over the course of the MD simulation, the proteins interact, often transiently and reversibly, with the membrane (Supplementary Fig. 11-12), and membrane binding was quantified as previously described²⁴, defining the protein as bound when its minimum distance to the bilayer was less than or equal to 0.7 nm.
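The bound-state criterion above reduces to counting frames. The sketch below computes the protein-membrane minimum distance per frame by brute force (ignoring periodic boundary conditions, an assumption made for brevity), the fraction of frames at or below the 0.7 nm cutoff, and a simple block-averaging error estimate. The array layouts, the function names, and the number of blocks are hypothetical choices.

```python
import numpy as np


def min_distance_per_frame(protein_xyz, lipid_xyz):
    """Minimum protein-lipid bead distance per frame (no periodic images).

    protein_xyz: (n_frames, n_protein_beads, 3); lipid_xyz: (n_frames, n_lipid_beads, 3).
    """
    diff = protein_xyz[:, :, None, :] - lipid_xyz[:, None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=(1, 2))


def fraction_bound(min_dist_nm, cutoff_nm=0.7):
    """Fraction of frames counted as bound (minimum distance <= cutoff)."""
    return float(np.mean(np.asarray(min_dist_nm) <= cutoff_nm))


def block_error(min_dist_nm, cutoff_nm=0.7, n_blocks=5):
    """Standard error of the fraction bound from contiguous blocks of frames."""
    bound = (np.asarray(min_dist_nm) <= cutoff_nm).astype(float)
    means = np.array([blk.mean() for blk in np.array_split(bound, n_blocks)])
    return float(means.std(ddof=1) / np.sqrt(n_blocks))


# Usage with a toy minimum-distance time series (nm)
d = np.array([3.0, 2.1, 0.6, 0.5, 0.7, 0.9, 0.4, 1.8, 2.5, 0.65])
print(fraction_bound(d), block_error(d))
```

Block averaging over contiguous stretches of frames is used here because consecutive frames are correlated, so treating them as independent samples would understate the uncertainty of transient, reversible binding.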

To characterize the effect of our rescaling protocol on a broad set of protein-membrane interactions, we selected a diverse set of proteins: (i) one negative control, hen egg-white lysozyme, which is highly soluble in water and is not expected to interact specifically with the membrane in the absence of negatively charged phospholipids⁵⁰; (ii) three peripheral membrane proteins consisting of a single folded domain (Phospholipase2, Arf1 in its GTP-bound state, and the C2 domain of Lactadherin) for which we previously characterized the membrane-binding behaviour²⁴; (iii) two membrane-binding multidomain proteins: PTEN (1–351), containing an N-terminal phosphatase domain and a C2 domain that are known to be sufficient for membrane binding, and the Talin FERM domain, which has multiple subdomains (F0 to F3) and binds to membranes through specific phosphatidylinositol 4,5-bisphosphate (PIP2) binding sites present in its F2 and F3 subdomains⁵¹; (iv) two IDRs that have been characterized as membrane-binding regions: the N-terminal IDR of TRPV4⁵² and a short C-terminal motif (CTM) of Complexin⁵³. For the two IDRs, simulations in solution with both λ_PP = 0.88 and λ_PW = 1.10 result in expanded ensembles and a larger average value of R_g compared to unmodified Martini 3 (Supplementary Fig. 13).

As hypothesized, the different force field modifications have different effects on protein-membrane interactions (Fig. 6). In particular, we found that simulations with decreased protein-protein interactions (λ_PP = 0.88) provide a similar degree of protein-membrane interaction when compared with unmodified Martini 3. In contrast, simulations with an increased strength of protein-water interactions (λ_PW = 1.10) show significantly reduced membrane affinity and binding for all proteins, almost always leading to a complete lack of interactions between the protein and the lipid bilayer. Importantly, we observe a clear difference between lysozyme (the non-interacting negative control) and all other proteins (the membrane binders) in our simulations with unmodified Martini 3 and λ_PP = 0.88, while this is not the case in our simulations with λ_PW = 1.10. Given that λ_PW = 1.10 and λ_PP = 0.88 provide a comparably good description of IDPs and multidomain proteins in solution, and that λ_PP = 0.88 more accurately retains the specificity and strength of protein-membrane interactions as originally parameterized in Martini 3, we suggest that λ_PP = 0.88 is overall a more robust and transferable modification to Martini 3.

**Fig. 6: Protein-membrane interactions.**

Capturing effects of sequence changes

Having selected λ_PP = 0.88 as the preferred force field modification for proteins in solution, we next examined to what extent this force field could capture more subtle sequence effects in IDPs and multidomain proteins.

The λ_PP = 0.88 force field provides the same Pearson correlation between experimental and simulated R_g as unmodified Martini 3, initially suggesting that there is no improvement in capturing relative protein-specific differences in R_g (Figs. 3-4). To test this for a series of similar proteins with systematic differences in sequence and structure, we first selected the mTurq-GS_X-mNeon proteins, for which the R_g should increase systematically with linker length. We calculated the Pearson correlation between simulated and experimental R_g-values for these proteins with the different force fields (Fig. 7a), and found that the simulations with unmodified Martini 3 provide only a small separation of the R_g-values as a function of linker length, and therefore give a Pearson correlation coefficient with a high degree of uncertainty based on bootstrapping (r_Pearson = 0.6 ± 0.6), while the simulations with rescaled interactions allow for a clearer separation of R_g as a function of linker length (r_Pearson = 0.9 ± 0.1 for both λ_PW = 1.10 and λ_PP = 0.88). This result suggests that rescaling protein-water or protein-protein interactions allows for a higher sensitivity of ensemble properties to subtle changes in protein sequence and structure, such as differences in interdomain linker length.
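Uncertainties such as r_Pearson = 0.6 ± 0.6 come from bootstrapping. The sketch below assumes a standard nonparametric bootstrap over proteins (resampling experiment-simulation pairs with replacement) and reports the standard deviation of the resampled coefficients as the error. The number of resamples, the seed, and the toy R_g values are assumptions, not values from the study.

```python
import numpy as np


def pearson_with_bootstrap(x, y, n_boot=2000, seed=0):
    """Pearson r and its bootstrap standard deviation (pairs resampled with replacement)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    boot = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))
        xs, ys = x[idx], y[idx]
        if xs.std() == 0.0 or ys.std() == 0.0:
            continue  # correlation undefined for a resample with no spread
        boot.append(np.corrcoef(xs, ys)[0, 1])
    return float(r), float(np.std(boot, ddof=1))


# Toy example: six constructs with increasing linker length (hypothetical R_g values, nm)
rg_exp = [2.9, 3.1, 3.3, 3.5, 3.7, 4.0]
rg_sim = [3.2, 3.5, 3.6, 3.9, 4.0, 4.4]
print(pearson_with_bootstrap(rg_exp, rg_sim))
```

With only a handful of proteins, a few resamples that happen to omit the extreme constructs can change r drastically, which is why a force field that barely separates the R_g values yields a bootstrap error as large as the coefficient itself.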

**Fig. 7: Radii of gyration of mTurq-mNeon and hnRNPA1_LCD variants.**

To further investigate the ability of Martini 3 with rescaled protein-protein interactions to capture more subtle sequence effects in IDPs, we performed simulations of six variants of the LCD of hnRNPA1, which have varied composition of charged and aromatic residues while retaining the length of the wild-type sequence⁵⁴, using unmodified Martini 3 and Martini 3 with λ_PP = 0.88. We also performed simulations with λ_PP = 0.92, as this provided the optimal agreement with SAXS data for the wild-type hnRNPA1 LCD. We compared the R_g calculated from the simulations with R_g values measured by SAXS for the six variants and the wild type. As expected based on the results presented above, we found that unmodified Martini 3 substantially underestimates the R_g of all variants (Fig. 7b). While modifying protein-protein interactions by λ_PP = 0.88 gives the best results on average across all proteins we studied, it leads to a slight overestimation of the R_g for the wild-type and variants of the LCD from hnRNPA1 (Fig. 7b). If we instead select λ_PP = 0.92, the value that gives the best result for the wild-type hnRNPA1 LCD (among the values that we examined), we, by construction, find a more accurate level of expansion across the variants. Equally important, we found that unmodified Martini 3 does not accurately capture the variation in R_g associated with the sequence variation (r_Pearson = –0.1 ± 0.5), while simulations with λ_PP = 0.92 and λ_PP = 0.88 result in a more accurate estimate of the effect of the sequence variation on the R_g values (r_Pearson = 0.7 ± 0.3 and r_Pearson = 0.9 ± 0.2, respectively). This result suggests that decreasing the strength of protein-protein interactions in Martini 3 improves the sensitivity of IDP ensemble properties to sequence variation.

Comparison with high-resolution ensembles

Next, we aimed to test the effect of our proposed force field modification by comparing our Martini 3 simulations with ensembles produced by higher resolution models. First, we compared our simulations of α-synuclein with extensive atomistic MD simulations produced with state-of-the-art force fields. We used an ensemble similarity metric based on dimensionality reduction of the pairwise RMSD between ensemble conformers^56,57 to quantitatively compare our unmodified and λ_PP = 0.88 Martini 3 simulations with atomistic simulations performed with the Amber03ws and Amber99SB-disp force fields⁵⁸. We note that both Amber03ws and Amber99SB-disp produce ensembles of α-synuclein which are slightly too compact when compared with R_g from SAXS⁵⁹, while the ensemble from our Martini 3 simulation with λ_PP = 0.88 is more expanded, in excellent agreement with SAXS. In spite of this discrepancy, the ensemble comparison shows that rescaling protein-protein interactions in Martini 3 by λ_PP = 0.88 increases the similarity to the atomistic simulations with both Amber03ws and Amber99SB-disp (Table 1). Interestingly, the λ_PP = 0.88 Martini 3 simulation is more similar to both atomistic simulations than the atomistic simulations are to each other, suggesting that the agreement is within the expected variation between force fields.
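Tables 1 and 2 report a Jensen-Shannon divergence between ensembles. The published metric^56,57 first reduces the pairwise RMSD matrix to a low-dimensional space and estimates densities there. The minimal sketch below covers only the final divergence step. It assumes, hypothetically, that both ensembles have already been histogrammed on shared bins. It is a simplified illustration, not the implementation behind the tables.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions.

    p and q are non-negative weights on the same bins, e.g. histograms of two
    ensembles over a shared low-dimensional projection (assumed precomputed).
    The result lies between 0 (identical) and ln 2 (no overlap).
    """
    p = np.asarray(p, float) / np.sum(p)
    q = np.asarray(q, float) / np.sum(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        # Kullback-Leibler divergence; bins where a == 0 contribute nothing,
        # and b > 0 wherever a > 0 because b is the mixture m.
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Because the natural logarithm is used, the upper bound is ln 2 ≈ 0.693. That bound is reached when the two ensembles occupy disjoint regions of the projection.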

Table 1 Jensen-Shannon divergence based on Cα RMSD between α-synuclein ensembles from Martini simulations and atomistic simulations


Next we wished to perform a similar test for a multidomain protein. We used the same approach to quantify the similarity between our unmodified and λ_PP = 0.88 Martini 3 simulations of hnRNPA1 with an ensemble that was generated based on data from double electron-electron resonance (DEER) electron paramagnetic resonance, PRE, and SAXS experiments⁶⁰. Again, the comparison shows that the Martini 3 simulation with λ_PP = 0.88 is more similar to the experimentally derived atomistic ensemble (Table 2). The results from these two test cases suggest that our proposed force field modification of λ_PP = 0.88 also improves the agreement with higher resolution simulations and experimentally derived ensemble models.

Table 2 Jensen-Shannon divergence based on Cα RMSD between hnRNPA1 ensembles from Martini simulations and an integrative atomistic model


Protein self-association in the membrane

To test how rescaling protein-protein and protein-water interactions affects protein behaviour in a lipid membrane, we simulated the homo-dimerization of the transmembrane domains of EphA1 and ErbB1 from the receptor tyrosine kinase (RTK) family. Both were previously used as test systems for Martini 3⁶. RTKs are a well-studied class of proteins for protein-protein interactions in a membrane environment, and for both proteins, experimental free energies of association have been determined by Förster resonance energy transfer (FRET)^61,62. Unmodified Martini 3 and Martini 3 with λ_PW = 1.10 produce comparable potentials of mean force (PMFs) (Fig. 8). They overestimate the strength of association by ~4 kJ/mol for EphA1 and give reasonable agreement with the experimental ΔG for ErbB1, consistent with the results of Souza et al.⁶. Rescaling protein-protein interactions by λ_PP = 0.88 results in a complete loss of self-association, as the PMF profiles become repulsive for both proteins (Fig. 8). Unmodified Martini 3 and Martini 3 with λ_PW = 1.10 may therefore slightly overestimate protein-protein interactions in the membrane. In contrast, λ_PP = 0.88 substantially underestimates these interactions and is likely not a suitable force field modification for studying oligomerization of transmembrane proteins.

**Fig. 8: Transmembrane protein self-association.**

Discussion

We have previously shown that simulations with Martini 3 underestimate the global dimensions of IDPs, and that increasing the strength of protein-water interactions by 10% results in more expanded ensembles and substantially improves agreement with SAXS data⁷. Here, we extended this approach to a set of 15 multidomain proteins for which SAXS data have been recorded. Our results show that Martini 3 on average produces too compact ensembles of these multidomain proteins. As for IDPs, rescaling protein-water interactions by 10% substantially improves agreement with SAXS data. Decreasing the strength of interactions between protein beads by 12% results in the same expansion of the ensembles and improved agreement with experiments. We also tested increasing the strength of interactions only between protein backbone beads and water, but found that this provides no further improvement in agreement with the experimental data. The different rescaling approaches give essentially the same results for proteins in solution. However, rescaling protein-protein interactions is the preferable option for retaining the specificity and strength of protein-membrane interactions as originally parameterized in Martini 3. We note that this change to the force field decreases dimerization of proteins within a membrane environment and leads to a significant underestimation of free energies of dimerization. We therefore suggest that decreasing the strength of protein-protein interactions by 12% is suitable for systems with flexible proteins in solution and near membranes, but likely not for systems with specific protein-protein interactions in the membrane.
An important outcome of our work is also the curation of a set of multidomain proteins with available SAXS data and starting structures for simulations, which can be used for future research in force field assessment and development.

One of the challenges when running Martini 3 simulations of multidomain proteins is selecting which regions to keep folded with the elastic network model and which to leave unrestrained. In this work, we manually selected the folded domains in the structures using domain annotations and intuition. It is, however, difficult to know a priori whether distinct domains should act as a single structural module due to specific interactions or move freely with respect to one another. It has recently been proposed to use the predicted aligned error (PAE) from AlphaFold2 predictions to assign the elastic network restraints automatically⁶³. In future work, this may give a more accurate distinction between domains that should be rigid and those that should be dynamic with respect to each other. Additionally, replacing the elastic network model with a more flexible structure-based model⁶⁴ may make it possible to sample both the bound and unbound states when folded domains have specific interactions⁶⁵. For stronger and more specific interdomain interactions, the resolution of Martini 3 may also play a more important role. For example, water-mediated hydrogen-bonding networks cannot be captured with the 4-to-1 mapping of water molecules to water beads. Most of the proteins presented in this work likely do not have very specific interactions between domains, so the lack of structured water is presumably not an issue.

Decreasing the strength of protein-protein interactions uniformly by 12% improves on unmodified Martini 3 in reproducing the global dimensions of IDPs and multidomain proteins. However, the agreement with the SAXS data is still not perfect (${\chi }_{r}^{2}$ > 1 in most cases), and there are systematic outliers with respect to the experimental R_g values. Some of the system-specific deviations could potentially be alleviated by, for example, more accurately assigning and modeling the restraints on the folded domains. Even so, the overall deviation from the experimental data suggests that describing IDPs and multidomain proteins within the Martini framework requires a more fundamental rebalancing of non-bonded interactions, and perhaps also a revision of the CG mapping scheme. Again, we suggest that the data we have collected here will be useful for testing any such changes, and that the results obtained with λ_PP = 0.88 are a useful point of reference for other force field modifications. The hnRNPA1 LCD sequence variants and the series of mTurq-GS_X-mNeon proteins showed increased sensitivity to sequence perturbations. This suggests that λ_PP = 0.88 could be a good starting point for rebalancing protein interactions at the amino acid or bead level to improve the specificity of weaker protein-protein interactions.

For other types of systems, it has been suggested that the non-bonded interactions in Martini 3 must be rescaled to a different extent to reach agreement with experimental observations. For example, modifying protein-water interactions in Martini 3 affects the propensity of the disordered LCD of FUS to form condensates in a way that appears to depend on the salt concentration⁶⁶, while the insertion of transmembrane helices into the phospholipid bilayer may require decreased protein-water interactions⁶⁷. Additionally, unmodified Martini 3 has been shown to provide accurate free energies of dimerization for transmembrane proteins⁶. Our results show that this behaviour is preserved when rescaling protein-water interactions, whereas decreasing the strength of protein-protein interactions is likely not suitable for systems with specific protein self-association in the membrane. In light of these results, it seems that uniformly rescaling non-bonded interactions may not be able to provide a universally transferable protein model within the Martini framework, and that a more detailed rebalancing of interactions or CG mapping scheme is necessary. Future work could, for example, examine the combined effects of more modest rescaling of protein-protein and protein-water interactions, or focus on secondary-structure dependent force field parameters as recently proposed for another CG force field⁶⁸.

Overall, however, our results demonstrate that for soluble proteins, decreasing the non-bonded interactions between all protein beads by 12% leads to a more accurate balance of interactions while retaining the specificity of protein-membrane interactions. We foresee that our protocol will be a useful starting point for investigating the interactions of IDPs with lipid membranes using chemically transferable MD simulations, and that these investigations will provide further insights into strategies for future force field development. Since CG simulations also play an important role in integrative structural biology¹, we also expect that these developments will enable an even tighter link between simulations and experiments to study large and complex biomolecular assemblies.

Methods

IDP simulations

We performed MD simulations of a set of 12 IDPs with SAXS data available (Supplementary Table 4) and five IDPs with intramolecular PRE data available (Supplementary Table 5)^7,44 using Gromacs 2020.3⁶⁹. We ran simulations with the Martini 3 force field⁶, rescaling by a factor λ_PP the well-depth, ϵ, in the Lennard-Jones potential between all protein beads. Alternatively, we rescaled by a factor λ_PW-BB the ϵ in the Lennard-Jones potential between all protein backbone and water beads. We generated CG structures using Martinize2 based on initial all-atom structures corresponding to the 95th percentile of the R_g distributions from simulations in Tesei et al.⁴⁴. Secondary structure and elastic network restraints were not assigned for IDPs. Structures were placed in a dodecahedral box using Gromacs editconf and solvated using the Insane python script⁷⁰, with NaCl concentrations corresponding to the ionic strength used in the SAXS or PRE experiments. The systems were equilibrated for 10 ns with a 2 fs time step using the Velocity-Rescaling thermostat⁷¹ and Parrinello-Rahman barostat⁷². Production simulations were run for 40 μs with a 20 fs time step using the Velocity-Rescaling thermostat⁷¹ and Parrinello-Rahman barostat⁷². The simulation temperature was set to match the SAXS or PRE experiment, and the pressure was set to 1 bar. Non-bonded interactions were treated with the Verlet cut-off scheme. A cut-off of 1.1 nm was used for van der Waals interactions. A dielectric constant of 15 and a cut-off of 1.1 nm were used for Coulomb interactions. Simulation frames were saved every 1 ns. Molecules broken across the periodic boundaries were made whole with Gromacs trjconv using the flags -pbc whole -center. Convergence of the simulations was assessed by block-error analysis⁷³ of R_g calculated from simulation coordinates using the blocking code from: https://github.com/fpesceKU/BLOCKING.
All CG trajectories were back-mapped to all-atom structures using a simplified version¹³ of the Backward algorithm⁷⁴, in which simulation runs are excluded and the two energy minimization runs are shortened to 200 steps.
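Convergence was assessed by block-error analysis of R_g. The sketch below is a generic NumPy version of the blocking method: it repeatedly averages neighbouring pairs of the time series and recomputes the standard error of the mean at each level. It is not the API of the BLOCKING repository linked above. Equal bead weights in R_g are an assumption, since the text does not state the weighting.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """R_g per frame from coordinates shaped (n_frames, n_beads, 3).

    Equal bead weights are assumed unless masses are given.
    """
    coords = np.asarray(coords, float)
    w = np.ones(coords.shape[1]) if masses is None else np.asarray(masses, float)
    w = w / w.sum()
    com = np.einsum("b,fbk->fk", w, coords)  # weighted centre per frame
    sq = np.sum((coords - com[:, None, :]) ** 2, axis=2)  # (n_frames, n_beads)
    return np.sqrt(sq @ w)


def blocking_errors(x):
    """Standard error of the mean at successive block-averaging levels.

    Each level averages neighbouring pairs of the previous series. For a
    converged simulation the returned errors level off at large block sizes.
    """
    x = np.asarray(x, float)
    errors = []
    while len(x) >= 2:
        errors.append(np.sqrt(x.var(ddof=1) / len(x)))
        n = len(x) // 2 * 2  # drop a trailing odd point
        x = 0.5 * (x[0:n:2] + x[1:n:2])
    return np.array(errors)
```

Usage: `blocking_errors(radius_of_gyration(traj_coords))`, where `traj_coords` is a hypothetical array of CG coordinates. The plateau value is then read off as the error of the mean R_g.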

Multidomain protein structures

We performed MD simulations of a set of 15 multidomain proteins with SAXS data available (Supplementary Table 6). We built the initial structure of MyBP-C_MTHB-C2 based on the NMR structure containing both domains (PDB: 5K6P)²⁷. We built the structures of the linear polyubiquitin chains, Ubq₂, Ubq₃, and Ubq₄, based on the crystal structure of the open conformation of Ubq₂ (PDB: 2W9N)⁷⁵. For Ubq₃ and Ubq₄, the linker regions between the original and extended structures were remodelled using Modeller^76,77. We built the initial structure of Gal-3 based on the crystal structure of the folded C-terminal domain (PDB: 2NMO)⁷⁸ and the IDR from the AlphaFold structure of full-length Gal-3 (AF-P17931-F1)^79,80. We built the structure of MyBP-C_C5-C6-C7 based on the NMR structure of the C5 domain (PDB: 1GXE)⁸¹ and the AlphaFold structure of full-length MyBP-C (AF-Q14896-F1)^79,80. We inserted missing residues in the NMR structure of the C5 domain using Modeller⁷⁶. For the mTurq-GS_X-mNeon constructs, we used structures from previously reported Monte Carlo simulations³⁰ as starting structures. To validate the starting structures, we used PyMOL align to calculate the RMSD between the two fluorescent protein domains and the corresponding crystal structures (mTurquoise2 (PDB: 4AR7)⁸² and mNeonGreen (PDB: 5LTR)⁸³). This gave an RMSD of 0.2–0.3 Å.
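The RMSD check above used PyMOL align, which also performs a sequence alignment and iterative outlier rejection. As a simpler stand-in, the sketch below computes the RMSD after optimal Kabsch superposition. It assumes the atoms in the two coordinate sets are already matched one-to-one.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two matched (n, 3) coordinate sets after optimal superposition."""
    a = np.asarray(a, float) - np.mean(a, axis=0)  # remove translation
    b = np.asarray(b, float) - np.mean(b, axis=0)
    h = a.T @ b  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a @ rot.T - b
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))
```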

Multidomain protein simulations

We ran MD simulations of the set of multidomain proteins using Gromacs 2020.3⁶⁹ with the Martini 3 force field⁶ and several modified versions of it. In the modified versions, the well-depth, ϵ, in the Lennard-Jones potential was rescaled by a factor λ_PW between all protein and water beads, by a factor λ_PP between all protein beads, or by a factor λ_PW-BB between all protein backbone and water beads. We assigned secondary structure-specific potentials using DSSP⁸⁴ and Martinize2. The secondary structure of all residues in linkers and IDRs was manually assigned to coil, turn, or bend. We applied an elastic network model using Martinize2, consisting of harmonic potentials with a force constant of 700 kJ mol^-1 nm^-2 between all backbone beads within a cut-off distance of 0.9 nm. We removed the elastic network potentials in all linkers and IDRs and between folded domains, so that only the structures of individual folded domains were restrained (Supplementary Table 3). Dihedral and angle potentials between side chain and backbone beads were assigned using the -scfix flag in Martinize2, but removed in all linkers and IDRs. Structures were placed in a dodecahedral box using Gromacs editconf and solvated using the Insane python script⁷⁰, with NaCl concentrations corresponding to the ionic strength used in the SAXS experiments. The systems were equilibrated for 10 ns with a 2 fs time step using the Berendsen thermostat and Berendsen barostat⁸⁵. Production simulations were run for at least 40 μs with a 20 fs time step using the Velocity-Rescaling thermostat⁷¹ and Parrinello-Rahman barostat⁷². The simulation temperature was set to match the corresponding SAXS experiment, and the pressure was set to 1 bar. Non-bonded interactions were treated with the Verlet cut-off scheme. A cut-off of 1.1 nm was used for van der Waals interactions.
A dielectric constant of 15 and a cut-off of 1.1 nm were used for Coulomb interactions. Simulation frames were saved every 1 ns. Molecules broken across the periodic boundaries were made whole with Gromacs trjconv using the flags -pbc whole -center. Convergence of the simulations was assessed by block-error analysis⁷³ of R_g calculated from simulation coordinates using the blocking code from: https://github.com/fpesceKU/BLOCKING. All CG trajectories were back-mapped to all-atom structures using a simplified version¹³ of the Backward algorithm⁷⁴, in which simulation runs are excluded and the two energy minimization runs are shortened to 200 steps.
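The elastic network rule described above can be sketched as follows: harmonic bonds of 700 kJ mol^-1 nm^-2 between backbone beads within 0.9 nm, kept only inside folded domains. This is a hypothetical illustration of the selection logic, not Martinize2's code, and it omits any additional Martinize2 defaults. The harmonic form 0.5 k (d − d0)² follows the standard GROMACS bond potential.

```python
import numpy as np


def elastic_network(bb_coords, domain, cutoff=0.9, k=700.0):
    """Select elastic-network bonds between backbone beads.

    bb_coords: (n, 3) backbone bead positions in nm.
    domain: per-residue folded-domain label, or None for linker/IDR residues.
    Returns (i, j, d0, k) tuples for pairs within `cutoff` whose beads belong
    to the same folded domain; linkers, IDRs and interdomain pairs get none.
    """
    x = np.asarray(bb_coords, float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    bonds = []
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            same_domain = domain[i] is not None and domain[i] == domain[j]
            if same_domain and dist[i, j] <= cutoff:
                bonds.append((i, j, float(dist[i, j]), k))
    return bonds


def elastic_energy(bb_coords, bonds):
    """Total harmonic energy, sum of 0.5 * k * (d - d0)**2, in kJ/mol."""
    x = np.asarray(bb_coords, float)
    energy = 0.0
    for i, j, d0, k in bonds:
        d = np.linalg.norm(x[i] - x[j])
        energy += 0.5 * k * (d - d0) ** 2
    return energy
```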

Simulations of protein self-association in solution

We ran MD simulations of two copies each of the two folded proteins ubiquitin and villin HP36, and of the four IDPs FUS_LCD, α-synuclein, hTau40, and p15PAF, as previously described⁷. We used the Martini 3 force field⁶ with the well-depth, ϵ, in the Lennard-Jones potential between all protein beads rescaled by a factor λ_PP = 0.88. We used PDB ID 1UBQ⁸⁶ and PDB ID 1VII⁸⁷ as starting structures for ubiquitin and villin HP36, respectively. The simulations were set up and run using the same protocol as for the IDP simulations. Two copies of ubiquitin, villin HP36, FUS_LCD, α-synuclein, hTau40, and p15PAF were placed in cubic boxes with side lengths of 14.92, 7.31, 40.5, 25.51, 48.02, and 34.15 nm, giving protein concentrations of 1000, 8500, 50, 200, 30, and 83.4 μM, respectively. NaCl concentrations and temperatures were set according to the corresponding experimental conditions (Supplementary Tables 7–8). For ubiquitin and villin HP36, the following steps were also used in the simulation setup: (i) Secondary structure was assigned with DSSP⁸⁴ in Martinize2. (ii) An elastic network model was applied with Martinize2. The elastic network restraints consisted of a harmonic potential with a force constant of 700 kJ mol^-1 nm^-2 between backbone beads within a 0.9 nm cut-off. For ubiquitin, we removed elastic restraints from the C-terminus (residues 72–76) to allow for flexibility⁸⁸. (iii) Dihedral and angle potentials between side chain and backbone beads were added based on the initial structures with the -scfix flag in Martinize2. For ubiquitin, villin HP36, α-synuclein, and p15PAF, we ran 10 replica simulations of 40 μs each. For hTau40 and FUS_LCD, we ran 10 replica simulations of 13 μs and 25 μs each, respectively.
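The protein concentrations quoted above follow from placing two copies in a cubic box. The helper below is a minimal sketch that converts between box side length and concentration. The bound-fraction analysis uses the average box volume from the NPT trajectories, so these nominal values are approximate.

```python
import numpy as np

N_A = 6.02214076e23  # Avogadro constant, 1/mol


def concentration_uM(side_nm, n_copies=2):
    """Protein concentration (uM) for n_copies in a cubic box of side side_nm."""
    volume_l = (side_nm * 1e-9) ** 3 * 1e3  # m^3 -> litres
    return n_copies / (N_A * volume_l) * 1e6


def side_for_concentration(conc_uM, n_copies=2):
    """Cubic box side length (nm) that gives conc_uM for n_copies."""
    volume_l = n_copies / (N_A * conc_uM * 1e-6)
    return (volume_l * 1e-3) ** (1 / 3) * 1e9  # litres -> m^3 -> nm
```

For example, a 14.92 nm box with two copies gives about 1000 μM, and a 7.31 nm box gives about 8500 μM, matching the values above.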

We analyzed the population of the bound states in our simulations by calculating the minimum distance between beads in the two protein copies over the trajectory with Gromacs mindist. The fraction bound was defined as the fraction of frames where the minimum distance was below 0.8 nm. For ubiquitin and villin HP36, we calculated the expected fraction of bound protein at the concentrations in our simulations based on the respective K_d-values of 4.9 mM and 1.5 mM determined for self-association^37,38. The bound fraction was calculated as

$${\phi }_{b}=\frac{4{C}_{p}+{K}_{d}-\sqrt{8{K}_{d}{C}_{p}+{{K}_{d}}^{2}}}{4{C}_{p}}$$

(1)

where ϕ_b is the bound fraction, C_p is the concentration of protein in the simulation box (using the average box volume over all simulation trajectories), and K_d is the dissociation constant.
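Eq. (1) follows from K_d = [P]²/[P₂], with total concentration C_p = [P] + 2[P₂] and ϕ_b = 2[P₂]/C_p. Solving the resulting quadratic for [P] and taking the positive root gives the expression above. A minimal sketch of both the expected and the simulated bound fractions:

```python
import numpy as np


def bound_fraction_expected(c_p, k_d):
    """Eq. (1): expected bound fraction for 2P <-> P2 at total concentration c_p.

    c_p and k_d must be in the same units (e.g. mM).
    """
    return (4 * c_p + k_d - np.sqrt(8 * k_d * c_p + k_d ** 2)) / (4 * c_p)


def bound_fraction_simulated(min_dist_nm, threshold=0.8):
    """Fraction of frames whose minimum inter-protein distance is below threshold (nm)."""
    return float(np.mean(np.asarray(min_dist_nm) < threshold))
```

With the nominal box concentrations in mM, `bound_fraction_expected(1.0, 4.9)` gives ≈0.24 for ubiquitin and `bound_fraction_expected(8.5, 1.5)` gives ≈0.74 for villin HP36. The analysis itself uses the average box volume, so its values may differ slightly.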

Amino acid side chain analogue simulation parameters

The Martini 3 parameters for amino acid side chain analogues were produced based on the existing amino acid parameters by simply removing the backbone bead and any potentials or exclusions involving the backbone bead. For simulations of Arg-Asp side chain analogue self-association, the SC1 bead was also removed from Arg (leaving only the SC2 bead of type SQ3p) in order to best emulate the guanidine-acetate system used to measure the experimental affinity⁴².
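This parameter derivation is a filtering step: drop the backbone bead, drop every term that references it, and renumber the remaining beads. The sketch below uses a hypothetical in-memory format, namely a list of bead names plus tuples of 0-based indices. It does not read an actual GROMACS .itp file, which has separate sections and 1-based indices.

```python
def drop_bead(beads, terms, name="BB"):
    """Remove bead `name` and every bonded term or exclusion that involves it.

    beads: ordered bead names of one residue (hypothetical format).
    terms: tuples of 0-based bead indices (bonds, angles, dihedrals, exclusions).
    Returns the remaining beads and terms, renumbered from 0.
    """
    drop = beads.index(name)
    kept_beads = [b for i, b in enumerate(beads) if i != drop]
    # old index -> new index for every bead that survives
    renumber = {old: new for new, old in enumerate(i for i in range(len(beads)) if i != drop)}
    kept_terms = [tuple(renumber[i] for i in t) for t in terms if drop not in t]
    return kept_beads, kept_terms
```

The Arg analogue used for the Arg-Asp system corresponds to calling this twice: first for BB, then for SC1.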

Amino acid side chain analogue self-association simulations

We ran MD simulations of two copies of Tyr and Phe side chain analogues, as well as Tyr-Phe, Arg-Asp, and Lys-Asp side chain analogues using the Martini 3 force field⁶ either unmodified or with the well-depth, ϵ, in the Lennard-Jones potential between all protein and water beads rescaled by a factor λ_PW = 1.10 or ϵ in the Lennard-Jones potential between all protein beads rescaled by a factor λ_PP = 0.88. The simulations were set up and run using the same protocol as for IDP simulations. The two side chain analogues were placed in a cubic box with a side length of 5 nm for the Tyr-Tyr, Arg-Asp, and Lys-Asp systems and a side length of 10 nm for the Phe-Phe and Tyr-Phe systems. A NaCl concentration of 150 mM was used for Phe-Phe, Tyr-Phe, and Tyr-Tyr simulations. No NaCl was added in the Lys-Asp and Arg-Asp systems. The simulations were run for 100 μs each at 300 K.

We used a similar approach as in Souza et al.⁸⁹ to determine the standard Gibbs free energy of self-association, ΔG⁰, from simulations. We calculated free energy profiles of self-association as:

$$\Delta G(r)=-RT\ln \left(p(r)\right)+2RT\ln \left(r\right)$$

(2)

where r is the COM distance between side chain analogues calculated with Gromacs distance, p(r) is the probability density of the COM distance, R is the gas constant, and T is the temperature. The second term is an entropic correction that removes the r² growth of the spherical-shell volume available at distance r. We then determined the association constant, K_a, from the free energy profile using:

$${K}_{a}=\int_{{r}_{min }}^{{r}_{max}}4\pi {r}^{2}\exp \left(\frac{-\Delta G(r)}{RT}\right)dr$$

(3)

where ${r}_{\min }$ and ${r}_{\max }$ define the boundaries of the bound state along the free energy profile. ${r}_{\min }$ and ${r}_{\max }$ were selected for each simulation to encompass the first negative free-energy well of ΔG(r) as shown in Supplementary Fig. 10. We then determined ΔG⁰ as:

$$\Delta {G}^{0}=-RT\ln \left({K}_{a}{C}^{0}\right)$$

(4)

where C⁰ is a standard concentration of (1/1.66) nm⁻³, which corresponds to 1 M. We also used eq. (4) to calculate ΔG⁰ from experimental values of K_a, using C⁰ = 1 M instead. K_a values of 0.4 M⁻¹ for Phe-Phe (benzene-benzene) and 0.6 M⁻¹ for Phe-Tyr (benzene-phenol) were obtained from Christian and Tucker⁴⁵. Experimental K_a values of 0.31 M⁻¹ for Lys-Asp (butylammonium-acetate) and 0.37 M⁻¹ for Arg-Asp (guanidine-acetate) were obtained from Springs and Haake⁴².
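A minimal NumPy/SciPy sketch of eqs. (2)–(4) is shown below. Two choices are assumptions not stated in the text. The first is the histogram binning. The second is shifting ΔG(r) to zero over a bulk window at large r, a common convention adopted here because eq. (2) fixes ΔG(r) only up to an additive constant. r_min and r_max must be chosen per system, as in Supplementary Fig. 10.

```python
import numpy as np
from scipy.integrate import trapezoid

R = 8.314462618e-3  # gas constant, kJ/(mol K)
C0_NM3 = 1.0 / 1.66  # 1 M standard concentration expressed in nm^-3


def free_energy_profile(r_com, T=300.0, bins=250, r_range=(0.0, 2.5)):
    """Eq. (2): Delta G(r) from sampled COM distances r_com (nm)."""
    p, edges = np.histogram(r_com, bins=bins, range=r_range, density=True)
    r = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):  # empty bins give +inf, excluded later
        dg = -R * T * np.log(p) + 2 * R * T * np.log(r)
    return r, dg


def align_to_bulk(r, dg, r_lo, r_hi):
    """Assumed step: shift Delta G(r) to zero on average over a bulk window."""
    m = (r >= r_lo) & (r <= r_hi) & np.isfinite(dg)
    return dg - dg[m].mean()


def association_constant(r, dg, r_min, r_max, T=300.0):
    """Eq. (3): K_a (nm^3) by integrating the bound well from r_min to r_max."""
    m = (r >= r_min) & (r <= r_max) & np.isfinite(dg)
    return trapezoid(4 * np.pi * r[m] ** 2 * np.exp(-dg[m] / (R * T)), r[m])


def standard_free_energy(k_a, c0, T=300.0):
    """Eq. (4): Delta G^0 = -RT ln(K_a C^0); units of k_a and c0 must cancel."""
    return -R * T * np.log(k_a * c0)
```

For simulations, pass `C0_NM3` together with a K_a in nm³. For the experimental values, `standard_free_energy(0.4, 1.0)` applies eq. (4) to the benzene-benzene K_a of 0.4 M⁻¹ with C⁰ = 1 M.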

Amino acid side chain analogue cyclohexane/water partitioning simulations

We ran MD simulations of the cyclohexane/water partitioning of the uncharged amino acid side chain analogues, using unmodified Martini 3 and Martini 3 with λ_PW = 1.10. For these analogues, experimental transfer free energies were taken from Radzicka and Wolfenden (as in Monticelli et al.)^5,40. We prepared a simulation box with 716 copies each of Martini 3 cyclohexane (CHEX) and water (W). For each partitioning simulation, we added a single copy of a side chain analogue. Simulations were set up and run using the same protocol as for the IDP simulations. Each simulation was run for 100 μs at 300 K.

We centered the trajectories on the cyclohexane phase and calculated the normalized density of water, cyclohexane, and side chain analogue along the z-axis of the simulation box averaged over the simulation frames. We defined the boundaries between the water and cyclohexane phases as the cross-over point of their densities and determined the average density of the side chain analogue in the water and cyclohexane phase, ρ_W and ρ_CHEX, excluding the regions close to the phase boundaries (Supplementary Fig. 9). We then calculated the transfer free energy from cyclohexane to water as:

$$\Delta {G}_{{{{\rm{CHEX}}}}-{{{\rm{W}}}}}=RT\ln \left(\frac{{\rho }_{{{{\rm{CHEX}}}}}}{{\rho }_{{{{\rm{W}}}}}}\right)$$

(5)

where R is the gas constant and T is the temperature. The Pearson correlations with experimental transfer free energies were calculated using the pearsonr function in SciPy stats and standard errors were determined with bootstrapping using the bootstrap function in SciPy stats with 9999 resamples⁹⁰.
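The partitioning analysis and the correlation statistics can be sketched as below. Two choices are hypothetical: assigning each bin to the phase of the majority solvent, and the `margin` excluded around each cross-over. Both stand in for the boundary regions excluded in Supplementary Fig. 9. The bootstrap uses SciPy's `stats.bootstrap` with paired resampling and, by default, 9999 resamples.

```python
import numpy as np
from scipy import stats

R = 8.314462618e-3  # gas constant, kJ/(mol K)


def transfer_free_energy(z, rho_w, rho_chex, rho_solute, T=300.0, margin=1.0):
    """Eq. (5): cyclohexane -> water transfer free energy (kJ/mol).

    Bins are assigned to the phase whose solvent density is larger; bins
    within `margin` nm of a density cross-over are excluded (hypothetical choice).
    """
    z = np.asarray(z, float)
    in_water = np.asarray(rho_w, float) > np.asarray(rho_chex, float)
    crossings = z[1:][in_water[1:] != in_water[:-1]]  # phase boundaries
    far = np.ones(z.shape, dtype=bool)
    for zb in crossings:
        far &= np.abs(z - zb) > margin
    rho_s = np.asarray(rho_solute, float)
    rho_W = rho_s[in_water & far].mean()
    rho_CHEX = rho_s[~in_water & far].mean()
    return R * T * np.log(rho_CHEX / rho_W)


def pearson_with_se(x, y, n_resamples=9999):
    """Pearson r and its bootstrap standard error (paired resampling)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = stats.pearsonr(x, y)[0]
    res = stats.bootstrap(
        (x, y),
        lambda a, b: stats.pearsonr(a, b)[0],
        paired=True,
        vectorized=False,
        n_resamples=n_resamples,
        method="percentile",
    )
    return r, res.standard_error
```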

hnRNPA1 LCD variant simulations

We ran MD simulations of a set of six variants of the hnRNPA1 LCD (-10R, -10R+10K, -12F+12Y, -6R+6K, +7F-7Y, +7K+12D) for which the R_g has previously been determined by SAXS experiments⁵⁴. The variants contain substitutions to and from charged and aromatic residues, but have the same sequence length as the wild-type protein, and were selected to have a relatively large deviation in R_g from the wild-type; protein sequences can be found in the supporting information of Bremer et al.⁵⁴. We ran MD simulations with unmodified Martini 3 and Martini 3 with ϵ in the Lennard-Jones potential between all protein beads rescaled by a factor λ_PP = 0.92 or λ_PP = 0.88. Simulations were set up using the same protocol as for the other IDPs described above. The systems were equilibrated for 10 ns with a 2 fs time step using the Berendsen thermostat and Berendsen barostat⁸⁵. Production simulations were run for 100 μs with a 20 fs time step using the Velocity-Rescaling thermostat⁷¹ and Parinello-Rahman barostat⁷². Simulations were run with 150 mM NaCl at 298 K and 1 bar.

Peripheral membrane protein simulations

We performed MD simulations of one soluble protein as a negative control, three folded peripheral membrane proteins, two multidomain proteins, and two IDRs with lipid bilayers of different compositions (Supplementary Table 9). We ran simulations with the Martini 3 force field⁶ and with two modified force fields. In the first, ϵ in the Lennard-Jones potential between all protein beads was rescaled by a factor λ_PP = 0.88; in the second, ϵ in the Lennard-Jones potential between all protein and water beads was rescaled by a factor λ_PW = 1.10.

Initial structures of proteins were obtained either from the RCSB database⁹¹ or from the AlphaFold protein structure database⁹². For Complexin CTM, we used ColabFold v1.5.2⁹³ to model the 16-residue disordered region (ATGAFETVKGFFPFGK). The N-terminal IDR of TRPV4 (residues 2–134) was taken from the full-length AlphaFold structure of TRPV4 (A0A1D5PXA5). The initial structure of the FERM domains in Talin (PDB: 3IVF) had missing residues (134–172), which we modelled using Modeller^76,77 via the Chimera interface⁹⁴. CG structures of proteins were generated using Martinize2, with the DSSP⁸⁴ flag to assign secondary structure. An elastic network was applied, consisting of harmonic potentials with a force constant of 700 kJ mol^-1 nm^-2 between all backbone beads within a cut-off of 0.8 nm. We removed elastic network potentials between different domains and in the linkers and IDRs of multidomain proteins. Secondary structure and elastic network restraints were not assigned to the two IDRs.

All lipid bilayers, with initial lateral dimensions of 20 nm × 20 nm, were generated using CHARMM-GUI Martini maker⁹⁵. The exception was systems that required phosphatidylinositol 4,5-bisphosphate (PIP2) lipids, which were instead generated using the Insane python script⁷⁰. We used the parameters for SAP2_45 lipids⁹⁶ to model PIP2 in the bilayer. The bilayers were then minimized and equilibrated following the 6-step equilibration protocol in CHARMM-GUI. To compute protein-membrane interactions, systems were generated as previously described²⁴, with a minimum distance of 3 nm between any protein and lipid bead. Systems were then solvated, and water was removed from the bilayer. In all cases, a 1.5 nm water layer was kept above the protein and below the bilayer. This gives an initial box size of 20 nm × 20 nm in the x and y directions, with a box length in the z direction that depends on the size of the protein. Systems were then neutralized, and excess NaCl was added as described (Supplementary Table 9). Systems were first energy minimized using the steepest descent algorithm, after which a short 200 ps MD run was performed with the protein backbone beads restrained. Production simulations (four replicas for each system) were run for 3 μs with a 20 fs time step using the velocity-rescale thermostat⁷¹ and Parrinello-Rahman barostat⁷².

We performed MD simulations of the two IDRs (Complexin CTM and TRPV4 IDR) in solution with unmodified Martini 3 and with both modified versions of Martini 3. For these simulations, we placed the CG structure in a cubic box using Gromacs editconf and solvated it with 150 mM NaCl. The system was then minimized for 10,000 steps with the steepest descent algorithm, and a short equilibration run was performed with a 2 fs time step using the Berendsen thermostat and Berendsen barostat⁸⁵. Production simulations were run for 10 μs with a 20 fs time step using the Parrinello-Rahman barostat⁷² and velocity-rescaling thermostat⁷¹. All simulations were performed with GROMACS 2021.5⁶⁹. The first 100 ns of each production run were discarded before analysis.

Simulations of transmembrane protein self-association

We performed MD simulations of the transmembrane domains of two protein dimers from the RTK family to calculate the free energy of association, ΔG. We used the Martini 3 force field⁶, either unmodified or with the well-depth, ϵ, in the Lennard-Jones potential rescaled by a factor λ_PW = 1.10 between all protein and water beads or by a factor λ_PP = 0.88 between all protein beads. Simulations were performed with Gromacs 2021.5. PDB 2K1L⁹⁷ was used as the starting structure for EphA1. As in Javanainen et al.⁹, we used CHARMM-GUI⁹⁵ to embed the EphA1 dimer in a bilayer of 400 DLPC lipids with 0.5 M NaCl, corresponding to the conditions in the reference experiment⁶². The system was equilibrated using the standard six-step protocol in CHARMM-GUI. For ErbB1, the starting structure of the system, based on PDB 2M0B⁹⁸, was taken from Souza et al.⁶. The system has 400 DLPC lipids and 0.15 M NaCl, corresponding to the conditions in the reference experiment⁶¹. The system was equilibrated for 50 ns in the NPT ensemble with position restraints of 1000 kJ/(mol nm²) on both chains of the dimer.

For both systems, pulling simulations were run for 100 ns at a rate of 0.05 nm/ns. The 2D COM distance between the protein subunits, r_COM, was used as the reaction coordinate. Frames ranging from 0.6 nm – 3.4 nm with a spacing of 0.2 nm were extracted from the pulling simulation trajectories as umbrella sampling windows, to be consistent with previous work⁶. A spring constant of 400 kJ/(mol nm²) was applied as the umbrella potential in production runs. The temperature was maintained at 303 K separately for peptides, lipids, and solvents and semi-isotropic pressure coupling was applied at 1 bar. The production run was performed for 10 μs in each window using a time-step of 20 fs and the Gromacs WHAM tool⁹⁹ was used to obtain the PMF. A correction term was added to the PMF obtained using WHAM to account for the entropic contribution before plotting the profiles.

$$PMF=PM{F}_{{{{\rm{WHAM}}}}}+RT\ln \left(r\right)$$

(6)

The error in the PMF plots is the standard deviation of 4 profiles calculated from 2 μs blocks, after discarding the first 2 μs. All PMFs plateaued before r_COM = 3.4 nm and were aligned to zero at this value. ΔG values were estimated from the minimum of the zero-aligned PMFs for comparison with experimental values.
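The post-processing of the WHAM output can be sketched as follows:

* apply the 2D entropic correction of eq. (6);
* align each profile to zero at r_COM = 3.4 nm;
* average the block profiles;
* take the minimum as ΔG.

Two choices here are assumptions: taking the minimum of the block-averaged profile, and using the sample standard deviation (ddof=1). The per-block WHAM profiles are assumed to have been computed already.

```python
import numpy as np

R = 8.314462618e-3  # gas constant, kJ/(mol K)

# Umbrella window centres used in the text: 0.6-3.4 nm in 0.2 nm steps
WINDOWS = np.round(np.arange(0.6, 3.4 + 1e-9, 0.2), 1)


def corrected_pmf(r, pmf_wham, T=303.0, r_ref=3.4):
    """Eq. (6) entropic correction, then shift so the PMF is zero at r_ref."""
    r = np.asarray(r, float)
    pmf = np.asarray(pmf_wham, float) + R * T * np.log(r)
    return pmf - pmf[np.argmin(np.abs(r - r_ref))]


def association_dg(r, block_pmfs, T=303.0):
    """Delta G (minimum of the block-averaged PMF), the mean PMF and its block SD."""
    aligned = np.array([corrected_pmf(r, p, T) for p in block_pmfs])
    mean = aligned.mean(axis=0)
    err = aligned.std(axis=0, ddof=1)  # assumed: sample SD over blocks
    return mean.min(), mean, err
```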

SAXS calculations

We extracted 20,000 evenly distributed frames from each back-mapped trajectory to calculate SAXS profiles using Pepsi-SAXS¹⁰⁰. To avoid overfitting the parameters for the contrast of the hydration layer (δρ) and the displaced solvent (r0) by fitting them individually for each structure, we used the fixed values for these parameters determined in Pesce and Lindorff-Larsen¹⁰¹. We globally fitted the scale and constant background with least-squares regression weighted by the experimental errors using Scikit-learn¹⁰². To assess the agreement between the experimental SAXS profiles and those calculated from simulations, we calculated the ${\chi }_{r}^{2}$ between the ensemble-averaged calculated SAXS intensities (I_calc) and the experimental SAXS intensity (I_exp):

$$\chi_r^2=\frac{1}{m}\sum_{q}\left(\frac{I_q^{\mathrm{calc}}-I_q^{\mathrm{exp}}}{\sigma_q^{\mathrm{exp}}}\right)^2 \tag{7}$$

where σ^exp is the error of the experimental SAXS intensity and m is the number of measured SAXS intensities. We used the Bayesian Indirect Fourier Transform algorithm (BIFT) to rescale the errors of the experimental SAXS intensities, in order to obtain a more consistent error estimate across the different proteins^103,104.
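A minimal sketch of the fitting and scoring step, assuming that per-frame profiles have already been computed on the experimental q-grid. The function names are hypothetical and NumPy's `lstsq` is used in place of Scikit-learn; dividing each row by σ is equivalent to weighted least squares with weights 1/σ², which is what `LinearRegression().fit(X, y, sample_weight=1/sigma**2)` in Scikit-learn would also minimize. Fitting one scale and background to the ensemble average (rather than per frame) is our reading of "globally fitted".

```python
import numpy as np


def fit_scale_background(i_calc, i_exp, sigma):
    """Weighted least squares for i_exp ~ scale * i_calc + background.

    Dividing each row by sigma is equivalent to using weights 1/sigma**2.
    """
    design = np.column_stack([i_calc, np.ones_like(i_calc)]) / sigma[:, None]
    target = i_exp / sigma
    (scale, background), *_ = np.linalg.lstsq(design, target, rcond=None)
    return scale, background


def saxs_chi2_reduced(i_calc, i_exp, sigma):
    """Eq. (7): mean squared error-normalised residual over the m q-points."""
    return np.mean(((i_calc - i_exp) / sigma) ** 2)


def ensemble_chi2(frame_profiles, i_exp, sigma):
    """Average per-frame profiles, fit scale/background once, return chi2_r."""
    i_avg = np.mean(frame_profiles, axis=0)  # ensemble-averaged I_calc(q)
    scale, background = fit_scale_background(i_avg, i_exp, sigma)
    return saxs_chi2_reduced(scale * i_avg + background, i_exp, sigma)
```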

PRE calculations

We used the DEER-PREdict software¹⁰⁵ to calculate intrachain PREs from the back-mapped trajectories of α-synuclein, FUS_LCD, hnRNPA2_LCD, OPN, and hTau40 (Supplementary Table 5), and interchain PREs from the back-mapped trajectories of two copies of FUS_LCD. DEER-PREdict uses a rotamer library approach to model the MTSL spin label¹⁰⁶ and a model-free formalism to calculate the spectral density¹⁰⁷. We assumed an effective correlation time of the spin label, τ_t, of 100 ps, a molecular correlation time, τ_c, of 4 ns¹⁰⁸, a transverse relaxation rate of 10 s⁻¹ for the diamagnetic protein, and a total INEPT time of 10 ms for the HSQC measurement¹⁰⁹. For the simulations of two copies of FUS_LCD, τ_c was not fixed at 4 ns; instead, we scanned τ_c from 1 to 20 ns in steps of 1 ns and, for each force field, selected the value that minimized the $\chi_r^2$ to the experimental PRE data. The optimal values were 1 ns, 8 ns, and 9 ns for unmodified Martini 3, λ_PW = 1.10, and λ_PP = 0.88, respectively. The agreement between calculated and experimental PREs was assessed by calculating the $\chi_r^2$ over all spin-label positions:

$$\chi_r^2=\frac{1}{N_{\mathrm{labels}}N_{\mathrm{res}}}\sum_{j=1}^{N_{\mathrm{labels}}}\sum_{i=1}^{N_{\mathrm{res}}}\left(\frac{Y_{ij}^{\mathrm{exp}}-Y_{ij}^{\mathrm{calc}}}{\sigma_{ij}^{\mathrm{exp}}}\right)^2 \tag{8}$$

where N_labels and N_res are the numbers of spin labels and residues, $Y_{ij}^{\mathrm{exp}}$ and $Y_{ij}^{\mathrm{calc}}$ are the experimental and calculated PRE rates for label j and residue i, and $\sigma_{ij}^{\mathrm{exp}}$ is the experimental error of the PRE rate for label j and residue i. For the simulations of two copies of FUS_LCD, the $\chi_r^2$ was averaged over the 10 replica simulations.
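The scoring in Eq. (8), the τ_c scan and the replica averaging can be sketched as follows. This is a hedged illustration rather than the authors' code: `calc_pre` is a hypothetical callable standing in for a DEER-PREdict run at a given τ_c (its real interface is not reproduced here), all arrays are assumed to have shape (N_labels, N_res), and missing residues are assumed to have been removed beforehand.

```python
import numpy as np


def pre_chi2_reduced(y_exp, y_calc, sigma_exp):
    """Eq. (8); all arrays have shape (n_labels, n_res)."""
    y_exp, y_calc, sigma_exp = (
        np.asarray(a, dtype=float) for a in (y_exp, y_calc, sigma_exp)
    )
    n_labels, n_res = y_exp.shape
    return np.sum(((y_exp - y_calc) / sigma_exp) ** 2) / (n_labels * n_res)


def scan_tau_c(calc_pre, y_exp, sigma_exp, tau_c_ns=range(1, 21)):
    """Return the tau_c (ns) minimising chi2_r, plus the whole scan.

    calc_pre is a hypothetical callable mapping tau_c (ns) to an
    (n_labels, n_res) array of calculated PRE rates.
    """
    scan = {tau: pre_chi2_reduced(y_exp, calc_pre(tau), sigma_exp)
            for tau in tau_c_ns}
    return min(scan, key=scan.get), scan


def replica_average_chi2(y_exp, y_calc_replicas, sigma_exp):
    """Average chi2_r over independent replicas (e.g. 10 FUS_LCD dimer runs)."""
    return float(np.mean([pre_chi2_reduced(y_exp, y, sigma_exp)
                          for y in y_calc_replicas]))
```

The default `range(1, 21)` reproduces the 1–20 ns grid in 1 ns steps described in the text.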

Radii of gyration

We calculated the R_g from CG simulation trajectories using Gromacs gyrate⁶⁹ and calculated the error of the average R_g using block-error analysis⁷³ (https://github.com/fpesceKU/BLOCKING). Experimental R_g-values and corresponding error bars were calculated from SAXS profiles by Guinier analysis using ATSAS AUTORG with default settings¹¹⁰, except in the case of the hnRNPA1_LCD variants, for which we used the R_g-values reported in Bremer et al.⁵⁴, which were determined from SAXS data using an empirical molecular form factor approach. Pearson correlation coefficients were calculated using the pearsonr function in SciPy stats and standard errors were determined with bootstrapping using the bootstrap function in SciPy stats with 9999 resamples⁹⁰.
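For readers without Gromacs at hand, the per-frame R_g and a Flyvbjerg–Petersen blocking estimate of the error of its mean can be sketched with NumPy alone. This is an illustration under assumptions, not the Gromacs gyrate implementation or the API of the BLOCKING repository: masses are optional (uniform weights if omitted), and the function returns the standard error at every blocking level, leaving the choice of plateau to the user.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """Per-frame R_g for coords of shape (n_frames, n_beads, 3)."""
    coords = np.asarray(coords, dtype=float)
    w = np.ones(coords.shape[1]) if masses is None else np.asarray(masses, float)
    com = np.einsum("j,fjk->fk", w, coords) / w.sum()       # (n_frames, 3)
    sq = np.sum((coords - com[:, None, :]) ** 2, axis=2)     # (n_frames, n_beads)
    return np.sqrt(sq @ w / w.sum())


def blocking_standard_errors(x):
    """Flyvbjerg-Petersen blocking: SE of the mean at each block-doubling level."""
    x = np.asarray(x, dtype=float)
    errors = []
    while len(x) >= 2:
        n = len(x)
        errors.append(np.sqrt(x.var() / (n - 1)))  # x.var() uses 1/n
        if n % 2:                                   # drop a trailing point if odd
            x = x[:-1]
        x = 0.5 * (x[0::2] + x[1::2])               # average neighbouring pairs
    return np.array(errors)
```

For correlated time series the returned errors grow with blocking level and level off once blocks are longer than the correlation time; that plateau value is the error estimate.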

Principal component analysis

We used PCA based on the pairwise distances between backbone beads to compare our unmodified, λ_PW = 1.10, λ_PP = 0.88, and λ_PW-BB = 1.22 Martini 3 simulations of IDPs and multidomain proteins. PCA was performed with PyEMMA¹¹¹. For each protein, all four ensembles were pooled for PCA so that they were projected onto the same two principal components. For all IDPs except PRN_NT and CoR_NID, and for the multidomain proteins Gal-3, MyBP-C_MTHB-C2, Ubq₂, and Ubq₃, the pairwise distances between all backbone beads were used as features for PCA. For the remaining proteins, the pairwise distances between every 5th backbone bead were used. To quantify the similarity between the ensembles, we calculated the Jensen-Shannon divergence between the probability distributions of the principal components produced with each pair of force fields using the jensenshannon function in SciPy⁹⁰. We calculated the Jensen-Shannon divergence for all pairwise combinations of force fields for each protein and then averaged these values across the set of proteins to quantify the overall similarity. For a given protein, we used the same set of bins to calculate the probability histogram of the principal components for all force field pairings.
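The pipeline above (distance features, PCA on pooled ensembles, histograms on shared bins, Jensen-Shannon comparison) can be sketched as follows. This is a hedged illustration: plain NumPy SVD stands in for PyEMMA, the joint 2D histogram over (PC1, PC2) and the bin count are our assumptions, and all function names are hypothetical. Note that `scipy.spatial.distance.jensenshannon` returns the Jensen-Shannon distance, i.e. the square root of the divergence (natural log by default), whereas the helper below returns the divergence itself.

```python
import numpy as np


def pairwise_distance_features(coords, stride=1):
    """Distances between every `stride`-th bead; coords (n_frames, n_beads, 3)."""
    sel = np.asarray(coords, dtype=float)[:, ::stride, :]
    i, j = np.triu_indices(sel.shape[1], k=1)
    return np.linalg.norm(sel[:, i, :] - sel[:, j, :], axis=2)


def pooled_pca(feature_sets, n_components=2):
    """Fit PCA on pooled ensembles and project each ensemble separately."""
    pooled = np.vstack(feature_sets)
    mean = pooled.mean(axis=0)
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    comps = vt[:n_components]
    return [(f - mean) @ comps.T for f in feature_sets]


def shared_edges(projections, n_bins=50):
    """Common bin edges for PC1 and PC2 across all ensembles of one protein."""
    pooled = np.vstack(projections)
    return [np.linspace(pooled[:, k].min(), pooled[:, k].max(), n_bins + 1)
            for k in (0, 1)]


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two histograms."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log(0) terms contribute nothing
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def ensemble_js(proj_a, proj_b, edges):
    """Histogram two (n, 2) projections on shared edges and compare them."""
    ha, _, _ = np.histogram2d(proj_a[:, 0], proj_a[:, 1], bins=edges)
    hb, _, _ = np.histogram2d(proj_b[:, 0], proj_b[:, 1], bins=edges)
    return js_divergence(ha.ravel(), hb.ravel())
```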

Comparison with atomistic ensembles

We compared our unmodified Martini 3 and λ_PP = 0.88 Martini 3 simulations of α-synuclein with a 20 μs simulation with the Amber03ws force field and a 73 μs simulation with the Amber99SB-disp force field from Robustelli et al.⁵⁸. Because interactions between periodic images of the protein caused problems in the originally published Amber99SB-disp simulation, we used the corrected version of the simulation also used in Ahmed et al.⁵⁹. We also compared our unmodified Martini 3 and λ_PP = 0.88 Martini 3 simulations of hnRNPA1 with an atomistic ensemble from Ritsch et al.⁶⁰ (Protein Ensemble Database PED00212). We used the dimensionality reduction ensemble similarity (DRES) approach in Encore^56,57, as implemented in MDAnalysis¹¹², to quantify ensemble similarity based on Cα RMSD. We used the all-atom back-mapped versions of our Martini simulations (described above). For hnRNPA1, we used every 10th frame from our Martini simulations, for a total of 4001 frames per simulation, and all structures from the atomistic ensemble. Because different constructs of hnRNPA1 were used in our simulations and in the experimentally restrained ensemble⁶⁰, the Encore DRES calculations were performed only for residues 2–258, which are identical in both constructs^17,60. For α-synuclein, we used every 10th frame from each simulation, for a total of 4001 frames per Martini simulation, 2998 frames from the Amber03ws simulation, and 2998 frames from the Amber99SB-disp simulation.
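DRES starts from a matrix of pairwise Cα RMSDs between all frames of the pooled ensembles, which it then embeds in a low-dimensional space before comparing the ensembles' densities. The sketch below shows only that first ingredient, using the Kabsch superposition; it is not the Encore implementation (which computes its own RMSD matrix and performs the embedding and density comparison), and the function names are hypothetical. Frames are assumed to be (n_atoms, 3) arrays of Cα coordinates restricted to the common residue range, for example after taking every 10th frame of a trajectory.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """Cα RMSD between two (n_atoms, 3) arrays after optimal superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # Flip the smallest singular value if the optimal orthogonal transform
    # would be a reflection rather than a proper rotation.
    sign = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    msd = (np.sum(a ** 2) + np.sum(b ** 2)
           - 2.0 * (s[0] + s[1] + sign * s[2])) / len(a)
    return float(np.sqrt(max(msd, 0.0)))


def rmsd_matrix(frames):
    """Symmetric pairwise RMSD matrix for a list of (n_atoms, 3) frames."""
    n = len(frames)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = kabsch_rmsd(frames[i], frames[j])
    return out
```

The double loop scales quadratically with the number of frames, which is one practical reason for subsampling trajectories to a few thousand frames before ensemble comparison.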

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data generated for this paper are available via https://github.com/KULL-Centre/_2023_Thomasen_Martini and a backup of this repository is available via https://doi.org/10.5281/zenodo.11545138. Simulation data for protein membrane simulations are available via https://zenodo.org/record/8154919. Simulation data for TM helices potential of mean force are available via https://zenodo.org/records/10949496. Simulation data for IDP self-association simulations, single-chain simulations of OPN and htau40, and side chain analogue simulations are available via https://doi.org/10.17894/ucph.80cbd22e-bb35-46ac-b14c-95ea92899608. Simulation data for all other simulations are available via https://zenodo.org/record/8010043. Force field files for Martini 3 with interactions between protein beads rescaled by λ_PP = 0.88 are available at https://github.com/KULL-Centre/_2023_Thomasen_Martini/tree/main/force_field. PDB accession codes for protein structures used in this paper are available in Supplementary Table 11. Source data are provided with this paper.

Code availability

Code and scripts used for this paper are available via https://github.com/KULL-Centre/_2023_Thomasen_Martini and a backup of this repository is available via https://doi.org/10.5281/zenodo.11545138.

References

Thomasen, F. E. & Lindorff-Larsen, K. Conformational ensembles of intrinsically disordered proteins and flexible multidomain proteins. Biochem. Soc. Trans. 50, 541–554 (2022).
Bottaro, S. & Lindorff-Larsen, K. Biophysical experiments and biomolecular simulations: a perfect match? Science 361, 355–360 (2018).
Ingólfsson, H. I. et al. The power of coarse graining in biomolecular simulations. Wiley Interdiscip. Rev. Comput. Mol. Sci. 4, 225–248 (2014).
Marrink, S. J., Risselada, H. J., Yefimov, S., Tieleman, D. P. & de Vries, A. H. The MARTINI force field: coarse grained model for biomolecular simulations. J. Phys. Chem. B 111, 7812–7824 (2007).
Monticelli, L. et al. The MARTINI coarse-grained force field: extension to proteins. J. Chem. Theory Comput. 4, 819–834 (2008).
Souza, P. C. T. et al. Martini 3: a general purpose force field for coarse-grained molecular dynamics. Nat. Methods 18, 382–388 (2021).
Thomasen, F. E., Pesce, F., Roesgaard, M. A., Tesei, G. & Lindorff-Larsen, K. Improving Martini 3 for disordered and multidomain proteins. J. Chem. Theory Comput. 18, 2033–2041 (2022).
Stark, A. C., Andrews, C. T. & Elcock, A. H. Toward optimized potential functions for protein-protein interactions in aqueous solutions: osmotic second virial coefficient calculations using the MARTINI coarse-grained force field. J. Chem. Theory Comput. 9, 10.1021/ct400008p (2013).
Javanainen, M., Martinez-Seara, H. & Vattulainen, I. Excessive aggregation of membrane proteins in the Martini model. PLoS ONE 12, e0187936 (2017).
Berg, A., Kukharenko, O., Scheffner, M. & Peter, C. Towards a molecular basis of ubiquitin signaling: a dual-scale simulation study of ubiquitin dimers. PLoS Comput. Biol. 14, 1–14 (2018).
Berg, A. & Peter, C. Simulating and analysing configurational landscapes of protein-protein contact formation. Interface Focus 9, 20180062 (2019).
Alessandri, R. et al. Pitfalls of the Martini model. J. Chem. Theory Comput. 15, 5448–5460 (2019).
Larsen, A. H. et al. Combining molecular dynamics simulations with small-angle X-ray and neutron scattering data to study multi-domain proteins in solution. PLoS Comput. Biol. 16, 1–29 (2020).
Benayad, Z., von Bülow, S., Stelzl, L. S. & Hummer, G. Simulation of FUS protein condensates with an adapted coarse-grained model. J. Chem. Theory Comput. 17, 525–537 (2021).
Majumder, A. & Straub, J. E. Addressing the excessive aggregation of membrane proteins in the MARTINI model. J. Chem. Theory Comput. 17, 2513–2521 (2021).
Lamprakis, C. et al. Evaluating the efficiency of the Martini force field to study protein dimerization in aqueous and membrane environments. J. Chem. Theory Comput. 17, 3088–3102 (2021).
Martin, E. W. et al. Interplay of folded domains and the disordered low-complexity domain in mediating hnRNPA1 phase separation. Nucleic Acids Res. 49, 2931–2945 (2021).
Best, R. B., Zheng, W. & Mittal, J. Balanced protein-water interactions improve properties of disordered proteins and non-specific protein association. J. Chem. Theory Comput. 10, 5113–5124 (2014).
Kjaergaard, M. & Kragelund, B. B. Functions of intrinsic disorder in transmembrane proteins. Cell. Mol. Life Sci. 74, 3205–3224 (2017).
Zeno, W. F. et al. Synergy between intrinsically disordered domains and structured proteins amplifies membrane curvature sensing. Nat. Commun. 9, 4152 (2018).
Das, T. & Eliezer, D. Membrane interactions of intrinsically disordered proteins: the example of alpha-synuclein. Biochim. Biophys. Acta Proteins Proteom. 1867, 879–889 (2019).
Fakhree, M. A. A., Blum, C. & Claessens, M. M. A. E. Shaping membranes with disordered proteins. Arch. Biochem. Biophys. 677, 108163 (2019).
Cornish, J., Chamberlain, S. G., Owen, D. & Mott, H. R. Intrinsically disordered proteins and membranes: a marriage of convenience for cell signalling? Biochem. Soc. Trans. 48, 2669–2689 (2020).
Srinivasan, S., Zoni, V. & Vanni, S. Estimating the accuracy of the MARTINI model towards the investigation of peripheral protein-membrane interactions. Faraday Discuss. 232, 131–148 (2021).
Srinivasan, S. et al. Conformational dynamics of lipid transfer domains provide a general framework to decode their functional mechanism. bioRxiv https://doi.org/10.1101/2023.04.11.536463 (2023).
Sonntag, M. et al. Segmental, domain-selective perdeuteration and small-angle neutron scattering for structural analysis of multi-domain proteins. Angew. Chem. Int. Ed. Engl. 56, 9322–9325 (2017).
Michie, K. A., Kwan, A. H., Tung, C.-S., Guss, J. M. & Trewhella, J. A highly conserved yet flexible linker is part of a polymorphic protein-binding domain in myosin-binding protein C. Structure 24, 2000–2007 (2016).
Nadvi, N. A., Michie, K. A., Kwan, A. H., Guss, J. M. & Trewhella, J. Clinically linked mutations in the central domains of cardiac myosin-binding protein C with distinct phenotypes show differential structural effects. Structure 24, 105–115 (2016).
Jussupow, A. et al. The dynamics of linear polyubiquitin. Sci. Adv. 6, eabc3786 (2020).
Moses, D. et al. Structural biases in disordered proteins are prevalent in the cell. Nat. Struct. Mol. Biol. 31, 283–292 (2024).
Lin, Y.-H. et al. The intrinsically disordered N-terminal domain of galectin-3 dynamically mediates multisite self-association of the protein through fuzzy interactions. J. Biol. Chem. 292, 17845–17856 (2017).
Ryan, V. H. et al. Mechanistic view of hnRNPA2 low-complexity domain structure, interactions, and phase separation altered by mutation and arginine methylation. Mol. Cell 69, 465–479.e7 (2018).
Monahan, Z. et al. Phosphorylation of the FUS low-complexity domain disrupts phase separation, aggregation, and toxicity. EMBO J. 36, 2951–2967 (2017).
Dedmon, M. M., Lindorff-Larsen, K., Christodoulou, J., Vendruscolo, M. & Dobson, C. M. Mapping long-range interactions in α-synuclein using spin-label NMR and ensemble molecular dynamics simulations. J. Am. Chem. Soc. 127, 476–477 (2005).
Mukrasch, M. D. et al. Structural polymorphism of 441-residue tau at single residue resolution. PLoS Biol. 7, e1000034 (2009).
Platzer, G. et al. The metastasis-associated extracellular matrix protein osteopontin forms transient structure in ligand interaction sites. Biochemistry 50, 6113–6124 (2011).
Liu, Z. et al. Noncovalent dimerization of ubiquitin. Angew. Chem. Int. Ed. Engl. 51, 469–472 (2012).
Brewer, S. H. et al. Effect of modulating unfolded state structure on the folding kinetics of the villin headpiece subdomain. Proc. Natl Acad. Sci. USA 102, 16662–16667 (2005).
De Biasio, A. et al. p15PAF is an intrinsically disordered protein with nonrandom structural preferences at sites of interaction with other proteins. Biophys. J. 106, 865–874 (2014).
Radzicka, A. & Wolfenden, R. Comparing the polarities of the amino acids: side-chain distribution coefficients between the vapor phase, cyclohexane, 1-octanol, and neutral aqueous solution. Biochemistry 27, 1664–1670 (1988).
de Jong, D. H., Periole, X. & Marrink, S. J. Dimerization of amino acid side chains: lessons from the comparison of different force fields. J. Chem. Theory Comput. 8, 1003–1014 (2012).
Springs, B. & Haake, P. Equilibrium constants for association of guanidinium and ammonium ions with oxyanions: the effect of changing basicity of the oxyanion. Bioorg. Chem. 6, 181–190 (1977).
Bremer, A. et al. Deciphering how naturally occurring sequence features impact the phase behaviors of disordered prion-like domains. bioRxiv https://doi.org/10.1101/2021.01.01.425046 (2021).
Tesei, G., Schulze, T. K., Crehuet, R. & Lindorff-Larsen, K. Accurate model of liquid–liquid phase behavior of intrinsically disordered proteins from optimization of single-chain properties. Proc. Natl Acad. Sci. USA 118, e2111696118 (2021).
Christian, S. D. & Tucker, E. E. Importance of heat capacity effects in the association of hydrocarbon moieties in aqueous solution. J. Solution Chem. 11, 749–754 (1982).
Yamamoto, E., Kalli, A. C., Akimoto, T., Yasuoka, K. & Sansom, M. S. P. Anomalous dynamics of a lipid recognition protein on a membrane surface. Sci. Rep. 5, 18245 (2015).
Naughton, F. B., Kalli, A. C. & Sansom, M. S. P. Association of peripheral membrane proteins with membranes: free energy of binding of GRP1 PH domain with phosphatidylinositol phosphate-containing model bilayers. J. Phys. Chem. Lett. 7, 1219–1224 (2016).
Marrink, S. J. & Tieleman, D. P. Perspective on the Martini model. Chem. Soc. Rev. 42, 6801–6822 (2013).
Herzog, F. A., Braun, L., Schoen, I. & Vogel, V. Improved side chain dynamics in Martini simulations of protein–lipid interfaces. J. Chem. Theory Comput. 12, 2446–2458 (2016).
Howard, S. B., Twigg, P. J., Baird, J. K. & Meehan, E. J. The solubility of hen egg-white lysozyme. J. Cryst. Growth 90, 94–104 (1988).
Buhr, J., Franz, F. & Gräter, F. Intrinsically disordered region of talin’s FERM domain functions as an initial PIP2 recognition site. Biophys. J. 122, 1277–1286 (2023).
Goretzki, B. et al. Crosstalk between regulatory elements in disordered TRPV4 N-terminus modulates lipid-dependent channel activity. Nat. Commun. 14, 4165 (2023).
Snead, D., Wragg, R. T., Dittman, J. S. & Eliezer, D. Membrane curvature sensing by the C-terminal domain of complexin. Nat. Commun. 5, 4955 (2014).
Bremer, A. et al. Deciphering how naturally occurring sequence features impact the phase behaviours of disordered prion-like domains. Nat. Chem. 14, 196–207 (2022).
Riback, J. A. et al. Innovative scattering analysis shows that hydrophobic disordered proteins are expanded in water. Science 358, 238–241 (2017).
Lindorff-Larsen, K. & Ferkinghoff-Borg, J. Similarity measures for protein ensembles. PLoS ONE 4, 1–13 (2009).
Tiberti, M., Papaleo, E., Bengtsen, T., Boomsma, W. & Lindorff-Larsen, K. ENCORE: software for quantitative ensemble comparison. PLoS Comput. Biol. 11, e1004415 (2015).
Robustelli, P., Piana, S. & Shaw, D. E. Developing a molecular dynamics force field for both folded and disordered protein states. Proc. Natl Acad. Sci. USA 115, E4758–E4766 (2018).
Ahmed, M. C. et al. Refinement of α-synuclein ensembles against SAXS data: comparison of force fields and methods. Front. Mol. Biosci. 8, 1–13 (2021).
Ritsch, I. et al. Phase separation of heterogeneous nuclear ribonucleoprotein A1 upon specific RNA-binding observed by magnetic resonance. Angew. Chem. Int. Ed. Engl. 61, e202204311 (2022).
Chen, L., Merzlyakov, M., Cohen, T., Shai, Y. & Hristova, K. Energetics of ErbB1 transmembrane domain dimerization in lipid bilayers. Biophys. J. 96, 4622–4630 (2009).
Artemenko, E. O., Egorova, N. S., Arseniev, A. S. & Feofanov, A. V. Transmembrane domain of EphA1 receptor forms dimers in membrane-like environment. Biochim. Biophys. Acta Biomembr. 1778, 2361–2367 (2008).
Jussupow, A. & Kaila, V. R. I. Effective molecular dynamics from neural network-based structure prediction models. J. Chem. Theory Comput. 19, 1965–1975 (2023).
Go, N. Theoretical studies of protein folding. Annu. Rev. Biophys. Bioeng. 12, 183–210 (1983).
Poma, A. B., Cieplak, M. & Theodorakis, P. E. Combining the MARTINI and structure-based coarse-grained approaches for the molecular dynamics studies of conformational transitions in proteins. J. Chem. Theory Comput. 13, 1366–1374 (2017).
Zerze, G. H. Optimizing the Martini 3 force field reveals the effects of the intricate balance between protein–water interaction strength and salt concentration on biomolecular condensate formation. J. Chem. Theory Comput. 4, 1646–1655 (2023).
Claveras Cabezudo, A., Athanasiou, C., Tsengenes, A. & Wade, R. C. Scaling protein–water interactions in the Martini 3 coarse-grained force field to simulate transmembrane helix dimers in different lipid environments. J. Chem. Theory Comput. 19, 2109–2119 (2023).
Yamada, T. et al. Improved protein model in SPICA force field. J. Chem. Theory Comput. 19, 8967–8977 (2023).
Abraham, M. J. et al. GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1–2, 19–25 (2015).
Wassenaar, T. A. et al. Computational lipidomics with insane: A versatile tool for generating custom membranes for molecular simulations. J. Chem. Theory Comput. 11, 2144–2155 (2015).
Bussi, G., Donadio, D. & Parrinello, M. Canonical sampling through velocity rescaling. J. Chem. Phys. 126, 1–7 (2007).
Parrinello, M. & Rahman, A. Polymorphic transitions in single crystals: a new molecular dynamics method. J. Appl. Phys. 52, 7182–7190 (1981).
Flyvbjerg, H. & Petersen, H. G. Error estimates on averages of correlated data. J. Chem. Phys. 91, 461–466 (1989).
Wassenaar, T. A., Pluhackova, K., Böckmann, R. A., Marrink, S. J. & Tieleman, D. P. Going backward: a flexible geometric approach to reverse transformation from coarse grained to atomistic models. J. Chem. Theory Comput. 10, 676–690 (2014).
Komander, D. et al. Molecular discrimination of structurally equivalent Lys 63-linked and linear polyubiquitin chains. EMBO Rep. 10, 466–473 (2009).
Šali, A. & Blundell, T. L. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815 (1993).
Webb, B. & Sali, A. Comparative protein structure modeling using MODELLER. Curr. Protoc. Bioinformatics 54, 5–6 (2016).
Collins, P. M., Hidari, K. I. P. J. & Blanchard, H. Slow diffusion of lactose out of galectin-3 crystals monitored by X-ray crystallography: possible implications for ligand-exchange protocols. Acta Crystallogr. Section D 63, 415–419 (2007).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).
Idowu, S. M., Gautel, M., Perkins, S. J. & Pfuhl, M. Structure, stability and dynamics of the central domain of cardiac myosin binding protein C (MyBP-C): implications for multidomain assembly and causes for cardiomyopathy. J. Mol. Biol. 329, 745–761 (2003).
von Stetten, D., Noirclerc-Savoye, M., Goedhart, J., Gadella, T. W. J. Jr & Royant, A. Structure of a fluorescent protein from Aequorea victoria bearing the obligate-monomer mutation A206K. Acta Crystallogr. Section F 68, 878–882 (2012).
Clavel, D. et al. Structural analysis of the bright monomeric yellow-green fluorescent protein mNeonGreen obtained by directed evolution. Acta Crystallogr. Section D 72, 1298–1307 (2016).
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., DiNola, A. & Haak, J. R. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684–3690 (1984).
Vijay-Kumar, S., Bugg, C. E. & Cook, W. J. Structure of ubiquitin refined at 1.8 Å resolution. J. Mol. Biol. 194, 531–544 (1987).
McKnight, C. J., Matsudaira, P. T. & Kim, P. S. NMR structure of the 35-residue villin headpiece subdomain. Nat. Struct. Biol. 4, 180–184 (1997).
Lindorff-Larsen, K., Best, R. B., DePristo, M. A., Dobson, C. M. & Vendruscolo, M. Simultaneous determination of protein structure and dynamics. Nature 433, 128–132 (2005).
Souza, P. C. T. et al. Protein–ligand binding with the coarse-grained Martini model. Nat. Commun. 11, 3714 (2020).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Rose, P. W. et al. The RCSB Protein Data Bank: new resources for research and education. Nucleic Acids Res. 41, D475–D482 (2012).
Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Pettersen, E. F. et al. UCSF Chimera: a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
Qi, Y. et al. CHARMM-GUI Martini Maker for coarse-grained simulations with the Martini force field. J. Chem. Theory Comput. 11, 4486–4494 (2015).
Borges-Araújo, L., Souza, P. C. T., Fernandes, F. & Melo, M. N. Improved parameterization of phosphatidylinositide lipid headgroups for the Martini 3 coarse-grain force field. J. Chem. Theory Comput. 18, 357–373 (2021).
Bocharov, E. V. et al. Spatial structure and pH-dependent conformational diversity of dimeric transmembrane domain of the receptor tyrosine kinase EphA1. J. Biol. Chem. 283, 29385–29395 (2008).
Bocharov, E. V. et al. Alternative packing of EGFR transmembrane domain suggests that protein–lipid interactions underlie signal conduction across membrane. Biochim. Biophys. Acta Biomembr. 1858, 1254–1261 (2016).
Hub, J. S., de Groot, B. L. & van der Spoel, D. A free weighted histogram analysis implementation including robust error and autocorrelation estimates. J. Chem. Theory Comput. 6, 3713–3720 (2010).
Grudinin, S., Garkavenko, M. & Kazennov, A. Pepsi-SAXS: An adaptive method for rapid and accurate computation of small-angle X-ray scattering profiles. Acta Crystallogr. Section D Struct. Biol. 73, 449–464 (2017).
Pesce, F. & Lindorff-Larsen, K. Refining conformational ensembles of flexible proteins against small-angle X-ray scattering data. Biophys. J. 120, 5124–5135 (2021).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Hansen, S. Bayesian estimation of hyperparameters for indirect Fourier transformation in small-angle scattering. J. Appl. Crystallogr. 33, 1415–1421 (2000).
Larsen, A. H. & Pedersen, M. C. Experimental noise in small-angle scattering can be assessed using the Bayesian indirect Fourier transformation. J. Appl. Crystallogr. 54, 1281–1289 (2021).
Tesei, G. et al. DEER-PREdict: software for efficient calculation of spin-labeling EPR and NMR data from conformational ensembles. PLoS Comput. Biol. 17, e1008551 (2021).
Polyhach, Y., Bordignon, E. & Jeschke, G. Rotamer libraries of spin labelled cysteines for protein studies. Phys. Chem. Chem. Phys. 13, 2356–2366 (2011).
Iwahara, J., Schwieters, C. D. & Clore, G. M. Ensemble approach for NMR structure refinement against ¹H paramagnetic relaxation enhancement data arising from a flexible paramagnetic group attached to a macromolecule. J. Am. Chem. Soc. 126, 5879–5896 (2004).
Gillespie, J. R. & Shortle, D. Characterization of long-range structure in the denatured state of staphylococcal nuclease. J. Mol. Biol. 268, 170–184 (1997).
Battiste, J. L. & Wagner, G. Utilization of site-directed spin labeling and high-resolution heteronuclear nuclear magnetic resonance for global fold determination of large proteins with limited nuclear overhauser effect data. Biochemistry 39, 5355–5365 (2000).
Petoukhov, M. V., Konarev, P. V., Kikhney, A. G. & Svergun, D. I. ATSAS 2.1 towards automated and web-supported small-angle scattering data analysis. J. Appl. Crystallogr. 40, s223–s228 (2007).
Scherer, M. K. et al. PyEMMA 2: A software package for estimation, validation, and analysis of Markov models. J. Chem. Theory Comput. 11, 5525–5542 (2015).
Michaud-Agrawal, N., Denning, E. J., Woolf, T. B. & Beckstein, O. MDAnalysis: a toolkit for the analysis of molecular dynamics simulations. J. Comput. Chem. 32, 2319–2327 (2011).


Acknowledgements

We acknowledge the use of computational resources from Computerome 2.0, the ROBUST Resource for Biomolecular Simulations (supported by the Novo Nordisk Foundation grant no. NF18OC0032608), and the core facility for biocomputing at the Department of Biology. This research was supported by the Lundbeck Foundation BRAINSTRUC initiative (R155-2015-2666 to K.L.-L.) and the PRISM (Protein Interactions and Stability in Medicine and Genomics) centre funded by the Novo Nordisk Foundation (NNF18OC0033950, to K.L.-L.). SV and AK acknowledge support by the Swiss National Science Foundation through the National Center of Competence in Research Bio-Inspired Materials. This work was supported by grants from the Swiss National Supercomputing Centre (CSCS) under project ID s1176 and s1251. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 803952 to SV).

Author information

These authors contributed equally: F. Emil Thomasen, Tórur Skaalum.

Authors and Affiliations

Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200, Copenhagen N, Denmark
F. Emil Thomasen, Tórur Skaalum & Kresten Lindorff-Larsen
Department of Biology, University of Fribourg, Fribourg, Switzerland
Ashutosh Kumar, Sriraksha Srinivasan & Stefano Vanni
Swiss National Center for Competence in Research (NCCR) Bio-inspired Materials, University of Fribourg, Chemin des Verdiers 4, CH-1700, Fribourg, Switzerland
Ashutosh Kumar & Stefano Vanni


Contributions

F.E.T., S.V. and K.L.-L. conceived the overall study. F.E.T. and T.S. performed and analysed simulations of proteins in water under the supervision of K.L.-L., and A.K. and S.S. performed and analysed simulations of proteins interacting with membranes under the supervision of S.V. F.E.T. wrote the first draft of the manuscript with input from K.L.-L. All authors contributed to the writing of the manuscript.

Corresponding authors

Correspondence to F. Emil Thomasen, Stefano Vanni or Kresten Lindorff-Larsen.

Ethics declarations

Competing interests

K.L.-L. holds stock options in and is a consultant for Peptone Ltd. All other authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Gül Zerze and the other, anonymous, reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Thomasen, F.E., Skaalum, T., Kumar, A. et al. Rescaling protein-protein interactions improves Martini 3 for flexible proteins in solution. Nat Commun 15, 6645 (2024). https://doi.org/10.1038/s41467-024-50647-9

Received: 29 May 2023
Accepted: 15 July 2024
Published: 05 August 2024
DOI: https://doi.org/10.1038/s41467-024-50647-9