Introduction

Intrinsically disordered proteins (IDPs), folded proteins with long disordered tails, and multidomain proteins with folded domains connected by flexible linkers, are characterized by their high level of conformational dynamics. Molecular dynamics (MD) simulations provide a valuable tool for studying IDPs and multidomain proteins, as they can be used to determine full conformational ensembles at atomic resolution1. However, there are two central challenges that must be overcome for MD simulations to provide a useful description of such systems: the force field describing all the bonded and non-bonded interactions between atoms in the system must be sufficiently accurate and the conformational space of the protein must be sufficiently sampled2.

One way to address the challenge of sufficient sampling is to use coarse-grained (CG) MD simulations in which groups of atoms are represented as single beads3. Martini is a widely used CG model in which 2–4 non-hydrogen atoms are represented by a single bead4,5. An attractive aspect of Martini is its modular structure and high degree of transferability, which allows the simulation of complex systems containing several different classes of biomolecules. The current version of Martini, Martini 3, shows improvements over previous versions in areas such as molecular packing, transmembrane helix interactions, protein aggregation, and DNA base pairing6.

We have previously shown that Martini 3 simulations of IDPs produce overly compact conformational ensembles, resulting in poor agreement with small-angle X-ray scattering (SAXS) and paramagnetic relaxation enhancement (PRE) experiments7. Using an approach inspired by previous work on assessing and rebalancing non-bonded interactions in Martini8,9,10,11,12,13,14,15,16,17 and atomistic force fields18, we found that agreement with SAXS and PRE data could be significantly improved by uniformly increasing the strength of non-bonded Lennard-Jones interactions between protein and water beads by ~ 10%7. This was also shown to be the case for three multidomain proteins, hnRNPA1, hisSUMO-hnRNPA1, and TIA1; however, due to the small sample size and the similarity between these three proteins, it remains an open question whether the approach generalizes to other multidomain proteins.

Our previous work was concerned with the properties of proteins in aqueous solution in the absence of other classes of biomolecules. Intuitively, increasing the strength of protein-water interactions should affect the affinity between proteins and other biomolecules. As a prototypical example, one would expect that increasing protein-water interactions would decrease the affinity of proteins for lipid membranes, since the interaction is tuned by the relative affinity of proteins for water versus the membrane environment. The extent to which our previously described force field modification affects protein-membrane interactions, however, remains unclear. There is increasing evidence that IDPs and intrinsically disordered regions (IDRs) play important physiological roles at lipid membranes19,20,21,22,23, and so it is important to better understand how force field changes that improve the description of IDPs in solution affect their interactions with membranes. In this context, it is important to note that unmodified Martini 3 has been quite successful at reproducing the specific membrane interactions for peripheral membrane proteins, as we have previously shown24,25.

For previous versions of Martini, problems with overestimated protein-protein interactions have been corrected either by increasing the strength of interactions between protein and water beads10,11,13,17 or by decreasing the strength of interactions between protein beads8,9,14. We hypothesize that for proteins in solution, the two force field corrections likely have similar effects, simply rebalancing the relative energies associated with hydration versus self-interaction. However, in the case of mixed systems, for example with proteins, water, and membranes, we might expect clearer differences between these approaches. For example, decreasing the strength of protein-protein interactions may better retain the affinity between proteins and other molecules as originally parameterized, while increased protein-water interactions may lower this affinity (Fig. 1). It remains an open question whether specificity in protein-membrane interactions is retained when protein-water interactions are increased, and whether rescaling protein-protein interactions provides equivalent or improved agreement with experimental observations, both in comparison with unmodified Martini 3 and Martini 3 with rescaled protein-water interactions. We note that Martini 3 has already been shown to provide good agreement with free energies of dimerization for transmembrane proteins6, so the major focus of this work is to rebalance the interactions of proteins in solution to improve the agreement with experiments.

Fig. 1: Expected effects of proposed force field modifications.
figure 1

Schematic overview showing the expected effects of rescaling protein-water and protein-protein interactions in Martini 3. Overestimated compactness of soluble IDPs and multidomain proteins and specific membrane interactions for peripheral membrane proteins have previously been reported7,24.

Here, we expand upon our previous work to address these questions. First, we have expanded the set of multidomain proteins to include 15 proteins for which SAXS data have previously been collected (Fig. 2). Using this five-times larger set of proteins, we show that, as was the case for IDPs, increasing the strength of protein-water interactions by 10% improves the agreement with SAXS data. We further show that decreasing the strength of non-bonded interactions between protein beads by 12% leads to a comparable improvement in agreement with SAXS and PRE data for IDPs and multidomain proteins in solution, while better preserving the specificity of protein-membrane interactions for peripheral membrane proteins. In contrast, we find that rescaling protein-protein interactions decreases the propensity of transmembrane helices to dimerize, whereas this propensity is mostly unchanged when rescaling protein-water interactions.

Fig. 2: Starting structures for simulations of multidomain proteins.
figure 2

Starting structures of multidomain proteins used for Martini simulations. See the Methods section for a description of the source of the structures and how they were assembled.

Results

Analysis of an expanded set of multidomain proteins

Previously, we tested Martini 3 using a set of three multidomain proteins, TIA1, hnRNPA1, and hisSUMO-hnRNPA1, for which SAXS data have been measured17,26. Given the similarity of the three proteins (all three are RNA-binding proteins, and two of them differ only by a hisSUMO-tag), we wished to expand the set of proteins with mixed regions of order and disorder to include a wider range of sizes and domain architectures. We searched the literature for such proteins with reported SAXS data and identified 12 proteins that we added to our set (Fig. 2): the tri-helix bundle of the m-domain and the C2 domain of myosin-binding protein C (MyBP-CMTHB-C2)27; the C5, C6, and C7 domains of myosin-binding protein C (MyBP-CC5-C6-C7)28; linear di-, tri-, and tetraubiquitin (Ubq2, Ubq3, Ubq4)29; the two fluorescent proteins mTurquoise2 and mNeonGreen connected by a linker region with the insertion of 0, 8, 16, 24, 32, or 48 GS repeats (mTurq-GSX-mNeon)30; and Galectin-3 (Gal-3)31. Apart from Gal-3, these proteins all contain at least two distinct folded domains, connected by linkers of different lengths and composition; three proteins (Gal-3, hnRNPA1, and hisSUMO-hnRNPA1) also contain an IDR attached to a folded domain. Collectively, we will refer to this set as multidomain proteins, though we note that Gal-3 only contains a single folded domain.

We have previously shown that Martini 3 produces conformational ensembles that are more compact than found experimentally for a set of 12 IDPs and for the three multidomain proteins TIA1, hnRNPA1, and hisSUMO-hnRNPA1, and that rescaling ϵ in the Lennard-Jones potential between all protein and water beads by a factor λPW = 1.10 resulted in more expanded ensembles that substantially improved the agreement with SAXS data7. Using our much larger set of multidomain proteins, we examined whether Martini 3 generally produces too compact conformational ensembles of multidomain proteins, and whether our modified force field with rescaled protein-water interactions would generalize to the expanded set of proteins. We ran Martini 3 simulations of the 12 new multidomain proteins with unmodified Martini 3 and with λPW = 1.10 and calculated SAXS intensities from the simulations. We found that, on average across the 15 proteins, increasing the strength of protein-water interactions by λPW = 1.10 substantially improved the direct agreement with the experimental SAXS data, as quantified by the reduced χ2, \({\chi }_{r}^{2}\) (Fig. 3). For only one of the 15 proteins, MyBP-CMTHB-C2, the modified force field gave rise to reduced agreement with the SAXS data. This result shows that our previously proposed modification of protein-water interactions in Martini 3, which was optimized to improve the global dimensions of IDPs, also provides a general improvement in the global dimensions of multidomain proteins.

Fig. 3: Agreement between simulations and SAXS data for multidomain proteins.
figure 3

Mean radii of gyration (Rg) calculated from simulations plotted against Rg determined from Guinier fits to SAXS data for (a) simulations with unmodified Martini 3, (b) simulations with protein-water interactions in Martini 3 rescaled by λPW = 1.10, and (c) simulations with protein-protein interactions in Martini 3 rescaled by λPP = 0.88. Error bars on the experimental Rg-values were determined using Atsas AUTORG and represent a fitting error from the Guinier fit plus a standard deviation over possible selections of the Guinier region. The diagonal is shown as a dashed line and a linear fit with intercept 0 weighted by experimental errors is shown as a solid line. Pearson correlation coefficients (rP) with standard error of the mean from bootstrapping with 9999 resamples are shown on the plots. d Reduced χ2 to experimental SAXS intensities given by SAXS intensities calculated from unmodified Martini 3 simulations (blue) and Martini 3 simulations with protein-water interactions rescaled by λPW = 1.10 (orange) or protein-protein interactions rescaled by λPP = 0.88 (green). Mean and standard error of the mean over all 15 proteins are shown on the plot. Note the logarithmic scale for \({\chi }_{r}^{2}\). e Representative conformation of TIA1 with an Rg corresponding to the average Rg in (blue) simulations with unmodified Martini 3, (orange) simulations with protein-water interactions in Martini 3 rescaled by λPW = 1.10, and (green) simulations with protein-protein interactions in Martini 3 rescaled by λPP = 0.88. Simulations of hnRNPA1, hisSUMO-hnRNPA1 and TIA1 with λPW = 1.10 were taken from Thomasen et al.7. Source data are provided as a Source Data file.

Rescaling protein-protein interactions

Inspired by previous work on earlier versions of the Martini force field8,9,14, we next examined whether rescaling protein-protein interactions instead of protein-water interactions would provide a similar or further improvement in the agreement with the experimental data. To do so, we ran Martini 3 simulations for the set of 12 IDPs with SAXS data available that we had studied previously7 and the new set of 15 multidomain proteins. In these simulations, we rescaled ϵ in the Lennard-Jones potential between all protein beads by a factor λPP. We scanned different values of this parameter, and found λPP = 0.88 to provide the best agreement with experiments (Supplementary Fig. 1). We found that this level of rescaling protein-protein interactions (λPP = 0.88) provides a comparable improvement in the agreement with the experimental data as rescaling protein-water interactions by λPW = 1.10 for both multidomain proteins (Fig. 3) and IDPs (Fig. 4).

Fig. 4: Agreement between simulations and SAXS or PRE data for IDPs.
figure 4

Mean radii of gyration (Rg) calculated from simulations plotted against Rg determined from Guinier fits to SAXS data for (a) simulations with unmodified Martini 3, (b) simulations with protein-water interactions in Martini 3 rescaled by λPW = 1.10, and (c) simulations with protein-protein interactions in Martini 3 rescaled by λPP = 0.88. Error bars on the experimental Rg-values were determined using Atsas AUTORG and represent a fitting error from the Guinier fit plus a standard deviation over possible selections of the Guinier region. The diagonal is shown as a dashed line and a linear fit with intercept 0 weighted by experimental errors is shown as a solid line. Pearson correlation coefficients (rP) with standard error of the mean from bootstrapping with 9999 resamples are shown on the plots. d Reduced χ2 to experimental SAXS intensities given by SAXS intensities calculated from unmodified Martini 3 simulations (blue) and Martini 3 simulations with protein-water interactions rescaled by λPW = 1.10 (orange) or protein-protein interactions rescaled by λPP = 0.88 (green). Mean and standard error of the mean over all 12 proteins are shown on the plot. Note the logarithmic scale for \({\chi }_{r}^{2}\). e Reduced χ2 to experimental PRE NMR data given by unmodified Martini 3 simulations (blue) and Martini 3 simulations with protein-water interactions rescaled by λPW = 1.10 (orange) or protein-protein interactions rescaled by λPP = 0.88 (green). Note the logarithmic scale for \({\chi }_{r}^{2}\). f Representative conformation of TauK25 with an Rg corresponding to the average Rg in (blue) simulations with unmodified Martini 3, (orange) simulations with protein-water interactions in Martini 3 rescaled by λPW = 1.10, and (green) simulations with protein-protein interactions in Martini 3 rescaled by λPP = 0.88. All simulations with λPW=1.10 were taken from Thomasen et al.7. Source data are provided as a Source Data file.

To further test the effect of rescaling protein-protein interactions by λPP = 0.88 and compare with the approach of rescaling protein-water interactions, we ran simulations of five IDPs with intramolecular PRE data available: the LCD of hnRNPA232, the LCD of FUS33, α-synuclein34, full-length tau (hTau40)35, and osteopontin (OPN)36, and calculated PRE data from the simulations (Fig. 4e). Again, λPP = 0.88 provided a comparable level of agreement with the PRE data as we previously found using λPW = 1.107. Specifically, the agreement with the PRE data improved for all proteins except the hnRNPA2 LCD (Fig. 4e).

To further characterize the symmetry between rescaling protein-water and protein-protein interactions, we compared the ensembles produced with the two rescaling approaches, and with unmodified Martini 3, using the distribution of Rg (Supplementary Fig. 2-3) and a principal component analysis (PCA) based on the pairwise distances between backbone beads (Supplementary Fig. 4-5 and Supplementary Table 1-2). This analysis confirmed that the two rescaling approaches produce ensembles which are highly similar to each other in comparison with unmodified Martini 3. We conclude that decreasing the strength of protein-protein interactions by λPP = 0.88 provides an equally good alternative to rescaling protein-water interactions for IDPs and multidomain proteins in solution.

Protein self-association in solution

The observation that multidomain proteins in solution are too compact in Martini 3 simulations suggests that interactions between folded protein domains may be overestimated, at least at the high effective concentration within a single chain. To explore this further, we examined the effect of rescaling protein-protein interactions on the interactions between folded proteins in trans. To this aim, we ran MD simulations of two protein systems that should undergo transient homodimerization, ubiquitin and villin HP36, which we also used in our previous work7. Ubiquitin self-associates with a Kd of 4.9 ± 0.3 mM based on NMR chemical shift perturbations37 and villin HP36 self-associates with a Kd > 1.5 mM based on NMR diffusion measurements38. We ran MD simulations of two copies of the proteins with λPP = 0.88 and calculated the fraction of the time that the proteins were bound (Fig. 5a). For both proteins λPP = 0.88 resulted in decreased self-association, and again we found that λPP = 0.88 gave comparable results to our previously published simulations with λPW = 1.107. Comparing the simulations with the expected fraction bound based on the experimentally determined Kd values, we found that ubiquitin self-association is likely slightly overestimated with unmodified Martini 3 and slightly underestimated with λPW = 1.10 and λPP = 0.88. For villin HP36, all three force fields gave rise to a fraction bound within the expected range. While the overestimated compaction of multidomain proteins suggest that interactions between folded domains may be too strong in Martini 3, our results on the self-association of ubiquitin and villin HP36 do not provide a clear indication that this is the case.

Fig. 5: Protein self-association in solution.
figure 5

a Fraction bound calculated from MD simulations of two copies of the folded proteins ubiquitin and villin HP36, and two copies of the IDPs α-synuclein, p15PAF, hTau40, and FUS LCD, with unmodified Martini 3 (blue), Martini 3 with protein-water interactions rescaled by λPW = 1.10 (orange) (taken from Thomasen et al.7), and Martini 3 with protein-protein interactions rescaled by λPP = 0.88 (green). Box plots show the results of 10 replica simulations. The bound fraction in agreement with Kd = 4.9 mM for ubiquitin self-association is shown as a dashed line37. The bound fraction in agreement with Kd > 1.5 mM for villin HP36 self-association is shown as a shaded gray area38. α-synuclein, p15PAF, and hTau40 should not self-associate under the given conditions based on PRE34,35 or SEC-MALLS39 data, while FUS LCD should transiently self-associate based on PRE data33. Boxplots show the first quartile, median, and third quartile; whiskers extend from the box to the farthest data point lying within 1.5 times the inter-quartile range from the box, and points outside the whiskers are shown individually. b Interchain PREs calculated from the simulations of two copies of FUS LCD from panel a and comparison with experimental PREs (black)33. PREs are shown for the three spin-label sites at residues 16, 86, and 142 marked with dashed black lines. The rotational correlation time, τc, was selected individually for each λ to minimize \({\chi }_{r}^{2}\). Error bars represent the standard error of the mean over 10 replica simulations. c \({\chi }_{r}^{2}\) between calculated and experimental PRE data for two copies of FUS LCD shown in panel (b). Source data are provided as a Source Data file.

To further investigate the effect of rescaling protein-protein interactions on protein self-association, we performed simulations of four IDP systems, which we also used in our previous work7. Specifically, we ran simulations with λPP = 0.88 of two copies of α-synuclein, hTau40, or p15PAF, which should not self-associate under the given conditions based on PRE34,35 or size-exclusion chromatography-multiangle laser-light scattering (SEC-MALLS) data39, as well as two copies of the FUS LCD, which should transiently interact under the given conditions based on PRE data33. We then calculated the fraction of time that the proteins were bound in the simulations (Fig. 5a). Again λPP = 0.88 gave comparable results to our previously published simulations with λPW = 1.10. The results show that unmodified Martini 3 overestimates the self-association of IDPs, and that both rescaling approaches result in lowered self-association and therefore better agreement with experiments. However, none of the force fields give rise to a clear distinction between the FUS LCD and the three IDPs which should not self-associate, suggesting that Martini 3 does not properly capture specificity in IDP-IDP interactions.

To investigate further how well specific interactions between copies of the FUS LCD were captured, we calculated intermolecular PRE data from our simulations for direct comparison with the experimental PRE data33 (Fig. 5b-c). Simulations with λPP = 0.88 and λPW = 1.10 produce similar PREs when calculated with the spin-labels at residue 16 and residue 142, but we observed some discrepancy between the two force fields for PREs calculated with the spin-label at residue 86. These differences could be due to a true difference between the force fields, but may also be due to lack of convergence on the protein-protein contacts, as the bound state is not very populated in the simulations. Both λPP = 0.88 and λPW = 1.10 show slight improvement over unmodified Martini 3 based on the \({\chi }_{r}^{2}\) to the experimental PRE data, but none of the force fields fully capture the variation in interactions across the sequence. For example, interactions with the N-terminal region seem to be underestimated with the rescaled force fields based on the PRE data with the spin-label at residue 16, while the interactions with the central region seem to be overestimated with the unmodified force field based on the PRE data with the spin-label at residue 86. The interpretation of the results is complicated by the fact that the rotational correlation time, τc, providing the best fit to the experimental data is lower for the unmodified force field (1 ns), than for λPW = 1.10 (8 ns) and λPP = 0.88 (9 ns), suggesting that the fit of τc is absorbing some of the true difference between the force fields. Overall, the comparison with intermolecular PRE data for the FUS LCD is consistent with an improvement in the overall strength of IDP-IDP interactions, but a remaining lack of interaction specificity with the rescaled force fields. The results also show that rescaling protein-protein interactions gives as good or better agreement with the intermolecular PRE data when compared with our previous approach of rescaling protein-water interactions.

Rescaling protein-water interactions for backbone beads only

While the overall agreement with SAXS experiments was improved for almost all proteins when rescaling protein-protein or protein-water interactions, some proteins were still too expanded or compact with respect to the experimental Rg, suggesting that some sequence-specific effects on compaction were not fully captured. We reasoned that sequence-specific effects on the ensemble properties would possibly be better captured if we rescaled only the interactions between the protein backbone and water; this approach could lead to the desired expansion of the proteins while retaining the interactions of the amino acid side chains as originally parameterized. We therefore performed simulations of our set of IDPs and multidomain proteins in which we rescaled ϵ in the Lennard-Jones potential between all protein backbone and water beads by a factor λPW-BB, scanning different values of this parameter, and found λPW-BB = 1.22 to provide the best agreement with experiments (Supplementary Fig. 1). However, the simulations of the IDPs and multidomain proteins with λPW-BB = 1.22 showed similar agreement with experiments as when rescaling all protein-water interactions or protein-protein interactions (Supplementary Fig. 67).

We also compared the ensembles produced with λPW-BB = 1.22 to the other force fields using Rg distributions (Supplementary Fig. 2-3) and analyses of the pairwise distances between backbone beads (Supplementary Fig. 4-5 and Supplementary Table 1-2). The results show that ensembles produced with λPW-BB = 1.22, λPW = 1.10, and λPP = 0.88 have a comparable level of similarity to unmodified Martini 3. The ensembles produced with λPW = 1.10 and λPP = 0.88 are slightly more similar to each other than to λPW-BB = 1.22, possibly due to the rebalancing of backbone versus side chain interactions in the λPW-BB = 1.22 force field. Given that rescaling of only protein backbone-water interactions did not show any substantial improvement in the agreement with the experimental data or higher similarity to unmodified Martini 3 with respect to the previous approaches, and that the strong interactions between the protein backbone and water may have undesirable effects on the behaviour of the hydration shell, we decided not to pursue this approach further.

Amino acid side chain analogues

We wished to further investigate the symmetry between rescaling protein-water and protein-protein interactions using simulations of oil/water partitioning, as this was a central approach in the original parameterization of non-bonded interactions in Martini. Inspired by the initial parameterization of Martini proteins, we performed simulations of the cyclohexane/water partitioning of amino acid side chain analogues5. As rescaling protein-protein interactions should not substantially affect the interactions of amino acids with water or cyclohexane, we ran a single set of simulations to represent both unmodified Martini 3 and λPP = 0.88, as well as a set of simulations with protein-water interactions rescaled by λPW = 1.10. We calculated the transfer free energy from cyclohexane to water, ΔGCHEX-W, from our simulations and compared them with experimentally determined ΔGCHEX-W-values (Supplementary Fig. 8-9)5,40. The results show that rescaling protein-water interactions by λPW = 1.10 slightly increases partitioning to the water phase, as would be expected, but the effect is small when compared with the overall discrepancy between simulation and experiment. The two rescaling approaches also provide comparable Pearson correlations with the experimental ΔGCHEX-W-values (rPearson = 0.92 ± 0.04 and rPearson = 0.94 ± 0.03 for λPP = 0.88 and λPW = 1.10 respectively). We conclude that the results from the oil/water partitioning simulations do not clearly favour one rescaling approach over the other. However, the results illustrate that changes in the non-bonded interactions which have a very modest effect on small molecule partitioning may have a much larger effect on protein-protein interactions and the ensemble properties of flexible proteins, highlighting the importance of a direct comparison with experiments that report on protein structure.

Simulations of the dimerization of side chain analogues have previously been used to shed light on similarities and differences across force fields41. We therefore also performed simulations of the self-association of Phe-Phe, Tyr-Phe, Tyr-Tyr, Lys-Asp, and Arg-Asp side chain analogues. Here λPP = 0.88 and λPW = 1.10 both result in a small decrease in self-association when compared with unmodified Martini 3 (Supplementary Fig. 10). The two rescaling approaches also give comparable free energy profiles along the center-of-mass (COM) distance, despite the fact that λPP = 0.88 results in a rebalancing of the Coulomb and Lennard-Jones potentials in the Lys-Asp and Arg-Asp interactions. Comparing with experimentally measured affinities shows that Martini 3 correctly ranks Arg-Asp interactions as stronger than Lys-Asp, and this behaviour is preserved with both λPP = 0.88 and λPW = 1.1042. The ranking of Tyr-Tyr, Tyr-Phe, and Phe-Phe is consistent with previous analyses of Martini41 and show Phe-Phe to be the strongest in all three versions of Martini 3. In contrast, previous analyses of experimental data on IDPs suggest that Tyr-Tyr interactions should be stronger than Tyr-Phe and Phe-Phe43,44, and measurements of vapour pressure show that benzene-phenol interactions are stronger than benzene-benzene interactions45. These results suggest that a rebalancing of aromatic-aromatic interactions in Martini 3 may be necessary to better capture sequence-specific effects in IDPs and multidomain proteins. Additionally, the self-association of Phe-Phe, Tyr-Phe, Lys-Asp, and Arg-Asp side chain analogues is slightly overestimated with all three versions of the force field when compared with the experimentally measured affinities.

Protein-membrane interactions

In the simulations described above, we found that the effects of increasing protein-water interactions or decreasing protein-protein interactions were very similar. We, however, hypothesized that these two force field modifications could have substantially different effects on systems in which proteins interact with other classes of molecules that are not protein or water. We expected that increased protein-water interactions would result in lower affinity for other molecules, which bind in competition with solvation, while decreased protein-protein interactions would not affect the affinity to the same extent, barring any effects of altering the conformational ensemble.

To examine the effect of rescaling the Lennard-Jones interaction parameters on the affinity of proteins for different biomolecules, we chose to investigate protein interactions with lipid membranes. We had two main motivations for this choice: first, protein-membrane interactions have been thoroughly characterized using Martini24,46,47; second, Martini has been particularly focused on lipid membranes and protein-membrane interactions since its early development days9,48,49.

We therefore performed simulations of peripheral proteins in the presence of lipid bilayers, using both unmodified Martini 3 and the two modified versions, λPP = 0.88 and λPW = 1.10, following a protocol we have previously described24. In short, we ran unbiased MD simulations starting with the protein at a minimum distance of 3 nm away from the bilayer. Over the course of the MD simulation, the proteins interact, often transiently and reversibly, with the membrane (Supplementary Fig. 11-12), and membrane binding was quantified as previously described24 based on defining bound states when the minimum distance was lower than or equal to 0.7 nm.

To characterize the effect of our rescaling protocol on a broad set of protein-membrane interactions, we selected a diverse set of proteins: (i) one negative control, hen egg-white lysozyme, which is highly soluble in water and is not expected to interact specifically with the membrane in the absence of negatively charged phospholipids50; (ii) three peripheral membrane proteins consisting of a single folded domain (Phospholipase2, Arf1 in its GTP-bound state, and the C2 domain of Lactadherin) for which we previously characterized the membrane-binding behaviour24; (iii) two membrane-binding multidomain proteins: PTEN (1–351), containing a N-terminal Phosphatase domain and C2 domain that are known to be sufficient for membrane binding, and the Talin FERM domain, which has multiple sub-domains (F0 to F3) and binds to membranes through specific phosphoinositol(4,5)phosphate (PIP2) binding sites present in its F2 and F3 subdomains51; (iv) two IDRs that have been characterized as membrane-binding regions: the N-terminal IDR of TRPV452 and a short C-terminal motif (CTM) of Complexin53. For the two IDRs, simulations in solution with both λPP = 0.88 and λPW = 1.10 result in expanded ensembles and a larger average value of Rg compared to unmodified Martini 3 (Supplementary Fig. 13).

As hypothesized, the different force field modifications have different effects on protein-membrane interactions (Fig. 6). In particular, we found that simulations with decreased protein-protein interactions (λPP = 0.88) provide a similar degree of protein-membrane interaction when compared with unmodified Martini-3. In contrast, simulations with an increased strength of protein-water interactions (λPW = 1.10) show significantly reduced membrane affinity and binding for all proteins, almost always leading to a complete lack of interactions between the protein and the lipid bilayer. Importantly, we observe a clear difference between lysozyme (as a non-interacting negative control) and all other proteins (as membrane binding) in our simulations with unmodified Martini 3 and λPP = 0.88, while this is not the case in our simulations with λPW = 1.10. Given that λPW = 1.10 and λPP = 0.88 provide a comparably good description of IDPs and multidomain proteins in solution, and that λPP = 0.88 more accurately retains the specificity and strength of protein-membrane interactions as originally parameterized in Martini 3, we suggest that λPP = 0.88 is overall a more robust and transferable modification to Martini 3.

Fig. 6: Protein-membrane interactions.
figure 6

MD simulations (four replicas, each 3 μs long) were performed for peripheral membrane proteins, multidomain proteins, and intrinsically disordered regions with appropriate membrane composition (see Methods for details). Simulations were performed with unmodified Martini 3 (blue), protein-water interactions in Martini 3 rescaled by λPW = 1.10 (orange), and protein-protein interactions in Martini 3 rescaled by λPP = 0.88 (green). For each system, the corresponding atomistic structure of the protein is shown on top. Boxplots show the first quartile, median, and third quartile; whiskers extend from the box to the farthest data point lying within 1.5 times the inter-quartile range from the box, and points outside the whiskers are shown individually. Source data are provided as a Source Data file.

Capturing effects of sequence changes

Having selected λPP = 0.88 as the preferred force field modification for proteins in solution, we next examined to what extent this force field could capture more subtle sequence effects in IDPs and multidomain proteins.

The λPP = 0.88 force field provides the same Pearson correlations between experimental and simulation Rg as unmodified Martini 3, initially suggesting that there is no improvement in capturing relative protein-specific differences in Rg (Figs. 3-4). To test this for a series of similar proteins with systematic differences in sequence and structure, we first selected the mTurq-GSX-mNeon proteins, for which the Rg should increase systematically with linker length. We calculated the Pearson correlation between simulation and experimental Rg-values for these proteins with the different force fields (Fig. 7a), and found that the simulations with unmodified Martini 3 only provide a small separation of the Rg-values as a function of linker length, and therefore give a Pearson correlation coefficient with a high degree of uncertainty based on bootstrapping (rPearson = 0.6 ± 0.6), while the simulations with rescaled interactions allow for a clearer separation of Rg as a function of linker length (rPearson = 0.9 ± 0.1 for both λPW = 1.10 and λPP= 0.88). This result suggests that rescaling protein-water or protein-protein interactions allows for a higher sensitivity of ensemble properties to subtle changes in protein sequence and structure, such as differences in interdomain linker length.

Fig. 7: Radii of gyration of mTurq-mNeon and hnRNPA1LCD variants.
figure 7

a Mean radii of gyration (Rg) calculated from simulations with unmodified Martini 3 (blue), Martini 3 with protein-water interactions rescaled by λPW = 1.10 (orange), and Martini 3 with protein-protein interactions rescaled by λPP = 0.88 (green) are plotted against Rg determined by SAXS for mTurquoise2 and mNeonGreen connected by a linker region with the insertion of 0, 8, 16, 24, 32, or 48 GS repeats30. Error bars on the experimental Rg-values were determined using Atsas AUTORG and represent a fitting error from the Guinier fit plus a standard deviation over possible selections of the Guinier region. b Rg calculated from simulations with unmodified Martini 3 (blue) and Martini 3 with protein-protein interactions rescaled by λPP = 0.92 (pink) or λPP = 0.88 (green) are plotted against Rg determined by SAXS for wild-type hnRNPA1LCD and six sequence variants with varied composition of charged and aromatic residues54. Error bars on the experimental Rg-values represent the fitting error of the molecular form factor model54,55. We show a zoom-in for each of the force fields along with the given Pearson correlation coefficient with standard error of the mean from bootstrapping with 9999 resamples. Source data are provided as a Source Data file.

To further investigate the ability of Martini 3 with rescaled protein-protein interactions to capture more subtle sequence effects in IDPs, we performed simulations of six variants of the LCD of hnRNPA1, which have varied composition of charged and aromatic residues while retaining the length of the wild-type sequence54, using unmodified Martini 3 and Martini 3 with λPP = 0.88. We also performed simulations with λPP = 0.92, as this provided the optimal agreement with SAXS data for wild-type hnRNPA1 LCD. We compared the Rg calculated from the simulations with Rg values measured by SAXS for the six variants and wild-type. As expected based on the results presented above, we found that unmodified Martini 3 substantially underestimates the Rg of all variants (Fig. 7b). While modifying protein-protein interactions by λPP = 0.88 gives the best results on average across all proteins we studied, it leads to a slight overestimation of the Rg for the wild-type and variants of the LCD from hnRNPA1 (Fig. 7b). If we instead select λPP = 0.92 as the value of λPP that gives the best result for the wild-type hnRNPA1 LCD (among the values that we examined) we—per construction—find a more accurate level of expansion across the variants. Equally important, we found that unmodified Martini 3 does not accurately capture the variation in Rg associated with the sequence variation (rPearson = –0.1 ± 0.5), while simulations with λPP = 0.92 and λPP = 0.88 result in a more accurate estimate of the effect of the sequence variation on the Rg values (rPearson = 0.7 ± 0.3 and rPearson = 0.9  ± 0.2 respectively). This result suggests that decreasing the strength of protein-protein interactions in Martini 3 improves the sensitivity of IDP ensemble properties to sequence variation.

Comparison with high-resolution ensembles

Next, we aimed to test the effect of our proposed force field modification by comparing our Martini 3 simulations with ensembles produced by higher resolution models. First, we compared our simulations of α-synuclein with extensive atomistic MD simulations produced with state-of-the-art force fields. We used an ensemble similarity metric based on dimensionality reduction of the pairwise RMSD between ensemble conformers56,57 to quantitatively compare our unmodified and λPP = 0.88 Martini 3 simulations with atomistic simulations performed with the Amber03ws and Amber99SB-disp force fields58. We note that both Amber03ws and Amber99SB-disp produce ensembles of α-synuclein which are slightly too compact when compared with Rg from SAXS59, while the ensemble from our Martini 3 simulation with λPP = 0.88 is more expanded, in excellent agreement with SAXS. In spite of this discrepancy, the ensemble comparison shows that rescaling protein-protein interactions in Martini 3 by λPP = 0.88 increases the similarity to the atomistic simulations with both Amber03ws and Amber99SB-disp (Table 1). Interestingly, the λPP = 0.88 Martini 3 simulation is more similar to both atomistic simulations than the atomistic simulations are to each other, suggesting that the agreement is within the expected variation between force fields.

Table 1 Jensen-Shannon divergence based on Cα RMSD between α-synuclein ensembles from Martini simulations and atomistic simulations

Next we wished to perform a similar test for a multidomain protein. We used the same approach to quantify the similarity between our unmodified and λPP = 0.88 Martini 3 simulations of hnRNPA1 with an ensemble that was generated based on data from double electron-electron resonance (DEER) electron paramagnetic resonance, PRE, and SAXS experiments60. Again, the comparison shows that the Martini 3 simulation with λPP = 0.88 is more similar to the experimentally derived atomistic ensemble (Table 2). The results from these two test cases suggest that our proposed force field modification of λPP = 0.88 also improves the agreement with higher resolution simulations and experimentally derived ensemble models.

Table 2 Jensen-Shannon divergence based on Cα RMSD between hnRNPA1 ensembles from Martini simulations and an integrative atomistic model

Protein self-association in the membrane

To test the effect of rescaling protein-protein and protein-water interactions on protein behaviour in a lipid membrane environment, we performed simulations of the homo-dimerization of the transmembrane domain of both EphA1 and ErbB1 from the receptor tyrosine kinase (RTK) domain family, which were used as test systems for Martini 36. RTKs are a well-studied protein class for protein-protein interactions in a membrane environment and, for both proteins, experimental free energies of association have been determined by Förster resonance energy transfer (FRET)61,62. Our results show that unmodified Martini 3 and Martini 3 with λPW = 1.10 produce comparable potentials of mean force (PMFs) (Fig. 8), resulting in overestimated ΔG of association by ~4 kJ/mol for EphA1 and reasonable agreement with the experimental ΔG for ErbB1, consistent with the results from Souza et al.6. Rescaling protein-protein interactions by λPP = 0.88 results in a complete loss of self-association as the PMF profiles becomes repulsive for both proteins (Fig. 8). These results suggest that, while unmodified Martini 3 and Martini 3 with λPW = 1.10 may slightly overestimate protein-protein interactions in the membrane environment, λPP = 0.88 results in a substantial underestimation of protein-protein interactions in the membrane, and is likely not a suitable force field modification for studying oligomerization of transmembrane protein systems.

Fig. 8: Transmembrane protein self-association.
figure 8

a, b Snapshots of simulation systems (left) and corresponding potential of mean force profiles for dimerization (right) for transmembrane domains of EphA1 and ErbB1. The two copies of the protein are shown in green and orange. Lipid headgroups are shown in gray and lipid tails are shown in yellow. Potential of mean force profiles for the transmembrane domains of EphA1 and ErbB1 were calculated from simulations with unmodified Martini 3 (blue), Martini 3 with protein-water interactions rescaled by λPW = 1.10 (orange), and Martini 3 with protein-protein interactions rescaled by λPP = 0.88 (green). Profiles were aligned to zero at the plateau region at r = 3.4 nm, indicated by the dashed black line. The dashed red line corresponds to the experimental values of association free energy (ΔG) from FRET experiments for EphA1 and ErbB1. The errors represents the standard deviation of four profiles calculated from 2 μs blocks. Source data are provided as a Source Data file.

Discussion

We have previously shown that simulations with Martini 3 underestimate the global dimensions of IDPs, and that increasing the strength of protein-water interactions by 10% results in more expanded ensembles and substantially improves the agreement with SAXS data7. Here, we expanded this approach to a set of 15 multidomain proteins for which SAXS data have been recorded. Our results show that Martini 3 on average provides too compact ensembles of these multidomain proteins, and that, as was the case for IDPs, rescaling protein-water interactions by 10% substantially improves the agreement with SAXS data. We also show that decreasing the strength of interactions between protein beads by 12% results in the same expansion of the ensembles and improved agreement with experiments. We also tested the effect of increasing the strength of interactions between only the protein backbone beads and water, but did not find that this provides any further improvement in the agreement with the experimental data. While the different rescaling approaches provide essentially the same results for proteins in solution, we show that rescaling protein-protein interactions is the preferable option in order to best retain the specificity and strength of protein-membrane interactions as originally parameterized in Martini 3. We note, however, that this change to the force field leads to decreased dimerization of proteins within a membrane environment, and a significant underestimation of free energies of dimerization. Therefore, we suggest that decreasing the strength of protein-protein interactions by 12% is suitable for systems with flexible proteins in solution and in proximity to membranes, but likely not for systems with specific protein-protein interactions in the membrane. An important outcome of our work is also the curation of a set of multidomain proteins with available SAXS data and starting structures for simulations, which can be used for future research in force field assessment and development.

One of the challenges when running Martini 3 simulations of multidomain proteins is selecting which regions to keep folded with the elastic network model and which regions to leave unrestrained. In this work, we manually selected the folded domains in the structures using domain annotations and intuition. It is, however, difficult to know a priori whether distinct domains should act as single structural modules due to specific interactions or move freely with respect to one another. Recently, it has been proposed to use the pairwise alignment error output from AlphaFold2 predictions to assign automatically the elastic network restraints63. In future work, this may provide a more accurate distinction between domains that should be relatively rigid or dynamic with respect to each other. Additionally, replacing the elastic network model with a more flexible structure-based model64 may provide the ability to sample both the bound and unbound state in cases where folded domains have specific interactions65. In stronger and more specific interdomain interactions, the resolution of Martini 3 may also play a more important role. For example, water-mediated hydrogen-bonding networks would not be captured with the 4–1 mapping of water beads. As most of the proteins presented in this work likely do not have very specific interactions between domains, the lack of structured water is presumably not an issue.

Although the simple approach of decreasing the strength of protein-protein interactions uniformly by 12% shows an improvement over unmodified Martini 3 in reproducing the global dimensions of IDPs and multidomain proteins, we note that the agreement with the SAXS data is still not perfect (\({\chi }_{r}^{2}\) > 1 in most cases), and there are systematic outliers with respect to the experimental Rg values. Although some of the system-specific deviations could potentially be alleviated by e.g. more accurately assigning and modeling the restraints on the folded domains, the overall deviation from the experimental data suggests that a more fundamental rebalancing of non-bonded interactions, and perhaps also CG mapping scheme, is necessary to describe the behaviour of IDPs and multidomain proteins within the Martini framework. Again, we suggest that the data we have collected here will be useful to test any such changes, and the results obtained with λPP = 0.88 are a useful point of reference for other force field modifications. The increased sensitivity to sequence perturbations observed for the hnRNPA1 LCD sequence variants and the series of mTurq-GSX-mNeon proteins also suggests that λPP = 0.88 could provide a good starting point for rebalancing protein interactions at the amino acid or bead level to improve the specificity in weaker protein-protein interactions.

For other types of systems, it has been suggested that the non-bonded interactions in Martini 3 must be rescaled to a different extent to reach agreement with experimental observations. For example, modifying protein-water interactions in Martini 3 affects the propensity of the disordered LCD of FUS to form condensates in a way that appears to depend on the salt concentration66, while the insertion of transmembrane helices into the phospholipid bilayer may require decreased protein-water interactions67. Additionally, unmodified Martini 3 has been shown to provide accurate free energies of dimerization for transmembrane proteins6. Our results show that this behaviour is preserved when rescaling protein-water interactions, whereas decreasing the strength of protein-protein interactions is likely not suitable for systems with specific protein self-association in the membrane. In light of these results, it seems that uniformly rescaling non-bonded interactions may not be able to provide a universally transferable protein model within the Martini framework, and that a more detailed rebalancing of interactions or CG mapping scheme is necessary. Future work could, for example, examine the combined effects of more modest rescaling of protein-protein and protein-water interactions, or focus on secondary-structure dependent force field parameters as recently proposed for another CG force field68.

Overall, however, our results demonstrate that for soluble proteins decreasing the non-bonded interactions between all protein beads by 12% leads to a more accurate balance of interactions while retaining the specificity of protein-membrane interactions. We foresee that our protocol will be a useful starting point to investigate the interactions of IDPs with lipid membranes using chemically transferable MD simulations, and that these investigations will further provide insights into possible strategies on future force field development efforts. Since CG simulations also play an important role in integrative structural biology1, we also expect that these developments will enable an even tighter link between simulations and experiments to study large and complex biomolecular assemblies.

Methods

IDP simulations

We performed MD simulations of a set of 12 IDPs with SAXS data available (Supplementary Table 4) and five IDPs with intramolecular PRE data available (Supplementary Table 5)7,44 using Gromacs 2020.369. We ran simulations with the Martini 3 force field6 with the well-depth, ϵ, in the Lennard-Jones potential between all protein beads rescaled by a factor λPP or with ϵ in the Lennard-Jones potential between all protein backbone and water beads rescaled by a factor λPW-BB. We generated CG structures using Martinize2 based on initial all-atom structures corresponding to the 95th percentile of the Rg-distributions from simulations in Tesei et al.44. Secondary structure and elastic network restraints were not assigned for IDPs. Structures were placed in a dodecahedral box using Gromacs editconf and solvated, with NaCl concentrations corresponding to the ionic strength used in SAXS or PRE experiments, using the Insane python script70. The systems were equilibrated for 10 ns with a 2 fs time step using the Velocity-Rescaling thermostat71 and Parinello-Rahman barostat72. Production simulations were run for 40 μs with a 20 fs time step using the Velocity-Rescaling thermostat71 and Parinello-Rahman barostat72. The simulation temperature was set to match the SAXS or PRE experiment, and the pressure was set to 1 bar. Non-bonded interactions were treated with the Verlet cut-off scheme. A cut-off of 1.1 nm was used for van der Waals interactions. A dielectric constant of 15 and cut-off of 1.1 nm were used for Coulomb interactions. Simulation frames were saved every 1 ns. Molecule breaks from crossing the periodic boundaries were treated with Gromacs trjconv using the flags: -pbc whole -center. Convergence of the simulations was assessed by block-error analysis73 of Rg calculated from simulation coordinates using the blocking code from: https://github.com/fpesceKU/BLOCKING. All CG trajectories were back-mapped to all-atom structures using a simplified version13 of the Backward algorithm74, in which simulation runs are excluded and the two energy minimization runs are shortened to 200 steps.

Multidomain protein structures

We performed MD simulations of a set of 15 multidomain proteins with SAXS data available (Supplementary Table 6). We built the initial structure of MyBP-CMTHB-C2 based on the NMR structure containing both domains (PDB: 5K6P)27. We built the structures of the linear polyubiquitin chains, Ubq2, Ubq3, and Ubq4, based on the crystal structure of the open conformation of Ubq2 (PDB: 2W9N)75. For Ubq3 and Ubq4, the linker regions between the original and extended structures were remodelled using Modeller76,77. We built the initial structure of Gal-3 based on the crystal structure of the folded C-terminal domain (PDB: 2NMO)78 and the IDR from the AlphaFold structure of full-length Gal3 (AF-P17931-F1)79,80. We built the structure of MyBP-CC5-C6-C7 based on the NMR structure of the C5 domain (PDB: 1GXE)81, and the AlphaFold structure of the full-length MyBP-C (AF-Q14896-F1)79,80. We inserted missing residues in the NMR structure of the C5 domain using Modeller76. For the mTurq-GSX-mNeon constructs, we used structures from Monte-Carlo simulations in30 as starting structures for our simulations. To validate the starting structures, we calculated the RMSD between the two fluorescent protein domains and corresponding crystal structures (mTurquoise2 (PDB: 4AR7)82 and mNeonGreen (PDB: 5LTR)83) using PyMOL align, which gave an RMSD of 0.2-0.3 Å.

Multidomain protein simulations

We ran MD simulations of the set of multidomain proteins using Gromacs 2020.369. We ran simulations with the Martini 3 force field6, as well as several modified versions of Martini 3 in which the well-depth, ϵ, in the Lennard-Jones potential between all protein and water beads was rescaled by a factor λPW, ϵ in the Lennard-Jones potential between all protein beads was rescaled by a factor λPP, or ϵ in the Lennard-Jones potential between all protein backbone and water beads was rescaled by a factor λPW-BB. We assigned secondary structure-specific potentials using DSSP84 and Martinize2. The secondary structure of all residues in linkers and IDRs were manually assigned to coil, turn, or bend. We applied an elastic network model using Martinize2 consisting of harmonic potentials with a force constant of 700 kJ mol-1 nm-2 between all backbone beads within a cut-off distance of 0.9 nm. We removed the elastic network potentials in all linkers and IDRs and between folded domains, so only the structures of individual folded domains were restrained (Supplementary Table 3). Dihedral and angle potentials between sidechain and backbone beads were assigned using the -scfix flag in Martinize2, but removed in all linkers and IDRs. Structures were placed in a dodecahedral box using Gromacs editconf and solvated, with NaCl concentrations corresponding to the ionic strength used in SAXS experiments, using the Insane python script70. The systems were equilibrated for 10 ns with a 2 fs time step using the Berendsen thermostat and Berendsen barostat85. Production simulations were run for at least 40 μs with a 20 fs time step using the Velocity-Rescaling thermostat71 and Parinello-Rahman barostat72. The simulation temperature was set to match the corresponding SAXS experiment and the pressure was set to 1 bar. Non-bonded interactions were treated with the Verlet cut-off scheme. A cut-off of 1.1 nm was used for van der Waals interactions. A dielectric constant of 15 and cut-off of 1.1 nm were used for Coulomb interactions. Simulation frames were saved every 1 ns. Molecule breaks from crossing the periodic boundaries were treated with Gromacs trjconv using the flags: -pbc whole -center. Convergence of the simulations was assessed by block-error analysis73 of Rg calculated from simulation coordinates using the blocking code from: https://github.com/fpesceKU/BLOCKING. All CG trajectories were back-mapped to all-atom structures using a simplified version13 of the Backward algorithm74, in which simulation runs are excluded and the two energy minimization runs are shortened to 200 steps.

Simulations of protein self-association in solution

We ran MD simulations of two copies of the two folded proteins ubiquitin and villin HP36, and the four IDPs FUSLCD, α-synuclein, hTau40, and p15PAF, as previously described7, using the Martini 3 force field6 with the well-depth, ϵ, in the Lennard-Jones potential between all protein beads rescaled by a factor λPP = 0.88. We used PDB ID 1UBQ86 and PDB ID 1VII87 as starting structures for ubiquitin and villin HP36, respectively. The simulations were set up and run using the same protocol as for IDP simulations. Two copies of ubiquitin, villin HP36, FUSLCD, α-synuclein, hTau40, and p15PAF were placed in cubic boxes with side lengths 14.92, 7.31, 40.5, 25.51, 48.02, and 34.15 nm giving protein concentrations of 1000, 8500, 50, 200, 30, and 83.4 μM respectively. NaCl concentrations and temperatures were set according to the corresponding experimental conditions (Supplementary Table 7-8). For ubiquitin and villin HP36 the following steps were also used in the simulation setup: (i) Secondary structure was assigned with DSSP84 in Martinize2. (ii) An elastic network model was applied with Martinize2. The elastic network restraints consisted of a harmonic potential with a force constant of 700 kJ mol-1 nm-2 between backbone beads within a 0.9 nm cut-off. For ubiquitin, we removed elastic restraints from the C-terminus (residues 72–76) to allow for flexibility88. (iv) Dihedral and angular potentials between side chains and backbone beads were added based on the initial structures with the -scfix flag in Martinize2. For ubiquitin, villin HP36, α-synuclein, and p15PAF we ran 10 replica simulations of 40 μs per replica. For hTau40 and FUSLCD, we ran 10 replica simulations of 13 μs and 25 μs per replica respectively.

We analyzed the population of the bound states in our simulations by calculating the minimum distance between beads in the two protein copies over the trajectory with Gromacs mindist. The fraction bound was defined as the fraction of frames where the minimum distance was below 0.8 nm. For ubiquitin and villin HP36, we calculated the expected fraction of bound protein at the concentrations in our simulations based on the respective Kd-values of 4.9 mM and 1.5 mM determined for self-association37,38. The bound fraction was calculated as

$${\phi }_{b}=\frac{4{C}_{p}+{K}_{d}-\sqrt{8{K}_{d}{C}_{p}+{{K}_{d}}^{2}}}{4{C}_{p}}$$
(1)

where ϕb is the bound fraction, Cp is the concentration of protein in the simulation box (using the average box volume over all simulation trajectories), and Kd is the dissociation constant.

Amino acid side chain analogue simulation parameters

The Martini 3 parameters for amino acid side chain analogues were produced based on the existing amino acid parameters by simply removing the backbone bead and any potentials or exclusions involving the backbone bead. For simulations of Arg-Asp side chain analogue self-association, the SC1 bead was also removed from Arg (leaving only the SC2 bead of type SQ3p) in order to best emulate the guanidine-acetate system used to measure the experimental affinity42.

Amino acid side chain analogue self-association simulations

We ran MD simulations of two copies of Tyr and Phe side chain analogues, as well as Tyr-Phe, Arg-Asp, and Lys-Asp side chain analogues using the Martini 3 force field6 either unmodified or with the well-depth, ϵ, in the Lennard-Jones potential between all protein and water beads rescaled by a factor λPW = 1.10 or ϵ in the Lennard-Jones potential between all protein beads rescaled by a factor λPP = 0.88. The simulations were set up and run using the same protocol as for IDP simulations. The two side chain analogues were placed in a cubic box with a side length of 5 nm for the Tyr-Tyr, Arg-Asp, and Lys-Asp systems and a side length of 10 nm for the Phe-Phe and Tyr-Phe systems. A NaCl concentration of 150 mM was used for Phe-Phe, Tyr-Phe, and Tyr-Tyr simulations. No NaCl was added in the Lys-Asp and Arg-Asp systems. The simulations were run for 100 μs each at 300 K.

We used a similar approach as in Souza et al.89 to determine the standard Gibbs free energy of self-association, ΔG0, from simulations. We calculated free energy profiles of self-association as:

$$\Delta G(r)=-RT\ln \left(p(r)\right)+2RT\ln \left(r\right)$$
(2)

where r is the COM distance between side chain analogues calculated with Gromacs distance, p(r) is the probability density of the COM distance, R is the gas constant, and T is the temperature. The second term is an entropic correction. We then determined the association constant, Ka, from the free energy profile using:

$${K}_{a}=\int_{{r}_{min }}^{{r}_{max}}4\pi {r}^{2}\exp \left(\frac{-\Delta G(r)}{RT}\right)dr$$
(3)

where \({r}_{\min }\) and \({r}_{\max }\) define the boundaries of the bound state along the free energy profile. \({r}_{\min }\) and \({r}_{\max }\) were selected for each simulation to encompass the first negative free-energy well of ΔG(r) as shown in Supplementary Fig. 10. We then determined ΔG0 as:

$$\Delta {G}^{0}=-RT\ln \left({K}_{a}{C}^{0}\right)$$
(4)

where C0 is a standard concentration of (1/1.66) nm−3. We also used eq. (4) to calculate ΔG0 from experimental values of Ka using instead C0 = 1 M. Ka-values of 0.4 M−1 for Phe-Phe (benzene-benzene) and 0.6 M−1 for Phe-Tyr (benzene-phenol) were obtained from Christian and Tucker45. Experimental Ka-values of 0.31 M−1 for Lys-Asp (butylammonium-acetate) and 0.37 M−1 for Arg-Asp (guanidine-acetate) were obtained from Springs and Haake42.

Amino acid side chain analogue cyclohexane/water partitioning simulations

We ran MD simulations of the cyclohexane/water partitioning of the uncharged amino acid side chain analogues for which experimental transfer free energies were taken from Radzicka and Wolfenden (as in Monticelli et al.)5,40 using unmodified Martini 3 and Martini 3 with λPW = 1.10. We prepared a simulation box with 716 copies of Martini 3 cyclohexane (CHEX) and water (W) respectively. For each partitioning simulation, we added a single copy of a side chain analogue. Simulations were set up and run using the same protocol as for IDP simulations. Each simulation was run for 100 μs at 300 K.

We centered the trajectories on the cyclohexane phase and calculated the normalized density of water, cyclohexane, and side chain analogue along the z-axis of the simulation box averaged over the simulation frames. We defined the boundaries between the water and cyclohexane phases as the cross-over point of their densities and determined the average density of the side chain analogue in the water and cyclohexane phase, ρW and ρCHEX, excluding the regions close to the phase boundaries (Supplementary Fig. 9). We then calculated the transfer free energy from cyclohexane to water as:

$$\Delta {G}_{{{{\rm{CHEX}}}}-{{{\rm{W}}}}}=RT\ln \left(\frac{{\rho }_{{{{\rm{CHEX}}}}}}{{\rho }_{{{{\rm{W}}}}}}\right)$$
(5)

where R is the gas constant and T is the temperature. The Pearson correlations with experimental transfer free energies were calculated using the pearsonr function in SciPy stats and standard errors were determined with bootstrapping using the bootstrap function in SciPy stats with 9999 resamples90.

hnRNPA1 LCD variant simulations

We ran MD simulations of a set of six variants of the hnRNPA1 LCD (-10R, -10R+10K, -12F+12Y, -6R+6K, +7F-7Y, +7K+12D) for which the Rg has previously been determined by SAXS experiments54. The variants contain substitutions to and from charged and aromatic residues, but have the same sequence length as the wild-type protein, and were selected to have a relatively large deviation in Rg from the wild-type; protein sequences can be found in the supporting information of Bremer et al.54. We ran MD simulations with unmodified Martini 3 and Martini 3 with ϵ in the Lennard-Jones potential between all protein beads rescaled by a factor λPP = 0.92 or λPP = 0.88. Simulations were set up using the same protocol as for the other IDPs described above. The systems were equilibrated for 10 ns with a 2 fs time step using the Berendsen thermostat and Berendsen barostat85. Production simulations were run for 100 μs with a 20 fs time step using the Velocity-Rescaling thermostat71 and Parinello-Rahman barostat72. Simulations were run with 150 mM NaCl at 298 K and 1 bar.

Peripheral membrane protein simulations

We performed MD simulations of one soluble protein as a negative control, three folded peripheral membrane proteins, two multidomain proteins, and two IDRs with lipid bilayers of different compositions (Supplementary Table 9). We ran simulations with the Martini 3 force field6, or with modified force fields in which ϵ in the Lennard-Jones potential between all protein beads were rescaled by a factor λPP = 0.88 or with ϵ in the Lennard-Jones potential between all protein and water beads rescaled by a factor λPW = 1.10.

Initial structures of proteins were obtained either from the RCSB database91 or from the AlphaFold protein structure database92. For Complexin CTM, we used ColabFold v1.5.293 to model the 16-residues long (ATGAFETVKGFFPFGK) disordered region. The N-terminal IDR of TRPV4 (residues 2–134) was taken from the full-length AlphaFold structure of TRPV4 (A0A1D5PXA5). Initial structure of the FERM domains in Talin (PDB:3IVF) had missing residues (134–172), which we modelled using Modeller76,77 via the Chimera interface94. CG structures of proteins were generated using Martinize2, with the DSSP84 flag to assign secondary structure. An elastic network was applied consisting of harmonic potentials with a force constant of 700 kJ mol-1 nm-2 between all backbone beads within a cut-off of 0.8 nm. We removed elastic network potentials between different domains and in linkers and in IDRs of multidomain proteins. Secondary structure and elastic network was not assigned to the two IDRs.

All the lipid bilayers, with initial lateral dimension of 20 nm × 20 nm, were generated using CHARMM-GUI Martini maker95, except in the systems where phosphoinositol-(4,5)-phosphate (PIP2) lipids were needed, which instead were generated using the Insane python script70. We used the parameters for SAP2_45 lipids96 to model PIP2 in the bilayer. The bilayers were then minimized and equilibrated following the 6-step equilibration protocol in CHARMM-GUI. To compute protein-membrane interactions, systems were generated as previously described24, with a minimum distance of 3 nm between any protein and lipid bead. Systems were then solvated and water was removed from the bilayer. In all cases, a water layer of 1.5 nm was kept at the top of the protein and bottom of the bilayer, which results in an initial box size of 20 nm × 20 nm in the x and y directions and variable length of the box in the z direction depending on the size of the protein. Systems were then neutralized and excess NaCl was added as described (Supplementary Table 9). Systems were first energy minimized using steepest descent algorithm after which a short MD run of 200 ps was performed with the protein backbone beads restrained. Production simulations (four replicas for each system) were run for 3 μs with a time step of 20 fs using velocity-rescale thermostat71 and Parrinello-Rahman barostat72.

We performed MD simulations of the two IDRs (Complexin CTM and TRPV4 IDR) in solution with unmodified Martini 3 and both of the modified versions of Martini 3. For these simulations, we took the CG structure and placed it in a cubic box using Gromacs editconf, and solvated with 150 mM NaCl. Then the system was minimized for 10,000 steps with the steepest descent algorithm, and a short equilibration run was performed with the Berendsen thermostat and Berendsen barostat85 with a time step of 2 fs. Production simulations were run for 10 μs with a 20 fs time-step using the Parrinello-Rahman barostat72 and velocity-rescaling thermostat71. All the simulations were performed with GROMACS 2021.569. The initial 100 ns of production run were discarded from all the trajectories for further analysis.

Simulations of transmembrane protein self-association

We performed MD simulations of the transmembrane domains of two protein dimers from the RTK family to calculate the free energy of association, ΔG, using the Martini 3 force field6 either unmodified or with the well-depth, ϵ, in the Lennard-Jones potential between all protein and water beads rescaled by a factor λPW = 1.10 or ϵ in the Lennard-Jones potential between all protein beads rescaled by a factor λPP = 0.88. Simulations were performed with Gromacs 2021.5. PDB 2K1L97 was used as the starting structure for EphA1. We used Charmm-GUI95 to embed the EphA1 dimer in a bilayer of 400 DLPC lipids and 0.5 M NaCl corresponding to the conditions in the reference experiment62, as in Javanainen et al.9. The system was equilibrated using the standard six-step protocol in Charmm-GUI. For ErbB1, the starting structure of the system, based on PDB 2M0B98, was taken from Souza et al.6. The system has 400 DLPC lipids and 0.15 M NaCl corresponding to the conditions in the reference experiment61. The system was equilibrated for 50 ns in the NPT ensemble with position restraints of 1000 kJ/(mol nm2) on both chains of the dimer.

For both systems, pulling simulations were run for 100 ns at a rate of 0.05 nm/ns. The 2D COM distance between the protein subunits, rCOM, was used as the reaction coordinate. Frames ranging from 0.6 nm – 3.4 nm with a spacing of 0.2 nm were extracted from the pulling simulation trajectories as umbrella sampling windows, to be consistent with previous work6. A spring constant of 400 kJ/(mol nm2) was applied as the umbrella potential in production runs. The temperature was maintained at 303 K separately for peptides, lipids, and solvents and semi-isotropic pressure coupling was applied at 1 bar. The production run was performed for 10 μs in each window using a time-step of 20 fs and the Gromacs WHAM tool99 was used to obtain the PMF. A correction term was added to the PMF obtained using WHAM to account for the entropic contribution before plotting the profiles.

$$PMF=PM{F}_{{{{\rm{WHAM}}}}}+RT\ln \left(r\right)$$
(6)

The error in PMF plots represents the standard deviation of 4 profiles calculated from 2 μs blocks, where the first 2 μs were discarded. All PMFs plateaued before rCOM = 3.4 nm and were aligned to zero at this value. ΔG values were estimated from the minimum of zero aligned PMFs for comparison with experimental values.

SAXS calculations

We extracted 20,000 evenly distributed frames from each back-mapped trajectory to calculate SAXS profiles using Pepsi-SAXS100. To avoid overfitting the parameters for the contrast of the hydration layer (δρ) and the displaced solvent (r0) by fitting them individually for each structure, we used the fixed values for these parameters determined in Pesce and Lindorff-Larsen101. We globally fitted the scale and constant background with least-squares regression weighted by the experimental errors using Scikit-learn102. To assess the agreement between the experimental SAXS profiles and those calculated from simulations, we calculated the \({\chi }_{r}^{2}\) between the ensemble-averaged calculated SAXS intensities (Icalc) and the experimental SAXS intensity (Iexp):

$${\chi }_{r}^{2}=\frac{1}{m}{\sum }_{q}^{m}{\left(\frac{{I}_{q}^{calc}-{I}_{q}^{exp}}{{\sigma }_{q}^{exp}}\right)}^{2}$$
(7)

where σexp is the error of the experimental SAXS intensity and m is the number of measured SAXS intensities. We used the Bayesian Indirect Fourier Transform algorithm (BIFT) to rescale the errors of the experimental SAXS intensities, in order to obtain a more consistent error estimate across the different proteins103,104.

PRE calulcations

We used the DEER-PREdict software105 to calculate intrachain PREs from the back-mapped trajectories of α-synuclein, FUSLCD,hnRNPA2LCD, OPN and hTau40 (Supplementary Table 5), and interchain PREs from the back-mapped trajectories of two copies of FUSLCD. DEER-PREDICT uses a rotamer library approach to model the MTSL spin-label106 and a model-free formalism to calculate the spectral density107. We assumed an effective correlation time of the spin label, τt, of 100 ps, a molecular correlation time, τc, of 4 ns108, a transverse relaxation rate for the diamagnetic protein of 10 s−1 and a total INEPT time of the HSQC measurement of 10 ms109. For the simulations of two copies of FUSLCD, τc was not fixed to 4 ns. We instead scanned values of τc from 1 – 20 ns in steps of 1 ns and selected the τc that minimized the \({\chi }_{r}^{2}\) to the experimental PRE data for each force field. The optimal values were 1 ns, 8 ns, and 9 ns for unmodified Martini 3, λPW = 1.10, and λPP = 0.88 respectively. The agreement between calculated and experimental PREs was assessed by calculating the \({\chi }_{r}^{2}\) over all spin-label positions,

$${\chi }_{r}^{2}=\frac{1}{{N}_{labels}{N}_{res}}{\sum }_{j}^{{N}_{labels}}{\sum }_{i}^{{N}_{res}}{\left(\frac{{Y}_{ij}^{exp}-{Y}_{ij}^{calc}}{{\sigma }_{ij}^{exp}}\right)}^{2}$$
(8)

where Nlabels and Nres are the number of spin-labels and residues, \({Y}_{ij}^{exp}\) and \({Y}_{ij}^{calc}\) are the experimental and calculated PRE rates for label j and residue i, and \({\sigma }_{ij}^{exp}\) is the experimental error of the PRE rate for label j and residue i. For the simulations of two copies of the FUSLCD, the \({\chi }_{r}^{2}\) was calculated as an average over the 10 replica simulations.

Radii of gyration

We calculated the Rg from CG simulation trajectories using Gromacs gyrate69 and calculated the error of the average Rg using block-error analysis73 (https://github.com/fpesceKU/BLOCKING). Experimental Rg-values and corresponding error bars were calculated from SAXS profiles by Guinier analysis using ATSAS AUTORG with default settings110, except in the case of the hnRNPA1LCD variants, for which we used the Rg-values reported in Bremer et al.54, which were determined from SAXS data using an empirical molecular form factor approach. Pearson correlation coefficients were calculated using the pearsonr function in SciPy stats and standard errors were determined with bootstrapping using the bootstrap function in SciPy stats with 9999 resamples90.

Principal component analysis

We used PCA based on the pairwise distances between backbone beads to compare our unmodified, λPW = 1.10, λPP = 0.88, and λPW-BB = 1.22 Martini 3 simulations of IDPs and multidomain proteins. PCA was performed with PyEMMA111. For each protein, all four ensembles were pooled for PCA in order to project into the same two principal components. For all IDPs except PRNNT and CoRNID, and for the multidomain proteins Gal-3, MyBP-CMTHB-C2, Ubq2, and Ubq3, the pairwise distances between all backbone beads were used as features for PCA. For the remaining proteins, the pairwise distances between every 5th backbone bead were used as features for PCA. To quantify the similarity between the ensembles, we calculated the Jensen-Shannon divergence between the probability distribution of the principal components produced with each pair of force fields using the jensenshannon function in SciPy90. We calculated the Jensen-Shannon divergence between all pairwise combinations of force fields for each protein, and then averaged these values across the set of proteins to quantify the overall similarity. For a given protein, we used the same set of bins to calculate the probability histogram of the principle components for all force field pairings.

Comparison with atomistic ensembles

We compared our unmodified Martini 3 and λPP = 0.88 Martini 3 simulations of α-synuclein with a 20 μs simulation with the Amber03ws force field and a 73 μs simulation with the Amber99SB-disp force field from Robustelli et al.58. Because of problems caused by interactions between periodic images of the protein in the originally published Amber99SB-disp simulation, we used the corrected version of the simulation also used in Ahmed et al.59. We also compared our unmodified Martini 3 and λPP = 0.88 Martini 3 simulations of hnRNPA1 with an atomistic ensemble from Ritsch et al.60 (Protein Ensemble Database PED00212). We used the dimensionality reduction ensemble similarity (DRES) approach in Encore56,57 implemented in MDAnalysis112 to quantify the ensemble similarity based on Cα RMSD. We used the all-atom back-mapped versions of our Martini simulations (described above). For hnRNPA1, we used every 10th frame from our Martini simulations for a total of 4001 frames per simulation and all structures from the atomistic ensemble. Different constructs of hnRNPA1 were used for our simulations and the experimentally restrained ensemble60, so the Encore DRES calculations were only performed for residues 2-258, which are identical in both constructs17,60. For α-synuclein, we used every 10th frame from each simulation for a total of 4001 frames per Martini simulation, 2998 frames from the Amber03ws simulation, and 2998 frames from the Amber99sb-disp simulation.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.