J Chem Theory Comput. 2023 Aug 22;19(16):5609-5620.
doi: 10.1021/acs.jctc.3c00190. Epub 2023 Jul 18.

SOURSOP: A Python Package for the Analysis of Simulations of Intrinsically Disordered Proteins

Jared M Lalmansingh et al. J Chem Theory Comput.

Abstract

Conformational heterogeneity is a defining hallmark of intrinsically disordered proteins and protein regions (IDRs). The functions of IDRs and the emergent cellular phenotypes they control are associated with sequence-specific conformational ensembles. Simulations of conformational ensembles that are based on atomistic and coarse-grained models are routinely used to uncover the sequence-specific interactions that may contribute to IDR functions. These simulations are performed either independently or in conjunction with data from experiments. Functionally relevant features of IDRs can span a range of length scales. Extracting these features requires analysis routines that quantify a range of properties. Here, we describe a new analysis suite, simulation analysis of unfolded regions of proteins (SOURSOP), an object-oriented and open-source toolkit designed for the analysis of simulated conformational ensembles of IDRs. SOURSOP implements several analysis routines motivated by principles in polymer physics, offering a unique collection of simple-to-use functions to characterize IDR ensembles. As an extendable framework, SOURSOP supports the development and implementation of new analysis routines that can be easily packaged and shared.

Figures

Figure 1: Architecture and example code for SOURSOP.
(A) Trajectory files are read into an SSTrajectory object. This object automatically parses each polypeptide chain into a separate SSProtein object. Each SSProtein object has a set of object-based analyses associated with it. Each trajectory must contain between 1 and n protein chains. In addition, stateless method-specific analysis modules exist for certain types of analysis. Additional stateless modules can be added so that new analysis routines are incorporated without altering the SSProtein or SSTrajectory code. (B) Example code illustrating how the apparent scaling exponent can be calculated from an ensemble.
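The apparent scaling exponent mentioned for panel B can be illustrated without reference to SOURSOP itself. The sketch below is not the package's API or implementation; it assumes the common polymer-physics approach of fitting the mean inter-residue distance R(s) at sequence separation s = |i - j| to a power law R(s) = R0 * s^ν by linear regression in log-log space. The function names and the input format (a list of frames, each a list of per-residue (x, y, z) tuples) are hypothetical.

```python
import math


def mean_distance_by_separation(frames):
    """Average distance between residues i and j, grouped by s = |i - j|.

    `frames` is a list of conformations; each conformation is a list of
    (x, y, z) tuples, one per residue (hypothetical input format).
    """
    n = len(frames[0])
    totals = [0.0] * n
    counts = [0] * n
    for coords in frames:
        for i in range(n):
            for j in range(i + 1, n):
                totals[j - i] += math.dist(coords[i], coords[j])
                counts[j - i] += 1
    return {s: totals[s] / counts[s] for s in range(1, n)}


def fit_scaling_exponent(profile, min_sep=1):
    """Least-squares fit of ln R = ln R0 + nu * ln s; returns (nu, R0).

    Separations below `min_sep` are excluded from the fit.
    """
    points = [(math.log(s), math.log(r)) for s, r in profile.items() if s >= min_sep]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    nu = sxy / sxx
    r0 = math.exp(mean_y - nu * mean_x)
    return nu, r0
```

For a perfectly straight chain the fit returns ν = 1, which is a convenient sanity check. For flexible polymers, values near 0.33, 0.5, and 0.59 are the textbook limits for a compact globule, a theta-state chain, and a chain in good solvent, respectively.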
Figure 2: Global conformational analysis of 10 disordered protein ensembles analyzed with SOURSOP.
(A) The two-dimensional density plots for instantaneous asphericity (δ) and normalized dimensions (t) reveal a broad range of conformational landscapes. Ash1, p53, p27, NTL9, Notch, and A1-LCD are ensembles generated by Monte Carlo simulations with the ABSINTH implicit solvent model. ACTR, drkN, NTail, and Asn (alpha synuclein) are ensembles generated by molecular dynamics simulations with the Amber99-disp force field. Note that NTL9 is not an IDP, but the ensemble reported here represents an unfolded-state ensemble obtained under native conditions. (B) Normalized chain dimensions were calculated by normalizing the instantaneous radius of gyration from ensembles by the expected radius of gyration from a sequence-matched chain in the theta state, whereby chain-chain and chain-solvent interactions are counterbalanced.
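The two per-frame quantities in this caption can be sketched from the gyration tensor. This is a minimal illustration, not SOURSOP's implementation: it assumes unweighted coordinates (no atomic masses) and the widely used definition δ = 1 - 3(λ1λ2 + λ2λ3 + λ3λ1)/(λ1 + λ2 + λ3)², where the λ are the tensor's eigenvalues. The exact definition used in the paper is an assumption here, and the function names are hypothetical.

```python
import math


def gyration_tensor(coords):
    """3x3 gyration tensor of a list of (x, y, z) tuples (unweighted)."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return [
        [sum((c[a] - center[a]) * (c[b] - center[b]) for c in coords) / n
         for b in range(3)]
        for a in range(3)
    ]


def radius_of_gyration(coords):
    """Rg is the square root of the trace of the gyration tensor."""
    t = gyration_tensor(coords)
    return math.sqrt(t[0][0] + t[1][1] + t[2][2])


def asphericity(coords):
    """Asphericity from tensor invariants: 0 for a sphere-like shape, 1 for a rod.

    The sum of pairwise eigenvalue products equals the sum of the
    principal 2x2 minors, so no eigendecomposition is needed.
    """
    t = gyration_tensor(coords)
    trace = t[0][0] + t[1][1] + t[2][2]
    minors = (
        t[0][0] * t[1][1] - t[0][1] ** 2
        + t[1][1] * t[2][2] - t[1][2] ** 2
        + t[0][0] * t[2][2] - t[0][2] ** 2
    )
    return 1.0 - 3.0 * minors / trace ** 2
```

The normalization described in panel B would then amount to dividing each frame's `radius_of_gyration` by a reference Rg for the theta state, which has to be supplied from a separate reference calculation.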
Figure 3: Local chain compaction with residue chemistry superimposed over the local radius of gyration (Rg).
(A-J) Individual plots showing analysis for each protein ensemble as introduced in Figure 2. Local Rg is calculated using a 14-residue sliding window. Colored circles on each plot represent different amino acid chemistry groups, highlighted in the legend below panel I. (K) Pearson’s correlation coefficient between local Rg obtained for each windowed fragment reported in panels A-J and the amino acid chemistry within the window in question (see also Fig. S2). Specific sequence properties reported are the fraction of charged residues (FCR), absolute net charge per residue (|NCPR|), mean disorder score as predicted by metapredict (Disorder), fraction of proline residues (F. proline), mean predicted Local Distance Difference Test (pLDDT, a measure of predicted AlphaFold2 structure confidence), fraction of aliphatic residues (F. aliphatic), fraction of aromatic residues (F. aromatic), Kyte-Doolittle hydrophobicity (hydrophobicity), and fraction of polar residues (F. polar). FCR is the strongest positive determinant of expansion, closely followed by |NCPR|. While polar residues, in principle, correlate as negative determinants of expansion, the negative correlation is driven by subregions deficient in charged residues and enriched in only polar residues.
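The windowed analysis in this caption can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the authors' code: it uses one coordinate per residue, an unweighted Rg, and hypothetical function names. It computes the ensemble-averaged Rg of every 14-residue fragment, the fraction of a chosen residue class in the same windows, and the Pearson correlation between the two profiles.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) tuples."""
    n = len(coords)
    center = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return math.sqrt(sum(math.dist(c, center) ** 2 for c in coords) / n)


def local_rg_profile(frames, window=14):
    """Ensemble-averaged Rg of every contiguous `window`-residue fragment."""
    n_windows = len(frames[0]) - window + 1
    return [
        sum(radius_of_gyration(f[i:i + window]) for f in frames) / len(frames)
        for i in range(n_windows)
    ]


def window_fraction(sequence, residues, window=14):
    """Fraction of each window made up of one-letter codes in `residues`."""
    return [
        sum(aa in residues for aa in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]


def pearson_r(xs, ys):
    """Pearson correlation coefficient; undefined if either input is constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

With a trajectory loaded as `frames` and its sequence as a string, `pearson_r(local_rg_profile(frames), window_fraction(sequence, "DEKR"))` would give one entry of the kind summarized in panel K, here for the fraction of charged residues. Treating D, E, K, and R as the charged set is an assumption of this sketch.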
Figure 4: Preferential attraction and repulsion quantified via scaling maps that report the normalized distance between every pair of residues in the protein.
(A-J) Individual plots of analysis for each protein ensemble as introduced in Figure 2. Normalized distances are calculated by dividing the ensemble-average inter-residue distance by the corresponding distance obtained for the excluded volume (EV) model. Attractive interactions emerge as darker colors, while repulsive interactions are lighter. Along the diagonal, subsets of residues are colored using the same color scheme used in Fig. 3.
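A scaling map as described here is an element-wise ratio of two mean distance maps. The following sketch assumes one coordinate per residue and that both ensembles describe chains of the same length; the function names are hypothetical and this is not SOURSOP's implementation. Setting the diagonal to 1.0 is a convention chosen for this example, since the self-distance ratio is 0/0.

```python
import math


def mean_distance_map(frames):
    """Ensemble-averaged residue-residue distance matrix (n x n nested lists)."""
    n = len(frames[0])
    dmap = [[0.0] * n for _ in range(n)]
    for coords in frames:
        for i in range(n):
            for j in range(i + 1, n):
                d = math.dist(coords[i], coords[j])
                dmap[i][j] += d
                dmap[j][i] += d
    return [[value / len(frames) for value in row] for row in dmap]


def scaling_map(frames, reference_frames):
    """Ratio of mean distances in `frames` to those in `reference_frames`.

    Values below 1 mean a residue pair is closer than in the reference
    (net attraction); values above 1 mean it is farther apart (net repulsion).
    """
    observed = mean_distance_map(frames)
    reference = mean_distance_map(reference_frames)
    n = len(observed)
    return [
        [observed[i][j] / reference[i][j] if i != j else 1.0 for j in range(n)]
        for i in range(n)
    ]
```

The reference need not be an EV simulation: passing any second ensemble of the same chain length yields a map of relative changes between the two.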
Figure 5: Comparison of changes in local and global dimensions for wild-type vs. phosphomimetic versions of p53.
(A) Scaling maps where inter-residue distances for the phosphomimetic version of p53 N-terminal domain (p53) are normalized by distances for the wild-type protein. Despite differing by only three residues in the N-terminal quarter of the protein, the phosphomimetic version of p53 shows substantial differences in long-range and local dimensions, as shown by the emergence of both attractive (blue) and repulsive (red) interactions. (B) Despite these rearrangements, only a relatively small change in overall global dimensions is observed. While the wild-type ensemble-average Rg is 29.4 Å, that of the phosphomimetic variant is 29.1 Å, a difference below the statistical detection limits for most experimental techniques.
Figure 6: Normalized local solvent-accessible surface area (SASA) using an eight-residue sliding window and a 10 Å probe size.
Normalization is done using excluded volume (EV) reference simulations to account for side-chain-dependent differences in solvent accessibility. Amino acid residues are colored as in Fig. 3. Distinct patterns of accessibility are observed across different proteins, indicating that long- and short-range intramolecular interactions can influence the accessibility of local binding sites.
