J Chem Theory Comput. 2023 Jul 25;19(14):4711-4727.
doi: 10.1021/acs.jctc.3c00224. Epub 2023 Jun 20.

Clustering Heterogeneous Conformational Ensembles of Intrinsically Disordered Proteins with t-Distributed Stochastic Neighbor Embedding

Rajeswari Appadurai et al. J Chem Theory Comput.

Abstract

Intrinsically disordered proteins (IDPs) populate a range of conformations that are best described by a heterogeneous ensemble. Grouping an IDP ensemble into "structurally similar" clusters for visualization, interpretation, and analysis is a much-desired but formidable task, because the conformational space of IDPs is inherently high-dimensional and dimensionality reduction techniques often result in ambiguous classifications. Here, we employ the t-distributed stochastic neighbor embedding (t-SNE) technique to generate homogeneous clusters of IDP conformations from the full heterogeneous ensemble. We illustrate the utility of t-SNE by clustering conformations of two disordered proteins, Aβ42 and α-synuclein, in their APO states and when bound to small-molecule ligands. Our results shed light on ordered substates within disordered ensembles and provide structural and mechanistic insights into binding modes that confer specificity and affinity in IDP-ligand binding. t-SNE projections preserve the local neighborhood information, provide interpretable visualizations of the conformational heterogeneity within each ensemble, and enable the quantification of cluster populations and their relative shifts upon ligand binding. Our approach provides a new framework for detailed investigations of the thermodynamics and kinetics of IDP-ligand binding and will aid rational drug design for IDPs.

Figures

FIG. 1: Hyperparameter optimization based on the integrated Silhouette score for the (a) APO and (b) G5-bound ensembles of Aβ42. The t-SNE maps obtained with selected optimal (green and cyan squares) and sub-optimal (red, black, pink, and orange squares) values of the perplexity and the number of clusters K are shown in (c) and (d) for the APO and G5-bound ensembles, respectively. The maps illustrate how these parameters affect clustering efficiency. In t-SNE projections with sub-optimal parameter values that lead to too few clusters (red and pink squares), clearly distinguishable groups of points are merged into single cluster assignments. In t-SNE projections with sub-optimal parameter values that lead to too many clusters (black and orange squares), indistinguishable groups of points are split across different cluster assignments.
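The caption does not spell out how the integrated Silhouette score is computed. As a minimal sketch, assuming it combines a silhouette score evaluated in the original high-dimensional space with one evaluated in the t-SNE projection (taken here as their product, which is an assumption), the quantity scanned over perplexity and K might look like the following. The function names and toy coordinates are hypothetical, and plain Euclidean distance stands in for whatever structural metric is actually used.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(labels)


def integrated_silhouette(high_dim, low_dim, labels):
    """ASSUMPTION: combine the two spaces by multiplying their scores."""
    return silhouette_score(high_dim, labels) * silhouette_score(low_dim, labels)


# Toy data (hypothetical): two well-separated groups of three points each.
low_dim = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
high_dim = [(x, y, 0.0) for x, y in low_dim]  # stand-in for the original space

good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_score = integrated_silhouette(high_dim, low_dim, good_labels)
bad_score = integrated_silhouette(high_dim, low_dim, bad_labels)
print(good_score, bad_score)
```

In a parameter scan, a score like this would be evaluated for each (perplexity, K) pair and the highest-scoring pair kept. Note that a product of two negative scores is positive, so a real implementation would need to guard against that case.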
FIG. 2: t-SNE-based conformational clustering of Aβ42 ensembles in the (a) absence and (b) presence of G5. The clusters are labeled with the average pairwise RMSD of Cartesian coordinates between snapshots. The cluster-wise population statistics are shown in (c) and (d).
FIG. 3: Quantification of intermolecular interactions between Aβ42 and G5. (a) Cluster-wise intermolecular contact probabilities. (b) Boxplot illustrating the distribution of intermolecular binding energy across clusters, as measured using MMPBSA analysis. The box indicates the middle two quartiles of the distribution (25th to 75th percentile), and the whiskers extend to include the rest of the data set except the outliers. Outliers are the points lying more than 1.5 times the interquartile range beyond the box. The average of the distribution is shown by the line inside the box. (c) Residue-wise decomposed energy contribution for the cluster that shows the most favorable binding (cluster no. 14). The error bars represent the 99% confidence interval of the estimated mean. (d) Superposition of ten central conformations from this cluster. (e) The interacting residues, shown in stick representation.
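The boxplot convention described in this caption (box from the 25th to the 75th percentile, outliers beyond 1.5 times the interquartile range, whiskers spanning the remaining points) can be sketched in a few lines. The energy values below are made up for illustration, and the `inclusive` quartile method is an assumption; plotting libraries differ slightly in how they interpolate quartiles.

```python
from statistics import quantiles


def box_stats(values, whisker=1.5):
    """Return quartiles, whisker ends, and outliers for one distribution."""
    data = sorted(values)
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    # Points outside the fences are outliers; whiskers cover everything else.
    inliers = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "q3": q3,
        "whiskers": (inliers[0], inliers[-1]),
        "outliers": outliers,
    }


# Hypothetical per-frame binding energies for one cluster (arbitrary units).
energies = [-12.0, -11.0, -10.0, -9.0, -8.0, -7.0, -6.0, -5.0, -4.0, 40.0]
stats = box_stats(energies)
print(stats)
```

With these toy values, the single large positive energy falls outside the upper fence and is reported as an outlier, while the whiskers stop at the most extreme remaining points.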
FIG. 4: t-SNE-based conformational clustering of (a) full-length α-synuclein (140 residues) and (b) a 20-residue C-terminal fragment of α-synuclein. The clusters are labeled with the average pairwise RMSD of Cartesian coordinates between snapshots.
FIG. 5: t-SNE-based conformational clustering of the APO (top), fasudil-bound (middle), and ligand 47-bound (bottom) ensembles. The conformational subspace of each t-SNE projection is subdivided into 20 clusters (a, d, and g). The clusters of conformations are displayed in order of decreasing bend angle (b, e, and h). The distribution of the bend angle in each cluster is also shown as a box plot (c, f, and i) for the APO, fasudil-bound, and ligand 47-bound ensembles.
FIG. 6: Per-residue intermolecular contact probabilities between (a) αSCterm and fasudil and (b) αSCterm and ligand 47. The clusters are sorted in decreasing order of bend angle. Actual cluster indices are indicated on the alternate Y-axis in red. Panels (c) to (h) show the correlations among the average bend angle, total aromatic stacking propensity, and dissociation constant (KD) measured from individual clusters. The corresponding Pearson correlation coefficient is indicated within each plot. Panels (i) and (j) show representative snapshots from the top 5 clusters containing acutely bent, hairpin-like conformations of ligand-bound αSCterm, illustrating how the bent conformations orient the aromatic side chains of Tyr-125, Tyr-133, and Tyr-136 for better stacking interactions with fasudil (i) and ligand 47 (j), which in turn lead to better intermolecular affinity. The snapshots from left to right were taken from cluster numbers 4, 6, 19, 9, and 2 for fasudil-bound αSCterm (i) and cluster numbers 1, 7, 10, 8, and 11 for ligand 47-bound αSCterm (j).
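The Pearson correlation coefficient quoted in the correlation panels can be computed from per-cluster averages as sketched below. This is only an illustration of the statistic: the bend angles and stacking propensities are invented numbers, and `pearson_r` is a hypothetical helper name, not code from the paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and the two root sums of squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


# Hypothetical per-cluster averages: bend angle (degrees) and the
# total aromatic stacking propensity measured in the same clusters.
bend_angle = [40.0, 60.0, 80.0, 100.0, 120.0]
stacking = [0.9, 0.7, 0.6, 0.4, 0.2]

r = pearson_r(bend_angle, stacking)
print(round(r, 3))
```

In this toy data set, more acutely bent clusters (smaller angles) have higher stacking propensity, so the coefficient comes out strongly negative.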
