Sci Rep. 2017 Sep 5;7(1):10480.
doi: 10.1038/s41598-017-09654-8.

Modeling protein quaternary structure of homo- and hetero-oligomers beyond binary interactions by homology

Martino Bertoni et al. Sci Rep.

Abstract

Cellular processes often depend on interactions between proteins and the formation of macromolecular complexes. The impairment of such interactions can lead to deregulation of pathways resulting in disease states, and it is hence crucial to gain insights into the nature of macromolecular assemblies. Detailed structural knowledge about complexes and protein-protein interactions is growing, but experimentally determined three-dimensional multimeric assemblies are outnumbered by complexes supported by non-structural experimental evidence. Here, we aim to fill this gap by modeling multimeric structures by homology, using only amino acid sequences to infer the stoichiometry and the overall structure of the assembly. We ask which properties of proteins within a family can assist in the prediction of correct quaternary structure. Specifically, we introduce a description of protein-protein interface conservation as a function of evolutionary distance to reduce the noise in deep multiple sequence alignments. We also define a distance measure to structurally compare homologous multimeric protein complexes. This allows us to hierarchically cluster protein structures and quantify the diversity of alternative biological assemblies known today. We find that a combination of conservation scores, structural clustering, and classical interface descriptors can improve the selection of homologous protein templates, leading to reliable models of protein complexes.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1
PPI fingerprint concept. (A) The idealized sequence space of fructose bisphosphate aldolase represented as a phylogenetic tree rooted on a specific sequence. In this family of proteins, we observe either dimeric (blue) or tetrameric (green) quaternary structures. The red concentric circles represent the sequence identity thresholds used to calculate the interface conservation score (Cscore). (B) The PPI fingerprint curves of several homologs with dimeric (blue) or tetrameric (green) quaternary structures (standard error is used for the error area). The MSA is obtained by running HHblits against the non-redundant (20% sequence identity) NCBI database with a threshold of 70% as minimum coverage. Considering the complete MSA (below the 20% sequence identity threshold), the support for a conserved interface is stronger for dimers, while with more stringent thresholds (50–60%) the tetrameric option has a stronger conservation signal.
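The idea of scoring interface conservation at several sequence identity thresholds can be illustrated with a minimal sketch. The caption does not define Cscore, so the score used here (mean Shannon entropy of the remaining surface columns minus mean entropy of the interface columns) is an assumed stand-in, not the authors' formula. The function names, the toy alignment, and the column indices are all hypothetical.

```python
import math
from collections import Counter


def sequence_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(p, q) for p, q in zip(a, b) if p != "-" and q != "-"]
    if not pairs:
        return 0.0
    return sum(p == q for p, q in pairs) / len(pairs)


def column_entropy(msa, col):
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [seq[col] for seq in msa if seq[col] != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def ppi_fingerprint(msa, interface_cols, surface_cols, thresholds):
    """Return (threshold, score) pairs; msa[0] is taken as the target sequence.

    Assumed score: mean entropy of surface columns minus mean entropy of
    interface columns, so positive values mean the interface is more conserved.
    """
    target = msa[0]
    curve = []
    for t in thresholds:
        # Keep only homologs at least as similar to the target as the threshold.
        kept = [s for s in msa if sequence_identity(target, s) >= t]
        iface = sum(column_entropy(kept, c) for c in interface_cols) / len(interface_cols)
        surf = sum(column_entropy(kept, c) for c in surface_cols) / len(surface_cols)
        curve.append((t, surf - iface))
    return curve


# Toy alignment (hypothetical): columns 0-1 play the interface, 2-3 the surface.
toy_msa = ["ACDE", "ACDF", "ACGH", "TCKL"]
curve = ppi_fingerprint(toy_msa, [0, 1], [2, 3], [0.0, 0.5, 0.9])
for threshold, score in curve:
    print(f"identity >= {threshold:.1f}: score = {score:.3f}")
```

Plotting such scores against the threshold gives one curve per candidate interface, which is the kind of comparison the caption describes for the dimeric and tetrameric forms.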
Figure 2
PPI fingerprints of the proteins in the Duarte et al. dataset. 83 biological interfaces (bio) are shown in blue, 82 crystal contacts (xtal) in grey. The conservation score (y-axis), computed on MSAs generated with different sequence identity inclusion thresholds (x-axis), helps to discriminate between crystal contacts and biologically relevant interfaces. Using an inclusive MSA (0–25% sequence identity thresholds), the two non-normal distributions overlap to a large extent (Mann-Whitney p-values between 8.12 × 10⁻⁷ and 3.82 × 10⁻⁸), while in the threshold range of 35–55% they are clearly separable (Mann-Whitney p-values between 7.47 × 10⁻¹¹ and 4.56 × 10⁻¹³).
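The comparison of the two score distributions at a given threshold can be sketched with a small, dependency-free Mann-Whitney U test. This version uses the normal approximation with tie correction and no continuity correction, which is an assumption made for brevity. The caption does not say which variant the authors used, and the sample values below are invented for illustration only.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold tied values; they share the average of ranks i+1..j.
        midrank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_value


# Hypothetical conservation scores at one identity threshold (not from the paper).
bio_scores = [0.42, 0.55, 0.61, 0.38, 0.70, 0.49]
xtal_scores = [0.05, 0.12, -0.03, 0.20, 0.08, 0.15]
u_stat, p = mann_whitney_u(bio_scores, xtal_scores)
print(f"U = {u_stat}, p = {p:.4f}")
```

Repeating the test once per sequence identity threshold would yield a range of p-values like the ones quoted in the caption. For real data, an established implementation such as `scipy.stats.mannwhitneyu` is the safer choice.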
Figure 3
Heterogeneity of quaternary structures available in the Protein Data Bank (PDB). Assemblies from the PDB were clustered by sequence identity (90% sequence identity). All the assemblies within one sequence cluster were compared using QS-score. The resulting distance matrix was used to perform hierarchical clustering using different distance thresholds. With a distance threshold (x-axis) of 0, all assemblies are clustered together, so that the fraction of sequence clusters (y-axis) having only one QS cluster is 100%. As the threshold is increased, the structural heterogeneity of the sequence clusters becomes evident and the fraction of sequence clusters having multiple QS clusters (in shades of blue) increases.
Figure 4
Stoichiometry of 807 target proteins in the TARGET dataset. Homo-oligomers are represented in shades of red and hetero-oligomers in shades of blue. Heteromeric targets for which no template could be identified are shown in shades of gray. Each wedge of the pie chart is annotated with the fraction of the total dataset for the most common stoichiometries.
Figure 5
QS-score distribution for all generated models compared to the native structure. Among models with either a correct (blue) or an incorrect (yellow) stoichiometry, a sizable fraction have an interface different from the native one, as they are based on a template with a different, i.e. incorrect, quaternary structure.
Figure 6
Fraction of top-scoring models in each quality category using different ranking criteria. The evaluation scheme “incorrect” (QS-score < 0.1), “low” (0.1 ≤ QS-score < 0.3), “medium” (0.3 ≤ QS-score < 0.7) and “high” (QS-score > 0.7) resembles the scheme used in CAPRI measures. Five ranking criteria are considered: a physics-based docking score (Docking Score), the co-evolution predicted contact agreement (Co-evolution Agreement), the naïve sequence identity (Seq.Id.), our SVM prediction (Pred. QS-score) and the hypothetical “perfect” ranking based on the QS-score distance from the native structure (QS-score). The fraction of validation targets is computed for the ten different cross-validation iterations.
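The four-category evaluation scheme in the caption can be written as a small binning function. This is a sketch only: the function names are hypothetical, QS-scores are assumed to lie in [0, 1], and a score of exactly 0.7, which the caption leaves unassigned, is treated here as “high”.

```python
def quality_category(qs_score):
    """Map a QS-score to the caption's quality categories."""
    if not 0.0 <= qs_score <= 1.0:
        raise ValueError("QS-score is assumed to lie in [0, 1]")
    if qs_score < 0.1:
        return "incorrect"
    if qs_score < 0.3:
        return "low"
    if qs_score < 0.7:
        return "medium"
    # Assumption: exactly 0.7 counts as "high"; the caption does not assign it.
    return "high"


def category_fractions(qs_scores):
    """Fraction of models falling into each quality category."""
    counts = {"incorrect": 0, "low": 0, "medium": 0, "high": 0}
    for score in qs_scores:
        counts[quality_category(score)] += 1
    total = len(qs_scores)
    return {name: (count / total if total else 0.0) for name, count in counts.items()}


# Hypothetical QS-scores of top-ranked models under one ranking criterion.
example_scores = [0.05, 0.2, 0.5, 0.9]
print(category_fractions(example_scores))
```

Applying `category_fractions` to the top-ranked model of each target, once per ranking criterion, gives the kind of per-category fractions the figure compares.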
Figure 7
Comparison of model quality for three servers participating in CAMEO. The approach described in the current study (SWISS-MODEL Oligo) is compared to the classic SWISS-MODEL and Robetta servers. A common set of 111 homo-oligomeric models produced by all servers is compared to the native structure using two distance measures: QS-score (representing interface accuracy) and TM-score (representing global fold accuracy).
Figure 8
Quaternary structure analysis of H. volcanii fructose bisphosphate aldolase (FBA). (A) Structural clustering tree of H. volcanii FBA homologs with known structure. Each leaf is a template labeled with the PDB code and a bar indicating sequence identity and coverage (darker shades of blue refer to higher sequence identity). The decision tree follows the described levels of clustering: oligomeric state, stoichiometry (the topology of the complexes is also shown), and QS-score clustering. The green thread indicates templates with a predicted conserved QS. (B) The PPI fingerprint curves of the dimeric (green) and tetrameric (red) sets (the area plot spans between the 25th and 75th percentiles). The dimeric forms of FBA have a stronger interface conservation signal than the tetrameric form. This stronger conservation is observable using different evolutionary distance thresholds; notably, taking into account the entire MSA would not highlight a distinct conservation pattern.
