Study of specific nanoenvironments containing α-helices in all-α and (α+β)+(α/β) proteins

doi:10.1371/journal.pone.0200018

PLoS One. 2018 Jul 10;13(7):e0200018.

doi: 10.1371/journal.pone.0200018. eCollection 2018.

Ivan Mazoni¹, Luiz César Borro², José Gilberto Jardine³, Inácio Henrique Yano¹, José Augusto Salim⁴, Goran Neshich¹

Affiliations

¹ Embrapa Agricultural Informatics, Campinas, São Paulo, Brazil.
² Institute of Biology, University of Campinas, Campinas, São Paulo, Brazil.
³ Embrapa Territorial Management, Campinas, São Paulo, Brazil.
⁴ Research Center on Biodiversity and Computing, University of São Paulo, São Paulo, São Paulo, Brazil.

PMID: 29990352
PMCID: PMC6039001
DOI: 10.1371/journal.pone.0200018

Ivan Mazoni et al. PLoS One. 2018.

Abstract

Protein secondary structure elements (PSSEs) such as α-helices, β-strands, and turns are the primary building blocks of the tertiary protein structure. Our primary interest here is to reveal the characteristics of the nanoenvironment formed by both PSSEs and their surrounding amino acid residues (AARs), which might contribute to the general understanding of how proteins fold. The characteristics of such nanoenvironments must be specific to each secondary structure element, and we have set our goal here to gather the fullest possible description of the α-helical nanoenvironment. In general, this postulate (the existence of specific nanoenvironments for specific protein substructures/neighbourhoods/regions with distinct functionality) was already successfully explored and confirmed for some protein regions, such as protein-protein interfaces and enzyme catalytic sites. Consequently, PSSEs were the obvious next choice for additional work for further evidence showing that specific nanoenvironments (having characteristics fully describable by means of structural and physical chemical descriptors) do exist for the corresponding and determined intraprotein regions. The nanoenvironment of α-helices (nEoαH) is defined as any region of the protein where this secondary structure element type is detected. The nEoαH, therefore, includes not only the α-helix amino acid residues but also the residues immediately around the α-helix. The hypothesis that motivated this work is that it might in fact be possible to detect a postulated "signal" or "signature" that distinguishes the specific location of α-helices. This "signal" must be discernible by tracking differences in the values of physical, chemical, physicochemical, structural and geometric descriptors immediately before (or after) the PSSE from those in the region along the α-helices. The search for this specific nanoenvironment "signal" was made possible by aligning previously selected α-helices of equal length. 
Afterward, we calculated the average value, standard deviation and mean square error at each aligned residue position for each selected descriptor. We applied Student's t-test, the Kolmogorov-Smirnov test and MANOVA to the dataset constructed as described above, and the results confirmed that the hypothesized "signal"/"signature" both exists and is identifiable, and that it can distinguish the presence of an α-helix inside the specific nanoenvironment, contextualized as a specific region within the whole protein. However, such a conclusion can rarely be reached if only one descriptor is considered at a time. A more accurate signal with broader coverage is achieved only with multivariate analysis, in which several descriptors (usually approximately 10) are considered at the same time. To a limited extent (up to a maximum of 15% of cases), such a conclusion is also possible with only a single descriptor, and it is possible in general for up to 50-80% of cases when no fewer than 5 nonlinear descriptors are selected and considered. Using all the descriptors considered in this work, provided all assumptions about data characteristics for this analysis are met, multivariate analysis regularly reached a coverage and accuracy above 90%. Understanding how secondary structure elements are formed and maintained within a protein structure could enable a more detailed understanding of how proteins reach their final 3D structure and, consequently, their function. Likewise, this knowledge may also improve the tools used to assess the quality of a structure, by comparing the "signal" around a selected PSSE with the one obtained from the best available protein structures in terms of resolution and quality.
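The per-position summary described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software: the function name `per_position_stats`, the use of `None` for alignment gaps, and the toy values are all hypothetical, and the spread measure computed here is the standard error of the mean (the quantity plotted in Fig 5), which may differ from the "mean square error" named in the abstract.

```python
import math
from statistics import mean, stdev

def per_position_stats(aligned_values):
    """Summarize one descriptor over positionally aligned helices.

    aligned_values: list of rows, one per helix; each row holds one
    descriptor value per aligned position, or None where there is a gap.
    Returns, per position, (mean, sample SD, SEM) or None if fewer than
    two values are available at that position.
    """
    n_positions = len(aligned_values[0])
    stats = []
    for pos in range(n_positions):
        # Collect the values present at this aligned position, skipping gaps.
        column = [row[pos] for row in aligned_values if row[pos] is not None]
        if len(column) < 2:
            stats.append(None)  # spread cannot be estimated from < 2 values
            continue
        avg = mean(column)
        sd = stdev(column)                 # sample standard deviation
        sem = sd / math.sqrt(len(column))  # standard error of the mean
        stats.append((avg, sd, sem))
    return stats

# Toy data (hypothetical): three aligned helices, three positions.
rows = [
    [1.0, 2.0, None],
    [3.0, 4.0, 5.0],
    [5.0, 6.0, None],
]
result = per_position_stats(rows)
```

Positions with fewer than two observations are reported as `None` rather than given a spurious spread, which mirrors the idea that sparsely occupied flanking positions are less reliable.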

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. An example of an α-helix (in a specific (α+β) protein) and its nanoenvironment: The synthetic gene encoded DcpS bound to the inhibitor DG157493 (3bl9.pdb) has fourteen α-helices, and each helix has its own nanoenvironment.**
Highlighted inside the transparent spheres is an α-helix (ribbon, purple). The nanoenvironment includes the amino acid residues of the α-helix and the amino acid residues around the helix that are within reach of the probing sphere, whose radius was previously selected. The pre- and postregions (extension by 32 AARs each) are not shown here for the sake of clarity of the basic definition.

**Fig 2. The p-value of Student's t-test evaluation for a selected descriptor value along the “sliding window” for positionally aligned PSSE sequences.**
The coverage of the sequence containing a PSSE is from the N- to the C-terminal ends (± 32 AAR). The sequence includes the PSSE plus 32 residues before its N-terminal and 32 residues after its C-terminal. The “sliding window” size in this particular case is the same size as the selected PSSE length (12 AAR). Student’s t-test is used for each position of the sliding window. This test measures how much the data inside the “sliding window” differ from the data outside the window. The p-values are shown along the y-axes. A p-value that approaches zero in any particular region means that within this region, the descriptor values differ from the values outside the region in a statistically significant manner. The arrows indicate the direction of movement for the “sliding window box” (shown here before, at and after the PSSE), and the solid arrow indicates the exact position of the N-terminal of the PSSE. Shadowed boxes indicate the size of the sliding window placed at three specific positions. The region with a p-value approximating zero coincides with the positional alignment of the α-helix that has the exact same size. The sharp invagination around AAR position 52 is not as representative (too short compared to the PSSE under investigation) as the one directly on top and over the whole analysed PSSE.
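The sliding-window comparison described in this caption can be sketched as below. It is an illustration only: the function names and the toy profile are hypothetical, and Welch's unequal-variance form of the t statistic is an assumption, since the caption says only "Student's t-test". The sketch stops at the t statistic; turning it into a p-value needs the t-distribution CDF, which a statistics library such as SciPy (`scipy.stats.ttest_ind`) provides.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def sliding_window_t(values, window):
    """Return one t statistic per window start, comparing the values
    inside the window with all values outside it."""
    scores = []
    for start in range(len(values) - window + 1):
        inside = values[start:start + window]
        outside = values[:start] + values[start + window:]
        scores.append(welch_t(inside, outside))
    return scores

# Toy per-position descriptor averages (hypothetical): a 4-residue
# stretch with elevated values, flanked by low values on both sides.
profile = [0.1, 0.2, 0.0, 0.1, 0.2, 0.1,
           2.0, 2.1, 1.9, 2.0,
           0.2, 0.1, 0.0, 0.1, 0.2, 0.1]
scores = sliding_window_t(profile, window=4)

# The window whose contents differ most from the rest of the profile.
best_start = max(range(len(scores)), key=lambda i: abs(scores[i]))
```

With a window of the same length as the elevated stretch, the largest absolute statistic occurs when the window sits exactly on that stretch, which is the behaviour the caption describes for the p-value curve.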

**Fig 3. Grouping of same-length α-helices using consensus definitions based on the PDB, DSSP and Stride classifications.**
There are four possible consensus groups. (A) PDB-DSSP-Stride: when the secondary structure element starts and finishes at the same corresponding amino acid residue location and hence, has the same length according to the PDB, DSSP and Stride definitions. (B), (C) and (D) when the secondary structure elements start but do NOT finish at the same amino acid residue, as defined by one of the three criteria used: PDB-DSSP, PDB-Stride, and DSSP-Stride definitions, respectively.

**Fig 4. Positional alignment of α-helices. For the example above, five all-α protein structures with α-helices of length = 12 AARs (H) were aligned.**
Structures B, D and E have a second α-helix with 12 amino acid residues (marked with an “h”–all in bold), and these structures were aligned too, as shown in the bottom three lines. To the left and the right of the PSSE N- and C-terminal ends, respectively, all positions are extended to the 32nd position. Some positions are filled with a “-”, meaning a gap. Those gaps were introduced at the corresponding spots because the selected proteins have no residues at those positions.
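The positional alignment described in this caption can be sketched as follows, assuming each protein is available as an amino acid sequence plus a per-residue secondary-structure string of the same length. The function name `aligned_windows`, the helix code "H" and the toy sequence are hypothetical; the flank of 32 positions and the "-" gap symbol come from the caption.

```python
FLANK = 32  # number of flanking positions on each side, as in the caption

def aligned_windows(sequence, ss, helix_len, flank=FLANK, helix_code="H"):
    """Return one gap-padded row per helix of exactly helix_len residues.

    sequence: one-letter amino acid string.
    ss: per-residue secondary-structure string of the same length.
    Each row is flank + helix_len + flank characters long, so the helix
    always occupies the same columns in every row.
    """
    rows = []
    i, n = 0, len(ss)
    while i < n:
        if ss[i] != helix_code:
            i += 1
            continue
        # Find the end of this run of helix residues.
        j = i
        while j < n and ss[j] == helix_code:
            j += 1
        if j - i == helix_len:
            left = sequence[max(0, i - flank):i]
            right = sequence[j:j + flank]
            # Pad with gaps where the chain ends before the flank does.
            rows.append(left.rjust(flank, "-") + sequence[i:j]
                        + right.ljust(flank, "-"))
        i = j
    return rows

# Toy example (hypothetical) with a short flank so the rows stay readable.
sequence = "MKTAYIAKQRQISFVK"
ss       = "CCHHHHCCCCHHHHCC"
rows = aligned_windows(sequence, ss, helix_len=4, flank=3)
```

Both helices of the toy protein yield a row, in the same way that a second same-length helix from one structure contributes its own line in the figure.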

**Fig 5. The Protein Secondary Structure Sting Analyzer (PS3A) panels contain four types of plots.**
In the case shown, 987 α-helices that are 15 amino acid residues long were examined from the datamart in which we removed 70% of the redundancy at the whole protein sequence level, and all instances of α-helices were taken from both all-α and (α+β)+(α/β) proteins. The consensus definition used to determine the presence of an α-helical structure within proteins was the PDB-DSSP-Stride–the most rigorous one. The total number of such proteins is indicated in the Supporting Information in Figure B in S1 File. Plots produced by the PS3A software: A) XY plot for average values (± SEM) for the selected descriptor: electrostatic potential at the α-carbon atom (CA). Negative numbers along the x-axes indicate locations to the left of the N-terminal of the examined/central PSSE, and positive ones follow its C-terminal end. B) The degree of occupancy per AAR position or “reliability”, which is the estimate of how accurately the signal may be observed in A) above. This estimate is only based on how many amino acid residues are present at any location of the positional alignment of the PSSE. The maximum value (100% reliability) is assumed for the ensemble of studied samples along the PSSE. Outside the PSSE, the reliability is usually lower than 100%. C) The sequence logo presents which amino acid type is more frequently found at each positional alignment location–basically indicating the consensus sequence of the PSSE for a selected length (also shown at the bottom part of the logo). The amino acid position numbers (shown on the upper part of the plot) follow the same convention described for A) above. D) The ECDF curve shows how the descriptor average values inside the PSSE region are different from the corresponding values outside the selected PSSE. All of these plots (for each selected PSSE length, type of protein and redundancy level) may be accessed at https://www.ps3a.cbi.cnptia.embrapa.br.
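Two of the quantities in this caption, the per-position reliability (panel B) and the ECDF comparison (panel D), can be sketched as below. This is an illustration under stated assumptions: the function names and toy values are hypothetical, reliability is taken to be the fraction of non-gap entries per aligned column, and the distance between the two ECDFs is summarized with the two-sample Kolmogorov-Smirnov statistic, the test named in the abstract.

```python
from bisect import bisect_right

def occupancy(rows, gap="-"):
    """Fraction of aligned rows that have a residue (not a gap) per column."""
    return [sum(row[k] != gap for row in rows) / len(rows)
            for k in range(len(rows[0]))]

def ecdf(sample):
    """Return F, where F(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def ks_statistic(a, b):
    """Largest vertical distance between the ECDFs of samples a and b."""
    fa, fb = ecdf(a), ecdf(b)
    # Both ECDFs are step functions, so the maximum gap is reached
    # at one of the observed values.
    return max(abs(fa(x) - fb(x)) for x in sorted(set(a) | set(b)))

# Toy aligned rows (hypothetical): gaps only at the outermost positions.
rows = ["-MKTAYIAKQ", "KQRQISFVK-"]
reliability = occupancy(rows)

# Toy descriptor averages inside and outside a PSSE (hypothetical).
inside = [0.9, 1.1, 1.0, 1.2]
outside = [0.1, 0.3, 0.2, 1.0]
d = ks_statistic(inside, outside)
```

A statistic near 1 means the two distributions barely overlap, while a value near 0 means the inside and outside values are distributed almost identically.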

**Fig 6. Comparison of the average values of 8 descriptors, normalized by the inverse coefficient of variation (that is, by dividing the parameter values by the corresponding standard deviation), and calculated for regions inside (17 AARs) and outside the PSSE.**
The following descriptors are likely to show the postulated “signal” (the differences between the inside and outside descriptor values per position are higher than 1): 1. Hbmm, 15. Hbmm_WNADist, 29. Hbmm_WNASurf, 61. Number_Unused_Contact_WNADist, 62. Number_Unused_Contact_WNASurf, 63. Dihedral_Angle_PHI, 64. Dihedral_Angle_PSI, 66. Density. The two shadowed descriptors are expected to show differences, as these descriptors are basically part of the definition of the investigated PSSE.

**Fig 7. Comparison of the average values of 34 descriptors, normalized by the inverse coefficient of variation (that is, by dividing the parameter values by the corresponding standard deviation), and calculated for regions inside (17 AARs) and outside the PSSE.**
The following descriptors are likely to show the postulated “signal” (the differences between the inside and outside descriptor values per position are higher than 0.1 and lower than 1): 2. Hbmwm, 4. Hbms, 5. Hbmws, 9. Hbswws, 12. Disulfide, 13. Ch_attractive, 16. hbmwm_WNADist, 17. Hbmwwm_WNADist, 18. Hbms_WNADist, 19. Hbmws_WNADist, 20. Hbmwws_WNADist, 21. Hbss_WNADist, 23. Hbswws_WNADist, 26. Disulfide_WNADist, 27. ch_attractive_WNADist, 31. Hbmwwm_WNASurf, 32. Hbms_WNASurf, 34. Hbmwwm_WNASurf, 35. Hbss_WNASurf, 37. Hbswws_WNASurf, 40. Disulfide_WNASurf, 41. Ch_attractive_WNASurf, 43. Cross_Link_Order_CA, 45. Dihedral_Chi1, 50. Electrostatic_Potential_at_CA, 54. Electrostatic_Potential_at_CA_WNADist, 55. Electrostatic_Potential_at_LHA_WNADist, 56. Electrostatic_Potential_Average_WNADist, 57. Electrostatic_Potential_at_CA_WNASurf, 58. Electrostatic_Potential_at_LHA_WNASurf, 59. Electrostatic_Potential_Average_WNASurf, 65. Temperature_Factor_CA, 70. SC_Clash, 71. SC_Percent.
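The thresholds used in the captions of Figs 6 and 7 can be illustrated with a short sketch. Several points here are assumptions rather than details confirmed by the source: each region's mean is divided by that region's own sample standard deviation, the absolute difference is compared with the thresholds, and the labels "strong", "moderate" and "none" are hypothetical names for the three outcomes. The toy values are invented.

```python
from statistics import mean, stdev

def normalized_mean(values):
    """Mean divided by the sample standard deviation
    (the inverse coefficient of variation)."""
    return mean(values) / stdev(values)

def signal_class(inside, outside):
    """Classify a descriptor by the difference between its normalized
    averages inside and outside the PSSE."""
    diff = abs(normalized_mean(inside) - normalized_mean(outside))
    if diff > 1:
        return "strong"    # threshold quoted in the Fig 6 caption
    if diff > 0.1:
        return "moderate"  # range quoted in the Fig 7 caption
    return "none"

# Toy descriptor values (hypothetical).
strong = signal_class([4.0, 5.0, 6.0], [1.0, 2.0, 3.0])
moderate = signal_class([2.0, 3.0, 4.0], [1.5, 2.5, 3.5])
absent = signal_class([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Dividing by the standard deviation puts descriptors with very different units on a comparable scale, which is what allows a single pair of thresholds to be applied to all of them.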

**Fig 8. Variation in the number of descriptors that passed both the normal distribution test and the no mutual correlation test for different helix sizes.**

**Fig 9. Representation of 42 different descriptors used for the MANOVA input and then filtered by a double test: A normal distribution of data and a lack of mutual correlation.**
The points, which are plotted for each helical size (x-axes), represent those descriptors used by MANOVA for that particular size. The 42 descriptors found on the y-axes are as follows: 1. Hbmm, 2. Hbmwm, 3. Hbms, 4. Hbmws, 5. Hbswws, 6. Disulfide, 7. Ch_attractive, 8. Hbmm_WNADist, 9. hbmwm_WNADist, 10. Hbmwwm_WNADist, 11. Hbms_WNADist, 12. Hbmws_WNADist, 13. Hbmwws_WNADist, 14. Hbss_WNADist, 15. Hbswws_WNADist, 16. Disulfide_WNADist, 17. ch_attractive_WNADist, 18. Hbmm_WNASurf, 19. Hbmwm_WNASurf, 20. Hbmwwm_WNASurf, 21. Hbms_WNASurf, 22. Hbss_WNASurf, 23. Hbswws_WNASurf, 24. Disulfide_WNASurf, 25. Ch_attractive_WNASurf, 26. Electrostatic_Potential_at_CA, 27. Electrostatic_Potential_at_CA_WNADist, 28. Electrostatic_Potential_at_LHA_WNADist, 29. Electrostatic_Potential_Average_WNADist, 30. Electrostatic_Potential_at_CA_WNASurf, 31. Electrostatic_Potential_at_LHA_WNASurf, 32. Electrostatic_Potential_Average_WNASurf, 33. Number_Unused_Contact_WNADist, 34. Number_Unused_Contact_WNASurf, 35. Cross_Link_Order_CA, 36. Dihedral_Chi1, 37. Dihedral_Angle_PHI, 38. Dihedral_Angle_PSI, 39. Temperature_Factor_CA, 40. Internal_CA_3, 41. Clash, 42. Percent. Finally, the three most frequently plotted descriptors are as follows (designated by the three horizontal dashed lines, from top to bottom): 1. Electrostatic_Potential_Average_WNASurf (order number: 32) ≥ (30, 65%), 2. Number_Unused_Contact_WNADist (order number: 33) ≥ (30, 67%) and 3. Hbms_WNASurf (order number: 21) ≥ (26, 58%).
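The "lack of mutual correlation" half of the double test mentioned in this caption can be sketched as a greedy filter on pairwise Pearson correlation. This is an assumption-laden illustration: the source does not state the correlation measure, the cut-off or the selection order, so the threshold of 0.9, the function names and the toy values are all hypothetical. The normality half of the test is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(descriptors, threshold=0.9):
    """Greedy filter: keep a descriptor only if its absolute correlation
    with every descriptor already kept is below the threshold."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept

# Toy values (hypothetical); the names are borrowed from the caption.
descriptors = {
    "Hbmm": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Hbmm_WNADist": [2.0, 4.1, 6.0, 8.2, 10.0],  # nearly proportional to Hbmm
    "Density": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = drop_correlated(descriptors)
```

Because the filter is greedy, the result depends on the order in which descriptors are visited; a different ordering could retain `Hbmm_WNADist` instead of `Hbmm`.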

**Fig 10. Composite graphs showing the following: Descriptor variation along the regions before, at and after the analysed PSSE; the reliability value (or % of helical structure at each locus) and the p-value for the descriptor: Number of contacts, type “HBMM”.**
Data are drawn from the datamart containing PSSEs of length = 12 AARs; the consensus definition of a helix element is from “PDB-DSSP-Stride”, and the redundancy is 70% similarity at the sequence level.

**Fig 11. Differences in the variation behaviour of two selected descriptors around α-helices (solid lines) and β-strands (dotted lines).**
The plots present the behaviour of A) the EP@Cα average values for 1811 α-helices in (α+β)+(α/β) proteins and 7773 β-strands in (α+β)+(α/β) proteins, and B) the HBMM_WNASurf average values for α-helices in (α+β)+(α/β) proteins and β-strands in (α+β)+(α/β) proteins. The average number of this contact type is higher in and around α-helices than in and around β-strands. As shown, there are clear differences in signal pattern in the cases presented in A and B.

**Fig 12. The very same PSSE defined in two protein structures: 1fw4.pdb, with a resolution of 1.7 Å (top sequence), and 1trc.pdb, with a resolution of 3.6 Å (bottom sequence).**
The 1FW4 structure has one extra α-helix. Both structures have identical AAR sequences. The region starting at residue #117 and ending at residue #127 was used in our experiment as this region has the most obvious discrepancy in SSE assignment.

**Fig 13. The superposition of two identical proteins whose structures were solved at two very different resolutions.**
The 1fw4.pdb structure (red ribbon) has a resolution of 1.7 Å, and the 1trc.pdb structure (blue ribbon) has a resolution of 3.6 Å. Both structures have the very same amino acid sequence, but 1trc.pdb is an older structure, and its low resolution (3.6 Å) causes some errors in the α-helix definition and positioning. The region between AARs 117 and 127, at the top right, demonstrates that in both cases there is a helical element there, but the lower resolution structure does not have a corresponding assignment for it.

Cited by

A comparison between internal protein nanoenvironments of α-helices and β-sheets.
Mazoni I, Salim JA, de Moraes FR, Borro L, Neshich G. PLoS One. 2020 Dec 30;15(12):e0244315. doi: 10.1371/journal.pone.0244315. eCollection 2020. PMID: 33378364.

Grants and funding

Funded by Embrapa Internal Financing.
