Positions of cysteine residues reveal local clusters and hidden relationships to Sequons and Transmembrane domains in Human proteins

Sci Rep. 2024 Oct 29;14(1):25886. doi: 10.1038/s41598-024-77056-8.

Manthan Desai¹ ², Bingyun Sun³ ⁴

Affiliations

¹ Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada.
² Department of Computing Science, Simon Fraser University, Burnaby, BC, Canada.
³ Department of Chemistry, Simon Fraser University, Burnaby, BC, Canada. bingyun_sun@sfu.ca.
⁴ Simon Fraser University, Burnaby, BC, V5A 1S6, Canada. bingyun_sun@sfu.ca.

PMID: 39468182
PMCID: PMC11519667
DOI: 10.1038/s41598-024-77056-8

Manthan Desai et al. Sci Rep. 2024.


Abstract

Membrane proteins often possess critical structural features, such as transmembrane domains (TMs), N-glycosylation, and disulfide bonds (SS bonds), which are essential to their structure and function. Here, we extend the study of the motifs carrying N-glycosylation, i.e., the sequons, and of the Cys residues supporting the SS bonds, to the whole human proteome, with a particular focus on the positions of Cys residues in human proteins relative to those of sequons and TMs. Cys is the least abundant amino acid residue in protein sequences, and the positions of Cys residues in proteins are not random but rather selected through evolution. We discovered that the frequency of Cys residues in proteins is length-dependent, and that the frequency of CC gaps formed between adjacent Cys residues can be used as a classifier to distinguish proteins with special structures and functions, such as keratin-associated proteins (KAPs), extracellular proteins with EGF-like domains, and nuclear proteins with zinc finger C2H2 domains. Most importantly, by comparing the positions of Cys residues to those of sequons and TMs, we discovered that these structural features can form dense clusters in highly repeated and mutually exclusive modalities in protein sequences. The evolutionary advantages of such complementarity among the three structural features are discussed, particularly in light of the structural dynamics in proteins that are missing from computational predictions. The discoveries made here highlight the sequence-structure-function axis in biological organisms, which can be utilized in future protein engineering toward synthetic biology.

Keywords: Cysteine residues; Disulfide bonds; N-glycosylation; Posttranslational modifications; Protein sequence; Protein structure and function; Sequons; Transmembrane domains.
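
The positional features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that a CC gap is the difference between the 1-based positions of two adjacent Cys residues, that a sequon is the canonical N-X-S/T motif with X not proline, and that gap frequencies are normalized by the total number of CC gaps in the protein. The function names, the `max_gap=25` default, and the toy sequence are all hypothetical.

```python
import re
from collections import Counter


def cys_positions(seq):
    """Return the 1-based positions of Cys (C) residues in a sequence."""
    return [i + 1 for i, aa in enumerate(seq) if aa == "C"]


def cc_gaps(seq):
    """Return position differences between adjacent Cys residues (assumed gap definition)."""
    pos = cys_positions(seq)
    return [b - a for a, b in zip(pos, pos[1:])]


def sequon_positions(seq):
    """Return 1-based positions of Asn in N-X-S/T motifs where X is not Pro."""
    # A lookahead is used so that overlapping sequons are all reported.
    return [m.start() + 1 for m in re.finditer(r"N(?=[^P][ST])", seq)]


def cc_gap_frequency_vector(seq, max_gap=25):
    """Return the fraction of CC gaps with length 1..max_gap as a fixed-length list."""
    gaps = cc_gaps(seq)
    if not gaps:
        return [0.0] * max_gap
    counts = Counter(gaps)
    # Assumption: normalize by all CC gaps in the protein, including longer ones.
    return [counts.get(g, 0) / len(gaps) for g in range(1, max_gap + 1)]


# Toy sequence (hypothetical, not a real protein).
toy = "MCCPNGTACKKCNPSC"
print(cys_positions(toy))
print(cc_gaps(toy))
print(sequon_positions(toy))
```

A fixed-length vector of this kind, computed per protein, is the sort of input that clustering or principal component analysis could take, although the exact feature construction used in the paper is not stated on this page.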

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
Protein length distribution of cysteine-containing proteins (A) and cysteine-free proteins (B) in the human proteome, as well as (C) the UniProt keyword enrichment analysis of the cysteine-free proteins, in which the x axis displays the -log10 transformed enrichment p value.

**Fig. 2**
Count and density analysis of Cys residues in human proteins. (A) Distribution of the average fragment length of a Cys residue in a protein with a bin width of 10. The open bars represent proteins with average fragment lengths shorter than 60 residues, and the filled bars represent the remaining cysteine-containing proteins. (B) Distribution of Cys counts in proteins with a bin size of 5 counts. The filled bars indicate proteins with fewer than 15 Cys residues (C-low proteins); the rest are C-high proteins. The pie chart displays the proportions and the percentages of C-free, C-low and C-high proteins in the human proteome. (C) Feature counts and average protein length in the first 10 bins in Fig. 2A. The analyzed features included cysteine residues, annotated disulfide bonds, sequons, and predicted transmembrane domains. (D) Expanded view of Fig. 2B with the bin size of a single count of Cys residues.
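
The grouping described in this caption can be sketched in Python as follows. The cutoff of 15 Cys residues comes from the caption. Treating the "average fragment length" as protein length divided by Cys count is an assumption, since the formula is not given here, and both function names are hypothetical.

```python
def classify_by_cys(seq, low_cutoff=15):
    """Label a protein as C-free, C-low (< low_cutoff Cys), or C-high."""
    n_cys = seq.count("C")
    if n_cys == 0:
        return "C-free"
    return "C-low" if n_cys < low_cutoff else "C-high"


def average_fragment_length(seq):
    """Assumed definition: protein length divided by the number of Cys residues."""
    n_cys = seq.count("C")
    return len(seq) / n_cys if n_cys else None


# Toy sequences (hypothetical).
examples = ["MKCAAC" + "A" * 54, "C" * 15, "MKA"]
for s in examples:
    print(classify_by_cys(s), average_fragment_length(s))
```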

**Fig. 3**
InterPro-annotated SCO-spondin domains are shown, and the red boxes highlight the regions containing free cysteine residues. The length of each highlighted region is labeled above the corresponding box. Only 2 of the 357 free cysteine residues lie outside these boxed regions.

**Fig. 4**
Distribution of CC gap frequency and the corresponding predicted structures for selected C-dense and C-rich proteins. A & B, Keratin-associated protein 28-5 (KAP); C & D, Metallothionein-1B (MT); E & F, Late cornified envelope protein 2A (LCE); G & H, SCO-spondin.

**Fig. 5**
Distributions of gap/loop lengths among adjacent cysteine residues and disulfide (SS) bonds. A, Distance between two cysteine residues forming an SS bond (SS loop); B, distance between two adjacent SS bonds (SS gap); C, absolute SS gap; D, distance between two adjacent cysteine residues (CC gap).
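
The loop and gap quantities in this caption can be illustrated with a short Python sketch. It takes disulfide bonds as pairs of 1-based Cys positions. The SS loop follows the caption (distance between the two bonded Cys residues). The SS gap is computed under one possible reading, namely the distance from the second Cys of one bond to the first Cys of the next bond when bonds are ordered by their first Cys, so overlapping or nested bonds give negative values. The caption's "absolute SS gap" is not defined on this page and is not implemented. The function names and bond list are hypothetical.

```python
def ss_loops(bonds):
    """Distance between the two Cys positions of each disulfide bond."""
    return [abs(b - a) for a, b in bonds]


def ss_gaps(bonds):
    """Assumed definition: start of the next bond minus end of the current bond."""
    # Normalize each bond to (first Cys, second Cys) and order by first Cys.
    ordered = sorted((min(a, b), max(a, b)) for a, b in bonds)
    return [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]


# Toy annotation (hypothetical positions).
bonds = [(3, 20), (25, 40), (30, 55)]
print(ss_loops(bonds))
print(ss_gaps(bonds))
```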

**Fig. 6**
Clustering of C-high proteins (open bars in Fig. 2B) and their corresponding frequencies of CC gaps. (A) Comparison of unsupervised hierarchical clustering with CC gaps progressively reduced from a gap length of 100 to 10, benchmarked by the EGF-like domain highlighted by the white circle. Quantitative evaluation of the clustering efficiency is represented by the pie chart above each corresponding heatmap. (B) 3D principal component analysis of the CC gap frequency considering 25 CC gaps. Cluster 1 includes Gaps 1 and 4; Cluster 2 includes Gaps 3 and 25; Cluster 3 includes Gaps 2, 5–7, and 9; and Cluster 4 includes the remaining gaps. (C) The corresponding 2D PCA of proteins in the analysis of 25 CC gaps. Green indicates proteins with zinc finger C2H2 domains, blue indicates proteins with EGF-like domains, red indicates KAP proteins, and gray indicates the remaining proteins.

**Fig. 7**
Comparisons of sequon and transmembrane domains (TMs) for verification of the observations of cysteine residues in terms of the lengths of fragments (A & B), gaps (E and G), and loops (F), corresponding feature counts (C & D) and enriched protein functions (H). The height of the bar denotes the population percentage of the specified proteins relative to the total proteins analyzed. KAP, keratin-associated protein; Ig, immunoglobulin; FT III, fibronectin type III; LRR, leucine-rich repeat; MFS, major facilitator superfamily.


