FISH Amyloid - a new method for finding amyloidogenic segments in proteins based on site specific co-occurrence of aminoacids

Pawel Gasior, Malgorzata Kotulska¹

BMC Bioinformatics. 2014 Feb 24;15:54. doi: 10.1186/1471-2105-15-54.

Affiliation

¹ Institute of Biomedical Engineering and Instrumentation, Wroclaw University of Technology, 50-370 Wroclaw, Poland. malgorzata.kotulska@pwr.wroc.pl.

PMID: 24564523
PMCID: PMC3941796
DOI: 10.1186/1471-2105-15-54

Abstract

Background: Amyloids are proteins capable of forming fibrils whose intramolecular contact sites assume a densely packed zipper pattern. Their oligomers can underlie serious diseases, such as Alzheimer's and Parkinson's disease. Recent studies show that short segments of amino acids can be responsible for the amyloidogenic properties of a protein. A few hundred such peptides have been found experimentally, but experimental testing of all candidates is currently not feasible. Here we propose an original machine learning method for classifying amino acid sequences, based on discovering a segment with a discriminative pattern of site-specific co-occurrences between sequence elements. The pattern is based on the positions of residues whose occurrence is correlated over a sliding window of a specified length. The algorithm first recognizes the most relevant training segment in each positive training instance. Classification is then based on the maximal distance between the co-occurrence matrix of the relevant segments in the positive training sequences and the matrix obtained from the negative training segments. We applied the method to study amino acid sequences with regard to their amyloidogenic properties.
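The site-specific co-occurrence pattern described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the names `cooccurrence` and `windows` are hypothetical, positions are 1-based to match the figures, and each entry is the fraction of segments that carry residue a at position i together with residue b at position j.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(segments, n):
    """Site-specific co-occurrence frequencies over equal-length segments (hypothetical helper).

    Key (i, j, a, b), with 1-based window positions i < j, means residue a at
    position i together with residue b at position j. Values are the fraction
    of segments in which that combination occurs.
    """
    counts = Counter()
    for seg in segments:
        if len(seg) != n:
            raise ValueError(f"segment {seg!r} is not {n} residues long")
        for i, j in combinations(range(1, n + 1), 2):
            counts[(i, j, seg[i - 1], seg[j - 1])] += 1
    total = len(segments)
    return {key: c / total for key, c in counts.items()} if total else {}


def windows(sequence, n):
    """All length-n sliding windows of a sequence, paired with their 1-based start."""
    return [(a + 1, sequence[a:a + n]) for a in range(len(sequence) - n + 1)]
```

For example, `cooccurrence(["KLVFF", "KLVFA"], 5)` maps `(1, 2, "K", "L")` to 1.0 and `(4, 5, "F", "A")` to 0.5, since only one of the two segments ends in A.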

Results: Our method was first trained on available datasets of hexapeptides with known amyloidogenic classification, using 5- or 6-residue sliding windows. Depending on the choice of training and testing datasets, the area under the ROC curve reached up to 0.80 for experimental datasets and 0.95 for computationally generated datasets (obtained with the 3D profile method). Importantly, the results on 5-residue segments were not significantly worse, although the classification required the algorithm to first recognize the most relevant training segments. A dataset of long sequences, including the sup35 prion and a few other amyloid proteins, was used to test the method and gave encouraging results. Our web tool FISH Amyloid, trained on all available experimental data 4-10 residues long, offers prediction of amyloidogenic segments in protein sequences.

Conclusions: We proposed a new, original classification method that recognizes co-occurrence patterns in sequences. The method reveals the characteristic classification pattern of the data and finds the segments where its score is strongest, including in long training sequences. Applied to the recognition of amyloidogenic segments, it showed good potential for classification problems in bioinformatics.


Figures

**Figure 1**
**Construction of the co-occurrence matrix.** Construction of the co-occurrence matrix (for simplicity, windows are of length 4, and 3 sub-matrices are generated in each direction of the general matrix). Coordinates of the general matrix (large numbers) represent the positions of amino acids in the sequences. Each amino acid is represented by a number between 1 and 20 (ordered alphabetically) within the sub-matrices. For example, the point highlighted in red would indicate a high co-occurrence score between lysine (K) at position 1 of the sequence and tryptophan (W) at position 3 of the sequence.
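One way to lay out per-position-pair counts as the general matrix in Figure 1 is sketched below. It assumes that block rows correspond to the first position (1 to n-1) and block columns to the second position (2 to n), each block being a 20 x 20 sub-matrix in alphabetical order. This layout and the helper names `aa_number`, `general_cell` and `general_matrix` are our hypothetical reading of the caption, not the published code.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # alphabetical one-letter codes


def aa_number(aa):
    """1-based alphabetical number of a residue, as in the figures (A = 1, ..., Y = 20)."""
    return AMINO_ACIDS.index(aa) + 1  # raises ValueError for non-standard letters


def general_cell(i, a, j, b):
    """0-based (row, col) of the pair (a at position i, b at position j) in the general matrix.

    Assumed layout: block row = i - 1 (positions 1..n-1), block column = j - 2
    (positions 2..n); inside a block, rows and columns follow aa_number.
    """
    if i >= j:
        raise ValueError("expected first position i < second position j")
    return (i - 1) * 20 + aa_number(a) - 1, (j - 2) * 20 + aa_number(b) - 1


def general_matrix(cooc, n):
    """Dense square grid of side (n - 1) * 20, filled from a dict keyed by (i, j, a, b)."""
    size = (n - 1) * 20
    grid = [[0.0] * size for _ in range(size)]
    for (i, j, a, b), value in cooc.items():
        row, col = general_cell(i, a, j, b)
        grid[row][col] = value
    return grid
```

With windows of length 4 this yields a 60 x 60 grid, and the caption's example (K at position 1, W at position 3) lands at row 8, column 38 (0-based) under the assumed layout.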

**Figure 2**
**Training algorithm.** Training algorithm of the method. Here *YES* (*NO*) denotes the set of positive (negative) training sequences, containing *nYES* (*nNO*) instances, which are tested with a window of length n; *MatrixYES* (*MatrixNO*) are the corresponding co-occurrence matrices with coordinates i and j; k denotes the index of the current positive training sequence; M_k is a temporary positive correlation matrix obtained up to the k-th sequence; a denotes the starting position of a tested window; X is the normalized sum of all previously calculated matrices M; l is an iteration counter; w denotes the distance between the current positive and negative co-occurrence matrices; and w_d is the maximal distance later used in the classification.
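The caption above names the variables of the training loop but not every step, so the following is only a simplified, hypothetical reading: all windows of the negative sequences form *MatrixNO*; each pass revisits every positive sequence and keeps the window start a that maximizes the distance w between the positive and negative matrices; passes repeat until the chosen segments stop changing, and the final distance plays the role of w_d. The L1 distance (sum of absolute differences), the greedy update, and the names `train` and `distance` are our assumptions, not the published method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(segments, n):
    """Fraction of segments with residue a at position i and b at position j (1-based, i < j)."""
    counts = Counter()
    for seg in segments:
        for i, j in combinations(range(1, n + 1), 2):
            counts[(i, j, seg[i - 1], seg[j - 1])] += 1
    return {key: c / len(segments) for key, c in counts.items()} if segments else {}


def distance(m1, m2):
    """Assumed metric: sum of absolute differences over all (i, j, a, b) keys."""
    return sum(abs(m1.get(k, 0.0) - m2.get(k, 0.0)) for k in set(m1) | set(m2))


def train(positives, negatives, n, max_iter=10):
    """Hypothetical greedy selection of one length-n segment per positive sequence."""
    if any(len(s) < n for s in positives + negatives):
        raise ValueError(f"all training sequences need at least {n} residues")
    # MatrixNO: co-occurrence over every window of every negative sequence.
    matrix_no = cooccurrence([s[a:a + n] for s in negatives for a in range(len(s) - n + 1)], n)
    chosen = [s[:n] for s in positives]  # initial guess: first window of each sequence
    for _ in range(max_iter):  # plays the role of the iteration counter l
        changed = False
        for k, seq in enumerate(positives):
            best, best_w = chosen[k], -1.0
            for a in range(len(seq) - n + 1):  # a: start of the tested window
                candidate = chosen[:k] + [seq[a:a + n]] + chosen[k + 1:]
                w = distance(cooccurrence(candidate, n), matrix_no)
                if w > best_w:  # strict comparison: ties keep the earliest window
                    best, best_w = seq[a:a + n], w
            if best != chosen[k]:
                chosen[k], changed = best, True
        if not changed:
            break
    matrix_yes = cooccurrence(chosen, n)
    return chosen, matrix_yes, matrix_no, distance(matrix_yes, matrix_no)
```

For instance, `train(["AAKLV"], ["AAAAA"], 3)` selects the segment `"AKL"`, because it shares no site-specific pair with the all-alanine negatives, and returns a final distance of 6.0.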

**Figure 3**
**Classification of long proteins.** Results of our classification of 4 amyloid proteins. The method was trained on the Waltz dataset. Black blocks indicate the locations of amyloidogenic segments obtained with w_l = 0.14, which corresponded to a specificity of 60% on the Waltz dataset. The brown blocks at the top indicate where the amyloidogenic segments would begin if a different w_l value were assumed. The circles show amyloidogenic segments identified experimentally by different groups working on protein fragments of various lengths (green: above 16 residues; blue: 11; red: 7).
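Scanning a long protein against a threshold such as w_l can be sketched as below. The per-window score used here (the mean difference between positive and negative co-occurrence frequencies over the window's position pairs) is an assumption chosen for illustration, so a threshold on this scale is not comparable to the caption's w_l = 0.14; `window_score` and `flag_segments` are hypothetical names. Windows scoring at least w_l are merged into contiguous blocks, analogous to the black blocks in the figure.

```python
from itertools import combinations


def window_score(segment, matrix_yes, matrix_no):
    """Hypothetical score: mean positive-minus-negative frequency of the segment's site pairs."""
    n = len(segment)
    if n < 2:
        raise ValueError("a window needs at least two residues")
    pairs = [(i, j, segment[i - 1], segment[j - 1]) for i, j in combinations(range(1, n + 1), 2)]
    return sum(matrix_yes.get(p, 0.0) - matrix_no.get(p, 0.0) for p in pairs) / len(pairs)


def flag_segments(protein, n, matrix_yes, matrix_no, w_l):
    """Merged 1-based (start, end) spans covered by length-n windows scoring at least w_l."""
    spans = []
    for a in range(len(protein) - n + 1):
        if window_score(protein[a:a + n], matrix_yes, matrix_no) >= w_l:
            start, end = a + 1, a + n
            if spans and start <= spans[-1][1] + 1:  # overlaps or touches the previous block
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans
```

Lowering w_l flags more windows and widens or adds blocks, which mirrors the caption's brown markers for alternative w_l values.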

**Figure 4**
**Classification performance on a complete experimental dataset.** ROC curve obtained with FISH Amyloid on all available experimental data (all datasets with peptides 4-10 amino acids long and experimental fragments from sup35). The total AUC is 0.80, and at the diagonal classification point both *sensitivity* and *specificity* equal 74%. The curve is based on average values of 40 independent trials from 4-fold cross-validation. The 0.95 and 0.85 quantiles and the median are shown as a boxplot at the diagonal classification point.
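The two summary numbers in this caption, the AUC and the diagonal point where sensitivity equals specificity, can be computed from classifier scores as sketched below. We assume that a score at or above the threshold means "amyloidogenic"; the AUC uses the rank (Mann-Whitney) formulation with ties counted as one half, which equals the area under the empirical ROC curve. `auc` and `diagonal_point` are hypothetical names, the example scores are illustrative only, and averaging over repeated cross-validation trials is left out.

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count one half)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def diagonal_point(pos_scores, neg_scores):
    """(threshold, sensitivity, specificity) where the two rates are closest.

    A score >= threshold is called positive; candidate thresholds are the observed scores.
    """
    best = None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        sens = sum(s >= t for s in pos_scores) / len(pos_scores)
        spec = sum(s < t for s in neg_scores) / len(neg_scores)
        gap = abs(sens - spec)
        if best is None or gap < best[0]:
            best = (gap, t, sens, spec)
    return best[1:]
```

With positive scores `[0.9, 0.8, 0.7, 0.3]` and negative scores `[0.6, 0.4, 0.2, 0.1]`, `auc` gives 0.875 and `diagonal_point` returns `(0.6, 0.75, 0.75)`.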

**Figure 5**
**Final co-occurrence matrix.** Graphical representation of the final co-localization matrix on the extended experimental dataset. Large matrix coordinates represent the positions of amino acid pairs, obtained from the 5-residue sliding window. The most frequent amino acid pairs, which indicate the classification pattern, are shown as the darkest dots. Amino acids are denoted with small numbers, ordered alphabetically (A = 1, C = 2, D = 3, E = 4, F = 5, G = 6, H = 7, I = 8, K = 9, L = 10, M = 11, N = 12, P = 13, Q = 14, R = 15, S = 16, T = 17, V = 18, W = 19, Y = 20).
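Reading off the darkest dots of such a matrix amounts to ranking its entries. The sketch below, built around the hypothetical helper `top_pairs`, assumes the matrix is stored as a dict keyed by (i, j, a, b) with 1-based positions, and labels each pair with the caption's alphabetical numbering.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUMBER = {aa: k + 1 for k, aa in enumerate(AMINO_ACIDS)}  # A = 1, ..., Y = 20 as in the caption


def top_pairs(matrix, k=5):
    """Return the k largest entries as (label, (number_a, number_b), value).

    The label "V2-F4" reads: V at position 2 together with F at position 4.
    Ties in value are broken by the key so the output is deterministic.
    """
    ranked = sorted(matrix.items(), key=lambda item: (-item[1], item[0]))
    return [
        (f"{a}{i}-{b}{j}", (NUMBER[a], NUMBER[b]), value)
        for (i, j, a, b), value in ranked[:k]
    ]
```

For a toy matrix `{(1, 3, "K", "W"): 0.5, (2, 4, "V", "F"): 0.8}`, `top_pairs(m, k=1)` returns `[("V2-F4", (18, 5), 0.8)]`.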


Cited by

AggreProt: a web server for predicting and engineering aggregation prone regions in proteins.
Planas-Iglesias J, Borko S, Swiatkowski J, Elias M, Havlasek M, Salamon O, Grakova E, Kunka A, Martinovic T, Damborsky J, Martinovic J, Bednar D. Nucleic Acids Res. 2024 Jul 5;52(W1):W159-W169. doi: 10.1093/nar/gkae420. PMID: 38801076.
Identification of fibrillogenic regions in human triosephosphate isomerase.
Carcamo-Noriega EN, Saab-Rincon G. PeerJ. 2016 Feb 4;4:e1676. doi: 10.7717/peerj.1676. PMID: 26870617.
Assessment of Therapeutic Antibody Developability by Combinations of In Vitro and In Silico Methods.
Wolf Pérez AM, Lorenzen N, Vendruscolo M, Sormanni P. Methods Mol Biol. 2022;2313:57-113. doi: 10.1007/978-1-0716-1450-1_4. PMID: 34478132.
Quantitating denaturation by formic acid: imperfect repeats are essential to the stability of the functional amyloid protein FapC.
Christensen LFB, Nowak JS, Sønderby TV, Frank SA, Otzen DE. J Biol Chem. 2020 Sep 11;295(37):13031-13046. doi: 10.1074/jbc.RA120.013396. PMID: 32719003.
Amyloidogenic motifs revealed by n-gram analysis.
Burdukiewicz M, Sobczyk P, Rödiger S, Duda-Madej A, Mackiewicz P, Kotulska M. Sci Rep. 2017 Oct 11;7(1):12961. doi: 10.1038/s41598-017-13210-9. PMID: 29021608.


