The rapid generation of mutation data matrices from protein sequences
- PMID: 1633570
- DOI: 10.1093/bioinformatics/8.3.275
Abstract
An efficient means for generating mutation data matrices from large numbers of protein sequences is presented here. Using an approximate peptide-based sequence comparison algorithm, the set of sequences is clustered at the 85% identity level. The most closely related pairs of sequences are aligned, and the observed amino acid exchanges are tallied in a matrix. The raw mutation frequency matrix is processed in a similar way to that described by Dayhoff et al. (1978), so the resulting matrices can be used directly in current sequence analysis applications in place of the standard mutation data matrices, which have not been updated for 13 years. The method can process the entire SWISS-PROT databank in 20 h on a Sun SPARCstation 1, and can generate a matrix from a specific family or class of proteins in minutes. Differences observed between our 250 PAM mutation data matrix and the matrix calculated by Dayhoff et al. are briefly discussed.
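The abstract only outlines the pipeline, so the following is a minimal Python sketch of the counting and Dayhoff-style processing steps, not the authors' implementation. It assumes the sequence pairs have already been clustered and aligned (the 85% identity clustering and the peptide-based comparison are omitted), treats `-` as a gap, and estimates amino acid frequencies from the aligned pairs themselves. The function names (`tally_exchanges`, `one_pam_matrix`, `pam_power`, `log_odds`) and the toy data are hypothetical. The 1 PAM matrix is scaled so that 1% of residues are expected to change, and is then raised to the 250th power and converted to 10·log10 odds scores.

```python
import math
from collections import Counter
from itertools import product

GAP = "-"  # assumed gap symbol in the pre-aligned input (hypothetical convention)


def tally_exchanges(pairs):
    """Count symmetric residue exchanges and residue occurrences.

    pairs: iterable of (seq_a, seq_b) aligned strings of equal length.
    """
    exchanges = Counter()    # (i, j) -> observed i<->j exchanges, stored both ways
    occurrences = Counter()  # residue -> occurrences in ungapped aligned columns
    for a, b in pairs:
        if len(a) != len(b):
            raise ValueError("aligned sequences must have equal length")
        for x, y in zip(a, b):
            if x == GAP or y == GAP:
                continue  # gapped columns carry no exchange information
            occurrences[x] += 1
            occurrences[y] += 1
            if x != y:
                exchanges[(x, y)] += 1
                exchanges[(y, x)] += 1
    return exchanges, occurrences


def one_pam_matrix(exchanges, occurrences):
    """Dayhoff-style 1 PAM matrix: M[(i, j)] = P(residue j is replaced by i)."""
    alphabet = sorted(occurrences)
    total = sum(occurrences.values())
    freq = {a: occurrences[a] / total for a in alphabet}
    # Total observed changes involving each residue j.
    changes = {j: sum(exchanges[(i, j)] for i in alphabet if i != j) for j in alphabet}
    # Relative mutability: changes per occurrence of j.
    mut = {j: changes[j] / occurrences[j] for j in alphabet}
    # Scale so that 1% of residues are expected to change (1 PAM).
    lam = 0.01 / sum(freq[j] * mut[j] for j in alphabet)
    M = {}
    for i, j in product(alphabet, repeat=2):
        if i == j:
            M[(i, j)] = 1.0 - lam * mut[j]
        elif changes[j] == 0:
            M[(i, j)] = 0.0  # residue never observed to change
        else:
            M[(i, j)] = lam * mut[j] * exchanges[(i, j)] / changes[j]
    return alphabet, freq, M


def mat_mult(alphabet, X, Y):
    """Multiply two matrices stored as {(row, col): value} dicts."""
    return {(i, j): sum(X[(i, k)] * Y[(k, j)] for k in alphabet)
            for i in alphabet for j in alphabet}


def pam_power(alphabet, M, n):
    """Extrapolate to n PAMs by computing M**n with repeated squaring."""
    result = {(i, j): 1.0 if i == j else 0.0 for i in alphabet for j in alphabet}
    base = M
    while n > 0:
        if n & 1:
            result = mat_mult(alphabet, result, base)
        base = mat_mult(alphabet, base, base)
        n >>= 1
    return result


def log_odds(alphabet, freq, Mn):
    """Dayhoff-style scores 10*log10(Mn[i, j] / f_i); -inf where Mn is zero."""
    return {(i, j): 10 * math.log10(Mn[(i, j)] / freq[i]) if Mn[(i, j)] > 0 else float("-inf")
            for i in alphabet for j in alphabet}


if __name__ == "__main__":
    # Toy, pre-aligned pairs (hypothetical data, not from the paper).
    pairs = [("ARNDA", "ARNEA"), ("ARND-", "KRNDA"), ("AKNDE", "ARNDE")]
    exchanges, occurrences = tally_exchanges(pairs)
    alphabet, freq, m1 = one_pam_matrix(exchanges, occurrences)
    pam250 = pam_power(alphabet, m1, 250)
    scores = log_odds(alphabet, freq, pam250)
    print(round(scores[("A", "K")], 1))
```

With this scaling, f_j·M[i, j] equals f_i·M[j, i], so the log-odds scores come out symmetric. In the toy data, N is never seen exchanging, so its off-diagonal probabilities stay at zero and map to -inf. A real run over all 20 amino acids and many aligned pairs would not normally leave residues isolated like this.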
Similar articles
- PR2ALIGN: a stand-alone software program and a web-server for protein sequence alignment using weighted biochemical properties of amino acids. BMC Res Notes. 2015 May 7;8:187. doi: 10.1186/s13104-015-1152-6. PMID: 25947299. Free PMC article.
- A bank of protein family patterns for rapid identification of possible functions of amino acid sequences. Comput Appl Biosci. 1997 Apr;13(2):115-22. doi: 10.1093/bioinformatics/13.2.115. PMID: 9146957.
- A new approach for displaying identities and differences among aligned amino acid sequences. Comput Appl Biosci. 1992 Jun;8(3):261-5. doi: 10.1093/bioinformatics/8.3.261. PMID: 1633568.
- A set-theoretic approach to database searching and clustering. Bioinformatics. 1998 Jun;14(5):430-8. doi: 10.1093/bioinformatics/14.5.430. PMID: 9682056.
- Protein database searches using compositionally adjusted substitution matrices. FEBS J. 2005 Oct;272(20):5101-9. doi: 10.1111/j.1742-4658.2005.04945.x. PMID: 16218944. Free PMC article. Review.
Cited by
- High-quality chromosome-level genome assembly of female Artemia franciscana reveals sex chromosome and Hox gene organization. Heliyon. 2024 Sep 28;10(19):e38687. doi: 10.1016/j.heliyon.2024.e38687. eCollection 2024 Oct 15. PMID: 39435060. Free PMC article.
- Unique venom proteins from Solenopsis invicta x Solenopsis richteri hybrid fire ants. Toxicon X. 2021 May 7;9-10:100065. doi: 10.1016/j.toxcx.2021.100065. eCollection 2021 Jul. PMID: 34027387. Free PMC article.
- Evolution of gene expression after gene amplification. Genome Biol Evol. 2015 Apr 24;7(5):1303-12. doi: 10.1093/gbe/evv075. PMID: 25912045. Free PMC article.
- NB-LRR Lineage-Specific Equipment Is Sorted Out by Sequence Pattern Adaptation and Domain Segment Shuffling. Int J Mol Sci. 2022 Nov 17;23(22):14269. doi: 10.3390/ijms232214269. PMID: 36430746. Free PMC article.
- Allelic variation in a simple sequence repeat element of neisserial pglB2 and its consequences for protein expression and protein glycosylation. J Bacteriol. 2013 Aug;195(15):3476-85. doi: 10.1128/JB.00276-13. Epub 2013 May 31. PMID: 23729645. Free PMC article.