Dot-plot comparisons by multivariate analysis (DOCMA): a tool for classifying protein sequences
- PMID: 8481822
- DOI: 10.1093/bioinformatics/9.2.191
Abstract
A method aimed at classifying protein sequences without resorting to pairwise alignment is presented. Called DOCMA (DOt-plot Comparisons by Multivariate Analysis), it is based on a multivariate analysis of the pairwise dot-plots between all the sequences in the set. The dot-plots are first simplified by considering only the projections of the 'diagonal' segments of similarity onto the axes. From these projections a data matrix is built, in which each column is representative of the comparisons of one given sequence with all the other ones. This data matrix is then transformed into a distance matrix by a chi-squared analysis, from which the coordinates of the sequences in an orthonormal Euclidean space are obtained. The sequences are finally classified by a dynamic clustering procedure followed by a search for strong clusters. Application of this method to protein families such as the globins, the cytochromes c and the aminoacyl-tRNA synthetases shows that it is quite effective in delineating subgroups that contain even distantly related sequences.
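The two middle steps of the pipeline (data matrix to chi-squared distance matrix, then distance matrix to Euclidean coordinates) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact weighting, so this sketch assumes the standard correspondence-analysis form of the chi-squared distance between column profiles, and it assumes the coordinates are recovered by classical principal coordinates analysis. The function names `chi2_distance_matrix` and `principal_coordinates` and the `toy` matrix are hypothetical and chosen only for illustration. The dot-plot projection step and the dynamic clustering step are not shown.

```python
import numpy as np

def chi2_distance_matrix(data):
    """Chi-squared distances between the columns of a non-negative matrix.

    Assumes every row and every column has a strictly positive sum.
    """
    data = np.asarray(data, dtype=float)
    row_mass = data.sum(axis=1) / data.sum()      # relative weight of each row
    profiles = data / data.sum(axis=0)            # each column rescaled to sum to 1
    # diff[i, j, k] = profile of column j minus profile of column k, at row i
    diff = profiles[:, :, None] - profiles[:, None, :]
    squared = (diff ** 2 / row_mass[:, None, None]).sum(axis=0)
    return np.sqrt(squared)

def principal_coordinates(dist):
    """Classical principal coordinates analysis of a distance matrix.

    Returns one row of coordinates per item, on orthogonal axes.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering   # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)             # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]                   # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10                              # drop null and numerically negative axes
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])

# Made-up data matrix: one column per sequence (not taken from the paper).
toy = np.array([[8, 7, 1, 0],
                [6, 7, 0, 1],
                [1, 0, 9, 8],
                [0, 2, 7, 9]], dtype=float)

dist = chi2_distance_matrix(toy)
coords = principal_coordinates(dist)
```

Because the chi-squared distance is a weighted Euclidean distance, the Euclidean distances between rows of `coords` reproduce `dist`, so a clustering procedure can operate on `coords` directly.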