A comparison of several similarity indices used in the classification of protein sequences: a multivariate analysis
- PMID: 1641329
- PMCID: PMC334011
- DOI: 10.1093/nar/20.14.3631
Abstract
The present work describes an attempt to identify reliable criteria that could be used as distance indices between protein sequences. Seven different criteria were tested: (i, ii) the alignment scores given by the BESTFIT and FASTA programs; (iii) the ratio parameter, i.e. the BESTFIT score divided by the length of the aligned peptides; (iv, v) the statistical significance (Z-scores) of the BESTFIT and FASTA scores, obtained by comparison with shuffled sequences; (vi) the Z-scores provided by the program RELATE, which performs a segment-by-segment comparison of two sequences; and (vii) an original distance index calculated by the program DOCMA from all the pairwise dotplots between the sequences. These seven criteria were tested on the amino acid sequences of 39 globins and of the 20 aminoacyl-tRNA synthetases from E. coli. The distances between the sequences were analyzed by multivariate analysis techniques. The results show that distances calculated from the scores of the pairwise alignments are not sensitive enough. The Z-score from RELATE is not selective enough and demands too much computer time. Three criteria gave a classification consistent with the known similarities between the sequences in the sets: the Z-scores from BESTFIT and FASTA, and the multiple dotplot comparison distance index from DOCMA.
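Criteria (iv) and (v) turn a raw alignment score into a Z-score by comparing it with the scores obtained against shuffled sequences. The Python sketch below illustrates that idea under stated assumptions: the scoring is a simplified Smith-Waterman local alignment with arbitrary match, mismatch and linear gap values, not the substitution matrices or algorithms actually used by BESTFIT or FASTA; only the second sequence is shuffled, which preserves its amino acid composition; and the function names `smith_waterman_score` and `shuffle_z_score`, as well as the default of 100 shuffles, are hypothetical choices made for illustration.

```python
import random
import statistics


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b.

    Assumption: simple match/mismatch scores and a linear gap penalty,
    standing in for the scoring schemes of BESTFIT or FASTA.
    """
    prev = [0] * (len(b) + 1)  # DP row for the previous residue of a
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def shuffle_z_score(a: str, b: str, n_shuffles: int = 100, seed: int = 0) -> float:
    """Z = (S - mean(S_shuffled)) / sd(S_shuffled).

    S is the score of a against b; S_shuffled are the scores of a against
    random permutations of b (same length and composition as b).
    """
    if n_shuffles < 2:
        raise ValueError("need at least two shuffles to estimate a standard deviation")
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = smith_waterman_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # each call yields a new random permutation of b
        shuffled_scores.append(smith_waterman_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero spread; Z is undefined")
    return (observed - mean) / sd
```

For example, `shuffle_z_score(seq1, seq2)` returns how many standard deviations the observed score lies above the scores that sequences of the same composition reach by chance. One plausible reason such a normalization helps is that raw scores grow with sequence length and depend on composition, whereas the Z-score expresses the score relative to a composition-matched background.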
Similar articles
- Dot-plot comparisons by multivariate analysis (DOCMA): a tool for classifying protein sequences. Comput Appl Biosci. 1993 Apr;9(2):191-6. doi: 10.1093/bioinformatics/9.2.191. PMID: 8481822
- Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 1990;183:63-98. doi: 10.1016/0076-6879(90)83007-v. PMID: 2156132
- Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A. 1988 Apr;85(8):2444-8. doi: 10.1073/pnas.85.8.2444. PMID: 3162770
- [Structure of aminoacyl-tRNA-synthetase of higher eukaryotes from molecular cloning data]. Mol Biol (Mosk). 1994 Sep-Oct;28(5):978-90. PMID: 7990843. Review. Russian.
- The aminoacyl-tRNA synthetase family: modules at work. Bioessays. 1993 Oct;15(10):675-87. doi: 10.1002/bies.950151007. PMID: 8274143. Review.
Cited by
- An analysis of the sequence of part of the right arm of chromosome II of S. cerevisiae reveals new genes encoding an amino-acid permease and a carboxypeptidase. Curr Genet. 1994 Jul;26(1):1-7. doi: 10.1007/BF00326297. PMID: 7954890
- The human EBNA-2 coactivator p100: multidomain organization and relationship to the staphylococcal nuclease fold and to the tudor protein involved in Drosophila melanogaster development. Biochem J. 1997 Jan 1;321 (Pt 1):125-32. doi: 10.1042/bj3210125. PMID: 9003410
- Heterospecific cloning of Arabidopsis thaliana cDNAs by direct complementation of pyrimidine auxotrophic mutants of Saccharomyces cerevisiae. I. Cloning and sequence analysis of two cDNAs catalysing the second, fifth and sixth steps of the de novo pyrimidine biosynthesis pathway. Mol Gen Genet. 1994 Jul 8;244(1):23-32. doi: 10.1007/BF00280183. PMID: 8041358