Comparative Study
Nucleic Acids Res. 1992 Jul 25;20(14):3631-7.
doi: 10.1093/nar/20.14.3631.

A comparison of several similarity indices used in the classification of protein sequences: a multivariate analysis

Free PMC article
C Landès et al. Nucleic Acids Res.

Abstract

The present work describes an attempt to identify reliable criteria that could be used as distance indices between protein sequences. Seven different criteria have been tested: (i, ii) the scores of the alignments as given by the BESTFIT and FASTA programs; (iii) the ratio parameter, i.e. the BESTFIT score divided by the length of the aligned peptides; (iv, v) the statistical significance (Z-scores) of the scores calculated by BESTFIT and FASTA, as obtained by comparison with shuffled sequences; (vi) the Z-scores provided by the program RELATE, which performs a segment-by-segment comparison of two sequences; and (vii) an original distance index calculated by the program DOCMA from all the pairwise dotplots between the sequences. These seven criteria have been tested against the amino acid sequences of 39 globins and those of the 20 aminoacyl-tRNA synthetases from E. coli. The distances between the sequences were analyzed by multivariate analysis techniques. The results show that the distances calculated from the scores of the pairwise alignments are not adequately sensitive. The Z-score from RELATE is not selective enough and too demanding of computer time. Three criteria gave a classification consistent with the known similarities between the sequences in the sets, namely the Z-scores from BESTFIT and FASTA and the multiple dotplot comparison distance index from DOCMA.
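The shuffling-based Z-score mentioned in criteria (iv) and (v) can be illustrated with a minimal Python sketch. This is not the BESTFIT or FASTA procedure: the scoring function is a plain Smith-Waterman local alignment with an assumed match/mismatch/linear-gap scheme, and the function names, the number of shuffles, and the choice to shuffle only the second sequence are all hypothetical choices made for illustration.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty).

    The scoring values are illustrative assumptions, not those of
    BESTFIT or FASTA.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffle_z_score(a, b, n_shuffles=100, seed=0):
    """Z-score of the observed score against scores for shuffled copies of b.

    Z = (observed - mean of shuffled scores) / standard deviation of
    shuffled scores. Shuffling preserves the composition of b but
    destroys its residue order.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = local_alignment_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        shuffled_scores.append(local_alignment_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; Z-score undefined")
    return (observed - mean) / sd
```

A call such as `shuffle_z_score(seq1, seq2)` returns a single number per sequence pair. A large positive value indicates that the observed alignment score is well above what the composition of the sequences alone would produce, which is the sense in which the abstract treats Z-scores as a significance measure. How such values are converted into distances for the multivariate analysis is not specified in the abstract and is left out here.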
