Bioinformatics. 2011 Dec 15;27(24):3356-63.
doi: 10.1093/bioinformatics/btr565. Epub 2011 Oct 13.

An assessment of substitution scores for protein profile-profile comparison

Xugang Ye et al. Bioinformatics.

Abstract

Motivation: Pairwise protein sequence alignments are generally evaluated using scores defined as the sum of substitution scores for aligning amino acids to one another, and gap scores for aligning runs of amino acids in one sequence to null characters inserted into the other. Protein profiles may be abstracted from multiple alignments of protein sequences, and substitution and gap scores have been generalized to the alignment of such profiles either to single sequences or to other profiles. Although there is widespread agreement on the general form substitution scores should take for profile-sequence alignment, little consensus has been reached on how best to construct profile-profile substitution scores, and a large number of these scoring systems have been proposed. Here, we assess a variety of such substitution scores. For this evaluation, given a gold standard set of multiple alignments, we calculate the probability that a profile column yields a higher substitution score when aligned to a related than to an unrelated column. We also generalize this measure to sets of two or three adjacent columns. This simple approach has the advantages that it does not depend primarily upon the gold-standard alignment columns with the weakest empirical support, and that it does not need to fit gap and offset costs for use with each substitution score studied.
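
The evaluation measure described above can be illustrated with a minimal sketch. It estimates, from two lists of substitution scores, the probability that a column scores higher against a related column than against an unrelated one. The function names, the half-credit treatment of ties, and the use of summed scores over adjacent columns are assumptions made for illustration; the abstract does not specify these details, and the paper's weighting of profile pairs is not reproduced here.

```python
from itertools import product


def prob_related_scores_higher(related_scores, unrelated_scores):
    """Estimate P(score vs. related column > score vs. unrelated column).

    Assumption: ties count as half a win, as in a standard rank statistic.
    """
    if not related_scores or not unrelated_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for r, u in product(related_scores, unrelated_scores):
        if r > u:
            wins += 1.0
        elif r == u:
            wins += 0.5
    return wins / (len(related_scores) * len(unrelated_scores))


def window_sums(column_scores, width):
    """Sum scores over each run of `width` adjacent columns.

    Assumption: a set of two or three adjacent columns is scored by
    adding the per-column substitution scores.
    """
    return [
        sum(column_scores[i:i + width])
        for i in range(len(column_scores) - width + 1)
    ]


# Hypothetical scores, invented for illustration only.
related = [2.0, 1.5, 0.5]
unrelated = [1.0, 0.5, -1.0]
p = prob_related_scores_higher(related, unrelated)
```

A value near 1 would indicate that the scoring system separates related from unrelated columns well, and a value near 0.5 would indicate no discrimination. Because only substitution scores are compared, no gap or offset costs enter the calculation.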

Results: A simple symmetrization of mean profile-sequence scores usually performed the best. These were followed closely by several specific scoring systems constructed using a variety of rationales.
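
One way to read "symmetrization of mean profile-sequence scores" is sketched below. It assumes each profile column is a dictionary of residue frequencies (already smoothed with pseudocounts, so no entry is zero), that profile-sequence scores are log-odds against background frequencies, and that symmetrization means averaging the two directions. These choices and all function names are illustrative assumptions, not the construction confirmed by the abstract.

```python
import math


def profile_sequence_scores(profile, background):
    """Log-odds score (in bits) for aligning each residue to this column."""
    return {a: math.log2(profile[a] / background[a]) for a in profile}


def mean_profile_sequence_score(scoring_profile, scored_profile, background):
    """Expected score of residues drawn from `scored_profile` when scored
    with the log-odds scores derived from `scoring_profile`."""
    scores = profile_sequence_scores(scoring_profile, background)
    return sum(scored_profile[a] * scores[a] for a in scored_profile)


def symmetrized_score(p, q, background):
    """Average of the two directional mean profile-sequence scores."""
    return 0.5 * (
        mean_profile_sequence_score(p, q, background)
        + mean_profile_sequence_score(q, p, background)
    )


# Toy two-letter alphabet, invented for illustration only.
background = {"A": 0.5, "B": 0.5}
col_p = {"A": 0.75, "B": 0.25}
col_q = {"A": 0.5, "B": 0.5}
score_pq = symmetrized_score(col_p, col_q, background)
score_qp = symmetrized_score(col_q, col_p, background)
```

Averaging the two directions makes the score independent of which profile is treated as the query, which a directional profile-sequence score alone does not guarantee.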

Contact: altschul@ncbi.nlm.nih.gov

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Two profile–profile alignments. (A) A putatively correct alignment. (B) One column misaligned.
Fig. 2.
Scoring system quality. Scoring systems are assessed using the evaluation measure on 2400 weighted profile pairs for each column. The color panel shows the difference between the maximum value of the measure in each column and the individual values. Some differences exceed the upper limit of the color scale.
