BMC Bioinformatics. 2009 Nov 30;10:394.
doi: 10.1186/1471-2105-10-394.

Derivation of an amino acid similarity matrix for peptide:MHC binding and its application as a Bayesian prior


Yohan Kim et al. BMC Bioinformatics. 2009.

Abstract

Background: Experts in peptide:MHC binding studies are often able to estimate the impact of a single residue substitution based on a heuristic understanding of amino acid similarity in an experimental context. Our aim is to quantify this notion of similarity in order to improve peptide:MHC binding prediction methods. Such a quantitative measure should help compensate for gaps and biases in the sequence-space coverage of existing peptide binding datasets.

Results: Here, a novel amino acid similarity matrix (PMBEC) is derived directly from the binding affinity data of combinatorial peptide mixtures. Like BLOSUM62, this matrix captures well-known physicochemical properties of amino acid residues. However, PMBEC differs markedly from existing matrices in cases where residue substitution involves a reversal of electrostatic charge. To demonstrate its usefulness, we have developed a new peptide:MHC class I binding prediction method that uses the matrix as a Bayesian prior. We show that the new method can compensate for missing information on specific residues in the training data. We also carried out a large-scale benchmark, whose results indicate that the prediction performance of the new method is comparable to that of the best neural-network-based approaches for peptide:MHC class I binding.

Conclusion: A novel amino acid similarity matrix has been derived for peptide:MHC binding interactions. One prominent feature of the matrix is that it disfavors substitution of residues with opposite charges. Given that the matrix was derived from experimentally determined peptide:MHC binding affinity measurements, this feature is likely shared by all peptide:protein interactions. In addition, we have demonstrated the usefulness of the matrix as a Bayesian prior in an improved scoring-matrix based peptide:MHC class I prediction method. A software implementation of the method is available at: http://www.mhc-pathway.net/smmpmbec.
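The abstract does not spell out how a similarity matrix acts as a Bayesian prior. One common construction, sketched here under stated assumptions, treats the scoring-matrix entries as linear-regression weights with a zero-mean Gaussian prior whose covariance at each peptide position is the 20 × 20 residue covariance matrix; the maximum a posteriori (MAP) estimate is then a generalized ridge regression. The function names, the fixed prior weight `lam`, the `jitter` term that keeps the covariance invertible, and the mean-centering of the targets are all hypothetical choices for illustration, not the SMMPMBEC implementation available at the URL above.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # alphabetical one-letter codes (assumed row order)


def one_hot(peptides, length=9):
    """Encode each peptide as a 20*length binary vector, one 20-wide block per position."""
    X = np.zeros((len(peptides), 20 * length))
    for n, pep in enumerate(peptides):
        for pos, aa in enumerate(pep):
            X[n, pos * 20 + AMINO_ACIDS.index(aa)] = 1.0
    return X


def map_scoring_matrix(peptides, y, prior_cov, lam=1.0, jitter=1e-3, length=9):
    """MAP estimate of a 20 x length scoring matrix under a Gaussian prior.

    Hypothetical sketch: the weights at each position share the 20 x 20 prior
    covariance `prior_cov`; different positions are independent a priori.
    """
    X = one_hot(peptides, length)
    y = np.asarray(y, dtype=float)
    offset = y.mean()  # targets are centered; the mean is kept as a fixed offset
    # Prior precision: inverse of the (jittered) residue covariance, one block per position.
    precision = np.linalg.inv(np.asarray(prior_cov, dtype=float) + jitter * np.eye(20))
    P = np.kron(np.eye(length), precision)
    # Minimize ||(y - offset) - X w||^2 + lam * w^T P w.
    w = np.linalg.solve(X.T @ X + lam * P, X.T @ (y - offset))
    return w.reshape(length, 20).T, offset  # rows: residues, columns: positions
```

Because the prior couples residues at the same position, a residue that never occurs at a position in the training data still receives a nonzero weight there, inferred from the weights of the residues it covaries with. This is one way a prior can compensate for missing residue information, as the Results paragraph describes.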


Figures

Figure 1
The peptide:MHC binding energy covariance (PMBEC) matrix. The 20 amino acid residues are shown at the top and right. Each matrix entry is the covariance of peptide:MHC binding energies between two residues. Values greater than 0.05 indicate similarity between residues and are colored green; values less than -0.05 indicate dissimilarity and are colored red. The diagonal values are the residue-specific statistical variances (defined as the average squared values), which indicate how much the binding energies associated with a residue vary over all alleles and positions. Cysteine (C), glycine (G), asparagine (N), and glutamine (Q) are relative outliers because they have no partner residue with an absolute covariance greater than 0.05. The amino acid residues were grouped by agglomerative clustering with complete linkage, which determines the ordering of the matrix rows and columns. The distance between two residues aa and aa' used for clustering is K - PMBEC(aa, aa'), where K is the maximum value in the PMBEC matrix. The resulting dendrogram on the right provides a classification of amino acids that largely corresponds to classical groupings by physicochemical properties.
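The caption defines the PMBEC entries as covariances whose diagonal is the average squared value, and the dendrogram as complete-linkage clustering on the distance K - PMBEC(aa, aa'). A minimal NumPy sketch of both steps follows. It assumes a hypothetical input array `energies` of shape (n_profiles, 20), one row of residue binding energies per allele-position pair, already centered within each row; how the paper obtains and normalizes these energies from the combinatorial peptide mixture data is not shown here. The clustering is a naive loop that is fine for 20 residues but not for large inputs.

```python
import numpy as np


def pmbec_like_matrix(energies):
    """Residue-by-residue covariance of binding energies.

    ASSUMPTION: `energies` has shape (n_profiles, 20), one row per
    allele-position pair, already centered within each row. The covariance
    is taken as the average product, so the diagonal is the average
    squared value, as in the caption.
    """
    e = np.asarray(energies, dtype=float)
    return e.T @ e / e.shape[0]


def complete_linkage(similarity):
    """Naive agglomerative clustering with complete linkage.

    The distance between residues a and b is K - similarity[a, b], where K is
    the maximum matrix entry. Returns merges as (members_1, members_2, height).
    """
    s = np.asarray(similarity, dtype=float)
    dist = s.max() - s
    clusters = [[i] for i in range(s.shape[0])]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: the largest distance between any two members.
                d = max(dist[a, b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], float(d)))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges
```

With SciPy available, `scipy.cluster.hierarchy.linkage(..., method="complete")` applied to the condensed form of the same distance matrix (diagonal set to zero, converted with `scipy.spatial.distance.squareform`) is the usual library route; the explicit loop only makes the linkage rule visible.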
Figure 2
A scatter plot of the non-diagonal elements of PMBEC versus those of BLOSUM62. The two matrices were centered as described in the Methods section.
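The caption does not state how the matrices were centered. The sketch below assumes, hypothetically, that the off-diagonal elements of each matrix are centered by subtracting their own mean. It takes each residue pair once (the upper triangle, 190 pairs for 20 residues) and reports the Pearson correlation of the two sets of values. The correlation itself does not depend on this centering; centering only shifts the axes of the scatter plot.

```python
import numpy as np


def offdiagonal_pairs(matrix):
    """Upper-triangle (i < j) entries of a symmetric residue matrix."""
    m = np.asarray(matrix, dtype=float)
    return m[np.triu_indices(m.shape[0], k=1)]


def compare_offdiagonal(a, b):
    """Centered off-diagonal values of two matrices and their Pearson r.

    ASSUMPTION: centering means subtracting the mean of the off-diagonal values.
    """
    x = offdiagonal_pairs(a)
    y = offdiagonal_pairs(b)
    x = x - x.mean()
    y = y - y.mean()
    r = float(x @ y / np.sqrt((x @ x) * (y @ y)))
    return x, y, r
```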
Figure 3
Comparison of amino acid similarity profiles of PMBEC and BLOSUM62. Each 20-element amino acid profile was centered to zero mean and normalized to unit length to allow direct comparison between the two matrices. The serine-specific similarity profiles of the two matrices are highly correlated. The glutamic acid-specific profiles, however, differ markedly for substitutions involving oppositely charged residues: glutamic acid (E) -> lysine (K) and glutamic acid (E) -> arginine (R).
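The normalization described in the caption (zero mean, unit length) makes the dot product of two profiles equal to their Pearson correlation. A short sketch, assuming both matrices are 20 × 20 arrays with rows and columns in the same alphabetical one-letter order (a hypothetical convention; the figure's own ordering may differ):

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed row/column order of both matrices


def normalized_profile(matrix, residue):
    """Row of `matrix` for `residue`, shifted to zero mean and scaled to unit length."""
    row = np.asarray(matrix, dtype=float)[AMINO_ACIDS.index(residue)]
    row = row - row.mean()
    return row / np.linalg.norm(row)


def profile_correlation(m1, m2, residue):
    """Dot product of normalized profiles, i.e. the Pearson correlation of the rows."""
    return float(normalized_profile(m1, residue) @ normalized_profile(m2, residue))
```

Calling `profile_correlation` with the two matrices and "S", then with "E", would give the serine and glutamic acid comparisons the caption describes; no values are asserted here.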
Figure 4
Comparison of the binding contributions of the 20 amino acids at position 1 in the scoring matrices generated by SMM, SMMPMBEC, and SMMBLOSUM. SMM was trained on the 9-mer peptide binding data set for HLA A*3101 (1869 data points in total), yielding a single 20 × 9 scoring matrix whose rows represent the 20 residues and whose columns represent the 9 peptide positions. The SMM scoring matrix serves as a reference point for the case where the binding data are well covered. SMMPMBEC and SMMBLOSUM, in contrast, were trained on 20 derived data sets, each lacking all peptides that contain one particular residue at position 1. The figure plots the scoring matrix value for the residue on the x-axis from the second column of the SMM scoring matrix alongside the corresponding elements from the 20 scoring matrices of SMMPMBEC and SMMBLOSUM.
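The leave-one-residue-out design and the additive use of a 20 × 9 scoring matrix can be sketched in plain Python. This assumes the data are held as a list of (peptide, value) pairs and the matrix as a list of 20 rows in alphabetical one-letter order; the names `derived_data_sets` and `score_peptide` are illustrative only. Peptide position 1 is index 0.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed row order of the scoring matrix


def derived_data_sets(data, position=0):
    """One data set per residue, each without peptides carrying that residue at `position`.

    `position` is 0-based, so position=0 corresponds to peptide position 1.
    """
    return {
        aa: [(pep, value) for pep, value in data if pep[position] != aa]
        for aa in AMINO_ACIDS
    }


def score_peptide(peptide, matrix):
    """Sum the matrix entries (row = residue, column = position) along the peptide."""
    return sum(matrix[AMINO_ACIDS.index(aa)][pos] for pos, aa in enumerate(peptide))
```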
Figure 5
Prediction performance of SMMPMBEC and SMM trained on data sets of varying size. For each data set size, 20 data sets were randomly drawn from the peptide binding data for HLA A*1101. The average AUC (area under the ROC curve) of the two prediction methods is plotted as a function of data set size.
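AUC values like those plotted can be computed with the rank (Mann-Whitney) formulation: the probability that a randomly chosen binder receives a higher predicted score than a randomly chosen non-binder, with ties counted as one half. The sketch assumes peptides have already been labeled as binders or non-binders (the caption does not state the affinity threshold used) and that a higher score means a stronger predicted binder; predicted IC50 values, where lower means stronger, would need to be negated first. The `random_subsets` helper for the 20 random draws per data set size is likewise a hypothetical illustration.

```python
import random


def auc(scores, labels):
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def random_subsets(data, size, n_draws=20, seed=0):
    """Draw `n_draws` random subsets of `size` items (without replacement) from `data`."""
    rng = random.Random(seed)
    return [rng.sample(data, size) for _ in range(n_draws)]
```

Averaging the AUC obtained on each of the 20 subsets (for example with `statistics.mean`) gives one point per data set size on a curve of the kind shown.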
