MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis

J Proteome Res. 2007 Feb;6(2):654-61. doi: 10.1021/pr0604054.

Authors

David L Tabb¹, Christopher G Fernando, Matthew C Chambers

Affiliation

¹ Mass Spectrometry Research Center / Departments of Biomedical Informatics and Biochemistry, Vanderbilt University Medical Center, Nashville, TN 37232-8575, USA. david.l.tabb@vanderbilt.edu

PMID: 17269722
PMCID: PMC2525619
DOI: 10.1021/pr0604054

Abstract

Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the "MyriMatch" database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. The software and source code are available under the Mozilla Public License at http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
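The scoring idea in the abstract can be illustrated with the multivariate hypergeometric probability mass function. The sketch below is a minimal illustration, not the MyriMatch implementation: the function names, the treatment of empty m/z bins as one extra class, and the use of a negative log probability as a score are all assumptions made for the example.

```python
from math import comb, log


def mvh_probability(class_sizes, matches):
    """Multivariate hypergeometric probability of one match outcome.

    class_sizes[i] -- number of m/z bins in class i (assumption: the last
                      class holds the bins that contain no peak)
    matches[i]     -- number of predicted fragment ions that land in class i
    """
    if len(class_sizes) != len(matches):
        raise ValueError("class_sizes and matches must have the same length")
    if any(hit < 0 or hit > size for size, hit in zip(class_sizes, matches)):
        raise ValueError("each match count must lie between 0 and its class size")
    numerator = 1
    for size, hit in zip(class_sizes, matches):
        numerator *= comb(size, hit)
    # Denominator: ways to place all predicted fragments among all bins.
    return numerator / comb(sum(class_sizes), sum(matches))


def mvh_score(class_sizes, matches):
    """Hypothetical score: negative natural log of the chance probability."""
    return -log(mvh_probability(class_sizes, matches))


# Illustrative spectrum: 1 intense, 2 medium, and 4 minor peaks, 93 empty bins.
sizes = [1, 2, 4, 93]
# Two candidate peptides, each predicting 10 fragments and matching 3 peaks.
intense_match = [1, 2, 0, 7]  # matches the intense and medium peaks
minor_match = [0, 0, 3, 7]    # matches only minor peaks

print(mvh_score(sizes, intense_match) > mvh_score(sizes, minor_match))
```

Both candidates match three peaks, so a scorer that only counts matches cannot separate them. Under this model the all-minor outcome is four times more likely by chance, because there are four ways to pick three of the four minor peaks and only one way to pick the single intense peak and both medium peaks, so the candidate matching the intense peaks scores higher.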


Figures

**Figure 1**
During preprocessing, the fragment ions for each spectrum are sorted by intensity in decreasing order. The cumulative intensity from the most intense peak to each other peak is computed (represented here by the curve). A fraction of the original TIC is retained (in this case, 95%), with the remaining peaks stripped out as noise. The remaining peaks are split into classes that double in population at each step; in this three class example, each is marked with a letter.
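The preprocessing described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the MyriMatch source: the function names are hypothetical, peaks are kept until the running intensity first reaches the requested TIC fraction, and class boundaries are rounded up when the retained peak count is not divisible by the total of the doubling class sizes.

```python
def retain_tic_fraction(intensities, tic_fraction=0.95):
    """Keep the most intense peaks until their sum reaches tic_fraction of TIC."""
    ranked = sorted(intensities, reverse=True)
    threshold = tic_fraction * sum(ranked)
    retained, running = [], 0.0
    for intensity in ranked:
        if running >= threshold:
            break  # everything from here on is treated as noise
        retained.append(intensity)
        running += intensity
    return retained


def split_into_classes(retained, num_classes=3):
    """Split sorted peaks into classes whose populations double at each step."""
    n = len(retained)
    units = 2 ** num_classes - 1  # 1 + 2 + 4 + ... for num_classes terms
    classes, start = [], 0
    for i in range(num_classes):
        # Cumulative share of peaks up to class i, rounded up (assumption).
        end = -(-n * (2 ** (i + 1) - 1) // units)
        classes.append(retained[start:end])
        start = end
    return classes


# Illustrative peak intensities in arbitrary order; they sum to 1000.
peaks = [50, 400, 8, 150, 2, 200, 30, 100, 6, 40, 10, 4]
kept = retain_tic_fraction(peaks, tic_fraction=0.95)
print(split_into_classes(kept, num_classes=3))
# → [[400], [200, 150], [100, 50, 40, 30]]
```

In this example the seven most intense peaks carry 97% of the total ion current, so the five weakest peaks are stripped, and the retained peaks fall into classes of one, two, and four peaks.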

**Figure 2**
Sample and instrument configuration can lead to very different numbers of peaks per tandem mass spectrum. The LTQ produced spectra with the highest density and the lowest density. Tandem mass spectra from an Orbitrap will differ by the mass analyzer in which they were collected. In this figure, the number of peaks remaining after peak filtering is shown for the spectrum at the third quartile of peak counts. Peak filtering on the basis of TIC retention reveals the exponential distribution of peak intensities; the least intense peaks are most common, while intense peaks are least numerous. Filtering out only 8% of total intensity in the LTQ Human sample reduced peak counts by half.

**Figure 3**
Figure 3a-3d. MyriMatch performance was evaluated on tandem spectra from LCQ, LTQ, and Orbitrap mass analyzers. Scoring models with one, two, three, and four intensity classes were tested. The higher the curve, the more peptides were confidently identified at this configuration of the software. The horizontal position reflects the extent of preprocessing, with the heaviest filtering on the left and the least filtering on the right. A scoring system that counts only matches and mismatches (represented by the red curves) performs worse with little peak filtering in LCQ and LTQ data. Only Orbitrap tandem mass spectra reveal no difference in performance when peaks are stratified by their intensities. In the final panel, the performance of MyriMatch (retaining 98% of TIC and segregating peaks into three intensity classes) is compared to that of Sequest and X!Tandem for all six evaluated samples.

**Figure 4**
Figure 4 (TOC Graphic). This Venn diagram shows the overlap in peptide identifications among the three tested algorithms for the human standard protein mixture. MyriMatch (394 peptides) and X!Tandem (369 peptides) outperformed Sequest (309 peptides) for this sample. The gained peptides for MyriMatch tended to be shorter sequences. Of all observed peptides, 51% were identified by all three algorithms.



