Faster SEQUEST searching for peptide identification from tandem mass spectra

Benjamin J Diament¹, William Stafford Noble

PMID: 21761931
PMCID: PMC3166376
DOI: 10.1021/pr101196n

Benjamin J Diament et al. J Proteome Res. 2011 Sep 2;10(9):3871-9. doi: 10.1021/pr101196n. Epub 2011 Jul 29.

Affiliation

¹ Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States.

Abstract

Computational analysis of mass spectra remains the bottleneck in many proteomics experiments. SEQUEST was one of the earliest software packages to identify peptides from mass spectra by searching a database of known peptides. Though still popular, SEQUEST performs slowly. Crux and TurboSEQUEST have successfully sped up SEQUEST by adding a precomputed index to the search, but the demand for ever-faster peptide identification software continues to grow. Tide, introduced here, is a software program that implements the SEQUEST algorithm for peptide identification and that achieves a dramatic speedup over Crux and SEQUEST. The optimization strategies detailed here employ a combination of algorithmic and software engineering techniques to achieve speeds up to 170 times faster than a recent version of SEQUEST that uses indexing. For example, on a single Xeon CPU, Tide searches 10,000 spectra against a tryptic database of 27,499 Caenorhabditis elegans proteins at a rate of 1550 spectra per second, which compares favorably with a rate of 8.8 spectra per second for a recent version of SEQUEST with index running on the same hardware.
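
The precomputed index mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the code of Tide, Crux, or SEQUEST. The function names (`tryptic_digest`, `build_index`, `candidates`), the minimum peptide length, and the simplified cleavage rule (cut after K or R unless the next residue is P, with no missed cleavages) are assumptions made for illustration. The sketch shows only the general idea: digest the protein database once, sort the peptides by mass, and then retrieve the candidates for each spectrum by binary search over the precursor mass window instead of rescanning every protein.

```python
import bisect

# Standard monoisotopic residue masses in Daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def tryptic_digest(protein, min_len=6):
    """Fully tryptic peptides, no missed cleavages (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            pep = protein[start:i + 1]
            if len(pep) >= min_len:
                peptides.append(pep)
            start = i + 1
    return peptides


def build_index(proteins, min_len=6):
    """Precompute a mass-sorted list of (mass, peptide) pairs."""
    unique = {p for prot in proteins for p in tryptic_digest(prot, min_len)}
    return sorted((peptide_mass(p), p) for p in unique)


def candidates(index, precursor_mass, tol=3.0):
    """Peptides whose mass lies within +/- tol Da of the precursor mass."""
    lo = bisect.bisect_left(index, (precursor_mass - tol, ""))
    hi = bisect.bisect_right(index, (precursor_mass + tol, "\uffff"))
    return [pep for _, pep in index[lo:hi]]


if __name__ == "__main__":
    index = build_index(["MKTAYIAKQRPEPTIDEK"], min_len=1)
    print(candidates(index, peptide_mass("TAYIAK"), tol=3.0))
```

Because the index is built once and reused for every spectrum, the per-spectrum cost in this sketch is two binary searches plus scoring of the peptides that fall inside the window.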

Figures

**Figure 1. Data flow in Tide before and after optimization**

**Figure 2**
**Profile of various development stages of Tide** for the worm benchmark (10,000 spectra). Each profile shows how much computing time was spent in each of the major phases of Tide’s operation at various points during development. Such profiles aided in deciding how best to proceed with optimization efforts. Profiles shown are (a) Tide-v0; (b) before and after linearizing background subtraction (Supplement Section 3); (c) before and after fivefold sparser representation, and after storing d to disk (Supplement Section 7); and (d) the current version of Tide. For each plot, the (diminishing) total execution time is indicated via the y-axis scale.

**Figure 3**
Performance of Tide compared to SEQUEST, Crux, OMSSA, Indexed SEQUEST (11/2009), and X!Tandem. Performance was measured in eight settings, varying the precursor mass tolerance window, the digest (fully tryptic or semi-tryptic candidate peptides), and the dataset (*C. elegans*, "worm dataset", or *S. cerevisiae*, "yeast dataset"; see Methods). Bar heights in log scale show spectra processed per second, with numerical results given below. Each experiment was repeated at least 3 times with average timings shown, except for the X!Tandem experiments. Because SEQUEST runs relatively slowly, all SEQUEST experiments, as well as Crux experiments using semi-tryptic digestion, were performed with 100 randomly selected spectra. The remaining experiments, including all Tide experiments, were performed using 10,000 benchmark spectra.
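
To give a rough sense of why the semi-tryptic setting is more demanding than the fully tryptic one, the following Python sketch enumerates semi-tryptic peptides derived from a single fully tryptic peptide. It assumes the common definition that a semi-tryptic peptide keeps at least one tryptic terminus, so the variants are the prefixes and suffixes of the tryptic peptide. The function name `semi_tryptic_variants` and the `min_len` cutoff are hypothetical, and this is not a description of how Tide or the other tools generate candidates.

```python
def semi_tryptic_variants(tryptic_peptide, min_len=6):
    """Prefixes and suffixes of a fully tryptic peptide (itself included).

    A prefix keeps the tryptic N-terminus; a suffix keeps the tryptic
    C-terminus. Missed cleavages are ignored in this simplified sketch.
    """
    n = len(tryptic_peptide)
    variants = set()
    for length in range(min_len, n + 1):
        variants.add(tryptic_peptide[:length])        # tryptic N-terminus
        variants.add(tryptic_peptide[n - length:])    # tryptic C-terminus
    return sorted(variants)


if __name__ == "__main__":
    for pep in semi_tryptic_variants("PEPTIDEK"):
        print(pep)
```

Under these assumptions a tryptic peptide of length n yields up to 2(n - min_len) + 1 candidates instead of one, so the candidate list per spectrum grows considerably.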

**Figure 4**
Performance of Tide compared to SEQUEST and Indexed SEQUEST (11/2009) on benchmark datasets with variable modifications. Bar heights in log scale show the number of spectra processed per second. The same benchmark datasets were used as in Figure 3, but with up to two occurrences per peptide of phosphorylated residues serine, threonine, or tyrosine. Tests were run with a ±3.0 Dalton mass window and full tryptic digestion. As in Figure 3, SEQUEST experiments were run with 100 randomly-selected spectra, whereas Tide experiments used 10,000 benchmark spectra.
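
The variable-modification setting described in this caption (up to two phosphorylations per peptide on serine, threonine, or tyrosine) can be sketched in Python as follows. The function name `phospho_variants` and its return format are hypothetical choices for illustration, not Tide's implementation. The mass shift used is the standard monoisotopic value for phosphorylation (HPO3, about 79.96633 Da).

```python
from itertools import combinations

PHOSPHO = 79.96633  # monoisotopic mass shift of one phosphorylation, in Da


def phospho_variants(peptide, max_mods=2, residues="STY"):
    """List (modified_positions, total_mass_shift) for every way of placing
    between 0 and max_mods phosphorylations on the allowed residues."""
    sites = [i for i, aa in enumerate(peptide) if aa in residues]
    variants = []
    for k in range(max_mods + 1):
        for chosen in combinations(sites, k):
            variants.append((chosen, k * PHOSPHO))
    return variants


if __name__ == "__main__":
    for positions, shift in phospho_variants("ASTYK"):
        print(positions, round(shift, 5))
```

A peptide with s candidate sites produces 1 + s + s(s-1)/2 variants under this rule, each of which must be considered as a separate candidate at its shifted mass.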

**Figure 5. Comparison of *X_Corr* scores from Tide and from two versions of SEQUEST**
From two different data sets (yeast and worm), 100 spectra were selected at random for analysis by SEQUEST and by Tide. Searches were performed using a database of tryptic peptides from the corresponding organism, allowing up to two phosphorylations per peptide at occurrences of STY. The figure includes the top five PSMs per spectrum, as reported by SEQUEST. For each PSM, we plot the SEQUEST *X_Corr* versus the *X_Corr* computed by Tide. In the case of the bottom figures, we plot the SEQUEST 1993 *X_Corr* scores against those computed by SEQUEST 2009.

Grants and funding

R01 EB007057-04/EB/NIBIB NIH HHS/United States
