J Proteome Res. 2011 Sep 2;10(9):3871-9.
doi: 10.1021/pr101196n. Epub 2011 Jul 29.

Faster SEQUEST searching for peptide identification from tandem mass spectra

Benjamin J Diament et al. J Proteome Res. 2011.

Abstract

Computational analysis of mass spectra remains the bottleneck in many proteomics experiments. SEQUEST was one of the earliest software packages to identify peptides from mass spectra by searching a database of known peptides. Though still popular, SEQUEST performs slowly. Crux and TurboSEQUEST have successfully sped up SEQUEST by adding a precomputed index to the search, but the demand for ever-faster peptide identification software continues to grow. Tide, introduced here, is a software program that implements the SEQUEST algorithm for peptide identification and that achieves a dramatic speedup over Crux and SEQUEST. The optimization strategies detailed here employ a combination of algorithmic and software engineering techniques to achieve speeds up to 170 times faster than a recent version of SEQUEST that uses indexing. For example, on a single Xeon CPU, Tide searches 10,000 spectra against a tryptic database of 27,499 Caenorhabditis elegans proteins at a rate of 1550 spectra per second, which compares favorably with a rate of 8.8 spectra per second for a recent version of SEQUEST with index running on the same hardware.
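As a quick check on the figures quoted in the abstract, the two throughput rates can be turned into a speedup ratio and an estimated wall-clock time for the 10,000-spectrum worm benchmark. This is only arithmetic on the stated rates; the function and constant names are illustrative and do not come from Tide.

```python
# Rates quoted in the abstract (spectra per second, worm benchmark, one Xeon CPU).
TIDE_RATE = 1550.0
INDEXED_SEQUEST_RATE = 8.8
N_SPECTRA = 10_000


def speedup(fast_rate: float, slow_rate: float) -> float:
    """Ratio of two throughput rates."""
    return fast_rate / slow_rate


def wall_time_seconds(n_spectra: int, rate: float) -> float:
    """Estimated time to search n_spectra at a constant rate."""
    return n_spectra / rate


if __name__ == "__main__":
    print(f"speedup: {speedup(TIDE_RATE, INDEXED_SEQUEST_RATE):.1f}x")
    print(f"Tide: {wall_time_seconds(N_SPECTRA, TIDE_RATE):.1f} s")
    print(f"Indexed SEQUEST: {wall_time_seconds(N_SPECTRA, INDEXED_SEQUEST_RATE) / 60:.1f} min")
```

On these numbers the ratio is about 176, and the benchmark drops from roughly 19 minutes to under 7 seconds.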

Figures

Figure 1. Data flow in Tide before and after optimization
Figure 2. Profile of various development stages of Tide for the worm benchmark (10,000 spectra). Each profile shows how much computing time was spent in each of the major phases of Tide’s operation at various points during development. Such profiles aided in deciding how best to proceed with optimization efforts. Profiles shown are (a) Tide-v0; (b) before and after linearizing background subtraction (Supplement Section 3); (c) before and after fivefold sparser representation, and after storing d to disk (Supplement Section 7); and (d) the current version of Tide. For each plot, the (diminishing) total execution time is indicated via the y-axis scale.
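The caption does not say how background subtraction was linearized; the details are in the supplement. One common way to make a sliding-window subtraction run in linear time is a prefix-sum array, sketched below. The window of 75 bins on each side, the exclusion of the centre bin, and both function names are assumptions taken from the usual description of SEQUEST-style cross-correlation preprocessing, not from Tide's source.

```python
from itertools import accumulate


def subtract_background_naive(intensity, half_window=75):
    """O(n * window): re-sum the neighbourhood of every bin."""
    n = len(intensity)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        neighbours = sum(intensity[lo:hi]) - intensity[i]  # exclude centre bin
        out.append(intensity[i] - neighbours / (2 * half_window))
    return out


def subtract_background_linear(intensity, half_window=75):
    """O(n): the same result, using prefix sums for each window total."""
    n = len(intensity)
    prefix = [0] + list(accumulate(intensity))  # prefix[k] = sum(intensity[:k])
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        neighbours = prefix[hi] - prefix[lo] - intensity[i]
        out.append(intensity[i] - neighbours / (2 * half_window))
    return out
```

Both functions return the same values; only the cost per bin differs, which is the kind of change a before-and-after profile would expose.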
Figure 3. Performance of Tide compared to SEQUEST, Crux, OMSSA, Indexed SEQUEST (11/2009), and X!Tandem. Performance was measured in eight settings, varying the precursor mass tolerance window, the digest (fully tryptic or semi-tryptic candidate peptides), and the dataset (C. elegans, the “worm dataset”, or S. cerevisiae, the “yeast dataset”; see Methods). Bar heights, on a log scale, show spectra processed per second, with numerical results given below. Each experiment was repeated at least 3 times with average timings shown, except for the X!Tandem experiments. Because SEQUEST runs relatively slowly, all SEQUEST experiments, as well as Crux experiments using semi-tryptic digestion, were performed with 100 randomly selected spectra. The remaining experiments, including all Tide experiments, were performed using 10,000 benchmark spectra.
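To illustrate the difference between the two digest settings, the sketch below enumerates fully tryptic and semi-tryptic candidates for a toy sequence. It assumes the conventional trypsin rule (cleave after K or R unless the next residue is P) and no missed cleavages; the function names, the `min_len` parameter, and the example sequence are hypothetical, and the actual digestion settings used in the benchmark are described in the paper's Methods.

```python
def tryptic_peptides(protein: str) -> list[str]:
    """Fully tryptic peptides: both ends fall on cleavage sites."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        # Assumed rule: cleave after K/R, but not when followed by P.
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides


def semi_tryptic_peptides(protein: str, min_len: int = 2) -> set[str]:
    """Peptides with at least one tryptic end (no missed cleavages)."""
    result = set()
    for pep in tryptic_peptides(protein):
        for k in range(min_len, len(pep) + 1):
            result.add(pep[:k])   # tryptic start, arbitrary end
            result.add(pep[-k:])  # arbitrary start, tryptic end
    return result


if __name__ == "__main__":
    toy = "MKTAYRPLKGGR"  # hypothetical sequence
    print(tryptic_peptides(toy))
    print(len(semi_tryptic_peptides(toy)))
```

Even on a toy sequence the semi-tryptic set is several times larger than the fully tryptic one, so each spectrum has more candidates to score within a given precursor mass window.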
Figure 4. Performance of Tide compared to SEQUEST and Indexed SEQUEST (11/2009) on benchmark datasets with variable modifications. Bar heights, on a log scale, show the number of spectra processed per second. The same benchmark datasets were used as in Figure 3, but allowing up to two phosphorylations per peptide on serine, threonine, or tyrosine residues. Tests were run with a ±3.0 Dalton mass window and full tryptic digestion. As in Figure 3, SEQUEST experiments were run with 100 randomly selected spectra, whereas Tide experiments used 10,000 benchmark spectra.
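The variable-modification setting described here (up to two phosphorylations per peptide on S, T, or Y) multiplies the number of candidates per peptide. A minimal sketch of that enumeration follows; it is not Tide's implementation, the function name and example peptide are hypothetical, and the mass shift is the standard monoisotopic value for HPO3.

```python
from itertools import combinations

PHOSPHO_DELTA = 79.96633  # monoisotopic mass of HPO3 in Da


def phospho_variants(peptide: str, max_mods: int = 2, sites: str = "STY"):
    """Return (modified positions, mass shift) for 0..max_mods phosphorylations."""
    positions = [i for i, aa in enumerate(peptide) if aa in sites]
    variants = []
    for k in range(max_mods + 1):
        for combo in combinations(positions, k):
            variants.append((combo, k * PHOSPHO_DELTA))
    return variants


if __name__ == "__main__":
    for positions, shift in phospho_variants("SAYTK"):  # hypothetical peptide
        print(positions, round(shift, 5))
```

A peptide with three candidate sites yields seven variants (one unmodified, three singly modified, three doubly modified), each of which has to be matched against the precursor mass window.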
Figure 5. Comparison of XCorr scores from Tide and from two versions of SEQUEST
From two different data sets (yeast and worm), 100 spectra were selected at random for analysis by SEQUEST and by Tide. Searches were performed using a database of tryptic peptides from the corresponding organism, allowing up to two phosphorylations per peptide at occurrences of STY. The figure includes the top five PSMs per spectrum, as reported by SEQUEST. For each PSM, we plot the SEQUEST XCorr versus the XCorr computed by Tide. In the case of the bottom figures, we plot the SEQUEST 1993 XCorr scores against those computed by SEQUEST 2009.
