Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search
- PMID: 12403597
- DOI: 10.1021/ac025747h
Abstract
We present a statistical model to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST. Employing the expectation maximization algorithm, the analysis learns to distinguish correct from incorrect database search results, computing probabilities that peptide assignments to spectra are correct based upon database search scores and the number of tryptic termini of peptides. Using SEQUEST search results for spectra generated from a sample of known protein components, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly and incorrectly assigned peptides. This analysis makes it possible to filter large volumes of MS/MS database search results with predictable false identification error rates and can serve as a common standard by which the results of different research groups are compared.
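The idea in the abstract can be illustrated with a minimal sketch: fit a two-component mixture to search scores with expectation maximization, read off the probability that each assignment is correct, and estimate the false identification rate of a filtered set from those probabilities. Everything specific here is an assumption made for illustration and is not taken from the abstract: both components are modeled as Gaussians over a single score, the number of tryptic termini is left out, the data are synthetic, and the names `fit_em`, `posterior_correct`, `weighted_fit`, and `estimated_false_rate` are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_correct(scores, prior_c, comp_c, comp_i):
    """E-step: P(correct | score) for each score, via Bayes' rule."""
    out = []
    for s in scores:
        pc = prior_c * normal_pdf(s, *comp_c)
        pi = (1.0 - prior_c) * normal_pdf(s, *comp_i)
        out.append(pc / (pc + pi) if pc + pi > 0.0 else 0.5)
    return out


def weighted_fit(scores, weights):
    """M-step helper: weighted mean and standard deviation."""
    total = max(sum(weights), 1e-12)
    mean = sum(w * s for w, s in zip(weights, scores)) / total
    var = sum(w * (s - mean) ** 2 for w, s in zip(weights, scores)) / total
    return mean, max(math.sqrt(var), 1e-3)


def fit_em(scores, n_iter=200):
    """Fit a two-Gaussian mixture (assumed form) to the scores with EM.

    Returns (prior_correct, (mean_c, sd_c), (mean_i, sd_i), posteriors).
    """
    ordered = sorted(scores)
    half = len(ordered) // 2
    # Crude start: lower half seeds "incorrect", upper half seeds "correct".
    comp_i = weighted_fit(ordered[:half], [1.0] * half)
    comp_c = weighted_fit(ordered[half:], [1.0] * (len(ordered) - half))
    prior_c = 0.5
    for _ in range(n_iter):
        post = posterior_correct(scores, prior_c, comp_c, comp_i)
        prior_c = sum(post) / len(scores)
        comp_c = weighted_fit(scores, post)
        comp_i = weighted_fit(scores, [1.0 - p for p in post])
    return prior_c, comp_c, comp_i, posterior_correct(scores, prior_c, comp_c, comp_i)


def estimated_false_rate(posteriors, threshold):
    """Expected fraction of incorrect assignments among those with p >= threshold."""
    accepted = [p for p in posteriors if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic scores only: 600 "incorrect" and 400 "correct" assignments.
    scores = [random.gauss(0.0, 1.0) for _ in range(600)]
    scores += [random.gauss(4.0, 1.0) for _ in range(400)]
    prior_c, comp_c, comp_i, post = fit_em(scores)
    print("estimated fraction correct:", round(prior_c, 3))
    print("estimated false rate at p >= 0.9:", round(estimated_false_rate(post, 0.9), 4))
```

Because each retained assignment carries its own probability of being correct, the average of one minus those probabilities gives the expected error rate of the filtered set. This is one way to read the "predictable false identification error rates" mentioned above, under the assumptions of this sketch.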