Key Points
-
For mass spectrometry (MS) analysis, the proteins of interest are proteolytically digested — the resulting peptides are easier to handle, easier to sequence and have better detection efficiencies than intact proteins.
-
Thousands of peptides can be introduced into the mass spectrometer through 'on-line' capillary chromatography. Using MS, their masses can be measured, and they can be fragmented to yield partial amino-acid-sequence information (tandem MS).
-
Powerful algorithms can match the data from tandem MS against possible peptide sequences in amino-acid sequence databases. The resulting protein probability scores need to be examined carefully to avoid over-interpreting the identification results, and unbiased statistical techniques are now helping to address such problems.
-
Protein modifications are amenable to MS analysis because they normally induce mass shifts. However, because modified forms of a protein are typically present in substoichiometric amounts, selective enrichment and detection methods are usually necessary, and there is no guarantee that the complete primary structure of the protein will be covered.
-
Proteins can be quantified by MS using stable-isotope labels. Labelling with stable isotopes is the method of choice for comparing the relative abundance of a protein in two samples, and isotopically labelled internal standards are recommended for absolute quantification. However, peak intensities and the number of peptides that are observed during a liquid-chromatography–MS experiment (versus the number of theoretically observable peptides that can be derived from the protein of interest) can also be used to estimate protein abundance.
-
There has been great progress in the proteomic analysis of multiprotein complexes and subcellular organelles. However, routine, in-depth proteome analysis of whole-cell lysates, tissue samples and plasma is still beyond the dynamic range and sensitivity of the instruments that are available at present.
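The first key point rests on proteolytic digestion. As a minimal sketch, assuming trypsin as the protease with cleavage only C-terminal to arginine and lysine (the specificity reported by Olsen, Ong and Mann), an in silico digest takes a few lines of Python. The function name `tryptic_digest`, the `missed_cleavages` parameter and the optional `proline_rule` switch (a convention some search tools apply, off by default here) are illustrative assumptions, not the API of any particular search engine.

```python
import re


def tryptic_digest(sequence, missed_cleavages=0, proline_rule=False):
    """In silico trypsin digest: cleave C-terminal to every K or R.

    proline_rule=True (an assumed, optional convention) suppresses
    cleavage when K or R is followed by P.
    """
    # Zero-width split point after K/R (zero-width splits need Python 3.7+).
    site = r"(?<=[KR])(?!P)" if proline_rule else r"(?<=[KR])"
    fragments = [f for f in re.split(site, sequence) if f]

    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` following fragments onto this one.
        stop = min(start + missed_cleavages + 1, len(fragments))
        for end in range(start, stop):
            peptides.append("".join(fragments[start:end + 1]))
    return peptides


if __name__ == "__main__":
    protein = "MKWVTFISLLRPEPTIDEK"  # hypothetical sequence
    print(tryptic_digest(protein))
    print(tryptic_digest(protein, missed_cleavages=1))
```

For the hypothetical sequence above, the digest splits after each K or R and gives `MK`, `WVTFISLLR` and `PEPTIDEK`; allowing one missed cleavage adds the joined neighbours.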
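The statistical caution in the third key point can be made concrete with the sequence-reversed ('decoy') database searches used by Peng et al. to describe false-positive identifications. One common, simple estimate of the false-discovery rate at a score threshold is the number of decoy matches divided by the number of target matches at or above that threshold. The sketch below assumes one best-match score per spectrum for each database; the function names and toy scores are hypothetical, and this is not the empirical statistical model of Keller, Nesvizhskii and colleagues.

```python
def reversed_decoy(protein):
    """Decoy entry made by reversing a target protein sequence."""
    return protein[::-1]


def count_at_or_above(scores, threshold):
    return sum(1 for s in scores if s >= threshold)


def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimated FDR = decoy hits / target hits at or above `threshold`."""
    n_target = count_at_or_above(target_scores, threshold)
    if n_target == 0:
        return 0.0  # nothing accepted, so nothing can be a false positive
    return count_at_or_above(decoy_scores, threshold) / n_target


def score_cutoff(target_scores, decoy_scores, max_fdr=0.01):
    """Lowest observed target score whose estimated FDR is <= max_fdr."""
    for threshold in sorted(set(target_scores)):
        if estimate_fdr(target_scores, decoy_scores, threshold) <= max_fdr:
            return threshold
    return None  # no threshold reaches the requested FDR


if __name__ == "__main__":
    # Toy best-match scores per spectrum (hypothetical values).
    targets = [10, 9, 8, 7, 6, 5, 4, 3]
    decoys = [6, 4, 3, 2]
    print(reversed_decoy("PEPTIDEK"))
    print(estimate_fdr(targets, decoys, 5))
    print(score_cutoff(targets, decoys, max_fdr=0.2))
```

With the toy scores, a cutoff of 5 accepts six target and one decoy match (estimated FDR about 17%), so a 20% limit returns 5, whereas a 10% limit pushes the cutoff to 7.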
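The fourth key point relies on modifications shifting peptide masses. As an illustrative sketch, assuming a single modification per peptide and a small hand-picked table of monoisotopic mass shifts, an observed shift can be matched against candidate modifications. The table contents, the function name `explain_mass_shift` and the 0.01 Da default tolerance are assumptions; a real search would draw on a curated modification database and allow combinations of modifications.

```python
# Monoisotopic mass shifts (Da) of a few common modifications (illustrative subset).
MOD_DELTAS = {
    "methylation": 14.015650,
    "oxidation": 15.994915,
    "acetylation": 42.010565,
    "phosphorylation": 79.966331,
}


def explain_mass_shift(observed_mass, unmodified_mass, tol=0.01):
    """Modifications whose shift matches (observed - unmodified) within tol Da."""
    delta = observed_mass - unmodified_mass
    return [name for name, shift in MOD_DELTAS.items() if abs(shift - delta) <= tol]


if __name__ == "__main__":
    # Hypothetical peptide: 1000.0 Da unmodified, observed about 80 Da heavier.
    print(explain_mass_shift(1079.966331, 1000.0))
```

With this tolerance, a peptide measured 79.966 Da heavier than its unmodified form is reported as phosphorylated; tighter tolerances require correspondingly accurate mass measurements.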
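The label-free measure in the fifth key point, observed versus theoretically observable peptides, is a simple ratio, often called a protein abundance index. In the sketch below, 'observable' is assumed, for illustration only, to mean fully cleaved tryptic peptides of 7 to 25 residues; these length bounds and the function names are hypothetical, and an actual study might instead define observability by the mass range of the instrument.

```python
import re


def observable_peptides(protein, min_len=7, max_len=25):
    """Fully cleaved tryptic peptides within an assumed length window."""
    peptides = [p for p in re.split(r"(?<=[KR])", protein) if p]
    return {p for p in peptides if min_len <= len(p) <= max_len}


def abundance_index(observed, protein, min_len=7, max_len=25):
    """Observed / observable peptide count for one protein (0.0 if none observable)."""
    observable = observable_peptides(protein, min_len, max_len)
    if not observable:
        return 0.0
    # Only observed peptides that are also in the observable set are counted.
    return len(observable & set(observed)) / len(observable)


if __name__ == "__main__":
    protein = "MKWVTFISLLRPEPTIDEKAAGGLLVVKR"  # hypothetical sequence
    print(abundance_index(["PEPTIDEK", "WVTFISLLR"], protein))
```

Here three peptides fall inside the assumed window and two were observed, so the index is 2/3; a higher value suggests, roughly, a more abundant protein.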
Abstract
Proteomics is an increasingly powerful and indispensable technology in molecular cell biology. It can be used to identify the components of small protein complexes and large organelles, to determine post-translational modifications and to carry out sophisticated functional screens. The key, but little understood, technology in mass-spectrometry-based proteomics is peptide sequencing, which we describe and review here in an easily accessible format.
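Peptide sequencing by tandem MS rests on fragment-ion ladders: in the standard nomenclature introduced by Roepstorff and Fohlman, b ions contain the N terminus and y ions the C terminus of the peptide. As a minimal sketch, assuming singly protonated ions, unmodified residues and standard monoisotopic residue masses (rounded values, worth checking against a reference table), the expected b- and y-ion m/z values are cumulative sums. The function names are illustrative.

```python
# Monoisotopic residue masses in Da (unmodified residues; rounded values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, carried by the C-terminal (y) fragments
PROTON = 1.007276   # charge carrier for singly protonated ions


def b_ions(peptide):
    """m/z of singly protonated b1 ... b(n-1): N-terminal residues + proton."""
    ions, running = [], 0.0
    for residue in peptide[:-1]:
        running += RESIDUE_MASS[residue]
        ions.append(running + PROTON)
    return ions


def y_ions(peptide):
    """m/z of singly protonated y1 ... y(n-1): C-terminal residues + water + proton."""
    ions, running = [], WATER
    for residue in reversed(peptide[1:]):
        running += RESIDUE_MASS[residue]
        ions.append(running + PROTON)
    return ions


if __name__ == "__main__":
    for name, ladder in (("b", b_ions("PEPTIDEK")), ("y", y_ions("PEPTIDEK"))):
        print(name, [round(mz, 4) for mz in ladder])
```

A useful check: each b ion and its complementary y ion (b_i and y_(n−i)) add up to the singly protonated precursor mass plus one further proton mass.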
References
1. Wilm, M. et al. Femtomole sequencing of proteins from polyacrylamide gels by nano electrospray mass spectrometry. Nature 379, 466–469 (1996). Showed that MS could identify gel-separated proteins using a much smaller quantity of the sample than was required by chemical techniques such as Edman degradation.
2. Tyers, M. & Mann, M. From genomics to proteomics. Nature 422, 193–197 (2003).
3. Zhu, H., Bilgin, M. & Snyder, M. Proteomics. Annu. Rev. Biochem. 72, 783–812 (2003).
4. Phizicky, E., Bastiaens, P. I., Zhu, H., Snyder, M. & Fields, S. Protein analysis on a proteomic scale. Nature 422, 208–215 (2003).
5. Sali, A., Glaeser, R., Earnest, T. & Baumeister, W. From words to literature in structural proteomics. Nature 422, 216–225 (2003).
6. Hanash, S. Disease proteomics. Nature 422, 226–232 (2003).
7. Aebersold, R. & Mann, M. Mass spectrometry-based proteomics. Nature 422, 198–207 (2003).
8. Figeys, D. Proteomics in 2002: a year of technical development and wide-ranging applications. Anal. Chem. 75, 2891–2905 (2003).
9. Romijn, E. P., Krijgsveld, J. & Heck, A. J. Recent liquid chromatographic–(tandem) mass spectrometric applications in proteomics. J. Chromatogr. A 1000, 589–608 (2003).
10. Lin, D., Tabb, D. L. & Yates, J. R. 3rd. Large-scale protein identification using mass spectrometry. Biochim. Biophys. Acta 1646, 1–10 (2003).
11. Wu, C. C. & Yates, J. R. 3rd. The application of mass spectrometry to membrane proteomics. Nature Biotechnol. 21, 262–267 (2003).
12. Mann, M. & Jensen, O. N. Proteomic analysis of post-translational modifications. Nature Biotechnol. 21, 255–261 (2003).
13. Patterson, S. D. & Aebersold, R. H. Proteomics: the first decade and beyond. Nature Genet. 33 (Suppl.), 311–323 (2003).
14. Ferguson, P. L. & Smith, R. D. Proteome analysis by mass spectrometry. Annu. Rev. Biophys. Biomol. Struct. 32, 399–424 (2003).
15. Mo, W. & Karger, B. L. Analytical aspects of mass spectrometry and proteomics. Curr. Opin. Chem. Biol. 6, 666–675 (2002).
16. Mørtz, E. et al. Sequence tag identification of intact proteins by matching tandem mass spectral data against sequence data bases. Proc. Natl Acad. Sci. USA 93, 8264–8267 (1996).
17. Horn, D. M., Zubarev, R. A. & McLafferty, F. W. Automated de novo sequencing of proteins by tandem high-resolution mass spectrometry. Proc. Natl Acad. Sci. USA 97, 10313–10317 (2000).
18. Sze, S. K., Ge, Y., Oh, H. & McLafferty, F. W. Top-down mass spectrometry of a 29-kDa protein for characterization of any posttranslational modification to within one residue. Proc. Natl Acad. Sci. USA 99, 1774–1779 (2002).
19. Taylor, G. K. et al. Web and database software for identification of intact proteins using 'top down' mass spectrometry. Anal. Chem. 75, 4081–4086 (2003).
20. Fenn, J. B., Mann, M., Meng, C. K., Wong, S. F. & Whitehouse, C. M. Electrospray ionization for mass spectrometry of large biomolecules. Science 246, 64–71 (1989).
21. Lasonder, E. et al. Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature 419, 537–542 (2002).
22. Schirle, M., Heurtier, M. A. & Kuster, B. Profiling core proteomes of human cell lines by 1D PAGE and LC–MS/MS. Mol. Cell. Proteomics 2, 1297–1305 (2003).
23. Washburn, M. P., Wolters, D. & Yates, J. R. 3rd. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nature Biotechnol. 19, 242–247 (2001). Established the 'shotgun' technology by showing that many proteins in a yeast-cell lysate could be identified in a single experiment.
24. Hillenkamp, F., Karas, M., Beavis, R. C. & Chait, B. T. Matrix-assisted laser desorption/ionization mass spectrometry of biopolymers. Anal. Chem. 63, 1193A–1202A (1991).
25. Mann, M. A shortcut to interesting human genes: peptide sequence tags, ESTs and computers. Trends Biochem. Sci. 21, 494–495 (1996).
26. Taylor, J. A. & Johnson, R. S. Sequence database searches via de novo peptide sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom. 11, 1067–1075 (1997).
27. Liska, A. J. & Shevchenko, A. Expanding the organismal scope of proteomics: cross-species protein identification by mass spectrometry and its implications. Proteomics 3, 19–28 (2003). This and other papers from this group address the important issue of cross-species protein identification when the genome of the organism of interest has not been sequenced (see also reference 32).
28. Perkins, D. N., Pappin, D. J., Creasy, D. M. & Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567 (1999).
29. MacCoss, M. J., Wu, C. C. & Yates, J. R. 3rd. Probability-based validation of protein identifications using a modified SEQUEST algorithm. Anal. Chem. 74, 5593–5599 (2002).
30. Olsen, J. V., Ong, S. E. & Mann, M. Trypsin cleaves exclusively C-terminal to arginine and lysine residues. Mol. Cell. Proteomics 3, 608–614 (2004). Shows that trypsin is an exceedingly specific protease (non-tryptic peptides are produced by protein degradation or by the decomposition of peptides at labile bonds before tandem MS).
31. Keller, A. et al. Experimental protein mixture for validating tandem mass spectral analysis. Omics 6, 207–212 (2002).
32. Shevchenko, A. et al. Charting the proteomes of organisms with unsequenced genomes by MALDI–quadrupole time-of-flight mass spectrometry and BLAST homology searching. Anal. Chem. 73, 1917–1926 (2001).
33. Peng, J., Elias, J. E., Thoreen, C. C., Licklider, L. J. & Gygi, S. P. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC–MS/MS) for large-scale protein analysis: the yeast proteome. J. Proteome Res. 2, 43–50 (2003). Reports the large-scale identification of yeast proteins and, using searches in sequence-reversed databases, establishes a statistical description of false-positive identifications. Finally, by re-analysing the data with the cut-off values that have been used in some studies, the authors show that error rates can be very high.
34. Keller, A., Nesvizhskii, A. I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392 (2002).
35. Nesvizhskii, A. I., Keller, A., Kolker, E. & Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75, 4646–4658 (2003).
36. Nesvizhskii, A. I. & Aebersold, R. Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS. Drug Discov. Today 9, 173–181 (2004). References 34–36 establish an objective and powerful statistical framework to assess the probability of correct protein identification in proteomics experiments. The procedures can be used on any data set independent of the type of mass spectrometer used and could be the basis of a common identification standard in proteomics.
37. Barr, J. R. et al. Isotope dilution — mass spectrometric quantification of specific proteins: model application with apolipoprotein A-I. Clin. Chem. 42, 1676–1682 (1996).
38. Stemmann, O., Zou, H., Gerber, S. A., Gygi, S. P. & Kirschner, M. W. Dual inhibition of sister chromatid separation at metaphase. Cell 107, 715–726 (2001).
39. Gerber, S. A., Rush, J., Stemman, O., Kirschner, M. W. & Gygi, S. P. Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc. Natl Acad. Sci. USA 100, 6940–6945 (2003). References 37–39 introduce the so-called 'AQUA' (absolute quantification) technology for absolute peptide quantification, which involves mixing stable-isotope-labelled peptide analogues into the peptide mixture.
40. Aebersold, R. Constellations in a cellular universe. Nature 422, 115–116 (2003).
41. Lahm, H. W. & Langen, H. Mass spectrometry: a tool for the identification of proteins separated by gels. Electrophoresis 21, 2105–2114 (2000).
42. Oda, Y., Huang, K., Cross, F. R., Cowburn, D. & Chait, B. T. Accurate quantitation of protein expression and site-specific phosphorylation. Proc. Natl Acad. Sci. USA 96, 6591–6596 (1999).
43. Ong, S. E. et al. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 1, 376–386 (2002).
44. Ong, S. E., Kratchmarova, I. & Mann, M. Properties of ¹³C-substituted arginine in stable isotope labeling by amino acids in cell culture (SILAC). J. Proteome Res. 2, 173–181 (2003).
45. Sechi, S. & Chait, B. T. Modification of cysteine residues by alkylation. A tool in peptide mapping and protein identification. Anal. Chem. 70, 5150–5158 (1998).
46. Munchbach, M., Quadroni, M., Miotto, G. & James, P. Quantitation and facilitated de novo sequencing of proteins by isotopic N-terminal labeling of peptides with a fragmentation-directing moiety. Anal. Chem. 72, 4047–4057 (2000).
47. Gygi, S. P. et al. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nature Biotechnol. 17, 994–999 (1999). Introduces the ICAT technology — the first demonstration of a global, quantifiable MS technique that is applicable to mammalian samples.
48. Tao, W. A. & Aebersold, R. Advances in quantitative proteomics via stable isotope tagging and mass spectrometry. Curr. Opin. Biotechnol. 14, 110–118 (2003).
49. Lamond, A. I. & Mann, M. Cell biology and the genome projects — a concerted strategy for characterizing multi-protein complexes using mass spectrometry. Trends Cell Biol. 7, 139–142 (1997).
50. Neubauer, G. et al. Identification of the proteins of the yeast U1 small nuclear ribonucleoprotein complex by mass spectrometry. Proc. Natl Acad. Sci. USA 94, 385–390 (1997).
51. Ho, Y. et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180–183 (2002).
52. Gavin, A. C. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147 (2002). Large-scale immunoprecipitations in references 51 and 52 show that protein–protein interaction maps can be obtained by MS and that much of the yeast cell is organized into protein complexes.
53. Dreger, M. Subcellular proteomics. Mass Spectrom. Rev. 22, 27–56 (2003).
54. Taylor, S. W., Fahy, E. & Ghosh, S. S. Global organellar proteomics. Trends Biotechnol. 21, 82–88 (2003).
55. Brunet, S. et al. Organelle proteomics: looking at less to see more. Trends Cell Biol. 13, 629–638 (2003).
56. Blagoev, B. et al. A proteomics strategy to elucidate functional protein–protein interactions applied to EGF signaling. Nature Biotechnol. 21, 315–318 (2003).
57. Ranish, J. A. et al. The study of macromolecular complexes by quantitative proteomics. Nature Genet. 33, 349–355 (2003).
58. Schulze, W. X. & Mann, M. A novel proteomic screen for peptide–protein interactions. J. Biol. Chem. 279, 10756–10764 (2004). References 56–58 show that quantitative methods can identify functionally important protein interactions in the presence of a large excess of background proteins.
59. Andersen, J. S. et al. Proteomic characterization of the human centrosome by protein correlation profiling. Nature 426, 570–574 (2003). Protein-correlation profiling is introduced as a technology to distinguish true members of complexes and organelles from co-purifying background proteins on the basis of their fractionation profiles.
60. Gygi, S. P., Rochon, Y., Franza, B. R. & Aebersold, R. Correlation between protein and mRNA abundance in yeast. Mol. Cell. Biol. 19, 1720–1730 (1999).
61. Lipton, M. S. et al. Global analysis of the Deinococcus radiodurans proteome by using accurate mass tags. Proc. Natl Acad. Sci. USA 99, 11049–11054 (2002).
62. Ideker, T. et al. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 929–934 (2001).
63. Bader, G. D. et al. Functional genomics and proteomics: charting a multidimensional map of the yeast cell. Trends Cell Biol. 13, 344–356 (2003).
64. Ghaemmaghami, S. et al. Global analysis of protein expression in yeast. Nature 425, 737–741 (2003).
65. Huh, W. K. et al. Global analysis of protein localization in budding yeast. Nature 425, 686–691 (2003).
66. Mootha, V. K. et al. Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics. Proc. Natl Acad. Sci. USA 100, 605–610 (2003).
67. Mootha, V. K. et al. Integrated analysis of protein composition, tissue diversity, and gene regulation in mouse mitochondria. Cell 115, 629–640 (2003). References 66 and 67 illustrate the power of combined organelle proteomics and mRNA co-regulation data.
68. Karas, M. & Hillenkamp, F. Laser desorption ionization of proteins with molecular mass exceeding 10,000 daltons. Anal. Chem. 60, 2299–2301 (1988).
69. Roepstorff, P. & Fohlman, J. Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed. Mass Spectrom. 11, 601 (1984).
70. Biemann, K. Mass spectrometry of peptides and proteins. Annu. Rev. Biochem. 61, 977–1010 (1992).
71. Zhang, Z. Prediction of low-energy collision-induced dissociation spectra of peptides. Anal. Chem. 76, 3908–3922 (2004).
72. Schlosser, A. & Lehmann, W. D. Five-membered ring formation in unimolecular reactions of peptides: a key structural element controlling low-energy collision-induced dissociation of peptides. J. Mass Spectrom. 35, 1382–1390 (2000).
73. Steen, H., Kuster, B., Fernandez, M., Pandey, A. & Mann, M. Detection of tyrosine phosphorylated peptides by precursor ion scanning quadrupole TOF mass spectrometry in positive ion mode. Anal. Chem. 73, 1440–1448 (2001).
74. Mann, M. & Wilm, M. S. Error tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 66, 4390–4399 (1994).
75. Eng, J. K., McCormack, A. I. & Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994).
76. Tang, N., Tornatore, P. & Weinberger, S. R. Current developments in SELDI affinity technology. Mass Spectrom. Rev. 23, 34–44 (2004).
77. Wulfkuhle, J. D., Liotta, L. A. & Petricoin, E. F. Proteomic applications for the early detection of cancer. Nature Rev. Cancer 3, 267–275 (2003).
78. Sorace, J. M. & Zhan, M. A data review and re-assessment of ovarian cancer serum proteomic profiling. BMC Bioinformatics 4, 24 (2003).
79. Baggerly, K. A., Morris, J. S. & Coombes, K. R. Reproducibility of SELDI–TOF protein patterns in serum: comparing datasets from different experiments. Bioinformatics 20, 777–785 (2004).
Acknowledgements
We thank our colleagues at the Center for Experimental BioInformatics (CEBI) and Harvard Medical School for fruitful discussions and for critically reading the manuscript. Work at the CEBI is supported by generous grants from the Danish National Research Foundation (Grundforskningsfond) and the European Union sixth framework programme.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Glossary
- MICROSCALE CAPILLARY HPLC COLUMN
-
Microscale capillary high-performance liquid chromatography (HPLC) columns have inner diameters of 50–150 μm and a reversed-phase stationary phase. Reversed phase means that the surface of the stationary phase carries long hydrophobic alkyl chains, so it retains hydrophobic compounds more strongly than hydrophilic ones.
- m/z RATIO
-
(mass-to-charge ratio). Mass spectrometry does not measure the mass of molecules directly, but instead measures their m/z value. Electrospray ionization, in particular, generates multiply charged ions, so to calculate the molecular mass of a peptide, the observed m/z value has to be multiplied by z and the mass of the attached protons (whose number equals z) subtracted.
- QUADRUPOLE MASS SPECTROMETER
-
A mass-selective 'quadrupole section' allows only ions with a specific mass-to-charge (m/z) value to pass, by applying a particular combination of direct-current and radio-frequency (sinusoidal) potentials to its four rods. Stepping through the m/z range by applying different potentials and detecting the ions that pass through at each m/z value generates the mass spectrum.
- TIME OF FLIGHT (TOF) MASS SPECTROMETER
-
This mass analyser is based on the time it takes ions to travel through an electric-field-free flight tube. In the ion source, all ions are accelerated through the same potential, so ions of the same charge acquire the same kinetic energy. Because kinetic energy equals ½mv², lighter ions (more precisely, ions with a lower m/z value) fly faster than heavier ones and therefore reach the detector sooner.
- QUADRUPOLE 'ION TRAPS'
-
In ion traps, the ions are first caught (trapped) in a dynamic electric field and are then ejected onto the detector sequentially, in order of their mass-to-charge (m/z) value, with the help of another electric field. Trapped ions can also be isolated and fragmented within the trap.
- DALTON
-
(Da). The unit of the mass scale, which is defined as one twelfth of the mass of the mono-isotopic form of carbon, ¹²C (1 Da = 1.6605 × 10⁻²⁷ kg). Other commonly, but not necessarily correctly, used units of relevance to mass spectrometry are the amu (an atomic mass unit that is based on ¹⁶O), the Thomson (the proposed unit for the mass-to-charge (m/z) scale) and the u ('unit', which is the same as Da).
- DE NOVO SEQUENCING
-
Deriving the amino-acid sequence (primary structure) of a peptide solely from its mass-spectrometric fragmentation data (that is, without using sequence databases).
- TOTAL ION CURRENT
-
The sum of all the ion signals in a mass spectrum as a function of elution time.
- EXTRACTED ION CURRENT
-
The summed ion signal within a narrow window around a particular mass-to-charge (m/z) value, that is, for a particular peptide-ion species, plotted as a function of elution time.
- IONIZATION EFFICIENCY
-
The fraction of peptides in solution that is converted to peptide ions in the gas phase.
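The m/z entry's recipe (multiply by z, then remove the z attached protons) can be sketched as follows, assuming positive ions formed purely by protonation, [M + zH]^z+, as the entry describes. The function names are illustrative.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz, z):
    """Neutral mass M of an [M + zH]^z+ ion observed at `mz`."""
    if z < 1:
        raise ValueError("charge state must be a positive integer")
    return z * mz - z * PROTON_MASS


def mz_of(mass, z):
    """Inverse: m/z at which a peptide of neutral mass M appears with charge z."""
    if z < 1:
        raise ValueError("charge state must be a positive integer")
    return (mass + z * PROTON_MASS) / z


if __name__ == "__main__":
    print(mz_of(1000.0, 2))            # doubly protonated 1,000 Da peptide
    print(neutral_mass(501.007276, 2))
```

A doubly protonated peptide of neutral mass 1,000 Da therefore appears at m/z ≈ 501.007, and the same peptide carrying three protons would appear near m/z 334.3.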
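The time-of-flight entry can be written out as an energy balance: accelerating an ion of charge ze through a potential U gives it kinetic energy zeU = ½mv², so v = √(2zeU/m), and the flight time over a field-free length L is t = L·√(m/(2zeU)), which is proportional to √(m/z). The sketch assumes an idealized linear instrument with no initial energy spread and no reflectron; the 20 kV and 1 m values in the usage note are hypothetical.

```python
import math

DALTON_KG = 1.66053906660e-27          # 1 Da in kg
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge in C


def flight_time(mass_da, z, accel_voltage_v, tube_length_m):
    """Field-free flight time (s) of an ion accelerated through accel_voltage_v.

    Energy balance z*e*U = (1/2) m v^2 gives v = sqrt(2 z e U / m), t = L / v.
    """
    mass_kg = mass_da * DALTON_KG
    kinetic_energy_j = z * ELEMENTARY_CHARGE_C * accel_voltage_v
    velocity = math.sqrt(2.0 * kinetic_energy_j / mass_kg)
    return tube_length_m / velocity


if __name__ == "__main__":
    # Hypothetical instrument: 20 kV acceleration, 1 m flight tube.
    for mass in (1000.0, 4000.0):
        print(mass, flight_time(mass, 1, 20000.0, 1.0))
```

For these hypothetical settings a singly charged 1,000 Da ion takes roughly 16 μs; quadrupling the mass doubles the flight time, and an ion with twice the mass and twice the charge arrives at the same time.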
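The de novo sequencing entry can be illustrated with the simplest possible case: if a complete, noise-free ladder of one ion type (for example, the b series) were available, each residue would follow from the mass difference between neighbouring ions. The function name `read_ladder` and the 0.01 Da tolerance are assumptions; real de novo algorithms must also cope with missing and noise peaks and with mixed ion series, all of which this sketch ignores.

```python
# Monoisotopic residue masses (Da); I and L are isobaric and cannot be told apart by mass.
RESIDUE_MASS = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "G": 57.02146, "H": 137.05891, "I": 113.08406,
    "K": 128.09496, "L": 113.08406, "M": 131.04049, "N": 114.04293,
    "P": 97.05276, "Q": 128.05858, "R": 156.10111, "S": 87.03203,
    "T": 101.04768, "V": 99.06841, "W": 186.07931, "Y": 163.06333,
}


def read_ladder(ladder, tol=0.01):
    """Call residues from consecutive differences of an ascending ion ladder.

    Ambiguous calls are joined with '/', and unexplained gaps become '?'.
    """
    calls = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        delta = heavier - lighter
        matches = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol]
        calls.append("/".join(matches) if matches else "?")
    return calls


if __name__ == "__main__":
    # Idealized ladder built from a proton plus cumulative residue masses.
    ladder = [1.007276]
    for aa in "PEPTIDE":
        ladder.append(ladder[-1] + RESIDUE_MASS[aa])
    print(read_ladder(ladder))
```

The sketch also shows why missing ions hurt: if the ion between a G and an A is absent, the combined gap of 128.0586 Da is indistinguishable from a single Q residue.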
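The total and extracted ion currents differ only in which peaks are summed. Assuming, purely for illustration, that an LC–MS run is held in memory as a list of (elution time, [(m/z, intensity), ...]) scans rather than any particular file format, both chromatograms can be sketched as below; the 10 ppm window used to define 'a particular m/z value' is an assumed tolerance, and the function names are illustrative.

```python
def total_ion_current(scans):
    """TIC: summed intensity of every peak in each scan, paired with its elution time."""
    return [(rt, sum(intensity for _, intensity in peaks)) for rt, peaks in scans]


def extracted_ion_current(scans, target_mz, tol_ppm=10.0):
    """XIC: summed intensity within +/- tol_ppm of target_mz in each scan."""
    window = target_mz * tol_ppm * 1e-6
    return [
        (rt, sum(intensity for mz, intensity in peaks if abs(mz - target_mz) <= window))
        for rt, peaks in scans
    ]


if __name__ == "__main__":
    # Toy scans: (elution time in min, [(m/z, intensity), ...]); hypothetical data.
    scans = [
        (10.0, [(500.25, 100.0), (501.007, 50.0)]),
        (10.1, [(501.0072, 300.0), (650.3, 20.0)]),
        (10.2, [(700.1, 40.0)]),
    ]
    print(total_ion_current(scans))
    print(extracted_ion_current(scans, 501.007276))
```

In the toy data the TIC is dominated by unrelated ions in the first scan, whereas the XIC for m/z 501.007 isolates a single peptide-ion species that peaks at 10.1 min.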
Cite this article
Steen, H., Mann, M. The abc's (and xyz's) of peptide sequencing. Nat Rev Mol Cell Biol 5, 699–711 (2004). https://doi.org/10.1038/nrm1468