Nat Chem Biol. 2018 Feb 14;14(3):206-214.
doi: 10.1038/nchembio.2576.

How many human proteoforms are there?

Ruedi Aebersold et al. Nat Chem Biol.

Abstract

Despite decades of accumulated knowledge about proteins and their post-translational modifications (PTMs), numerous questions remain regarding their molecular composition and biological function. One of the most fundamental queries is the extent to which the combinations of DNA-, RNA- and PTM-level variations explode the complexity of the human proteome. Here, we outline what we know from current databases and measurement strategies including mass spectrometry-based proteomics. In doing so, we examine prevailing notions about the number of modifications displayed on human proteins and how they combine to generate the protein diversity underlying health and disease. We frame central issues regarding determination of protein-level variation and PTMs, including some paradoxes present in the field today. We use this framework to assess existing data and to ask the question, "How many distinct primary structures of proteins (proteoforms) are created from the 20,300 human genes?" We also explore prospects for improving measurements to better regularize protein-level biology and efficiently associate PTMs to function and phenotype.

Conflict of interest statement

Competing financial interests

The authors declare no competing financial interests.

Figures

Figure 1. Two parsings of post-translational modifications from the SwissProt database of 20,245 human proteins
(a) Histogram of PTMs in SwissProt for Homo sapiens (taxon identifier: 9606). Phosphorylation (phospho) is by far the most frequently annotated PTM at 38,030 (72%). Note that there are ~400 different types of PTMs known in biology (see: http://www.unimod.org). (b) Histogram of PTMs per SwissProt entry. Note that the distribution of PTMs is not uniform with 75% of entries containing two or fewer annotated PTMs; yet only five entries have >90 annotated PTMs.
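The counts in this caption can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in the caption (38,030 phosphorylation annotations, a 72% share, 20,245 entries). The implied total is an estimate because the 72% figure is rounded, and the variable names are illustrative.

```python
# Numbers quoted in the Figure 1 caption.
phospho_annotations = 38_030
phospho_fraction = 0.72   # rounded share reported in the caption
swissprot_entries = 20_245

# Implied total number of annotated PTMs (approximate, since 72% is rounded).
implied_total_ptms = phospho_annotations / phospho_fraction

# Mean number of annotated PTMs per entry under that estimate.
mean_ptms_per_entry = implied_total_ptms / swissprot_entries

print(round(implied_total_ptms))
print(round(mean_ptms_per_entry, 2))
```

A mean of roughly 2.6 annotated PTMs per entry is compatible with 75% of entries carrying two or fewer, because a small number of heavily annotated entries pulls the mean upward.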
Figure 2. Graphical depiction of sources of protein variation that combine to make up proteoforms, each of which maps back to a single human gene
Depicted is a single human gene and two of its isoforms, which differ in the coding of several amino acids of the protein primary sequence (at left); isoforms commonly arise from alternative splicing of RNA and from use of different promoters or translational start sites. Isoform variation combines with site-specific changes to generate human proteoforms (at right); three examples of site-specific changes include single-nucleotide polymorphisms (SNPs) and co- or post-translational modifications like N-glycosylation or phosphorylation, respectively.
Figure 3. Contrasting the potential sources of protein variability versus those that actually occur in combination as proteoforms detectable in actual human systems
(a) Common sources of protein variability include alternative splicing of RNA, single-nucleotide polymorphisms (SNPs) in regions of genes coding for amino acids, and PTMs. Note that there are ~33,000 splice isoforms, ~78,000 site-specific amino acid variants (i.e., polymorphisms and mutations) and ~53,000 PTMs in the October 2017 release of the Human SwissProt database. (b) Depiction of two proteoforms from specific combinations of protein variability.
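The gap between potential and observed variability can be illustrated with a simple combinatorial upper bound. The sketch below is not the authors' estimate. It assumes every site varies independently and has a fixed number of states (two by default: modified or unmodified), and the function name `max_proteoforms` is hypothetical.

```python
def max_proteoforms(n_isoforms: int, n_sites: int, states_per_site: int = 2) -> int:
    """Theoretical upper bound on proteoforms for one gene.

    Assumes (for illustration only) that every isoform carries the same
    n_sites variable sites and that all sites vary independently.
    """
    if n_isoforms < 0 or n_sites < 0 or states_per_site < 1:
        raise ValueError("counts must be non-negative and states_per_site >= 1")
    return n_isoforms * states_per_site ** n_sites


# Two isoforms with three binary sites each, loosely echoing Figure 2.
print(max_proteoforms(2, 3))    # 2 * 2**3 = 16
# The bound grows exponentially with the number of sites.
print(max_proteoforms(2, 10))   # 2 * 2**10 = 2048
```

Because the bound grows exponentially with the number of sites, it quickly dwarfs anything measurable, which is why the caption distinguishes potential combinations from those that actually occur.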
Figure 4. Levels of organization in the human body
Starting from protein primary structure (proteoforms), the complexity of organ systems is built up in layers. A key concept is that diverse measurement approaches in proteomics seek to analyze protein molecules at the various levels and contexts represented. Proteoform membership in protein complexes and localization within organelles, cells and tissues are all aspirations of measurement technologies to map protein molecules more precisely in molecular composition, across space and through time.
Figure 5. Proteoforms and their families underlie complex traits and molecular mechanisms operative in living systems
In nature, individual proteoforms (left), arising from various sources of biological variation such as PTMs, often exist in groups of related proteoforms. These dynamic ‘proteoform families’ (middle left) are the true protein products from the same human gene that convey information within signaling and regulatory networks (middle right) that underlie complex traits in wellness and disease (right). Discrete proteoforms and their families offer challenging, high-value targets for direct measurement by top-down proteomics.
