PeerJ. 2014 Feb 27;2:e270.
doi: 10.7717/peerj.270. eCollection 2014.

System wide analyses have underestimated protein abundances and the importance of transcription in mammals

Jingyi Jessica Li et al. PeerJ. 2014.

Abstract

Large scale surveys in mammalian tissue culture cells suggest that the protein expressed at the median abundance is present at 8,000-16,000 molecules per cell and that differences in mRNA expression between genes explain only 10-40% of the differences in protein levels. We find, however, that these surveys have significantly underestimated protein abundances and the relative importance of transcription. Using individual measurements for 61 housekeeping proteins to rescale whole proteome data from Schwanhausser et al. (2011), we find that the median protein detected is expressed at 170,000 molecules per cell and that our corrected protein abundance estimates show a higher correlation with mRNA abundances than do the uncorrected protein data. In addition, we estimated the impact of further errors in mRNA and protein abundances using direct experimental measurements of these errors. The resulting analysis suggests that mRNA levels explain at least 56% of the differences in protein abundance for the 4,212 genes detected by Schwanhausser et al. (2011), though because one major source of error could not be estimated the true percent contribution should be higher. We also employed a second, independent strategy to determine the contribution of mRNA levels to protein expression. We show that the variance in translation rates directly measured by ribosome profiling is only 12% of that inferred by Schwanhausser et al. (2011), and that the measured and inferred translation rates correlate poorly (R² = 0.13). Based on this, our second strategy suggests that mRNA levels explain ∼81% of the variance in protein levels. We also determined the percent contributions of transcription, RNA degradation, translation and protein degradation to the variance in protein abundances using both of our strategies.
While the magnitudes of the two estimates vary, they both suggest that transcription plays a more important role than the earlier studies implied and translation a much smaller role. Finally, the above estimates only apply to those genes whose mRNA and protein expression was detected. Based on a detailed analysis by Hebenstreit et al. (2012), we estimate that approximately 40% of genes in a given cell within a population express no mRNA. Since there can be no translation in the absence of mRNA, we argue that differences in translation rates can play no role in determining the expression levels for the ∼40% of genes that are non-expressed.
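
The way measurement error lowers the apparent contribution of mRNA levels can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes log-scale abundances, independent Gaussian measurement errors whose standard deviations are known, and it applies the classical correction for attenuation. All numbers and names are hypothetical.

```python
import random

def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def sample_variance(xs):
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

rng = random.Random(0)
n_genes = 4000

# Hypothetical "true" log abundances: protein depends on mRNA plus
# gene-specific post-transcriptional variation.
true_mrna = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
true_protein = [m + rng.gauss(0.0, 0.5) for m in true_mrna]

# Hypothetical measurement error standard deviations (assumed known).
err_sd_mrna, err_sd_protein = 0.5, 0.7
meas_mrna = [m + rng.gauss(0.0, err_sd_mrna) for m in true_mrna]
meas_protein = [p + rng.gauss(0.0, err_sd_protein) for p in true_protein]

r2_true = pearson_r2(true_mrna, true_protein)
r2_measured = pearson_r2(meas_mrna, meas_protein)

# Reliability = share of the measured variance that is not measurement error.
rel_mrna = 1.0 - err_sd_mrna ** 2 / sample_variance(meas_mrna)
rel_protein = 1.0 - err_sd_protein ** 2 / sample_variance(meas_protein)

# Correction for attenuation: divide the observed R^2 by both reliabilities.
r2_corrected = r2_measured / (rel_mrna * rel_protein)

print(f"true R^2      = {r2_true:.2f}")
print(f"measured R^2  = {r2_measured:.2f}")
print(f"corrected R^2 = {r2_corrected:.2f}")
```

In this toy setting the measured R² falls below the true value, and dividing by the two reliabilities brings it back close to the true value. If an error source cannot be quantified, its reliability term is effectively left at 1 and the corrected R² remains a lower bound, which matches the "at least" wording of the abstract.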

Keywords: Gene expression; Mass spectrometry; Protein abundance; Transcription; Translation.

Conflict of interest statement

Mark Biggin is an Academic Editor for PeerJ. The authors declare that they have no competing interests.

Figures

Figure 1. The steps regulating protein expression.
The steady state abundances of mRNAs and proteins are each determined by their relative rates of production (i.e., transcription or translation) and their rates of degradation.
Figure 2. A non-linear bias in protein abundance estimates and its correction.
(A) The y axis shows the ratios of 61 individually derived protein abundance estimates each divided by the corresponding abundance estimate from Schwanhausser et al.’s (2011) second whole proteome dataset. The x axis shows the abundance estimate from Schwanhausser et al.’s (2011) second whole proteome dataset. The red line indicates the locally weighted line of best fit (lowess parameter f = 1.0), and the vertical dotted grey lines show the locations of the 1st quartile, median and 3rd quartile of the abundance distribution of the 5,028 proteins detected in the whole proteome analysis. (B) The same as panel A, except that the whole proteome estimates of Schwanhausser et al. (2011) have been corrected using a two-part linear model and the abundances from the 61 individual protein measurements (see Fig. 3B).
Figure 3. Calibrating absolute protein abundances.
(A) The relationship between iBAQ mass spectrometry signal (x axis) and the amounts of the 20 ‘spiked in’ protein standards (y axis) used by Schwanhausser et al. (2011) to calibrate their whole proteome abundances (data kindly provided by Matthias Selbach, Dataset S2). The line of best fit is shown (red). (B) The relationship between individually derived estimates for 61 housekeeping proteins (y axis) and Schwanhausser et al.’s (2011) second whole proteome estimates (x axis). The two-part line of best fit used to correct the second whole proteome estimates is shown (solid red line), as is the single linear regression (dashed red line). (C) The fit of different regression models for the data in panel B. The y axis shows the leave-one-out cross validation root mean square error for each model. The x axis shows the protein abundance used to separate the data for two-part linear regressions. The red curve shows that the optimum change point for a two-part linear model is at an abundance of ∼10⁶ molecules per cell. The dashed red horizontal line shows the root mean square error for the single linear regression.
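
The model comparison described for panel C can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: it assumes abundances on a log10 scale, fits two unconstrained straight lines on either side of a candidate change point (the caption does not say whether the two parts are forced to meet), and scores each candidate by leave-one-out cross validation. Function names and data are hypothetical.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def predict(xs, ys, x_new, split=None):
    """Predict y at x_new from a single line (split=None) or from the line
    fitted only to training points on the same side of `split` as x_new."""
    if split is not None:
        pairs = [(x, y) for x, y in zip(xs, ys) if (x < split) == (x_new < split)]
        xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
    slope, intercept = fit_line(xs, ys)
    return slope * x_new + intercept

def loocv_rmse(xs, ys, split=None):
    """Leave-one-out cross validation root mean square error."""
    sq_err = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        sq_err += (predict(train_x, train_y, xs[i], split) - ys[i]) ** 2
    return math.sqrt(sq_err / len(xs))

# Synthetic stand-in for 61 calibration proteins (log10 molecules per cell):
# x = whole proteome estimate, y = individually measured abundance.
rng = random.Random(1)
TRUE_SPLIT = 6.0  # hypothetical change point used to generate the data
xs = [3.0 + 5.0 * i / 60 for i in range(61)]

def true_y(x):
    return x + 1.0 if x < TRUE_SPLIT else 7.0 + 0.2 * (x - TRUE_SPLIT)

ys = [true_y(x) + rng.gauss(0.0, 0.1) for x in xs]

# Candidate change points: midpoints that leave at least 4 points per side,
# so each side still has 3 or more points when one is held out.
candidates = [(xs[k - 1] + xs[k]) / 2 for k in range(4, len(xs) - 3)]

best_split = min(candidates, key=lambda s: loocv_rmse(xs, ys, s))
rmse_two_part = loocv_rmse(xs, ys, best_split)
rmse_single = loocv_rmse(xs, ys)

print(f"best change point: {best_split:.2f}")
print(f"two-part LOOCV RMSE: {rmse_two_part:.3f}, single line: {rmse_single:.3f}")
```

Plotting `loocv_rmse` against each candidate gives a curve of the kind described for panel C, with the single-line error as a horizontal reference.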
Figure 4. Protein abundance estimates versus mRNA abundances.
(A) The relationship between Schwanhausser et al.’s (2011) second protein abundance estimates versus mRNA levels for 4,212 genes in NIH3T3 cells. The linear regression of the data is shown in red, the 50% prediction band by dashed green lines, and the 95% prediction band by dashed blue lines. (B) The relationship between our corrected estimates of protein abundance versus mRNA levels. The linear regression and prediction bands are labeled as in panel A.
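
For readers unfamiliar with prediction bands, the sketch below shows one common way to compute them for a simple linear regression. It is an assumption-laden illustration rather than the authors' method: the data are synthetic, the variables are treated as log10 abundances, and a normal quantile is used in place of the Student t quantile, which is a close approximation when thousands of genes are fitted.

```python
import math
import random
from statistics import NormalDist

def fit_with_prediction_band(xs, ys):
    """Least squares line plus a function returning the prediction band
    (lower, upper) at a given x for a given coverage level."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid_ss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    resid_sd = math.sqrt(resid_ss / (n - 2))

    def band(x, level):
        # Normal approximation to the t quantile (assumes large n).
        z = NormalDist().inv_cdf(0.5 + level / 2)
        half_width = z * resid_sd * math.sqrt(1 + 1 / n + (x - mx) ** 2 / sxx)
        centre = slope * x + intercept
        return centre - half_width, centre + half_width

    return slope, intercept, band

def coverage(xs, ys, band, level):
    """Fraction of points that fall inside the band at the given level."""
    inside = 0
    for x, y in zip(xs, ys):
        lower, upper = band(x, level)
        inside += lower <= y <= upper
    return inside / len(xs)

# Synthetic log10 mRNA (x) and log10 protein (y) values; all hypothetical.
rng = random.Random(3)
log_mrna = [rng.gauss(1.5, 0.6) for _ in range(2000)]
log_protein = [0.8 * x + 4.0 + rng.gauss(0.0, 0.5) for x in log_mrna]

slope, intercept, band = fit_with_prediction_band(log_mrna, log_protein)
cov50 = coverage(log_mrna, log_protein, band, 0.50)
cov95 = coverage(log_mrna, log_protein, band, 0.95)
print(f"slope={slope:.2f}, 50% band covers {cov50:.2f}, 95% band covers {cov95:.2f}")
```

A narrower band at the same coverage level means that protein abundance is predicted more tightly from mRNA abundance, which is the kind of comparison panels A and B invite.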
Figure 5. Measured versus inferred translation rates.
(A) The relative density of ribosomes per mRNA for each gene directly measured by ribosome profiling (Guo et al., 2010; Ingolia et al., 2011; Subtelny et al., in press) (colored lines) compared to the translation rates for each gene inferred by Schwanhausser et al. (2011) (black lines). The distribution of values from the ribosome profiling experiments was scaled proportionally to have the same median as that of the Schwanhausser et al. (2011) values, and the gene frequencies of each distribution were normalized to have the same total. The locations of the 2.5 and 97.5 percentiles of the two distributions for NIH3T3 cells are shown as dashed lines. (B) As in panel A, except that the data for all genes in the Schwanhausser et al. (2011) dataset are shown in the solid black line and data for the genes in the intersection of the Schwanhausser et al. (2011) and Subtelny et al.’s (in press) datasets are shown in dashed lines. The variances and numbers of genes for each dataset are given in Table S1.
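
The proportional scaling described for panel A can be sketched in a few lines. The example below uses synthetic log-normal values and hypothetical names, and it assumes that variances are compared on a log10 scale. Under that assumption, multiplying every value by one factor shifts the log-scale distribution without changing its variance, so the variance comparison is unaffected by the rescaling.

```python
import math
import random
import statistics

def scale_to_median(values, target_median):
    """Multiply every value by one factor so the median equals target_median."""
    factor = target_median / statistics.median(values)
    return [v * factor for v in values]

def log10_variance(values):
    """Sample variance of the log10-transformed values."""
    return statistics.variance([math.log10(v) for v in values])

# Synthetic stand-ins (hypothetical): inferred rates are given a wider
# spread than measured ribosome densities, purely for illustration.
rng = random.Random(2)
inferred = [rng.lognormvariate(2.0, 1.0) for _ in range(3001)]
measured = [rng.lognormvariate(0.0, 0.5) for _ in range(3001)]

scaled = scale_to_median(measured, statistics.median(inferred))

# Ratio of log-scale variances, measured relative to inferred.
variance_ratio = log10_variance(scaled) / log10_variance(inferred)
print(f"variance ratio (measured / inferred): {variance_ratio:.2f}")
```

Because the scaling is a constant shift on the log scale, `log10_variance(scaled)` equals `log10_variance(measured)` up to rounding, and only the position of the distribution changes.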
Figure 6. Correlation between measured versus inferred translation rates.
The relationship between the measured rates of translation determined by Subtelny et al. (in press) using ribosome footprinting versus the inferred rates of translation determined by Schwanhausser et al. (2011) for the same set of 3,126 genes in NIH3T3 cells (see Table S1 for further details). The units shown are those provided in the original datasets. The linear regression is shown.
Figure 7. Comparison of corrected and uncorrected whole proteome abundance estimates.
(A) The distributions of protein abundance estimates for 4,680 orthologous proteins in NIH3T3 cells (black lines) or HeLa cells (red lines). The values from Schwanhausser et al.’s (2011) second estimates and Wisniewski et al.’s (2012) estimates are shown as dashed lines. The values for our corrected abundance estimates are shown as solid lines. (B) The ratios of HeLa cell whole proteome abundance estimates divided by individual measurements from the literature for 66 proteins. Results for the original data from Wisniewski et al. (2012) (dashed line) and after these values have been corrected (solid line) are plotted. The green dashed vertical line indicates a ratio of 1.
Figure 8. The relationship between true and measured protein and mRNA levels.
