Abstract
Large scale surveys in mammalian tissue culture cells suggest that the protein expressed at the median abundance is present at 8,000–16,000 molecules per cell and that differences in mRNA expression between genes explain only 10–40% of the differences in protein levels. We find, however, that these surveys have significantly underestimated protein abundances and the relative importance of transcription. Using individual measurements for 61 housekeeping proteins to rescale whole proteome data from Schwanhausser et al. (2011), we find that the median protein detected is expressed at 170,000 molecules per cell and that our corrected protein abundance estimates show a higher correlation with mRNA abundances than do the uncorrected protein data. In addition, we estimated the impact of further errors in mRNA and protein abundances using direct experimental measurements of these errors. The resulting analysis suggests that mRNA levels explain at least 56% of the differences in protein abundance for the 4,212 genes detected by Schwanhausser et al. (2011), though because one major source of error could not be estimated the true percent contribution should be higher. We also employed a second, independent strategy to determine the contribution of mRNA levels to protein expression. We show that the variance in translation rates directly measured by ribosome profiling is only 12% of that inferred by Schwanhausser et al. (2011), and that the measured and inferred translation rates correlate poorly (R² = 0.13). Based on this, our second strategy suggests that mRNA levels explain ∼81% of the variance in protein levels. We also determined the percent contributions of transcription, RNA degradation, translation and protein degradation to the variance in protein abundances using both of our strategies. While the magnitudes of the two estimates vary, they both suggest that transcription plays a more important role than the earlier studies implied and translation a much smaller role.
Finally, the above estimates only apply to those genes whose mRNA and protein expression was detected. Based on a detailed analysis by Hebenstreit et al. (2012), we estimate that approximately 40% of genes in a given cell within a population express no mRNA. Since there can be no translation in the absence of mRNA, we argue that differences in translation rates can play no role in determining the expression levels for the ∼40% of genes that are non-expressed.
Keywords: Transcription, Translation, Mass spectrometry, Gene expression, Protein abundance
Introduction
The protein products of genes are expressed at very different levels from each other in a mammalian cell. Thousands of genes are not detectably expressed. Of those that are, their proteins are present at levels that differ by five orders of magnitude. Cytoplasmic actin, for example, is expressed at 1.5 × 10⁸ molecules per cell (Kislauskis et al., 1997), whereas some transcription factors are expressed at only 4 × 10³ molecules per cell (Biggin, 2011). There are four major steps that determine differences in protein expression: the rates at which genes are transcribed, mRNAs are degraded, proteins are translated, and proteins are degraded (Fig. 1). The combined effect of transcription and mRNA degradation together determines mRNA abundances (Fig. 1). The joint effect of protein translation and protein degradation controls the differences between mRNA and protein concentrations (Fig. 1).
Transcription has long been regarded as a dominant step and is controlled by sequence specific transcription factors that differentially interact with cis-regulatory DNA regions. The rates of the other three steps, however, vary significantly between genes as well (Boisvert et al., 2012; Cambridge et al., 2011; Cheadle et al., 2005; de Sousa Abreu et al., 2009; Eden et al., 2011; Guo et al., 2010; Han et al., 2014; Hentze and Kuhn, 1996; Hsieh et al., 2012; Ingolia et al., 2011; Kristensen et al., 2013; Loriaux & Hoffmann, 2013; Rabani et al., 2011; Schwanhausser et al., 2011; Sharova et al., 2009; Yang et al., 2003). MicroRNAs, for example, differentially interact with mRNAs to alter rates of RNA degradation and protein translation (Ambros, 2011; Baek et al., 2008; Elmen et al., 2008; Gennarino et al., 2012; Guo et al., 2010; Hobert, 2008; Krutzfeldt et al., 2005; Pillai et al., 2007; Rajewsky, 2011; Selbach et al., 2008; Subtelny et al., in press; Xiao et al., 2007).
To quantify the relative importance of each of the four steps, label free mass spectrometry methods have been developed that measure the absolute number of protein molecules expressed per cell for thousands of genes (Bantscheff et al., 2012; Beck et al., 2011; Maier, Guell & Serrano, 2009; Schwanhausser et al., 2011; Vogel et al., 2010; Vogel & Marcotte, 2012). By comparing these data to mRNA abundance data, one can determine the importance of transcription and mRNA degradation combined versus the importance of protein translation and protein degradation combined (Maier, Guell & Serrano, 2009; Schwanhausser et al., 2011; Vogel & Marcotte, 2012) (Fig. 1). By measuring mRNA degradation and protein degradation rates as well, the rates of transcription and translation can be additionally inferred indirectly. Using this approach to study mouse NIH3T3 fibroblasts, Schwanhausser et al. (2011) concluded that mRNA levels explain ∼40% of the variability in protein levels; that the cellular abundance of proteins is predominantly controlled at the level of translation; that transcription is the second largest determinant; and that the degradation of mRNAs and proteins play a significant but lesser role.
The above work has provided critically important datasets and an initial framework for analysis. We noticed, however, that Schwanhausser et al.’s (2011) protein abundance estimates are mostly lower than established values for individual proteins in the literature and that statistical methods to quantitate the impact of experimental error had not been employed. We therefore set out to explore if we could refine the analysis of these datasets and to compare our results to those of Schwanhausser et al. (2011) and other systemwide studies.
Results and discussion
A non-linear underestimation of protein abundances
Our starting point was a set of published abundances of 53 mammalian housekeeping proteins, most of which are based on SILAC mass spectrometry or western blot data (Biggin, 2011; Brosi, Hauri & Kramer, 1993; Gregory et al., 2002; Hanamura et al., 1998; Kimura et al., 1999; Kislauskis et al., 1997; Princiotta et al., 2003; Wolffe, 1998; Wong et al., 2011; Zeiler et al., 2012). On average these established estimates are 16 fold higher than those from Schwanhausser et al.’s (2011) original label free mass spectrometry data (Dataset S1). Once we brought this discrepancy to the authors’ attention, they upwardly revised their label free abundance estimates for all 5,028 detected proteins and in addition provided western blot or Selected Reaction Monitoring (SRM) mass spectrometry measurements for eight polypeptides in NIH3T3 cells (see Corrigendum; Schwanhausser et al., 2011). However, Schwanhausser et al.’s (2011) second whole proteome abundance estimates are still lower than individual measurements for proteins expressed below 10⁶ molecules per cell, with the lowest abundance proteins showing the largest discrepancy (Fig. 2A; Dataset S1).
Western blot and SILAC mass spectrometry measurements show the same discrepancy versus the label free whole proteome data (Dataset S1). For example, for proteins expressed below 1 million molecules per cell, the 26 SILAC measurements are a median of 2.95 fold higher than Schwanhausser et al.’s (2011) second estimates, and the 19 western blot measurements are 3.10 fold higher. This suggests that the discrepancy is not due to error in the individual measurements as a similar bias in two independent methods is unlikely.
Of the 61 individual measurements of protein abundance available to us, 15 were made in NIH3T3 cells and 42 were made in HeLa cells. The discrepancy between Schwanhausser et al.’s (2011) second whole proteome abundances and these individual measurements is not due to differences in expression levels between HeLa and NIH3T3 cells for the following reasons. One, it is unlikely that such a difference would only occur for lower abundance proteins. Two, five of the individual measurements for lower abundance proteins (Orc2, Orc4, HDAC3, NFkB1, and NFkB2) were made in NIH3T3 cells and are on average 3.7 fold higher than the second whole proteome estimates in this same cell line (Dataset S1). Three, later in the paper we show that collectively the 61 individual proteins measured have on average the same relationship in expression values versus all other cellular proteins in both NIH3T3 and HeLa cells. Finally, Schwanhausser et al.’s (2011) second estimates for RNA polymerase II and general transcription factors such as TFIIB and TFIIE are only 1.6 fold higher than those in yeast (Borggrefe et al., 2001) and are 7.1 times less than those in HeLa cells (Kimura et al., 1999). Yeast cells have 1/40th the volume, 1/200th the amount of DNA and 1/4 the number of genes of NIH3T3 and HeLa cells (Milo et al., 2010). Two fold reductions in the concentrations of a single general transcription factor have, in some cases, phenotypic consequence (Aoyagi & Wassarman, 2001; Deutschbauer et al., 2005; Eissenberg et al., 2002; Kim et al., 2010). Thus, it is unlikely that a rapidly dividing mammalian cell could function with much larger reductions in the amounts of all of these essential regulators to levels close to those found in yeast.
Correcting the non-linear bias
Schwanhausser et al. (2011) calibrated protein abundances by spiking known amounts of protein standards into a crude protein extract from NIH3T3 cells and then measuring the abundances of several thousand proteins in the mixture by iBAQ label free mass spectrometry. The 20 ‘spiked in’ protein standards detected in this experiment, however, were present at the equivalent of >8.0 × 10⁵ molecules per cell, a level that represents only the most highly expressed 11% of the proteins detected (Fig. 3A) (M Selbach, personal communication; Schwanhausser et al., 2011). To convert mass spectrometry signals to protein abundances, Schwanhausser et al. (2011) assumed that a linear relationship defined using the 20 ‘spiked in’ standards holds true for proteins at all abundances (Fig. 3A). The discrepancy between the resulting estimates and individual protein measurements (Fig. 2A), however, suggests that this assumption is not valid. A recent benchmarking study also supports this conclusion, showing that in general in the iBAQ method ‘low-abundance proteins were dramatically underestimated’ (Ahrne et al., 2013). We therefore employed the 61 individual protein measurements from the literature as they span a much wider abundance range. In a plot of these data versus Schwanhausser et al.’s (2011) second whole proteome estimates, we found that a two-part linear regression gave a statistically better fit than a single regression (Figs. 3B and 3C) (p-value = 0.002, Materials and Methods). We then used this two-part regression to derive new abundance estimates for all 5,028 proteins in Schwanhausser et al.’s (2011) dataset (Dataset S1). As Fig. 2B shows, the correction removes the non-linear bias.
In our rescaled data, the median abundance protein is present at 170,000 molecules per cell (Fig. 2B), considerably higher than Schwanhausser et al.’s (2011) original estimate of 16,000 molecules per cell and significantly above their second estimate of 50,000 molecules per cell. For low abundance proteins the effect is larger. In our corrected data, the median sequence specific transcription factor is present at 71,000 molecules per cell versus Schwanhausser et al.’s (2011) estimates of first 3,500 then 9,300 molecules per cell (Dataset S1). Our correction reduces the range of detected abundances by ∼50 fold (unlogged) compared to Schwanhausser et al.’s second estimates (Dataset S1) and the variance in protein levels from 0.97 (log10) to 0.36 (log10).
Corrected protein abundances show an increased correlation with mRNA abundances
As an independent check on the accuracy of our corrected abundances, we compared them to Schwanhausser et al.’s (2011) RNA-Seq mRNA expression data. Our corrected protein abundances correlate more highly with mRNA abundances than do Schwanhausser et al.’s (2011) second whole proteome estimates (compare Figs. 4A and 4B). The increase in correlation coefficient is highly significant (p-value < 10⁻²⁹) (Materials and Methods), arguing that our non-linear correction to the whole proteome abundances has increased the accuracy of these estimates. The most dramatic change is that the scatter about the line of best fit is reduced and shows a stronger linear relationship. The 50% prediction band shows that prior to correction the half of proteins whose abundances are best predicted by mRNA levels are expressed over an 11 fold range (unlogged), but after correction they are expressed over a narrower, 4 fold range (Figs. 4A and 4B). The correction reduces the width of the 95% prediction band even further, by 18 fold.
For our corrected data, the median number of proteins translated per mRNA is 9,800 compared to Schwanhausser et al.’s (2011) original estimate of 900 and their second estimate of 2,800. In yeast, the ratio of protein molecules translated per mRNA is 4,200–5,600 (Ghaemmaghami et al., 2003; Lu et al., 2007). Given that mammalian cells have a higher protein copy number than yeast (Milo et al., 2010), it is not unreasonable that the ratio in mammalian cells would be higher.
Estimating the impact of molecule specific measurement error
In addition to the above general error in scaling protein abundances, there are further sources of experimental error that affect the data for each protein and mRNA differently. As a result of these molecule specific measurement errors, the coefficient of determination between measured mRNA and measured protein levels (i.e., the R² shown in Fig. 4B) is lower than the actual value between true protein and true mRNA levels. With an accurate estimate of the errors, it is possible to calculate the increased correlation expected between true protein and true mRNA abundances. Because the variance in the residuals in Fig. 4B (i.e., the displacement along the y axis of data points about the line of best fit) is composed of both experimental error and the genuine differences in the rates of translation and protein degradation between genes, once the experimental error has been estimated, it is also possible to infer the combined true effects of translation and protein degradation.
There are two classes of molecule specific experimental error: stochastic and systematic. Stochastic error, or imprecision, is the variation between replica experiments and is estimated from this variation. Systematic error, or inaccuracy, is the reproducible under or over estimation of each data point, and is estimated by comparing the results obtained with the assay being used to those from gold standard measurements obtained with the most accurate method available.
Schwanhausser et al. (2011) limited their estimation of experimental error to stochastic errors. Because our correction of the whole proteome abundances reduces the total variance in measured protein expression levels, we first reestimated the proportion of the variance in the residuals in Fig. 4B that is due to stochastic measurement error using replica datasets (Materials and Methods). We find that 7% of this variance results from stochastic protein error and 0.8% from stochastic mRNA error.
Schwanhausser et al. (2011), however, also noted a significant variance between their whole genome RNA-Seq data and NanoString measurements for 79 genes (R² = 0.79 in Figure S8(A) in Schwanhausser et al., 2011), though they did not take this into account subsequently. RNA-Seq is well known to suffer reproducible several fold biases in the number of DNA sequence reads obtained for different GC content genomic regions (Cheung et al., 2011; Dohm et al., 2008). In contrast, NanoString gives an accurate measure of nucleic acid abundance as correlation coefficients of R² = 0.99 are obtained when NanoString data are compared to known concentrations of nucleic acid standards (Geiss et al., 2008). Thus, it is reasonable to consider NanoString as a gold standard that can be used to assess the systematic error in the RNA-Seq data by assuming that the variance between the two methods is due mostly to systematic error in RNA-Seq. Using Analysis of Variance (ANOVA), the variance in Schwanhausser et al.’s (2011) NanoString/RNA-Seq comparison can be shown to be equivalent to 23.3% of the variation in the residuals in Fig. 4B, 29 fold larger than the stochastic component of mRNA error (see Materials and Methods for a discussion of the assumptions used in this analysis).
It is also important to assess the systematic error in the whole proteome abundances as label free mass spectrometry includes such biases (Ahrne et al., 2013; Bantscheff et al., 2012; Kuntumalla et al., 2009; Lu et al., 2007; Peng et al., 2012). In principle the ‘spiked in’ protein standards in Schwanhausser et al.’s (2011) calibration experiment (i.e., the data in Fig. 3A) should provide gold standard data. In practice, however, the variance in mass spectrometry estimates for protein standards present at supposedly the same amounts is too high (i.e., the scatter along the x axis in Fig. 3A). This variance would contribute 61% to the variance in the residuals in Fig. 4B, yet the variance of the residuals between the corrected whole proteome estimates and the 61 individual protein measurements (i.e., the scatter along the x axis about the solid red line in Fig. 3B) would contribute only 44%. Since the western blot and SILAC methods used to make the 61 individual protein measurements introduce some experimental error, it seems likely that the commercial protein standards used by Schwanhausser et al. (2011) were not as accurately prepared at the correct protein concentrations as one would expect. Since no other suitable gold standard is available, we are thus unable to estimate the systematic protein error, though it is likely to be less than 44% of variance in the residuals in Fig. 4B.
Taking the stochastic protein error as a minimum estimate of protein error and the variance from the NanoString/RNA-Seq comparison as an estimate of all RNA errors, it can be shown that true mRNA levels explain at least 56% of true protein levels, and by extension protein degradation and translation combined explain no more than 44% (see Materials and Methods).
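The error budget described above can be illustrated with a short calculation. This is a minimal sketch rather than the full Materials and Methods derivation: it uses only the percentages of the residual variance in Fig. 4B quoted in the text, and it assumes that the error components are independent so that their variances simply add. The function and constant names are hypothetical.

```python
# Shares of the variance in the residuals of Fig. 4B, in percent, as quoted in the text.
STOCHASTIC_PROTEIN_ERROR = 7.0  # used as a minimum estimate of total protein error
STOCHASTIC_MRNA_ERROR = 0.8     # from replica datasets
ALL_MRNA_ERROR = 23.3           # from the NanoString/RNA-Seq comparison


def max_true_residual_share(protein_error, mrna_error):
    """Upper bound (%) on the residual variance that is left for genuine
    differences in translation and protein degradation.

    Assumption: the components are independent, so variances add to 100%.
    """
    return 100.0 - protein_error - mrna_error


remaining = max_true_residual_share(STOCHASTIC_PROTEIN_ERROR, ALL_MRNA_ERROR)
fold_larger = ALL_MRNA_ERROR / STOCHASTIC_MRNA_ERROR

print(f"Residual variance left for translation + protein degradation: <= {remaining:.1f}%")
print(f"All mRNA error vs stochastic mRNA error: {fold_larger:.0f} fold")
```

Because the protein error term is only a minimum estimate, the remaining share is an upper bound; converting it into the percentage of true protein variance explained by mRNA additionally requires the measured variances described in Materials and Methods.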
Estimating the relative importance of transcription, mRNA degradation, translation and protein degradation
In addition to determining protein and mRNA abundances, Schwanhausser et al. (2011) also directly measured mRNA and protein degradation rates and calculated the percentage that each contributed to the variance in protein abundances. Using this information, it is possible to determine the relative importance of transcription, RNA degradation, translation and protein degradation for different scenarios (Table 1, see Materials and Methods). For the 4,212 genes whose protein and mRNA expression was detected, our analysis suggests that transcription explains ∼38% of the variance in true protein levels, RNA degradation explains ∼18%, translation ∼30%, and protein degradation ∼14% (Table 1). Clearly these estimates are tentative and depend on the particular assumptions we have made. We believe, though, that they will prove more accurate than Schwanhausser et al.’s (2011) suggestion that translation is the predominant determinant of protein expression and that mRNA levels explain around 40% of the variability in protein levels (Table 1).
Table 1. The contribution of different steps in gene expression to the variance in protein abundances between genes.
Data or strategy | Variance in protein levels (log10)* | mRNA (%) | Transcription (%) | RNA degradation (%) | Translation (%) | Protein degradation (%) |
---|---|---|---|---|---|---|
Schwanhausser 2nd data (a) | 0.97 | 40 | 34 | 6 | 55 | 5 |
Measured protein error strategy (b) | 0.34 | 56 | 38 | 18 | 30 | 14 |
Measured translation strategy (c) | 0.61 | 81 | 71 | 10 | 11 | 8 |
The five percentage columns give the percent contribution of each step to the variance in protein levels.
* In this column, the value given for Schwanhausser et al.’s (2011) 2nd data is the variance in their measured protein abundances; the remaining values are our estimate for the variance in true protein levels for different scenarios.
(a) Estimates from Schwanhausser et al. (2011) based on the 4,212 genes for which NIH3T3 cell protein and mRNA abundance data are available.
(b) Our estimates for the same 4,212 genes studied by Schwanhausser et al. (2011) after correcting the overall scaling of the NIH3T3 cell protein abundance data and taking several sources of molecule specific experimental error into account: stochastic protein error and all mRNA errors.
(c) Our estimates for the same 4,212 genes studied by Schwanhausser et al. (2011) derived using measured translation rates from Subtelny et al. (in press).
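The internal structure of Table 1 can be checked with a few lines of code. The sketch below assumes the reading suggested by the text: the mRNA contribution is the sum of the transcription and RNA degradation contributions, and the mRNA, translation and protein degradation contributions together account for all of the variance. The dictionary layout and function name are illustrative only.

```python
# Percent contributions copied from Table 1.
TABLE_1 = {
    "Schwanhausser 2nd data": {
        "mrna": 40, "transcription": 34, "rna_degradation": 6,
        "translation": 55, "protein_degradation": 5,
    },
    "Measured protein error strategy": {
        "mrna": 56, "transcription": 38, "rna_degradation": 18,
        "translation": 30, "protein_degradation": 14,
    },
    "Measured translation strategy": {
        "mrna": 81, "transcription": 71, "rna_degradation": 10,
        "translation": 11, "protein_degradation": 8,
    },
}


def row_is_consistent(row, tolerance=1):
    """Check both additive relationships, allowing for rounding to whole percents."""
    mrna_gap = abs(row["transcription"] + row["rna_degradation"] - row["mrna"])
    total_gap = abs(row["mrna"] + row["translation"] + row["protein_degradation"] - 100)
    return mrna_gap <= tolerance and total_gap <= tolerance


for name, row in TABLE_1.items():
    print(name, "consistent" if row_is_consistent(row) else "inconsistent")
```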
Direct measurements of translation rates support our analysis
Direct measurements of system wide translation rates using ribosome profiling (Guo et al., 2010; Ingolia et al., 2011; Subtelny et al., in press) provide independent evidence that translation rates vary less than Schwanhausser et al. (2011) suggest. The distributions of translation rates measured in mouse embryonic stem cells, mouse neutrophils, mouse NIH3T3 cells and human HeLa cells are all significantly narrower than Schwanhausser et al. (2011) inferred for mouse NIH3T3 cells (Fig. 5A; Table S1). For NIH3T3 cells the translation rates measured by ribosome profiling for 95% of the genes detected vary only 5.8 fold, but the rates inferred for 95% of genes by Schwanhausser et al. (2011) vary 115 fold (Fig. 5A). Because each of these datasets contains a different number of genes (Table S1), to provide a more direct comparison we took the intersection of genes detected by Schwanhausser et al. (2011) and by ribosome profiling in NIH3T3 cells (Fig. 5B). The variance in measured translation rates for the genes in the intersection is only 12% of the variance in rates inferred by Schwanhausser et al. (2011) for these same genes (Fig. 5B; Table S1).
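As a rough plausibility check, the fold ranges quoted above can be converted into variances on a log10 scale. The sketch below is not part of the original analysis. It assumes that translation rates are approximately normally distributed in log10 space and that the fold range for 95% of genes corresponds to the central interval of ±1.96 standard deviations; both are assumptions made here for illustration, and the function name is hypothetical.

```python
import math


def log10_variance_from_fold_range(fold_range, z=1.96):
    """Variance of log10 rates implied by the fold range covering the
    central 95% of genes, assuming a normal distribution in log10 space."""
    standard_deviation = math.log10(fold_range) / (2 * z)
    return standard_deviation ** 2


measured_variance = log10_variance_from_fold_range(5.8)   # ribosome profiling
inferred_variance = log10_variance_from_fold_range(115)   # Schwanhausser et al. (2011)
variance_ratio = measured_variance / inferred_variance

print(f"Implied variance ratio (measured / inferred): {variance_ratio:.2f}")
```

Under these assumptions the implied ratio is of the same order as the 12% obtained for the intersection of genes, although the fold ranges refer to all genes detected in each dataset, so exact agreement is not expected.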
Having direct measurements of the variance in translation rates opens up a second strategy to estimate the relative importance of each step in gene expression (Materials and Methods). In our first strategy—the measured protein error strategy—protein degradation rates and errors in protein and mRNA abundances were determined from direct experimental data; and the variance in true protein levels explained by translation was inferred as that part of the variance in the residuals in Figure 4B that is not explained by the three experimentally measured terms. In our second strategy—the measured translation strategy—translation rates, protein degradation rates and mRNA errors are determined from direct experimental data; and the variance in measured protein levels explained by protein error is inferred as that part of the variance in the residuals in Figure 4A that is not explained by the sum of variances of the three experimentally measured components (Materials and Methods). This measured translation strategy is thus independent of our rescaling of Schwanhausser et al.’s (2011) second protein abundance estimates and of our estimate of stochastic protein measurement error.
According to our second strategy, for NIH3T3 cells the variance in true protein levels is 63% of the variance in Schwanhausser et al.’s (2011) measured protein abundances; mRNA levels contribute 81% to the variance in true protein expression; transcription 71%; RNA degradation 10%; translation 11%; and protein degradation 8% (Table 1). Despite the significant differences in the underlying data and assumptions used, these results agree broadly with those of our first strategy (Table 1). Both strategies suggest that the variance in Schwanhausser et al.’s (2011) second protein abundance estimates is too high. Both suggest that translation contributes less to protein levels and that transcription contributes more than Schwanhausser et al. (2011) claimed. In effect, the measured rates of translation provide independent support for our rescaling of Schwanhausser et al.’s (2011) protein abundances and our estimates of stochastic protein error, and vice versa.
Our second strategy, though, does estimate that mRNA levels and transcription explain a higher percent of protein expression than the first (Table 1), but this is not entirely unexpected. In our first strategy, we were not able to take account of systematic, molecule specific errors in protein abundances because appropriate control measurements were not available. Thus, this first strategy could well have underestimated error. In contrast, our second strategy estimates all types of protein abundance errors in a single term and thus has the potential to be the more accurate if the error in the ribosome profiling and protein degradation data is not too large.
To further explore the relationship between our two strategies, we compared the correlation between translation rates inferred by Schwanhausser et al. (2011) and those measured by ribosome profiling in NIH3T3 cells (Fig. 6). The coefficient of determination is small (R² = 0.13), indicating that the ribosome profiling data explain only 13% of the variance in Schwanhausser et al.’s (2011) inferred rates. Considered in isolation this result does not establish if the poor correlation is due to errors in either or both datasets. However, our measured protein error strategy shows that the variance in true translation rates contributes no more than 19% to the variance in Schwanhausser et al.’s (2011) inferred translation rates, with the remaining 81% of the variance being due to experimental error (Table 1; 0.19 = (0.34 × 0.30)∕(0.97 × 0.55)). The close agreement of this estimate with the actual correlation between measured and inferred translation rates (R² ≤ 0.19 versus R² = 0.13) suggests that the poor correlation is almost entirely due to error in Schwanhausser et al.’s (2011) inferred rates. In addition, this result provides further evidence that our two strategies broadly agree, with the measured protein error strategy potentially underestimating the degree of error in Schwanhausser et al.’s (2011) data.
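The 19% figure follows directly from the values in Table 1, as the short sketch below illustrates. The function and variable names are hypothetical; the four inputs are the variances (log10) and fractional contributions of translation for the measured protein error strategy and for Schwanhausser et al.’s (2011) second data.

```python
def true_share_of_inferred_translation_variance(
    true_protein_variance,
    true_translation_fraction,
    measured_protein_variance,
    inferred_translation_fraction,
):
    """Ratio of the variance attributed to true translation rates to the
    variance attributed to the inferred translation rates."""
    true_translation_variance = true_protein_variance * true_translation_fraction
    inferred_translation_variance = measured_protein_variance * inferred_translation_fraction
    return true_translation_variance / inferred_translation_variance


upper_bound_r2 = true_share_of_inferred_translation_variance(0.34, 0.30, 0.97, 0.55)
error_share = 1.0 - upper_bound_r2

print(f"True translation share of inferred variance: {upper_bound_r2:.2f}")
print(f"Share attributed to experimental error: {error_share:.2f}")
```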
Ribosome profiling has also shown that translation rates change only several fold upon cellular differentiation and, with the exception of the translation machinery, the change affects all expressed genes to a similar degree (Ingolia et al., 2011). Other systemwide studies, including a separate analysis by Schwanhausser et al. (2011), also suggest that the differential regulation of translation may be limited to modest changes at a subset of genes (Baek et al., 2008; Hsieh et al., 2012; Kristensen et al., 2013; Schwanhausser et al., 2011; Selbach et al., 2008). This work seems consistent with our analysis and suggests that translation may be used chiefly to fine tune protein expression levels.
Estimating the number of non-transcribed genes
Both Schwanhausser et al.’s (2011) and all of our analyses presented above consider only those genes whose protein and mRNA expression was detected. There are many thousands of other genes, however, which express no mRNA and as a result cannot be translated. To estimate the proportion of such genes in a typical cell, we made use of a detailed analysis by Hebenstreit et al. (2011, 2012), who showed that there is a trimodal distribution of mRNA expression when the data are derived as an average for a population of cells of a single cell type (Figure S1). The first mode contains Highly Expressed (HE) genes, present at one or more molecules per cell; the second mode is comprised of Low Expressed (LE) genes, which are not expressed in most cells but, as shown by single molecule fluorescent in situ hybridization, are present at one to several molecules per cell in a small percent of cells; and the third mode contains genes that are not detectably expressed (NE genes) and thus, given the assay’s sensitivity, are present at less than one mRNA molecule per 100 cells. LE genes tend to be closer to HE genes on the chromosome than are NE genes, and it has been suggested that this proximity may allow escape from repressive chromatin structures in a few cells, explaining the stochastic bursts of rare transcription observed (Hebenstreit et al., 2012; Hebenstreit et al., 2011).
To account for variation in the expression of individual genes between cells, which all LE genes at a minimum must suffer, we assume that the general distribution of mRNA expression levels does not vary from cell to cell even when the expression of individual genes does. The mRNA expression of each LE gene was divided into a component representing expression of one mRNA molecule in some cells and a second component representing the remaining cells that express no mRNA (Materials and Methods). This yields 8,763 NE and LE gene equivalents that are not expressed and 12,546 LE and HE gene equivalents that are expressed. For the 8,763 non-expressed gene equivalents, the complete absence of their mRNAs from the cell means that they are not being translated in these cells. Therefore, there can be no variation in the rates at which they are translated. Instead, we assume that the absence of transcription is overwhelmingly the reason why these genes express no protein.
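The proportion of non-expressed genes follows from the two gene equivalent counts above. The sketch below also shows one possible reading of how a single LE gene could be split into expressed and non-expressed components; that splitting rule (one mRNA molecule in a fraction of cells equal to the population average, none in the rest) is an assumption made here for illustration, and the exact procedure is the one given in Materials and Methods. All names are hypothetical.

```python
NON_EXPRESSED_EQUIVALENTS = 8_763   # NE and LE gene equivalents that are not expressed
EXPRESSED_EQUIVALENTS = 12_546      # LE and HE gene equivalents that are expressed


def split_le_gene(mean_mrna_per_cell):
    """Hypothetical split of one LE gene into (expressed, non-expressed)
    gene equivalents, assuming expressing cells carry exactly one mRNA."""
    if not 0.0 <= mean_mrna_per_cell <= 1.0:
        raise ValueError("an LE gene is assumed to average at most one mRNA per cell")
    expressed = mean_mrna_per_cell
    return expressed, 1.0 - expressed


total_equivalents = NON_EXPRESSED_EQUIVALENTS + EXPRESSED_EQUIVALENTS
fraction_non_expressed = NON_EXPRESSED_EQUIVALENTS / total_equivalents

print(f"Non-expressed fraction: {fraction_non_expressed:.1%}")
print("Example LE gene at 0.25 mRNA per cell:", split_le_gene(0.25))
```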
Implication for other system wide studies
Two other systemwide estimates of protein abundance in mammalian cells are, like Schwanhausser et al.’s (2011), lower than ours. These two reports suggest that the median abundance protein detected is present at 8,000 (Vogel et al., 2010) or 9,700 (Beck et al., 2011) molecules per cell versus our estimate of 170,000 molecules per cell. Since these lower estimates provide less than 1/10th of the number of histones needed to cover the diploid genome with nucleosomes and are lower than published estimates for a wide array of other housekeeping proteins, it is unlikely that they are accurate.
Another study by Wisniewski et al. (2012) provided protein abundance estimates for HeLa cells that are generally higher than ours and spread over a broader range (Fig. 7A). These estimates are 240% higher on average than the set of individual protein measurements from the literature (Dataset S3, Fig. 7B). Since over 80% of these individual measurements were made for proteins in HeLa cells, Wisniewski et al.’s (2012) estimates must be incorrectly scaled. Using our two part linear regression strategy, we therefore corrected Wisniewski et al.’s (2012) whole proteome data (Materials and Methods, Figure S2; Dataset S3), bringing the average variation between the whole proteome estimates and individual protein measurements to within 6% of each other (Fig. 7B; Dataset S3). Interestingly, the correction dramatically increases the similarity between the distributions of protein abundances in HeLa and NIH3T3 cells for all orthologous proteins (Fig. 7A). This establishes the important point, mentioned at the beginning of the Results: in aggregate the 60+ housekeeping proteins show a similar relationship to the expression values of all other cellular proteins in both cell lines, and thus the discrepancies with the uncorrected whole proteome data are not due to differences in expression levels in HeLa versus NIH3T3 cells. The correction also increases the correlation between HeLa cell protein and HeLa mRNA abundances to a statistically significant extent (p-value = 6 × 10⁻²⁰) and reduces the 50% and 95% confidence bounds for this relationship by 1.7 fold and 4.6 fold respectively. Wisniewski et al. (2012) scaled their protein abundances using the total cellular protein content and the sum of the mass spectrometry signals for all detected polypeptides. They assumed that mass spectrometry signals are proportional to protein abundance.
In contrast, our scaling strategy makes no such assumption and instead uses many individual measurements of housekeeping proteins to estimate a multipart (spline) function. The increased correlations obtained with individual protein measurements and with mRNA abundances for two cell lines suggest that our scaling is the more accurate.
Other estimates of the contribution of mRNA levels to protein expression in mammals are lower than ours, suggesting that mRNA levels contribute 10%–40% (Maier, Guell & Serrano, 2009; Vogel & Marcotte, 2012). In comparison, we estimate that mRNA abundance explains 56%–81% of the variance in protein abundance for a set of 4,212 detected proteins. We have also suggested that for the 40% of genes in a given cell that express no mRNA, translation rates likely play no role in determining protein expression levels. The other groups neither took systematic experimental errors into account nor made use of direct measures of translation rates, and they generally do not discuss non-transcribed genes. For this reason, their analyses likely underestimate the contribution of transcription.
Conclusions
Quantitative whole proteome analyses can offer profound insights into the control of gene expression and provide baseline parameters for much of systems biology. As these important new technologies continue to be refined, it is critical that the data be correctly scaled, that experimental errors be measured and accounted for as much as possible, that all genes be considered, and that direct measurements of each step in gene expression be made. Additional measurements and controls will be needed to derive a more assured systemwide understanding of protein and mRNA abundances and the relative importance of each of the four steps in gene expression.
Materials and methods
Correcting protein abundance
For NIH3T3 cells, all credible individual protein abundance measurements available to us for housekeeping proteins (a total of 61 proteins, Dataset S1) were log10 transformed along with the corresponding estimates from Schwanhausser et al.’s (2011) second whole proteome dataset. Candidate regression models were compared by leave-one-out cross-validation on these training data (Bickel & Doksum, 2001). This showed that a two-part linear regression with a change point at 10^6 molecules per cell (below 1 × 10^6: slope = 0.56, intercept = 2.64; above 1 × 10^6: slope = 1.06, intercept = −0.41) fit the data far better than expected by chance (likelihood ratio test bootstrap p-value = 0.002; Bickel & Doksum, 2001; Figs. 3B and 3C). The resulting two-part linear model was used to correct all 5,028 protein abundance estimates (Fig. 2B, Dataset S1).
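As a rough illustration of how such a two-part model would be applied, the following Python sketch corrects a single whole proteome estimate. It assumes that the change point and both lines are defined on the log10 scale of the uncorrected estimate (the predictor) and that the fitted value is the corrected log10 abundance. The function name `correct_abundance` is hypothetical; only the slopes, intercepts and change point are taken from the text.

```python
import math

# Parameters quoted in the text for NIH3T3 cells (log10 scale).
CHANGE_POINT_LOG10 = 6.0                   # change point at 10^6 molecules per cell
LOW_SLOPE, LOW_INTERCEPT = 0.56, 2.64      # line below the change point
HIGH_SLOPE, HIGH_INTERCEPT = 1.06, -0.41   # line above the change point


def correct_abundance(uncorrected: float) -> float:
    """Map an uncorrected estimate (molecules per cell) to a corrected one.

    Assumption: the piecewise lines act on log10(uncorrected) and return
    log10(corrected). This is a sketch, not the authors' code.
    """
    if uncorrected <= 0:
        raise ValueError("abundance must be positive to take log10")
    x = math.log10(uncorrected)
    if x < CHANGE_POINT_LOG10:
        y = LOW_SLOPE * x + LOW_INTERCEPT
    else:
        y = HIGH_SLOPE * x + HIGH_INTERCEPT
    return 10 ** y
```

Under these assumptions, an uncorrected estimate of 10^4 molecules per cell maps to 10^4.88 molecules per cell, which shows how the shallow lower line raises low abundance estimates.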
We tested the null hypothesis that the correlation coefficient of the uncorrected Schwanhausser et al. (2011) protein abundance estimates versus mRNA estimates (R1 = 0.626) is equal to that of our corrected protein estimates versus mRNA estimates (R2 = 0.642). The method for comparing dependent correlation coefficients (Olkin & Finn, 1990) was employed because both correlations involve the same mRNA-seq data and it is reasonable to assume that the uncorrected and corrected protein abundance estimates and the mRNA estimates have a multivariate Gaussian distribution. The resulting two-sided p-value < 10^−29 shows that R2 is significantly larger than R1.
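A minimal sketch of a test for two dependent correlations that share one variable is given below. It uses the standard large-sample variance of the difference between two correlations with a common variable; whether this matches the exact variant of the Olkin & Finn (1990) procedure used here is an assumption. The correlation between the uncorrected and corrected protein estimates (`r23`) is not reported in the text, so it is left as an input, and the function name is hypothetical.

```python
import math


def compare_dependent_correlations(r12, r13, r23, n):
    """Large-sample z test of H0: rho12 == rho13, where variable 1 is shared.

    r12, r13: the two correlations being compared (e.g., mRNA vs. corrected
              protein and mRNA vs. uncorrected protein).
    r23:      correlation between the two non-shared variables.
    n:        number of observations.
    Returns (z, two_sided_p) under a normal approximation.
    """
    one_minus_sumsq = 1.0 - r12**2 - r13**2 - r23**2
    # Asymptotic variance of (r12 - r13) for trivariate Gaussian data.
    var_diff = (
        (1.0 - r12**2) ** 2
        + (1.0 - r13**2) ** 2
        - 2.0 * r23**3
        - (2.0 * r23 - r12 * r13) * one_minus_sumsq
    ) / n
    if var_diff <= 0:
        raise ValueError("inconsistent correlations: non-positive variance")
    z = (r12 - r13) / math.sqrt(var_diff)
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

Because the two protein datasets are highly correlated with each other, even a small difference between R1 and R2 can be highly significant; the size of the effect depends strongly on `r23`, which is why it must be supplied.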
To correct protein abundance estimates for HeLa cells (Wisniewski et al., 2012), we used the same strategy as for NIH3T3 cells. A two-part linear regression with a change point at 10^6.8 molecules per cell fit the data far better than expected by chance (likelihood ratio test bootstrap p-value = 0.001) (Figure S2). The resulting two-part linear model was used to correct all HeLa cell protein abundance estimates (Fig. 7; Dataset S3). The correlation of HeLa cell protein abundance estimates with mRNA abundances was determined using the mean values of replicate HeLa cell RNA-Seq datasets from the ENCODE consortium (The ENCODE Project Consortium, 2011) (GEO Accession ID GSM765402). The hypothesis that our corrected protein abundances correlate more highly with these HeLa mRNA abundances than the uncorrected estimates was tested as above, resulting in a two-sided p-value of 6 × 10^−20.
The contribution of mRNA to protein levels: measured protein error strategy
The variance term in a linear model between measured protein abundance (MP) (response) and measured mRNA levels (MR) (predictor) is decomposed in a standard way (ANOVA; Bickel & Doksum, 2001) into three components (Fig. 8). These components of the variance in the residuals represent mRNA measurement error (eR), protein measurement error (eP), and the variance in a linear model between true protein abundance (TP) and true mRNA levels (TR) that results from the centered genuine differences in the rates of protein degradation and translation (PDT). The measured protein abundances considered in this case are our rescaled estimates.
Statistically, we can write three linear models from Fig. 8:

$$\mathrm{TR} = a_R + b_R\,\mathrm{MR} + e_R \tag{1}$$

$$\mathrm{TP} = a + b\,\mathrm{TR} + \mathrm{PDT} \tag{2}$$

$$\mathrm{MP} = a_P + \mathrm{TP} + e_P \tag{3}$$

where TR, MR, TP and MP are abundance values on a log10 scale and $a_R$, $a$ and $a_P$ are intercepts. The three sources of variation ($e_R$, $e_P$ and PDT) are assumed to be independent random variables with mean 0. The amount of protein degradation and translation (PDT) is taken to be independent of true mRNA levels (TR) on the basis of partial evidence: the variance in the residuals in Fig. 4B is similar for different mRNA abundances. The reversal of the causal relationship between TR and MR in model (1) assumes that TR and MR have an approximately joint Gaussian distribution. The slope of TP in model (3) is assumed to be 1 because the ratios between the 61 published protein abundance measurements and our corrected estimates are close to 1 (Fig. 2B). Finally, we note that implicit in the analysis of variance is the assumption that the various datasets employed can be thought of as originating from a relatively homogeneous superpopulation. Combining (1)–(3), we write the linear model between measured protein abundance and measured mRNA levels as

$$\mathrm{MP} = (a_P + a + b\,a_R) + b\,b_R\,\mathrm{MR} + (b\,e_R + \mathrm{PDT} + e_P) \tag{4}$$
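Based on model (4):

i. We first estimated $\mathrm{var}(b\,e_R + \mathrm{PDT} + e_P)$ and $b\,b_R$ by fitting model (4) to the 8,424 corrected mass spec and RNA-Seq data points pooled from the two replicates (Dataset S1). By independence, we have

$$\mathrm{var}(b\,e_R + \mathrm{PDT} + e_P) = b^2\,\mathrm{var}(e_R) + \mathrm{var}(\mathrm{PDT}) + \mathrm{var}(e_P)$$

ii. We next estimated $\mathrm{var}(e_R)$ and $b_R$ by fitting model (1) to the 77 NanoString (‘TR’) versus RNA-Seq (‘MR’) data points, after removing two outliers (Dataset S2).

iii. We could not estimate $\mathrm{var}(e_P)$ by directly fitting model (3), as TP data are not available. As a surrogate, we estimated $\mathrm{var}(e_P)$ from the following linear model, which quantifies the stochastic error in mass spec replicate data:

$$\mathrm{MP}_{ij} = \mathrm{avgMP}_i + e_{P,ij} \tag{5}$$

where $\mathrm{MP}_{ij}$ is the corrected mass spec data for the ith protein in the jth replicate in Schwanhausser et al. (2011), and $\mathrm{avgMP}_i$ is the average of our corrected protein data for the ith protein, i = 1,…, 4212 (Dataset S1). Please note that this estimate of $\mathrm{var}(e_P)$ is likely an underestimate of the protein error as we only consider the stochastic error, not the systematic error.

iv. From the estimates of $\mathrm{var}(b\,e_R + \mathrm{PDT} + e_P)$, $b$, $\mathrm{var}(e_R)$ and $\mathrm{var}(e_P)$ above, we estimate $\mathrm{var}(\mathrm{PDT})$ as

$$\mathrm{var}(\mathrm{PDT}) = \mathrm{var}(b\,e_R + \mathrm{PDT} + e_P) - b^2\,\mathrm{var}(e_R) - \mathrm{var}(e_P)$$

Hence, we have successfully decomposed the variance estimate, i.e., the estimated variance of residuals between measured protein levels and measured mRNA levels, into 3 components:

- $b^2\,\mathrm{var}(e_R)$: RNA error (23.3% of the variance estimate)
- $\mathrm{var}(e_P)$: protein error (7% of the variance estimate)
- $\mathrm{var}(\mathrm{PDT})$: protein degradation and translation (69.6% of the variance estimate).

From the diagram and the above calculation, we also derived the percentage of variability in the unobserved true protein levels explained by the unobserved true mRNA levels:

$$\frac{b^2\,\mathrm{var}(\mathrm{TR})}{\mathrm{var}(\mathrm{TP})} = 1 - \frac{\mathrm{var}(\mathrm{PDT})}{\mathrm{var}(\mathrm{MP}) - \mathrm{var}(e_P)}$$

where $\mathrm{var}(\mathrm{MP})$ is the variance of the corrected measured protein levels.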
We separately estimated the stochastic mRNA error from the replicate RNA-Seq measurements of the 4,212 genes (Dataset S1). The stochastic mRNA error contributes 0.8% of the estimated variance of residuals between measured protein levels and measured mRNA levels.
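The variance decomposition described in this section can be sketched in a few lines of Python. The sketch assumes, following the independence assumptions stated for the three linear models, that the residual variance splits additively into an RNA error term scaled by b squared, a protein error term and the PDT term, and that var(TP) equals var(MP) minus var(eP). The pooled within-protein variance with N(J − 1) degrees of freedom is one plausible estimator of the stochastic replicate error and is an assumption here; both function names are hypothetical, and all inputs are supplied by the caller rather than taken from the text.

```python
def replicate_error_variance(replicates):
    """Pooled within-protein variance from log10 replicate measurements.

    replicates: list of per-protein lists, each holding that protein's
    replicate values. Residuals are taken about each protein's own mean.
    """
    sum_sq, dof = 0.0, 0
    for values in replicates:
        mean = sum(values) / len(values)
        sum_sq += sum((v - mean) ** 2 for v in values)
        dof += len(values) - 1
    if dof == 0:
        raise ValueError("need at least one protein with two replicates")
    return sum_sq / dof


def decompose_residual_variance(resid_var, b, var_eR, var_eP, var_MP):
    """Split the residual variance of MP regressed on MR into three parts.

    resid_var: estimated var(b*eR + PDT + eP) from the MP-on-MR regression.
    b:         slope of true protein on true mRNA.
    var_eR:    mRNA measurement error variance.
    var_eP:    protein measurement error variance.
    var_MP:    variance of the measured protein abundances.
    """
    rna_error = b**2 * var_eR
    var_PDT = resid_var - rna_error - var_eP
    var_TP = var_MP - var_eP          # assumes eP is independent of TP
    return {
        "rna_error_share": rna_error / resid_var,
        "protein_error_share": var_eP / resid_var,
        "pdt_share": var_PDT / resid_var,
        "tp_explained_by_tr": 1.0 - var_PDT / var_TP,
    }
```

Because the protein error enters both the residual split and the estimate of var(TP), an underestimate of var(eP) inflates the PDT share and lowers the fraction of true protein variance attributed to mRNA.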
The contributions of transcription, translation and protein and mRNA degradation: measured error strategy
To determine the relative contributions of measured RNA degradation (RD) and measured protein degradation (PD) to the variance in true protein expression (TP), we estimated their variances, var(RD) and var(PD). We took Schwanhausser et al.’s (2011) calculated percentages for the contribution of RD and PD to explain the variance of their uncorrected whole proteome abundances (6.4% for RD and 4.9% for PD, M Selbach, personal communication). Since the variance of the 8,424 uncorrected mass spec data points from the two replicates is 0.97, we calculated var(RD) and var(PD) as 0.062 and 0.048 respectively. The relative contributions of var(RD) and var(PD) to var(TP) (estimated as var(MP) − var(eP)) were then calculated (Table 1). We also determined the contribution of transcription (var(TXN)) to var(TP) as (var(TR) − var(true RD))∕var(TP), where var(TR) was estimated from the variance decomposition above, and the contribution of translation as (var(TP) − var(TR) − var(true PD))∕var(TP) (Table 1).
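The arithmetic in this paragraph can be checked with a short Python sketch. The values 0.97, 6.4% and 4.9% come from the text; the function `step_contributions` is a hypothetical helper that applies the two ratio formulas given above and treats the measured var(RD) and var(PD) as stand-ins for the true values, which is an assumption. The inputs `var_TP` and `var_TR` must be supplied by the caller.

```python
VAR_MP_UNCORRECTED = 0.97        # variance of the 8,424 uncorrected data points
RD_SHARE, PD_SHARE = 0.064, 0.049

var_RD = VAR_MP_UNCORRECTED * RD_SHARE   # about 0.062
var_PD = VAR_MP_UNCORRECTED * PD_SHARE   # about 0.048


def step_contributions(var_TP, var_TR, var_RD, var_PD):
    """Fractions of var(TP) attributed to each of the four steps.

    var_TR is the part of var(TP) attributable to true mRNA levels, so the
    four fractions sum to 1 by construction.
    """
    return {
        "transcription": (var_TR - var_RD) / var_TP,
        "rna_degradation": var_RD / var_TP,
        "translation": (var_TP - var_TR - var_PD) / var_TP,
        "protein_degradation": var_PD / var_TP,
    }
```

Since the four fractions are defined by subtraction, any error in var(RD) shifts weight between transcription and RNA degradation only, and any error in var(PD) shifts weight between translation and protein degradation only.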
The contributions of each step of gene expression to protein levels: measured translation strategy
We calculated the relative contributions of each of the four steps in gene expression by an independent, second approach that does not rely either on our rescaling of Schwanhausser et al.’s (2011) protein abundance estimates or on our estimate of stochastic protein errors. Instead, our second approach infers true protein abundance based on Subtelny et al.’s (in press) direct measurements of translation rates in NIH3T3 cells by ribosome profiling and on our estimate of RNA measurement error. The measured protein abundances considered are thus Schwanhausser et al.’s (2011) second estimates, not our rescaling of these estimates. A central assumption is that since the variance in Subtelny et al.’s (in press) measured translation rates is 12% of the variance in the rates of translation inferred by Schwanhausser et al. (2011), the contribution of translation to the variance in true protein levels is 12% of the value provided by Schwanhausser et al. (2011).
The variance term in a linear model between measured protein abundance (MP) and measured mRNA levels (MR) was decomposed as before (Fig. 8), except that the components of the variance in the linear model between true protein abundance (TP) and true mRNA levels (TR) that result from the variance in the rates of protein degradation (PD) and protein translation (PT) were considered separately, as cPD and dPT respectively. Similar to our measured error strategy, we can write three linear models using the same assumptions:

$$\mathrm{TR} = a_R + b_R\,\mathrm{MR} + e_R \tag{6}$$

$$\mathrm{TP} = a + b\,\mathrm{TR} + c\,\mathrm{PD} + d\,\mathrm{PT} \tag{7}$$

$$\mathrm{MP} = a_P + \mathrm{TP} + e_P \tag{8}$$

Thus, we can write the linear model between measured protein abundance (MP) and measured mRNA levels (MR) for the measured translation strategy as

$$\mathrm{MP} = (a_P + a + b\,a_R) + b\,b_R\,\mathrm{MR} + (b\,e_R + c\,\mathrm{PD} + d\,\mathrm{PT} + e_P) \tag{9}$$
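Based on this revised model (9):

i. We first estimated $\mathrm{var}(b\,e_R + c\,\mathrm{PD} + d\,\mathrm{PT} + e_P)$ and $b\,b_R$ by fitting the above model to the 8,424 mass spec and RNA-Seq data points pooled from the two replicates using Schwanhausser’s second estimates (Dataset S1). By independence, we thus have

$$\mathrm{var}(b\,e_R + c\,\mathrm{PD} + d\,\mathrm{PT} + e_P) = b^2\,\mathrm{var}(e_R) + \mathrm{var}(c\,\mathrm{PD}) + \mathrm{var}(d\,\mathrm{PT}) + \mathrm{var}(e_P)$$

ii. The values of $\mathrm{var}(e_R)$ and $b_R$ are the same as those derived previously by our measured error strategy. Thus, we can estimate $b^2\,\mathrm{var}(e_R)$.

iii. We used the estimate of $\mathrm{var}(c\,\mathrm{PD})$ from Schwanhausser et al. (2011), i.e., 0.97 × 4.9% = 0.0475.

iv. From Schwanhausser et al.’s (2011) results, we have $\mathrm{var}(d\,\mathrm{PT}) = d^2\,\mathrm{var}(\mathrm{PT})$ estimated as 0.97 × 55% = 0.54. From Schwanhausser et al.’s (2011) inferred translation rates for each of 3,633 genes (Dataset S1, second tab, column AG), $\mathrm{var}(\mathrm{PT})$ has an estimate of 0.29. Hence, the estimate of $d^2$ is 1.86. From Subtelny et al. (in press), we have a separate, directly measured estimate of $\mathrm{var}(\mathrm{PT})$ as 0.03533, which we obtained by slightly increasing the variance of their data for the 3,126 genes in the intersected dataset (Fig. 5B; Table S1) by the ratio of the variances for Schwanhausser et al.’s (2011) inferred rates for the 3,633 genes and the 3,126 genes (Table S1). Using this value to replace that of Schwanhausser et al. (2011), we obtained a new estimate of $\mathrm{var}(d\,\mathrm{PT}) = d^2\,\mathrm{var}(\mathrm{PT})$ as 1.86 × 0.03533 ≈ 0.066.

v. Now we can estimate $\mathrm{var}(e_P)$ as

$$\mathrm{var}(e_P) = \mathrm{var}(b\,e_R + c\,\mathrm{PD} + d\,\mathrm{PT} + e_P) - b^2\,\mathrm{var}(e_R) - \mathrm{var}(c\,\mathrm{PD}) - \mathrm{var}(d\,\mathrm{PT})$$

where 0.0475 is an estimate of $\mathrm{var}(c\,\mathrm{PD})$ and 0.066 an estimate of $\mathrm{var}(d\,\mathrm{PT})$.

vi. Given Schwanhausser et al.’s (2011) second 8,424 uncorrected mass spec data, we can also estimate $\mathrm{var}(\mathrm{TP})$ as $\mathrm{var}(\mathrm{MP}) - \mathrm{var}(e_P)$, where $\mathrm{var}(\mathrm{MP})$ is estimated from these data.

Given the estimates above and Schwanhausser et al.’s (2011) estimate of the contribution of the variance in RNA degradation, $\mathrm{var}(\mathrm{RD})$, we can decompose $\mathrm{var}(\mathrm{TP})$ as:

- variance explained by PD: $\mathrm{var}(c\,\mathrm{PD}) / \mathrm{var}(\mathrm{TP})$
- variance explained by PT: $\mathrm{var}(d\,\mathrm{PT}) / \mathrm{var}(\mathrm{TP})$
- variance explained by TR: $1 - [\mathrm{var}(c\,\mathrm{PD}) + \mathrm{var}(d\,\mathrm{PT})] / \mathrm{var}(\mathrm{TP})$
- variance explained by RD: $\mathrm{var}(\mathrm{RD}) / \mathrm{var}(\mathrm{TP})$
- variance explained by TXN: the variance explained by TR minus the variance explained by RD.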
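The numerical steps of the measured translation strategy can be retraced with the short Python sketch below. The constants are the values quoted in the text; `measured_translation_shares` is a hypothetical helper that applies the ratios listed above under the assumption that the PD, PT and TR shares sum to 1. Because the estimate of var(TP) depends on an estimate of var(eP) that is not restated here, `var_TP` is left as an input.

```python
VAR_MP = 0.97                 # variance of the 8,424 uncorrected data points
VAR_DPT_INFERRED = 0.54       # Schwanhausser et al. (2011): 0.97 x 55%
VAR_PT_INFERRED = 0.29        # variance of their inferred translation rates
VAR_PT_MEASURED = 0.03533     # from ribosome profiling (Subtelny et al.)
VAR_CPD = 0.0475              # 0.97 x 4.9%
VAR_RD = VAR_MP * 0.064       # 0.97 x 6.4%

d_squared = VAR_DPT_INFERRED / VAR_PT_INFERRED       # about 1.86
var_dPT = d_squared * VAR_PT_MEASURED                # about 0.066
variance_ratio = VAR_PT_MEASURED / VAR_PT_INFERRED   # about 0.12


def measured_translation_shares(var_TP):
    """Fractions of var(TP) attributed to each step (sketch)."""
    pd_share = VAR_CPD / var_TP
    pt_share = var_dPT / var_TP
    tr_share = 1.0 - pd_share - pt_share      # remainder goes to mRNA
    rd_share = VAR_RD / var_TP
    return {
        "protein_degradation": pd_share,
        "translation": pt_share,
        "mrna": tr_share,
        "rna_degradation": rd_share,
        "transcription": tr_share - rd_share,
    }
```

The ratio of the measured to the inferred translation rate variance comes out at about 0.12, matching the 12% figure on which the central assumption of this strategy rests.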
The number of genes not transcribed in a typical cell within a population
To estimate the number of genes not transcribed in a typical cell within a population, we employed a deep RNA-Seq dataset that detected polyA+ mRNA for 15,325 protein coding genes in mouse Th2 cells (Hebenstreit et al., 2011). To place these abundance estimates on the same scale as those of Schwanhausser et al.’s (2011) data, the 3,841 mRNAs expressed above 1 RPKM (reads per kilobase of exon per million mapped reads) in common between the two datasets were identified. The Th2 cell data were then scaled to have the same median and variance for these common genes in numbers of mRNA molecules per cell (Figure S3). Following Hebenstreit et al. (2012), we divided the expressed genes into 11,301 Highly Expressed (HE) genes, present at one or more mRNA molecules per cell, and 4,024 Low Expressed (LE) genes, expressed below one molecule per cell. The remaining 5,984 genes whose expression was not detected were designated Not Expressed (NE) genes. We then divided each LE gene into two: a fraction of a gene expressed at 1 molecule per cell with a weight w and a fraction of a gene that is not expressed in any cells with a weight 1 − w. The 4,024 LE genes were thus decomposed into 1,245 gene equivalents expressed at 1 molecule per cell and 2,779 gene equivalents that are not expressed. Combining these with the 11,301 HE genes and 5,984 NE genes, we obtained 12,546 HE and LE expressed gene equivalents and 8,763 NE and LE non-expressed gene equivalents.
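The bookkeeping of gene equivalents can be illustrated with the following Python sketch. The gene counts are those given in the text. The helper `split_le_genes` is hypothetical and assumes that the weight w of each LE gene equals its mean abundance in molecules per cell (a value between 0 and 1), so that the mean expression is preserved; the text does not state how w was chosen.

```python
HE_GENES = 11_301                 # one or more mRNA molecules per cell
LE_GENES = 4_024                  # below one molecule per cell
NE_GENES = 5_984                  # not detected
LE_EXPRESSED_EQUIV = 1_245        # LE gene equivalents at 1 molecule per cell
LE_NOT_EXPRESSED_EQUIV = LE_GENES - LE_EXPRESSED_EQUIV

expressed_equiv = HE_GENES + LE_EXPRESSED_EQUIV
not_expressed_equiv = NE_GENES + LE_NOT_EXPRESSED_EQUIV
fraction_not_expressed = not_expressed_equiv / (expressed_equiv + not_expressed_equiv)


def split_le_genes(abundances):
    """Split LE genes into expressed and non-expressed gene equivalents.

    abundances: mean molecules per cell for each LE gene, each in (0, 1).
    Assumption: weight w equals the abundance, so a gene at 0.25 molecules
    per cell counts as 0.25 expressed and 0.75 non-expressed equivalents.
    """
    if any(not 0.0 < a < 1.0 for a in abundances):
        raise ValueError("LE abundances must lie strictly between 0 and 1")
    expressed = sum(abundances)
    return expressed, len(abundances) - expressed
```

With the counts above, non-expressed gene equivalents make up about 41% of all gene equivalents, in line with the roughly 40% of genes described elsewhere in the paper as expressing no mRNA in a given cell.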
Supplemental Information
Acknowledgments
We are indebted to Matthias Selbach for providing his second whole proteome abundance estimates and ancillary data from the Schwanhausser et al. (2011) analysis. We acknowledge his patient answering of our questions about the Schwanhausser et al. (2011) paper. We are particularly grateful to Stephen Eichhorn and David Bartel for generously providing their ribosome profiling data for NIH3T3 cells prior to publication. We also thank Sarah Teichmann for helping us better understand the Hebenstreit et al. (2012) analysis of mRNA expression and Susan Celniker, Ben Brown, and David Knowles for constructive comments on our manuscript.
Funding Statement
This work was supported in part by NIH grant P01 GM009655. Work at Lawrence Berkeley National Laboratory was conducted under Department of Energy contract DEAC02-05CH11231. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Additional Information and Declarations
Competing Interests
Mark Biggin is an Academic Editor for PeerJ. The authors declare that they have no competing interests.
Author Contributions
Mark D Biggin conceived and designed the experiments, performed the experiments, wrote the paper.
Jingyi Jessica Li conceived and designed the experiments, performed the experiments, analyzed the data, contributed analysis tools, wrote the paper.
Peter J Bickel conceived and designed the experiments, wrote the paper.
References
- Ahrne et al. (2013). Ahrne E, Molzahn L, Glatter T, Schmidt A. Critical assessment of proteome-wide label-free absolute abundance estimation strategies. Proteomics. 2013;13:2567–2578. doi: 10.1002/pmic.201300135.
- Ambros (2011). Ambros V. MicroRNAs and developmental timing. Current Opinion in Genetics & Development. 2011;21:511–517. doi: 10.1016/j.gde.2011.04.003.
- Aoyagi & Wassarman (2001). Aoyagi N, Wassarman DA. Developmental and transcriptional consequences of mutations in Drosophila TAF(II)60. Molecular Cell Biology. 2001;21:6808–6819. doi: 10.1128/MCB.21.20.6808-6819.2001.
- Baek et al. (2008). Baek D, Villen J, Shin C, Camargo FD, Gygi SP, Bartel DP. The impact of microRNAs on protein output. Nature. 2008;455:64–71. doi: 10.1038/nature07242.
- Bantscheff et al. (2012). Bantscheff M, Lemeer S, Savitski MM, Kuster B. Quantitative mass spectrometry in proteomics: critical review update from 2007 to the present. Analytical and Bioanalytical Chemistry. 2012;404:939–965. doi: 10.1007/s00216-012-6203-4.
- Beck et al. (2011). Beck M, Schmidt A, Malmstroem J, Claassen M, Ori A, Szymborska A, Herzog F, Rinner O, Ellenberg J, Aebersold R. The quantitative proteome of a human cell line. Molecular Systems Biology. 2011;7:549. doi: 10.1038/msb.2011.82.
- Bickel & Doksum (2001). Bickel PJ, Doksum KA. Mathematical statistics: basic ideas and selected topics. Upper Saddle River: Prentice Hall; 2001.
- Biggin (2011). Biggin MD. Animal transcription networks as highly connected, quantitative continua. Developmental Cell. 2011;21:611–626. doi: 10.1016/j.devcel.2011.09.008.
- Boisvert et al. (2012). Boisvert FM, Ahmad Y, Gierlinski M, Charriere F, Lamont D, Scott M, Barton G, Lamond AI. A quantitative spatial proteomics analysis of proteome turnover in human cells. Molecular & Cellular Proteomics. 2012;11:M111.011429. doi: 10.1074/mcp.M111.011429.
- Borggrefe et al. (2001). Borggrefe T, Davis R, Bareket-Samish A, Kornberg RD. Quantitation of the RNA polymerase II transcription machinery in yeast. The Journal of Biological Chemistry. 2001;276:47150–47153. doi: 10.1074/jbc.M109581200.
- Brosi, Hauri & Kramer (1993). Brosi R, Hauri HP, Kramer A. Separation of splicing factor SF3 into two components and purification of SF3a activity. The Journal of Biological Chemistry. 1993;268:17640–17646.
- Cambridge et al. (2011). Cambridge SB, Gnad F, Nguyen C, Bermejo JL, Kruger M, Mann M. Systems-wide proteomic analysis in mammalian cells reveals conserved, functional protein turnover. Journal of Proteome Research. 2011;10:5275–5284. doi: 10.1021/pr101183k.
- Cheadle et al. (2005). Cheadle C, Fan J, Cho-Chung YS, Werner T, Ray J, Do L, Gorospe M, Becker KG. Control of gene expression during T cell activation: alternate regulation of mRNA transcription and mRNA stability. BMC Genomics. 2005;6:75. doi: 10.1186/1471-2164-6-75.
- Cheung et al. (2011). Cheung MS, Down TA, Latorre I, Ahringer J. Systematic bias in high-throughput sequencing data and its correction by BEADS. Nucleic Acids Research. 2011;39:e103. doi: 10.1093/nar/gkr425.
- de Sousa Abreu et al. (2009). de Sousa Abreu R, Penalva LO, Marcotte EM, Vogel C. Global signatures of protein and mRNA expression levels. Molecular bioSystems. 2009;5:1512–1526. doi: 10.1039/b908315d.
- Deutschbauer et al. (2005). Deutschbauer AM, Jaramillo DF, Proctor M, Kumm J, Hillenmeyer ME, Davis RW, Nislow C, Giaever G. Mechanisms of haploinsufficiency revealed by genome-wide profiling in yeast. Genetics. 2005;169:1915–1925. doi: 10.1534/genetics.104.036871.
- Dohm et al. (2008). Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Research. 2008;36:e105. doi: 10.1093/nar/gkn425.
- Eden et al. (2011). Eden E, Geva-Zatorsky N, Issaeva I, Cohen A, Dekel E, Danon T, Cohen L, Mayo A, Alon U. Proteome half-life dynamics in living human cells. Science. 2011;331:764–768. doi: 10.1126/science.1199784.
- Eissenberg et al. (2002). Eissenberg JC, Ma J, Gerber MA, Christensen A, Kennison JA, Shilatifard A. dELL is an essential RNA polymerase II elongation factor with a general role in development. Proceedings of the National Academy of Sciences of the United States of America. 2002;99:9894–9899. doi: 10.1073/pnas.152193699.
- Elmen et al. (2008). Elmen J, Lindow M, Schütz S, Lawrence M, Petri A, Obad S, Lindholm M, Hedtjärn M, Hansen HF, Berger U, Gullans S, Kearney P, Sarnow P, Straarup EM, Kauppinen S. LNA-mediated microRNA silencing in non-human primates. Nature. 2008;452:896–899. doi: 10.1038/nature06783.
- Geiss et al. (2008). Geiss GK, Bumgarner RE, Birditt B, Dahl T, Dowidar N, Dunaway DL, Fell HP, Ferree S, George RD, Grogan T, James JJ, Maysuria M, Mitton JD, Oliveri P, Osborn JL, Peng T, Ratcliffe AL, Webster PJ, Davidson EH, Hood L, Dimitrov K. Direct multiplexed measurement of gene expression with color-coded probe pairs. Nature Biotechnology. 2008;26:317–325. doi: 10.1038/nbt1385.
- Gennarino et al. (2012). Gennarino VA, D’Angelo G, Dharmalingam G, Fernandez S, Russolillo G, Sanges R, Mutarelli M, Belcastro V, Ballabio A, Verde P, Sardiello M, Banfi S. Identification of microRNA-regulated gene networks by expression analysis of target genes. Genome Research. 2012;22:1163–1172. doi: 10.1101/gr.130435.111.
- Ghaemmaghami et al. (2003). Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, Dephoure N, O’Shea EK, Weissman JS. Global analysis of protein expression in yeast. Nature. 2003;425:737–741. doi: 10.1038/nature02046.
- Gregory et al. (2002). Gregory SG, Sekhon M, Schein J, Zhao S, Osoegawa K, Scott CE, Evans RS, Burridge PW, Cox TV, Fox CA, Hutton RD, Mullenger IR, Phillips KJ, Smith J, Stalker J, Threadgold GJ, Birney E, Wylie K, Chinwalla A, Wallis J, Hillier L, Carter J, Gaige T, Jaeger S, Kremitzki C, Layman D, Maas J, McGrane R, Mead K, Walker R, Jones S, Smith M, Asano J, Bosdet I, Chan S, Chittaranjan S, Chiu R, Fjell C, Fuhrmann D, Girn N, Gray C, Guin R, Hsiao L, Krzywinski M, Kutsche R, Lee SS, Mathewson C, McLeavy C, Messervier S, Ness S, Pandoh P, Prabhu AL, Saeedi P, Smailus D, Spence L, Stott J, Taylor S, Terpstra W, Tsai M, Vardy J, Wye N, Yang G, Shatsman S, Ayodeji B, Geer K, Tsegaye G, Shvartsbeyn A, Gebregeorgis E, Krol M, Russell D, Overton L, Malek JA, Holmes M, Heaney M, Shetty J, Feldblyum T, Nierman WC, Catanese JJ, Hubbard T, Waterston RH, Rogers J, de Jong PJ, Fraser CM, Marra M, McPherson JD, Bentley DR. A physical map of the mouse genome. Nature. 2002;418:743–750. doi: 10.1038/nature00957.
- Guo et al. (2010). Guo H, Ingolia NT, Weissman JS, Bartel DP. Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature. 2010;466:835–840. doi: 10.1038/nature09267.
- Han et al. (2014). Han K, Jaimovich A, Dey G, Ruggero D, Meyuhas O, Sonenberg N, Meyer T. Parallel measurement of dynamic changes in translation rates in single cells. Nature Methods. 2014;11:86–93. doi: 10.1038/nmeth.2729.
- Hanamura et al. (1998). Hanamura A, Cáceres JF, Mayeda A, Franza BR, Jr, Krainer AR. Regulated tissue-specific expression of antagonistic pre-mRNA splicing factors. RNA. 1998;4:430–444.
- Hebenstreit et al. (2012). Hebenstreit D, Deonarine A, Babu MM, Teichmann SA. Duel of the fates: the role of transcriptional circuits and noise in CD4+ cells. Current Opinion in Cell Biology. 2012;24:350–358. doi: 10.1016/j.ceb.2012.03.007.
- Hebenstreit et al. (2011). Hebenstreit D, Fang M, Gu M, Charoensawan V, van Oudenaarden A, Teichmann SA. RNA sequencing reveals two major classes of gene expression levels in metazoan cells. Molecular Systems Biology. 2011;7:497. doi: 10.1038/msb.2011.28.
- Hentze & Kuhn (1996). Hentze MW, Kuhn LC. Molecular control of vertebrate iron metabolism: mRNA-based regulatory circuits operated by iron, nitric oxide, and oxidative stress. Proceedings of the National Academy of Sciences of the United States of America. 1996;93:8175–8182. doi: 10.1073/pnas.93.16.8175.
- Hobert (2008). Hobert O. Gene regulation by transcription factors and microRNAs. Science. 2008;319:1785–1786. doi: 10.1126/science.1151651.
- Hsieh et al. (2012). Hsieh AC, Liu Y, Edlind MP, Ingolia NT, Janes MR, Sher A, Shi EY, Stumpf CR, Christensen C, Bonham MJ, Wang S, Ren P, Martin M, Jessen K, Feldman ME, Weissman JS, Shokat KM, Rommel C, Ruggero D. The translational landscape of mTOR signalling steers cancer initiation and metastasis. Nature. 2012;485:55–61. doi: 10.1038/nature10912.
- Ingolia et al. (2011). Ingolia NT, Lareau LF, Weissman JS. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell. 2011;147:789–802. doi: 10.1016/j.cell.2011.10.002.
- Kim et al. (2010). Kim Dong-Uk, Hayles J, Kim D, Wood V, Park H-O, Won M, Yoo H-S, Duhig T, Nam M, Palmer G, Han S, Jeffery L, Baek S-T, Lee H, Shim YS, Lee M, Kim L, Heo K-S, Noh EJ, Lee A-R, Jang Y-J, Chung K-S, Choi S-J, Park J-Y, Park Y, Kim HM, Park S-K, Park H-J, Kang E-J, Kim HB, Kang H-S, Park H-M, Kim K, Song K, Song KB, Nurse P, Hoe K-L. Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe. Nature Biotechnology. 2010;28:617–623. doi: 10.1038/nbt.1628.
- Kimura et al. (1999). Kimura H, Tao Y, Roeder RG, Cook PR. Quantitation of RNA polymerase II and its transcription factors in an HeLa cell: little soluble holoenzyme but significant amounts of polymerases attached to the nuclear substructure. Molecular Cell Biology. 1999;19:5383–5392. doi: 10.1128/mcb.19.8.5383.
- Kislauskis et al. (1997). Kislauskis EH, Zhu X, Singer RH. Beta-Actin messenger RNA localization and protein synthesis augment cell motility. Journal of Cell Biology. 1997;136:1263–1270. doi: 10.1083/jcb.136.6.1263.
- Kristensen et al. (2013). Kristensen AR, Gsponer J, Foster LJ. Protein synthesis rate is the predominant regulator of protein expression during differentiation. Molecular Systems Biology. 2013;9:689. doi: 10.1038/msb.2013.47.
- Krutzfeldt et al. (2005). Krutzfeldt J, Rajewsky N, Braich R, Rajeev KG, Tuschl T, Manoharan M, Stoffel M. Silencing of microRNAs in vivo with ‘antagomirs’. Nature. 2005;438:685–689. doi: 10.1038/nature04303.
- Kuntumalla et al. (2009). Kuntumalla S, Braisted JC, Huang S-T, Parmar PP, Clark DJ, Alami H, Zhang Q, Donohue-Rolfe A, Tzipori S, Fleischmann RD, Peterson SN, Pieper R. Comparison of two label-free global quantitation methods, APEX and 2D gel electrophoresis, applied to the Shigella dysenteriae proteome. Proteome Science. 2009;7:22. doi: 10.1186/1477-5956-7-22.
- Loriaux & Hoffmann (2013). Loriaux PM, Hoffmann A. A protein turnover signaling motif controls the stimulus-sensitivity of stress response pathways. PLOS Computational Biology. 2013;9:e1002932. doi: 10.1371/journal.pcbi.1002932.
- Lu et al. (2007). Lu P, Vogel C, Wang R, Yao X, Marcotte EM. Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nature Biotechnology. 2007;25:117–124. doi: 10.1038/nbt1270.
- Maier, Guell & Serrano (2009). Maier T, Guell M, Serrano L. Correlation of mRNA and protein in complex biological samples. FEBS Letters. 2009;583:3966–3973. doi: 10.1016/j.febslet.2009.10.036.
- Milo et al. (2010). Milo R, Jorgensen P, Moran U, Weber G, Springer M. BioNumbers–the database of key numbers in molecular and cell biology. Nucleic Acids Research. 2010;38:D750–D753. doi: 10.1093/nar/gkp889.
- Olkin & Finn (1990). Olkin I, Finn JD. Testing correlated correlations. Psychological Bulletin. 1990;108:330–333. doi: 10.1037/0033-2909.108.2.330.
- Peng et al. (2012). Peng M, Taouatas N, Cappadona S, van Breukelen B, Mohammed S, Scholten A, Heck AJ. Protease bias in absolute protein quantitation. Nature Methods. 2012;9:524–525. doi: 10.1038/nmeth.2031.
- Pillai et al. (2007). Pillai RS, Bhattacharyya SN, Filipowicz W. Repression of protein synthesis by miRNAs: how many mechanisms? Trends in Cell Biology. 2007;17:118–126. doi: 10.1016/j.tcb.2006.12.007.
- Princiotta et al. (2003). Princiotta MF, Finzi D, Qian SB, Gibbs J, Schuchmann S, Buttgereit F, Bennink JR, Yewdell JW. Quantitating protein synthesis, degradation, and endogenous antigen processing. Immunity. 2003;18:343–354. doi: 10.1016/S1074-7613(03)00051-7.
- Rabani et al. (2011). Rabani M, Levin JZ, Fan L, Adiconis X, Raychowdhury R, Garber M, Gnirke A, Nusbaum C, Hacohen N, Friedman N, Amit I, Regev A. Metabolic labeling of RNA uncovers principles of RNA production and degradation dynamics in mammalian cells. Nature Biotechnology. 2011;29:436–442. doi: 10.1038/nbt.1861.
- Rajewsky (2011). Rajewsky N. MicroRNAs and the Operon paper. Journal of Molecular Biology. 2011;409:70–75. doi: 10.1016/j.jmb.2011.03.021.
- Schwanhausser et al. (2011). Schwanhausser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M. Global quantification of mammalian gene expression control. Nature. 2011;473:337–342. doi: 10.1038/nature10098.
- Selbach et al. (2008). Selbach M, Schwanhausser B, Thierfelder N, Fang Z, Khanin R, Rajewsky N. Widespread changes in protein synthesis induced by microRNAs. Nature. 2008;455:58–63. doi: 10.1038/nature07228.
- Sharova et al. (2009). Sharova LV, Sharov AA, Nedorezov T, Piao Y, Shaik N, Ko MS. Database for mRNA half-life of 19 977 genes obtained by DNA microarray analysis of pluripotent and differentiating mouse embryonic stem cells. DNA Res. 2009;16:45–58. doi: 10.1093/dnares/dsn030.
- Subtelny et al. (in press). Subtelny AO, Eichhorn SW, Chen GR, Sive H, Bartel DP. Poly(A)-tail lengths and a developmental switch in translational control. Nature. 2014. doi: 10.1038/nature13007. In press.
- The ENCODE Project Consortium (2011). The ENCODE Project Consortium. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biology. 2011;9:e1001046. doi: 10.1371/journal.pbio.1001046.
- Vogel et al. (2010). Vogel C, de Sousa Abreu R, Ko D, Le SY, Shapiro BA, Burns SC, Sandhu D, Boutz DR, Marcotte EM, Penalva LO. Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line. Molecular Systems Biology. 2010;6:400. doi: 10.1038/msb.2010.59.
- Vogel & Marcotte (2012). Vogel C, Marcotte EM. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nature Reviews Genetics. 2012;13:227–232. doi: 10.1038/nrg3185.
- Wisniewski et al. (2012). Wisniewski JR, Ostasiewicz P, Duś K, Zielinska DF, Gnad F, Mann M. Extensive quantitative remodeling of the proteome between normal colon tissue and adenocarcinoma. Molecular Systems Biology. 2012;8:611. doi: 10.1038/msb.2012.44.
- Wollfe (1998). Wollfe A. Chromatin: structure and function. San Diego: Academic Press; 1998.
- Wong et al. (2011). Wong PG, Winter SL, Zaika E, Cao TV, Oguz U, Koomen JM, Hamlin JL, Alexandrow MG. Cdc45 limits replicon usage from a low density of preRCs in mammalian cells. PLoS ONE. 2011;6:e17533. doi: 10.1371/journal.pone.0017533.
- Xiao et al. (2007). Xiao C, Calado DP, Galler G, Thai TH, Patterson HC, Wang J, Rajewsky N, Bender TP, Rajewsky K. MiR-150 controls B cell differentiation by targeting the transcription factor c-Myb. Cell. 2007;131:146–159. doi: 10.1016/j.cell.2007.07.021.
- Yang et al. (2003). Yang E, van Nimwegen E, Zavolan M, Rajewsky N, Schroeder M, Magnasco M, Darnell JE., Jr Decay rates of human mRNAs: correlation with functional characteristics and sequence attributes. Genome Research. 2003;13:1863–1872. doi: 10.1101/gr.997703.
- Zeiler et al. (2012). Zeiler M, Straube WL, Lundberg E, Uhlen M, Mann M. A Protein Epitope Signature Tag (PrEST) library allows SILAC-based absolute quantification and multiplexed determination of protein copy numbers in cell lines. Molecular & Cellular Proteomics. 2012;11:O111.009613. doi: 10.1074/mcp.O111.009613.