2012 Feb 26;30(3):265-70.
doi: 10.1038/nbt.2136.

Massively parallel functional dissection of mammalian enhancers in vivo

Rupali P Patwardhan et al. Nat Biotechnol.

Abstract

The functional consequences of genetic variation in mammalian regulatory elements are poorly understood. We report the in vivo dissection of three mammalian enhancers at single-nucleotide resolution through a massively parallel reporter assay. For each enhancer, we synthesized a library of >100,000 mutant haplotypes with 2-3% divergence from the wild-type sequence. Each haplotype was linked to a unique sequence tag embedded within a transcriptional cassette. We introduced each enhancer library into mouse liver and measured the relative activities of individual haplotypes en masse by sequencing the transcribed tags. Linear regression analysis yielded highly reproducible estimates of the effect of every possible single-nucleotide change on enhancer activity. The functional consequence of most mutations was modest, with ∼22% affecting activity by >1.2-fold and ∼3% by >2-fold. Several, but not all, positions with higher effects showed evidence for purifying selection, or co-localized with known liver-associated transcription factor binding sites, demonstrating the value of empirical high-resolution functional analysis.

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

Figures

Figure 1. Schematic illustration of the method
We used doped oligonucleotide synthesis and polymerase cycling assembly (PCA) to generate a highly complex library of enhancer haplotypes for each enhancer studied. On average, each enhancer haplotype diverged from wild-type by ~2–3% (red circles represent mutations). These mutant enhancers, along with 20bp random tags, were cloned into an expression vector (pGL4.23) containing a minimal promoter driving transcription of luciferase (minP/Luc). We performed “subassembly” on each library to determine the full sequence of each enhancer haplotype and to identify the 20bp tag to which each haplotype was cloned in cis. Each library was then introduced into two mice via hydrodynamic tail vein injection, livers were harvested after 24 hours, and RNA-seq was performed to quantify abundance of transcribed 20bp tags. These data were used to estimate the impact of each possible mutation on transcriptional activation.
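The tag-counting step described in this caption can be illustrated with a minimal sketch. It assumes that the subassembly result is available as a mapping from tag to haplotype and that each RNA pool is a list of observed tag sequences; it summarizes each tag by the number of pools in which it appears, which is the quantity the Figure 2 caption refers to. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def pools_per_tag(pools):
    """Count, for each tag, the number of RNA pools in which it was observed."""
    counts = defaultdict(int)
    for observed_tags in pools:
        # A tag is counted at most once per pool, however many reads it has.
        for tag in set(observed_tags):
            counts[tag] += 1
    return dict(counts)


def activity_by_tag(tag_to_haplotype, pools):
    """Pair each subassembled tag with its haplotype and its pool count."""
    counts = pools_per_tag(pools)
    # Tags known from subassembly but never seen in RNA get a count of 0;
    # tags seen in RNA but absent from the subassembly map are ignored.
    return {
        tag: (haplotype, counts.get(tag, 0))
        for tag, haplotype in tag_to_haplotype.items()
    }


# Toy data (hypothetical; real tags are 20 bp and haplotypes are full enhancers)
tag_to_haplotype = {"AAAC": "ACGT", "GGTA": "ACTT", "TTCG": "AGGT"}
pools = [
    ["AAAC", "AAAC", "GGTA"],
    ["AAAC"],
    ["AAAC", "TTCG", "CCCC"],
]
result = activity_by_tag(tag_to_haplotype, pools)
```

Counting pools rather than raw reads is one possible design choice; it makes the summary insensitive to read-depth differences within a pool, at the cost of a coarser scale.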
Figure 2. Effect size on transcriptional activity of all possible substitution mutations in three mammalian enhancers
Estimated effect sizes of mutations at each position, based on coefficients from univariate (grey columns, left axis) and trivariate (A: red, C: blue, G: green, T: purple) models, are shown for ALDOB ((a) and (b), respectively), ECR11 ((c) and (d), respectively), and LTV1 ((e) and (f), respectively). Effect sizes were estimated as the log2 of the ratio of the number of pools predicted by the model for a mutation to the number of pools predicted for the wild-type nucleotide (total number of pools sequenced per library: ALDOB: 39; ECR11: 69; LTV1 Set 1: 10; LTV1 Set 2: 10). Effect sizes are shown only for positions where model coefficients had associated p-values ≤ 0.01. We also used multiple linear regression with sets of 10 adjacent positions as predictors. The F-statistic of these models, representing the extent to which the model is predictive of the outcome, is plotted (blue shadow, right axis) for ALDOB (a), ECR11 (c), and LTV1 (e). The locations of TFBS predictions from the MATCH web server (restricted to TFs present in liver) are shown as horizontal grey bars at the top of the plot in (a), (c), and (e). The location of a partial LINE element in ECR11 is shown as an orange bar at the bottom of (c).
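The effect-size definition in this caption can be made concrete with a small sketch. It assumes a univariate model of the form pools = intercept + slope × mutated, where mutated is 1 if a haplotype differs from wild-type at the position and 0 otherwise, so that the wild-type prediction is the intercept and the mutant prediction is intercept + slope. The helper names and toy values are hypothetical; the p-value filter and the trivariate (per-nucleotide) model mentioned in the caption are not implemented here.

```python
import math


def mutation_indicator(haplotypes, wild_type, position):
    """Return 1 for haplotypes that differ from wild-type at `position`, else 0."""
    return [1 if hap[position] != wild_type[position] else 0 for hap in haplotypes]


def fit_univariate(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # sxx is zero (and the fit undefined) if no haplotype, or every
    # haplotype, carries a mutation at this position.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def log2_effect(intercept, slope):
    """log2 of (predicted pools with mutation / predicted pools for wild-type)."""
    return math.log2((intercept + slope) / intercept)


# Toy data (hypothetical): 4-bp "enhancer", pool counts per haplotype
wild_type = "ACGT"
haplotypes = ["ACGT", "ACGT", "ATGT", "ATGT", "ACGA"]
pools_observed = [8, 8, 4, 4, 8]

x = mutation_indicator(haplotypes, wild_type, position=1)
intercept, slope = fit_univariate(x, pools_observed)
effect = log2_effect(intercept, slope)
```

In this toy example the haplotypes mutated at position 1 are observed in half as many pools as the others, so the sketch yields an effect size of about -1, that is, a twofold reduction.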
Figure 3. Mutation effect size profiles in transcription factor binding sites
For a predicted HNF4 site (positions 94–105) (a) and a predicted HNF1 site (positions 135–148) (b) in ALDOB, the effect size of each possible substitution is plotted, together with the consensus TF binding sequence (orange) and the enhancer sequence (grey for consensus, black for non-consensus). Non-consensus positions where rescue is observed after mutating to consensus are shown in boldface. HNF4 binding to the ALDOB enhancer region in human liver has been demonstrated previously, whereas in vivo occupancy data for HNF1 at this region are not yet available.
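The comparison between enhancer sequence and consensus can be sketched as follows. This is only an illustration under stated assumptions: "rescue" is interpreted here as a positive log2 effect size when a non-consensus base is mutated to the consensus base, the consensus is treated as a plain A/C/G/T string without degenerate codes, and the sequences, positions, and effect values are invented toy data rather than the actual HNF4 or HNF1 sites.

```python
def rescue_positions(enhancer_site, consensus, effects, start):
    """Return positions where mutating a non-consensus base to consensus raises activity.

    effects maps (position, substituted_base) to a log2 effect size;
    start is the enhancer coordinate of the first base of the site.
    """
    rescued = []
    for offset, (enh_base, cons_base) in enumerate(zip(enhancer_site, consensus)):
        if enh_base == cons_base:
            continue  # already matches consensus, nothing to rescue
        position = start + offset
        effect = effects.get((position, cons_base))
        # Positions without a reported effect size are skipped.
        if effect is not None and effect > 0:
            rescued.append(position)
    return rescued


# Toy data (hypothetical)
enhancer_site = "AGTCCA"
consensus = "AGGTCA"
effects = {
    (102, "G"): 0.4,   # non-consensus T -> consensus G increases activity
    (103, "T"): -0.1,  # non-consensus C -> consensus T does not
    (100, "C"): -0.9,  # consensus position mutated away from consensus
}
rescued = rescue_positions(enhancer_site, consensus, effects, start=100)
```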
Figure 4. Distribution of effect sizes for all possible substitution mutations in three mammalian enhancers
For the three enhancers studied (two replicate libraries for LTV1), the cumulative fraction of substitutions is plotted against the absolute value of the effect size. For example, across the three enhancers, between ~80% and ~95% of substitutions influence transcriptional activity by less than a factor of 1.5.
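The cumulative view in this caption can be reproduced for any list of effect sizes with a short sketch. It assumes effect sizes are on the log2 scale used in Figure 2, so a fold-change threshold such as 1.5 corresponds to an absolute effect size of log2(1.5). The function names and the example values are hypothetical.

```python
import math


def fraction_below_fold(effect_sizes_log2, fold):
    """Fraction of substitutions that change activity by less than `fold`."""
    threshold = math.log2(fold)
    below = sum(1 for effect in effect_sizes_log2 if abs(effect) < threshold)
    return below / len(effect_sizes_log2)


def cumulative_distribution(effect_sizes_log2):
    """Return (absolute effect size, cumulative fraction) pairs, sorted ascending."""
    magnitudes = sorted(abs(effect) for effect in effect_sizes_log2)
    n = len(magnitudes)
    return [(m, (i + 1) / n) for i, m in enumerate(magnitudes)]


# Toy effect sizes (hypothetical, log2 scale)
effects = [0.1, -0.2, 0.5, -1.2, 0.05]
frac = fraction_below_fold(effects, fold=1.5)
curve = cumulative_distribution(effects)
```

Taking the absolute value treats a 1.5-fold increase and a 1.5-fold decrease symmetrically, which matches the caption's description of effect magnitude rather than direction.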
