PLoS Genet. 2008 Aug 15;4(8):e1000157.
doi: 10.1371/journal.pgen.1000157.

Selective constraints in experimentally defined primate regulatory regions


Daniel J Gaffney et al. PLoS Genet. 2008.

Abstract

Changes in gene regulation may be important in evolution. However, the evolutionary properties of regulatory mutations are currently poorly understood. This is partly the result of an incomplete annotation of functional regulatory DNA in many species. For example, transcription factor binding sites (TFBSs), a major component of eukaryotic regulatory architecture, are typically short, degenerate, and therefore difficult to differentiate from randomly occurring, nonfunctional sequences. Furthermore, although sites such as TFBSs can be computationally predicted using evolutionary conservation as a criterion, estimates of the true level of selective constraint (defined as the fraction of strongly deleterious mutations occurring at a locus) in regulatory regions will, by definition, be upwardly biased in datasets that are a priori evolutionarily conserved. Here we investigate the fitness effects of regulatory mutations using two complementary datasets of human TFBSs that are likely to be relatively free of ascertainment bias with respect to evolutionary conservation but, importantly, are supported by experimental data. The first is a collection of >2,100 human TFBSs drawn from the literature in the TRANSFAC database, and the second is derived from several recent high-throughput chromatin immunoprecipitation coupled with genomic microarray (ChIP-chip) analyses. We also define a set of putative cis-regulatory modules (pCRMs) by spatially clustering multiple TFBSs that regulate the same gene. We find that a relatively high proportion (approximately 37%) of mutations at TFBSs are strongly deleterious, similar to that at a 2-fold degenerate protein-coding site. However, constraint is significantly reduced in human and chimpanzee pCRMs and ChIP-chip sequences, relative to macaques. We estimate that the fraction of regulatory mutations that have been driven to fixation by positive selection in humans is not significantly different from zero.
We also find that the level of selective constraint in our TFBSs, pCRMs, and ChIP-chip sequences is negatively correlated with the expression breadth of the regulated gene, whereas the opposite relationship holds at that gene's nonsynonymous and synonymous sites. Finally, we find that the rate of protein evolution in a transcription factor appears to be positively correlated with the breadth of expression of the gene it regulates. Our study suggests that strongly deleterious regulatory mutations are considerably more likely (1.6-fold) to occur in tissue-specific than in housekeeping genes, implying that there is a fitness cost to increasing "complexity" of gene expression.
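
The abstract defines selective constraint as the fraction of strongly deleterious mutations at a locus but does not give the estimator. A common approach, assumed here rather than taken from the abstract, compares the substitutions observed in a region with the number expected if the region evolved at the rate of a putatively neutral control sequence. The function name, argument names, and numbers below are hypothetical and for illustration only.

```python
def selective_constraint(observed_subs, n_sites, neutral_rate):
    """Estimate constraint as 1 - observed/expected substitutions.

    Assumption: `neutral_rate` is a per-site substitution rate measured in a
    putatively neutral control region; the abstract does not specify this.
    """
    if n_sites <= 0 or neutral_rate <= 0:
        raise ValueError("n_sites and neutral_rate must be positive")
    # Substitutions expected if every mutation in the region were neutral.
    expected_subs = n_sites * neutral_rate
    # The shortfall relative to expectation is attributed to purifying selection.
    return 1.0 - observed_subs / expected_subs


# Illustrative numbers only: 63 substitutions observed where 100 are expected.
c = selective_constraint(observed_subs=63, n_sites=10_000, neutral_rate=0.01)
print(round(c, 2))
```

Under these made-up inputs the estimate is on the same scale as the roughly 37% reported for TFBSs; a negative value would indicate more substitutions than the neutral expectation.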


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Proportion of the total number of sites contributed by TFBSs (A) and ChIP-chip sequences (C) in different genomic regions, and frequency distribution of the distance of TFBS (B) or ChIP-chip sequence (D) from the transcription start site (TSS) of the regulated gene.
Figure 2. Selective constraint of flanking sequence located between two annotated TFBSs that are <1.5 kb apart.
Dotted lines show 95% confidence intervals estimated by bootstrapping the data by case-control region, 1000 times.
Figure 3. Selective constraint of flanking sequences for which the annotated TFBS was >1.5 kb from another annotated TFBS, coding sequence or TSS.
Dotted lines show 95% confidence intervals estimated by bootstrapping the data by case-control region, 1000 times.
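
The confidence intervals in Figures 2 and 3 come from bootstrapping by region. A minimal sketch of a percentile bootstrap that resamples whole regions with replacement is given below. It assumes each region is summarized as a hypothetical `(observed, expected)` pair of substitution counts and that the statistic is a pooled constraint estimate; neither detail is stated in the captions.

```python
import random


def pooled_constraint(sample):
    """Pooled constraint over regions given (observed, expected) counts."""
    observed = sum(o for o, _ in sample)
    expected = sum(e for _, e in sample)
    return 1.0 - observed / expected


def bootstrap_ci(regions, statistic, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling whole regions with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Draw as many regions as in the original data, with replacement.
        sample = [rng.choice(regions) for _ in regions]
        estimates.append(statistic(sample))
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lo_idx = max(0, int(tail * n_boot))
    hi_idx = min(n_boot - 1, int((1.0 - tail) * n_boot) - 1)
    return estimates[lo_idx], estimates[hi_idx]


# Made-up data: five regions, each with 10 expected substitutions.
regions = [(6, 10), (7, 10), (5, 10), (8, 10), (6, 10)]
lo, hi = bootstrap_ci(regions, pooled_constraint, n_boot=1000)
print(lo, hi)
```

Resampling regions rather than individual sites keeps linked sites together, which is the usual motivation for a block-style bootstrap of this kind.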
Figure 4. Estimates of selective constraint at TFBSs, pCRMs and ChIP-chip sequences averaged across all three primates and 4-fold, 2-fold and 0-fold degenerate sites.
Genes used were those regulated by the TFBSs and inferred to be regulated by the ChIP-chip sequences.
Figure 5. Selective constraint of regulatory noncoding (TFBSs, pCRMs and ChIP-chip sequences) and coding (nonsynonymous sites) DNA in humans, chimpanzees and macaques.
Figure 6. Fraction of adaptive substitutions (α) in primate pCRMs and ChIP-chip sequences versus the threshold minor allele frequency (MAF) that was excluded from the analysis prior to the estimation of α (see text).
Confidence intervals are shown as dashed lines and were estimated by bootstrapping the data by case-control region, 10000 times.
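
The caption for Figure 6 describes estimating α after excluding polymorphisms by minor allele frequency. The sketch below uses a McDonald-Kreitman-style estimator, α = 1 − (D_neutral × P_selected) / (D_selected × P_neutral), which is an assumption here since the page does not state the formula. It also assumes that polymorphisms with MAF below the threshold are the ones excluded. All names and numbers are hypothetical.

```python
def alpha_mk(d_sel, d_neu, maf_sel, maf_neu, maf_threshold=0.0):
    """McDonald-Kreitman-style estimate of the adaptive fraction (alpha).

    d_sel, d_neu: divergence counts at selected and neutral sites.
    maf_sel, maf_neu: minor allele frequencies of polymorphisms at each class.
    Polymorphisms with MAF below `maf_threshold` are dropped (assumption).
    """
    p_sel = sum(1 for f in maf_sel if f >= maf_threshold)
    p_neu = sum(1 for f in maf_neu if f >= maf_threshold)
    if d_sel == 0 or p_neu == 0:
        raise ValueError("alpha is undefined when d_sel or p_neu is zero")
    return 1.0 - (d_neu * p_sel) / (d_sel * p_neu)


# Made-up counts and frequencies for illustration.
maf_sel = [0.01, 0.02, 0.10, 0.30]
maf_neu = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.25, 0.15, 0.02, 0.45]
alpha_all = alpha_mk(40, 100, maf_sel, maf_neu)
alpha_filtered = alpha_mk(40, 100, maf_sel, maf_neu, maf_threshold=0.05)
print(alpha_all, alpha_filtered)
```

In this toy example the estimate rises once rare variants are removed, because rare polymorphisms at selected sites are disproportionately slightly deleterious and otherwise bias α downward.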
Figure 7. Constraint in regulatory noncoding DNA (TFBSs, pCRMs, ChIP-chip sequences) and coding regions (nonsynonymous, 2-fold and 4-fold synonymous sites) versus gene expression breadth.
Narrow, intermediate and broad expression breadth were defined using lower (2 tissues) and upper quartiles (>30 tissues) of the distribution of number of tissues expressed per gene. Constraint at pCRMs, ChIP-chip sequences, nonsynonymous, 2-fold and 4-fold degenerate sites was significantly correlated with number of tissues in which a gene was expressed (Pearson r = −0.144, P<0.005; r = −0.092, P<5.07×10⁻⁷; r = 0.176, P<1.22×10⁻¹²; r = 0.099, P<6.63×10⁻⁵; r = 0.110, P<9.36×10⁻⁶, respectively).
Figure 8. Dn/Ds of the transcription factors (TFs) regulating gene expression versus gene expression breadth.
Dn/Ds was estimated from human-macaque alignments, treating all TFs known to regulate each gene as a single sequence.
