Comparative Study
Genome Res. 2006 Mar;16(3):405-13.
doi: 10.1101/gr.4303406. Epub 2006 Jan 31.

A systematic model to predict transcriptional regulatory mechanisms based on overrepresentation of transcription factor binding profiles

Li-Wei Chang et al. Genome Res. 2006 Mar.

Abstract

An important aspect of understanding a biological pathway is to delineate the transcriptional regulatory mechanisms of the genes involved. Two tasks often arise when studying transcriptional regulation: (1) the identification of common transcriptional regulators of a set of coexpressed genes; and (2) the identification of genes that are regulated by one or several transcription factors. In this study, a systematic and statistical approach was taken to accomplish these tasks by establishing an integrated model considering all of the promoters and characterized transcription factors (TFs) in the genome. A promoter analysis pipeline (PAP) was developed to implement this approach. PAP was tested using coregulated gene clusters collected from the literature. In most test cases, PAP identified the transcriptional regulators of the input genes accurately. When compared with chromatin immunoprecipitation experiment data, PAP's predictions were consistent with the experimental observations. When PAP was used to analyze one published expression-profiling data set and two novel coregulated gene sets, it generated biologically meaningful hypotheses. Therefore, by taking a systematic approach of considering all promoters and characterized TFs in our model, we were able to make more reliable predictions about the regulation of gene expression in mammalian organisms.

Figures

Figure 1.
An overview of the Promoter Analysis Pipeline (PAP). PAP has two components. The data processing pipeline assembles a set of algorithms to generate the results of a genome-wide promoter analysis, whereas the user interface queries and processes the stored data according to the user's input. Promoters were acquired and repetitive elements in the promoters were masked. Promoters of orthologous genes were aligned and transcription factor (TF) binding sites were identified and mapped. Probability scores of each promoter and each transcription factor were calculated, and a distribution of probability scores was generated for each transcription factor. R-scores were then computed using these distributions. All of these results were stored in a database termed PAPdb, which was used to predict the TFs that are most likely to regulate a set of genes and the genes most likely regulated by a set of TFs.
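The caption does not define how R-scores are derived from the per-factor score distributions. As a rough illustration of the general idea of placing one promoter's score within a genome-wide distribution, the sketch below computes an empirical tail fraction: the share of promoters whose score for a given TF is at least as high as the query score. This is an assumption made for illustration, not PAP's actual R-score formula, and the function name `empirical_tail_fraction` is hypothetical.

```python
from bisect import bisect_left


def empirical_tail_fraction(score, genome_scores_sorted):
    """Fraction of genome-wide scores that are >= `score`.

    ASSUMPTION: an illustrative stand-in for scoring a promoter against
    a per-TF score distribution; it is not the R-score defined by PAP.
    `genome_scores_sorted` must be sorted in ascending order.
    """
    n = len(genome_scores_sorted)
    if n == 0:
        raise ValueError("the score distribution must not be empty")
    # bisect_left returns the index of the first element >= score,
    # so everything from that index onward is at least as high.
    at_least_as_high = n - bisect_left(genome_scores_sorted, score)
    return at_least_as_high / n


# Toy distribution of one TF's scores across four promoters.
distribution = sorted([0.9, 0.1, 0.5, 0.2])
fraction = empirical_tail_fraction(0.5, distribution)
```

A small fraction means few promoters in the genome score as well as the query promoter for that factor, which is the kind of evidence an overrepresentation analysis builds on.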
Figure 2.
The sequence conservation of the promoter sequence defined in PAP. (A) The fraction of genes whose promoters extend to a particular upstream or downstream position from the transcription start site. Most of the genes do not encounter another upstream gene within a distance of 10 kb, whereas only a portion of genes do not reach their translation start sites within a distance of 5 kb downstream. (B) The fraction of promoters that are conserved at a particular upstream or downstream position from the transcription start site. This fraction was calculated using the total number of promoters at each position in A as the denominator. The most conserved and alignable region is around 2 kb upstream and downstream of the annotated transcription start sites.
Figure 3.
PAP's prediction of target genes of transcription factors is consistent with chromatin immunoprecipitation experiment data. Target genes of transcription factors HSF1, HNF1a, HNF4a, HNF6, and E2F4 were determined by previous chromatin immunoprecipitation experiments and were collected from the literature. The nonparametric Mann-Whitney U test was used to test whether these validated genes have higher scores in PAP's predictions. In each case, the Z score and the P-value were calculated.
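The comparison described in this caption can be sketched as a one-sided Mann-Whitney U test using the normal approximation, which yields both a Z score and a P-value. The code below is a minimal standard-library version with tie correction and no continuity correction. The function name, the toy scores, and the choice of a one-sided alternative are assumptions for illustration; the caption does not state these details.

```python
import math


def mann_whitney_z(validated, background):
    """One-sided Mann-Whitney U test via the normal approximation.

    Tests whether `validated` scores tend to be higher than `background`
    scores. Returns (U, Z, P). ASSUMPTION: tie-corrected variance and no
    continuity correction; the source does not specify either choice.
    """
    n1, n2 = len(validated), len(background)
    pooled = sorted([(s, 0) for s in validated] + [(s, 1) for s in background])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the mean of the ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability P(Z >= z)
    return u, z, p


# Toy example: scores of ChIP-validated targets vs. other genes (made up).
u, z, p = mann_whitney_z([0.9, 0.8, 0.7], [0.1, 0.2, 0.3, 0.4])
```

With real data the background set would contain thousands of genes, where the normal approximation is appropriate; for very small samples an exact test would be preferable.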
Figure 4.
The utility of PAP to identify additional genes regulated by the same set of factors. (A) Methodology of identifying additional similarly regulated genes. Starting from a set of coregulated genes, several transcription factors may be identified and hypothesized to be the common transcriptional regulators of the input genes. These transcription factors can then be used to search for additional genes that may be regulated by the same factors. (B) Fourteen previously reported liver-specific genes were used to test this methodology by leave-one-out cross-validation. In each round, 13 genes were analyzed by PAP and putative common transcription factors were determined. High-scoring matrices were then used to score all of the human genes, and the rank of the verification gene is reported.
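The leave-one-out procedure in panel B can be sketched as follows. The per-gene, per-TF score table, the rule for choosing common factors (highest mean score over the training genes), and the combined gene score (sum over the chosen factors) are all hypothetical stand-ins, since the caption does not describe how PAP selects or combines matrices.

```python
def top_tfs(scores, genes, k=2):
    """Pick the k TFs with the highest mean score over `genes`.

    ASSUMPTION: a simple stand-in for PAP's identification of putative
    common transcription factors.
    """
    tfs = list(next(iter(scores.values())))
    mean = {tf: sum(scores[g][tf] for g in genes) / len(genes) for tf in tfs}
    return sorted(mean, key=mean.get, reverse=True)[:k]


def leave_one_out_ranks(scores, coregulated, k=2):
    """Hold out each gene in turn and report its genome-wide rank."""
    ranks = {}
    for held_out in coregulated:
        training = [g for g in coregulated if g != held_out]
        tfs = top_tfs(scores, training, k)
        # Score every gene in the table with the selected factors.
        combined = {g: sum(s[tf] for tf in tfs) for g, s in scores.items()}
        ordered = sorted(combined, key=combined.get, reverse=True)
        ranks[held_out] = ordered.index(held_out) + 1  # 1-based rank
    return ranks


# Toy score table (made-up values): three coregulated genes, two others.
scores = {
    "geneA": {"TF1": 0.9, "TF2": 0.7, "TF3": 0.1},
    "geneB": {"TF1": 0.8, "TF2": 0.9, "TF3": 0.2},
    "geneC": {"TF1": 0.7, "TF2": 0.6, "TF3": 0.1},
    "geneX": {"TF1": 0.1, "TF2": 0.2, "TF3": 0.9},
    "geneY": {"TF1": 0.2, "TF2": 0.1, "TF3": 0.8},
}
ranks = leave_one_out_ranks(scores, ["geneA", "geneB", "geneC"])
```

A held-out gene that ranks near the top of the genome-wide list indicates that the factors inferred from the remaining genes generalize to it.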
Figure 5.
PAP identified experimentally validated TF-binding sites in cell-proliferation-related genes. Promoter regions that contain bona fide TF-binding sites are shown. Other TF-binding sites predicted by PAP are also shown. Numbers in the figure represent sequence positions relative to the transcription start site. Experimentally verified sites are designated by a star above the site.
Figure 6.
Predicted transcriptional regulatory model of cholesterol biosynthesis genes in Schwann cells. In this model, Egr2 does not directly regulate all of the cholesterol synthetic enzymes in myelination. Instead, Egr2 coordinates the expression of these genes through other transcription factors, including NF-Y, CREB-1, YY1, and AP-1.
