Comparative Study
J Biol. 2003;2(2):13.
doi: 10.1186/1475-4924-2-13. Epub 2003 May 22.

Identification of conserved regulatory elements by comparative genome analysis

Boris Lenhard et al. J Biol. 2003.

Abstract

Background: For genes that have been successfully delineated within the human genome sequence, most regulatory sequences remain to be elucidated. The annotation and interpretation process requires additional data resources and significant improvements in computational methods for the detection of regulatory regions. One approach of growing popularity is based on the preferential conservation of functional sequences over the course of evolution by selective pressure, termed 'phylogenetic footprinting'. Mutations are more likely to be disruptive if they appear in functional sites, resulting in a measurable difference in evolution rates between functional and non-functional genomic segments.

Results: We have devised a flexible suite of methods for the identification and visualization of conserved transcription-factor-binding sites. The system reports those putative transcription-factor-binding sites that are both situated in conserved regions and located as pairs of sites in equivalent positions in alignments between two orthologous sequences. An underlying collection of metazoan transcription-factor-binding profiles was assembled to facilitate the study. This approach results in a significant improvement in the detection of transcription-factor-binding sites because of an increased signal-to-noise ratio, as demonstrated with two sets of promoter sequences. The method is implemented as a graphical web application, ConSite, which is at the disposal of the scientific community at http://www.phylofoot.org/.
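The filtering rule described in the Results can be sketched as follows. This is a minimal illustration, not the ConSite implementation: the `Site` type, the function names, the toy alignment and the requirement that paired sites occupy exactly the same alignment columns are all assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    factor: str  # transcription factor name
    start: int   # 0-based start in the ungapped sequence
    end: int     # exclusive end in the ungapped sequence


def alignment_columns(gapped: str) -> List[int]:
    """Map each ungapped position to its column in the alignment."""
    return [col for col, ch in enumerate(gapped) if ch != "-"]


def conserved_site_pairs(
    aln1: str,
    aln2: str,
    sites1: List[Site],
    sites2: List[Site],
    conserved_regions: List[Tuple[int, int]],
) -> List[Tuple[Site, Site]]:
    """Keep pairs of same-factor sites that cover the same alignment
    columns and lie inside a conserved region (half-open column ranges).
    Exact column equality is an assumption of this sketch."""
    cols1 = alignment_columns(aln1)
    cols2 = alignment_columns(aln2)
    pairs = []
    for s1 in sites1:
        first, last = cols1[s1.start], cols1[s1.end - 1]
        if not any(lo <= first and last < hi for lo, hi in conserved_regions):
            continue  # site is outside every conserved region
        for s2 in sites2:
            if s2.factor != s1.factor:
                continue
            if (cols2[s2.start], cols2[s2.end - 1]) == (first, last):
                pairs.append((s1, s2))
    return pairs


# Toy data, invented for illustration only.
aln_human = "ACGTGATAAGCT"
aln_mouse = "AC-TGATAAGGT"
human_sites = [Site("SP1", 0, 4), Site("GATA", 4, 10)]
mouse_sites = [Site("GATA", 3, 9)]
pairs = conserved_site_pairs(
    aln_human, aln_mouse, human_sites, mouse_sites, conserved_regions=[(2, 12)]
)
```

In this toy case only the GATA pair survives: the SP1 prediction falls partly outside the conserved region and has no partner in the second sequence.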

Conclusions: Phylogenetic footprinting dramatically improves the predictive selectivity of bioinformatic approaches to the analysis of promoter sequences. ConSite delivers unparalleled performance using a novel database of high-quality binding models for metazoan transcription factors. With a dynamic interface, this bioinformatics tool provides broad access to promoter analysis with phylogenetic footprinting.

Figures

Figure 1
Cross-species comparisons of the β-globin gene promoter. (a) Analysis of the human promoter without phylogenetic filtering generates numerous predictions, most of which are biologically irrelevant. (b) Comparison with the chicken promoter fails to detect conserved sites (screened with the artificially low conservation cutoff of 25%). (c) Comparison with the mouse promoter sequence identifies conserved sites, including a documented GATA-binding site [49] (boxed). (d) Comparison with the cow promoter identifies more conserved sites. (e) Comparison with the macaque monkey (Macaca cynomolgus) promoter yields a plot similar to that of the single-sequence analysis. Unless otherwise indicated, all plots were generated using all available matrices from vertebrates, with a 70% conservation cutoff, a 50 base-pair window and an 85% transcription-factor score threshold. The y axis in all graphs specifies the percentage of identical nucleotides within a sliding window of fixed length (using the default of 50 base pairs). The x axis refers to the nucleotide position in the human sequence at which the window starts.
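The conservation profile plotted on the y axis can be approximated with a short sketch. It assumes the window is counted in reference (human) nucleotides and that alignment columns where the reference has a gap are skipped; the function names and the toy sequences are hypothetical, and the actual ConSite calculation may differ.

```python
from typing import List


def window_identity(aln_ref: str, aln_other: str, window: int = 50) -> List[float]:
    """Percent identity for each window, indexed by the reference
    position at which the window starts."""
    if len(aln_ref) != len(aln_other):
        raise ValueError("aligned sequences must have equal length")
    # One flag per reference nucleotide: True if the other sequence matches.
    matches = [
        a.upper() == b.upper()
        for a, b in zip(aln_ref, aln_other)
        if a != "-"
    ]
    return [
        100.0 * sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


def conserved_starts(profile: List[float], cutoff: float = 70.0) -> List[int]:
    """Window start positions whose identity meets the conservation cutoff."""
    return [i for i, pct in enumerate(profile) if pct >= cutoff]


# Toy sequences with a 5-nucleotide window, for illustration only.
profile = window_identity("ACGTACGTAC", "ACGTACTTGA", window=5)
starts = conserved_starts(profile, cutoff=70.0)
```

With the 70% cutoff from the caption, only the windows starting at positions 0 to 3 of the toy reference count as conserved.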
Figure 2
The impact of phylogenetic footprinting analysis. Both (a-c) a high-quality set (14 genes and 40 verified sites), and (d-f) a larger collection of promoters (57 genes and 110 sites, from the TRANSFAC database [20,21]) were analyzed. (a,d) Comparison of the selectivity (defined as the average number of predictions per 100 bp, using all models) between orthologous and single-sequence analysis modes. (b,e) Comparison of the sensitivity (the portion of 40 or 110 verified sites, respectively, that are detected with the given setting) between orthologous and single-sequence analysis modes. (c,f) Ratios of the number of sites detected in single-sequence mode to the number detected in orthologous-sequence mode; the pair: single-sequence ratios are displayed for both sensitivity (detected verified sites) and selectivity (all predicted sites).
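The two measures defined in this caption can be written out as a small sketch. The definitions follow the caption; the rule that a verified site counts as detected when any prediction overlaps it is an assumption of the example, and the intervals and sequence length are invented rather than taken from the paper.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) in sequence coordinates


def selectivity(n_predictions: int, sequence_length_bp: int) -> float:
    """Average number of predicted sites per 100 bp."""
    return 100.0 * n_predictions / sequence_length_bp


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def sensitivity(predicted: List[Interval], verified: List[Interval]) -> float:
    """Fraction of verified sites overlapped by at least one prediction
    (the overlap criterion is an assumption of this sketch)."""
    detected = sum(any(overlaps(v, p) for p in predicted) for v in verified)
    return detected / len(verified)


# Invented example: 3 predictions on a 300 bp promoter, 2 verified sites.
predicted = [(10, 16), (40, 52), (90, 96)]
verified = [(12, 18), (200, 210)]
sel = selectivity(len(predicted), 300)
sens = sensitivity(predicted, verified)
```

A lower selectivity value means fewer predictions per 100 bp, so the comparison in panels (a,d) reads as a reduction in predictions when the orthologous filter is applied.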
Figure 3
The ConSite result report and visualization tools for the analysis of two orthologous genomic sequences. (a) Graphical view, with conservation profile plots for the two orthologous sequences, as well as the control panel for altering the visualization parameters. (b) Pop-up window containing information about individual TFBSs. (c) Detailed alignment view, providing sequence-level details on putative TFBSs conserved between two orthologous sequences.

References

    1. Stormo GD. DNA binding sites: representation and discovery. Bioinformatics. 2000;16:16–23. doi: 10.1093/bioinformatics/16.1.16.
    2. Tronche F, Ringeisen F, Blumenfeld M, Yaniv M, Pontoglio M. Analysis of the distribution of binding sites for a tissue-specific transcription factor in the vertebrate genome. J Mol Biol. 1997;266:231–245. doi: 10.1006/jmbi.1996.0760.
    3. Fickett JW. Quantitative discrimination of MEF2 sites. Mol Cell Biol. 1996;16:437–441.
    4. Gumucio DL, Heilstedt-Williamson H, Gray TA, Tarle SA, Shelton DA, Tagle DA, Slightom JL, Goodman M, Collins FS. Phylogenetic footprinting reveals a nuclear protein which binds to silencer sequences in the human gamma and epsilon globin genes. Mol Cell Biol. 1992;12:4919–4929.
    5. Pennacchio LA, Rubin EM. Genomic strategies to identify mammalian regulatory sequences. Nat Rev Genet. 2001;2:100–109. doi: 10.1038/35052548.
