Key Points
- Computational tools for the analysis of genomic DNA are good at identifying coding sequences but poor at identifying regulatory sequences.
- Comparative genomic sequence analysis is a powerful approach for identifying conserved non-coding regions, many of which have been shown to be involved in gene regulation.
- A second approach to identifying regulatory regions is to search for sequence motifs known to be bound by transcription factors. Several databases compile information on these motifs.
- A third approach uses expression profiling to identify regulatory sequences: co-regulated genes are identified by cluster analysis, and their upstream regions are searched for shared motifs. This method has been applied most successfully in yeast.
- These approaches can be combined into a powerful strategy for identifying novel regulatory elements and for decoding the non-coding portion of mammalian genomes.
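The motif-based approach in the Key Points can be illustrated, in its crudest form, by scanning both strands of a sequence for a consensus written in IUPAC codes (N = any base). The example uses the yeast MluI cell-cycle box consensus (ACGCGTNA) given in the Glossary. This is a minimal sketch under our own assumptions. The function names are hypothetical. Exact consensus matching stands in for the weight-matrix scoring that dedicated tools such as MatInspector apply to curated collections such as TRANSFAC. A match marks only a candidate site, not demonstrated binding.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    body = "".join(IUPAC[base] for base in consensus.upper())
    # A zero-width lookahead lets overlapping sites be reported.
    return re.compile(f"(?=({body}))")


def scan_consensus(seq: str, consensus: str) -> list[tuple[int, str, str]]:
    """Return (0-based start, strand, site as read on the + strand) for each match.

    Start positions always refer to the top (+) strand of `seq`.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", consensus), ("-", reverse_complement(consensus))):
        for match in consensus_to_regex(motif).finditer(seq):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)


# Hypothetical toy sequence, not real yeast promoter data.
print(scan_consensus("TTACGCGTCAGG", "ACGCGTNA"))
# [(0, '-', 'TTACGCGT'), (2, '+', 'ACGCGTCA')]
```

Searching the reverse complement of the consensus, rather than reverse-complementing the whole sequence, keeps every reported position in top-strand coordinates.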
Abstract
With the continuing accomplishments of the human genome project, high-throughput strategies to identify DNA sequences that are important in mammalian gene regulation are becoming increasingly feasible. In contrast to the historic, labour-intensive, wet-laboratory methods for identifying regulatory sequences, many modern approaches are heavily focused on the computational analysis of large genomic data sets. Data from inter-species genomic sequence comparisons and genome-wide expression profiling, integrated with various computational tools, are poised to contribute to the decoding of genomic sequence and to the identification of those sequences that orchestrate gene regulation. In this review, we highlight several genomic approaches that are being used to identify regulatory sequences in mammalian genomes.
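The following minimal sketch shows one way a cross-species comparison can flag conserved regions. A fixed-width window slides along a precomputed pairwise alignment, percent identity is computed in each window, and windows that pass a threshold are merged. The 100-column window and the 70% identity cut-off are assumptions chosen for illustration; criteria of this kind are often used with alignment viewers such as VISTA. The alignment is assumed to have been produced beforehand by a genomic aligner, and coordinates are alignment columns, not genomic positions. Separating non-coding from coding hits would also require gene annotation, which this sketch does not attempt. The function names are hypothetical.

```python
def percent_identity_profile(aln_a: str, aln_b: str, window: int = 100) -> list[float]:
    """Percent identity for every window of `window` alignment columns.

    Gap characters ('-') never count as matches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    same = [int(a == b and a != "-") for a, b in zip(aln_a.upper(), aln_b.upper())]
    if len(same) < window:
        return []
    count = sum(same[:window])
    profile = [100.0 * count / window]
    for start in range(1, len(same) - window + 1):
        # Slide by one column: add the entering column, drop the leaving one.
        count += same[start + window - 1] - same[start - 1]
        profile.append(100.0 * count / window)
    return profile


def conserved_regions(aln_a: str, aln_b: str, window: int = 100,
                      min_identity: float = 70.0) -> list[tuple[int, int]]:
    """Merge passing windows into half-open [start, end) column intervals."""
    regions: list[list[int]] = []
    for start, pid in enumerate(percent_identity_profile(aln_a, aln_b, window)):
        if pid < min_identity:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1][1] = end  # overlaps the previous passing window
        else:
            regions.append([start, end])
    return [(s, e) for s, e in regions]


# Toy aligned sequences (not real data): identical for 150 columns, then divergent.
human = "A" * 150 + "C" * 100
mouse = "A" * 150 + "G" * 100
print(conserved_regions(human, mouse))  # [(0, 180)]
```

The reported region runs to column 180 even though identity ends at column 150. A window that passes is kept in full, so merged regions can overshoot the truly conserved segment by up to one window length. A real analysis would refine the boundaries.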
References
1. Durick, K., Mendlein, J. & Xanthopoulos, K. G. Hunting with traps: genome-wide strategies for gene discovery and functional analysis. Genome Res. 9, 1019–1025 (1999).
2. Fukushige, S. & Ikeda, J. E. Trapping of mammalian promoters by Cre-lox site-specific recombination. DNA Res. 3, 73–80 (1996).
3. Asoh, S., Lee-Kwon, W., Mouradian, M. M. & Nirenberg, M. Selection of DNA clones with enhancer sequences. Proc. Natl Acad. Sci. USA 91, 6982–6986 (1994).
4. Duret, L. & Bucher, P. Searching for regulatory elements in human noncoding sequences. Curr. Opin. Struct. Biol. 7, 399–406 (1997).
5. Hardison, R. C., Oeltjen, J. & Miller, W. Long human–mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome. Genome Res. 7, 959–966 (1997).
6. Hardison, R. C. Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet. 16, 369–372 (2000). An excellent review of comparative sequence analyses, limitations and successes.
7. Loots, G. G. et al. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288, 136–140 (2000).
8. Gottgens, B. et al. Analysis of vertebrate SCL loci identifies conserved enhancers. Nature Biotechnol. 18, 181–186 (2000). References 7 and 8 are early examples of the use of human–mouse comparative sequence analyses for assigning priority to regions of DNA to screen for functional properties.
9. Chu, S. et al. The transcriptional program of sporulation in budding yeast. Science 282, 699–705 (1998); erratum 282, 1421 (1998).
10. Spellman, P. T. et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297 (1998).
11. Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J. & Church, G. M. Systematic determination of genetic network architecture. Nature Genet. 22, 281–285 (1999). References 9, 10 and 11 provide excellent examples of yeast microarray data and how they can be used to cluster pathway-related genes on the basis of similar expression patterns.
12. Zhu, J. & Zhang, M. Q. Cluster, function and promoter: analysis of yeast expression array. Pac. Symp. Biocomput. 479–490 (2000).
13. Wasserman, W. W. & Fickett, J. W. Identification of regulatory regions which confer muscle-specific gene expression. J. Mol. Biol. 278, 167–181 (1998).
14. Niehrs, C. & Pollet, N. Synexpression groups in eukaryotes. Nature 402, 483–487 (1999).
15. Lockhart, D. J. & Winzeler, E. A. Genomics, gene expression and DNA arrays. Nature 405, 827–836 (2000). A significant review of the numerous applications of using DNA arrays to understand biological processes.
16. Hughes, J. D., Estep, P. W., Tavazoie, S. & Church, G. M. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Mol. Biol. 296, 1205–1214 (2000).
17. Zhang, M. Q. Promoter analysis of co-regulated genes in the yeast genome. Comput. Chem. 23, 233–250 (1999).
18. Faisst, S. & Meyer, S. Compilation of vertebrate-encoded transcription factors. Nucleic Acids Res. 20, 3–26 (1992).
19. Frech, K., Herrmann, G. & Werner, T. Computer-assisted prediction, classification, and delimitation of protein binding sites in nucleic acids. Nucleic Acids Res. 21, 1655–1664 (1993).
20. Ghosh, D. Object-oriented transcription factors database (ooTFD). Nucleic Acids Res. 28, 308–310 (2000).
21. Heinemeyer, T. et al. Databases on transcriptional regulation: TRANSFAC, TRRD and COMPEL. Nucleic Acids Res. 26, 362–367 (1998).
22. Kel-Margoulis, O. V., Romashchenko, A. G., Kolchanov, N. A., Wingender, E. & Kel, A. E. COMPEL: a database on composite regulatory elements providing combinatorial transcriptional regulation. Nucleic Acids Res. 28, 311–315 (2000).
23. Morgenstern, B. DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211–218 (1999).
24. Prestridge, D. S. SIGNAL SCAN 4.0: additional databases and sequence formats. Comput. Appl. Biosci. 12, 157–160 (1996).
25. Prestridge, D. S. Computer software for eukaryotic promoter analysis. Methods Mol. Biol. 130, 265–295 (2000).
26. Perier, R. C., Praz, V., Junier, T., Bonnard, C. & Bucher, P. The eukaryotic promoter database (EPD). Nucleic Acids Res. 28, 302–303 (2000).
27. Quandt, K., Frech, K., Karas, H., Wingender, E. & Werner, T. MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23, 4878–4884 (1995).
28. Wingender, E. et al. TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 28, 316–319 (2000).
29. Werner, T. Computer-assisted analysis of transcription control regions. Matinspector and other programs. Methods Mol. Biol. 132, 337–349 (2000).
30. Li, Q., Harju, S. & Peterson, K. R. Locus control regions: coming of age at a decade plus. Trends Genet. 15, 403–408 (1999). A detailed summary of our current understanding of the β-globin locus control region.
31. Lacy, D. A. et al. Faithful expression of the human 5q31 cytokine cluster in transgenic mice. J. Immunol. 164, 4569–4574 (2000).
32. Frazer, K. A., Narla, G., Zhang, J. L. & Rubin, E. M. The apolipoprotein(a) gene is regulated by sex hormones and acute-phase inducers in YAC transgenic mice. Nature Genet. 9, 424–431 (1995). A transgenic study supporting the commonality of gene regulation between species.
33. Jimenez, G., Gale, K. B. & Enver, T. The mouse β-globin locus control region: hypersensitive sites 3 and 4. Nucleic Acids Res. 20, 5797–5803 (1992).
34. Hood, L., Rowen, L. & Koop, B. F. Human and mouse T-cell receptor loci: genomics, evolution, diversity, and serendipity. Ann. NY Acad. Sci. 758, 390–412 (1995).
35. Koop, B. F. & Hood, L. Striking sequence similarity over almost 100 kilobases of human and mouse T-cell receptor DNA. Nature Genet. 7, 48–53 (1994). An example of a large genomic region in human and mouse that is highly conserved, thus limiting regulatory sequence identification.
36. Ho, P. J. & Thein, S. L. Gene regulation and deregulation: a β-globin perspective. Blood Rev. 14, 78–93 (2000).
37. Talbot, D. et al. A dominant control region from the human β-globin locus conferring integration site-independent gene expression. Nature 338, 352–355 (1989).
38. Dubchak, I. et al. Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res. 10, 1304–1306 (2000).
39. Hardison, R. et al. Sequence and comparative analysis of the rabbit α-like globin gene cluster reveals a rapid mode of evolution in a G+C-rich region of mammalian genomes. J. Mol. Biol. 222, 233–249 (1991).
40. Bulyk, M. L., Gentalen, E., Lockhart, D. J. & Church, G. M. Quantifying DNA-protein interactions by double-stranded DNA arrays. Nature Biotechnol. 17, 573–577 (1999).
41. Cavener, D. R. Comparison of the consensus sequence flanking translational start sites in Drosophila and vertebrates. Nucleic Acids Res. 15, 1353–1361 (1987).
42. Werner, T. Models for prediction and recognition of eukaryotic promoters. Mamm. Genome 10, 168–175 (1999).
43. Wagner, A. A computational genomics approach to the identification of gene networks. Nucleic Acids Res. 25, 3594–3604 (1997).
44. van Helden, J., Andre, B. & Collado-Vides, J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281, 827–842 (1998).
45. Wagner, A. Genes regulated cooperatively by one or more transcription factors and their identification in whole eukaryotic genomes. Bioinformatics 15, 776–784 (1999).
46. Tagle, D. A. et al. Embryonic ɛ- and γ-globin genes of a prosimian primate (Galago crassicaudatus). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol. 203, 439–455 (1988).
47. Vuillaumier, S. et al. Cross-species characterization of the promoter region of the cystic fibrosis transmembrane conductance regulator gene reveals multiple levels of regulation. Biochem J. 327, 651–662 (1997).
48. Gumucio, D. L. et al. Evolutionary strategies for the elucidation of cis- and trans-factors that regulate the developmental switching programs of the β-like globin genes. Mol. Phylogenet. Evol. 5, 18–32 (1996). References 46 and 48 illustrate the power of comparative genomic analyses through phylogenetic footprints of globin genes.
49. Antequera, F. & Bird, A. Number of CpG islands and genes in human and mouse. Proc. Natl Acad. Sci. USA 90, 11995–11999 (1993).
50. Cross, S. H., Clark, V. H. & Bird, A. P. Isolation of CpG islands from large genomic clones. Nucleic Acids Res. 27, 2099–2107 (1999).
51. John, R. M., Robbins, C. A. & Myers, R. M. Identification of genes within CpG-enriched DNA from human chromosome 4p16.3. Hum. Mol. Genet. 3, 1611–1616 (1994).
52. Watanabe, T. et al. Isolation of estrogen-responsive genes with a CpG island library. Mol. Cell. Biol. 18, 442–449 (1998).
53. Larsen, F., Gundersen, G. & Prydz, H. Choice of enzymes for mapping based on CpG islands in the human genome. Genet. Anal. Tech. Appl. 9, 80–85 (1992).
54. Kato, R. & Sasaki, H. Quick identification and localization of CpG islands in large genomic fragments by partial digestion with HpaII and HhaI. DNA Res. 5, 287–295 (1998).
55. Dunham, I. et al. The DNA sequence of human chromosome 22. Nature 402, 489–495 (1999); erratum 404, 904 (2000).
56. Ioshikhes, I. P. & Zhang, M. Q. Large-scale human promoter mapping using CpG islands. Nature Genet. 26, 61–63 (2000).
57. Bucher, P. Regulatory elements and expression profiles. Curr. Opin. Struct. Biol. 9, 400–407 (1999).
58. Greenfield, A. Applications of DNA microarrays to the transcriptional analysis of mammalian genomes. Mamm. Genome 11, 609–613 (2000).
59. Hill, A. A., Hunter, C. P., Tsung, B. T., Tucker-Kellogg, G. & Brown, E. L. Genomic analysis of gene expression in C. elegans. Science 290, 809–812 (2000).
60. Wasserman, W. W., Palumbo, M., Thompson, W., Fickett, J. W. & Lawrence, C. E. Human-mouse genome comparisons to locate regulatory sites. Nature Genet. 26, 225–228 (2000).
61. King, M. C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975). A landmark paper highlighting the large amount of sequence conservation between humans and chimpanzees, indicating that regulatory differences might account for the varying phenotypes between the two species.
62. Luo, Z. In search of the whales' sisters. Nature 404, 235–237 (2000).
63. Arnason, U., Gullberg, A., Gretarsdottir, S., Ursing, B. & Janke, A. The mitochondrial genome of the sperm whale and a new molecular reference for estimating eutherian divergence dates. J. Mol. Evol. 50, 569–578 (2000).
64. Ursing, B. M. & Arnason, U. Analyses of mitochondrial genomes strongly support a hippopotamus-whale clade. Proc. R. Soc. Lond. B 265, 2251–2255 (1998).
65. Shimamura, M. et al. Molecular evidence from retroposons that whales form a clade within even-toed ungulates. Nature 388, 666–670 (1997).
66. Nikaido, M., Rooney, A. P. & Okada, N. Phylogenetic relationships among cetartiodactyls based on insertions of short and long interpersed elements: hippopotamuses are the closest extant relatives of whales. Proc. Natl Acad. Sci. USA 96, 10261–10266 (1999).
67. Blackwood, E. M. & Kadonaga, J. T. Going the distance: a current view of enhancer action. Science 281, 61–63 (1998).
68. Fraser, P. & Grosveld, F. Locus control regions, chromatin activation and transcription. Curr. Opin. Cell Biol. 10, 361–365 (1998).
69. Grosveld, F. Activation by locus control regions? Curr. Opin. Genet. Dev. 9, 152–157 (1999).
70. Bell, A. C. & Felsenfeld, G. Stopped at the border: boundaries and insulators. Curr. Opin. Genet. Dev. 9, 191–198 (1999).
71. Geyer, P. K. The role of insulator elements in defining domains of gene expression. Curr. Opin. Genet. Dev. 7, 242–248 (1997).
72. Ogbourne, S. & Antalis, T. M. Transcriptional control and the role of silencers in transcriptional regulation in eukaryotes. Biochem J. 331, 1–14 (1998).
73. Hart, C. M. & Laemmli, U. K. Facilitation of chromatin dynamics by SARs. Curr. Opin. Genet. Dev. 8, 519–525 (1998).
74. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
75. Batzoglou, S., Pachter, L., Mesirov, J. P., Berger, B. & Lander, E. S. Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res. 10, 950–958 (2000).
76. Delcher, A. L. et al. Alignment of whole genomes. Nucleic Acids Res. 27, 2369–2376 (1999).
77. Mayor, C. et al. VISTA: Visualizing global DNA sequence alignments of arbitrary length. Bioinformatics (in the press).
78. Schwartz, S. et al. PipMaker — a web server for aligning two genomic DNA sequences. Genome Res. 10, 577–586 (2000).
Acknowledgements
This research was supported by a Programs for Genomic Applications grant awarded to E.M.R. from the NHLBI and was conducted at the E.O. Lawrence Berkeley National Laboratory, University of California, sponsored by the Department of Energy. L.A.P. was supported by an appointment to the Alexander Hollaender Distinguished Postdoctoral Fellowship Program, sponsored by the US Department of Energy, Office of Biological and Environmental Research, and administered by the Oak Ridge Institute for Science and Education. We thank M. Biggin, J. Bristow, I. Dubchak, C. Prange and D. Symula for their thoughtful comments.
Glossary
- DNaseI HYPERSENSITIVITY ASSAY: An assay that identifies regions of the genome that lack nucleosome structure and are therefore readily degraded by the enzyme DNaseI. Such regions tend to be associated with transcriptional activity.
- DNA FOOTPRINTING ASSAY: An assay that identifies a region of DNA that is protected from digestion by DNaseI, usually because a protein such as a transcription factor is bound to it.
- GEL SHIFT ASSAY: A gel-based assay in which proteins that bind to a DNA fragment are detected by the reduced migration of the DNA. The assay is often used to detect transcription factor binding.
- CRE RECOMBINASE SYSTEM: A method in which the Cre enzyme catalyses recombination between loxP sequences. If the loxP sequences are arranged as a direct repeat, recombination deletes the DNA between the sites.
- CpG ISLANDS: Sequences of at least 200 bp with greater than 50% G+C content and a high CpG frequency.
- FLAT FILE: A computer-readable file or database in which records are not connected or 'related'. Similar to a card index.
- RELATIONAL DATABASE: A storage format in which data items can be stored in separate files but linked together to form different relations. This system allows greater flexibility than a flat file format.
- MluI CELL-CYCLE BOX: An 8-bp motif (ACGCGTNA) that promotes the transcription of genes involved in DNA replication in yeast.
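The Glossary definition of CpG islands maps directly onto a window scan. The sketch below is minimal and unoptimized. It slides a 200-bp window along the sequence, keeps windows with more than 50% G+C and a high CpG frequency, and merges overlapping windows into islands. The definition above does not quantify "high CpG frequency". Here we assume the commonly used observed/expected CpG ratio above 0.6, computed as (CpG count × window length) / (C count × G count). The function name and parameters are our own.

```python
def cpg_islands(seq: str, window: int = 200, min_gc: float = 0.5,
                min_obs_exp: float = 0.6) -> list[tuple[int, int]]:
    """Return half-open [start, end) intervals of merged CpG-island windows.

    A window passes if its G+C fraction exceeds `min_gc` and its observed/expected
    CpG ratio, (CpG count * window) / (C count * G count), exceeds `min_obs_exp`
    (the 0.6 default is an assumed, commonly used threshold).
    """
    seq = seq.upper()
    islands: list[list[int]] = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        cpg = w.count("CG")  # "CG" cannot overlap itself, so count() is exact
        gc_fraction = (c + g) / window
        obs_exp = cpg * window / (c * g) if c and g else 0.0
        if gc_fraction > min_gc and obs_exp > min_obs_exp:
            end = start + window
            if islands and start <= islands[-1][1]:
                islands[-1][1] = end  # extend the current island
            else:
                islands.append([start, end])
    return [(s, e) for s, e in islands]


# Toy sequence (not real data): A/T stretch, a CG-repeat block at 300-600, A/T again.
toy = "AT" * 150 + "CG" * 150 + "AT" * 150
print(cpg_islands(toy))  # [(201, 699)]
```

The reported island is wider than the CG block because windows that straddle the block's edges still satisfy both thresholds. Recounting every window costs O(n × window) time, which is acceptable for a cosmid- or BAC-sized clone. Running counts updated one base at a time would be needed for chromosome-scale sequence.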
Cite this article
Pennacchio, L., Rubin, E. Genomic strategies to identify mammalian regulatory sequences. Nat Rev Genet 2, 100–109 (2001). https://doi.org/10.1038/35052548