PLoS Comput Biol. 2006 May;2(5):e53.
doi: 10.1371/journal.pcbi.0020053. Epub 2006 May 26.

Expression-guided in silico evaluation of candidate cis regulatory codes for Drosophila muscle founder cells

Anthony A Philippakis et al. PLoS Comput Biol. 2006 May.

Abstract

While combinatorial models of transcriptional regulation can be inferred for metazoan systems from a priori biological knowledge, validation requires extensive and time-consuming experimental work. Thus, there is a need for computational methods that can evaluate hypothesized cis regulatory codes before the difficult task of experimental verification is undertaken. We have developed a novel computational framework (termed "CodeFinder") that integrates transcription factor binding site and gene expression information to evaluate whether a hypothesized transcriptional regulatory model (TRM; i.e., a set of co-regulating transcription factors) is likely to target a given set of co-expressed genes. Our basic approach is to simultaneously predict cis regulatory modules (CRMs) associated with a given gene set and quantify the enrichment for combinatorial subsets of transcription factor binding site motifs comprising the hypothesized TRM within these predicted CRMs. As a model system, we have examined a TRM experimentally demonstrated to drive the expression of two genes in a sub-population of cells in the developing Drosophila mesoderm, the somatic muscle founder cells. This TRM was previously hypothesized to be a general mode of regulation for genes expressed in this cell population. In contrast, the present analyses suggest that a modified form of this cis regulatory code applies to only a subset of founder cell genes, those whose gene expression responds to specific genetic perturbations in a similar manner to the gene on which the original model was based. We have confirmed this hypothesis by experimentally discovering six (out of 12 tested) new CRMs driving expression in the embryonic mesoderm, four of which drive expression in founder cells.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Genetic Regulation of Founder Cell Fates
(A) Schematic of Wg, Ras, and Dpp signal transduction cascades responsible for specifying FC fates. Transmembrane receptors (fork-shapes), their ligands (squares), intracellular signaling molecules (octagons), and target TFs (ovals) are shown and colored by pathway. (B) Schematic of eve transcriptional regulation. Shown in thick solid arrows are the signaling inputs from the Wg, Dpp, and Ras pathways. Shown in thinner arrows are the genetic interactions linking these signals to their downstream TFs; solid arrows indicate interactions between proteins of the same pathway, and dotted arrows indicate known interactions between pathways. Colored circles indicate the five TFs (dTCF, Mad, Pnt, Twi, and Tin) known to drive eve expression within FCs.
Figure 2. Inspection of an FC TRM Composed of dTCF/Mad/Pnt/Twi/Tin
(A) Detection rate of the 159 known FC genes as compared to all other D. melanogaster genes, when genes are ranked by the amount of associated non-coding, non-repetitive sequence. The X-axis indicates a given cutoff rank; the Y-axis indicates the fraction of either the 159 FC genes (solid line) or the non-FC genes (dotted line) ranked above the corresponding cutoff. (B) Detection rates of the 159 known FC genes (solid line) and a set of length-matched background sequences (dashed line; see Materials and Methods) when ranked by length; the two curves largely overlap. (C) Detection rates of the 159 known FC genes as compared to length-matched background sequences, when genes are ranked by ModuleFinder scores using a scan in which any combination of the five TFs can contribute to the score. Again, the X-axis indicates a given cutoff rank and the Y-axis indicates the fraction of the 159 FC genes (solid curve) or background sequences (dotted curve) with ModuleFinder scores better than the given cutoff rank. For all panels, the area between these curves is computed, and its statistical significance is assessed using the Wilcoxon-Mann-Whitney (WMW) statistic (see Materials and Methods).
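The detection-rate comparison described in this legend can be illustrated with a minimal Python sketch. It assumes that foreground and background items are pooled and ranked jointly by decreasing score, and that the area between the curves is averaged over cutoff ranks. The function names and this normalization are illustrative assumptions, not the authors' implementation, and the WMW U statistic is obtained here by direct pair counting, without a p-value.

```python
from typing import List, Sequence


def detection_curve(flags: Sequence[bool]) -> List[float]:
    """Fraction of flagged items found at or above each cutoff rank."""
    total = sum(flags)
    if total == 0:
        raise ValueError("no flagged items in ranking")
    seen = 0
    curve = []
    for flag in flags:
        seen += int(flag)
        curve.append(seen / total)
    return curve


def area_between_curves(fg_scores: Sequence[float],
                        bg_scores: Sequence[float]) -> float:
    """Mean gap between foreground and background detection curves.

    Assumption: items are pooled and ranked by decreasing score, and
    scores are distinct (ties simply keep their input order here).
    """
    pooled = [(s, True) for s in fg_scores] + [(s, False) for s in bg_scores]
    pooled.sort(key=lambda pair: pair[0], reverse=True)
    fg_flags = [is_fg for _, is_fg in pooled]
    bg_flags = [not is_fg for is_fg in fg_flags]
    fg_curve = detection_curve(fg_flags)
    bg_curve = detection_curve(bg_flags)
    return sum(f - b for f, b in zip(fg_curve, bg_curve)) / len(pooled)


def mann_whitney_u(fg_scores: Sequence[float],
                   bg_scores: Sequence[float]) -> float:
    """U statistic: foreground/background pairs won by the foreground."""
    u = 0.0
    for f in fg_scores:
        for b in bg_scores:
            if f > b:
                u += 1.0
            elif f == b:
                u += 0.5  # a tie counts as half a win
    return u
```

A positive area means foreground items tend to rank ahead of background items; an area near zero corresponds to largely overlapping curves, as described for panel (B).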
Figure 3. Changes in Expression of FC Genes in a Pnt gof Mutant Background
(A) Detection rate of the 159 known FC genes in a Pnt gof expression profile. All genes are ranked according to the t-statistic (see Estrada et al. [21]) indicating their up- or down-regulation in a Pnt gof mutant background (the most up-regulated genes are positioned at the left). As in Figure 2, detection rates of the 159 known FC genes (solid line) and all other genes (dashed line) are shown. (B) Difference between the detection rate curves of (A); leading and trailing edges indicate the points of maximal difference. (C) t-statistics for all genes in the Pnt gof expression profile.
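As a hedged illustration of panel (B), the sketch below computes the difference between two detection-rate curves along a ranked gene list and reports the cutoff ranks at which that difference is largest and smallest. Treating the maximum as the leading edge and the minimum as the trailing edge is an assumption made for this example, as is the function name; the legend itself says only that the edges mark the points of maximal difference.

```python
from typing import List, Sequence, Tuple


def difference_curve(is_fc: Sequence[bool]) -> List[float]:
    """FC detection rate minus non-FC detection rate at each cutoff rank.

    `is_fc` lists genes from most up-regulated to most down-regulated,
    with True marking an FC gene.
    """
    n_fc = sum(is_fc)
    n_other = len(is_fc) - n_fc
    if n_fc == 0 or n_other == 0:
        raise ValueError("need at least one gene in each group")
    seen_fc = 0
    seen_other = 0
    diff = []
    for flag in is_fc:
        if flag:
            seen_fc += 1
        else:
            seen_other += 1
        diff.append(seen_fc / n_fc - seen_other / n_other)
    return diff


def leading_and_trailing_edges(is_fc: Sequence[bool]) -> Tuple[int, int]:
    """1-based ranks of the largest and smallest curve difference.

    Assumption: leading edge = maximum, trailing edge = minimum; the
    first such rank is returned if the extreme value is repeated.
    """
    diff = difference_curve(is_fc)
    leading = max(range(len(diff)), key=diff.__getitem__) + 1
    trailing = min(range(len(diff)), key=diff.__getitem__) + 1
    return leading, trailing
```

Under this reading, genes ranked at or before the leading edge form the most up-regulated foreground subset, and genes ranked after the trailing edge form the most down-regulated one.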
Figure 4. Enrichment for the FC TRM in PLE Genes
(A) PLE and background genes were scanned by ModuleFinder using dTCF/Mad/Pnt/Twi/Tin and sorted by score in decreasing order. As in Figure 2, detection curves for PLE genes and non-PLE genes are shown. (B) PLE and background genes were scanned by ModuleFinder using only the Pnt motif and sorted by score in decreasing order. (C–E) Area between PLE and non-PLE detection curves is shown when scanning with the TFs dTCF/Mad/Pnt/Twi/Tin either individually (C), with all AND and OR combinations involving four or five TFs (D), or all AND combinations involving three TFs (E). Dotted lines indicate threshold statistical significance values of p < 0.05, as computed by WMW. (F–G) Detection rate curves using the PTE as a foreground set with the OR combination dTCF/Mad/Pnt/Twi/Tin (F) and with the Pnt motif alone (G).
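The combinations scanned in panels (C–E) can be enumerated with a short Python sketch. The boolean `matches` test is a deliberate simplification assumed for illustration: ModuleFinder assigns scores rather than yes/no calls, and its scoring scheme is not described in this legend. The names `logic_combinations` and `matches` are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Sequence, Set, Tuple

# The five TFs of the hypothesized FC TRM, as named in the legend.
TFS: Tuple[str, ...] = ("dTCF", "Mad", "Pnt", "Twi", "Tin")

Combination = Tuple[str, Tuple[str, ...]]


def logic_combinations(sizes: Iterable[int],
                       operators: Sequence[str] = ("AND", "OR")
                       ) -> List[Combination]:
    """All (operator, TF subset) pairs for the requested subset sizes."""
    return [
        (op, subset)
        for size in sizes
        for subset in combinations(TFS, size)
        for op in operators
    ]


def matches(op: str, subset: Sequence[str], motifs_present: Set[str]) -> bool:
    """Simplified boolean test: AND needs every motif, OR needs any one."""
    hits = [tf in motifs_present for tf in subset]
    return all(hits) if op == "AND" else any(hits)


# Panel (E): AND combinations of three TFs.
three_tf_and = logic_combinations([3], operators=("AND",))
# Panel (D): AND and OR combinations of four or five TFs.
four_or_five = logic_combinations([4, 5])
```

With five TFs this yields 10 three-TF AND combinations and 12 combinations of four or five TFs (six subsets, each under AND and OR).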
Figure 5. An Expression Cluster of Genes Enriched for Pnt AND Twi AND Tin
(A) Clustering of the 159 FC genes and the 12 expression profiles of Estrada et al. [21], using self-organizing map clustering followed by hierarchical clustering. Note that all columns are median-centered. The red box indicates a gene cluster (C1) that contains eve and whose genes show similar expression profiles. Abbreviations: EGFR = EGF receptor gof; FGFR = FGF receptor gof; Arm+Ras = armadillo and Ras gof; Ras = Ras gof; Pnt = pointed gof; Arm = armadillo gof; Dl = Delta lof; Lmd = Lameduck lof; Wg = wingless lof; Spi = spitz lof; Tkv = thickveins gof; N = Notch gof. (B) Detection rate curves for the OR combination of dTCF/Mad/Pnt/Twi/Tin using C1 as a foreground gene set. (C–E) Area between C1 and non-C1 detection curves is shown when scanning with the TFs dTCF/Mad/Pnt/Twi/Tin either individually (C), with all AND and OR combinations involving four or five TFs (D), or all AND combinations involving three TFs (E). Dotted lines indicate threshold statistical significance values of p < 0.01 and p < 0.001, as computed by WMW. (F) Detection rate curves for Pnt AND Twi AND Tin combinations using C1 as a foreground gene set.
Figure 6. Analysis of PNC Genes and Their Associated TRM
(A) Detection rates of PNC genes (after removing seven genes that are also FC genes) as compared to background regions using the OR combination of dTCF, Mad, Pnt, Twi, Tin (negative control). (B) Detection rates of C1 genes (after removing genes that are also PNC genes) as compared to background regions using the combination Ac/Sc OR Su(H) (negative control). (C) Detection rate of PNC genes as compared to non-PNC genes using Ac/Sc OR Su(H). (D) Area between PNC and background region detection rate curves for all AND and OR combinations of Ac/Sc and Su(H). (E) Detection rate of non-SOP genes as compared to background regions using Ac/Sc OR Su(H). (F) Area between non-SOP and background region detection rate curves for all AND and OR combinations of Ac/Sc and Su(H). (G) Detection rate of SOP genes as compared to background genes using Ac/Sc OR Su(H). (H) Area between SOP and background region detection rate curves for all AND and OR combinations of Ac/Sc and Su(H).
Figure 7. Schematic Representation of Tested Regions Associated with FC Genes
The ModuleFinder prediction, TFBS composition, ModuleFinder score, genomic location, and actual genomic region tested are shown for regions associated with FC genes from C1 (A) or not included in C1 (B).
Figure 8. Empirical Validation of Predicted FC Transcriptional Enhancers
Expression of Ndg (A), mib2 (F), phyl (K), and lbl (P) mRNA in stage-11 wild-type embryos detected by in situ hybridization. Arrowheads in (P) highlight lbl-expressing FCs. β-galactosidase expression from Ndg-lacZ (B), mib2-lacZ (G), phyl-lacZ (L), and lbl-lacZ (Q) constructs in stage-11 embryos detected by immunohistochemistry. Fluorescent in situ hybridization analysis of stage-11 embryos for Ndg (C), lacZ (D) mRNA, and merge (E) from Ndg-lacZ embryos; mib2 (H), lacZ (I) mRNA, and merge (J) from mib2-lacZ embryos; phyl (M), lacZ (N) mRNA, and merge (O) from phyl-lacZ embryos; and lbl (R), lacZ (S) mRNA, and merge (T) from lbl-lacZ embryos.
Figure 9. Summary of New Hypotheses Derived from the Present Analysis
(A) Venn diagram depicting various FC gene subsets. Cluster 1 (C1) and the Pnt leading edge (PLE) genes are likely only a subset of all Pnt target genes (dashed ellipse), and additional FC genes appear to be unresponsive to Pnt. (B) Schematic of complexities in FC gene regulation. Analysis of the eve mesodermal enhancer initially directed our attention to the TFs dTCF, Pnt, Mad, Twi, and Tin. CodeFinder analysis and subsequent experimental validation implicated a subset of these TFs (Pnt, Twi, Tin) in the regulation of genes from C1, as exemplified by Ndg. Additional (non-C1) genes are predicted to respond to Pnt in combination with other factors yet to be determined (X; grey lines represent hypothetical enhancers). Still other classes of FC genes will respond to different codes, which may include input from FC genes known to encode TFs.
