A machine learning approach for identifying novel cell type-specific transcriptional regulators of myogenesis
- PMID: 22412381
- PMCID: PMC3297574
- DOI: 10.1371/journal.pgen.1002531
Abstract
Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA-based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in patterns largely similar to those of their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes, and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers.
In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type-specific developmental gene expression patterns.
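The idea of classifying enhancers by the presence or absence of TFBS motifs can be illustrated with a minimal sketch. This is not the authors' classifier: the abstract does not specify the model, so a Bernoulli naive Bayes scorer is swapped in here as a stand-in. The motif patterns, the toy sequences, and the function names (features, train, score) are all hypothetical, and only the forward strand is scanned.

```python
import math
import re

# Hypothetical motif patterns, for illustration only; these are NOT the
# motifs reported in the paper.
MOTIFS = {
    "motif_A": "TAATTA",
    "motif_B": "GGAA",
    "motif_C": "AGGTGT",
}


def features(seq):
    """Return a presence/absence (1/0) vector with one entry per motif."""
    seq = seq.upper()
    return [1 if re.search(pattern, seq) else 0 for pattern in MOTIFS.values()]


def train(positives, negatives):
    """Estimate per-motif log-odds weights (Bernoulli naive Bayes).

    Laplace smoothing (+1 / +2) keeps the probabilities away from 0 and 1.
    Each weight is a pair: (log-odds if present, log-odds if absent).
    """
    pos = [features(s) for s in positives]
    neg = [features(s) for s in negatives]
    weights = []
    for j in range(len(MOTIFS)):
        p = (sum(v[j] for v in pos) + 1) / (len(pos) + 2)
        q = (sum(v[j] for v in neg) + 1) / (len(neg) + 2)
        weights.append((math.log(p / q), math.log((1 - p) / (1 - q))))
    return weights


def score(seq, weights):
    """Sum the log-odds over motifs; a score above 0 favors the positive class."""
    return sum(
        w_present if x else w_absent
        for x, (w_present, w_absent) in zip(features(seq), weights)
    )


# Toy training data (made-up sequences, assumed for this sketch).
known_enhancers = ["CCTAATTACCGGAACC", "TTTAATTAGGGGAATT", "AATAATTACC"]
background = ["CCCCCCAGGTGTCC", "GGGGGGTTTT", "ACACAGGTGTAC"]

weights = train(known_enhancers, background)
print(score("GGTAATTAGG", weights))  # candidate carrying motif_A only
print(score("TTAGGTGTTT", weights))  # candidate carrying motif_C only
```

In this toy setup a candidate scoring above zero resembles the training enhancers more than the background. Expanding the positive set with orthologous sequences from other species, as the abstract describes, would amount to appending those sequences to known_enhancers before calling train.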
Conflict of interest statement
The authors have declared that no competing interests exist.