Searching for statistically significant regulatory modules
- PMID: 14534166
- DOI: 10.1093/bioinformatics/btg1054
Abstract
Motivation: The regulatory machinery controlling gene expression is complex, frequently requiring multiple, simultaneous DNA-protein interactions. The rate at which a gene is transcribed may depend upon the presence or absence of a collection of transcription factors bound to the DNA near the gene. Locating transcription factor binding sites in genomic DNA is difficult because the individual sites are small and tend to occur frequently by chance. True binding sites may be identified by their tendency to occur in clusters, sometimes known as regulatory modules.
Results: We describe an algorithm for detecting occurrences of regulatory modules in genomic DNA. The algorithm, called mcast, takes as input a DNA database and a collection of binding site motifs that are known to operate in concert. mcast uses a motif-based hidden Markov model with several novel features. The model incorporates motif-specific p-values, thereby allowing scores from motifs of different widths and specificities to be compared directly. The p-value scoring also allows mcast to accept only motif occurrences whose p-values fall below a user-specified threshold, while still assigning better scores to occurrences with lower p-values. mcast can search long DNA sequences, modeling length distributions between motifs within a regulatory module but ignoring length distributions between modules. The algorithm produces a list of predicted regulatory modules, ranked by E-value. We validate the algorithm using simulated data as well as real data sets from fruit fly and human.
Availability: http://meme.sdsc.edu/MCAST/paper
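Illustration: The thresholding and clustering ideas in the Results can be made concrete with a small sketch. This is not the mcast hidden Markov model and it computes no E-values. It is a simplified greedy stand-in that assumes motif hits with p-values are already available. The names `Hit`, `hit_score`, `cluster_hits`, and `max_gap`, and the scoring form -log10(p / threshold), are assumptions made for illustration only.

```python
import math
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """A hypothetical motif occurrence: 0-based start, motif width, p-value."""
    start: int
    width: int
    pvalue: float


def hit_score(pvalue: float, threshold: float) -> float:
    # Assumed scoring form: positive only when pvalue < threshold,
    # and larger for smaller p-values, regardless of motif width.
    return -math.log10(pvalue / threshold)


def cluster_hits(
    hits: List[Hit], threshold: float = 1e-3, max_gap: int = 50
) -> List[Tuple[int, int, float]]:
    """Group significant hits into candidate modules.

    Returns (start, end, score) tuples sorted by descending score.
    Hits are chained while the gap between consecutive hits is at
    most max_gap; spacing between separate clusters is ignored.
    """
    kept = sorted(
        (h for h in hits if h.pvalue < threshold), key=lambda h: h.start
    )
    clusters: List[List[Hit]] = []
    current: List[Hit] = []
    for h in kept:
        if current:
            previous_end = current[-1].start + current[-1].width
            if h.start - previous_end > max_gap:
                clusters.append(current)
                current = []
        current.append(h)
    if current:
        clusters.append(current)

    modules = [
        (
            c[0].start,
            c[-1].start + c[-1].width,
            sum(hit_score(h.pvalue, threshold) for h in c),
        )
        for c in clusters
    ]
    return sorted(modules, key=lambda m: m[2], reverse=True)


if __name__ == "__main__":
    example = [
        Hit(10, 8, 1e-5),
        Hit(30, 8, 1e-4),
        Hit(500, 8, 1e-5),
        Hit(520, 8, 1e-2),  # rejected: p-value is above the threshold
    ]
    for start, end, score in cluster_hits(example):
        print(start, end, round(score, 3))
```

Dividing each p-value by the threshold before taking the logarithm is one simple way to make hits from motifs of different widths comparable while giving every accepted hit a positive score. A real module finder would also penalize gap length and estimate statistical significance, which this sketch omits.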