Detection of functional DNA motifs via statistical over-representation
- PMID: 14988425
- PMCID: PMC390287
- DOI: 10.1093/nar/gkh299
Abstract
The interaction of proteins with DNA recognition motifs regulates a number of fundamental biological processes, including transcription. To understand these processes, we need to know which motifs are present in a sequence and which factors bind to them. We describe a method to screen a set of DNA sequences against a precompiled library of motifs, and assess which, if any, of the motifs are statistically over- or under-represented in the sequences. Over-represented motifs are good candidates for playing a functional role in the sequences, while under-representation hints that if the motif were present, it would have a harmful dysregulatory effect. We apply our method (implemented as a computer program called Clover) to dopamine-responsive promoters, sequences flanking binding sites for the transcription factor LSF, sequences that direct transcription in muscle and liver, and Drosophila segmentation enhancers. In each case Clover successfully detects motifs known to function in the sequences, and intriguing and testable hypotheses are made concerning additional motifs. Clover compares favorably with an ab initio motif discovery algorithm based on sequence alignment, when the motif library includes only a homolog of the factor that actually regulates the sequences. It also demonstrates superior performance over two contingency table based over-representation methods. In conclusion, Clover has the potential to greatly accelerate characterization of signals that regulate transcription.
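The general idea of screening sequences against a library motif and asking whether it is over-represented can be illustrated with a minimal sketch. This is not Clover's scoring scheme or null model, which the abstract does not specify. It assumes a motif given as per-position base counts, a uniform background, a set score equal to the mean best log-odds site score per sequence, and a null distribution built by shuffling each sequence. All function names, the example motif and the example sequences are hypothetical.

```python
import math
import random

BASES = "ACGT"


def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into log-odds scores (uniform background assumed)."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append(
            {b: math.log((column[b] + pseudocount) / total / background) for b in BASES}
        )
    return matrix


def best_site_score(seq, matrix):
    """Highest log-odds score over all windows; seq must be uppercase ACGT and at least motif length."""
    width = len(matrix)
    return max(
        sum(matrix[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def set_score(seqs, matrix):
    """Illustrative set-level statistic: mean of the best site score in each sequence."""
    return sum(best_site_score(s, matrix) for s in seqs) / len(seqs)


def shuffled(seq, rng):
    """Return a copy of seq with its letters permuted (preserves base composition)."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def over_representation_p(seqs, matrix, n_shuffles=200, seed=0):
    """Empirical P-value: fraction of shuffled sets scoring at least as high as the real set."""
    rng = random.Random(seed)
    observed = set_score(seqs, matrix)
    hits = sum(
        set_score([shuffled(s, rng) for s in seqs], matrix) >= observed
        for _ in range(n_shuffles)
    )
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical motif with consensus TGACGTCA, built from 10 made-up sites.
consensus = "TGACGTCA"
counts = [{b: (10 if b == c else 0) for b in BASES} for c in consensus]
matrix = log_odds_matrix(counts)

# Hypothetical sequences, each carrying one planted copy of the consensus.
sequences = [
    "AAAAAAAATGACGTCAAAAAAAAA",
    "AAAATGACGTCAAAAAAAAAAAAA",
    "AAAAAAAAAAAATGACGTCAAAAA",
]

p_value = over_representation_p(sequences, matrix)
print(f"empirical P-value for over-representation: {p_value:.4f}")
```

Counting shuffles that score at or below the observed value, instead of at or above it, would give the corresponding under-representation test. Shuffling preserves only base composition, so a more realistic null model would also need to account for dinucleotide content or use real genomic background sequences.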