DREME: motif discovery in transcription factor ChIP-seq data

Bioinformatics. 2011 Jun 15;27(12):1653-9.

doi: 10.1093/bioinformatics/btr261. Epub 2011 May 4.

Timothy L Bailey¹

Affiliation

¹ Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia. t.bailey@uq.edu.au

PMID: 21543442
PMCID: PMC3106199
DOI: 10.1093/bioinformatics/btr261

Abstract

Motivation: Transcription factor (TF) ChIP-seq datasets have particular characteristics that provide unique challenges and opportunities for motif discovery. Most existing motif discovery algorithms do not scale well to such large datasets, or fail to report many motifs associated with cofactors of the ChIP-ed TF.

Results: We present DREME, a motif discovery algorithm specifically designed to find the short, core DNA-binding motifs of eukaryotic TFs, and optimized to analyze very large ChIP-seq datasets in minutes. Using DREME, we discover the binding motifs of the ChIP-ed TF and many cofactors in mouse ES cell (mESC), mouse erythrocyte and human cell line ChIP-seq datasets. For example, in mESC ChIP-seq data for the TF Esrrb, we discover the binding motifs for eight cofactor TFs important in the maintenance of pluripotency. Several other commonly used algorithms find at most two cofactor motifs in this same dataset. DREME can also perform discriminative motif discovery, and we use this feature to provide evidence that Sox2 and Oct4 do not bind in mES cells as an obligate heterodimer. DREME is much faster than many commonly used algorithms, scales linearly in dataset size, finds multiple, non-redundant motifs and reports a reliable measure of statistical significance for each motif found. DREME is available as part of the MEME Suite of motif-based sequence analysis tools (http://meme.nbcr.net).
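
Illustration: the idea of discriminative motif discovery with a per-motif significance value can be sketched as follows. This is a minimal sketch, not DREME's implementation. The abstract does not say which statistical test is used, so the sketch assumes a one-sided Fisher's exact test on the number of sequences that contain a candidate word (on either strand) in a positive set versus a negative set. All function names and the toy sequences are hypothetical.

```python
from math import comb

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_sequences_with_word(sequences, word):
    """Count sequences containing the word or its reverse complement."""
    rc = reverse_complement(word)
    return sum(1 for s in sequences if word in s or rc in s)


def fisher_enrichment_pvalue(pos_hits, pos_total, neg_hits, neg_total):
    """One-sided Fisher's exact test (assumed test, see lead-in).

    Probability of seeing at least `pos_hits` word-containing sequences
    in the positive set if the word-containing sequences were spread at
    random over positives and negatives (hypergeometric tail).
    """
    total = pos_total + neg_total
    hits = pos_hits + neg_hits
    denom = comb(total, pos_total)
    upper = min(hits, pos_total)
    tail = sum(
        comb(hits, k) * comb(total - hits, pos_total - k)
        for k in range(pos_hits, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    positives = ["ACGTGA", "TTACGT", "GGGGGG", "CACGTA"]
    negatives = ["TTTTTT", "GGGCCC", "AAAAAA", "CCCCCC"]
    word = "ACGT"
    p = fisher_enrichment_pvalue(
        count_sequences_with_word(positives, word), len(positives),
        count_sequences_with_word(negatives, word), len(negatives),
    )
    print(f"{word}: p = {p:.4f}")
```

In this framing, swapping the negative set (for example, shuffled sequences versus another TF's ChIP-seq peaks) changes which words count as enriched, which is what makes the search discriminative.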

Figures

**Fig. 1.**
Comparison of DREME mESC TF ChIP-seq motifs with *in vitro* motifs. Each panel shows the logo of the *in vivo* binding motif discovered by DREME in the designated TF ChIP-seq dataset (lower logo) aligned with the logo of the best available *in vitro* motif (upper logo). Since no *in vitro* motifs are available for Sox2, Oct4 and E2f1, UniProbe motifs for closely related TF family members Sox11, Pou2f3 and E2f3 are used. The *in vitro* motif for Nanog is taken from Jauch *et al.* (2008).

**Fig. 2.**
Discriminative motif discovery in mESC ChIP-seq datasets. Panels (a) and (b) show the logo of the binding motif discovered by DREME in the two designated TF ChIP-seq datasets (lower logo) aligned with the logo of a known motif for the ChIP-ed TF (upper logo). (a) Upper logo is known Oct4 motif (Pou-family member Pou3f3, UniProbe Pou3f3_3235.2). (b) Upper logo is known Sox2 motif (TRANSFAC M01272). (c) Shows the most significant motif found by DREME in the Nanog dataset using (top to bottom) the shuffled Nanog dataset, the Oct4 dataset or the Sox2 dataset as the negative set.

**Fig. 3.**
Comparison of motif discovery algorithms. (a) The table shows the average number of motifs discovered (N), number of datasets in which the ChIP-ed motif was found (S), the average number of identifiable co-factor motifs found (C), and the average running time in seconds of the algorithm on the mESC ChIP-seq datasets. Bold font indicates best performance. Note: Times for nestedMICA and MEME are for the reduced size datasets (0.5 megabase-pairs). (b) The plot shows the running times for DREME, Amadeus, Trawler and WEEDER on the full-size mESC ChIP-seq datasets. Inset plot is the same data plotted with log scales on both axes.

Cited by

PreDREM: a database of predicted DNA regulatory motifs from 349 human cell and tissue samples.
Zheng Y, Li X, Hu H. Database (Oxford). 2015 Feb 27;2015:bav007. doi: 10.1093/database/bav007. Print 2015. PMID: 25725063 Free PMC article.

HEPeak: an HMM-based exome peak-finding package for RNA epigenome sequencing data.
Cui X, Meng J, Rao MK, Chen Y, Huang Y. BMC Genomics. 2015;16 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2164-16-S4-S2. Epub 2015 Apr 21. PMID: 25917296 Free PMC article.

Nanopore long-read RNA-seq and absolute quantification delineate transcription dynamics in early embryo development of an insect pest.
Bayega A, Oikonomopoulos S, Gregoriou ME, Tsoumani KT, Giakountis A, Wang YC, Mathiopoulos KD, Ragoussis J. Sci Rep. 2021 Apr 12;11(1):7878. doi: 10.1038/s41598-021-86753-7. PMID: 33846393 Free PMC article.

Photosynthetic Genes and Genes Associated with the C4 Trait in Maize Are Characterized by a Unique Class of Highly Regulated Histone Acetylation Peaks on Upstream Promoters.
Perduns R, Horst-Niessen I, Peterhansel C. Plant Physiol. 2015 Aug;168(4):1378-88. doi: 10.1104/pp.15.00934. Epub 2015 Jun 25. PMID: 26111542 Free PMC article.

The RNA binding protein FgRbp1 regulates specific pre-mRNA splicing via interacting with U2AF23 in Fusarium.
Wang M, Ma T, Wang H, Liu J, Chen Y, Shim WB, Ma Z. Nat Commun. 2021 May 11;12(1):2661. doi: 10.1038/s41467-021-22917-3. PMID: 33976182 Free PMC article.
