RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets

Morgane Thomas-Chollier¹, Carl Herrmann, Matthieu Defrance, Olivier Sand, Denis Thieffry, Jacques van Helden

Nucleic Acids Res. 2012 Feb;40(4):e31. doi: 10.1093/nar/gkr1104. Epub 2011 Dec 8.

Affiliation

¹ Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestrasse 73, Berlin 14195, Germany.

PMID: 22156162
PMCID: PMC3287167
DOI: 10.1093/nar/gkr1104

Abstract

ChIP-seq is increasingly used to characterize transcription factor binding and chromatin marks at a genomic scale. Various tools are now available to extract binding motifs from peak data sets. However, most approaches are only available as command-line programs, or via a website but with size restrictions. We present peak-motifs, a computational pipeline that discovers motifs in peak sequences, compares them with databases, exports putative binding sites for visualization in the UCSC genome browser and generates an extensive report suited for both naive and expert users. It relies on time- and memory-efficient algorithms enabling the treatment of several thousand peaks within minutes. Regarding time efficiency, peak-motifs outperforms all comparable tools by several orders of magnitude. We demonstrate its accuracy by analyzing data sets ranging from 4,000 to 128,000 peaks for 12 embryonic stem cell-specific transcription factors. In all cases, the program finds the expected motifs and returns additional motifs potentially bound by cofactors. We further apply peak-motifs to discover tissue-specific motifs in peak collections for the p300 transcriptional co-activator. To our knowledge, peak-motifs is the only tool that performs a complete motif analysis and offers a user-friendly web interface without any restriction on sequence size or number of peaks.

Figures

**Figure 1.**
Schematic flow chart of the peak-motifs pipeline. For the sake of clarity, only the main analysis steps are depicted. The pipeline takes as input a set of peak sequences, and runs several *de novo* motif discovery algorithms based on different detection criteria: over-representation, differential representation (test versus control), global position bias or local over-representation along the centered peaks. Transcription factors are predicted by matching discovered motifs against several public motif databases and/or against user-uploaded motif collections. Peak sequences are scanned with the discovered motifs to predict precise binding positions. These positions are then automatically exported as an annotation track for the UCSC genome browser, thus enabling a flexible visualization in their genomic context.
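
The export step described in this caption can be illustrated with a minimal sketch that converts peak-relative site positions into a BED-style annotation track. This is not the code used by peak-motifs: the input record layout (`chrom`, `peak_start`, `offset`, `length`, `motif`, `strand`), the function name and the motif name are assumptions made for illustration, and only the general BED column order (chromosome, 0-based start, end, name, score, strand) is taken as given.

```python
def sites_to_bed(sites, track_name="predicted_sites"):
    """Convert peak-relative sites to BED lines (hypothetical input layout)."""
    # A track line lets a genome browser label the custom track.
    lines = [f'track name="{track_name}" description="Predicted binding sites"']
    for site in sites:
        # peak_start is assumed to be the 0-based genomic start of the peak,
        # offset the 0-based position of the site within the peak sequence.
        start = site["peak_start"] + site["offset"]
        end = start + site["length"]  # BED end coordinate is exclusive
        fields = [site["chrom"], str(start), str(end),
                  site["motif"], "0", site["strand"]]
        lines.append("\t".join(fields))
    return "\n".join(lines)


# Illustrative record only, not taken from any real data set.
example_sites = [
    {"chrom": "chr1", "peak_start": 1000, "offset": 25,
     "length": 8, "motif": "example_motif_1", "strand": "+"},
]
bed_text = sites_to_bed(example_sites)
print(bed_text)
```

Keeping the conversion in a single function makes the coordinate convention explicit, which is usually where such exports go wrong (0-based starts, exclusive ends).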

**Figure 2.**
Time efficiency of motif discovery algorithms integrated in peak-motifs (plain lines) compared to alternative algorithms (dotted lines). The abscissa indicates sequence sizes, the ordinate processing times. The programs oligo-, dyad-, position-analysis and DREME show a linear time complexity (the power is ∼1), ChIPMunk has a quasi-linear complexity (power 1.27) and MEME a more than quadratic complexity (power 2.21). See Supplementary File S1 for the detailed analysis.
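
The "power" values quoted in this caption are exponents of a relation of the form time ≈ a · size^b. One common way to estimate such an exponent, shown here as a sketch and not necessarily the procedure used in Supplementary File S1, is a least-squares line fit on log-transformed sizes and times. The function name and the timings below are synthetic assumptions, not measurements from the paper.

```python
import math


def fit_power_law(sizes, times):
    """Fit log(time) = log(a) + b * log(size) by least squares; return (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope = exponent of the power law
    a = math.exp(mean_y - b * mean_x)   # intercept back-transformed
    return a, b


# Synthetic timings (seconds) for sequence sizes in base pairs.
sizes = [1e5, 1e6, 1e7]
linear_times = [0.5 * (s / 1e5) for s in sizes]          # exact power 1
quadratic_times = [0.5 * (s / 1e5) ** 2 for s in sizes]  # exact power 2

_, power_linear = fit_power_law(sizes, linear_times)
_, power_quadratic = fit_power_law(sizes, quadratic_times)
print(round(power_linear, 3), round(power_quadratic, 3))
```

An exponent near 1 means that doubling the input roughly doubles the run time, whereas an exponent above 2 means that a tenfold larger data set takes more than a hundred times longer.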

**Figure 3.**
Most significant motifs discovered with the different algorithms encompassed by peak-motifs for ChIP-seq peak collections pulled down with 12 transcription factors involved in ES cell pluripotency (20). The first three columns indicate the studied transcription factor and the size of the data set (in number of peaks and in Mb). The fourth and fifth columns display the ID and consensus of the chosen reference motif. The sixth column shows the best motif found by peak-motifs, followed by two estimations of the correlation between the discovered and the matched motifs (Cor and Cov). The following columns detail which algorithm(s) detected this motif, and which motifs from the JASPAR and TRANSFAC databases were similar to the found motif.

**Figure 4.**
Logos of the motifs discovered by peak-motifs for the factors Oct4, Sox2, Nanog and E2f1 adapted from the ChIP-seq data set by Chen *et al*. (20).

**Figure 5.**
Network of motifs discovered in the p300 data set. Each node represents a motif; the shape and color of the node denote the tissue (for the p300 datasets) and the ChIPed-factor (for the HL1 cell-line datasets, used as a validation), respectively. Two motifs are joined by a line if their normalized correlation is above 0.75; the width of the line denotes the degree of correlation. Node labels refer to the algorithm used to discover the motif: L (local-words), P (position-analysis), O (oligo-analysis), D (dyad-analysis) as well as the considered word length (6 or 7). The names of the transcription factor(s) likely associated with the motif clusters are also indicated, together with a representative logo.
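
The network construction described in this caption (an edge between two motifs when their normalized correlation exceeds 0.75) can be sketched as a thresholded graph whose connected components give candidate motif clusters. This is an illustrative sketch, not the authors' implementation: the motif labels and correlation values are invented, and the use of connected components as clusters is an assumption.

```python
from collections import defaultdict


def build_motif_network(correlations, threshold=0.75):
    """Return an adjacency map linking motifs whose correlation exceeds threshold."""
    adjacency = defaultdict(set)
    for (motif_a, motif_b), ncor in correlations.items():
        # Register both nodes so that unlinked motifs still appear in the network.
        adjacency[motif_a]
        adjacency[motif_b]
        if ncor > threshold:
            adjacency[motif_a].add(motif_b)
            adjacency[motif_b].add(motif_a)
    return dict(adjacency)


def connected_components(adjacency):
    """Group motifs into clusters by depth-first traversal of the network."""
    seen = set()
    components = []
    for node in sorted(adjacency):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical labels (tissue_algorithm+word length) and correlation values.
pairwise = {
    ("limb_O6", "limb_P7"): 0.90,
    ("limb_P7", "forebrain_L6"): 0.80,
    ("forebrain_L6", "midbrain_D"): 0.40,
}
clusters = connected_components(build_motif_network(pairwise))
print(clusters)
```

With these invented values, the first three motifs end up in one cluster because each link exceeds 0.75, while `midbrain_D` remains isolated.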

Cited by

i-cisTarget 2015 update: generalized cis-regulatory enrichment analysis in human, mouse and fly.
Imrichová H, Hulselmans G, Atak ZK, Potier D, Aerts S. Nucleic Acids Res. 2015 Jul 1;43(W1):W57-64. doi: 10.1093/nar/gkv395. Epub 2015 Apr 29. PMID: 25925574.

Functional characterization of the Arabidopsis transcription factor bZIP29 reveals its role in leaf and root development.
Van Leene J, Blomme J, Kulkarni SR, Cannoot B, De Winne N, Eeckhout D, Persiau G, Van De Slijke E, Vercruysse L, Vanden Bossche R, Heyndrickx KS, Vanneste S, Goossens A, Gevaert K, Vandepoele K, Gonzalez N, Inzé D, De Jaeger G. J Exp Bot. 2016 Oct;67(19):5825-5840. doi: 10.1093/jxb/erw347. Epub 2016 Sep 22. PMID: 27660483.

The Arabidopsis hnRNP-Q Protein LIF2 and the PRC1 Subunit LHP1 Function in Concert to Regulate the Transcription of Stress-Responsive Genes.
Molitor AM, Latrasse D, Zytnicki M, Andrey P, Houba-Hérin N, Hachet M, Battail C, Del Prete S, Alberti A, Quesneville H, Gaudin V. Plant Cell. 2016 Sep;28(9):2197-2211. doi: 10.1105/tpc.16.00244. Epub 2016 Aug 5. PMID: 27495811.

Small molecule inhibition of cAMP response element binding protein in human acute myeloid leukemia cells.
Mitton B, Chae HD, Hsu K, Dutta R, Aldana-Masangkay G, Ferrari R, Davis K, Tiu BC, Kaul A, Lacayo N, Dahl G, Xie F, Li BX, Breese MR, Landaw EM, Nolan G, Pellegrini M, Romanov S, Xiao X, Sakamoto KM. Leukemia. 2016 Dec;30(12):2302-2311. doi: 10.1038/leu.2016.139. Epub 2016 May 23. PMID: 27211267.

Discriminative motif optimization based on perceptron training.
Patel RY, Stormo GD. Bioinformatics. 2014 Apr 1;30(7):941-8. doi: 10.1093/bioinformatics/btt748. Epub 2013 Dec 24. PMID: 24369152.

References

1. Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A, et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods. 2007;4:651–657.
2. Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein–DNA interactions. Science. 2007;316:1497–1502.
3. Boeva V, Surdez D, Guillon N, Tirode F, Fejes AP, Delattre O, Barillot E. De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis. Nucleic Acids Res. 2010;38:e126.
4. Machanick P, Bailey TL. MEME-ChIP: Motif analysis of large DNA datasets. Bioinformatics. 2011;27:1696–1697.
5. Bailey TL. DREME: Motif discovery in transcription factor ChIP-seq data. Bioinformatics. 2011;27:1653–1659.
