Nucleic Acids Res. 2019 Dec 2;47(21):e139.
doi: 10.1093/nar/gkz800.

A single ChIP-seq dataset is sufficient for comprehensive analysis of motifs co-occurrence with MCOT package

Victor Levitsky et al. Nucleic Acids Res.

Abstract

Recognition of composite elements consisting of two transcription factor binding sites underlies the study of tissue-, stage- and condition-specific transcription. Genome-wide data on transcription factor binding generated with the ChIP-seq method facilitate the identification of composite elements, but the existing bioinformatics tools either require ChIP-seq datasets for both partner transcription factors or omit composite elements with overlapping motifs. Here we present a universal Motifs Co-Occurrence Tool (MCOT) that retrieves maximum information about overrepresented composite elements from a single ChIP-seq dataset. This includes homo- and heterotypic composite elements in four mutual orientations of motifs, separated by a spacer or overlapping, even if recognition of the motifs within a composite element requires different stringencies. Analysis of 52 ChIP-seq datasets for 18 human transcription factors confirmed that, for over 60% of analyzed datasets and transcription factors, the predicted co-occurrence of motifs implied an experimentally proven protein-protein interaction of the respective transcription factors. Analysis of 164 ChIP-seq datasets for 57 mammalian transcription factors showed that predicted composite elements with an overlap of motifs were more than twice as abundant as those with a spacer, and that they had a 1.5-fold increase in asymmetrical pairs of motifs, with one more conserved 'leading' motif and another 'guided' one.

Figures

Figure 1.
Classification of structural variants of CEs with respect to the conservation of anchor and partner motifs (A), their mutual orientation (B), and overlap or spacer (C). Cyan, green and light green colors in panel B distinguish CEs with a spacer, a partial overlap and a full overlap, respectively. The color range from red to pink in panel A denotes the conservation level of a motif; brown/orange colors mark an imbalance in CEs with a more conserved anchor/partner motif, and grey marks a balance between the conservation of the motifs.
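The overlap/spacer distinction in panel C can be illustrated with a minimal sketch. It assumes each motif hit is a half-open interval given by a start position and a length; the function name `classify_location` and the rule that a full overlap means the shorter motif lies entirely inside the longer one are assumptions made for illustration, not definitions taken from MCOT.

```python
def classify_location(anchor_start, anchor_len, partner_start, partner_len):
    """Classify the mutual location of two motif hits (hypothetical helper).

    Each hit is the half-open interval [start, start + length).
    Returns (label, size): size is the overlap length in bp for overlaps
    and the gap length in bp for spacers.
    """
    anchor_end = anchor_start + anchor_len
    partner_end = partner_start + partner_len
    # Positive value: number of shared positions; zero or negative: a gap.
    overlap = min(anchor_end, partner_end) - max(anchor_start, partner_start)
    if overlap <= 0:
        return ("spacer", -overlap)
    if overlap == min(anchor_len, partner_len):
        # Assumption: the shorter motif is fully contained in the longer one.
        return ("full overlap", overlap)
    return ("partial overlap", overlap)


print(classify_location(0, 10, 15, 8))  # motifs separated by a gap
print(classify_location(0, 10, 6, 8))   # motifs share some positions
print(classify_location(0, 12, 2, 6))   # shorter motif inside the longer one
```

Under this convention, two hits that touch without sharing a position are reported as a spacer of length 0.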
Figure 2.
MCOT permutation procedure. ‘Foreground’ shows profiles of hits for two motifs; green and blue colors mark the fixed profile and the profile selected for permutation, respectively. ‘Masking’ partitions each profile into ‘clusters’ of hits and spacers. ‘Permutation’ shows the real (top) and shuffled (bottom) orders of clusters and spacers. ‘Alignment quality check’ illustrates the checkpoint of the permutation. ‘Background’ shows the result of the permutation.
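The ‘Masking’ and ‘Permutation’ steps can be sketched roughly as follows. This is a simplified illustration, not the MCOT implementation: it assumes a profile is a list of 0/1 values marking positions covered by hits, shuffles cluster lengths and spacer lengths separately so that clusters never merge, and omits the ‘Alignment quality check’ step entirely. The names `split_runs` and `permute_profile` are hypothetical.

```python
import random
from itertools import groupby


def split_runs(profile):
    """Masking: split a 0/1 profile into runs of (is_cluster, length)."""
    return [(bool(value), len(list(group))) for value, group in groupby(profile)]


def permute_profile(profile, rng):
    """Permutation: shuffle the order of clusters and of spacers.

    Cluster and spacer lengths are shuffled separately and re-interleaved
    in the original cluster/spacer pattern, so the profile length, the
    number of covered positions and the number of clusters are preserved.
    """
    runs = split_runs(profile)
    clusters = [length for is_cluster, length in runs if is_cluster]
    spacers = [length for is_cluster, length in runs if not is_cluster]
    rng.shuffle(clusters)
    rng.shuffle(spacers)
    shuffled = []
    for is_cluster, _ in runs:
        length = clusters.pop() if is_cluster else spacers.pop()
        shuffled.extend([1 if is_cluster else 0] * length)
    return shuffled


# Hypothetical foreground profile for the motif selected for permutation.
profile = [0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0]
background = permute_profile(profile, random.Random(0))
print(background)
```

Passing a seeded `random.Random` instance keeps the sketch reproducible; the profile of the other (fixed) motif would be left unchanged.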
Figure 3.
MCOT algorithm scheme. Grey color highlights input and output data. Pink and blue colors denote observed and expected data, respectively. Mapping of motifs in peaks (Recognition) is performed for five stringencies (see Figure 1A) and prepares profiles of hits for both motifs. These profiles are used to generate background profiles with mutually independent occurrences of anchor and partner motifs (Permutation). Observed and expected profiles of hits for anchor/partner motifs are then used for the CE search. Fisher's exact tests are applied to estimate CE enrichment and CE asymmetry (P-values) (see Materials and Methods). Output data also include a P-value that characterizes the similarity of anchor and partner motifs.
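The enrichment step can be illustrated with a one-sided Fisher's exact test on a 2x2 table. The sketch below is an assumption about how such a comparison might be laid out (peaks with and without a CE in the observed versus expected profiles); the exact contingency table used by MCOT is described in the paper's Materials and Methods and is not reproduced here. The function name and the counts are hypothetical.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the probability of seeing at least `a` counts in the top-left cell.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    denominator = comb(total, col1)
    numerator = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return numerator / denominator


# Hypothetical counts: peaks with / without a given CE variant.
observed_with, observed_without = 120, 880   # foreground (observed) profiles
expected_with, expected_without = 60, 940    # background (expected) profiles
p_value = fisher_right_tail(observed_with, observed_without,
                            expected_with, expected_without)
print(f"enrichment P-value: {p_value:.3g}")
```

Exact integer arithmetic via `math.comb` avoids a dependency on external statistics libraries at the cost of speed for very large tables.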
Figure 4.
Examples of predicted CEs. Reciprocal analysis of two ChIP-seq datasets: fine structure of Jun/USF1 CEs (A); novel STAT6/CEBPα CEs (B). Analysis of a single ChIP-seq dataset: novel ZNF341/STAT3 CEs (C). Here we present the analysis of Jun (A) and STAT6 (B) peaks; the respective reciprocal datasets of USF1 and CEBPα peaks are provided in Supplementary Figures S4 and S8. In the reciprocal analyses (A, B), partner motifs were derived from the de novo motif search (17) in a ChIP-seq dataset for the respective TF; in the analysis of a single ChIP-seq dataset (C), the partner motif was extracted from the Hocomoco database (30). In each panel, the four charts correspond to the four mutual orientations of motifs within CEs (Figure 1B); the logo alignment and the arrow point to the most abundant CE variant for each orientation. The X axes denote the mutual locations of the two motifs (Figure 1C); the ranges of full/partial overlaps and spacers are marked with dark/light grey and white backgrounds, respectively. The Y axes denote the fraction of peaks that contain a potential CE with a specific mutual location and orientation.
Figure 5.
Confirmation of predicted CEs with known protein-protein interactions between anchor and partner TFs. The X axis denotes the significance of the Fisher's exact test that checks the enrichment of known protein-protein interactions among the anchor and partner TFs that correspond to predicted CEs. The Y axis marks ChIP-seq datasets for anchor TFs. The dashed line denotes the Bonferroni-corrected threshold, P-value < 0.01. This figure provides experimental support for MCOT predictions.
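A Bonferroni correction of the kind mentioned for the dashed line can be sketched as below. The sketch assumes the 0.01 level is the family-wise level that is divided by the number of tests; the number of tests and the P-values are hypothetical, and the function name `bonferroni_significant` is not part of MCOT.

```python
def bonferroni_significant(p_values, alpha=0.01):
    """Return the per-test threshold and a significance flag per P-value.

    The family-wise level `alpha` is divided by the number of tests, and
    each P-value is compared with the resulting per-test threshold.
    """
    if not p_values:
        raise ValueError("at least one P-value is required")
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]


# Hypothetical P-values, one per ChIP-seq dataset of an anchor TF.
p_values = [0.0004, 0.02, 0.003, 0.001]
threshold, flags = bonferroni_significant(p_values)
print(threshold, flags)
```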
Figure 6.
Asymmetry of motif conservation within predicted RELA/IKZF1 CEs. The significance of CEs with an overlap of RELA and IKZF1 motifs, with RELA (A) or IKZF1 (B) as the anchor, as a function of motif conservation. Red/rose colors denote the variation of stringency from the most conservative (red, 1) to the most permissive (rose, 5). Light/dark blue colors mark the significance of a CE (P-value < 0.002) (see Materials and Methods). This figure shows that, irrespective of the choice of anchor motif in RELA/IKZF1 CEs, the RELA motif is more conserved than the IKZF1 motif.
Figure 7.
Abundance and asymmetry of predicted CEs with overlaps of motifs and with spacers. Abundance of heterotypic CEs with overlaps of motifs (A) and of those with spacers shorter than 30 bp (B) as a function of the CE significance (X axes) and the significance of asymmetry in conservation between anchor and partner motifs (Y axes); see Materials and Methods. The color keys show the CE abundance for 117/47 human/mouse ChIP-seq datasets (see Supplementary Table S1). CEs consisted of an anchor motif and any of the 396/353 partner motifs from the Hocomoco human/mouse libraries (30); see Materials and Methods. CEs without a significant match between anchor and partner motifs (P-value > 0.05) were kept in the analysis. This figure shows that predicted CEs with overlaps, compared to those with spacers, are more abundant and more often comprise two motifs of different conservation.

References

    1. Morgunova E., Taipale J. Structural perspective of cooperative transcription factor binding. Curr. Opin. Struct. Biol. 2017; 47:1–8.
    2. MacQuarrie K.L., Fong A.P., Morse R.H., Tapscott S.J. Genome-wide transcription factor binding: beyond direct target regulation. Trends Genet. 2011; 27:141–148.
    3. Hnisz D., Shrinivas K., Young R.A., Chakraborty A.K., Sharp P.A. A phase separation model for transcriptional control. Cell. 2017; 169:13–23.
    4. Hu Z., Tee W.W. Enhancers and chromatin structures: regulatory hubs in gene expression and diseases. Biosci. Rep. 2017; 37:BSR20160183.
    5. Kel-Margoulis O.V., Kel A.E., Reuter I., Deineko I.V., Wingender E. TRANSCompel: a database on composite regulatory elements in eukaryotic genes. Nucleic Acids Res. 2002; 30:332–334.
