TherMos: Estimating protein-DNA binding energies from in vivo binding profiles

doi:10.1093/nar/gkt250

Nucleic Acids Res. 2013 Jun;41(11):5555-68.

doi: 10.1093/nar/gkt250. Epub 2013 Apr 16.

Wenjie Sun¹, Xiaoming Hu, Michael H K Lim, Calista K L Ng, Siew Hua Choo, Diogo S Castro, Daniela Drechsel, François Guillemot, Prasanna R Kolatkar, Ralf Jauch, Shyam Prabhakar

PMID: 23595148
PMCID: PMC3675472
DOI: 10.1093/nar/gkt250

Wenjie Sun et al. Nucleic Acids Res. 2013 Jun.

Affiliation

¹ Computational and Systems Biology, Genome Institute of Singapore, 60 Biopolis St, Singapore 138672, Singapore.

Abstract

Accurately characterizing transcription factor (TF)-DNA affinity is a central goal of regulatory genomics. Although thermodynamics provides the most natural language for describing the continuous range of TF-DNA affinity, traditional motif discovery algorithms focus instead on classification paradigms that aim to discriminate 'bound' and 'unbound' sequences. Moreover, these algorithms do not directly model the distribution of tags in ChIP-seq data. Here, we present a new algorithm named Thermodynamic Modeling of ChIP-seq (TherMos), which directly estimates a position-specific binding energy matrix (PSEM) from ChIP-seq/exo tag profiles. In cross-validation tests on seven genome-wide TF-DNA binding profiles, one of which we generated via ChIP-seq on a complex developing tissue, TherMos predicted quantitative TF-DNA binding with greater accuracy than five well-known algorithms. We experimentally validated TherMos binding energy models for Klf4 and Esrrb, using a novel protocol to measure PSEMs in vitro. Strikingly, our measurements revealed strong non-additivity at multiple positions within the two PSEMs. Among the algorithms tested, only TherMos was able to model the entire binding energy landscape of Klf4 and Esrrb. Our study reveals new insights into the energetics of TF-DNA binding in vivo and provides an accurate first-principles approach to binding energy inference from ChIP-seq and ChIP-exo data.
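To make the idea of a position-specific binding energy matrix concrete, here is a minimal sketch of how an additive PSEM could be turned into a per-site binding probability. It is an illustration under stated assumptions, not the TherMos implementation: the two-state occupancy formula, the chemical-potential parameter `mu`, the energy units (kT) and the toy matrix are all assumptions introduced here.

```python
import math

BASES = "ACGT"

def site_energy(psem, site):
    """Additive binding energy of one site: sum of per-position energies (in kT)."""
    return sum(psem[i][BASES.index(base)] for i, base in enumerate(site))

def occupancy(energy, mu=0.0):
    """Two-state binding probability; mu is an assumed chemical potential in kT."""
    return 1.0 / (1.0 + math.exp(energy - mu))

def occupancy_profile(psem, sequence, mu=0.0):
    """Occupancy of every window of the sequence that is as wide as the PSEM."""
    width = len(psem)
    return [
        occupancy(site_energy(psem, sequence[i:i + width]), mu)
        for i in range(len(sequence) - width + 1)
    ]

# Hypothetical 3-bp PSEM: consensus "ACG" costs 0 kT, any mismatch costs 2 kT.
toy_psem = [
    [0.0, 2.0, 2.0, 2.0],  # position 1: A preferred
    [2.0, 0.0, 2.0, 2.0],  # position 2: C preferred
    [2.0, 2.0, 0.0, 2.0],  # position 3: G preferred
]
profile = occupancy_profile(toy_psem, "TACGT")
```

In this toy case the window matching the consensus receives the highest occupancy, and each mismatch lowers it continuously rather than flipping a site from "bound" to "unbound".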

Figures

**Figure 1.**
Workflow of the TherMos algorithm. (A) Enrichment of tags in the control library as a function of local GC content. (B) Forward and reverse smoothing weights (tag distribution at binding sites) estimated using an iterative peak refinement procedure. Most tag fragments are <100 bp from the binding site. (C) Construction of ChIP-seq profile (Y_obs) from per base pair tag profile. (D) Initial guess of PSEM. (E) Theoretically predicted ChIP-seq profile (Y_pred), based on PSEM and inferred average peak shape. (F) PSEM that best fits the Y_obs.
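Panels (E) and (F) can be read as a fitting loop: predict a profile from the current PSEM and the average peak shape, then compare it with Y_obs. The sketch below shows only the prediction and comparison steps. It assumes a single symmetric kernel in place of the separate forward and reverse smoothing weights, and a squared-error objective; both are simplifications chosen here, and the function names are hypothetical.

```python
def predict_profile(occupancy, weights):
    """Spread each site's occupancy over nearby bases using a peak-shape kernel.

    `weights` is assumed to be centred on the binding site (odd length).
    """
    half = len(weights) // 2
    n = len(occupancy)
    y_pred = [0.0] * n
    for site, occ in enumerate(occupancy):
        for k, w in enumerate(weights):
            pos = site + k - half
            if 0 <= pos < n:
                y_pred[pos] += occ * w
    return y_pred

def squared_error(y_obs, y_pred):
    """Assumed goodness-of-fit measure between observed and predicted profiles."""
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))

# Hypothetical example: one fully occupied site in the middle of a 5-bp region.
toy_occupancy = [0.0, 0.0, 1.0, 0.0, 0.0]
toy_weights = [0.25, 0.5, 0.25]
y_pred = predict_profile(toy_occupancy, toy_weights)
```

A fitting procedure would adjust the PSEM entries to reduce `squared_error(y_obs, y_pred)`; the optimizer itself is omitted because this caption does not specify it.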

**Figure 2.**
(A) Mash1 *in vivo* ChIP-seq profile (E12.5 mouse spinal cord) shows strong peaks at known targets of Mash1. (B, C) Performance of TherMos and other algorithms in 10-fold cross-validation testing on the seven whole-genome TF binding profiles. For each algorithm and each TF, the bar height indicates the average SPE or rank correlation coefficient across the 10 test sets. The summary bars at the end indicate average performance across all seven TFs. (B) SPE is calculated between predicted (motif) and observed (experimental data) ChIP-seq binding profile. Smaller SPE indicates higher accuracy. (C) Rank correlation coefficient is calculated between predicted (motif) and observed (experimental data) ChIP-seq tag counts. Average rank correlation coefficients below zero for some of the algorithms are not shown.
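As a rough illustration of the metric in panel (C), the sketch below computes a rank correlation between predicted and observed values. The caption says only "rank correlation coefficient", so the use of Spearman's coefficient with average ranks for ties is an assumption, and the input numbers are invented for the example.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented tag counts: the prediction orders the regions correctly but not linearly.
observed = [10, 20, 30, 400]
predicted = [1.0, 2.0, 3.0, 4.0]
rho = spearman(predicted, observed)
```

Because only the ordering matters, a model can score well on this metric even when its predicted magnitudes are off, which is why the figure reports it alongside SPE.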

**Figure 3.**
*In vitro* binding energy model for Esrrb and comparison with algorithmic predictions from ChIP-seq. (A) Sequence logos of Esrrb motifs predicted by TherMos, MatrixREDUCE, Weeder, MEME, DREME and ChIPMunk. (B) Results of the EMSA competition assays. (C) The sequence logo of the Esrrb affinity model measured *in vitro* by EMSA competition assays. (D) Euclidean distance between *in vitro* motif and the motifs predicted by various algorithms.
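The comparison in panel (D) can be sketched as a Euclidean distance between two motif matrices of the same width. How the motifs were aligned and normalized before comparison is not stated in this caption, so the sketch assumes both are already aligned and on the same scale; the matrices are hypothetical.

```python
import math

def per_position_distance(motif_a, motif_b):
    """Euclidean distance between the two motifs at each position (one row per position)."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(row_a, row_b)))
        for row_a, row_b in zip(motif_a, motif_b)
    ]

def motif_distance(motif_a, motif_b):
    """Euclidean distance over all entries of two aligned, equal-width motifs."""
    return math.sqrt(
        sum(
            (a - b) ** 2
            for row_a, row_b in zip(motif_a, motif_b)
            for a, b in zip(row_a, row_b)
        )
    )

# Hypothetical 2-position motifs (columns: A, C, G, T) differing only at position 1.
measured = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
predicted = [[0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
distances = per_position_distance(measured, predicted)
total = motif_distance(measured, predicted)
```

The per-position version corresponds to the kind of breakdown shown in Figure 4A, where disagreement is localized to specific nucleotide positions.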

**Figure 4.**
Position interdependence in Esrrb binding. (A) Euclidean distance at each nucleotide position between *in vitro* motif and the motifs predicted by various algorithms. (B) Esrra primary and secondary motifs measured using PBM (29). The nucleotides showing positional interdependence are highlighted in the box. (C) The measured affinity by single or multiple mutation EMSA (i.e. the measured log ratio of the *K_d* of the mutant sequences to *K_d* of the reference sequence) versus the predicted affinity (i.e. the corresponding predicted log ratio) by single mutation EMSA. Twenty-seven single mutant (diamond) and two multiple mutant (circle) sequences were tested in EMSA competition assays. The consensus (reference) sequence is highlighted in red with the mutated nucleotides highlighted in blue. Error bars for the two multiple-mutant sequences are too small to be visible in this plot.
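The test in panel (C) rests on a simple expectation: under an additive model, the log K_d ratio of a multiple mutant equals the sum of the log ratios of its constituent single mutants. The sketch below spells that out; the natural-log convention, the dictionary layout and all numeric values are assumptions made for illustration, not measurements from the paper.

```python
import math

def log_kd_ratio(kd_mutant, kd_reference):
    """Log ratio of mutant to reference K_d (natural log assumed)."""
    return math.log(kd_mutant / kd_reference)

def additive_prediction(single_log_ratios, mutations):
    """Predicted log ratio of a multi-mutant as the sum of its single-mutant effects.

    `mutations` is a list of (position, base) keys into `single_log_ratios`.
    """
    return sum(single_log_ratios[m] for m in mutations)

def non_additivity(measured, predicted):
    """Deviation of the measured multi-mutant effect from the additive expectation."""
    return measured - predicted

# Invented single-mutant effects, keyed by (position, mutant base).
singles = {(2, "T"): 1.0, (5, "A"): 1.5}
predicted = additive_prediction(singles, [(2, "T"), (5, "A")])
deviation = non_additivity(1.2, predicted)  # 1.2 is an invented measurement
```

A deviation far from zero, relative to measurement error, is what the caption describes as position interdependence.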

**Figure 5.**
*In vitro* binding energy model for Klf4 and comparison with the algorithmic predictions from ChIP-seq. (A) Sequence logos of Klf4 motifs predicted by TherMos, MatrixREDUCE, Weeder, MEME, DREME and ChIPMunk. (B) Results of the EMSA competition assays. (C) The sequence logo of the Klf4 affinity model measured *in vitro* by EMSA competition assays. (D) Euclidean distance between the *in vitro* motif and the motifs predicted by various algorithms.

**Figure 6.**
Position interdependence in Klf4 binding. (A) Euclidean distance at each nucleotide position between the *in vitro* motif and the motifs predicted by various algorithms from ChIP-seq data. (B) Klf7 primary and secondary motifs measured using PBM (29). The nucleotides showing positional interdependence are highlighted in the box. (C) Twenty-five mutant sequences were tested in multiple mutations EMSA competition assays. The 10-bp consensus sequence is highlighted in red with two flanking nucleotides (in black) at both ends. The mutated nucleotides are highlighted in blue. (D) The multi-mutation measured affinity (i.e. the observed log ratio of the *K_d* of the 25 mutant sequences to *K_d* of the Mut 10) versus the corresponding log ratio predicted by single mutation affinity model, TherMos, MatrixREDUCE, Weeder, MEME, DREME, ChIPMunk, PBM primary motif and PBM secondary motif (29). The Pearson correlation coefficient is also shown in the plot.

Cited by

Decoupling of evolutionary changes in transcription factor binding and gene expression in mammals.
Wong ES, Thybert D, Schmitt BM, Stefflova K, Odom DT, Flicek P. Genome Res. 2015 Feb;25(2):167-78. doi: 10.1101/gr.177840.114. Epub 2014 Nov 13. PMID: 25394363 Free PMC article.
Quality versus accuracy: result of a reanalysis of protein-binding microarrays from the DREAM5 challenge by using BayesPI2 including dinucleotide interdependence.
Wang J. BMC Bioinformatics. 2014 Aug 27;15(1):289. doi: 10.1186/1471-2105-15-289. PMID: 25158938 Free PMC article.
Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences.
Siebert M, Söding J. Nucleic Acids Res. 2016 Jul 27;44(13):6055-69. doi: 10.1093/nar/gkw521. Epub 2016 Jun 9. PMID: 27288444 Free PMC article.
Evidence supporting the existence of a NUPR1-like family of helix-loop-helix chromatin proteins related to, yet distinct from, AT hook-containing HMG proteins.
Urrutia R, Velez G, Lin M, Lomberk G, Neira JL, Iovanna J. J Mol Model. 2014 Aug;20(8):2357. doi: 10.1007/s00894-014-2357-7. Epub 2014 Jul 24. PMID: 25056123 Free PMC article.
Insights from resolving protein-DNA interactions at near base-pair resolution.
Venters BJ. Brief Funct Genomics. 2018 Mar 1;17(2):80-88. doi: 10.1093/bfgp/elx043. PMID: 29211822 Free PMC article. Review.

References

1. Stormo GD. Consensus patterns in DNA. Methods Enzymol. 1990;183:211–221.
2. Man TK, Stormo G. Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res. 2001;29:2471–2478.
3. Bulyk ML, Johnson PLF, Church GM. Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res. 2002;30:1255–1261.
4. Oliphant AR, Brandl CJ, Struhl K. Defining the sequence specificity of DNA-binding proteins by selecting binding sites from random-sequence oligonucleotides: analysis of yeast GCN4 protein. Mol. Cell Biol. 1989;9:2944–2949.
5. Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. Menlo Park, CA: AAAI Press; August 1994. pp. 28–36.
