The next generation of transcription factor binding site prediction
- PMID: 24039567
- PMCID: PMC3764009
- DOI: 10.1371/journal.pcbi.1003214
Abstract
Finding where transcription factors (TFs) bind to the DNA is key to deciphering gene regulation at the transcriptional level. Classically, computational prediction of TF binding sites (TFBSs) is based on basic position weight matrices (PWMs), which quantitatively score binding motifs based on the nucleotide patterns observed in a set of TFBSs for the corresponding TF. Such models make the strong assumption that each nucleotide participates independently in the corresponding DNA-protein interaction, and they do not account for flexible-length motifs. We introduce transcription factor flexible models (TFFMs) to represent TF binding properties. Based on hidden Markov models, TFFMs are flexible and can model both position interdependence within TFBSs and variable-length motifs within a single dedicated framework. The availability of thousands of experimentally validated DNA-TF interaction sequences from ChIP-seq allows for the generation of models that perform as well as PWMs for stereotypical TFs and can improve performance for TFs with flexible binding characteristics. We present a new graphical representation of the motifs that conveys properties of position interdependence. TFFMs have been assessed on ChIP-seq data sets from the ENCODE project, revealing that they can perform better than both PWMs and the dinucleotide weight matrix extension in discriminating ChIP-seq sequences from background sequences. Under the assumption that ChIP-seq signal values are correlated with the affinity of the TF-DNA binding, we find that TFFM scores correlate with ChIP-seq peak signals. Moreover, using available TF-DNA affinity measurements for the Max TF, we demonstrate that TFFMs constructed from ChIP-seq data correlate with published experimentally measured DNA-binding affinities. Finally, TFFMs allow for the straightforward computation of an integrated TF occupancy score across a sequence. These results demonstrate the capacity of TFFMs to accurately model DNA-protein interactions, while providing a single unified framework suitable for the next generation of TFBS prediction.
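The independence assumption that the abstract attributes to PWMs can be illustrated with a small sketch. The code below is not the TFFM implementation: it compares a log-odds PWM with a simplified, position-specific first-order Markov chain, which is only a stand-in for the position interdependence that the hidden Markov models capture, and it does not model variable-length motifs. The function names, the toy binding sites, the pseudocount, and the uniform background of 0.25 are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM: one {base: log2(p / background)} dict per position."""
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: math.log2((counts[b] + pseudocount) / total / background)
                    for b in ALPHABET})
    return pwm


def score_pwm(pwm, seq):
    """Sum of independent per-position log-odds scores."""
    return sum(column[base] for column, base in zip(pwm, seq))


def build_first_order(sites, pseudocount=1.0):
    """Position-specific probabilities P(base_i | base_{i-1}), with smoothing."""
    k = len(ALPHABET)
    first = Counter(site[0] for site in sites)
    start = {b: (first[b] + pseudocount) / (len(sites) + pseudocount * k)
             for b in ALPHABET}
    transitions = []
    for i in range(1, len(sites[0])):
        pairs = Counter((site[i - 1], site[i]) for site in sites)
        prev = Counter(site[i - 1] for site in sites)
        transitions.append({(a, b): (pairs[(a, b)] + pseudocount)
                            / (prev[a] + pseudocount * k)
                            for a in ALPHABET for b in ALPHABET})
    return start, transitions


def score_first_order(model, seq, background=0.25):
    """Log-odds score where each base is conditioned on the previous base."""
    start, transitions = model
    score = math.log2(start[seq[0]] / background)
    for i in range(1, len(seq)):
        score += math.log2(transitions[i - 1][(seq[i - 1], seq[i])] / background)
    return score


# Hypothetical toy sites in which positions 2 and 3 covary (CG or TA only).
sites = ["ACGT"] * 5 + ["ATAT"] * 5
pwm = build_pwm(sites)
model = build_first_order(sites)

for candidate in ("ACGT", "ACAT"):
    print(candidate,
          round(score_pwm(pwm, candidate), 3),
          round(score_first_order(model, candidate), 3))
```

In this toy set the PWM cannot tell the observed site ACGT from the unobserved mixture ACAT, because each column is scored on its own, whereas the first-order model scores ACGT higher. A full TFFM additionally uses hidden states, which is what allows it to represent motifs of variable length.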
Conflict of interest statement
The authors have declared that no competing interests exist.
Similar articles
- High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput Biol. 2010 Sep 9;6(9):e1000916. doi: 10.1371/journal.pcbi.1000916. PMID: 20838582 Free PMC article.
- Improving analysis of transcription factor binding sites within ChIP-Seq data based on topological motif enrichment. BMC Genomics. 2014 Jun 13;15(1):472. doi: 10.1186/1471-2164-15-472. PMID: 24927817 Free PMC article.
- Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data. BMC Genomics. 2014 Jan 29;15(1):80. doi: 10.1186/1471-2164-15-80. PMID: 24472686 Free PMC article.
- Building Transcription Factor Binding Site Models to Understand Gene Regulation in Plants. Mol Plant. 2019 Jun 3;12(6):743-763. doi: 10.1016/j.molp.2018.10.010. Epub 2018 Nov 15. PMID: 30447332 Review.
- An algorithmic perspective of de novo cis-regulatory motif finding based on ChIP-seq data. Brief Bioinform. 2018 Sep 28;19(5):1069-1081. doi: 10.1093/bib/bbx026. PMID: 28334268 Review.
Cited by
- Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which model to use? A decision rule inferred for the prediction of transcription factor binding sites. Genet Mol Biol. 2024 Jan 19;46(4):e20230048. doi: 10.1590/1678-4685-GMB-2023-0048. eCollection 2024. PMID: 38285430 Free PMC article.
- By the company they keep: interaction networks define the binding ability of transcription factors. Nucleic Acids Res. 2015 Oct 30;43(19):e125. doi: 10.1093/nar/gkv607. Epub 2015 Jun 18. PMID: 26089389 Free PMC article.
- Disentangling transcription factor binding site complexity. Nucleic Acids Res. 2018 Nov 16;46(20):e121. doi: 10.1093/nar/gky683. PMID: 30085218 Free PMC article.
- De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets. BMC Genomics. 2014 Dec 2;15:1047. doi: 10.1186/1471-2164-15-1047. PMID: 25442502 Free PMC article.
- A survey on algorithms to characterize transcription factor binding sites. Brief Bioinform. 2023 May 19;24(3):bbad156. doi: 10.1093/bib/bbad156. PMID: 37099664 Free PMC article. Review.