Heterogeneity in DNA multiple alignments: modeling, inference, and applications in motif finding
- PMID: 19995355
- DOI: 10.1111/j.1541-0420.2009.01362.x
Heterogeneity in DNA multiple alignments: modeling, inference, and applications in motif finding
Abstract
Transcription factors bind sequence-specific sites in DNA to regulate gene transcription. Identifying transcription factor binding sites (TFBSs) is an important step for understanding gene regulation. Although sophisticated in modeling TFBSs and their combinatorial patterns, computational methods for TFBS detection and motif finding often make oversimplified homogeneous model assumptions for background sequences. Since nucleotide base composition varies across genomic regions, it is expected to be helpful for motif finding to incorporate the heterogeneity into background modeling. When sequences from multiple species are utilized, variation in evolutionary conservation violates the common assumption of an identical conservation level in multiple alignments. To handle both types of heterogeneity, we propose a generative model in which a segmented Markov chain is used to partition a multiple alignment into regions of homogeneous nucleotide base composition and a hidden Markov model (HMM) is employed to account for different conservation levels. Bayesian inference on the model is developed via Gibbs sampling with dynamic programming recursions. Simulation studies and empirical evidence from biological data sets reveal the dramatic effect of background modeling on motif finding, and demonstrate that the proposed approach is able to achieve substantial improvements over commonly used background models.
© 2009, The International Biometric Society.
Similar articles
-
PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny.PLoS Comput Biol. 2005 Dec;1(7):e67. doi: 10.1371/journal.pcbi.0010067. Epub 2005 Dec 9. PLoS Comput Biol. 2005. PMID: 16477324 Free PMC article.
-
Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites.Nat Biotechnol. 2003 Apr;21(4):435-9. doi: 10.1038/nbt802. Epub 2003 Mar 10. Nat Biotechnol. 2003. PMID: 12627170
-
Regulatory motif finding by logic regression.Bioinformatics. 2004 Nov 1;20(16):2799-811. doi: 10.1093/bioinformatics/bth333. Epub 2004 May 27. Bioinformatics. 2004. PMID: 15166027
-
Hidden Markov model and its applications in motif findings.Methods Mol Biol. 2010;620:405-16. doi: 10.1007/978-1-60761-580-4_13. Methods Mol Biol. 2010. PMID: 20652513 Review.
-
Eukaryotic transcription factor binding sites--modeling and integrative search methods.Bioinformatics. 2008 Jun 1;24(11):1325-31. doi: 10.1093/bioinformatics/btn198. Epub 2008 Apr 21. Bioinformatics. 2008. PMID: 18426806 Review.
Cited by
-
Identification of context-dependent motifs by contrasting ChIP binding data.Bioinformatics. 2010 Nov 15;26(22):2826-32. doi: 10.1093/bioinformatics/btq546. Epub 2010 Sep 23. Bioinformatics. 2010. PMID: 20870645 Free PMC article.