ANN-Spec: a method for discovering transcription factor binding sites with improved specificity
- PMID: 10902194
- DOI: 10.1142/9789814447331_0044
Abstract
This work describes ANN-Spec, a machine learning algorithm and its application to discovering un-gapped patterns in DNA sequence. The approach makes use of an Artificial Neural Network and a Gibbs sampling method to define the Specificity of a DNA-binding protein. ANN-Spec searches for the parameters of a simple network (or weight matrix) that will maximize the specificity for binding sequences of a positive set compared to a background sequence set. Binding sites in the positive data set are found with the resulting weight matrix and these sites are then used to define a local multiple sequence alignment. Training complexity is O(lN) where l is the width of the pattern and N is the size of the positive training data. A quantitative comparison of ANN-Spec and a few related programs is presented. The comparison shows that ANN-Spec finds patterns of higher specificity when training with a background data set. The program and documentation are available from the authors for UNIX systems.
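The idea of scoring ungapped windows with a weight matrix, sampling sites in proportion to their score, and comparing a positive set against a background set can be illustrated with a short sketch. This is not the authors' implementation: the function names, the toy weight matrix, the example sequences, and the particular "specificity gap" measure (mean log of summed exponentiated window scores, positive minus background) are all assumptions chosen for illustration, and no weight training is shown.

```python
import math
import random

BASES = "ACGT"


def window_score(weights, seq, start):
    """Sum the per-position weights for the window of seq beginning at start."""
    return sum(weights[i][BASES.index(seq[start + i])] for i in range(len(weights)))


def site_scores(weights, seq):
    """Score every ungapped window of width len(weights) in seq."""
    width = len(weights)
    return [window_score(weights, seq, s) for s in range(len(seq) - width + 1)]


def sample_site(weights, seq, rng):
    """Draw one window start with probability proportional to exp(score)."""
    scores = site_scores(weights, seq)
    top = max(scores)  # subtract the maximum for numerical stability
    probs = [math.exp(s - top) for s in scores]
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


def mean_log_partition(weights, seqs):
    """Average over sequences of log(sum over windows of exp(score))."""
    total = 0.0
    for seq in seqs:
        scores = site_scores(weights, seq)
        top = max(scores)
        total += top + math.log(sum(math.exp(s - top) for s in scores))
    return total / len(seqs)


def specificity_gap(weights, positive, background):
    """Hypothetical specificity measure: positive minus background log-partition."""
    return mean_log_partition(weights, positive) - mean_log_partition(weights, background)


# Toy width-4 matrix (an assumption) that rewards A, C, G, T at positions 0..3.
weights = [[2.0 if j == i else 0.0 for j in range(4)] for i in range(4)]
positive = ["TTACGTTT", "GGACGTAA"]
background = ["TTTTTTTT", "GGGGGGGG"]

rng = random.Random(0)
sampled = [sample_site(weights, seq, rng) for seq in positive]
gap = specificity_gap(weights, positive, background)
```

In this sketch, scoring all windows costs roughly the pattern width times the total sequence length per pass, which is in line with the O(lN) training complexity stated in the abstract. A larger gap indicates that the matrix scores windows in the positive set more highly than windows in the background set.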
Similar articles
- Identifying target sites for cooperatively binding factors. Bioinformatics. 2001 Jul;17(7):608-21. doi: 10.1093/bioinformatics/17.7.608. PMID: 11448879
- Modeling transcription factor binding sites with Gibbs Sampling and Minimum Description Length encoding. Proc Int Conf Intell Syst Mol Biol. 1997;5:268-71. PMID: 9322048
- PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput Biol. 2005 Dec;1(7):e67. doi: 10.1371/journal.pcbi.0010067. Epub 2005 Dec 9. PMID: 16477324 Free PMC article.
- Evaluation of computer tools for the prediction of transcription factor binding sites on genomic DNA. In Silico Biol. 1998;1(1):21-8. PMID: 11471239 No abstract available.
- GANN: genetic algorithm neural networks for the detection of conserved combinations of features in DNA. BMC Bioinformatics. 2005 Feb 22;6:36. doi: 10.1186/1471-2105-6-36. PMID: 15725347 Free PMC article.
Cited by
- Towards a theoretical understanding of false positives in DNA motif finding. BMC Bioinformatics. 2012 Jun 27;13:151. doi: 10.1186/1471-2105-13-151. PMID: 22738169 Free PMC article.
- Genomewide bioinformatic analysis negates any specific role for Dof, GATA and Ag/cTCA motifs in nitrate responsive gene expression in Arabidopsis. Physiol Mol Biol Plants. 2009 Apr;15(2):145-50. doi: 10.1007/s12298-009-0016-8. Epub 2009 Jun 28. PMID: 23572923 Free PMC article.
- Boosting the prediction and understanding of DNA-binding domains from sequence. Nucleic Acids Res. 2010 Jun;38(10):3149-58. doi: 10.1093/nar/gkq061. Epub 2010 Feb 15. PMID: 20156993 Free PMC article.
- A motif co-occurrence approach for genome-wide prediction of transcription-factor-binding sites in Escherichia coli. Genome Res. 2004 Feb;14(2):201-8. doi: 10.1101/gr.1448004. PMID: 14762058 Free PMC article.
- Functional annotation of novel lineage-specific genes using co-expression and promoter analysis. BMC Genomics. 2010 Mar 9;11:161. doi: 10.1186/1471-2164-11-161. PMID: 20214810 Free PMC article.