DNABind: a hybrid algorithm for structure-based prediction of DNA-binding residues by combining machine learning- and template-based approaches
- PMID: 23737141
- DOI: 10.1002/prot.24330
Abstract
Accurate prediction of DNA-binding residues has become an increasingly important problem in structural bioinformatics. Here, we present DNABind, a novel hybrid algorithm that identifies these crucial residues by exploiting the complementarity between machine learning- and template-based methods. Our machine learning-based method probabilistically combines a structure-based and a sequence-based predictor, both implemented with support vector machines. The structure-based predictor uses our well-designed structural features, such as solvent accessibility, local geometry, topological features, and relative positions, which effectively quantify the differences between DNA-binding and nonbinding residues. The sequence-based predictor combines evolutionary conservation features with three other sequence attributes. Our template-based method relies on structural alignment and uses template structures from known protein-DNA complexes to infer DNA-binding residues. We show that the template-based method performs excellently when reliable templates are found for the query proteins but tends to be strongly influenced by template quality and by conformational changes upon DNA binding. In contrast, the machine learning approach performs better when high-quality templates are unavailable (about one-third of the cases in our dataset) or when the query protein undergoes substantial conformational changes upon DNA binding. Our extensive experiments indicate that the hybrid approach distinctly improves on the individual methods for both bound and unbound structures. DNABind also significantly outperforms state-of-the-art algorithms, by around 10% in terms of the Matthews correlation coefficient (MCC). The proposed methodology could also be applied widely to the annotation of other protein functional sites. DNABind is freely available at http://mleg.cse.sc.edu/DNABind/.
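The abstract does not state how the two SVM outputs are combined or how the hybrid decides between template- and machine learning-based predictions. The Python sketch below illustrates one plausible reading, under assumptions that are NOT taken from DNABind: an equally weighted average of the structure- and sequence-based probabilities, a hypothetical template-reliability score (for example a structural alignment score scaled to [0, 1]) with a hypothetical cutoff, and a simple switch between the two methods. The MCC function is the standard definition used to report the evaluation result. All function names and cutoffs are illustrative only.

```python
import math
from typing import List, Optional, Sequence


def combine_ml(p_struct: Sequence[float], p_seq: Sequence[float], w: float = 0.5) -> List[float]:
    """Hypothetical combination: weighted average of two per-residue SVM
    probability estimates (equal weights by default; an assumption)."""
    if len(p_struct) != len(p_seq):
        raise ValueError("both predictors must score the same residues")
    return [w * a + (1.0 - w) * b for a, b in zip(p_struct, p_seq)]


def hybrid_predict(
    p_struct: Sequence[float],
    p_seq: Sequence[float],
    template_labels: Optional[Sequence[bool]],
    template_score: float = 0.0,
    score_cutoff: float = 0.5,  # hypothetical template-reliability cutoff
    threshold: float = 0.5,     # hypothetical probability cutoff for "binding"
) -> List[bool]:
    """Use template-transferred labels when a reliable template exists,
    otherwise fall back to the combined machine learning prediction."""
    if template_labels is not None and template_score >= score_cutoff:
        return [bool(x) for x in template_labels]
    return [p >= threshold for p in combine_ml(p_struct, p_seq)]


def mcc(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Matthews correlation coefficient for binary per-residue labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    truth = [True, False, False, False, True]
    # No template available: fall back to the combined SVM probabilities.
    ml_only = hybrid_predict([0.9, 0.2, 0.8, 0.1, 0.6], [0.7, 0.4, 0.6, 0.3, 0.2], None)
    print(ml_only, round(mcc(truth, ml_only), 3))
```

A hard switch keeps the sketch short; a real hybrid might instead blend the two sources, for instance by weighting template votes by alignment quality, which would be equally consistent with the abstract's description.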
Keywords: DNA-binding residue; conformational change; machine learning; protein-DNA interaction; structural analysis; template.
Copyright © 2013 Wiley Periodicals, Inc.
Similar articles
- RBRDetector: improved prediction of binding residues on RNA-binding protein structures using complementary feature- and template-based strategies. Proteins. 2014 Oct;82(10):2455-71. doi: 10.1002/prot.24610. Epub 2014 Jun 9. PMID: 24854765.
- SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues. PLoS One. 2015 Jul 15;10(7):e0133260. doi: 10.1371/journal.pone.0133260. PMID: 26176857. Free PMC article.
- Statistical geometry based prediction of nonsynonymous SNP functional effects using random forest and neuro-fuzzy classifiers. Proteins. 2008 Jun;71(4):1930-9. doi: 10.1002/prot.21838. PMID: 18186470.
- Computational intelligence techniques in bioinformatics. Comput Biol Chem. 2013 Dec;47:37-47. doi: 10.1016/j.compbiolchem.2013.04.007. Epub 2013 Jul 10. PMID: 23891719. Review.
- An overview of the prediction of protein DNA-binding sites. Int J Mol Sci. 2015 Mar 6;16(3):5194-215. doi: 10.3390/ijms16035194. PMID: 25756377. Free PMC article. Review.
Cited by
- A Large-Scale Assessment of Nucleic Acids Binding Site Prediction Programs. PLoS Comput Biol. 2015 Dec 17;11(12):e1004639. doi: 10.1371/journal.pcbi.1004639. PMID: 26681179. Free PMC article.
- EGPDI: identifying protein-DNA binding sites based on multi-view graph embedding fusion. Brief Bioinform. 2024 May 23;25(4):bbae330. doi: 10.1093/bib/bbae330. PMID: 38975896. Free PMC article.
- GMean-a semi-supervised GRU and K-mean model for predicting the TF binding site. Sci Rep. 2024 Jan 30;14(1):2539. doi: 10.1038/s41598-024-52933-4. PMID: 38291225. Free PMC article.
- Nuclear translocation of vitellogenin in the honey bee (Apis mellifera). Apidologie. 2022;53(1):13. doi: 10.1007/s13592-022-00914-9. Epub 2022 Mar 15. PMID: 35309709. Free PMC article.
- The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis. Bioinformatics. 2019 Jan 1;35(1):12-19. doi: 10.1093/bioinformatics/bty523. PMID: 29947739. Free PMC article.