PreBIND and Textomy--mining the biomedical literature for protein-protein interactions using a support vector machine
- PMID: 12689350
- PMCID: PMC153503
- DOI: 10.1186/1471-2105-4-11
Abstract
Background: The majority of experimentally verified molecular interaction and biological pathway data are found in the unstructured text of biomedical journal articles, where they are inaccessible to computational methods. The Biomolecular Interaction Network Database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable size of the task of back-filling the database could be reduced by first using support vector machine (SVM) technology to locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and to present these data to curators and the public for review and entry into BIND.
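The abstract does not describe the features or the training procedure, so the following is only a minimal sketch of what "using an SVM to locate interaction information" can look like, assuming bag-of-words features and a linear SVM trained with the Pegasos stochastic subgradient method (a swapped-in training algorithm, not necessarily the one PreBIND/Textomy uses). The stopword list, the toy abstracts and the names `featurize`, `train_pegasos` and `predict` are hypothetical.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "at", "by", "in", "is", "of", "the", "to", "was", "we", "with"}


def featurize(text):
    """Bag-of-words vector (dict word -> weight), L2-normalised."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def dot(w, x):
    """Sparse dot product between weight dict w and feature dict x."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pegasos(docs, labels, lam=0.01, epochs=50, seed=0):
    """Linear SVM (no bias term) trained by Pegasos stochastic subgradient
    descent on the L2-regularised hinge loss.
    Labels: +1 = abstract describes an interaction, -1 = it does not."""
    rng = random.Random(seed)
    data = [(featurize(d), y) for d, y in zip(docs, labels)]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)           # Pegasos step size
            margin = y * dot(w, x)          # margin before this update
            for k in w:                     # regulariser: shrink by (1 - eta * lam)
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:                # hinge loss active: move w towards y * x
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, text):
    """+1 if the text scores positive, else -1 (a score of 0, e.g. no known words, gives -1)."""
    return 1 if dot(w, featurize(text)) > 0 else -1


# Toy, hand-written "abstracts" for illustration only.
DOCS = [
    "Ste5 binds Fus3 in a two-hybrid assay",
    "Cdc28 interacts with Cln2 by co-immunoprecipitation",
    "Ste5 physically interacts with Ste11 and binds Fus3",
    "The crystal structure of the enzyme was solved at high resolution",
    "Gene expression was measured across the cell cycle by microarray",
    "We report the complete genome sequence of a laboratory strain",
]
LABELS = [1, 1, 1, -1, -1, -1]

if __name__ == "__main__":
    w = train_pegasos(DOCS, LABELS)
    print(predict(w, "Fus3 binds Ste5"))                 # → 1
    print(predict(w, "microarray expression measured"))  # → -1
```

In a curation setting, the sign of the score decides which abstracts are passed on for human review; sorting by the raw score `dot(w, featurize(text))` would instead give a ranked queue.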
Results: Cross-validation estimated that the support vector machine's test-set precision, accuracy and recall for classifying abstracts that describe interaction information were 92%, 90% and 92%, respectively. We estimated that the system would be able to recall up to 60% of all non-high-throughput interactions present in another yeast protein-interaction database. Finally, this system was applied to a real-world curation problem, and its use was found to reduce the task duration by 70%, thus saving 176 days.
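The abstract reports cross-validated precision, accuracy and recall but not the number of folds or the underlying counts. As a minimal sketch, assuming labels in {+1, -1} and k-fold cross-validation with held-out predictions pooled across folds (the fold scheme, the keyword stand-in classifier and all function names are hypothetical), the three metrics are precision = TP/(TP+FP), recall = TP/(TP+FN) and accuracy = (TP+TN)/N. The last helper only rearranges the abstract's own figures: if 176 days is 70% of the original duration D, then D = 176/0.70 ≈ 251 days, leaving roughly 75 days (both inputs are presumably rounded).

```python
def confusion_counts(y_true, y_pred):
    """Counts (tp, fp, tn, fn) for labels in {+1, -1}; +1 = interaction abstract."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), accuracy = (TP+TN)/N."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }


def k_fold_indices(n, k):
    """Yield (train, test) index lists; item i is held out in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def cross_validate(train_fn, predict_fn, docs, labels, k=5):
    """Pool held-out predictions over all k folds, then score them once."""
    y_true, y_pred = [], []
    for train_idx, test_idx in k_fold_indices(len(docs), k):
        model = train_fn([docs[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in test_idx:
            y_true.append(labels[i])
            y_pred.append(predict_fn(model, docs[i]))
    return metrics(y_true, y_pred)


def implied_original_duration(days_saved, fraction_saved):
    """If days_saved is fraction_saved of the original duration D, then D = days_saved / fraction_saved."""
    return days_saved / fraction_saved


# Hypothetical stand-in classifier: a keyword rule that ignores its training data.
def keyword_train(docs, labels):
    return None


def keyword_predict(model, doc):
    return 1 if "binds" in doc else -1


TOY_DOCS = ["A binds B", "C binds D", "E is an enzyme", "F binds G", "H is a kinase", "I regulates J"]
TOY_LABELS = [1, 1, -1, 1, -1, 1]

if __name__ == "__main__":
    print(cross_validate(keyword_train, keyword_predict, TOY_DOCS, TOY_LABELS, k=3))
    # {'precision': 1.0, 'recall': 0.75, 'accuracy': 0.8333333333333334}
    d = implied_original_duration(176, 0.70)
    print(round(d), round(d - 176))  # 251 75
```

Pooling the held-out predictions before scoring is one common convention; averaging per-fold metrics is another, and the two can differ slightly on small or unbalanced folds.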
Conclusions: Machine learning methods are useful as tools to direct the back-filling of interaction and pathway databases; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at http://bind.ca. Current capabilities allow searching for human, mouse and yeast protein-interaction information.
Similar articles
- Assisted curation: does text mining really help? Pac Symp Biocomput. 2008:556-67. PMID: 18229715.
- BioPPISVMExtractor: a protein-protein interaction extractor for biomedical literature using SVM and rich feature sets. J Biomed Inform. 2010 Feb;43(1):88-96. doi: 10.1016/j.jbi.2009.08.013. Epub 2009 Aug 23. PMID: 19706337.
- Overview of the protein-protein interaction annotation extraction task of BioCreative II. Genome Biol. 2008;9 Suppl 2(Suppl 2):S4. doi: 10.1186/gb-2008-9-s2-s4. Epub 2008 Sep 1. PMID: 18834495. Free PMC article.
- Biomolecular interaction network database. Brief Bioinform. 2005 Jun;6(2):194-8. doi: 10.1093/bib/6.2.194. PMID: 15975228. Review.
- What are decision trees? Nat Biotechnol. 2008 Sep;26(9):1011-3. doi: 10.1038/nbt0908-1011. PMID: 18779814. Free PMC article. Review.
Cited by
- Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J Biol. 2006;5(4):11. doi: 10.1186/jbiol36. Epub 2006 Jun 8. PMID: 16762047. Free PMC article.
- Dragon Plant Biology Explorer. A text-mining tool for integrating associations between genetic and biochemical entities with genome annotation and biochemical terms lists. Plant Physiol. 2005 Aug;138(4):1914-25. doi: 10.1104/pp.105.060863. PMID: 16172098. Free PMC article.
- Text-mining and information-retrieval services for molecular biology. Genome Biol. 2005;6(7):224. doi: 10.1186/gb-2005-6-7-224. Epub 2005 Jun 28. PMID: 15998455. Free PMC article.
- The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D418-24. doi: 10.1093/nar/gki051. PMID: 15608229. Free PMC article.
- Construction and analysis of protein-protein interaction networks. Autom Exp. 2010 Feb 15;2(1):2. doi: 10.1186/1759-4499-2-2. PMID: 20334628. Free PMC article.