DISEASES: text mining and data integration of disease-gene associations
- PMID: 25484339
- DOI: 10.1016/j.ymeth.2014.11.020
Abstract
Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download.
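The combination of dictionary-based tagging with co-occurrence scoring described in the abstract can be illustrated with a minimal Python sketch. This is not the published DISEASES tagger or its scoring scheme: the toy dictionaries, the weights `W_SENTENCE` and `W_ABSTRACT`, and the function names are all hypothetical, chosen only to show how a gene-disease pair mentioned in the same sentence might count for more than a pair that merely shares an abstract.

```python
import re
from collections import defaultdict
from itertools import product

# Hypothetical toy dictionaries mapping lowercase names to canonical labels.
GENES = {"brca1": "BRCA1", "tp53": "TP53"}
DISEASES = {
    "breast cancer": "breast cancer",
    "li-fraumeni syndrome": "Li-Fraumeni syndrome",
}

# Assumed weights (not from the paper): a pair found within one sentence
# scores higher than a pair found only across sentences of an abstract.
W_SENTENCE = 1.0
W_ABSTRACT = 0.25


def tag(text, dictionary):
    """Return canonical labels of dictionary names that occur as whole words."""
    lowered = text.lower()
    found = set()
    for name, canonical in dictionary.items():
        if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", lowered):
            found.add(canonical)
    return found


def split_sentences(abstract):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]


def score_abstracts(abstracts):
    """Sum weighted co-occurrence counts for each (gene, disease) pair."""
    scores = defaultdict(float)
    for abstract in abstracts:
        # Pairs that co-occur inside at least one sentence.
        sentence_pairs = set()
        for sentence in split_sentences(abstract):
            sentence_pairs.update(
                product(tag(sentence, GENES), tag(sentence, DISEASES))
            )
        # All pairs that co-occur anywhere in the abstract.
        abstract_pairs = set(
            product(tag(abstract, GENES), tag(abstract, DISEASES))
        )
        for pair in abstract_pairs:
            scores[pair] += W_SENTENCE if pair in sentence_pairs else W_ABSTRACT
    return dict(scores)


if __name__ == "__main__":
    example = [
        "BRCA1 mutations raise the risk of breast cancer. TP53 was also sequenced.",
        "TP53 is mutated in Li-Fraumeni syndrome.",
    ]
    for (gene, disease), score in sorted(score_abstracts(example).items()):
        print(gene, disease, score)
```

In this toy input, BRCA1 and breast cancer share a sentence and receive the full weight, while TP53 and breast cancer only share an abstract and receive the smaller one. A real system would need far larger dictionaries, handling of ambiguous names, and a calibrated scoring function.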
Keywords: Data integration; Information extraction; Named entity recognition; Text mining; Web resource.
Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.