medaCy

🏥 Medical Natural Language Processing with spaCy 🏥

MedaCy is a text processing and learning framework built on spaCy to support the lightning-fast prototyping, building, and application of highly predictive named entity recognition and relationship extraction systems in the medical domain.

Features

Highly predictive out-of-the-box trained models for clinical named entity recognition and relationship extraction.
Customizable feature extraction pipelines for custom model building.
Integrated converters for common text annotation formats (Prodigy, BRAT, etc.).
Pre-compiled medical terminology and abbreviation lexicons.

User Guide

Using medaCy is simple: all one needs is to select a pipeline and provide it with training data to learn from.

Training a Named Entity Recognition model for Clinical Text using medaCy:

from medacy.pipelines import ClinicalPipeline
from medacy.tools import DataLoader
from medacy.pipeline_components import MetaMap
import joblib

from medacy.learn import Learner

#Some more powerful pipelines require an outside knowledge source such as MetaMap.
metamap = MetaMap(metamap_path="/home/share/programs/metamap/2016/public_mm/bin/metamap")

#Automatically organizes your training files.
train_loader = DataLoader("/directory/containing/your/training/data/")

#Pre-metamap our training data to speed up building models.
train_loader.metamap(metamap)

#Create pipeline and specify entities to learn.
pipeline = ClinicalPipeline(metamap, entities=['Strength'])

#create a Learner using our pipeline and data
learner = Learner(pipeline, train_loader)

#Build a model (defaults to Conditional Random Field)
model = learner.train()
joblib.dump(model,'/location/to/save/model')

Prediction utilizing medaCy:

from medacy.pipelines import ClinicalPipeline
from medacy.tools import DataLoader
from medacy.pipeline_components import MetaMap
import joblib

from medacy.predict import Predictor

model = joblib.load('/location/containing/saved/model')

#Some more powerful pipelines require an outside knowledge source such as MetaMap.
metamap = MetaMap(metamap_path="/home/share/programs/metamap/2016/public_mm/bin/metamap")

data_loader = DataLoader("/directory/containing/your/text/to/label")

#Optionally pre-metamap the data we wish to label to speed up prediction.
data_loader.metamap(metamap)

pipeline = ClinicalPipeline(metamap, entities=['Strength'])

#Create a Predictor using our pipeline, data, and trained model
predictor = Predictor(pipeline, data_loader, model=model)

predictor.predict()

#Predictions appear in a /predictions sub-directory of your data directory.
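
Once prediction has finished, the output files can be collected with the standard library. This is a minimal sketch that assumes only what the comment above states: predictions are written to a predictions sub-directory of the data directory. The helper name list_prediction_files is hypothetical and not part of medaCy, and no particular file extension is assumed.

```python
from pathlib import Path


def list_prediction_files(data_directory):
    """Return the files inside <data_directory>/predictions, sorted by name.

    Hypothetical helper, not part of medaCy. Returns an empty list when the
    sub-directory does not exist (for example, before prediction has run).
    """
    predictions_dir = Path(data_directory) / "predictions"
    if not predictions_dir.is_dir():
        return []
    return sorted(p for p in predictions_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    # Same placeholder path as the DataLoader in the example above.
    for path in list_prediction_files("/directory/containing/your/text/to/label"):
        print(path.name)
```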

An example combined pipeline script:

from medacy.learn import Learner
from medacy.predict import Predictor
from medacy.pipelines import ClinicalPipeline
from medacy.tools import DataLoader
from medacy.pipeline_components import MetaMap
import logging, sys, joblib

#See what medaCy is doing at any part of the learning or prediction process
logging.basicConfig(stream=sys.stdout,level=logging.INFO) #set level=logging.DEBUG for more information

train_loader = DataLoader("/training/directory")
test_loader = DataLoader("/evaluation/directory")
metamap = MetaMap(metamap_path="/home/share/programs/metamap/2016/public_mm/bin/metamap")

train_loader.metamap(metamap)
test_loader.metamap(metamap)

pipeline = ClinicalPipeline(metamap, entities=['Drug', 'Form', 'Route', 'ADE', 'Reason', 'Frequency', 'Duration', 'Dosage', 'Strength'])

learner = Learner(pipeline, train_loader)

model = learner.train()
joblib.dump(model,'medacy_model')

learner.cross_validate() #Perform 10-fold cross-validation (this takes time).

predictor = Predictor(pipeline, test_loader, model=model)

predictor.predict()

#Predictions appear in a /predictions sub-directory of your data directory.
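
The cross_validate() call in the script above is described as 10-fold cross-validation. As a rough illustration of what that means, and not of how medaCy implements it, the sketch below deals a list of document names into k folds and yields each training/held-out partition. The function name k_fold_splits and the file names are hypothetical.

```python
def k_fold_splits(items, k=10):
    """Yield (train, held_out) pairs, one per fold.

    Illustrative only; this is not medaCy's implementation. Items are dealt
    round-robin into k folds, so fold sizes differ by at most one.
    """
    items = list(items)
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    folds = [items[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        # Everything outside the current fold is used for training.
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        yield train, held_out


if __name__ == "__main__":
    documents = [f"note_{n}.txt" for n in range(25)]  # hypothetical file names
    for fold_number, (train, held_out) in enumerate(k_fold_splits(documents), start=1):
        print(fold_number, len(train), len(held_out))
```

Each document is held out exactly once, so every score is computed on text the model did not see while training for that fold.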

Note: the ClinicalPipeline requires spaCy's small English model. Install it with pip:

pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz

Set-up

To install this repository from source, do the following:

Enter a python3 virtual environment. Once inside, make sure to upgrade pip to the latest version.
Run the following command. It may take a while and may emit some non-fatal warnings.

pip install git+https://github.com/NanoNLP/medaCy.git

Install spaCy's small model.

pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz
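
To confirm that the model is visible to the active environment before running a pipeline, a small standard-library check can help. This is a sketch that relies only on the fact that spaCy models install as importable Python packages (here en_core_web_sm); the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    for name in ("spacy", "en_core_web_sm"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```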

How medaCy works

MedaCy combines the text-processing power of spaCy with state-of-the-art research tools and techniques in medical named entity recognition. It consists of a set of lightning-fast pipelines, each specialized for learning specific types of medical entities. A pipeline is a stackable, interchangeable set of PipelineComponents: bite-sized code blocks that each overlay a feature onto the text being processed.
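
The idea of stacking components that each overlay a feature can be sketched in plain Python. Everything below is hypothetical and illustrative: run_pipeline, add_lowercase, add_is_numeric, and the dictionary-per-token representation are not medaCy's or spaCy's API, only a toy model of the pattern.

```python
def add_lowercase(tokens):
    """Overlay a 'lower' feature onto every token."""
    for token in tokens:
        token["lower"] = token["text"].lower()
    return tokens


def add_is_numeric(tokens):
    """Overlay an 'is_numeric' feature onto every token."""
    for token in tokens:
        # Allow a single decimal point, e.g. "0.5".
        token["is_numeric"] = token["text"].replace(".", "", 1).isdigit()
    return tokens


def run_pipeline(text, components):
    """Split text on whitespace, then apply each component in order."""
    tokens = [{"text": word} for word in text.split()]
    for component in components:
        tokens = component(tokens)
    return tokens


if __name__ == "__main__":
    for token in run_pipeline("Aspirin 81 mg daily", [add_lowercase, add_is_numeric]):
        print(token)
```

Because every component takes and returns the same token list, components can be reordered or swapped without changing the code that runs them.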

Components

You can write your own PipelineComponents for use in custom pipelines by building on the BasePipeline and BaseComponent classes. Alternatively, use the components already included with medaCy. Some more powerful components require external software; one example is the MetaMapComponent, which interfaces with MetaMap to overlay rich medical concept information onto text. Components are chained, or stacked, in pipelines, and a component can depend on the outputs of earlier components.
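
Because a component may depend on the output of an earlier one, ordering matters. The sketch below shows one way such a dependency could be declared and checked. The Component class, its name and dependencies attributes, and the run function are assumptions made for illustration; they are not the real interface of BaseComponent or BasePipeline.

```python
class Component:
    """Toy stand-in for a pipeline component (not medaCy's BaseComponent)."""

    name = "component"
    dependencies = ()  # names of components that must run earlier

    def __call__(self, doc):
        raise NotImplementedError


class Tokenizer(Component):
    name = "tokenizer"

    def __call__(self, doc):
        doc["tokens"] = doc["text"].split()
        return doc


class UnitTagger(Component):
    name = "unit_tagger"
    dependencies = ("tokenizer",)  # reads doc["tokens"]
    units = {"mg", "ml", "mcg"}

    def __call__(self, doc):
        doc["units"] = [t for t in doc["tokens"] if t.lower() in self.units]
        return doc


def run(text, components):
    """Run components in order, refusing any whose dependencies have not run."""
    doc, seen = {"text": text}, set()
    for component in components:
        missing = [d for d in component.dependencies if d not in seen]
        if missing:
            raise ValueError(f"{component.name} requires {missing} to run first")
        doc = component(doc)
        seen.add(component.name)
    return doc


if __name__ == "__main__":
    print(run("Take 20 mg twice daily", [Tokenizer(), UnitTagger()])["units"])
```

Checking dependencies before each component runs turns a mis-ordered pipeline into an immediate, descriptive error rather than a missing-key failure deep inside a component.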

Contribution

To contribute, do the following:

Enter a python3 virtual environment. Once inside, make sure to upgrade pip to the latest version.
Fork and clone this repository, enter the cloned repo, and run:

pip install -e .

This will install medaCy in editable mode. Any changes you make to medaCy's source code will take effect immediately when it is used.

Ensure you are developing in the development branch or in your own branch created from it.

License

This package is licensed under the GNU General Public License.

Authors

Andriy Mulyar, Bobby Best, Steele Farnsworth, Yadunandan Pillai, Corey Sutphin, Bridget McInnes
