a toolkit for ontologies. Since we all know ontologies are "nonsense".
note: see setup.py
in your local copy for version number | or if used the Releases
then v1.0.0.0 [29/06/2020]
don't hesitate to create an issue
or pull request
(see guidelines first).
command | description |
---|---|
catch |
to extract elements of text using key words |
bite |
to look at important words from text |
arise |
adding new synonyms to an ontology |
$ pip3 install click BeautifulSoup4 scikit-learn pandas lxml
or after installing, use the requirements.txt
file:
$ pip3 install -r requirements.txt
$ git clone https://github.com/sap218/jabberwocky
$ cd jabberwocky
$ python3 setup.py install --user
note: if you are using a virtual environment you can avoid --user
jabberwocky
works with OWL
ontology formats such as OWL/XML
and also RDF/XML
. for example biomedical ontologies such as doid.owl
, hpo.owl
, and uberon.owl
will all work, plus your own created.
note: make sure annotations are defined with the oboInOWL:
schema, e.g. hasExactSynonym
should have the IRI http://www.geneontology.org/formats/oboInOWL#hasExactSynonym
. but ensure you fix the prefix to <Prefix name="oboInOWL" IRI="http://www.geneontology.org/formats/oboInOWL#"/>
.
for examples of Jabberwocky's commands in use, please see the jabberwocky-tests
repository.
OR see SCENARIO.md for further explanation.
OR to run the automated tests (in the cloned directory):
$ git submodule init
$ git submodule update
$ tox
catch
essentially "catches" key elements from textual data using an ontology's classes & synonyms, with a set of keywords one can limit their search. note: it is recommended your list of keywords are exactly the classes from your chosen ontology (all in lowercase). note: if a .json
is provided, you need specify the field inside the JSON that contains the textual data to process.
$ catch --help
Usage: catch [OPTIONS]
Options:
-o, --ontology TEXT file of ontology. [required]
-k, --keywords TEXT list of classes/terms you want to use to search.
-t, --textfile TEXT JSON or TXT file of text you want annotate. [required]
-p, --parameter TEXT parameter/field for the JSON text data.
--help Show this message and exit.
$ catch -o ../ontology/pocketmonsters.owl -k listofwords.txt -t public_forum.json -p post
- a
.json
file of the classes and synonyms for your reference catch
prints out the key texts which included these classes/synonyms- you can use
>
to put into a separate file - see
jabberwocky-tests
for an example ofcatch
with an example output file
bite
runs a tf-idf statistical analysis: searching for important terms in a text corpus. a user can use an ontology to avoid key terms being in the statistical model. note: with the .json
input you need specify the field inside the JSON that contains the textual data to process.
$ bite --help
Usage: bite [OPTIONS]
Options:
-o, --ontology TEXT file of ontology.
-t, --textfile TEXT JSON file of text you want to observe. [required]
-p, --parameter TEXT parameter for the JSON file text. [required]
--help Show this message and exit.
$ bite -t public_forum.json -p post
- a
.txt
file of all classes and synonyms which were in the ontology - for your reference bite
prints out the important terms from the textual data: sorted by value - which also makes a.csv
- see
jabberwocky-tests
for an example ofbite
arise
inserts synonyms in an ontology based on your chosing: you define if these synonyms are "exact", "broad", "related", or "narrow" - these new synonyms may be based on the tf-idf statistical analysis from bite
.
$ arise --help
Usage: arise [OPTIONS]
Options:
-o, --ontology TEXT file of ontology. [required]
-f, --tfidf TEXT tf-idf CSV file of the synonyms you want to add. [required]
--help Show this message and exit.
$ arise -o ../ontology/pocketmonsters.owl -f new_synonyms_tfidf.csv
- file titled,
updated_ontology.owl
in the directory you run - see
jabberwocky-tests
for an example ofarise
the poem "Jabberwocky" written by Lewis Carrol is described as a "nonsense" poem.
Contributors - thank you!
- @majensen for setting up automated testing w/
pytest
- see pull request #13 for more details
Citing
@article{Pendleton2020,
doi = {10.21105/joss.02168},
url = {https://doi.org/10.21105/joss.02168},
year = {2020},
publisher = {The Open Journal},
volume = {5},
number = {51},
pages = {2168},
author = {Samantha C. Pendleton and Georgios V. Gkoutos},
title = {Jabberwocky: an ontology-aware toolkit for manipulating text},
journal = {Journal of Open Source Software}
}
You can combine these commands together to form a process of steps of ontology synonym development and text analysis. See jabberwocky-tests
repo for the jabberwocky-tests/process
directory for a chain of commands (as described in the image below).