jabberwocky

a toolkit for ontologies. Since we all know ontologies are "nonsense".

note: see setup.py in your local copy for the version number, or see Releases:

  • v1.0.0.0 [29/06/2020]
  • v2.0.0.0 [10/05/2021]

don't hesitate to create an issue or pull request (see guidelines first).

Elements

command        description
bandersnatch   to extract synonyms from an RDF/XML syntax OWL ontology
catch          to extract elements of text using key words
bite           to look at important words from text
arise          adding new synonyms to an ontology
Prerequisites
$ pip3 install click BeautifulSoup4 scikit-learn pandas lxml pytest spacy matplotlib

or, from inside the cloned repository, use the requirements.txt file:

$ pip3 install -r requirements.txt
Install
$ git clone https://github.com/sap218/jabberwocky
$ cd jabberwocky
$ python3 setup.py install --user

note: if you are using a virtual environment you can omit --user

Ontology formats

jabberwocky works with OWL ontologies serialised as OWL/XML or RDF/XML. for example, biomedical ontologies such as doid.owl, hpo.owl, and uberon.owl will all work, as will ontologies you have created yourself.

note: make sure annotations are defined with the oboInOWL: schema, e.g. hasExactSynonym should have the IRI http://www.geneontology.org/formats/oboInOWL#hasExactSynonym. also make sure the prefix is declared as <Prefix name="oboInOWL" IRI="http://www.geneontology.org/formats/oboInOWL#"/>.
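
to check the prefix declaration before running a command, something like the following may help. it is a minimal sketch using only the Python standard library, not part of jabberwocky: the function name declared_prefixes and the tiny example document are assumptions made for illustration, and it only covers OWL/XML files.

```python
import xml.etree.ElementTree as ET

OWL_NS = "http://www.w3.org/2002/07/owl#"
OBO_IRI = "http://www.geneontology.org/formats/oboInOWL#"


def declared_prefixes(owl_xml: str) -> dict:
    """Map each <Prefix name=...> in an OWL/XML document to its IRI."""
    root = ET.fromstring(owl_xml)
    return {
        el.get("name"): el.get("IRI")
        for el in root.iter(f"{{{OWL_NS}}}Prefix")
    }


# hypothetical, minimal OWL/XML document used only for illustration
example = """<?xml version="1.0"?>
<Ontology xmlns="http://www.w3.org/2002/07/owl#">
  <Prefix name="oboInOWL" IRI="http://www.geneontology.org/formats/oboInOWL#"/>
</Ontology>"""

prefixes = declared_prefixes(example)
has_obo_prefix = prefixes.get("oboInOWL") == OBO_IRI
print(has_obo_prefix)
```

for a real ontology, read the file contents into a string first and pass that to declared_prefixes.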

Examples

for examples of Jabberwocky's commands in use, please see the jabberwocky-tests repository.

OR see SCENARIO.md for further explanation.

OR to run the automated tests (in the cloned directory):

$ git submodule init
$ git submodule update
$ tox


catch

catch essentially "catches" key elements from textual data using an ontology's classes & synonyms. with a set of keywords, one can limit the search. note: it is recommended that your list of keywords is exactly the classes from your chosen ontology (all in lowercase). note: if a .json is provided, you need to specify the field inside the JSON that contains the textual data to process.
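
the idea described above can be sketched roughly as follows. this is an illustration only, not catch's actual implementation: the function catch_posts, the synonym dictionary, and the example posts are all hypothetical, and the "post" field simply mirrors the -p option.

```python
import json
import re


def catch_posts(posts, terms, field="post"):
    """Return the text of every post that mentions at least one term."""
    # one case-insensitive, whole-word pattern covering all classes/synonyms
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [p[field] for p in posts if pattern.search(p[field])]


# hypothetical data: ontology classes mapped to their synonyms
synonyms = {"pikachu": ["electric mouse"], "charizard": ["fire dragon"]}
keywords = ["pikachu"]  # limits the search to these classes only
terms = list(keywords) + [s for k in keywords for s in synonyms[k]]

# hypothetical JSON corpus; the text lives in the "post" field
posts = json.loads("""[
  {"post": "I saw an Electric Mouse today"},
  {"post": "my favourite is the fire dragon"},
  {"post": "nothing to report"}
]""")

caught = catch_posts(posts, terms)
print(caught)
```

because the keyword list contains only pikachu, the post about the fire dragon is not caught even though charizard is in the synonym dictionary.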

Usage

$ catch --help
Usage: catch [OPTIONS]

Options:
  -o, --ontology TEXT   file of ontology. [required]
  -k, --keywords TEXT   list of classes/terms you want to use to search.
  -t, --textfile TEXT   JSON or TXT file of text you want annotate. [required]
  -p, --parameter TEXT  parameter/field for the JSON text data.
  --help                Show this message and exit.

Running

$ catch -o ../ontology/pocketmonsters.owl -k listofwords.txt -t public_forum.json -p post
Output
  • a .json file of the classes and synonyms for your reference
  • catch prints out the key texts which included these classes/synonyms
  • you can use > to put into a separate file
  • see jabberwocky-tests for an example of catch with an example output file

bite

bite runs a tf-idf statistical analysis to find important terms in a text corpus. a user can supply an ontology so that terms already in it (classes and synonyms) are kept out of the statistical model. note: with the .json input you need to specify the field inside the JSON that contains the textual data to process.
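
as a rough illustration of the technique, here is a minimal tf-idf ranking in plain Python. it is a sketch under stated assumptions rather than bite's own code: it uses the textbook form tf * log(N / df) with whitespace tokenisation, whereas libraries such as scikit-learn use a smoothed variant, and the function name tfidf_ranking and the example documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_ranking(docs, exclude=()):
    """Rank terms by summed tf-idf across docs, skipping excluded terms."""
    excluded = {t.lower() for t in exclude}
    tokenised = [
        [w for w in doc.lower().split() if w not in excluded]
        for doc in docs
    ]
    n_docs = len(tokenised)
    # document frequency: number of documents containing each term
    doc_freq = Counter(term for doc in tokenised for term in set(doc))
    scores = Counter()
    for doc in tokenised:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    # highest score first; ties broken alphabetically for stable output
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# hypothetical corpus and ontology terms
docs = ["pikachu is fast", "pikachu is yellow", "charmander is orange"]
ontology_terms = ["pikachu", "charmander"]

ranking = tfidf_ranking(docs, exclude=ontology_terms)
ranked_terms = [term for term, _ in ranking]
print(ranked_terms)
```

terms that appear in every document (here "is") score zero under this form, and the ontology terms never enter the ranking at all.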

Usage

$ bite --help
Usage: bite [OPTIONS]

Options:
  -o, --ontology TEXT   file of ontology.
  -t, --textfile TEXT   JSON file of text you want to observe.  [required]
  -p, --parameter TEXT  parameter for the JSON file text.  [required]
  --help                Show this message and exit.

Running

$ bite -t public_forum.json -p post
Output
  • a .txt file of all classes and synonyms which were in the ontology - for your reference
  • bite prints out the important terms from the textual data, sorted by value, and also writes them to a .csv
  • see jabberwocky-tests for an example of bite

arise

arise inserts synonyms into an ontology based on your choosing: you define whether these synonyms are "exact", "broad", "related", or "narrow". these new synonyms may be based on the tf-idf statistical analysis from bite.
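
to show what adding a synonym annotation to an RDF/XML ontology can look like, here is a minimal sketch with the Python standard library. it is not arise's implementation, and it makes no claim about the layout of the tf-idf CSV: the function add_synonym, the example class, and the example.org IRI are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "oboInOWL": "http://www.geneontology.org/formats/oboInOWL#",
}
for prefix, iri in NS.items():
    ET.register_namespace(prefix, iri)  # keep readable prefixes on output

SYNONYM_TYPES = {
    "exact": "hasExactSynonym",
    "broad": "hasBroadSynonym",
    "related": "hasRelatedSynonym",
    "narrow": "hasNarrowSynonym",
}


def add_synonym(root, label, synonym, kind="exact"):
    """Attach a synonym annotation to the class whose rdfs:label matches."""
    tag = f"{{{NS['oboInOWL']}}}{SYNONYM_TYPES[kind]}"
    for cls in root.findall("owl:Class", NS):
        if cls.findtext("rdfs:label", namespaces=NS) == label:
            ET.SubElement(cls, tag).text = synonym
            return True
    return False  # no class with that label was found


# hypothetical, minimal RDF/XML ontology used only for illustration
example = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/pocketmonsters#pikachu">
    <rdfs:label>pikachu</rdfs:label>
  </owl:Class>
</rdf:RDF>"""

root = ET.fromstring(example)
added = add_synonym(root, "pikachu", "electric mouse", kind="exact")
updated = ET.tostring(root, encoding="unicode")
print(updated)
```

the kind argument mirrors the four synonym types named above; an unknown label leaves the ontology untouched and returns False.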

Usage

$ arise --help
Usage: arise [OPTIONS]

Options:
  -o, --ontology TEXT  file of ontology.  [required]
  -f, --tfidf TEXT     tf-idf CSV file of the synonyms you want to add.  [required]
  --help               Show this message and exit.

Running

$ arise -o ../ontology/pocketmonsters.owl -f new_synonyms_tfidf.csv
Output
  • a file titled updated_ontology.owl, written to the directory you run arise from
  • see jabberwocky-tests for an example of arise

Thanks! 🐉

the poem "Jabberwocky" written by Lewis Carroll is described as a "nonsense" poem.

Contributors - thank you!

Citing

@article{Pendleton2020,
  doi = {10.21105/joss.02168},
  url = {https://doi.org/10.21105/joss.02168},
  year = {2020},
  publisher = {The Open Journal},
  volume = {5},
  number = {51},
  pages = {2168},
  author = {Samantha C. Pendleton and Georgios V. Gkoutos},
  title = {Jabberwocky: an ontology-aware toolkit for manipulating text},
  journal = {Journal of Open Source Software}
}

ONE LAST THING...

You can combine these commands into a pipeline for ontology synonym development and text analysis. See the process directory in the jabberwocky-tests repository for a chain of commands (as described in the image below).

[image: jabberwocky cycle]