Commit
Samantha committed May 10, 2021 (1 parent a4d08e2, commit e31d5c3)
Showing 2 changed files with 26 additions and 80 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,83 +1,8 @@ | ||
# jabberwocky-tests | ||
Tests for the jabberwocky toolkit. Before running them, install the toolkit (see the [jabberwocky](https://github.com/sap218/jabberwocky) repo for installation instructions). | ||
# Jabberwocky | ||
Full-depth explanation, informative scenarios, and working examples for the Jabberwocky toolkit - for installation instructions, see the [Jabberwocky](https://github.com/sap218/jabberwocky) repository. Here | ||
|
||
**Note**: this repository only holds examples for the jabberwocky toolkit. It contains no official `.py` scripts (see the [jabberwocky](https://github.com/sap218/jabberwocky) repo), only the examples in the `README`. | ||
see [SCENARIO](https://github.com/sap218/jabberwocky/docs/SCENARIO.md) | ||
|
||
#### see [**SCENARIO.md**](https://github.com/sap218/jabberwocky-tests/blob/master/SCENARIO.md) for a full-depth explanation of each command, the inputs, and outputs. | ||
### About the Commands | ||
|
||
--- | ||
|
||
### `catch` | ||
See the `catch` directory for the example test - **note**: the commands below were run from inside that directory | ||
* `ontology/pocketmonsters.owl` is a very brief ontology with classes, including exact and related synonyms | ||
* `listofwords.txt` is a list of terms a user wants to search with; these exactly match the ontology class labels | ||
* `blogs.json` is a file of user blogs - **note**: this was completely fabricated for the example | ||
* `blog_post` is the key for the text, using this will collate the users' blog posts and ignore their names | ||
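
As a rough illustration of what selecting a key such as `blog_post` amounts to, the sketch below pulls the text under one key out of a list of JSON records and leaves the other fields (such as names) behind. The record layout and the helper name `collate_posts` are assumptions made for this example; the real `blogs.json` layout and the toolkit's own code are not shown in this README.

```python
import json

# Hypothetical stand-in for blogs.json: a list of objects, each with a
# user name and a "blog_post" field. The real file layout may differ.
raw = """
[
  {"name": "user_a", "blog_post": "first post text"},
  {"name": "user_b", "blog_post": "second post text"}
]
"""


def collate_posts(records, key):
    """Return the text stored under `key` for each record, skipping records without it."""
    return [record[key] for record in records if key in record]


posts = collate_posts(json.loads(raw), "blog_post")
print(posts)
```

Only the values under the chosen key survive, which is why the users' names do not appear in the collated text.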
|
||
**ontology, keywords, json file w/ parameter** - *this is the current `ontology_dict_class_synonyms.json` output* | ||
|
||
`$ catch --ontology ../ontology/pocketmonsters.owl --keywords listofwords.txt --textfile blogs_formatted.json --parameter blog_post > catch_output.txt` | ||
|
||
**ontology, keywords, txt file, saves to file** | ||
|
||
`$ catch -o ../ontology/pocketmonsters.owl -k listofwords.txt -t blogs_unformatted.txt` | ||
|
||
**ontology, txt file** | ||
|
||
`$ catch -o ../ontology/pocketmonsters.owl --textfile blogs_unformatted.txt` | ||
|
||
|
||
--- | ||
|
||
|
||
### `bite` | ||
See the `bite` directory for the example test - **note**: the commands below were run from inside that directory | ||
* `ontology/pocketmonsters.owl` is a very brief ontology with classes, including exact and related synonyms | ||
* `public_forum` is the public forum example which will be used | ||
|
||
**ontology, json file w/ parameter** - *this is the current `ontology_all_terms.txt` & `tfidf_results.csv` output* | ||
|
||
`$ bite -o ../ontology/pocketmonsters.owl -t public_forum.json -p post` | ||
|
||
**json file w/ parameter** - *this will not remove any ontology terms and so output will be larger* | ||
|
||
`$ bite -t public_forum.json -p post` | ||
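
To make the difference between the two `bite` invocations concrete, here is a minimal TF-IDF sketch with an optional list of terms to exclude. It is a textbook `tf * log(N / df)` computation under stated assumptions, not the toolkit's implementation; the function name `tfidf`, the `exclude` parameter, and the two sample posts are invented for the example.

```python
import math
from collections import Counter


def tfidf(documents, exclude=()):
    """Score each term per document with tf * log(N / df), skipping excluded terms."""
    excluded = {term.lower() for term in exclude}
    # Naive whitespace tokenisation; excluded (e.g. ontology) terms are dropped first.
    tokenised = [
        [word for word in doc.lower().split() if word not in excluded]
        for doc in documents
    ]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(word for words in tokenised for word in set(words))
    scores = []
    for words in tokenised:
        counts = Counter(words)
        total = len(words)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical posts and ontology terms, for illustration only.
sample_posts = ["dragon types are rare", "flying types are common"]
with_ontology = tfidf(sample_posts, exclude=["dragon", "flying"])
without_ontology = tfidf(sample_posts)
```

With the exclusion list, terms already in the ontology never reach the results, so the scored vocabulary is smaller; without it every term is scored, which matches the note above that the output will be larger.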
|
||
|
||
--- | ||
|
||
|
||
### `arise` | ||
See the `arise` directory for the example test - **note**: the commands below were run from inside that directory | ||
* `ontology/pocketmonsters.owl` is a very brief ontology with classes, including exact and related synonyms | ||
* `new_synonyms_tfidf.csv` contains the new synonyms you want to add - based on the `bite` output | ||
* `updated-ontology.owl` is the output | ||
|
||
`$ arise --ontology ../ontology/pocketmonsters.owl --tfidf new_synonyms_tfidf.csv` | ||
|
||
The same command using the short flags: | ||
|
||
`$ arise -o ../ontology/pocketmonsters.owl -f new_synonyms_tfidf.csv` | ||
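
Conceptually, `arise` takes a file of new synonyms and attaches them to classes in the ontology. The sketch below mimics that step on a plain dictionary rather than an `.owl` file. The CSV columns (`class`, `synonym`), the sample terms, and the helper `add_synonyms` are all assumptions made for illustration; the actual layout of `new_synonyms_tfidf.csv` is not described in this README.

```python
import csv
import io

# Hypothetical CSV layout: one row per (class label, new synonym) pair.
csv_text = "class,synonym\ndragon,drake\nflying,bird\n"

# Hypothetical class -> synonyms mapping standing in for the ontology.
ontology = {"dragon": ["wyrm"], "flying": []}


def add_synonyms(ontology, rows):
    """Return a copy of `ontology` with each row's synonym added to its class."""
    updated = {label: list(synonyms) for label, synonyms in ontology.items()}
    for row in rows:
        label, synonym = row["class"], row["synonym"]
        # Ignore unknown classes and synonyms that are already present.
        if label in updated and synonym not in updated[label]:
            updated[label].append(synonym)
    return updated


updated = add_synonyms(ontology, csv.DictReader(io.StringIO(csv_text)))
print(updated)
```

Returning a copy keeps the original mapping untouched, in the same spirit as `arise` writing a separate `updated-ontology.owl` output.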
|
||
|
||
--- | ||
--- | ||
|
||
|
||
### `process/` | ||
See [`jabberwocky-tests/process/`](https://github.com/sap218/jabberwocky-tests/tree/master/process) for the directory which uses all commands together to form an ontology development / text analysis process - see [jabberwocky](https://github.com/sap218/jabberwocky) repo for the image of the workflow. | ||
|
||
**Note**: during these steps I renamed the output files so that the differences between runs are visible | ||
* `catch` uses the ontology [pocketmonsters.owl], keywords [`listofwords.txt`], and text data [public_forum.json] (& parameter), writing to `catch_01_output.txt` - the current classes and synonyms are in `catch_01_ontology_dict_class_synonyms.json` | ||
* `bite` uses the text data (& parameter) - the full results are in `bite_01_tfidf_results`, and `new_synonyms_tfidf.csv` was then made based on them | ||
* `arise` uses the ontology and the `new_synonyms_tfidf.csv` file, producing the `updated-ontology.owl` output | ||
* `bite` is run a second time to observe the reweighting, given `updated-ontology.owl` and the text data (& parameter) - the full results are in `bite_02_tfidf_results`, plus a list of all ontology class terms & synonyms in `bite_02_ontology_all_terms.txt` | ||
* `catch` is the final step, using `updated-ontology.owl`, the keywords, and the text data (& parameter), writing to `catch_02_output.txt` - the newly updated classes and synonyms are in `catch_02_ontology_dict_class_synonyms.json` | ||
|
||
``` | ||
$ catch -o ../ontology/pocketmonsters.owl -k listofwords.txt -t public_forum.json -p post > catch_01_output.txt | ||
$ bite -t public_forum.json -p post | ||
$ arise -o ../ontology/pocketmonsters.owl -f new_synonyms_tfidf.csv | ||
$ bite -o updated-ontology.owl -t public_forum.json -p post | ||
$ catch -o updated-ontology.owl -k listofwords.txt -t public_forum.json -p post > catch_02_output.txt | ||
``` | ||
.. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Jabberwocky Scenario | ||
|
||
Go back to the [main page](https://github.com/sap218/jabberwocky/docs/README.md) | ||
|
||
You have extracted textual data: blog posts from a social media platform. In these posts, various individuals discuss a topic that you are researching. In this scenario the users are talking about [*pokemon*](https://simple.wikipedia.org/wiki/Pok%C3%A9mon). | ||
|
||
Some example posts from the text data: | ||
|
||
> I think only gen 6 pokemon are on this path, try route 2 - wanderer wendy | ||
> No thanks, I'm, trying to catch a flying type in the mountains with the clear air - trainer penelope | ||
Your aim is to extract the particular posts in which individuals use specific terms, e.g. "gen" or "flying". | ||
|
||
This is where ontologies are **useful**. An ontology is a controlled vocabulary whose terms are logically related to one another, e.g. in anatomy the hand is a part of the arm. Jabberwocky works with these terms and their synonyms: in the anatomy example, "arm" has the synonym "upper limb". | ||
|
||
You have access to `pocketmonsters.owl` - an ontology with some concepts of pokemon, e.g. pokemon types. You have some terms of interest, e.g. *** *** | ||
|
||
Using the `catch` command, you are trying to extract the posts which include key terms, e.g. "generation one", "dragon", and more. In [jabberwocky-tests/process/](https://github.com/sap218/jabberwocky-tests/tree/master/process) you will see `listofwords.txt`, a simple text file of the terms you are looking for; they exactly match terms from the ontology. There is also the textual data file, `public_forum`, with a **total of 24 posts** split across 5 threads.
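
The idea behind this step can be sketched in a few lines: each ontology class is searched for by its label and by its synonyms, and any post containing one of them is kept. This is only a sketch under stated assumptions, not how `catch` is implemented. The `terms` mapping (with "gen" as a synonym of "generation") and the helper `find_matches` are hypothetical; the two posts are the example posts quoted earlier in this scenario.

```python
import re

# The two example posts from this scenario (signatures removed).
posts = [
    "I think only gen 6 pokemon are on this path, try route 2",
    "No thanks, I'm trying to catch a flying type in the mountains with the clear air",
]

# Hypothetical class -> synonyms mapping; pocketmonsters.owl may differ.
terms = {"generation": ["gen"], "flying": []}


def find_matches(posts, terms):
    """Return (post, matched class labels) for posts mentioning a label or synonym."""
    patterns = {}
    for label, synonyms in terms.items():
        # Whole-word, case-insensitive match on the label or any of its synonyms.
        alternatives = "|".join(re.escape(term) for term in [label, *synonyms])
        patterns[label] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    matches = []
    for post in posts:
        hits = [label for label, pattern in patterns.items() if pattern.search(post)]
        if hits:
            matches.append((post, hits))
    return matches


matches = find_matches(posts, terms)
for post, hits in matches:
    print(hits, "->", post)
```

The first post never contains the word "generation", yet it is still found through the synonym "gen", which is the benefit of searching with an ontology rather than a flat keyword list.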