docs: Add text_tokenize.py example
hiepph committed Oct 19, 2017
1 parent 44dbcc8 commit 3828ced
Showing 1 changed file with 32 additions and 24 deletions.
56 changes: 32 additions & 24 deletions examples/README.md
@@ -1,31 +1,39 @@
 # torchMoji examples
 
-## Initialization
-[create_twitter_vocab.py](create_twitter_vocab.py)
-Create a new vocabulary from a tsv file.
-[tokenize_dataset.py](tokenize_dataset.py)
-Tokenize a given dataset using the prebuilt vocabulary.
-[vocab_extension.py](vocab_extension.py)
-Extend the given vocabulary using dataset-specific words.
-[dataset_split.py](dataset_split.py)
+## Initialization
+[create_twitter_vocab.py](create_twitter_vocab.py)
+Create a new vocabulary from a tsv file.
+
+[tokenize_dataset.py](tokenize_dataset.py)
+Tokenize a given dataset using the prebuilt vocabulary.
+
+[vocab_extension.py](vocab_extension.py)
+Extend the given vocabulary using dataset-specific words.
+
+[dataset_split.py](dataset_split.py)
 Split a given dataset into training, validation and testing.
 
-## Use pretrained model/architecture
-[score_texts_emojis.py](score_texts_emojis.py)
-Use torchMoji to score texts for emoji distribution.
-
-[encode_texts.py](encode_texts.py)
+## Use pretrained model/architecture
+[score_texts_emojis.py](score_texts_emojis.py)
+Use torchMoji to score texts for emoji distribution.
+
+[text_emojize.py](text_emojize.py)
+Use torchMoji to output emoji visualization from a single text input (mapped from `emoji_overview.png`)
+
+```sh
+python examples/text_emojize.py --text "I love mom's cooking\!"
+# => I love mom's cooking! 😋 😍 💓 💛 ❤
+```
+
+[encode_texts.py](encode_texts.py)
 Use torchMoji to encode the text into 2304-dimensional feature vectors for further modeling/analysis.
 
 ## Transfer learning
-[finetune_youtube_last.py](finetune_youtube_last.py)
-Finetune the model on the SS-Youtube dataset using the 'last' method.
-[finetune_insults_chain-thaw.py](finetune_insults_chain-thaw.py)
-Finetune the model on the Kaggle insults dataset (from blog post) using the 'chain-thaw' method.
-[finetune_semeval_class-avg_f1.py](finetune_semeval_class-avg_f1.py)
-Finetune the model on the SemeEval emotion dataset using the 'full' method and evaluate using the class average F1 metric.
+[finetune_youtube_last.py](finetune_youtube_last.py)
+Finetune the model on the SS-Youtube dataset using the 'last' method.
+
+[finetune_insults_chain-thaw.py](finetune_insults_chain-thaw.py)
+Finetune the model on the Kaggle insults dataset (from blog post) using the 'chain-thaw' method.
+
+[finetune_semeval_class-avg_f1.py](finetune_semeval_class-avg_f1.py)
+Finetune the model on the SemeEval emotion dataset using the 'full' method and evaluate using the class average F1 metric.
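
Note on the "class average F1" metric named for `finetune_semeval_class-avg_f1.py`: the README does not define it. As a rough sketch, assuming it means the unweighted mean of per-class F1 scores (often called macro F1), it could be computed as below. The function name `class_average_f1` is hypothetical and is not taken from the torchMoji code base, whose own evaluation may differ.

```python
def class_average_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed meaning of 'class average F1')."""
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        return 0.0
    scores = []
    for c in classes:
        # Count true positives, false positives and false negatives for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); define it as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only (not from any torchMoji dataset).
score = class_average_f1([0, 0, 1, 1], [0, 1, 1, 1])
```

Averaging per class rather than per example keeps rare emotion classes from being drowned out by frequent ones, which is presumably why a class-level average is reported here.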

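Note on `text_emojize.py`: the sample output in the diff shows five emojis for one input text. A minimal sketch of how the highest-scoring emoji classes might be selected, assuming the model yields one probability per emoji class. The helper `top_k_indices` and the toy probabilities are illustrative assumptions, not the script's actual code.

```python
def top_k_indices(probs, k=5):
    """Return the indices of the k largest probabilities, highest first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]


# Toy probability vector for illustration only.
probs = [0.10, 0.50, 0.05, 0.30, 0.05]
best = top_k_indices(probs, k=3)  # -> [1, 3, 0]
```

Each returned index would then be looked up in an emoji table (the README points to `emoji_overview.png` for that mapping) to produce the printed emojis.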