Text enrichment (paraphrasing/rephrasing) of the sign language datasets How2Sign and YouTube-ASL using GPT-3.5/4o/4o-mini, Llama3-8B, Llama3-70B, Claude 3.5, and Gemini.
Code repository affiliated with the SignLLaVa team at the JSALT 2024 workshop.
-
paraphrase_scripts - core scripts for paraphrasing by Václav Javorek
- LLMrephrase.ipynb - proof-of-concept code + dataset statistics
- GPTrephrase.ipynb - core script using GPT API
- GPTrephrase_4o-mini.ipynb - 4o-mini version
- GPTrephrase_H2S.ipynb - How2Sign version
- GPTrephrase_H2S_4o-mini.ipynb - 4o-mini How2Sign
- Llama3rephrase.ipynb - Llama3 (8B) version
- Llama3rephrase+context.ipynb - Contextual paraphrasing
- Llama3rephrase+context-iter.ipynb - Iterative rephrasing
- Llama3rephrase_HF.ipynb - HuggingFace API version
- error counter.ipynb - quick bug-fix and verification scripts
- LLama3-70B - scripts for rephrasing with Llama3-70B
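
The notebooks above differ mainly in which model they call and in whether preceding sentences are passed as context. As a rough illustration only (not the code from the notebooks), the common pattern might look like the sketch below. The names `build_paraphrase_prompt`, `paraphrase`, and `generate` are hypothetical, the prompt wording is an assumption, and `generate` stands in for whatever model call (GPT API, Llama3, etc.) is actually used.

```python
import re
from typing import Callable, List, Optional


def build_paraphrase_prompt(sentence: str,
                            context: Optional[List[str]] = None,
                            n: int = 3) -> str:
    """Build a plain-text prompt asking for n paraphrases of one sentence.

    The wording is an assumption for illustration, not the prompt
    used in the repository's notebooks.
    """
    lines = []
    if context:
        # Contextual variant: show preceding sentences, but do not rephrase them.
        lines.append("Preceding sentences (context only, do not paraphrase):")
        lines.extend(f"- {c}" for c in context)
    lines.append(
        f"Paraphrase the following sentence in {n} different ways, "
        "one per line, keeping its meaning:"
    )
    lines.append(sentence)
    return "\n".join(lines)


def paraphrase(sentence: str,
               generate: Callable[[str], str],
               context: Optional[List[str]] = None,
               n: int = 3) -> List[str]:
    """Ask a model for paraphrases and return them as a cleaned list.

    `generate` is a hypothetical callable: prompt text in, raw model text out.
    """
    raw = generate(build_paraphrase_prompt(sentence, context, n))
    results = []
    for line in raw.splitlines():
        # Drop a leading list marker such as "1. ", "2) ", "- " or "* ".
        cleaned = re.sub(r"^\s*(?:[-*]|\d+[.)])\s+", "", line).strip()
        if cleaned:
            results.append(cleaned)
    return results[:n]
```

Keeping the model call behind a plain callable makes it possible to swap back ends (or plug in a stub for testing) without touching the prompt-building or output-cleaning logic.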
-
paraphrase_evaluation_analysis - evaluation scripts by Alessa Carbo
-
data-normalization - text preprocessing by Alessa Carbo and Dominik Macháček
-
keywords, text-distance - multitask and utility scripts by Dominik Macháček
Note: All HF and OpenAI tokens contained in this code repository are legacy; they were generated and used solely for JSALT 2024 purposes.
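
Since the embedded tokens are legacy, anyone re-running the notebooks would presumably need to supply their own credentials. A minimal sketch of one way to do this, assuming the tokens are exported as the environment variables `OPENAI_API_KEY` and `HF_TOKEN` (the names conventionally read by the OpenAI and Hugging Face client libraries); the helper `load_token` is hypothetical and not part of the repository:

```python
import os


def load_token(name: str) -> str:
    """Read an API token from the environment and fail loudly if it is missing.

    `load_token` is a hypothetical helper for illustration only.
    """
    token = os.environ.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable to your own token.")
    return token


# Example usage (assumed variable names):
# openai_key = load_token("OPENAI_API_KEY")
# hf_token = load_token("HF_TOKEN")
```

Reading tokens from the environment keeps personal credentials out of the notebooks themselves.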