This repository has been archived by the owner on Jan 17, 2024. It is now read-only.

Datasets added
igormorgado committed Jun 9, 2021
1 parent 08726ff commit 101e80b
Showing 11 changed files with 129,384 additions and 0 deletions.
9 changes: 9 additions & 0 deletions .gitignore
@@ -1,3 +1,12 @@
__pycache__/
*png
OFF/
data/GoogleNews-vectors-negative300.bin
data/en_embeddings_subset.p
data/fr_embeddings_subset.p
data/Sarcasm_Headlines_Dataset.json
data/Sarcasm_Headlines_Dataset_v2.json
data/WSJ_02-21.pos
data/WSJ_24.pos
data/wiki.fr.vec
data/sarcasm.json
15 changes: 15 additions & 0 deletions data/README.md
@@ -0,0 +1,15 @@
Data used

# GoogleNews-vectors-negative300.bin
https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing

Language embeddings
https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.fr.vec
https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.en.vec

SarcasmHeadlines
https://www.kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection

Wall Street Journal PoS Corpus
https://github.com/keon/nlp/tree/master/hw4-viterbi/data

Binary file added data/carta.pdf
Binary file not shown.
643 changes: 643 additions & 0 deletions data/carta.txt

Large diffs are not rendered by default.
