Investigation: Add support for word vectors

The implementation should be generic enough to allow the use of any model, for example:

- We'd like to use vectors obtained from a vector space model (raw counts, TF-IDF, LSA)
- We also need to support other types of word embeddings, such as GloVe or Word2Vec, as well as sentence embeddings obtained from BERT, for example via https://huggingface.co/

Our representation should allow us to implement metrics such as:

- Average sentence similarity (e.g. cosine distance, Euclidean distance)
- Other metrics based on sentence similarity (e.g. max distance between two sentences, average distance to the cluster center)
- Givenness using semantic spaces
- etc.
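
As a rough illustration of the metrics listed above, here is a minimal sketch that assumes the sentences are already available as a 2-D `numpy` array with one row per sentence. The function names are hypothetical and not part of any existing API, and the cosine-based metric assumes at least two rows and no all-zero rows.

```python
import numpy as np


def average_cosine_similarity(vectors: np.ndarray) -> float:
    """Mean cosine similarity over all distinct pairs of rows."""
    # Assumption: at least two rows, and no row is all zeros.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    similarities = unit @ unit.T
    # Keep only the pairs (i, j) with i < j, so each pair counts once.
    upper = np.triu_indices(len(vectors), k=1)
    return float(similarities[upper].mean())


def max_euclidean_distance(vectors: np.ndarray) -> float:
    """Largest Euclidean distance between any two rows."""
    differences = vectors[:, None, :] - vectors[None, :, :]
    return float(np.linalg.norm(differences, axis=-1).max())


def average_distance_to_center(vectors: np.ndarray) -> float:
    """Mean Euclidean distance from each row to the centroid of all rows."""
    center = vectors.mean(axis=0)
    return float(np.linalg.norm(vectors - center, axis=1).mean())


# Three toy "sentence vectors" in a two-dimensional space.
example = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(average_cosine_similarity(example))
print(max_euclidean_distance(example))
print(average_distance_to_center(example))
```

Because these functions only see an array, they would work the same way regardless of which model produced the vectors.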

One design idea is to have a `callable` that takes the text and returns the vectors as a `numpy` array. Since spaCy already depends on `numpy`, it is already among our dependencies, so this adds no new requirement.
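
A minimal sketch of what such a callable could look like, using a toy raw-count model. The `Vectorizer` alias, the `raw_count_vectorizer` function, the one-row-per-sentence layout, and the regex-based sentence splitting are all assumptions made for illustration; a real implementation would likely rely on spaCy for segmentation and tokenization.

```python
import re
from typing import Callable

import numpy as np

# Hypothetical type alias: a vectorizer maps raw text to a 2-D array
# with one row per sentence (the row granularity is an assumption).
Vectorizer = Callable[[str], np.ndarray]


def raw_count_vectorizer(text: str) -> np.ndarray:
    """Toy raw-count model: one row per sentence, one column per word."""
    # Naive split on '.', '!' and '?'; stands in for a real sentence splitter.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    # Columns follow the alphabetically sorted vocabulary of this text.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    index = {word: column for column, word in enumerate(vocabulary)}
    vectors = np.zeros((len(tokenized), len(vocabulary)))
    for row, tokens in enumerate(tokenized):
        for word in tokens:
            vectors[row, index[word]] += 1
    return vectors


def vectorize(text: str, vectorizer: Vectorizer) -> np.ndarray:
    """Metrics would call the injected vectorizer rather than a fixed model."""
    return vectorizer(text)


vectors = vectorize("The cat sat. The dog sat.", raw_count_vectorizer)
print(vectors.shape)
```

Any other model (TF-IDF, GloVe, a BERT-based encoder) could be plugged in by wrapping it in a function with the same signature.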

Then, we will need to file issues to implement the different metrics.
