Description
The implementation should be generic enough to allow the use of any model, for example:
- We'd like to use vectors obtained from a vector space model (raw counts, TF-IDF, LSA)
- We also need to support other types of word embeddings, such as GloVe or Word2Vec, as well as sentence embeddings obtained from BERT (for example, from https://huggingface.co/)
Our representation should allow us to implement metrics such as:
- Average sentence similarity (e.g. based on cosine distance or Euclidean distance)
- Other metrics based on sentence similarity (e.g. max distance between two sentences, average distance to the cluster center)
- Givenness using semantic spaces
- etc.
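
As a rough illustration of the metrics listed above, here is a minimal sketch that works on a numpy array with one row per sentence. The function names are hypothetical, and the choice to compare all sentence pairs (rather than only adjacent sentences) is an assumption, not something decided in this issue.

```python
import numpy as np


def average_pairwise_cosine_similarity(vectors: np.ndarray) -> float:
    """Mean cosine similarity over all distinct sentence pairs (assumed definition)."""
    n = vectors.shape[0]
    if n < 2:
        raise ValueError("need at least two sentence vectors")
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # zero vectors stay zero instead of dividing by zero
    unit = vectors / norms
    sims = unit @ unit.T
    # Keep only the strict upper triangle so each pair is counted once.
    return float(sims[np.triu_indices(n, k=1)].mean())


def max_pairwise_euclidean_distance(vectors: np.ndarray) -> float:
    """Largest Euclidean distance between any two sentence vectors."""
    diffs = vectors[:, None, :] - vectors[None, :, :]
    return float(np.linalg.norm(diffs, axis=2).max())


def average_distance_to_centroid(vectors: np.ndarray) -> float:
    """Mean Euclidean distance from each sentence vector to the cluster center."""
    centroid = vectors.mean(axis=0)
    return float(np.linalg.norm(vectors - centroid, axis=1).mean())
```

Because these functions only see the array, they should work unchanged whether the vectors come from raw counts, LSA, or a neural embedding model.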
One design idea is to have a callable that takes the text and returns the vectors as a numpy array. Since spaCy already depends on numpy, it should already be among our dependencies, so no worries about that.
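
To make the callable idea more concrete, here is a minimal sketch. The names `SentenceVectorizer`, `raw_count_vectorizer`, and `embed` are hypothetical, and it assumes the text has already been split into sentences and that tokens are separated by whitespace. The raw-count model is only a stand-in; a GloVe, Word2Vec, or BERT based callable would plug in the same way as long as it returns an array of shape `(n_sentences, dim)`.

```python
from typing import Callable, List

import numpy as np

# Hypothetical type alias: any function mapping sentences to a (n_sentences, dim) array.
SentenceVectorizer = Callable[[List[str]], np.ndarray]


def raw_count_vectorizer(sentences: List[str]) -> np.ndarray:
    """Toy raw-count model: one row per sentence, one column per vocabulary word."""
    tokenized = [s.lower().split() for s in sentences]  # assumption: whitespace tokens
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = np.zeros((len(sentences), len(vocab)), dtype=float)
    for row, toks in enumerate(tokenized):
        for tok in toks:
            vectors[row, index[tok]] += 1.0
    return vectors


def embed(sentences: List[str], vectorizer: SentenceVectorizer) -> np.ndarray:
    """Run any vectorizer and check that it returned one row per sentence."""
    vectors = np.asarray(vectorizer(sentences), dtype=float)
    if vectors.ndim != 2 or vectors.shape[0] != len(sentences):
        raise ValueError("vectorizer must return an array of shape (n_sentences, dim)")
    return vectors
```

For example, `embed(["the cat sat", "the dog sat"], raw_count_vectorizer)` yields a 2 x 4 array over the sorted vocabulary `cat, dog, sat, the`. Keeping the shape check in one place means metric code can rely on the array layout regardless of which model produced it.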
Then, we will need to file issues to implement the different metrics.