Investigation: Add support for word vectors #25

Open
@dpalmasan

Description

The implementation should be generic enough to allow any model to be used, for example:

  • We'd like to use vectors obtained from a vector space model (raw counts, TF-IDF, LSA)
  • We also need to support other types of word embeddings, such as GloVe or Word2Vec, or sentence embeddings obtained from BERT (for example, from https://huggingface.co/)

Our representation should allow us to implement metrics such as:

  • Average sentence similarity (e.g. cosine similarity, Euclidean distance)
  • Other metrics based on sentence similarity (e.g. max distance between two sentences, average distance to the cluster center)
  • Givenness using semantic spaces
  • etc.
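
The similarity-based metrics above could be computed from a matrix of sentence vectors with one row per sentence. The following is a minimal sketch under that assumption; the function names (`cosine_similarity_matrix`, `average_cosine_similarity`, `max_pairwise_distance`, `average_distance_to_centroid`) are hypothetical and not part of the project, and the centroid of all sentence vectors stands in for the "cluster center".

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors` (n_sentences, dim)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Avoid division by zero: all-zero rows keep similarity 0 with everything.
    safe_norms = np.where(norms == 0, 1.0, norms)
    unit = vectors / safe_norms
    return unit @ unit.T


def average_cosine_similarity(vectors: np.ndarray) -> float:
    """Mean cosine similarity over all distinct sentence pairs."""
    n = vectors.shape[0]
    if n < 2:
        raise ValueError("need at least two sentence vectors")
    sims = cosine_similarity_matrix(vectors)
    upper = np.triu_indices(n, k=1)  # indices of each pair (i, j) with i < j
    return float(sims[upper].mean())


def max_pairwise_distance(vectors: np.ndarray) -> float:
    """Largest Euclidean distance between any two sentence vectors."""
    diffs = vectors[:, None, :] - vectors[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).max())


def average_distance_to_centroid(vectors: np.ndarray) -> float:
    """Mean Euclidean distance from each sentence vector to their centroid."""
    centroid = vectors.mean(axis=0)
    return float(np.linalg.norm(vectors - centroid, axis=1).mean())
```

Because these helpers only see a NumPy array, they would work unchanged whether the vectors come from raw counts, LSA, or a pretrained embedding model.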

One design idea is to have a callable that takes the text and returns the vectors as a NumPy array. Since spaCy already depends on NumPy, this adds no new dependency.
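
To make the callable idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: the `Vectorizer` type alias, the choice to pass a list of sentences rather than raw text, the toy `RawCountVectorizer`, and the `sentence_vectors` helper are all hypothetical names, not existing project code.

```python
from collections import Counter
from typing import Callable, Sequence

import numpy as np

# Hypothetical contract: a list of sentences in, an (n_sentences, dim) array out.
Vectorizer = Callable[[Sequence[str]], np.ndarray]


class RawCountVectorizer:
    """Toy raw-count model over a fixed vocabulary (illustration only)."""

    def __init__(self, vocabulary: Sequence[str]) -> None:
        self.index = {word: i for i, word in enumerate(vocabulary)}

    def __call__(self, sentences: Sequence[str]) -> np.ndarray:
        matrix = np.zeros((len(sentences), len(self.index)), dtype=float)
        for row, sentence in enumerate(sentences):
            # Naive whitespace tokenization; a real model would use spaCy tokens.
            counts = Counter(sentence.lower().split())
            for word, count in counts.items():
                col = self.index.get(word)
                if col is not None:  # out-of-vocabulary words are ignored
                    matrix[row, col] = count
        return matrix


def sentence_vectors(sentences: Sequence[str], vectorizer: Vectorizer) -> np.ndarray:
    """Run any vectorizer and check that it honours the assumed shape contract."""
    vectors = np.asarray(vectorizer(sentences), dtype=float)
    if vectors.ndim != 2 or vectors.shape[0] != len(sentences):
        raise ValueError("vectorizer must return one row per sentence")
    return vectors


vectorizer = RawCountVectorizer(["the", "cat", "sat"])
vectors = sentence_vectors(["The cat sat", "the the dog"], vectorizer)
```

Under this design, a GloVe, Word2Vec, or BERT-based model would only need to be wrapped in a function or object with the same call signature; the metrics code would not need to know which model produced the array.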

Then, we will need to file issues to implement the different metrics.

Metadata

Labels

enhancement: New feature or request
