Hello. May I ask if there is a way to extract word embeddings using multiple cores?
Right now I'm computing the word-embedding representations for the 20 newsgroups dataset, and it still takes a while to process the whole corpus. Thank you.
For reference, this is my current function:
import numpy as np
import pymagnitude
from typing import List, Union


def extract_sentence_embeddings(
    texts: Union[str, List[str]], batch_size: int = 2048
) -> np.ndarray:
    """Returns the sentence embeddings for the input texts.

    Parameters
    ----------
    texts : str or List
        The input text to vectorize.
    batch_size : int
        The mini-batch size to use for computation.

    Returns
    -------
    vectors : np.ndarray
        The sentence embeddings representation for the input texts.
    """
    vectorizer = pymagnitude.Magnitude("data/glove.840B.300d.magnitude")
    if isinstance(texts, str):
        # Mean-pool the word vectors into a single sentence embedding.
        vectors = vectorizer.query(texts.split())
        return np.mean(vectors, axis=0)
    elif isinstance(texts, list):
        vectors = []
        # Step through the texts in mini-batches (including the final
        # partial batch) so memory usage stays bounded.
        for offset in range(0, len(texts), batch_size):
            batch = [
                # Substitute dummy tokens for empty documents so every
                # row in the batch has at least one word to query.
                text.split() if text.split() else ["", ""]
                for text in texts[offset : offset + batch_size]
            ]
            vector = vectorizer.query(batch)
            # Mean-pool over the token axis: one 300-d vector per text.
            vectors.append(np.mean(vector, axis=1))
        return np.concatenate(vectors, axis=0)
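For context, this is roughly how I call it on the 20 newsgroups data (loading the corpus through scikit-learn here is just for illustration):

from sklearn.datasets import fetch_20newsgroups

# Fetch the raw newsgroup posts as a list of strings.
texts = fetch_20newsgroups(subset="all").data
embeddings = extract_sentence_embeddings(texts, batch_size=2048)
print(embeddings.shape)  # (n_documents, 300)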
Since I'm using 300-dimensional vectors, memory is easily exhausted, which is why I opted for batching the text data.
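In case it clarifies what I'm asking for: something like the sketch below is what I have in mind, using only the standard library's multiprocessing.Pool and loading one Magnitude instance per worker process. The helper names (_init_worker, _embed_batch, extract_parallel) are just placeholders, and I haven't verified that this plays well with pymagnitude.

import multiprocessing as mp

import numpy as np
import pymagnitude

_vectorizer = None  # per-process handle, set by the pool initializer


def _init_worker():
    # Each worker process opens its own handle on the vector file.
    global _vectorizer
    _vectorizer = pymagnitude.Magnitude("data/glove.840B.300d.magnitude")


def _embed_batch(batch):
    # Mean-pool the word vectors of each text into one 300-d row.
    tokens = [text.split() if text.split() else ["", ""] for text in batch]
    return np.array([np.mean(_vectorizer.query(t), axis=0) for t in tokens])


def extract_parallel(texts, batch_size=2048, processes=4):
    # Split the corpus into picklable mini-batches and fan them out.
    batches = [
        texts[offset : offset + batch_size]
        for offset in range(0, len(texts), batch_size)
    ]
    with mp.Pool(processes=processes, initializer=_init_worker) as pool:
        return np.concatenate(pool.map(_embed_batch, batches), axis=0)

My understanding is that Magnitude lazy-loads vectors from disk rather than holding the full matrix in RAM, so per-process memory should stay manageable, but I may be wrong about that.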
Looking forward to your response! Thank you!