Byte storage integration into segment #4049
    Self {
        query: TMetric::preprocess(query),
        query: TElement::slice_from_float_cow(&preprocessed_vector).to_vec(),
It is good to know that `Cow` won't actually copy owned vectors.
I did it this way because `slice_from_float_cow` is the only function that converts floats into bytes. I find `Cow` helpful when upserting into storages.
Fixed: 0edd630
Now we avoid unnecessary copying. For float vectors with a non-cosine distance, this code path no longer reallocates.
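To illustrate the point of this thread, here is a minimal sketch of how a `Cow`-based conversion can skip the copy for float storage and allocate only for byte storage. The `Element` trait, its signature, and the clamping rule are assumptions made for illustration; only the name `slice_from_float_cow` comes from the diff, and the actual trait in the PR may look different.

```rust
use std::borrow::Cow;

/// Hypothetical element trait; only the method name is taken from the diff.
trait Element: Clone + 'static {
    /// Convert a float vector into the storage element type.
    fn slice_from_float_cow(v: Cow<'_, [f32]>) -> Cow<'_, [Self]>;
}

impl Element for f32 {
    fn slice_from_float_cow(v: Cow<'_, [f32]>) -> Cow<'_, [f32]> {
        // Float storage: hand the input back untouched, so a borrowed
        // slice stays borrowed and an owned vector is moved, not copied.
        v
    }
}

impl Element for u8 {
    fn slice_from_float_cow(v: Cow<'_, [f32]>) -> Cow<'_, [u8]> {
        // Byte storage: a real conversion is needed, so a new buffer is
        // allocated. Clamping to 0..=255 is an assumption of this sketch.
        Cow::Owned(v.iter().map(|&x| x.clamp(0.0, 255.0) as u8).collect())
    }
}

/// Prepare a query for a storage with element type `T`.
fn prepare_query<T: Element>(query: &[f32]) -> Cow<'_, [T]> {
    T::slice_from_float_cow(Cow::Borrowed(query))
}
```

With this shape, `prepare_query::<f32>` returns `Cow::Borrowed` and `prepare_query::<u8>` returns `Cow::Owned`, so the caller pays for an allocation only when the element type actually changes.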
* byte storage with quantization raw scorer integration config and test are you happy fmt fn renamings cow refactor use quantization branch quantization update
* are you happy clippy
* don't use distance in quantized scorers
* fix build
* add fn quantization_preprocess
* apply preprocessing for only cosine float metric
* fix sparse vectors tests
* update openapi
* more complicated integration test
* update openapi comment
* mmap byte storages support
* fix async test
* move .unwrap closer to the actual check of the vector presence
* fmt
* remove distance similarity function
* avoid copying data while working with cow

---------

Co-authored-by: generall <andrey@vasnetsov.com>
This PR adds byte vector storage support into segment.
Main changes in this PR for the integration:
* `datatype` field in the segment config (in `VectorDataConfig`). It stores the storage type so that the segment can be loaded properly.
* `VectorStorageEnum`: `DenseSimpleByte`, `DenseMemmapByte` and `DenseAppendableMemmapByte`.
* `Distance::preprocess_vector` and `Distance::similarity`. `Distance` does not describe which type of vector is being processed, so it cannot call functions from the generic `Metric` trait.
* `Distance::preprocess_vector` triggers a large refactor in the quantization scorers.
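The remark in the description that `Distance` cannot call into the generic `Metric` trait can be sketched as follows. This is an illustrative reduction, not the code from the PR: the `Metric<T>` signature, the `DotMetric` and `ManhattanMetric` structs, and the `score` helper are hypothetical names. The point it shows is that a runtime enum carries no element type, so the element type has to be supplied separately (here through the generic parameter `T`) before a concrete implementation can be selected.

```rust
/// Runtime distance selector; it says nothing about the element type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Distance {
    Dot,
    Manhattan,
}

/// Hypothetical metric trait, generic over the stored element type.
trait Metric<T> {
    fn similarity(a: &[T], b: &[T]) -> f32;
}

struct DotMetric;
struct ManhattanMetric;

impl Metric<f32> for DotMetric {
    fn similarity(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}

impl Metric<u8> for DotMetric {
    fn similarity(a: &[u8], b: &[u8]) -> f32 {
        // Accumulate in i32 so that byte products cannot overflow.
        a.iter().zip(b).map(|(&x, &y)| i32::from(x) * i32::from(y)).sum::<i32>() as f32
    }
}

impl Metric<f32> for ManhattanMetric {
    fn similarity(a: &[f32], b: &[f32]) -> f32 {
        // Negated so that a larger value means "more similar".
        -a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum::<f32>()
    }
}

impl Metric<u8> for ManhattanMetric {
    fn similarity(a: &[u8], b: &[u8]) -> f32 {
        -(a.iter().zip(b).map(|(&x, &y)| (i32::from(x) - i32::from(y)).abs()).sum::<i32>() as f32)
    }
}

/// The enum alone cannot pick an impl; the element type `T` must be known too.
fn score<T>(distance: Distance, a: &[T], b: &[T]) -> f32
where
    DotMetric: Metric<T>,
    ManhattanMetric: Metric<T>,
{
    match distance {
        Distance::Dot => <DotMetric as Metric<T>>::similarity(a, b),
        Distance::Manhattan => <ManhattanMetric as Metric<T>>::similarity(a, b),
    }
}
```

A method such as `Distance::similarity(a, b)` with a fixed float signature would always take the `f32` path, which is one plausible reading of why the byte storage integration needed the element type to travel with the call.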