-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compatibility for mmap sparse vectors #5454
Conversation
}); | ||
} | ||
|
||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also tests the simple
storage for better coverage 🤓
pub enum SparseVectorStorageType { | ||
/// Storage on disk | ||
// (rocksdb storage) | ||
#[default] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing this default in the future will help us make the switch
6b4cb58
to
da71d4f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking great already 👍
e5671d7
to
d4d9f15
Compare
@@ -79,7 +79,7 @@ impl SegmentBuilder { | |||
}; | |||
|
|||
let payload_storage = | |||
create_payload_storage(database.clone(), segment_config, segment_path)?; | |||
create_payload_storage(database.clone(), segment_config, temp_dir.path())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also fixes the mmap payload storage to be stored in the right place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ToDo: avoid iterating over vector storage on load in another PR
Plus some extra refactors
d4d9f15
to
4835944
Compare
* implement mmap sparse vector storage * add to VectorStorageEnum * clippy * add tests, fix both simple and mmap storages * smol correction on total_vector_count * add sparse storage type to config * fix reading config without storage type * generate openapi * use blob_store by path * hidden setting to enable new storage * validate existing path in `BlobStore::open()` * use new dir for each sparse vector name * fix and rename `max_point_offset` Plus some extra refactors * add storage compat test, to always check both storages work * fix opening of storage + other misc fixes * FIX!!! `Unset` operations in the Tracker weren't updating the `next_pointer_id`. So, when reopening the storage, those points wouldn't get marked as deleted in the bitslice, thus creating the illusion that they should exist, when they did not. * refactor naming from `iter_*` to `for_each_*` * fix checking for BlobStore existance * fix typo * fix error message * better docs for open_or_create * fix after rebase
* implement mmap sparse vector storage * add to VectorStorageEnum * clippy * add tests, fix both simple and mmap storages * smol correction on total_vector_count * add sparse storage type to config * fix reading config without storage type * generate openapi * use blob_store by path * hidden setting to enable new storage * validate existing path in `BlobStore::open()` * use new dir for each sparse vector name * fix and rename `max_point_offset` Plus some extra refactors * add storage compat test, to always check both storages work * fix opening of storage + other misc fixes * FIX!!! `Unset` operations in the Tracker weren't updating the `next_pointer_id`. So, when reopening the storage, those points wouldn't get marked as deleted in the bitslice, thus creating the illusion that they should exist, when they did not. * refactor naming from `iter_*` to `for_each_*` * fix checking for BlobStore existance * fix typo * fix error message * better docs for open_or_create * fix after rebase
Adds the infrastructure to integrate
blob_store
as a new mmap storage for sparse vectors.This PR does not yet create segments with this storage, but should be able to read it.
There are some additional changes to the
simple
storage, specifically:total_sparse_vector_size
:None
when reading the storageEdits for
blob_store
:None
value (deleting a point above the range of the storage) now updates thenext_pointer_offset
for_each_unfiltered
function which iterates through the ALL the storage with a callback, including deleted pointsopen_or_create
function which does a better check to choose between opening or creating a storageTODO