Init collection from #1364
/// Handlers for migration data from one collection into another within single cluster

/// Get a list of local shards, which can be used for migration
I find the usage of the term `migration` in this file and this context a bit confusing.
As far as I can see, it is actually a copy and not a migration as the source is not deleted at the end of the routine.
Source and target coexist at the end of the process.
WDYT?
Renamed; not sure it became better, though.
Thanks for taking the time, I like it better. It is internal stuff anyway, we can rename it later 👍
    source_collection: &CollectionId,
) -> Result<(), StorageError> {
    let collection = self.get_collection(source_collection).await?;
    let collection_vectors_schema = collection.state().await.config.params.vectors;
Is this preventing users from changing the distance function used for measuring distance between vectors?
Yes. One reason is that migrating from Cosine to Dot might not work as expected, because the stored vectors have already been normalized.
That's unfortunate, would have been a good use case.
An important point for the docs I guess 📝
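To illustrate the normalization point above, here is a minimal sketch. It assumes, as the reply states, that vectors in a Cosine collection are stored already scaled to unit length; `normalize` and `dot` are hypothetical helpers written for this example, not Qdrant APIs. Once the magnitudes are gone, a Dot collection built from the stored vectors cannot reproduce the ranking that Dot would give on the original vectors.

```rust
/// Scale a vector to unit length (assumed to be what a Cosine collection stores).
fn normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        // A zero vector has no direction; return it unchanged.
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

/// Plain dot product of two equally sized vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let query = [1.0_f32, 0.0];
    let short = [1.0_f32, 0.0];
    let long = [10.0_f32, 0.0];

    // On the original vectors, Dot ranks `long` above `short`.
    assert!(dot(&query, &long) > dot(&query, &short));

    // After normalization both vectors are identical, so Dot can no longer
    // tell them apart: the magnitude information is lost.
    let d_short = dot(&query, &normalize(&short));
    let d_long = dot(&query, &normalize(&long));
    assert!((d_short - d_long).abs() < 1e-6);
}
```

Copying in the other direction, or between collections that share a distance function, does not hit this problem, which is consistent with the schema check discussed in this thread.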
let source_collection =
    handle_get_collection(collections_read.get(source_collection_name))?;
let _updates_guard = source_collection.lock_updates().await;
If I understand this properly, this is not a transactional copy of the collection as it was at a given point in time.
As we progress through the `offset`, we will fetch the latest values for each point.
Updates occurring on already fetched ids will be missed.
Am I understanding this right?
Yes, we do not forward updates into the new collection automatically. But the lock allows the user to start pushing updates into both collections from the external source, and, therefore, switch from one collection to another transparently.
Thanks for the clarification, it could be in the docs as well 📝
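The behaviour described in this thread can be modelled with a toy example. This is only a sketch under stated assumptions: a `BTreeMap` stands in for a collection, and `copy_batch` is a hypothetical helper that copies points page by page from an offset. It is not the code in this PR and does not model `lock_updates`. It shows that an update to an already copied id is not forwarded, and that writing to both collections from the external source keeps them in sync.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a collection: point id -> payload.
type Points = BTreeMap<u64, String>;

/// Copy up to `limit` points with id >= `offset` into `target`.
/// Returns the id to use as the next offset, or `None` when the copy is done.
fn copy_batch(source: &Points, target: &mut Points, offset: u64, limit: usize) -> Option<u64> {
    let mut iter = source.range(offset..);
    for (id, payload) in iter.by_ref().take(limit) {
        target.insert(*id, payload.clone());
    }
    iter.next().map(|(id, _)| *id)
}

fn main() {
    let mut source: Points = (1..=4).map(|id| (id, format!("v1-{}", id))).collect();
    let mut target = Points::new();

    // First page copies ids 1 and 2.
    let next = copy_batch(&source, &mut target, 0, 2);
    assert_eq!(next, Some(3));

    // An update to an already copied id reaches only the source...
    source.insert(1, "v2-1".to_string());

    // ...so the target still holds the old value once the copy finishes.
    assert_eq!(copy_batch(&source, &mut target, 3, 2), None);
    assert_eq!(target[&1], "v1-1");

    // Pushing the same update to both collections from the external source
    // brings them back in sync.
    for collection in [&mut source, &mut target] {
        collection.insert(1, "v3-1".to_string());
    }
    assert_eq!(source, target);
}
```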
* WIP
* WIP: add shards selection
* fmt
* WIP: collection init from other collection
* fmt
* fix bugs
* add collection write lock + use collection manager runtime instead of search runtime
* fmt
* integration test
* better names ?
* payload indexes transfer
Add a parameter to initialize a collection from another collection.
Might be useful for:
ToDo:
Shards status during data migration (?)