feat: astra db chunks deletion based on metadata field #5537
… document management - Introduced a new 'deletion_field' input to specify a metadata field for deleting documents before loading new data. - Enhanced the _add_documents_to_vector_store method to handle document deletion based on the specified field, improving data management capabilities.
CodSpeed Performance Report
Merging #5537 will degrade performances by 25.38%
…ove readability. - Optimized the deletion logic by using a set comprehension to eliminate duplicates when gathering delete values from documents.
@@ -607,6 +616,18 @@ def _add_documents_to_vector_store(self, vector_store) -> None:
                msg = "Vector Store Inputs must be Data objects."
                raise TypeError(msg)

        if documents and self.deletion_field:
            self.log(f"Deleting documents where {self.deletion_field}")
nit: should we remove this log line?
@msmygit I can see the argument for deleting it, but I'd say we can keep it for now. I don't think it exposes any unnecessary information, and it might help debug some issues.
Co-authored-by: Madhavan <msmygit@users.noreply.github.com>
Agreed with @msmygit's suggestions; I just left the log message in for now. I committed the suggested change to the error log message. Everything else looks great. I think this is a good addition, though it will eventually have to be documented well to be used to its full potential. But I'm going to approve! Thanks!
I should add: the team is actively investigating a better ingestion experience for situations like this, but I think this is a solid solution for the time being.
* feat: Add deletion_field parameter to AstraDBVectorStoreComponent for document management
  - Introduced a new 'deletion_field' input to specify a metadata field for deleting documents before loading new data.
  - Enhanced the _add_documents_to_vector_store method to handle document deletion based on the specified field, improving data management capabilities.
* Merging with main
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes
* - Enhanced the info string for the 'deletion_field' parameter to improve readability.
  - Optimized the deletion logic by using a set comprehension to eliminate duplicates when gathering delete values from documents.
* [autofix.ci] apply automated fixes
* Update src/backend/base/langflow/components/vectorstores/astradb.py
  Co-authored-by: Madhavan <msmygit@users.noreply.github.com>
* [autofix.ci] apply automated fixes
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Hare <ericrhare@gmail.com>
Co-authored-by: Madhavan <msmygit@users.noreply.github.com>
Purpose
This PR addresses the need to reload specific documents without affecting others. To achieve this, a new option, "deletion_field", has been introduced.
Functionality
When "deletion_field" is set (e.g., "file_path"), the system will delete all documents in the target collection where metadata["file_path"] matches the corresponding value in the incoming documents.
This ensures that chunks from the specific file are removed before reloading it, preventing duplicates or conflicts.
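Example
The delete-then-reload behavior described above can be illustrated with a minimal in-memory sketch. This is not the code from this PR and it does not call Astra DB: the collection is modeled as a plain list of dicts, and the function name `replace_by_field` and the sample data are hypothetical. It only shows the idea of gathering the distinct values of the deletion field from the incoming documents (with a set comprehension, as the commit message mentions) and dropping the existing documents that match before adding the new ones.

```python
from typing import Any


def replace_by_field(
    collection: list[dict[str, Any]],
    incoming: list[dict[str, Any]],
    deletion_field: str,
) -> list[dict[str, Any]]:
    """Return a new collection with matching old chunks removed and incoming chunks added.

    Hypothetical helper for illustration only; a real vector store would
    perform the deletion with a server-side filter instead.
    """
    # Set comprehension: each distinct value is collected once,
    # even when many incoming chunks share the same value.
    delete_values = {
        doc["metadata"][deletion_field]
        for doc in incoming
        if deletion_field in doc.get("metadata", {})
    }

    # Keep only existing documents whose field value is not being reloaded.
    kept = [
        doc
        for doc in collection
        if doc.get("metadata", {}).get(deletion_field) not in delete_values
    ]
    return kept + incoming


existing = [
    {"text": "old a1", "metadata": {"file_path": "a.txt"}},
    {"text": "old a2", "metadata": {"file_path": "a.txt"}},
    {"text": "old b1", "metadata": {"file_path": "b.txt"}},
]
new_chunks = [
    {"text": "new a1", "metadata": {"file_path": "a.txt"}},
]

result = replace_by_field(existing, new_chunks, "file_path")
print([doc["text"] for doc in result])  # ['old b1', 'new a1']
```

In this sketch, both old chunks of a.txt are removed and replaced by the new chunk, while the chunk from b.txt is left untouched. Documents without the deletion field in their metadata are never deleted.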