-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add isNull
condition for payload filtering
#1617
Conversation
Alright, now I think it should be ready (except for the cardinality estimation since I need some help to make sure it is right) I found that qdrant/lib/segment/src/common/utils.rs Lines 148 to 151 in ab03267
Since it treats serde_json::value::Value::Array as a Single value. So this can be fixed but I am not sure that it doesn't have regressions elsewhere.Another approach which I think will be less error prone is fixing
I am willing to implement either fix of them. Just waiting for your opinion and approval. |
I went ahead and added a fix for the |
lib/segment/src/common/utils.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a specific implementation for the MultiValue<&Value>
used in the is_empty
and is_null
conditions. I couldn't change the main is_empty()
method since it's being used elsewhere and the new specialized methods couldn't have the same name since the compiler won't attempt to resolve the ambiguity in case they overlap.
Hey @agourlay, could you please review this and let me know of any required changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a very basic review. Someone else will have to go over this in more detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/qdrant/qdrant/blob/master/docs/DEVELOPMENT.md#api-changes
REST: run /tools/generate_openapi_models.sh to generate specs
gRPC: generate docs ./tools/generate_grpc_docs.sh
This
@@ -49,7 +49,7 @@ def basic_collection_setup(collection_name='test_collection', on_disk_payload=Fa | |||
{ | |||
"id": 1, | |||
"vector": [0.05, 0.61, 0.76, 0.74], | |||
"payload": {"city": "Berlin"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not change tests that are already working to test new functionality. Create new fixtures if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you that tests shouldn't be changed. However I thought the points in the basic_collection_setup()
was too standard and didn't account for "key":[]
which is why it was thought to be working while it didn't. Also if ,for example, for some strange reason the "values_count"
condition was changed such that it mistakenly count/miscount the "key": null
or "key": []
, the old tests will pass.
That's why I thought adding some diversity and changing the values accordingly is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any values_count
changes is most likely related to this recent fix #1502
I agree that changing existing payloads is not optimal for us.
I understand where you are coming from, if you want to add diversity to the existing data, I'd be ok with adding new points to the basic_collection_setup
at the cost of tracking down potential failing count assertions.
Use your best judgement :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values_count
change is just an example. What I meant to say is without having the test data representing all cases, some change may break another case and never fail a test since they're not existing in the same collection.
I already reverted the test changes to their previous state and added a new collection with the diverse payloads as per @generall's request. However, I am not happy with this solution for the reasons mentioned above. I'd rather add to the old test points.
"filter": { | ||
"should": [ | ||
{ | ||
"is_null": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is strange, that this passes validation without an updated openapi.json
. Apparently jsonschema.validate
is not able to process this example correctly.
That is something we need to keep in mind. FYI @agourlay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to reproduce the issue locally which cause malformed requests to not fail.
I even have a lead for a fix, will report asap 🤞
@@ -234,6 +240,8 @@ mod tests { | |||
"rating": vec![3, 7, 9, 9], | |||
"color": "red", | |||
"has_delivery": true, | |||
"parts": [], | |||
"packaging": null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add a test for a field that is [null]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the filtering supposed to behave in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess [null]
is equivalent to an empty array, so not null.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current implementation, it will not be matched with neither is_empty
nor is_null
.
If you want it to be matched with is_empty
that's fine. However, what about [null, [null]]
, is that even a valid payload? if so, should it be equivalent to empty as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another concern , wouldn't it be confusing for "key":[null]
to get matched as empty and not as null with the word "null" in it? I don't see it making sense in the documentation 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forget about my comment, I don't think it is necessary to test this.
Sorry for derailing your research.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's alright, I added it anyway to test for any future changes regarding this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@ibrahim-akrab , thanks for the contribution! Please do not forget to |
@generall, happy to help and thanks for the bounty 🤑 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution 👍
If you are still feeling motivated you could add a short doc about it in the Filtering section.
@agourlay Just submitted a PR there. Thanks for the suggestion 👍 |
* add minimal working is_null filter * add is_null condition to grpc api (backward compatible) * add unit tests is_null and is_empty conditions * add is_null to points.proto file * add some failing OpenAPI tests * fix a failing test due to change in collection data * refactor MultiValue's check for is_null * fix is_empty condition not picking up "key":[] * remove duplicate OpenAPI integration test * reuse same variable in condition checker tests * update grpc docs * fix is_null cardinality estimation to match is_empty * update openapi specs * remove unused debug statements * add new test points to original test_collection * fix failing tests according to newly added points * add the `"key":[null]` test_case
All Submissions:
New Feature Submissions:
cargo fmt
command prior to submission?cargo clippy
command?This PR will solves issue #1609 with some rough edges.
TODO:
/claim #1609