add `isNull` condition for payload filtering #1617

ibrahim-akrab · 2023-03-28T15:18:53Z

All Submissions:

Have you followed the guidelines in our Contributing document?
Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

Does your submission pass tests?
Have you linted your code locally using the cargo fmt command prior to submission?
Have you checked your code using the cargo clippy command?

This PR solves issue #1609, with some rough edges.

TODO:

check the cardinality estimation correctness
unit tests
OpenAPI integration tests
grpc implementation

/claim #1609

ibrahim-akrab · 2023-03-29T00:27:06Z

Alright, I think it should be ready now (except for the cardinality estimation, where I need some help to make sure it is right).

I found that the IsEmpty condition only matches if the key is not present in the payload at all. It does not match when the value is [], as mentioned in the original issue. I checked this against the Docker image without my changes as well. I think this is partly because of

qdrant/lib/segment/src/common/utils.rs

Lines 148 to 151 in ab03267

    pub fn get_value_from_json_map<'a>(
        path: &str,
        value: &'a serde_json::Map<String, Value>,
    ) -> MultiValue<&'a Value> {

It treats a serde_json::value::Value::Array as a single value. This could be fixed there, but I am not sure it wouldn't cause regressions elsewhere.
Another approach, which I think is less error-prone, is to fix

qdrant/lib/segment/src/payload_storage/query_checker.rs

Line 94 in ab03267

    pub fn check_is_empty_condition(is_empty: &IsEmptyCondition, payload: &Payload) -> bool {

so that it takes this into account.

I am willing to implement either fix. Just waiting for your opinion and approval.
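A minimal sketch of the failure mode described above, using simplified, hypothetical stand-ins for `serde_json::Value` and `MultiValue` (the real types in `lib/segment` differ; the payload-as-slice representation and the `lookup`, `naive_is_empty` and `fixed_is_empty` names are assumptions for illustration only). A lookup that hands back an array as one single value makes `"key": []` look non-empty to a check that only asks whether anything was returned; the second approach handles the array case in the condition checker instead.

```rust
// Simplified, hypothetical stand-in for serde_json::Value.
#[allow(dead_code)]
#[derive(Debug)]
enum Value {
    Null,
    Number(i64),
    Array(Vec<Value>),
}

// Hypothetical stand-in for MultiValue: zero or one value, or several.
#[allow(dead_code)]
#[derive(Debug)]
enum MultiValue<T> {
    Single(Option<T>),
    Multiple(Vec<T>),
}

impl<T> MultiValue<T> {
    // True only when the lookup produced no value at all.
    fn is_empty(&self) -> bool {
        match self {
            MultiValue::Single(v) => v.is_none(),
            MultiValue::Multiple(vs) => vs.is_empty(),
        }
    }
}

// Hypothetical lookup: an array stored under the key comes back as a
// single value instead of being flattened into its elements.
fn lookup<'a>(payload: &'a [(&str, Value)], key: &str) -> MultiValue<&'a Value> {
    MultiValue::Single(payload.iter().find(|(k, _)| *k == key).map(|(_, v)| v))
}

// Before the fix: only a missing key counts as empty.
fn naive_is_empty(values: &MultiValue<&Value>) -> bool {
    values.is_empty()
}

// Checker-side fix: a single empty array also counts as empty.
fn fixed_is_empty(values: &MultiValue<&Value>) -> bool {
    match values {
        MultiValue::Single(Some(Value::Array(items))) => items.is_empty(),
        other => other.is_empty(),
    }
}

fn main() {
    let payload = vec![("parts", Value::Array(vec![])), ("rating", Value::Number(3))];
    let parts = lookup(&payload, "parts");
    println!("naive: {}, fixed: {}", naive_is_empty(&parts), fixed_is_empty(&parts));
}
```

For `"parts": []` the naive check reports `false` and the fixed one `true`; a missing key is empty under both.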

ibrahim-akrab · 2023-03-29T11:06:42Z

I went ahead and added a fix for the is_empty condition of "key":[] mentioned above. Now it behaves according to the issue description. It can always be reverted if it wasn't meant to behave that way.

ibrahim-akrab · 2023-03-29T11:11:33Z

lib/segment/src/common/utils.rs

I added a specific implementation for the MultiValue<&Value> used in the is_empty and is_null conditions. I couldn't change the main is_empty() method since it is used elsewhere, and the new specialized methods couldn't reuse that name, because Rust does not allow two inherent methods with the same name when their impl blocks can overlap.
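The naming constraint can be illustrated with a minimal sketch built on simplified, hypothetical stand-ins for `Value` and `MultiValue` (not the code in `lib/segment/src/common/utils.rs`). A generic `impl<T> MultiValue<T>` already defines `is_empty`, so a second `is_empty` in `impl MultiValue<&Value>` would be rejected with error E0592 (duplicate definitions); distinct names such as `check_is_empty` and `check_is_null` avoid the clash. The sketch assumes an explicit `null` also counts as empty, which the thread does not state, and it ignores the multi-value case for `check_is_null`.

```rust
// Simplified, hypothetical stand-in for serde_json::Value.
#[allow(dead_code)]
#[derive(Debug)]
enum Value {
    Null,
    Bool(bool),
    Array(Vec<Value>),
}

// Hypothetical stand-in for MultiValue.
#[allow(dead_code)]
enum MultiValue<T> {
    Single(Option<T>),
    Multiple(Vec<T>),
}

// Generic method used elsewhere: "did the lookup return anything?"
impl<T> MultiValue<T> {
    fn is_empty(&self) -> bool {
        match self {
            MultiValue::Single(v) => v.is_none(),
            MultiValue::Multiple(vs) => vs.is_empty(),
        }
    }
}

// Specialized methods for JSON values. Naming either one `is_empty` would
// overlap with the generic impl above and fail to compile (E0592).
impl MultiValue<&Value> {
    fn check_is_empty(&self) -> bool {
        match self {
            MultiValue::Single(Some(Value::Array(items))) => items.is_empty(),
            // Assumption: an explicit `null` also counts as empty.
            MultiValue::Single(Some(Value::Null)) => true,
            // Missing key or multi-value lookup: defer to the generic method.
            other => other.is_empty(),
        }
    }

    // Key present with an explicit `null` value.
    fn check_is_null(&self) -> bool {
        matches!(self, MultiValue::Single(Some(Value::Null)))
    }
}

fn main() {
    let empty = Value::Array(vec![]);
    let null = Value::Null;
    let null_in_array = Value::Array(vec![Value::Null]);
    for (name, v) in [("[]", &empty), ("null", &null), ("[null]", &null_in_array)] {
        let mv = MultiValue::Single(Some(v));
        println!("{name}: is_empty={} is_null={}", mv.check_is_empty(), mv.check_is_null());
    }
}
```

Under these assumptions `[null]` is matched by neither method, since the array is non-empty and is not itself `null`.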

ibrahim-akrab · 2023-03-29T13:56:42Z

Hey @agourlay, could you please review this and let me know of any required changes.

timvisee

Did a very basic review. Someone else will have to go over this in more detail.

lib/segment/src/payload_storage/query_checker.rs

generall

https://github.com/qdrant/qdrant/blob/master/docs/DEVELOPMENT.md#api-changes

REST: run /tools/generate_openapi_models.sh to generate specs
gRPC: run ./tools/generate_grpc_docs.sh to generate docs

This

generall · 2023-03-30T08:37:11Z

openapi/tests/openapi_integration/helpers/collection_setup.py

@@ -49,7 +49,7 @@ def basic_collection_setup(collection_name='test_collection', on_disk_payload=Fa
                {
                    "id": 1,
                    "vector": [0.05, 0.61, 0.76, 0.74],
-                    "payload": {"city": "Berlin"}


Please do not change tests that are already working in order to test new functionality. Create new fixtures if needed.

I agree with you that tests shouldn't be changed. However, I thought the points in basic_collection_setup() were too standard and didn't account for "key":[], which is why it was thought to be working when it wasn't. Also, if for some reason the "values_count" condition were changed so that it miscounted "key": null or "key": [], the old tests would still pass.
That's why I thought adding some diversity and changing the values accordingly is a good idea.

Any values_count changes are most likely related to the recent fix #1502.

I agree that changing existing payloads is not optimal for us.

I understand where you are coming from, if you want to add diversity to the existing data, I'd be ok with adding new points to the basic_collection_setup at the cost of tracking down potential failing count assertions.

Use your best judgement :)

The values_count change is just an example. What I mean is that unless the test data represents all cases, a change may break one case without failing any test, because those cases do not exist in the same collection.

I already reverted the test changes to their previous state and added a new collection with the diverse payloads as per @generall's request. However, I am not happy with this solution for the reasons mentioned above. I'd rather add to the old test points.
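To illustrate the argument for keeping every payload shape in one collection, here is a minimal table-driven sketch. The `Shape` enum and the `matches_is_empty` / `matches_is_null` predicates are hypothetical; they encode the behavior discussed in this PR (a missing key and `[]` are empty, `[null]` matches neither condition) plus the assumption that an explicit `null` matches both. This is not Qdrant's test suite. Because all shapes sit in one list, a change that, for example, starts counting `[null]` as empty alters the expected ID set and fails the check.

```rust
// Hypothetical shapes a payload key can take in the test points.
#[derive(Clone, Copy, Debug)]
enum Shape {
    Missing,
    Null,
    EmptyArray,
    ArrayOfNull,
    Scalar,
}

// Assumed behavior: missing, null and [] count as empty.
fn matches_is_empty(s: Shape) -> bool {
    matches!(s, Shape::Missing | Shape::Null | Shape::EmptyArray)
}

// Only an explicit null matches is_null; [null] does not.
fn matches_is_null(s: Shape) -> bool {
    matches!(s, Shape::Null)
}

// IDs of the points whose shape satisfies `pred`.
fn matching_ids(points: &[(u32, Shape)], pred: fn(Shape) -> bool) -> Vec<u32> {
    points.iter().filter(|(_, s)| pred(*s)).map(|(id, _)| *id).collect()
}

fn main() {
    // One collection containing every shape, so each count covers all cases.
    let points = [
        (1, Shape::Scalar),
        (2, Shape::Missing),
        (3, Shape::Null),
        (4, Shape::EmptyArray),
        (5, Shape::ArrayOfNull),
    ];
    println!("is_empty -> {:?}", matching_ids(&points, matches_is_empty));
    println!("is_null  -> {:?}", matching_ids(&points, matches_is_null));
}
```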

openapi/tests/openapi_integration/helpers/collection_setup.py

generall · 2023-03-30T09:18:37Z

openapi/tests/openapi_integration/test_basic_retrieve_api.py

+            "filter": {
+                "should": [
+                    {
+                        "is_null": {


It is strange that this passes validation without an updated openapi.json. Apparently jsonschema.validate is not able to process this example correctly.

That is something we need to keep in mind. FYI @agourlay

I was able to reproduce locally the issue that causes malformed requests not to fail.

I even have a lead for a fix, will report asap 🤞

lib/segment/src/index/struct_payload_index.rs

agourlay · 2023-03-30T12:35:21Z

lib/segment/src/payload_storage/query_checker.rs

@@ -234,6 +240,8 @@ mod tests {
            "rating": vec![3, 7, 9, 9],
            "color": "red",
            "has_delivery": true,
+            "parts": [],
+            "packaging": null


Is it possible to add a test for a field that is [null]?

How is the filtering supposed to behave in that case?

I guess [null] is equivalent to an empty array, so not null.
WDYT?

With the current implementation, it is matched by neither is_empty nor is_null.
If you want it to be matched by is_empty, that's fine. However, what about [null, [null]]? Is that even a valid payload? If so, should it be equivalent to empty as well?

Another concern: wouldn't it be confusing for "key":[null] to be matched as empty and not as null, given the word "null" in it? I don't see how that would make sense in the documentation 😕

Forget about my comment, I don't think it is necessary to test this.
Sorry for derailing your research.

It's alright, I added it anyway to test for any future changes regarding this case.

lib/segment/src/types.rs

generall

LGTM

generall · 2023-03-31T13:13:13Z

@ibrahim-akrab , thanks for the contribution! Please do not forget to /claim #1609 the bounty before I merge this

ibrahim-akrab · 2023-03-31T13:23:58Z

@generall, happy to help and thanks for the bounty 🤑

agourlay

Thanks for your contribution 👍

If you are still feeling motivated you could add a short doc about it in the Filtering section.

ibrahim-akrab · 2023-03-31T15:17:32Z

@agourlay Just submitted a PR there. Thanks for the suggestion 👍

* add minimal working is_null filter
* add is_null condition to grpc api (backward compatible)
* add unit tests is_null and is_empty conditions
* add is_null to points.proto file
* add some failing OpenAPI tests
* fix a failing test due to change in collection data
* refactor MultiValue's check for is_null
* fix is_empty condition not picking up "key":[]
* remove duplicate OpenAPI integration test
* reuse same variable in condition checker tests
* update grpc docs
* fix is_null cardinality estimation to match is_empty
* update openapi specs
* remove unused debug statements
* add new test points to original test_collection
* fix failing tests according to newly added points
* add the `"key":[null]` test_case

ibrahim-akrab added 5 commits March 28, 2023 17:08

add minimal working is_null filter

82d9f56

add is_null condition to grpc api (backward compatible)

75decc4

add unit tests is_null and is_empty conditions

d988d55

add is_null to points.proto file

aafb6ef

add some failing OpenAPI tests

de40037

ibrahim-akrab marked this pull request as ready for review March 29, 2023 00:14

ibrahim-akrab added 3 commits March 29, 2023 03:36

fix a failing test due to change in collection data

c09a621

refactor MultiValue's check for is_null

47837c5

fix is_empty condition not picking up "key":[]

ce3feb8

ibrahim-akrab commented Mar 29, 2023


remove duplicate OpenAPI integration test

7ecbd64

generall requested a review from agourlay March 29, 2023 14:37

timvisee requested changes Mar 29, 2023


lib/segment/src/payload_storage/query_checker.rs (outdated)

reuse same variable in condition checker tests

6c4baae

timvisee approved these changes Mar 29, 2023


generall requested changes Mar 30, 2023


ibrahim-akrab added 3 commits March 30, 2023 13:00

update grpc docs

ed0f3e9

fix is_null cardinality estimation to match is_empty

19977b4

update openapi specs

48b303e

agourlay reviewed Mar 30, 2023


lib/segment/src/types.rs (outdated)

agourlay reviewed Mar 30, 2023


lib/segment/src/types.rs (outdated)

ibrahim-akrab added 4 commits March 30, 2023 15:16

remove unused debug statements

2a6a828

add new test points to original test_collection

4eef537

fix failing tests according to newly added points

974ceb2

add the "key":[null] test_case

3bcac3b

generall approved these changes Mar 31, 2023


algora-pbc bot mentioned this pull request Mar 31, 2023

Explicit isNull condition for payload filter #1609

Closed

agourlay approved these changes Mar 31, 2023


generall merged commit f85ae02 into qdrant:dev Apr 1, 2023

ibrahim-akrab deleted the handle_null branch April 2, 2023 08:46

generall mentioned this pull request Apr 19, 2023

upd wal commit #1749

Closed

8 tasks
