Allow complex filters being set #1401

Open
jluethi opened this issue Apr 16, 2024 · 4 comments · May be fixed by #2145

jluethi (Collaborator) commented Apr 16, 2024

If my understanding is correct, we can currently filter for the following:

  • Inclusion only
  • AND combination of filters
  • A single filter per attribute / type
  • For types: filter for boolean true vs. false
  • For attributes: filter for an exact match of the attribute

Correct me if I'm wrong and we already support more in V2, @tcompa.

Because it comes up in web discussions, I'll try to give a larger perspective on where this could go in the future. This is not meant to block the v2 release.


There are many areas where we could expand the filters to support more complex filtering. Ordered by approximate importance:

Important:

  • Exclude given matches: e.g. process everything except well B07 (because there was some issue with that well)
  • Support OR filters per attribute: filter for wells B03, B05, C04, C07, which allows selecting multiple items

Nice to have:

  • Filter with complex patterns, e.g. select all B wells with a filter like "well = B*"

I don't currently see the value in:

  • OR filters between attributes: look for well B03 or acquisition 1
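
To make these semantics concrete, here is a minimal sketch of how the three proposed filter kinds could be evaluated in plain Python (purely illustrative; not Fractal code):

from fnmatch import fnmatch

wells = ["B03", "B05", "B07", "C04"]

# Exclusion: everything except well B07
kept = [w for w in wells if w != "B07"]

# OR filter per attribute: wells B03, B05, C04, C07
kept = [w for w in wells if w in {"B03", "B05", "C04", "C07"}]

# Complex pattern: all B wells ("well = B*")
kept = [w for w in wells if fnmatch(w, "B*")]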

tcompa (Collaborator) commented Apr 23, 2024

Reminder: the action of merging two filter sets together is now also implemented in fractal-web (ref fractal-analytics-platform/fractal-web#442). When we change the filter structure, we should either change it in both places or offer a dedicated endpoint for the merge operation.

@jluethi added the flexibility label (Support more workflow-execution use cases) on Jul 3, 2024
tcompa (Collaborator) commented Dec 12, 2024

Here is the latest update.

Images subpackage

The current Filters model looks like

from typing import Any, Union

from pydantic import BaseModel, Extra, Field, validator

# valdictkeys is a project-internal helper that validates dict keys


class Filters(BaseModel, extra=Extra.forbid):
    attributes: dict[str, Any] = Field(default_factory=dict)
    types: dict[str, bool] = Field(default_factory=dict)

    # Validators
    _attributes = validator("attributes", allow_reuse=True)(
        valdictkeys("attributes")
    )
    _types = validator("types", allow_reuse=True)(valdictkeys("types"))

    @validator("attributes")
    def validate_attributes(
        cls, v: dict[str, Any]
    ) -> dict[str, Union[int, float, str, bool, None]]:
        for key, value in v.items():
            if not isinstance(value, (int, float, str, bool, type(None))):
                raise ValueError(
                    f"Filters.attributes[{key}] must be a scalar "
                    "(int, float, str, bool, or None). "
                    f"Given {value} ({type(value)})"
                )
        return v

The new one should look like

class Filters(BaseModel, extra=Extra.forbid):
    attributes_include: dict[str, list[Any]] = Field(default_factory=dict)
    attributes_exclude: dict[str, list[Any]] = Field(default_factory=dict)
    types: dict[str, bool] = Field(default_factory=dict)

    # ... + validators

where each list is meant as internally joined by a logical OR.

Validators will now apply to the list elements (a sketch follows this list):

  • Each list element must be of the same scalar type as before (int, float, str, bool, None)
  • Lists can have length 1 (to include or exclude an exact match) or larger
  • Lists cannot have length 0
  • The same key cannot appear in both attributes_include and attributes_exclude; this is a validation error
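
A minimal sketch of these validators (Pydantic v1 style, to match the current model; names and structure are assumptions, not the final implementation):

from typing import Any
from pydantic import BaseModel, Extra, Field, root_validator, validator

SCALARS = (int, float, str, bool, type(None))

class Filters(BaseModel, extra=Extra.forbid):
    attributes_include: dict[str, list[Any]] = Field(default_factory=dict)
    attributes_exclude: dict[str, list[Any]] = Field(default_factory=dict)
    types: dict[str, bool] = Field(default_factory=dict)

    @validator("attributes_include", "attributes_exclude")
    def validate_lists(cls, v, field):
        for key, values in v.items():
            # Lists cannot have length 0
            if len(values) == 0:
                raise ValueError(f"{field.name}[{key}] cannot be empty.")
            for value in values:
                # Each element must be a scalar, as before
                if not isinstance(value, SCALARS):
                    raise ValueError(
                        f"{field.name}[{key}] must contain scalars "
                        "(int, float, str, bool, or None). "
                        f"Given {value} ({type(value)})"
                    )
        return v

    @root_validator
    def no_key_in_both(cls, values):
        # The same key cannot appear in both include and exclude
        overlap = set(values.get("attributes_include", {})) & set(
            values.get("attributes_exclude", {})
        )
        if overlap:
            raise ValueError(
                f"Keys {sorted(overlap)} appear in both attributes_include "
                "and attributes_exclude."
            )
        return values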

The match_filter function should be updated accordingly.
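
A minimal sketch of the updated logic (the image layout with "attributes" and "types" sub-dicts, and the treatment of missing types as False, are assumptions based on the current images subpackage):

def match_filter(image: dict, filters: Filters) -> bool:
    # Type filters: exact boolean match; missing types are treated as False
    for key, expected in filters.types.items():
        if image.get("types", {}).get(key, False) != expected:
            return False
    attributes = image.get("attributes", {})
    # Include filters: the value must match at least one list element (OR)
    for key, values in filters.attributes_include.items():
        if attributes.get(key) not in values:
            return False
    # Exclude filters: the value must match none of the list elements
    for key, values in filters.attributes_exclude.items():
        if attributes.get(key) in values:
            return False
    return True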

Examples:

{"attributes_include": {}, "attributes_exclude": {}, "types": {}}
--> no filtering

{"attributes_include": {keyA: []},  "attributes_exclude": {}, "types": {}}
--> invalid

{"attributes_include": {keyA: ["value1"]},  "attributes_exclude": {}, "types": {}}
--> requires exact match with "value1"

{"attributes_include": {keyA: ["value1", "value2"]},  "attributes_exclude": {}, "types": {}}
--> requires exact match with "value1" OR "value2"

{"attributes_include": {keyA: ["value1"]},  "attributes_exclude": {keyA: ["value2"]}, "types": {}}
--> invalid filters (even though the two filters would be logically compatible)

{"attributes_include": {keyA: ["value1"]},  "attributes_exclude": {keyB: ["value2"]}, "types": {}}
--> valid filters

DB

Current version

    input_filters: dict[
        Literal["attributes", "types"], dict[str, Any]
    ] = Field(
        sa_column=Column(
            JSON,
            nullable=False,
            server_default='{"attributes": {}, "types": {}}',
        )
    )

to be updated accordingly (a sketch follows).
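
A sketch of the updated column, keeping the same SQLModel/SQLAlchemy JSON-column style (imports as in the current snippet; the exact Literal keys and server_default string are assumptions, not the final migration target):

    input_filters: dict[
        Literal["attributes_include", "attributes_exclude", "types"],
        dict[str, Any],
    ] = Field(
        sa_column=Column(
            JSON,
            nullable=False,
            server_default=(
                '{"attributes_include": {}, '
                '"attributes_exclude": {}, "types": {}}'
            ),
        )
    )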

API

Changes are the expected ones (that is, using the new Filters model).
Pay attention to the fact that ORM objects only carry plain Python dicts for their filters, so we should always cast to Filters when appropriate. For instance, _workflow_insert_task currently has a wrong type hint, saying that input_filters is a Filters object while it is actually a dict.
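
For example (a hypothetical snippet; wftask stands for an ORM row):

# ORM rows store plain dicts, so cast explicitly before using filter logic
input_filters = Filters(**wftask.input_filters)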

Data migration for current DB values

Existing relevant db columns are

  • DatasetV2.filters
  • WorkflowTaskV2.input_filters

A data-migration script is required; it will make the following changes (a sketch follows the list):

  • Loop over all DatasetV2 rows
  • If filters = {"attributes": {}, "types": WHATEVER}, replace it with filters = {"attributes_include": {}, "attributes_exclude": {}, "types": WHATEVER}
  • If filters = {"attributes": {"key1": "value1"}, "types": WHATEVER}, replace it with filters = {"attributes_include": {"key1": ["value1"]}, "attributes_exclude": {}, "types": WHATEVER}
  • Repeat the same for WorkflowTaskV2 rows
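
A minimal sketch of the migration (the session/query structure is an assumption; only convert_filters reflects the rules above):

def convert_filters(old: dict) -> dict:
    # Wrap each scalar attribute value into a one-element include list
    return {
        "attributes_include": {
            key: [value] for key, value in old.get("attributes", {}).items()
        },
        "attributes_exclude": {},
        "types": old.get("types", {}),
    }

# Applied within the migration session, e.g.:
# for dataset in db.query(DatasetV2).all():
#     dataset.filters = convert_filters(dataset.filters)
# for wftask in db.query(WorkflowTaskV2).all():
#     wftask.input_filters = convert_filters(wftask.input_filters)
# db.commit()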

Backwards-compatibility for importing old workflows/datasets from JSON

If a workflow was exported in the past and has some filters set, these will be in the old format.

This model

class WorkflowTaskImportV2(BaseModel, extra=Extra.forbid):
    ...
    input_filters: Optional[Filters] = None

should become something like

class WorkflowTaskImportV2(BaseModel, extra=Extra.forbid):
    ...
    input_filters: Optional[Union[Filters, FiltersLegacy]] = None

We should double-check that:

  1. When the input value is a valid Filters object, nothing special occurs
  2. When it is a FiltersLegacy object, it is automatically replaced by the corresponding Filters object

The same applies to DatasetImportV2.
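
A minimal sketch of the legacy-to-new conversion (FiltersLegacy mirrors the old model; the validator name is an assumption):

from typing import Any, Optional, Union
from pydantic import BaseModel, Extra, Field, validator

class FiltersLegacy(BaseModel, extra=Extra.forbid):
    attributes: dict[str, Any] = Field(default_factory=dict)
    types: dict[str, bool] = Field(default_factory=dict)

class WorkflowTaskImportV2(BaseModel, extra=Extra.forbid):
    input_filters: Optional[Union[Filters, FiltersLegacy]] = None

    @validator("input_filters")
    def coerce_legacy_filters(cls, v):
        # Replace a FiltersLegacy object with the corresponding Filters one
        if isinstance(v, FiltersLegacy):
            return Filters(
                attributes_include={
                    key: [value] for key, value in v.attributes.items()
                },
                attributes_exclude={},
                types=v.types,
            )
        return v

Note that Pydantic v1 tries Union members left to right: a payload with the old "attributes" key fails Filters (extra=forbid) and parses as FiltersLegacy, after which the validator converts it.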

@ychiucco linked a pull request (#2145) on Dec 16, 2024 that will close this issue
jluethi (Collaborator, Author) commented Dec 18, 2024

Let's not invest too much into handling older attribute filters.

Suggested behavior for workflow import: if any attribute filters are set in a workflow, show a warning upon import that they are dropped.

Suggested behavior for dataset import: TBD. It could raise an informative NotImplementedError.

tcompa (Collaborator) commented Dec 18, 2024

Two additions:

  1. WorkflowTask filters must not include any attribute filters (only type filters)
  2. The job will gain some attribute filters (the ones that were originally associated with the wftask), but not type filters
