Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Baysesian Config Flow #122552

Open
wants to merge 73 commits into
base: dev
Choose a base branch
from
Open

Baysesian Config Flow #122552

wants to merge 73 commits into from

Conversation

HarvsG
Copy link
Contributor

@HarvsG HarvsG commented Jul 24, 2024

Proposed change

Implement a config flow for the Bayesian Helper
Will likely require #117355 Not required
Since this may increase the user base I am keen to merge #119281 first as it is breaking. ✅ DONE

Explanation for reviewers:
The Bayesian helper is used to create a virtual binary sensor that gets its state by from a combination of other sensor states. For example a user could predict whether binary_sensor.bayesian_cooking is on from a variety of correlated, but imperfect sensors sensor.kitchen_humidity, {{(sensor.kitchen_temperature - sensor.avergae_temperature) > 3}}, sensor.gas_usage, sensor.electricity_use_power, binary_sensor.no_one_home
Rather than using "if this, then that" logic, Bayesian uses a statistical/probability technique called Bayes' Rule.
Essentially, the user configures these imperfect sensors and an expected states e.g. binary_sensor.no_one_home == True and provides the probability of these being True if bayesian binary_sensor.bayesian_cooking is on and if it is off. Each of these is known as an observation and in nearly all cases users will specify >1 observation for each Bayesian binary sensor. Observations may be configured as state (checks for exact matching of a state), numeric_state, which checks if a sensor is in a range and template which evaluates the template and coerces it into a Boolean.
There are also variables which apply to the Bayesian binary sensor rather than per-observation; prior and probability_threshold. prior is the baseline probability that sensor should be on, and probability_threshold is the probability at which the sensor would be considered on.

Because of this one-to-many relationship of Bayesian binary sensors to their constituent observations this is a complex ConfigFlow. A similar thing has been achieved before by the 'Scrape' component.

ConfigFlow Example video

OptionsFlow Example video

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.

To help with the load of incoming pull requests:

@gjohansson-ST
Copy link
Member

I don't think this should use subentries as in managing observations (which I think is the driver here).
It should all be handled by an OptionsFlow perhaps built similar to how scrape is done currently.

@HarvsG
Copy link
Contributor Author

HarvsG commented Jul 25, 2024

I don't think this should use subentries as in managing observations (which I think is the driver here). It should all be handled by an OptionsFlow perhaps built similar to how scrape is done currently.

Thank you for this advice, perhaps you could explain in a little more detail? I have had a look at the scape component.

Each Bayesian sensor may have none, one or many of a template, state or numeric_state. Those themselves have between 3 and 6 config options. How should I allow a user to keep adding more? Should I just allow the addition of multiple entities and templates and then ask them to configure the to_state, prob_given_false for each etc on the next page?

@HarvsG

This comment was marked as outdated.

STATE_SUBSCHEMA = vol.Schema(
{
vol.Required(CONF_ENTITY_ID): selector.EntitySelector(
selector.EntitySelectorConfig(domain=[SENSOR_DOMAIN, BINARY_SENSOR_DOMAIN])
Copy link
Contributor Author

@HarvsG HarvsG Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the only sensor domains we want to support? Or is this too constraining?

Copy link
Contributor Author

@HarvsG HarvsG Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 5b0d09f I have enabled many more domains in the absence of a centrally managed list of all possible domains. Broadly I have enabled all domains for state observations that a user could reasonably predict e.g those that are not timestamps
For numeric domains I have included all those that I can find which have a state that is usually a number.

@HarvsG

This comment was marked as resolved.

@HarvsG HarvsG requested a review from gjohansson-ST October 25, 2024 08:10
@HarvsG

This comment was marked as resolved.

@HarvsG HarvsG marked this pull request as ready for review October 25, 2024 08:18
Copy link
Member

@gjohansson-ST gjohansson-ST left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some further input.

Maybe you can provide some images or a vid in the PR description how this is working so we can see the flow makes sense.

homeassistant/components/bayesian/binary_sensor.py Outdated Show resolved Hide resolved
homeassistant/components/bayesian/binary_sensor.py Outdated Show resolved Hide resolved
@@ -490,7 +541,10 @@ def _build_observations_by_template(self) -> dict[Template, list[Observation]]:
for observation in self._observations:
if observation.value_template is None:
continue

if isinstance(observation.value_template, str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to make it a Template in async_setup_entry so here it's treated the same

Copy link
Contributor Author

@HarvsG HarvsG Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this, by adding this to async_setup_entry:

    for observation in observations:
        if observation[CONF_PLATFORM] != CONF_TEMPLATE:
            continue
        observation[CONF_VALUE_TEMPLATE] = Template(
            observation[CONF_VALUE_TEMPLATE], hass=hass
        )

but something and async_setup_entry doesn't like non JSON serializable objects, whereas async_setup_platform doesnt mind

2024-11-11 22:54:16.072 ERROR (MainThread) [homeassistant.helpers.http] Unable to serialize to JSON. Bad data found at $.options.observations[0].value_template=Template<template=({{is_state('input_boolean.ic','off') and ((as_timestamp(now()) - as_timestamp(states.input_boolean.ic.last_changed)) > 300)}}) renders=4>(<class 'homeassistant.helpers.template.Template'>
2024-11-11 22:54:17.074 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved (None):   File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspaces/core/homeassistant/__main__.py", line 227, in <module>
    sys.exit(main())
  File "/workspaces/core/homeassistant/__main__.py", line 213, in main
    exit_code = runner.run(runtime_conf)
  File "/workspaces/core/homeassistant/runner.py", line 154, in run
    return loop.run_until_complete(setup_and_run_hass(runtime_config))
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 674, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 641, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1978, in _run_once
    handle._run()
  File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/workspaces/core/homeassistant/helpers/storage.py", line 483, in _async_schedule_callback_delayed_write
    self.hass.async_create_task_internal(
  File "/workspaces/core/homeassistant/core.py", line 841, in async_create_task_internal
    task = create_eager_task(target, name=name, loop=self.loop)
  File "/workspaces/core/homeassistant/util/async_.py", line 45, in create_eager_task
    return Task(coro, loop=loop, name=name, eager_start=True)
Traceback (most recent call last):
  File "/workspaces/core/homeassistant/helpers/json.py", line 87, in json_encoder_default
    raise TypeError
TypeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspaces/core/homeassistant/helpers/storage.py", line 516, in _async_callback_delayed_write
    await self._async_handle_write_data()
  File "/workspaces/core/homeassistant/helpers/storage.py", line 541, in _async_handle_write_data
    await self._async_write_data(self.path, data)
  File "/workspaces/core/homeassistant/helpers/storage.py", line 546, in _async_write_data
    await self.hass.async_add_executor_job(self._write_data, self.path, data)
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/core/homeassistant/helpers/storage.py", line 553, in _write_data
    data["data"] = data.pop("data_func")()
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/core/homeassistant/config_entries.py", line 2426, in _data_to_save
    "entries": [entry.as_storage_fragment for entry in self._entries.values()]  # type: ignore[misc]
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/propcache/_helpers_c.pyx", line 88, in propcache._helpers_c.cached_property.__get__
  File "/workspaces/core/homeassistant/config_entries.py", line 532, in as_storage_fragment
    return json_fragment(json_bytes_sorted(self.as_dict()))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Type is not JSON serializable: Template

Copy link
Contributor Author

@HarvsG HarvsG Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some digging this is because async_setup_platform() is called with the callback _async_schedule_add_entities() whereas async_setup_entry() is called with _async_schedule_add_entities_for_entry() as the callback.

Should I ?:

  1. Leave it as is
  2. Change binary_sensor.async_setup_platform() to also use strings?
  3. Something else?

homeassistant/components/bayesian/config_flow.py Outdated Show resolved Hide resolved
return [c.value for c in ObservationTypes]


class ConfigFlowSteps(StrEnum):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to see this as constants.

Copy link
Contributor Author

@HarvsG HarvsG Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b7ea3ec
Let me know if you are happy with where they are in config_flow.py or if you think they should be in const.py?

).extend(OBSERVATION_BOILERPLATE.schema)


def _convert_percentages_to_fractions(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure what these are doing but we could just store it as the user inputs it instead of converting.
We could do that conversion once when loading the entry instead (in the platform in setup_entry).

Copy link
Contributor Author

@HarvsG HarvsG Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are they because I am taking user input as a percentage probability (0-100%) but storing it and using it as a fractional probability (0-1) as this is what the YAML config and binary_sensor.py uses. This was chosen because percentages are a more commonly understood representation of probability. "Theres a 50% chance that a coin comes up heads" is more commonly said than "Theres a p=0.5 chance that a coin comes up heads".

I tried to make this work with converting the percentages in async_setup_entry but couldn't

    # convert from percentages to fractional probabilities
    prior: float = config[CONF_PRIOR] / 100
    probability_threshold: float = config[CONF_PROBABILITY_THRESHOLD] / 100
    for observation in observations:
        observation[CONF_P_GIVEN_T] = observation[CONF_P_GIVEN_T] / 100
        observation[CONF_P_GIVEN_F] = observation[CONF_P_GIVEN_F] / 100

The only way to convert from a percentage to a fraction is to divide by 100, however there is no obvious way of checking if it has already been done. Therefore any repeated calls to async_setup_entry duplicates the division, it seems async_setup_entry() should be idempotent. This particularly becomes an issue when using options flows etc.

Even if we did manage to remove then need for _convert_percentages_to_fractions() we would still need to have _convert_fractions_to_percentages() when loading the stored values into the options flow UI for editing.

Solutions I see:

  1. Keep as is
  2. Abandon trying to use percentages in config and options flow and switch to using probabilities of 0 to 1 as used in YAML, although I think percentages are more intuitive for user-facing UIs, it may keep things simple for users that switch from YAML or find old tutorials online
  3. Create some new Percentage type and pass that to async_setup_entry() and only convert from that type to float
  4. Something else?

homeassistant/components/bayesian/config_flow.py Outdated Show resolved Hide resolved
@home-assistant home-assistant bot marked this pull request as draft November 11, 2024 19:32
@HarvsG HarvsG marked this pull request as ready for review November 11, 2024 23:57
@HarvsG
Copy link
Contributor Author

HarvsG commented Nov 12, 2024

Maybe you can provide some images or a vid in the PR description how this is working so we can see the flow makes sense.

I will do this tomorrow

Edit: Done, videos uploaded to YT as they are too large for github. Links in the Proposed changes.

_LOGGER = logging.getLogger(__name__)
USER = "user"
OBSERVATION_SELECTOR = "observation_selector"
ALLOWED_STATE_DOMAINS = [

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @HarvsG, thank you for the PR.

I just wanted to check it out, cloned the branch locally, and while setting up my Bayesian sensor, I realized I'm not able to add a binary sensor observation. Simply adding BINARY_SENSOR_DOMAIN here fixes the issue.

Is this intentional? Or am I using the wrong branch? 🤔 Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants