
[Refactor] the usage of tensordict keys in loss modules #1175

Merged · 43 commits · May 31, 2023

Conversation

@Blonck (Contributor) commented May 22, 2023

Description

We have various loss modules in RL.
They work as

loss_module = LossModule(network, …)
loss_module(data)

These loss modules access the actual data by keys. Some keys are configurable via ctor,

advantage_key: str = "advantage",

others are hardcoded,

return -td_copy.get("state_action_value")

This is refactored such that all relevant keys can be set via

loss_module.set_keys(sample_log_prob="some_other_key")

The same is done for the advantage modules.

advantage_module.set_keys(advantage="other_advantage")
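
To make the intended workflow concrete, here is a hedged, self-contained sketch in plain Python (ExampleLoss and the key names are made up for illustration; they are not the actual torchrl classes):

class ExampleLoss:
    def __init__(self):
        # Default key names; overridable via set_keys().
        self._keys = {"advantage": "advantage"}

    def set_keys(self, **kwargs):
        for name, key in kwargs.items():
            if name not in self._keys:
                raise ValueError(f"{name} is not a configurable key")
            self._keys[name] = key

    def __call__(self, data):
        # Read the advantage under whatever key is currently configured.
        return -data[self._keys["advantage"]]

loss_module = ExampleLoss()
loss_module.set_keys(advantage="custom_advantage")
print(loss_module({"custom_advantage": 1.0}))  # -1.0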

Closes #1174

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • New feature (non-breaking change which adds core functionality)
  • Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@facebook-github-bot added the CLA Signed label May 22, 2023
@Blonck force-pushed the refactor_loss_keys branch from e1b2350 to c6186fc May 23, 2023 13:26
@Blonck added the Refactoring label May 23, 2023
@Blonck self-assigned this May 23, 2023
@Blonck requested a review from vmoens May 23, 2023 13:52
@vmoens (Contributor) left a comment

Thanks a mil for this!

Is there a way for tensordict_keys to be defined at the class level (and not instance)? Such that one can do PPOLoss.tensordict_keys without instantiating an object?

Also, what about making tensordict_keys a property using @abc.abstractmethod in LossModule parent class, such that we force all new losses to have that attribute?

Inline review comments (outdated, resolved) on: torchrl/objectives/ddpg.py (×4), test/test_cost.py
if key not in self.tensordict_keys.keys():
    raise ValueError(f"{key} not a valid tensordict key")
set_value = value if value is not None else self.tensordict_keys[key]
setattr(self, key, set_value)
Contributor

I'd rather have these keys contained in a separate container, like a dictionary or similar.
I'm afraid that as the number of keys increases, we'll end up with a class with many attributes and no easy access to the list of them all.

Contributor Author

No, the container is already there. However, using the keys would always involve one extra step.

tensordict.get(self.action_key) -> tensordict.get(self.tensordict_keys["action_key"])

From a data-organization standpoint it would be the better solution, but the code will be a little noisier.
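
For concreteness, a minimal sketch of the container approach, using plain dicts and made-up names rather than the actual torchrl code:

class ExampleDDPGLoss:
    def __init__(self):
        # All configurable keys live in one container:
        # easy to enumerate, a single place to look.
        self.tensordict_keys = {
            "action_key": "action",
            "state_action_value_key": "state_action_value",
        }

    def forward(self, tensordict):
        # The extra indirection discussed above: dict lookup, then data lookup.
        return -tensordict[self.tensordict_keys["state_action_value_key"]]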

Contributor

I agree it's a bit clunky.
@matteobettini do you have an opinion on the matter?

@matteobettini (Contributor) May 23, 2023

I prefer tensordict.get(self.tensordict_keys["action_key"]).

Also, would loss_keys make sense instead of tensordict_keys?

Contributor

What about an Enum instead of a dict (to make sure only a finite set of keys is present)?

tensordict.get(self.loss_keys.action_key)

@Blonck (Contributor Author) May 24, 2023

What about an Enum instead of a dict (to make sure only a finite set of keys is present)?

This would mean constructing the enum in the base class from data provided by the child class. I guess this is possible, since almost everything is possible in Python. However, the syntax would not look like

tensordict.get(self.loss_keys.action_key)

since the key must be converted to a str (or a pair of strings). StrEnum would be a solution, but it was only introduced in Python 3.11.
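
For what it's worth, a common pre-3.11 workaround is to mix str into the Enum so that members are themselves strings; a sketch with made-up key names:

from enum import Enum

class LossKeys(str, Enum):
    action = "action"
    advantage = "advantage"

# Members are str instances and compare equal to plain strings,
# so they can be passed wherever a key string is expected.
assert LossKeys.action == "action"
assert isinstance(LossKeys.advantage, str)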

@Blonck (Contributor Author) commented May 23, 2023

Is there a way for tensordict_keys to be defined at the class level (and not instance)? Such that one can do PPOLoss.tensordict_keys without instantiating an object?

That would mean that the behavior of one instance of a loss module changes if .set_keys() is called on another instance. I think that could surprise the user of torchrl. (Although usually only one loss module is used.)

Also, what about making tensordict_keys a property using @abc.abstractmethod in LossModule parent class, such that we force all new losses to have that attribute?

That is a good idea, I will try to implement this. It should lead to an easier interface.

@vmoens (Contributor) commented May 23, 2023

That would mean that the behavior of one instance of a loss module changes if .set_keys() is called on another instance. I think that could surprise the user of torchrl. (Although usually only one loss module is used.)

Not necessarily.
Here you clearly separate the defaults from the non-defaults. The defaults should be defined at the class level if they're kept separate.

@Blonck (Contributor Author) commented May 23, 2023

Not necessarily.
Here you clearly separate the defaults from the non-defaults. The defaults should be defined at the class level if they're kept separate.

Got it. I've implemented this idea and it looks good, although I had to use an abstract staticmethod; afaik, there is no static property decorator.
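
A minimal sketch of the stacked-decorator pattern described above (class and method names are illustrative, not the final torchrl API):

from abc import ABC, abstractmethod

class ExampleLossModule(ABC):
    @staticmethod
    @abstractmethod
    def default_tensordict_keys() -> dict:
        """Each loss must return its default key mapping."""
        ...

class ExamplePPOLoss(ExampleLossModule):
    @staticmethod
    def default_tensordict_keys() -> dict:
        return {"advantage": "advantage", "value": "state_value"}

# Accessible at the class level, without instantiating:
print(ExamplePPOLoss.default_tensordict_keys())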

@matteobettini (Contributor)

Also, one thing to mention is that the new keys have to be transparently reflected in the value estimators:

loss = loss()
loss.set_keys()
loss.make_value_estim()

and

loss = loss()
loss.make_value_estim()
loss.set_keys()

Both should work, with the keys reflected in the value estimators.

@vmoens (Contributor) commented May 23, 2023

Also, one thing to mention is that the new keys have to be transparently reflected in the value estimators:

loss = loss()
loss.set_keys()
loss.make_value_estim()

and

loss = loss()
loss.make_value_estim()
loss.set_keys()

Both should work, with the keys reflected in the value estimators.

Ideally, the keys of the value estimator should only be set through the value estimator, no?

@matteobettini (Contributor) commented May 23, 2023

But the value estimator is created by the loss.

Do we want to make users call set_keys twice?

Already calling it once is added complexity for MARL users.

@Blonck (Contributor Author) commented May 23, 2023

But the value estimator is created by the loss.

Do we want to make users call set_keys twice?

Already calling it once is added complexity for MARL users.

That is a good point. This problem is introduced by allowing the tensordict keys to be configured after the losses are constructed.
I could solve this by extending .set_keys(...) so that it also sets the keys of the value estimator.
Another solution would be to remove .set_keys(...) and add a generic constructor argument to all loss modules, e.g., tensordict_keys.
If there is no need to configure the keys at runtime, I would prefer the latter, although the code becomes a bit clunky.

@Blonck (Contributor Author) commented May 23, 2023

The solution with the ctor would roughly look like:

class MyLoss(LossModule):
    def __init__(self, ..., tensordict_keys=None):
        super().__init__(tensordict_keys=tensordict_keys)
        ...

class LossModule(nn.Module, ABC):
    def __init__(self, tensordict_keys=None):
        # Merge the default tensordict keys with the provided keys
        # (None instead of a mutable {} default).
        default_keys = self.default_tensordict_keys()
        for key, value in (tensordict_keys or {}).items():
            if key not in default_keys:
                raise ValueError(...)
            default_keys[key] = value
        # Create attributes.
        ...
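
Usage under that variant would presumably look like this (the key name is hypothetical):

loss = MyLoss(network, tensordict_keys={"advantage": "custom_advantage"})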

@Blonck (Contributor Author) commented May 24, 2023

Forwarding the new keys via .set_keys() to the value estimator would also require maintaining a mapping from key_name_loss to key_name_value_estimator. In the default case both key names are identical, see ppo.py:

# from ppo.py
def make_value_estimator(...):
    ...
    value_key = self.value_key
    if value_type == ValueEstimators.TD1:
        self._value_estimator = TD1Estimator(
            value_network=self.critic, value_key=value_key, **hp
        )

However, there is also this case in ddpg.py:

# from ddpg.py
def make_value_estimator(...):
    ...
    value_key = "state_action_value" # <- would correspond to self.state_action_value_key
    if value_type == ValueEstimators.TD1:
        self._value_estimator = TD1Estimator(
            value_network=self.actor_critic, value_key=value_key, **hp
        )
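
A minimal sketch of such a mapping, with hypothetical names; each loss would translate its own key names into the estimator's, defaulting to the identity:

# Per-loss translation table: loss key name -> value-estimator key name.
_LOSS_TO_ESTIMATOR_KEY = {
    "state_action_value_key": "value_key",  # e.g. the ddpg.py case above
}

def estimator_kwargs(loss_keys):
    return {
        _LOSS_TO_ESTIMATOR_KEY.get(name, name): key
        for name, key in loss_keys.items()
    }

print(estimator_kwargs({"state_action_value_key": "state_action_value"}))
# {'value_key': 'state_action_value'}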

@Blonck (Contributor Author) commented May 26, 2023

Btw, this code will raise an exception in case the value key has a non-default name:

loss = loss()
loss.make_value_estim()
loss.set_keys()

because

actor = ...  # the actor writes its value under a non-default key
loss = loss()  # constructing the loss with default key names works, because the ctor doesn't use any key names
loss.make_value_estim()  # raises an exception, because:

The value estimator checks that the value key is in value_network.out_keys.
Since the actor uses a non-default value key, while the loss (and hence the value estimator) is instantiated with the default key names, this raises an exception.
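
Under that check, setting the keys before building the estimator should avoid the error; continuing the pseudocode above with a hypothetical key name:

loss = loss()
loss.set_keys(value="custom_value_key")  # matches the actor's out_keys
loss.make_value_estim()  # the estimator is built with the updated key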

@vmoens (Contributor) left a comment

Great work! Must have been quite a headache to come about!
See my comments in the code

Inline review comments (outdated, resolved) on: torchrl/objectives/ppo.py, torchrl/objectives/a2c.py, torchrl/objectives/common.py (×2), torchrl/objectives/ddpg.py, torchrl/objectives/value/advantages.py (×5)
@vmoens (Contributor) left a comment

Great work!
I'm happy to merge this, but I have one more question: we have a tutorial about losses, and this feature seems pretty advanced for a regular user who would like to code up a new loss with hard-coded keys.
Is it mandatory to code _AcceptedKeys?
It's a great tool, but maybe we can reserve it for internal usage.

Have you checked that the tutorial runs under this PR?

def __new__(cls, *args, **kwargs):
    cls.forward = set_exploration_type(ExplorationType.MODE)(cls.forward)
    cls._tensor_keys = cls._AcceptedKeys()
Contributor

maybe make this optional (only if _AcceptedKeys is present)?

Contributor Author

The current implementation does not prevent users from crafting a loss module that lacks configurable keys, as _AcceptedKeys is simply defined as an empty set in such cases. However, the abstract method does prevent users from ignoring the key machinery entirely:

@abstractmethod
def _forward_value_estimator_keys(self, **kwargs) -> None:
    """Passes updated tensordict keys to the underlying value estimator."""
    ...

In this case, the set_keys method will not function if supplied with any arguments, a behavior that aligns with my expectations.

We could remove the @abstractmethod decorator and introduce an error condition for when .set_keys() is invoked while _forward_value_estimator_keys() remains undefined by the loss module. This adjustment would ensure an exception is raised when .set_keys() is called on such a custom loss module.
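
A minimal sketch of that alternative, i.e. a non-abstract default that fails loudly when .set_keys() is used (illustrative only, not the actual implementation):

import torch.nn as nn

class ExampleLossModule(nn.Module):
    def _forward_value_estimator_keys(self, **kwargs) -> None:
        # Non-abstract default: raise instead of forcing an override.
        raise NotImplementedError(
            "set_keys() requires the loss module to implement "
            "_forward_value_estimator_keys()."
        )

    def set_keys(self, **kwargs):
        # ... validate kwargs against _AcceptedKeys, then forward them:
        self._forward_value_estimator_keys(**kwargs)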

Contributor

Got it.
Up to you for the exception. In a way, if someone writes a loss module and then calls set_keys without having written a set of keys, they're probably way off the road...

>>> dqn_loss = DQNLoss(actor, action_space="one-hot")
>>> dqn_loss.set_keys(priority_key="td_error", action_value_key="action_value")
"""
for key, value in kwargs.items():
Contributor

if we make _AcceptedKeys optional, we can raise an exception if it is not present?

Labels: CLA Signed, Refactoring
Successfully merging this pull request may close these issues: [Feature Request] Refactor key usage of loss modules
4 participants