[Feature] Making action masks compatible with q value modules and e-greedy #1499

matteobettini · 2023-09-06T16:52:10Z

Action masks were introduced in #1421.

This PR makes the components of the training pipeline use this mask.

The components that need updating are:

q modules and actors
e-greedy

This addresses some of the issues brought up in #1404
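For readers unfamiliar with the idea, here is a minimal sketch of what a mask-aware e-greedy step could look like. It is not the code from this PR: the function name `masked_epsilon_greedy` is hypothetical, and it assumes that `True` in the mask marks an allowed action.

```python
import torch


def masked_epsilon_greedy(action_values, action_mask, eps, generator=None):
    """Pick one action index per row, never choosing a masked-out action.

    action_values: float tensor of shape (batch, n_actions)
    action_mask:   bool tensor of the same shape, True where the action is allowed (assumed convention)
    eps:           probability of taking a uniformly random allowed action
    """
    # Greedy choice: disallowed actions get the lowest representable value.
    fill = torch.full_like(action_values, torch.finfo(action_values.dtype).min)
    greedy = torch.where(action_mask, action_values, fill).argmax(dim=-1)

    # Random choice: sample uniformly among the allowed actions of each row.
    random_action = torch.multinomial(
        action_mask.float(), num_samples=1, generator=generator
    ).squeeze(-1)

    # Per-row coin flip deciding between exploration and exploitation.
    explore = torch.rand(action_values.shape[0], generator=generator) < eps
    return torch.where(explore, random_action, greedy)
```

Both branches respect the mask: the greedy branch cannot select a disallowed action because its value is pushed to the dtype minimum, and the random branch gives disallowed actions zero sampling weight. Each row is assumed to have at least one allowed action.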

cc @Kang-SungKu @1030852813 @fedebotu

Signed-off-by: Matteo Bettini <matbet@meta.com>

matteobettini · 2023-09-06T16:54:02Z

torchrl/modules/tensordict_module/actors.py

+                raise KeyError(
+                    f"Action mask key {self.action_mask_key} not found in {tensordict}."
+                )
+            action_values[action_mask] = torch.finfo(action_values.dtype).min


We need to discuss whether this is the best choice for representing masked values.

Signed-off-by: Matteo Bettini <matbet@meta.com>

matteobettini · 2023-09-07T09:01:45Z

We have to decide what to do with the wrapper.
As of now I have just added a new greedy module.
Do we want to deprecate the wrapper? What is the process for that?

Signed-off-by: Matteo Bettini <matbet@meta.com>

torchrl/modules/tensordict_module/exploration.py

test/test_exploration.py

torchrl/modules/tensordict_module/exploration.py

vmoens · 2023-09-07T09:11:11Z

torchrl/modules/tensordict_module/actors.py

+                raise KeyError(
+                    f"Action mask key {self.action_mask_key} not found in {tensordict}."
+                )
+            action_values[action_mask] = torch.finfo(action_values.dtype).min


Two things:
(1) I think this is wrong; it should be ~action_mask, no?
(2) We should not modify the values in place.

Rather, use torch.where(action_mask, action_values, torch.finfo(action_values.dtype).min). wdyt?

Oops, this was left over.

Does that pass gradients when the condition is applied? Not sure.
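A small standalone check of the two points above (out-of-place masking and gradient flow) might look like the sketch below. The tensors are made up for illustration, `True` is assumed to mark an allowed action, and the fill value is wrapped in a tensor so the example does not rely on scalar support in `torch.where`.

```python
import torch

# Made-up values; True in the mask is assumed to mean "action allowed".
action_values = torch.tensor([[1.0, 5.0, 3.0]], requires_grad=True)
action_mask = torch.tensor([[True, False, True]])

# Out-of-place masking: action_values itself is left untouched.
fill = torch.full_like(action_values, torch.finfo(action_values.dtype).min)
masked_values = torch.where(action_mask, action_values, fill)

# Index 1 holds the largest raw value but is masked out, so index 2 wins.
best_action = masked_values.argmax(dim=-1)

# Backpropagate through every entry to see where gradients flow.
masked_values.sum().backward()
grad = action_values.grad  # 1 for allowed entries, 0 for masked ones
```

`torch.where` routes the gradient to whichever branch was selected, so allowed entries receive a gradient and masked entries receive zero, while the original tensor keeps its values.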

torchrl/modules/tensordict_module/actors.py

vmoens · 2023-09-07T09:13:09Z

> We have to decide what to do with the wrapper. As of now I have just added a new greedy module. Do we want to deprecate the wrapper? What is the process for that?

Add a deprecation warning in the constructor saying that it will be deprecated in v0.3.
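As a rough illustration of that suggestion, a constructor-level warning can be raised with the standard `warnings` module. The class name `OldGreedyWrapper` and the message wording are hypothetical placeholders, not the actual names used in the PR.

```python
import warnings


class OldGreedyWrapper:  # hypothetical stand-in for the wrapper being deprecated
    def __init__(self, policy):
        # Warn once per construction; stacklevel=2 points at the caller's line.
        warnings.warn(
            "OldGreedyWrapper will be deprecated in v0.3; "
            "please use the new greedy module instead.",
            category=DeprecationWarning,
            stacklevel=2,
        )
        self.policy = policy
```

Note that Python hides `DeprecationWarning` by default outside of `__main__` and test runners, so users may need `-W default` or a warnings filter to see it.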

Signed-off-by: Matteo Bettini <matbet@meta.com>

matteobettini · 2023-09-07T09:57:10Z

should be ready

torchrl/modules/tensordict_module/exploration.py

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

Signed-off-by: Matteo Bettini <matbet@meta.com>

vmoens

LGTM, thanks for taking care of this.

init

7c420d7

Signed-off-by: Matteo Bettini <matbet@meta.com>

facebook-github-bot added the CLA Signed label Sep 6, 2023

matteobettini added 6 commits September 6, 2023 18:43

amend

3b79ee4

Signed-off-by: Matteo Bettini <matbet@meta.com>

test

eb541cd

Signed-off-by: Matteo Bettini <matbet@meta.com>

amend

c328249

Signed-off-by: Matteo Bettini <matbet@meta.com>

amend

2c021e3

Signed-off-by: Matteo Bettini <matbet@meta.com>

fix

ff8e058

Signed-off-by: Matteo Bettini <matbet@meta.com>

amend

00365ca

Signed-off-by: Matteo Bettini <matbet@meta.com>

matteobettini marked this pull request as ready for review September 7, 2023 08:59

test typo

b9b1bb6

Signed-off-by: Matteo Bettini <matbet@meta.com>

vmoens reviewed Sep 7, 2023

matteobettini added 3 commits September 7, 2023 10:51

review

fc4d2eb

Signed-off-by: Matteo Bettini <matbet@meta.com>

review

ba18d3a

Signed-off-by: Matteo Bettini <matbet@meta.com>

typo

32ef644

Signed-off-by: Matteo Bettini <matbet@meta.com>

vmoens added the enhancement and Refactoring labels Sep 7, 2023

vmoens reviewed Sep 7, 2023

matteobettini and others added 4 commits September 7, 2023 13:39

Update torchrl/modules/tensordict_module/exploration.py

4d06ed9

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

Apply suggestions from code review

17da46e

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

review

005a83f

Signed-off-by: Matteo Bettini <matbet@meta.com>

typo

364ee71

Signed-off-by: Matteo Bettini <matbet@meta.com>

vmoens approved these changes Sep 7, 2023

vmoens merged commit 786020d into pytorch:main Sep 7, 2023

matteobettini deleted the mask_qvalue branch September 7, 2023 14:30

vmoens added a commit to hyerra/rl that referenced this pull request Oct 10, 2023

[Feature] Making action masks compatible with q value modules and e-g…

7ee8f13

…reedy (pytorch#1499) Signed-off-by: Matteo Bettini <matbet@meta.com> Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
