[Algorithm] Update TD3 Example #1523

BY571 · 2023-09-14T13:14:54Z

Description

Updated the TD3 script along the lines of the PPO update: added time logging, more comments, a cleaner structure, and assorted fixes. I'm running some tests right now to verify performance.
What do you think, @vmoens, @albertbou92, @matteobettini? How could we improve the example further?

Motivation and Context

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide (required)
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.

vmoens

LGTM! Some minor comments

examples/td3/config.yaml

examples/td3/td3.py

examples/td3/utils.py

torchrl/objectives/td3.py

BY571 · 2023-09-21T16:33:26Z

test/test_cost.py

+            for i in loss_val:
+                assert i in loss_val_td.values(), f"{i} not in {loss_val_td.values()}"
+            # for i, key in enumerate(loss_val_td.keys()):
+            # torch.testing.assert_close(loss_val_td.get(key), loss_val[i])


Comparing by index is dangerous: the keys in the tensordict get ordered by name, but the output tuple loss_val does not. So for now I'm just checking that every value in the loss_val tuple is also in loss_val_td.

# actor metadata
metadata_actor = {
    "state_action_value_actor": state_action_value_actor.mean().detach(),
}
# value metadata
metadata_value = {
    "td_error": td_error,
    "next_state_value": next_target_qvalue.mean().detach(),
    "pred_value": current_qvalue.mean().detach(),
    "target_value": target_value.mean().detach(),
}
# out tensordict
td_out = TensorDict(
    source={
        "loss_actor": loss_actor,
        "loss_qvalue": loss_qval,
        **metadata_actor,
        **metadata_value,
    },
    batch_size=[],
)

loss_val will be in this order: (loss_actor, loss_qvalue, state_action_value_actor, next_state_value, pred_value, target_value)
However, because the items in the TD are ordered by key, the output tensordict actually has this order:
(loss_actor, loss_qvalue, next_state_value, pred_value, state_action_value_actor, target_value)
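
To make the mismatch concrete, here is a minimal plain-Python sketch. It is only a stand-in: floats replace tensors, a dict with name-sorted keys replaces the TensorDict, and the values are made up. The key list is copied from the comment above. It shows which positions would be paired with the wrong entry by an index-based comparison, and that the order-insensitive membership check from the diff still passes.

```python
# Plain-Python stand-in: floats replace tensors, a dict replaces the TensorDict.
# Key order of the tuple output, as listed in the comment above.
out_keys = [
    "loss_actor",
    "loss_qvalue",
    "state_action_value_actor",
    "next_state_value",
    "pred_value",
    "target_value",
]

# Tuple output: one (made-up) value per entry of out_keys, in that order.
loss_val = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6)

# Container output with keys sorted by name, mimicking the ordering described above.
loss_val_td = dict(sorted(zip(out_keys, loss_val)))

sorted_keys = list(loss_val_td)

# Positions where pairing by index would compare two different entries.
mismatched = [i for i, key in enumerate(sorted_keys) if key != out_keys[i]]

# The membership check ignores order, so it passes regardless.
membership_ok = all(value in loss_val_td.values() for value in loss_val)
```

Note that the membership check is weaker than a keyed comparison: it would also pass if two values were attached to the wrong keys.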

dispatch returns the values in the order of out_keys.
So this is predictable, and we can just do:

for i, key in enumerate(loss.out_keys):
    torch.testing.assert_close(loss_val_td.get(key), loss_val[i])

Does that solve the problem?
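
The same idea in a self-contained form, as a hedged sketch: `fake_loss` is a hypothetical stand-in for the loss module (not TorchRL API), floats replace tensors, and `math.isclose` replaces `torch.testing.assert_close`. The point is only that looking values up by key, while walking `out_keys` for the tuple position, makes the comparison independent of how the container orders its keys.

```python
import math

# Assumed tuple order; in the real test this would come from loss.out_keys.
out_keys = [
    "loss_actor",
    "loss_qvalue",
    "state_action_value_actor",
    "next_state_value",
    "pred_value",
    "target_value",
]


def fake_loss(as_tuple):
    """Hypothetical loss: returns a tuple in out_keys order or a name-sorted dict."""
    values = {
        "loss_actor": 0.1,
        "loss_qvalue": 0.2,
        "state_action_value_actor": 0.3,
        "next_state_value": 0.4,
        "pred_value": 0.5,
        "target_value": 0.6,
    }
    if as_tuple:
        # Tuple path: values follow out_keys.
        return tuple(values[key] for key in out_keys)
    # Container path: keys sorted by name, like the tensordict output discussed here.
    return dict(sorted(values.items()))


loss_val = fake_loss(as_tuple=True)
loss_val_td = fake_loss(as_tuple=False)

# Compare by key, using out_keys to locate the matching tuple position.
for i, key in enumerate(out_keys):
    assert math.isclose(loss_val_td[key], loss_val[i]), f"mismatch for {key}"
```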

vmoens

LGTM
let's wait for the tests to pass!

BY571 added 12 commits September 6, 2023 14:26

update executable

6339a07

fix objective

9e890b3

fix objective

117c477

Update initial frames and general structure

d2b3ad4

fixes

9c6c358

Merge branch 'main' into td3_benchmark

1adbff5

naming fix

2422ef8

single step td3

0e67de2

small fixes

1fc0847

fix

7a02b83

add update counter

243d712

naming fixes

af31bd9

facebook-github-bot added the CLA Signed label Sep 14, 2023

This was referenced Sep 14, 2023

[Algorithm] Update SAC Example #1524

Merged

[Algorithm] Update DDPG Example #1525

Merged

vmoens added the new algo label Sep 14, 2023

vmoens changed the title ~~Update TD3 Example~~ [Algorithm] Update TD3 Example Sep 14, 2023

vmoens approved these changes Sep 14, 2023

examples/td3/config.yaml (outdated)

examples/td3/config.yaml (outdated)

examples/td3/td3.py (outdated)

examples/td3/td3.py

update logging and small fixes

1122808

BY571 marked this pull request as ready for review September 15, 2023 07:39

BY571 added 2 commits September 18, 2023 10:37

no eps

b4df32b

update tests

13f367a

vmoens reviewed Sep 20, 2023

examples/td3/utils.py (outdated)

torchrl/objectives/td3.py (outdated)

BY571 added 2 commits September 20, 2023 18:42

update objective

72ddf7e

set gym backend

c830891

BY571 mentioned this pull request Sep 21, 2023

[Algorithm] Update DT #1560

Merged

9 tasks

vmoens and others added 2 commits September 21, 2023 08:42

Merge branch 'main' into td3_benchmark

1a2f08e

update tests

4cdbb3b

BY571 commented Sep 21, 2023

update fix max episode steps

76dcdeb

BY571 and others added 15 commits September 26, 2023 08:43

Merge branch 'main' into td3_benchmark

68d4c26

fix

ec8b089

fix

bcc3bc6

amend

42748e0

Merge remote-tracking branch 'BY571/td3_benchmark' into td3_benchmark

0052cd9

# Conflicts:
#   examples/td3/utils.py

amend

e2c28c8

update scratch_dir, frame skip, config

bb496ef

Merge branch 'main' into td3_benchmark

9b4704b

merge main

e622bf7

merge main

57bc54a

step counter

29977df

merge main

854e2a2

small fixes

619f2ea

solve logger issue

8d36787

reset notensordict test

a24ab8d

vmoens approved these changes Oct 3, 2023

vmoens merged commit df03cac into pytorch:main Oct 3, 2023
56 of 59 checks passed

vmoens pushed a commit to hyerra/rl that referenced this pull request Oct 10, 2023

[Algorithm] Update TD3 Example (pytorch#1523)

b750097
