Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Algorithm] Update TD3 Example #1523
[Algorithm] Update TD3 Example #1523
Changes from 4 commits
6339a07
9e890b3
117c477
d2b3ad4
9c6c358
1adbff5
2422ef8
0e67de2
1fc0847
7a02b83
243d712
af31bd9
1122808
b4df32b
13f367a
72ddf7e
c830891
1a2f08e
4cdbb3b
76dcdeb
68d4c26
ec8b089
bcc3bc6
42748e0
0052cd9
e2c28c8
bb496ef
9b4704b
e622bf7
57bc54a
29977df
854e2a2
619f2ea
8d36787
a24ab8d
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dangerous as keys in the tensordict get ordered by name but output tuple loss_val doesn't. So for now im just checking if all values in the loss_val tuple are also in the loss_val_td.
loss_vals will be in that order
(loss_actor, loss_qvalue, state_action_value_actor, next_state_value, pred_value, target_value)
However, as the items are getting ordered in the TD by the keys the output tensordict has actually this order:
(loss_actor, loss_qvalue, next_state_value, pred_value, state_action_value_actor, target_value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dispatch returns the keys in the order of out_keys.
So this is predictable, we can just do
does that solve the problem?