[Algorithm] CrossQ #2033

Merged: 49 commits, Jul 10, 2024
Changes from 6 commits
Commits
0a23ae8
add crossQ examples
BY571 Mar 20, 2024
9bdee71
add loss
BY571 Mar 20, 2024
570a20e
Update naming experiment
BY571 Mar 21, 2024
5086249
update
BY571 Mar 21, 2024
c3a927f
update add tests
BY571 Mar 21, 2024
d1c9c34
detach
BY571 Mar 21, 2024
e879b7c
update tests
BY571 Mar 21, 2024
75255e7
update run_test.sh
BY571 Mar 21, 2024
a7b79c3
move crossq to sota-implementations
BY571 Mar 21, 2024
be84f3f
update loss
BY571 Mar 26, 2024
2170ad8
update cat prediction
BY571 Mar 26, 2024
75d4cee
Merge branch 'main' into crossQ
vmoens Jun 12, 2024
7711a4e
Merge branch 'main' into crossQ
BY571 Jun 26, 2024
f0ac167
add batchrenorm to crossq
BY571 Jun 26, 2024
37abb14
Merge branch 'crossQ' of github.com:BY571/rl into crossQ
BY571 Jun 26, 2024
bc7675a
small fixes
BY571 Jun 26, 2024
9543f2e
update docs and sota checks
BY571 Jun 26, 2024
53e35f7
hyperparam fix
BY571 Jun 26, 2024
172e1c0
test
BY571 Jun 27, 2024
fdb7e8b
update batch norm tests
BY571 Jun 27, 2024
5501d43
tests
BY571 Jul 3, 2024
c47ac84
cleanup
BY571 Jul 5, 2024
e718c3f
Merge branch 'main' into crossQ
BY571 Jul 5, 2024
f94165e
update
BY571 Jul 7, 2024
02c94ff
update lr param
BY571 Jul 8, 2024
93b6a7b
Merge branch 'crossQ' of https://github.com/BY571/rl into crossQ
BY571 Jul 8, 2024
4b914e6
Apply suggestions from code review
vmoens Jul 8, 2024
af8c64a
Merge remote-tracking branch 'origin/main' into crossQ
vmoens Jul 8, 2024
845c8a9
Merge branch 'crossQ' of https://github.com/BY571/rl into crossQ
vmoens Jul 8, 2024
7b4a69d
set qnet eval in actor loss
BY571 Jul 8, 2024
77de044
Merge branch 'crossQ' of https://github.com/BY571/rl into crossQ
BY571 Jul 8, 2024
35c7a98
take off comment
BY571 Jul 8, 2024
68a1a9f
amend
vmoens Jul 8, 2024
c04eb3b
Merge branch 'crossQ' of https://github.com/BY571/rl into crossQ
vmoens Jul 8, 2024
12672ee
Merge remote-tracking branch 'origin/main' into crossQ
vmoens Jul 8, 2024
7fbb27d
amend
vmoens Jul 8, 2024
ff80481
amend
vmoens Jul 8, 2024
caf702e
amend
vmoens Jul 8, 2024
70e2882
amend
vmoens Jul 8, 2024
ccd1b7f
amend
vmoens Jul 8, 2024
d3c8b0e
Merge remote-tracking branch 'origin/main' into crossQ
vmoens Jul 9, 2024
d3e0bb1
Apply suggestions from code review
vmoens Jul 9, 2024
349cb28
amend
vmoens Jul 9, 2024
75a43e7
amend
vmoens Jul 9, 2024
abada6c
fix device error
BY571 Jul 9, 2024
c878b81
Update objective delay actor
BY571 Jul 9, 2024
f222b11
Update tests not expecting target update
BY571 Jul 9, 2024
067b560
update example utils
BY571 Jul 9, 2024
c010e39
amend
vmoens Jul 9, 2024
14 changes: 3 additions & 11 deletions torchrl/objectives/crossq.py
@@ -521,12 +521,15 @@ def actor_loss(
log_prob = dist.log_prob(a_reparm)

td_q = tensordict.select(*self.qvalue_network.in_keys, strict=False)
self.qvalue_network.eval()
td_q.set(self.tensor_keys.action, a_reparm)
td_q = self._vmap_qnetworkN0(
td_q,
self._cached_detached_qvalue_params,
)

min_q = td_q.get(self.tensor_keys.state_action_value).min(0)[0].squeeze(-1)
self.qvalue_network.train()

if log_prob.shape != min_q.shape:
raise RuntimeError(
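
Note on the hunk above: the Q-network is switched to eval mode for the actor loss and back to train mode afterwards. Assuming the Q-network contains batch-normalization layers (the commit "add batchrenorm to crossq" suggests it does), eval mode makes the forward pass use the running statistics and leaves them untouched. The sketch below illustrates that effect with a plain `torch.nn.BatchNorm1d`. The `eval_mode` helper is hypothetical and not part of this PR; it restores whichever mode the module was in instead of calling `.train()` unconditionally.

```python
from contextlib import contextmanager

import torch
from torch import nn


@contextmanager
def eval_mode(module: nn.Module):
    """Hypothetical helper: run a block with `module` in eval mode,
    then restore the mode it was in before."""
    was_training = module.training
    module.eval()
    try:
        yield module
    finally:
        module.train(was_training)


torch.manual_seed(0)
# Stand-in Q-network (an assumption, not the network used in the PR).
qnet = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 1))
bn = qnet[1]
batch = torch.randn(16, 4)

mean_before = bn.running_mean.clone()

# Eval mode: running statistics are read but not updated.
with eval_mode(qnet):
    q_eval = qnet(batch)
stats_unchanged_in_eval = torch.equal(bn.running_mean, mean_before)

# Back in train mode: batch statistics are used and the running stats move.
q_train = qnet(batch)
stats_changed_in_train = not torch.equal(bn.running_mean, mean_before)
```

Restoring the previous mode matters only if the loss can be called while the network is already in eval mode; the PR's explicit `eval()` / `train()` pair assumes it is called during training.
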
@@ -550,17 +553,6 @@ def qvalue_loss(
next_tensordict.set(self.tensor_keys.action, next_action)
next_sample_log_prob = next_dist.log_prob(next_action)

# TODO: separate forward pass seems faster than the combined.
# next_state_action_value = self._vmap_qnetworkN0(
# next_tensordict.select(*self.qvalue_network.in_keys, strict=False),
# self.qvalue_network_params,
# ).get(self.tensor_keys.state_action_value)

# current_state_action_value = self._vmap_qnetworkN0(
# tensordict.select(*self.qvalue_network.in_keys, strict=False),
# self.qvalue_network_params,
# ).get(self.tensor_keys.state_action_value)

combined = torch.cat(
[
tensordict.select(*self.qvalue_network.in_keys, strict=False),
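
Note on the hunk above: the removed TODO compared two separate Q-network passes with the combined pass that the code keeps. With batch normalization in train mode the two are not numerically equivalent, because a joint pass computes the normalization statistics over the concatenated batch. The following is a minimal sketch with plain tensors and a stand-in `qnet`; it does not use TorchRL's vmapped tensordict API, and all names and shapes are assumptions for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Stand-in Q-network with a batch-norm layer (hypothetical sizes).
qnet = nn.Sequential(nn.Linear(6, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 1))
qnet.train()

batch_size = 32
current = torch.randn(batch_size, 6)      # stand-in for (state, action) inputs
nxt = torch.randn(batch_size, 6) + 2.0    # shifted to mimic a different distribution

# Joint pass: batch-norm statistics come from the concatenated batch.
combined = torch.cat([current, nxt], dim=0)
q_current_joint, q_next_joint = qnet(combined).chunk(2, dim=0)

# Separate passes: each batch is normalized with its own statistics.
q_current_sep = qnet(current)
q_next_sep = qnet(nxt)

outputs_differ = not torch.allclose(q_current_joint, q_current_sep, atol=1e-4)
```
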