[Algorithm] CrossQ #2033

Merged
merged 49 commits into from
Jul 10, 2024
0a23ae8
add crossQ examples
BY571 Mar 20, 2024
9bdee71
add loss
BY571 Mar 20, 2024
570a20e
Update naming experiment
BY571 Mar 21, 2024
5086249
update
BY571 Mar 21, 2024
c3a927f
update add tests
BY571 Mar 21, 2024
d1c9c34
detach
BY571 Mar 21, 2024
e879b7c
update tests
BY571 Mar 21, 2024
75255e7
update run_test.sh
BY571 Mar 21, 2024
a7b79c3
move crossq to sota-implementations
BY571 Mar 21, 2024
be84f3f
update loss
BY571 Mar 26, 2024
2170ad8
update cat prediction
BY571 Mar 26, 2024
75d4cee
Merge branch 'main' into crossQ
vmoens Jun 12, 2024
7711a4e
Merge branch 'main' into crossQ
BY571 Jun 26, 2024
f0ac167
add batchrenorm to crossq
BY571 Jun 26, 2024
37abb14
Merge branch 'crossQ' of github.com:BY571/rl into crossQ
BY571 Jun 26, 2024
bc7675a
small fixes
BY571 Jun 26, 2024
9543f2e
update docs and sota checks
BY571 Jun 26, 2024
53e35f7
hyperparam fix
BY571 Jun 26, 2024
172e1c0
test
BY571 Jun 27, 2024
fdb7e8b
update batch norm tests
BY571 Jun 27, 2024
5501d43
tests
BY571 Jul 3, 2024
c47ac84
cleanup
BY571 Jul 5, 2024
e718c3f
Merge branch 'main' into crossQ
BY571 Jul 5, 2024
f94165e
update
BY571 Jul 7, 2024
02c94ff
update lr param
BY571 Jul 8, 2024
93b6a7b
Merge branch 'crossQ' of https://github.com/BY571/rl into crossQ
BY571 Jul 8, 2024
4b914e6
Apply suggestions from code review
vmoens Jul 8, 2024
af8c64a
Merge remote-tracking branch 'origin/main' into crossQ
vmoens Jul 8, 2024
845c8a9
Merge branch 'crossQ' of https://github.com/BY571/rl into crossQ
vmoens Jul 8, 2024
7b4a69d
set qnet eval in actor loss
BY571 Jul 8, 2024
77de044
Merge branch 'crossQ' of https://github.com/BY571/rl into crossQ
BY571 Jul 8, 2024
35c7a98
take off comment
BY571 Jul 8, 2024
68a1a9f
amend
vmoens Jul 8, 2024
c04eb3b
Merge branch 'crossQ' of https://github.com/BY571/rl into crossQ
vmoens Jul 8, 2024
12672ee
Merge remote-tracking branch 'origin/main' into crossQ
vmoens Jul 8, 2024
7fbb27d
amend
vmoens Jul 8, 2024
ff80481
amend
vmoens Jul 8, 2024
caf702e
amend
vmoens Jul 8, 2024
70e2882
amend
vmoens Jul 8, 2024
ccd1b7f
amend
vmoens Jul 8, 2024
d3c8b0e
Merge remote-tracking branch 'origin/main' into crossQ
vmoens Jul 9, 2024
d3e0bb1
Apply suggestions from code review
vmoens Jul 9, 2024
349cb28
amend
vmoens Jul 9, 2024
75a43e7
amend
vmoens Jul 9, 2024
abada6c
fix device error
BY571 Jul 9, 2024
c878b81
Update objective delay actor
BY571 Jul 9, 2024
f222b11
Update tests not expecting target update
BY571 Jul 9, 2024
067b560
update example utils
BY571 Jul 9, 2024
c010e39
amend
vmoens Jul 9, 2024
Apply suggestions from code review
vmoens authored Jul 9, 2024
commit d3e0bb1a2e9a4f1c693225612911bb3911739052
3 changes: 0 additions & 3 deletions test/test_cost.py
@@ -4217,9 +4217,6 @@ def test_discrete_sac_reduction(self, reduction):
assert loss[key].shape == torch.Size([])


@pytest.mark.skipif(
not _has_functorch, reason=f"functorch not installed: {FUNCTORCH_ERR}"
)
class TestCrossQ(LossModuleTestBase):
seed = 0

49 changes: 46 additions & 3 deletions torchrl/objectives/crossq.py
@@ -46,10 +46,15 @@ class CrossQLoss(LossModule):
Presented in "CROSSQ: BATCH NORMALIZATION IN DEEP REINFORCEMENT LEARNING
FOR GREATER SAMPLE EFFICIENCY AND SIMPLICITY" https://openreview.net/pdf?id=PczQtTsTIX

This class has three loss functions that will be called sequentially by the `forward` method:
:meth:`~.qvalue_loss`, :meth:`~.actor_loss` and :meth:`~.alpha_loss`. Alternatively, they can
be called individually by the user, in that order.

Args:
actor_network (ProbabilisticActor): stochastic actor
qvalue_network (TensorDictModule): Q(s, a) parametric model.
This module typically outputs a ``"state_action_value"`` entry.

Keyword Args:
num_qvalue_nets (integer, optional): number of Q-Value networks used.
Defaults to ``2``.
@@ -331,6 +336,10 @@ def __init__(

@property
def target_entropy_buffer(self):
"""The target entropy.

This value can be controlled via the `target_entropy` kwarg in the constructor.
"""
return self.target_entropy

@property
@@ -467,6 +476,13 @@ def out_keys(self, values):

@dispatch
def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
"""The forward method.

Computes successively the :meth:`~.qvalue_loss`, :meth:`~.actor_loss` and :meth:`~.alpha_loss`, and returns
a tensordict with these values along with the `"alpha"` value and the `"entropy"` value (detached).
To see what keys are expected in the input tensordict and what keys are expected as output, check the
class's `"in_keys"` and `"out_keys"` attributes.
"""
shape = None
if tensordict.ndimension() > 1:
shape = tensordict.shape
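The call order described in these docstrings can be sketched as follows. This is a minimal illustration, not the library's `forward` implementation: `compute_losses_in_order` and `StubLoss` are hypothetical names, the output keys are chosen for this example, and the stub returns plain floats instead of tensors. It assumes only the signatures shown in this diff: `qvalue_loss` and `actor_loss` return a `(loss, metadata)` pair, and `alpha_loss` takes the `log_prob` found in the actor metadata.

```python
from typing import Any, Dict, List


def compute_losses_in_order(loss_module: Any, tensordict: Any) -> Dict[str, Any]:
    """Call the three loss methods in the order the docstrings prescribe."""
    # 1. Q-value loss first; its metadata carries the detached "td_error".
    loss_qvalue, qvalue_meta = loss_module.qvalue_loss(tensordict)
    # 2. Actor loss second; its metadata carries the detached "log_prob".
    loss_actor, actor_meta = loss_module.actor_loss(tensordict)
    # 3. Entropy loss last, fed with the log-probability from step 2.
    loss_alpha = loss_module.alpha_loss(actor_meta["log_prob"])
    # Output key names are assumptions made for this sketch.
    return {
        "loss_qvalue": loss_qvalue,
        "loss_actor": loss_actor,
        "loss_alpha": loss_alpha,
        "td_error": qvalue_meta["td_error"],
    }


class StubLoss:
    """Hypothetical stand-in exposing the same three method signatures."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.seen_log_prob = None

    def qvalue_loss(self, tensordict):
        self.calls.append("qvalue_loss")
        return 1.0, {"td_error": 0.5}

    def actor_loss(self, tensordict):
        self.calls.append("actor_loss")
        return 2.0, {"log_prob": -1.5}

    def alpha_loss(self, log_prob):
        self.calls.append("alpha_loss")
        self.seen_log_prob = log_prob
        return 3.0


stub = StubLoss()
losses = compute_losses_in_order(stub, tensordict=None)
print(stub.calls)
```

With a real loss module, the same helper would receive the sampled batch as `tensordict`; the stub only records the call sequence so the ordering is visible.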
@@ -511,7 +527,17 @@ def _cached_detached_qvalue_params(self):
def actor_loss(
self, tensordict: TensorDictBase
) -> Tuple[Tensor, Dict[str, Tensor]]:
"""Compute the actor loss."""
"""Compute the actor loss.


The actor loss should be computed after the :meth:`~.qvalue_loss` and before the :meth:`~.alpha_loss`, which requires the `log_prob` field of the `metadata` returned by this method.

Args:
tensordict (TensorDictBase): the input data for the loss. Check the class's `in_keys` to see what fields
are required for this to be computed.

Returns: a differentiable tensor with the actor loss along with a metadata dictionary containing the detached `"log_prob"` of the sampled action.
"""
with set_exploration_type(
ExplorationType.RANDOM
), self.actor_network_params.to_module(self.actor_network):
@@ -540,7 +566,16 @@ def actor_loss(
def qvalue_loss(
self, tensordict: TensorDictBase
) -> Tuple[Tensor, Dict[str, Tensor]]:
"""Compute the CrossQ-value loss."""
"""Compute the q-value loss.

The q-value loss should be computed before the :meth:`~.actor_loss`.

Args:
tensordict (TensorDictBase): the input data for the loss. Check the class's `in_keys` to see what fields
are required for this to be computed.

Returns: a differentiable tensor with the qvalue loss along with a metadata dictionary containing the detached `"td_error"` to be used for prioritized sampling.
"""
# # compute next action
with torch.no_grad():
with set_exploration_type(
@@ -594,7 +629,15 @@ def qvalue_loss(
return loss_qval, metadata

def alpha_loss(self, log_prob: Tensor) -> Tensor:
"""Compute the entropy loss."""
"""Compute the entropy loss.

The entropy loss should be computed last.

Args:
log_prob: a log-probability as computed by the :meth:`~.actor_loss` and returned in the `metadata`.

Returns: a differentiable tensor with the entropy loss.
"""
if self.target_entropy is not None:
# we can compute this loss even if log_alpha is not a parameter
alpha_loss = -self.log_alpha * (log_prob + self.target_entropy)
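As a worked example of the expression above, the sketch below evaluates `-log_alpha * (log_prob + target_entropy)` on plain floats. The function name `entropy_loss_value` and the numbers are illustrative assumptions; the actual method operates on tensors and reads `self.log_alpha` and `self.target_entropy` from the loss module.

```python
def entropy_loss_value(log_alpha: float, log_prob: float, target_entropy: float) -> float:
    """Scalar version of the entropy-loss expression shown in the diff."""
    return -log_alpha * (log_prob + target_entropy)


# Illustrative values only (assumptions, not taken from the PR).
log_alpha = 0.5
target_entropy = -2.0

# log_prob + target_entropy = -3.0, so the loss is -0.5 * -3.0.
high_entropy_case = entropy_loss_value(log_alpha, log_prob=-1.0, target_entropy=target_entropy)

# log_prob + target_entropy = 1.0, so the loss is -0.5 * 1.0.
low_entropy_case = entropy_loss_value(log_alpha, log_prob=3.0, target_entropy=target_entropy)

print(high_entropy_case, low_entropy_case)
```

The derivative of this expression with respect to `log_alpha` is `-(log_prob + target_entropy)`. In the first case it is positive, so a gradient-descent step lowers `log_alpha`; in the second it is negative, so the step raises it.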