[Algorithm] Discrete CQL #1666
Conversation
Great work! I left some high-level comments; can you have a look?
Thanks for this
torchrl/objectives/cql.py
Outdated
logsumexp = torch.logsumexp(q_values, dim=-1, keepdim=True)
q_a = (q_values * current_action).sum(dim=-1, keepdim=True)

return (logsumexp - q_a).mean()
Can we return metadata too, like we're hoping to do for all losses in the future?
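(For reference, a minimal sketch of what returning metadata alongside the CQL term could look like, assuming q_values and current_action have the same shapes as in the snippet above; the metadata keys are illustrative only, not the actual implementation.)

def cql_loss(self, q_values, current_action):
    # log-sum-exp over all actions minus the Q-value of the action actually taken
    logsumexp = torch.logsumexp(q_values, dim=-1, keepdim=True)
    q_a = (q_values * current_action).sum(dim=-1, keepdim=True)
    loss = (logsumexp - q_a).mean()
    # illustrative metadata dict returned next to the differentiable loss tensor
    metadata = {
        "cql_logsumexp": logsumexp.mean().detach(),
        "cql_q_a": q_a.mean().detach(),
    }
    return loss, metadata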
torchrl/objectives/cql.py
Outdated
self._in_keys = values

@dispatch
def forward(self, tensordict: TensorDictBase) -> TensorDict:
This should just be a couple of lines with dqn_loss and cql_loss, IMO.
I actually tried to inherit from the DQN class and then do something like super().forward(tensordict), with only the cql_loss calculation added, but I got circular import issues. Do you have any suggestions?
Oh, I wasn't suggesting inheriting from DQN; it's OK if they're separate. But the forward should just be a composition of loss_actor and loss_critic, like we did in other losses (e.g., TD3), where each sub-loss returns a tensor and a dict of metadata.
Ah, got it! It should be adapted accordingly now.
Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
I don't understand it.
The CQL loss is called in the value loss, not in forward; why is that?
Why do we call item() on the CQL loss value? Conventionally, all losses in the output tensordict of a loss module should be differentiable.
Can you give me some context?
The CQL loss is more like an auxiliary term for the value loss, not a loss for a separate model like the actor; it just augments the value loss. We could separate it, but then we would need another forward pass through the model to obtain the current Q-values, which would slow things down, and I think there is no need to expose only the CQL loss, as on its own it's incomplete.
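(A minimal sketch of the idea described above, assuming one-hot actions and a pred_val tensor of Q-values already computed for the TD loss; variable names are illustrative, not the actual implementation.)

# reuse the Q-values already computed for the TD loss, so no extra forward pass is needed
q_taken = (pred_val * action).sum(dim=-1, keepdim=True)
cql_term = (torch.logsumexp(pred_val, dim=-1, keepdim=True) - q_taken).mean()
td_loss = distance_loss(q_taken.squeeze(-1), target_value, self.loss_function).mean()
loss_value = td_loss + cql_term  # the CQL regularizer simply augments the value loss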
torchrl/objectives/cql.py
Outdated
cql_loss = self.cql_loss(pred_val, action)

# calculate target value
with torch.no_grad():
    target_value = self.value_estimator.value_estimate(
        td_copy,
        target_params=self._cached_detached_target_value_params,
    ).squeeze(-1)

with torch.no_grad():
    td_error = (pred_val_index - target_value).pow(2)
    td_error = td_error.unsqueeze(-1)
    if tensordict.device is not None:
        td_error = td_error.to(tensordict.device)

tensordict.set(
    self.tensor_keys.priority,
    td_error,
    inplace=True,
)
loss = distance_loss(pred_val_index, target_value, self.loss_function).mean()

metadata = {
    "td_error": td_error.mean(0).detach(),
    "loss_cql": cql_loss.item(),
    "pred_value": pred_val.mean().detach(),
    "target_value": target_value.mean().detach(),
}
Where is the cql_loss used?
You are right, I must have deleted it. Sorry for the confusion; I just updated and fixed it :)
What do you think of BY571#1? I think being able to run ablation studies has some value. We need to fix the categorical case.
I think yes; if someone wants to check how the CQL loss term influences agent performance and wants a simple "on/off" capability, it makes sense. The changes you made look good. I also pushed some adaptations for the categorical case to calculate the CQL loss.
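(A minimal sketch of what the categorical-action case could look like, assuming actions are stored as integer indices rather than one-hot vectors; the function name and shapes are assumptions, not the actual implementation.)

def cql_loss_categorical(q_values, action):
    # q_values: [..., n_actions]; action: [...] or [..., 1] integer (long) indices
    if action.ndim == q_values.ndim - 1:
        action = action.unsqueeze(-1)
    logsumexp = torch.logsumexp(q_values, dim=-1, keepdim=True)
    # gather the Q-value of the taken action instead of masking with a one-hot vector
    q_a = q_values.gather(-1, action)
    return (logsumexp - q_a).mean()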
Cool, LMK when you've merged the PR.
That looks great!
Cool let's merge this!
Description
Adds a discrete (DQN) CQL objective and an example.
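(A hypothetical usage sketch of the objective added here; the class name, constructor arguments, and output keys are assumptions based on this discussion, not the definitive API.)

from torchrl.objectives import DiscreteCQLLoss

# value_net is assumed to be a discrete Q-value actor; data a batch sampled from an offline dataset
loss_module = DiscreteCQLLoss(value_net)
loss_td = loss_module(data)
# sum every differentiable "loss_*" entry and backpropagate
loss = sum(value for key, value in loss_td.items() if key.startswith("loss_"))
loss.backward()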
Motivation and Context
Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves the issue #15213.
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!