[Feature] Implicit Q-Learning (IQL) #933

BY571 · 2023-02-22T09:49:15Z

Description

Adds the Implicit Q-Learning (IQL) objective and an online RL example.

Motivation and Context

Adds the first offline RL algorithm to TorchRL. For now, only an online learning example is included; it converged on the Pendulum-v1 gym environment.

Once the offline datasets PR (#928) is merged, an additional offline example will be added.
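
For readers new to the method, here is a minimal pure-Python sketch of the three IQL loss terms. It is based on the IQL paper (Kostrikov et al., 2021), not on the TorchRL IQLLoss code in this PR. The value function V is fit by expectile regression on u = Q(s, a) - V(s). The Q-function regresses onto r + γ(1 - done)V(s'). The policy is trained with advantage-weighted regression using weights exp(β·(Q - V)). The function names and defaults (expectile 0.7, β = 3.0, weight cap 100) are illustrative assumptions.

```python
import math
from typing import List, Sequence


def expectile_loss(diff: Sequence[float], expectile: float = 0.7) -> float:
    """Mean of |tau - 1(u < 0)| * u**2 over the differences u = Q(s, a) - V(s)."""
    total = 0.0
    for u in diff:
        # Positive differences (Q above V) get weight tau, negative ones 1 - tau,
        # so tau > 0.5 pushes V toward an upper expectile of Q.
        weight = expectile if u >= 0 else 1.0 - expectile
        total += weight * u * u
    return total / len(diff)


def q_targets(rewards: Sequence[float], dones: Sequence[bool],
              next_v: Sequence[float], gamma: float = 0.99) -> List[float]:
    """TD targets r + gamma * (1 - done) * V(s'); bootstrapping uses V, not max_a Q."""
    return [r + gamma * (1.0 - float(d)) * nv
            for r, d, nv in zip(rewards, dones, next_v)]


def advantage_weights(q: Sequence[float], v: Sequence[float],
                      beta: float = 3.0, max_weight: float = 100.0) -> List[float]:
    """Policy-extraction weights exp(beta * (Q - V)), capped for numerical stability."""
    return [min(math.exp(beta * (qi - vi)), max_weight) for qi, vi in zip(q, v)]


# Tiny usage example with made-up numbers.
print(expectile_loss([1.0, -1.0]))  # 0.5
print(q_targets([1.0, 1.0], [False, True], [2.0, 2.0], gamma=0.5))  # [2.0, 1.0]
print(advantage_weights([0.0, 10.0], [0.0, 0.0]))  # [1.0, 100.0]
```

Because none of these terms evaluates Q on actions produced by the current policy, the same objective can be trained purely from a fixed dataset. This is what makes IQL an offline algorithm, even though the example here collects data online.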

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide (required)
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.

vmoens

Can you add the class to the docs?
We should also add the example to the tests (there is a workflow dedicated to them).
Otherwise LGTM, just a few minor comments.

torchrl/objectives/iql.py

examples/iql/iql_online.py

vmoens · 2023-02-27T12:40:22Z

The examples are failing
https://app.circleci.com/pipelines/github/pytorch/rl/6511/workflows/3deb03f3-aa94-400c-8756-895bfa050557/jobs/200111
Can you check why?

BY571 · 2023-02-28T08:16:22Z

It seems the test can't handle the situation where wandb asks you to select setup options.

I have now switched the logging in the tests to TensorBoard. This should fix it for now, but I'm not sure it's a permanent solution.

vmoens · 2023-02-28T10:42:23Z

Can't you put it in local mode? I think this can be done via an environment variable.
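
Here is a minimal sketch of the environment-variable route. It assumes the standard WANDB_MODE variable, which wandb reads when a run is initialised (accepted values include "online", "offline" and "disabled"). The helper name force_wandb_mode is hypothetical and not part of TorchRL. In a CI shell script, exporting WANDB_MODE=offline before launching the example should have the same effect.

```python
import os

# Values accepted by wandb's WANDB_MODE environment variable.
VALID_WANDB_MODES = {"online", "offline", "disabled"}


def force_wandb_mode(mode: str = "offline") -> None:
    """Hypothetical helper: set WANDB_MODE before any wandb.init() call.

    In "offline" mode wandb writes runs to local files instead of contacting
    the server, so no login or interactive setup prompt should be needed.
    """
    if mode not in VALID_WANDB_MODES:
        raise ValueError(f"unsupported wandb mode: {mode!r}")
    os.environ["WANDB_MODE"] = mode


force_wandb_mode("offline")
print(os.environ["WANDB_MODE"])  # offline
```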

Also, there's an error remaining in TD3:

TypeError: distribution keywords and tensordict keys indicated by ProbabilisticTensorDictModule.in_keys must match.Got this error message: 
    __init__() got an unexpected keyword argument 'tanh_loc'

BY571 · 2023-03-01T09:31:45Z

Fixed the TD3 issues and added the mode parameter so that wandb can be run in offline mode for the tests.
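
Below is one way such a mode parameter could be threaded from an example's configuration into the logger. It is sketched under assumptions: LoggerConfig and wandb_init_kwargs are hypothetical names, not the code added in this PR. The returned dictionary is meant to be passed as wandb.init(**kwargs); name, project and mode are all public wandb.init arguments.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class LoggerConfig:
    """Hypothetical logging section of an example's configuration."""
    exp_name: str
    project: str = "torchrl-examples"  # placeholder project name
    mode: str = "online"  # tests would override this with "offline"


def wandb_init_kwargs(cfg: LoggerConfig) -> Dict[str, Any]:
    """Build keyword arguments for wandb.init(), validating the mode early."""
    if cfg.mode not in {"online", "offline", "disabled"}:
        raise ValueError(f"unsupported wandb mode: {cfg.mode!r}")
    return {"name": cfg.exp_name, "project": cfg.project, "mode": cfg.mode}


kwargs = wandb_init_kwargs(LoggerConfig(exp_name="iql_online", mode="offline"))
print(kwargs["mode"])  # offline
# With wandb installed, this would be used as: run = wandb.init(**kwargs)
```

Compared with setting an environment variable, an explicit parameter keeps the choice visible in the example's configuration. It also lets the test workflow override it per run without touching the process environment.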

vmoens · 2023-03-09T16:41:50Z

Sorry for dropping the ball.
Can you merge main into this branch and let me know if I can help in any way?

vmoens

LGTM

BY571 added 5 commits February 21, 2023 19:44

update iql loss

f2ce2ba

update delay value

d5c7888

update iql objective

b6215ee

update objective and test

484037d

Merge branch 'main' into iql

7fce95f

facebook-github-bot added the CLA Signed label Feb 22, 2023

BY571 marked this pull request as ready for review February 22, 2023 13:53

vmoens approved these changes Feb 24, 2023

BY571 added 6 commits February 27, 2023 09:51

update description iql loss

9f6e5a1

Merge branch 'main' into iql

1f0cdcb

fix actor net creation with NormalParamExtractor

5378081

make expectile value diff static method of iql objective class

107e272

add iql loss to docs

2a0ff9e

add iql and td3 example script to test

bdef10d

change test logger to tensorboard

0170d7e

BY571 added 3 commits March 1, 2023 10:20

add wandb logging mode

b42f3c4

update iql with wandb logging mode

6cb7445

add wandb offline mode to runtest

8dfb655

BY571 and others added 4 commits March 10, 2023 07:46

Merge branch 'pytorch:main' into iql

5a89c68

update iql loss and iql test

1514e35

Merge branch 'main' into iql

ad517d1

fix n_steps naming

6dd7183

vmoens added the new algo label Mar 13, 2023

vmoens approved these changes Mar 14, 2023

vmoens merged commit 878d023 into pytorch:main Mar 14, 2023