[BUG] Calculation of GAE fails with recurrent critic #2372

Closed · thomasbbrunner (Contributor) opened this issue Aug 6, 2024 · 1 comment · Fixed by #2376
Labels: bug (Something isn't working)

Describe the bug

Computing the GAE with a recurrent critic fails with the following error:

RuntimeError: vmap: It looks like you're attempting to use a Tensor in some data-dependent control flow. We don't support that yet, please shout over at https://github.com/pytorch/functorch/issues/257 .

Setting the shifted flag to True seems to prevent this error.

Is this behavior expected? If so, should we document that the shifted flag is required for recurrent critics?

To Reproduce

Minimal snippet to reproduce the issue:

import torch
from tensordict.nn import TensorDictModule, TensorDictSequential
from torch import nn
from torchrl.collectors import SyncDataCollector
from torchrl.data import UnboundedContinuousTensorSpec
from torchrl.envs import GymEnv, TransformedEnv, transforms
from torchrl.envs.utils import check_env_specs
from torchrl.modules import LSTMModule
from torchrl.objectives.value import GAE

env = GymEnv(env_name="HalfCheetah-v4", device="cpu")
env = TransformedEnv(env)
env.append_transform(transforms.DoubleToFloat(in_keys=["observation"]))
env.append_transform(transforms.InitTracker())
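# Primer adds the LSTM's recurrent-state entries (1 layer, hidden size 128) to the
# env specs so they are carried through resets and data collection.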
env.append_transform(
    transforms.TensorDictPrimer(
        {
            "recurrent_state_h": UnboundedContinuousTensorSpec(shape=(1, 128)),
            "recurrent_state_c": UnboundedContinuousTensorSpec(shape=(1, 128)),
        }
    )
)
check_env_specs(env)

observation_size = env.observation_spec["observation"].shape[-1]
action_size = env.action_spec.shape[-1]

rnn = LSTMModule(
    input_size=observation_size,
    hidden_size=128,
    num_layers=1,
    device="cpu",
    in_key="observation",
    out_key="features",
)

value_net = TensorDictModule(
    module=nn.Sequential(
        nn.Linear(128, 128),
        nn.ReLU(),
        nn.Linear(128, 1),
    ),
    in_keys=["features"],
    out_keys=["state_value"],
)
critic_module = TensorDictSequential(rnn, value_net)

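# policy=None: the collector falls back to a random policy sampled from the env's action spec.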
collector = SyncDataCollector(
    env,
    None,
    frames_per_batch=512,
    device="cpu",
)

batch = collector.next()

# With shifted=True calculation of advantages works
advantage_module = GAE(gamma=0.99, lmbda=0.95, value_network=critic_module, shifted=True)
with torch.no_grad():
    advantage_module(batch)

# With shifted=False calculation of advantage fails!
advantage_module = GAE(gamma=0.99, lmbda=0.95, value_network=critic_module, shifted=False)
with torch.no_grad():
    # NOTE: Should raise
    # RuntimeError: vmap: It looks like you're attempting to use a Tensor in some data-dependent control flow (...)
    advantage_module(batch)

System info

> pip list | grep torch
torch                          2.4.0
torchrl                        0.5.0

Checklist
  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)
@thomasbbrunner added the bug label on Aug 6, 2024
@vmoens linked a pull request (#2376) on Aug 7, 2024 that will close this issue

@vmoens (Contributor) commented on Aug 7, 2024

#2376 should fix it.
You'll still need to add python_based=True to your LSTMModule.
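
For reference, a minimal sketch of that change, reusing the constructor arguments from the reproduction snippet above (python_based is the only addition; the exact behavior of the fix in #2376 is not verified here):

# Same LSTMModule as in the reproduction snippet, with python_based=True so the
# LSTM uses a full Python implementation of the cell instead of the cuDNN-backed nn.LSTM.
rnn = LSTMModule(
    input_size=observation_size,
    hidden_size=128,
    num_layers=1,
    device="cpu",
    in_key="observation",
    out_key="features",
    python_based=True,
)
critic_module = TensorDictSequential(rnn, value_net)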
