
[BUG] target_value_network_params initialization bug in convert_to_functional() #2523

Open
@matinmoezzi

Description

Bug Description

  1. During the initialization of LossModule and the execution of convert_to_functional(), the first layer's parameters are still UninitializedParameter objects (their shape has not yet been materialized, since the MLP is built without in_features). Consequently, when target_value_network_params is cloned from them, each weight becomes a Parameter of torch.Size([0]), leading to the error: RuntimeError: mat2 must be a matrix, got 1-D tensor.
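The point above can be reproduced with plain PyTorch, independent of torchrl: a lazy module holds an UninitializedParameter until the first forward pass infers its input size (a minimal sketch; the module names here are illustrative, not torchrl code).

```python
import torch
import torch.nn as nn

# An nn.LazyLinear (what MLP uses when in_features is omitted) holds an
# UninitializedParameter until its first forward pass.
lazy = nn.LazyLinear(out_features=8)
assert isinstance(lazy.weight, nn.parameter.UninitializedParameter)

# Cloning parameters at this stage is what trips up convert_to_functional():
# the uninitialized weight has no meaningful shape to copy.

# A forward pass materializes the parameter with the inferred in_features.
lazy(torch.randn(2, 4))
print(lazy.weight.shape)  # materialized as (out_features, in_features)
```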

  2. In the DiscreteCQLLoss class, when calling value_estimate, the params argument should reference target_params instead, as indicated in this line of code.

To Reproduce

Steps to reproduce the behavior.

import torch
from tensordict import TensorDict
from torchrl.data import OneHotDiscreteTensorSpec
from torchrl.modules import DistributionalQValueActor, MLP
from torchrl.objectives import DistributionalDQNLoss

nbins = 3
batch_size = 5
action_dim = 2
module = MLP(out_features=(nbins, action_dim), depth=2)
action_spec = OneHotDiscreteTensorSpec(action_dim)
qvalue_actor = DistributionalQValueActor(
    module=module,
    spec=action_spec,
    support=torch.arange(nbins),
)

loss_module = DistributionalDQNLoss(
    qvalue_actor,
    gamma=0.99,
    delay_value=True,
)
td = TensorDict(
    {
        "observation": torch.randn(batch_size, 4),
        "action": torch.nn.functional.one_hot(
            torch.randint(0, action_dim, (batch_size,)), action_dim
        ).float(),
        "next": {
            "observation": torch.randn(batch_size, 4),
            "reward": torch.randn(batch_size, 1),
            "done": torch.zeros(batch_size, 1, dtype=torch.bool),
        },
    },
    batch_size=[batch_size],
)
loss = loss_module(td)
print("Computed loss:", loss)
Running the script fails in the first linear layer:

File "../../python3.10/site-packages/torch/nn/modules/linear.py", line 117, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat2 must be a matrix, got 1-D tensor

System info

python=3.10.15
torchrl=0.5.0
torch=2.4.1

import torchrl, numpy, sys
print(torchrl.__version__, numpy.__version__, sys.version, sys.platform)

0.5.0 2.1.2 3.10.15 | packaged by conda-forge | (main, Oct 16 2024, 01:24:20) [Clang 17.0.6 ] darwin

Possible fixes

Specify the input features in the MLP module:

module = MLP(in_features=4, out_features=(nbins, action_dim), depth=2)

This prevents the module parameters from being created as UninitializedParameter.
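An alternative workaround, sketched below with plain torch modules for illustration, is to run one dummy forward pass through the network before constructing the loss module, so that all lazy layers are materialized by the time their parameters are cloned (this assumes a known observation size, here 4).

```python
import torch
import torch.nn as nn

# Stand-in for an MLP built without in_features: the first layer is lazy.
net = nn.Sequential(nn.LazyLinear(32), nn.ReLU(), nn.Linear(32, 6))

# One dummy forward pass with a correctly shaped input materializes the
# lazy layer (in_features is inferred as 4 here).
net(torch.randn(5, 4))

# All parameters now have concrete shapes, so cloning them
# (as convert_to_functional() does) is safe.
print(net[0].weight.shape)
```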

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

Thanks to @BY571 and @vmoens
