[Feature Request] Extend TDLambdaEstimator with QLambdaEstimator

## Motivation

I am trying to implement [Parallel Q Networks](https://www.researchgate.net/publication/382080747_Simplifying_Deep_Temporal_Difference_Learning), an online variant of DQN that uses neither a replay buffer nor target networks. The algorithm relies on Q(λ) returns.

## Solution

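Add a `QLambdaEstimator` that extends `TDLambdaEstimator`. `TDLambdaEstimator` currently expects `state_value` keys, whereas a Q(λ) estimator would need to read `action_value` keys instead.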
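
To make the request concrete, here is a minimal sketch of one common Q(λ) formulation, in which the bootstrap term of the TD(λ) recursion is the maximum over the next state's action values rather than a state value. It is plain Python and does not use any TorchRL API; the function name `q_lambda_returns` and its argument names are hypothetical, and the exact variant the paper uses may differ.

```python
from typing import List, Sequence


def q_lambda_returns(
    rewards: Sequence[float],
    next_action_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
    lmbda: float = 0.95,
) -> List[float]:
    """Compute lambda-returns for a single trajectory, bootstrapping from max_a Q.

    Assumed layout (hypothetical, not a TorchRL convention):
      rewards[t]            reward received after acting at step t
      next_action_values[t] Q(s_{t+1}, a) for every action a
      dones[t]              True if s_{t+1} is terminal
    """
    horizon = len(rewards)
    returns = [0.0] * horizon
    # Walk backwards so returns[t + 1] is available when computing returns[t].
    for t in reversed(range(horizon)):
        not_done = 0.0 if dones[t] else 1.0
        # Greedy bootstrap: this is where action values replace the state value.
        bootstrap = max(next_action_values[t])
        if t == horizon - 1:
            # Last step of the rollout: no later return to mix in.
            mixed = bootstrap
        else:
            mixed = (1.0 - lmbda) * bootstrap + lmbda * returns[t + 1]
        returns[t] = rewards[t] + gamma * not_done * mixed
    return returns
```

With `lmbda=0` this reduces to the one-step Q-learning target `r + gamma * max_a Q(s', a)`. A `QLambdaEstimator` could presumably reuse the existing TD(λ) recursion and only change how the bootstrap value is obtained from the `action_value` entry.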

## Checklist

- [x] I have checked that there is no similar issue in the repo (**required**)



Motivation

Solution

Checklist

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development