[Feature Request] Extend TDLambdaEstimator with QLambdaEstimator #2397
Motivation
I am attempting to implement Parallel Q-Networks (PQN), i.e. online DQN without a replay buffer or target network. PQN trains on Q(λ) returns.
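For reference, here is a minimal sketch of the Q(λ) target recursion. It assumes the Peng-style form G_t = r_t + γ[(1 - λ) max_a Q(s_{t+1}, a) + λ G_{t+1}], with G_t = r_t on terminal transitions, which is my reading of the targets PQN uses. It is plain Python over lists and does not use any TorchRL API. `q_lambda_returns` and its argument names are hypothetical.

```python
from typing import List, Sequence


def q_lambda_returns(
    rewards: Sequence[float],
    next_action_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
    lmbda: float = 0.65,
) -> List[float]:
    """Hypothetical helper: Q(lambda) targets for one rollout, computed backwards.

    rewards[t]            -- reward received after acting at step t
    next_action_values[t] -- Q(s_{t+1}, a) for every action a
    dones[t]              -- True if s_{t+1} is terminal
    """
    T = len(rewards)
    returns = [0.0] * T
    # After the last step there is no longer return, so fall back to the greedy bootstrap.
    next_return = max(next_action_values[-1])
    for t in reversed(range(T)):
        greedy = max(next_action_values[t])
        if dones[t]:
            # Terminal transition: nothing is bootstrapped past s_{t+1}.
            returns[t] = rewards[t]
        else:
            # Blend the one-step greedy bootstrap with the longer lambda-return.
            returns[t] = rewards[t] + gamma * ((1 - lmbda) * greedy + lmbda * next_return)
        next_return = returns[t]
    return returns
```

With λ = 0 this reduces to the one-step DQN target r_t + γ max_a Q(s_{t+1}, a). Because the `done` branch never reads `next_return`, episode boundaries inside a flattened rollout are handled without extra masking.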
Solution
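`TDLambdaEstimator` expects `state_value` keys, but a Q(λ) estimator would need `action_value` keys instead. A `QLambdaEstimator` extending `TDLambdaEstimator` could read those keys.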
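One way to frame the key mismatch, sketched under assumptions: a Q(λ) target bootstraps from the greedy value max_a Q(s, a) where TD(λ) uses V(s), while the quantity being regressed is Q(s_t, a_t) of the action actually taken. The helper below derives both from an action-value entry. Plain dicts stand in for a TensorDict; the function name and the `chosen_action_value` key are hypothetical and not part of TorchRL.

```python
from typing import Dict, List


def action_to_state_value_keys(step: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical adapter: derive value entries for a TD(lambda)-style estimator.

    Assumed layout: step["action_value"] holds Q(s, a) for every action a,
    and step["action"] holds the index of the action actually taken.
    """
    q_values: List[float] = list(step["action_value"])
    action = int(step["action"])
    out = dict(step)  # shallow copy, so the input record is left untouched
    # Greedy state value V(s) = max_a Q(s, a): what the Q(lambda) target bootstraps from.
    out["state_value"] = max(q_values)
    # Q(s, a) of the taken action: what the Q(lambda) target is compared against.
    out["chosen_action_value"] = q_values[action]
    return out
```

Under this framing, most of the existing TD(λ) recursion could in principle be reused; the main differences are where the bootstrap value comes from and which value the target is compared against.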
Checklist
- I have checked that there is no similar issue in the repo (required)