[Feature Request] Extend TDLambdaEstimator with QLambdaEstimator #2397

Open
@roger-creus

Description

Motivation

I am trying to implement Parallel Q Networks (PQN), an online variant of DQN that uses neither a replay buffer nor target networks. PQN builds its targets from Q(λ) returns.

Solution

TDLambdaEstimator expects state_value keys, but a QLambdaEstimator would need to read action_value keys instead, bootstrapping from the maximum action value of the next state.
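To make the request concrete, here is a minimal sketch of the return such an estimator might compute. It assumes the Peng-style Q(λ) recursion, R_t = r_t + γ[λ R_{t+1} + (1 - λ) max_a Q(s_{t+1}, a)], which this issue does not spell out. The function name `q_lambda_returns` and its arguments are hypothetical and are not part of the TorchRL API. Plain Python lists are used so the logic is easy to check by hand.

```python
def q_lambda_returns(rewards, next_action_values, dones, gamma=0.99, lmbda=0.65):
    """Hypothetical sketch of Q(lambda) returns for a single trajectory.

    rewards[t]            -- reward received after acting in s_t
    next_action_values[t] -- list of Q(s_{t+1}, a) for every action a
    dones[t]              -- True if s_{t+1} is terminal
    """
    num_steps = len(rewards)
    returns = [0.0] * num_steps
    # Walk backwards so returns[t + 1] is available when computing returns[t].
    for t in reversed(range(num_steps)):
        # Greedy bootstrap: this is where action values replace state values.
        bootstrap = max(next_action_values[t])
        if dones[t]:
            # No bootstrapping past a terminal state.
            target = 0.0
        elif t == num_steps - 1:
            # Last step of the rollout: only the bootstrap value is available.
            target = bootstrap
        else:
            # Blend the next lambda-return with the one-step greedy bootstrap.
            target = lmbda * returns[t + 1] + (1.0 - lmbda) * bootstrap
        returns[t] = rewards[t] + gamma * target
    return returns


if __name__ == "__main__":
    print(q_lambda_returns(
        rewards=[1.0, 1.0],
        next_action_values=[[0.0, 2.0], [1.0, 3.0]],
        dones=[False, False],
        gamma=0.5,
        lmbda=0.5,
    ))  # [2.125, 2.5]
```

With `lmbda=0` the recursion reduces to the usual one-step Q-learning target. An estimator along these lines would differ from TDLambdaEstimator mainly in taking the max over the action_value entry rather than reading state_value directly.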

Checklist

  • I have checked that there is no similar issue in the repo (required)

Labels: enhancement