Description
Motivation
In many scenarios we need to step only a subset of the envs in a batch. Examples include collecting complete trajectories from many envs whose episodes end asynchronously, or applying frame skip to only some of the envs.
Solution
Serial and parallel envs could read an index key that indicates which envs are to be reset or stepped.
We need to decide whether this key will be a bool or a long tensor, what it will be named, and whether it will be private.
Users will certainly need to be able to mask the data, so we will also need to provide a mask indicating which data is valid.
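To make the proposal concrete, here is a minimal sketch of the idea, not an existing API. It uses plain Python lists instead of tensors to stay dependency-free, and every name in it (`CountdownEnv`, `MaskedSerialEnv`, `collect_full_trajectories`, the `mask` argument) is hypothetical. The batched env steps only the sub-envs selected by a bool mask and returns that mask alongside the data so callers can tell which entries are valid.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CountdownEnv:
    """Toy env (hypothetical): the episode ends after `horizon` steps."""
    horizon: int
    t: int = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self) -> Tuple[int, bool]:
        self.t += 1
        return self.t, self.t >= self.horizon  # (observation, done)


class MaskedSerialEnv:
    """Hypothetical serial batch that steps only the envs where mask is True."""

    def __init__(self, horizons: List[int]):
        self.envs = [CountdownEnv(h) for h in horizons]
        self.obs = [env.reset() for env in self.envs]
        self.done = [False] * len(self.envs)

    def step(self, mask: List[bool]):
        for i, (env, active) in enumerate(zip(self.envs, mask)):
            if active:
                self.obs[i], self.done[i] = env.step()
        # Entries of skipped envs keep their previous (stale) values,
        # so the mask is returned to mark which entries are valid.
        return list(self.obs), list(self.done), list(mask)


def collect_full_trajectories(batch: MaskedSerialEnv) -> List[List[int]]:
    """Step each env until its own episode ends, then stop stepping it."""
    trajectories: List[List[int]] = [[] for _ in batch.envs]
    mask = [True] * len(batch.envs)
    while any(mask):
        obs, done, valid = batch.step(mask)
        for i, is_valid in enumerate(valid):
            if is_valid:
                trajectories[i].append(obs[i])
        # Envs that just finished are excluded from the next step.
        mask = [m and not d for m, d in zip(mask, done)]
    return trajectories


trajectories = collect_full_trajectories(MaskedSerialEnv([2, 4, 1]))
print(trajectories)  # [[1, 2], [1, 2, 3, 4], [1]]
```

A bool mask keeps the same shape as the batch, which makes it easy to return as the validity mask. A long index would be more compact when only a few envs are stepped. The sketch above assumes the bool variant purely for illustration.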
Alternatives
Eventually we could also index batched envs directly, but for now that is a long way off.
Cc @albertbou92