[Feature Request] A way to specify the input of environment resets through the DataCollector #1906
Description
Motivation
I'm working on an RL task in a continuous domain. The initial state that the environment assumes on a reset comes from a curated dataset, since we have prior knowledge of what the state of the environment typically looks like in practice.
The environment should ideally not contain the entire dataset but only work with a single example (since that's all it "needs" to know to simulate the agent's actions and their effect). However, I would also like to make use of DataCollectors for training and validation.
Solution
Add an optional parameter, reset_env_kwargs or similar, to DataCollectors that specifies the arguments used when the collector calls the environment's reset function.
This way, the input of the reset can be specified outside of the environment code, and the entire dataset does not have to be moved into the environment just to be able to reset to specific environment states.
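A minimal sketch of what I have in mind, using toy stand-ins rather than the library's actual classes. ToyEnv, ToyCollector, and the initial_state argument are hypothetical names chosen for illustration, and passing reset_env_kwargs as a callable (so each reset can draw a fresh example) is only one possible design:

```python
import itertools
from typing import Any, Callable, Dict, List, Optional


class ToyEnv:
    """Hypothetical environment that only ever holds a single example."""

    def __init__(self) -> None:
        self.state = 0.0

    def reset(self, initial_state: float = 0.0) -> float:
        # The environment receives exactly one initial state per reset.
        self.state = float(initial_state)
        return self.state

    def step(self, action: float) -> float:
        self.state += action
        return self.state


class ToyCollector:
    """Hypothetical collector; NOT the actual DataCollector API."""

    def __init__(
        self,
        env: ToyEnv,
        policy: Callable[[float], float],
        frames_per_episode: int,
        reset_env_kwargs: Optional[Callable[[], Dict[str, Any]]] = None,
    ) -> None:
        self.env = env
        self.policy = policy
        self.frames_per_episode = frames_per_episode
        # Assumption: a callable returning the kwargs for the next reset.
        self.reset_env_kwargs = reset_env_kwargs

    def collect_episode(self) -> List[float]:
        # Ask the user-supplied callable for the reset arguments, if any.
        kwargs = self.reset_env_kwargs() if self.reset_env_kwargs else {}
        obs = self.env.reset(**kwargs)
        trajectory = [obs]
        for _ in range(self.frames_per_episode):
            obs = self.env.step(self.policy(obs))
            trajectory.append(obs)
        return trajectory


# The curated dataset lives outside the environment.
dataset = [1.0, 5.0, -2.0]
initial_states = itertools.cycle(dataset)

collector = ToyCollector(
    ToyEnv(),
    policy=lambda obs: 0.5,
    frames_per_episode=2,
    reset_env_kwargs=lambda: {"initial_state": next(initial_states)},
)
episodes = [collector.collect_episode() for _ in range(3)]
```

Each episode here starts from the next dataset entry, while the environment itself never sees more than one example at a time.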
Alternatives
It is possible that I missed another (easier) way of doing this, for example by registering some sort of hook. In that case, I'd appreciate a pointer or a small example of how this could be implemented.
Additional context
If you find this addition useful, I'd be happy to contribute.
Checklist
- I have checked that there is no similar issue in the repo (required)