
Possible to run a full episode and collate results? For training on real-time hardware. #1036

Open
crobarcro opened this issue Nov 10, 2020 · 3 comments
Labels: custom gym env, question


@crobarcro

We would like to perform training in contexts where the typical calling sequence of transferring data at every time step is problematic.

For example, we have a hardware-in-the-loop system where we would ideally be able to run a full episode of training, collate the results and process them as a block. The reason this is desirable is that there are communication and synchronization issues which make transferring the data on every step problematic.

The same can be true of other situations, where there simply isn't a good bridge between the training environment software and Python that can easily work on every time step.

Therefore my question is: is there any capability to achieve this within Stable Baselines? If not, how difficult would it be to modify Stable Baselines to work this way? As we understand it, some of the algorithms effectively operate in this way already, i.e. learning is based on the actions and rewards gathered from a full episode.

This is a question, but I can't add the question tag.

@pstansell

araffin added the custom gym env and question labels on Nov 10, 2020

araffin commented Nov 10, 2020

Therefore my question is: is there any capability to achieve this within Stable Baselines? If not, how difficult would it be to modify Stable Baselines to work this way? As we understand it, some of the algorithms effectively operate in this way already, i.e. learning is based on the actions and rewards gathered from a full episode.

So, it is possible for off-policy algorithms when using Stable-Baselines3: this is the n_episodes_rollout parameter.
Otherwise, you need to create custom classes that derive from SB2.
You can find a concrete example here: https://github.com/araffin/learning-to-drive-in-5-minutes
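
For concreteness, here is a minimal sketch of what this looks like with SAC in SB3, assuming a release that still exposes n_episodes_rollout (e.g. 0.10.x); the environment and values are placeholders, not a recommendation:

```python
# Minimal sketch: collect one full episode, then train on the collected data.
# Assumes an SB3 release that still has the n_episodes_rollout parameter.
import gym
from stable_baselines3 import SAC

env = gym.make("Pendulum-v0")  # placeholder for the hardware-in-the-loop env

model = SAC(
    "MlpPolicy",
    env,
    train_freq=-1,          # disable step-based training triggers
    n_episodes_rollout=1,   # gather one complete episode per rollout
    gradient_steps=-1,      # do as many gradient steps as steps collected
    verbose=1,
)
model.learn(total_timesteps=10_000)
```

With this configuration the agent interacts with the environment for a whole episode before any gradient update, which matches the "collect a block, then process it" pattern described above.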

@crobarcro (Author)

Thanks for this; the repo you linked to looks very helpful and very close to what we want to do, so it could be a solution for us.

Is there any example of the n_episodes_rollout parameter in SB3? I searched and found it in the documentation, but it is a bit too terse for me to understand how to use it in practice.


araffin commented Nov 11, 2020

Is there any example of the n_episodes_rollout parameter in SB3? I searched and found it in the documentation, but it is a bit too terse for me to understand how to use it in practice.

https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/sac.yml#L19
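
In case it helps when reading that file: the keys used there (train_freq, n_episodes_rollout, gradient_steps) map directly onto the constructor arguments of the off-policy algorithms, as in the sketch above. In later SB3 releases (an assumption about versions after this issue), n_episodes_rollout was folded into train_freq, so the same episode-based collection is written roughly as:

```python
# Later-SB3 form (assumption: releases where n_episodes_rollout was removed);
# still one complete episode of interaction per training round.
from stable_baselines3 import SAC

model = SAC(
    "MlpPolicy",
    "Pendulum-v1",                 # placeholder env id
    train_freq=(1, "episode"),     # collect one full episode per rollout
    gradient_steps=-1,             # as many gradient steps as steps collected
)
```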
