Possible to run a full episode and collate results? For training on real-time hardware.  #1036

Open
@crobarcro

We would like to perform training in contexts where the typical calling sequence, which transfers data at every time step, is problematic.

For example, we have a hardware-in-the-loop system where we would ideally run a full episode of training, collate the results, and process them as a block. This is desirable because communication and synchronization issues make transferring the data on every step problematic.
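To make the intended workflow concrete, here is a rough sketch of the pattern: the policy is held fixed for one whole episode, every transition is buffered on the hardware side, and only the finished block is handed back for training. Nothing here is Stable Baselines code; `ToyPlant`, `Transition`, `run_episode`, and `frozen_policy` are hypothetical names, and the toy plant merely stands in for the real hardware.

```python
from typing import Callable, List, NamedTuple, Tuple


class Transition(NamedTuple):
    obs: float
    action: int
    reward: float
    done: bool


class ToyPlant:
    """Hypothetical stand-in for the hardware: a 1-D state pushed left or right."""

    def __init__(self, horizon: int = 20) -> None:
        self.horizon = horizon
        self.state = 0.0
        self.t = 0

    def reset(self) -> float:
        self.state, self.t = 0.0, 0
        return self.state

    def step(self, action: int) -> Tuple[float, float, bool]:
        # Action 1 pushes the state right, any other action pushes it left.
        self.state += 0.1 if action == 1 else -0.1
        self.t += 1
        reward = -abs(self.state - 0.5)  # reward is highest near state 0.5
        done = self.t >= self.horizon    # episode ends after a fixed horizon
        return self.state, reward, done


def run_episode(plant: ToyPlant, policy: Callable[[float], int]) -> List[Transition]:
    """Run one whole episode with a fixed policy and return it as a single block."""
    episode: List[Transition] = []
    obs = plant.reset()
    done = False
    while not done:
        action = policy(obs)
        next_obs, reward, done = plant.step(action)
        episode.append(Transition(obs, action, reward, done))
        obs = next_obs
    return episode


def frozen_policy(obs: float) -> int:
    """Assumed fixed policy: push right while below the target, otherwise left."""
    return 1 if obs < 0.5 else 0


# Only this finished block would cross the slow link back to the training process.
episode = run_episode(ToyPlant(horizon=20), frozen_policy)
print(len(episode), episode[-1].done)
```

The key constraint in this pattern is that the policy cannot be updated mid-episode, so the data in each block all comes from a single set of policy parameters.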

The same can be true of other situations, where there simply isn't a good bridge between the training environment software and Python that can easily work on every time step.

Therefore my question is: is there any capability to achieve this within Stable Baselines? If not, how difficult would it be to modify Stable Baselines to work this way? As we understand it, some of the algorithms effectively operate in this way already, i.e. learning is based on the actions and rewards gathered from a full episode.
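As an illustration of what "learning from a full episode" can mean, episode-based policy-gradient methods need the discounted return at every step, and these can be computed in one backward pass once the episode has finished. The sketch below is a generic example of that calculation, not the Stable Baselines implementation; `discounted_returns` is a hypothetical helper name.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rewards = [1.0, 0.0, 2.0]
print(discounted_returns(rewards, gamma=0.5))  # [1.5, 1.0, 2.0]
```

Because this only needs the reward sequence of a completed episode, it fits naturally with transferring results as a block rather than step by step.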

This is a question, but I can't add the question tag.

@pstansell

Labels: custom gym env, question
