Fix PPO open env #98
Conversation
@@ -119,6 +119,8 @@ def make_args():
         recorder_rm = recorder

+    recorder_rm.load_state_dict(create_env_fn.state_dict()["worker0"])
+    create_env_fn.close()
Not sure if I follow, but shouldn't we close the env after training?
Not really.
Here, what happens is that we might be normalizing observations on the fly using `VecNorm`. `VecNorm` uses shared memory to place the normalization stats (location and scale) so that they're available to every process.
If you have a recording environment, you must sync it with the others (i.e. give it access to the shared tensors). This is what `load_state_dict` does.
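Roughly, in code (a minimal sketch with illustrative names such as `make_env` and a `Pendulum-v1` gym env, not the exact code of the PPO example):

```python
from torchrl.envs import ParallelEnv, TransformedEnv
from torchrl.envs.libs.gym import GymEnv
from torchrl.envs.transforms import VecNorm


def make_env():
    # each env normalizes observations on the fly; VecNorm keeps its
    # location/scale stats in shared memory so every process can see them
    return TransformedEnv(GymEnv("Pendulum-v1"), VecNorm(in_keys=["observation"]))


# training envs: a lazy ParallelEnv built from the env-making function
create_env_fn = ParallelEnv(4, make_env)

# recording env: a single env used for evaluation/rendering
recorder = TransformedEnv(GymEnv("Pendulum-v1"), VecNorm(in_keys=["observation"]))

# sync the recorder with the training envs, i.e. give it access to the
# shared normalization tensors of the first worker
recorder.load_state_dict(create_env_fn.state_dict()["worker0"])
```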
Now, our `create_env_fn` is actually a `ParallelEnv` instance, and when we query its state dict it must (1) run every process (which is not done upon creation of the object but only when needed, for instance when calling `reset` or `state_dict`) and (2) read the constants from one process.
The thing is that once we have queried `state_dict`, we have also run the processes. They should then be closed.
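In other words, the pattern applied in this PR looks like this (reusing `create_env_fn` and `recorder` from the sketch above):

```python
# querying the state dict spawns the worker processes if they are not running yet
state_dict = create_env_fn.state_dict()

# copy the shared VecNorm stats over to the recording env
recorder.load_state_dict(state_dict["worker0"])

# the workers were only started to read the stats, so shut them down again
create_env_fn.close()
```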
As a side note, I mentioned that `create_env_fn` is a parallel env. If you look at its `__init__`, it stores a (list of) picklable environment creation functions and stays idle until it is queried. When we create a data collector, we pass it a list of functions/objects to run on each subprocess. If it's a function, each subprocess will create an environment from that function and use it to collect data. If it's an object (e.g. `ParallelEnv`), each process will play with it to collect data. The `ParallelEnv` can be sent to the workers because it does not contain an instantiated env, only a picklable function that creates envs.
I made this very ugly sketch to show how creating envs in the collector works in the single- and multi-process cases, hope you'll like it :p
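For completeness, a hedged code sketch of the two cases (the collector argument names below reflect my understanding of the API, double-check them against the collector docstrings):

```python
from torchrl.collectors import MultiaSyncDataCollector
from torchrl.envs import ParallelEnv
from torchrl.envs.libs.gym import GymEnv


def make_env():
    return GymEnv("Pendulum-v1")


# Case 1: pass env-making functions; each worker process builds its own env
collector_from_fns = MultiaSyncDataCollector(
    create_env_fn=[make_env, make_env],
    policy=None,  # assuming that None falls back to a random policy
    frames_per_batch=64,
    total_frames=256,
)

# Case 2: pass ParallelEnv objects; they are still lazy (hence picklable)
# and each worker process will run its own batch of envs
collector_from_penvs = MultiaSyncDataCollector(
    create_env_fn=[ParallelEnv(2, make_env), ParallelEnv(2, make_env)],
    policy=None,
    frames_per_batch=64,
    total_frames=256,
)

for collector in (collector_from_fns, collector_from_penvs):
    for batch in collector:
        break  # ... training would happen here
    collector.shutdown()
```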
Thanks for the detailed response! Appreciate it :)
Closes the env as per #97.
The long-term fix should be to close all collectors and envs upon deletion, but we should be careful with that.
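Purely as an illustration of what "close upon deletion" could look like (not part of this PR, and `ClosableEnvWrapper` is a hypothetical name), a finalizer-based pattern avoids most of the pitfalls of relying on `__del__`:

```python
import weakref


class ClosableEnvWrapper:
    """Hypothetical wrapper that closes its env when it is garbage collected."""

    def __init__(self, env):
        self.env = env
        # weakref.finalize runs env.close() at most once, either when the
        # wrapper is collected or at interpreter exit
        self._finalizer = weakref.finalize(self, env.close)

    def close(self):
        # explicit close stays available and is idempotent
        self._finalizer()
```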