Fix PPO open env #98
Conversation
@@ -119,6 +119,8 @@ def make_args():
         recorder_rm = recorder

+    recorder_rm.load_state_dict(create_env_fn.state_dict()["worker0"])
+    create_env_fn.close()
Not sure if I follow, but shouldn't we close the env after training?
Not really.
Here, what happens is that we might be normalizing observations on the fly using `VecNorm`. `VecNorm` uses shared memory to place the normalization stats (location and scale) so that they're available to every process.
If you have a recording environment, you must sync it with the others (i.e. give it access to the shared tensors). This is what `load_state_dict` does.
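Roughly, in code (a minimal sketch with illustrative names such as `make_env` and a `Pendulum-v1` gym env, not the exact code of the PPO example):

```python
from torchrl.envs import ParallelEnv, TransformedEnv
from torchrl.envs.libs.gym import GymEnv
from torchrl.envs.transforms import VecNorm


def make_env():
    # each env normalizes observations on the fly; VecNorm keeps its
    # location/scale stats in shared memory so every process can see them
    return TransformedEnv(GymEnv("Pendulum-v1"), VecNorm(in_keys=["observation"]))


# training envs: a lazy ParallelEnv built from the env-making function
create_env_fn = ParallelEnv(4, make_env)

# recording env: a single env used for evaluation/rendering
recorder = TransformedEnv(GymEnv("Pendulum-v1"), VecNorm(in_keys=["observation"]))

# sync the recorder with the training envs, i.e. give it access to the
# shared normalization tensors of the first worker
recorder.load_state_dict(create_env_fn.state_dict()["worker0"])
```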
Now, our `create_env_fn` is actually a `ParallelEnv` instance, and when we query its state dict it must (1) run every process (which is not done upon creation of the object but only when needed, for instance when calling `reset` or `state_dict`) and (2) read the constants from one process.
The thing is that once we have queried `state_dict`, we have also run the processes. They should then be closed.
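In other words, the pattern applied in this PR looks like this (reusing `create_env_fn` and `recorder` from the sketch above):

```python
# querying the state dict spawns the worker processes if they are not running yet
state_dict = create_env_fn.state_dict()

# copy the shared VecNorm stats over to the recording env
recorder.load_state_dict(state_dict["worker0"])

# the workers were only started to read the stats, so shut them down again
create_env_fn.close()
```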
As a side note, I mentioned that `create_env_fn` is a parallel env. If you look at its `__init__`, it stores a (list of) picklable environment creation functions and stays idle until it is queried. When we create a data collector, we pass it a list of functions/objects to run on each subprocess. If it's a function, each subprocess will create an environment from that function and use it to collect data. If it's an object (e.g. `ParallelEnv`), each process will play with it to collect data. The `ParallelEnv` can be sent to the workers because it does not contain an instantiated env, only a picklable function that creates envs.
I made this very ugly sketch to show how creating envs in the collector works in the single- and multi-process cases, hope you'll like it :p
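For completeness, a hedged code sketch of the two cases (the collector argument names below reflect my understanding of the API, double-check them against the collector docstrings):

```python
from torchrl.collectors import MultiaSyncDataCollector
from torchrl.envs import ParallelEnv
from torchrl.envs.libs.gym import GymEnv


def make_env():
    return GymEnv("Pendulum-v1")


# Case 1: pass env-making functions; each worker process builds its own env
collector_from_fns = MultiaSyncDataCollector(
    create_env_fn=[make_env, make_env],
    policy=None,  # assuming that None falls back to a random policy
    frames_per_batch=64,
    total_frames=256,
)

# Case 2: pass ParallelEnv objects; they are still lazy (hence picklable)
# and each worker process will run its own batch of envs
collector_from_penvs = MultiaSyncDataCollector(
    create_env_fn=[ParallelEnv(2, make_env), ParallelEnv(2, make_env)],
    policy=None,
    frames_per_batch=64,
    total_frames=256,
)

for collector in (collector_from_fns, collector_from_penvs):
    for batch in collector:
        break  # ... training would happen here
    collector.shutdown()
```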
Thanks for the detailed response! Appreciate it :)
Closes the env as per #97.
The long-term fix should be to close all collectors and envs upon deletion, but we should be careful with that.
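Purely as an illustration of what "close upon deletion" could look like (not part of this PR, and `ClosableEnvWrapper` is a hypothetical name), a finalizer-based pattern avoids most of the pitfalls of relying on `__del__`:

```python
import weakref


class ClosableEnvWrapper:
    """Hypothetical wrapper that closes its env when it is garbage collected."""

    def __init__(self, env):
        self.env = env
        # weakref.finalize runs env.close() at most once, either when the
        # wrapper is collected or at interpreter exit
        self._finalizer = weakref.finalize(self, env.close)

    def close(self):
        # explicit close stays available and is idempotent
        self._finalizer()
```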