
Fix PPO open env #98

Merged
merged 2 commits into from
Apr 26, 2022
Conversation

vmoens
Contributor

@vmoens vmoens commented Apr 26, 2022

closes the env as per #97
The long-term fix should be to close all collectors and envs upon deletion, but we should be careful with that.
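To make the intent of the fix concrete, here is a minimal sketch of the "close the env once its state dict has been read" pattern the PR applies. The `Env` class below is a stand-in, not the torchrl API; its names (`state_dict`, `close`, `worker0`) mirror the diff above.

```python
# Toy stand-in for the parallel env in the diff (hypothetical, not torchrl).
class Env:
    def __init__(self):
        self.closed = False

    def state_dict(self):
        # Returns per-worker normalization stats, as in the diff's
        # create_env_fn.state_dict()["worker0"] access.
        return {"worker0": {"loc": 0.0, "scale": 1.0}}

    def close(self):
        self.closed = True


env = Env()
try:
    stats = env.state_dict()["worker0"]
finally:
    env.close()  # release the worker processes once the stats are read
```

A `try/finally` (or a context manager) would also guarantee the close on error paths, which the one-line fix in the diff does not.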

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 26, 2022
@vmoens vmoens requested a review from shagunsodhani April 26, 2022 10:56
@@ -119,6 +119,8 @@ def make_args():
recorder_rm = recorder

recorder_rm.load_state_dict(create_env_fn.state_dict()["worker0"])
create_env_fn.close()

Contributor

Not sure if I follow, but shouldn't we close the env after training?

Contributor Author

Not really. What happens here is that we might be normalizing observations on the fly using VecNorm.
VecNorm places the normalization stats (location and scale) in shared memory so that they're available to every process.
If you have a recording environment, you must sync it with the others (i.e. give it access to the shared tensors). This is what load_state_dict does.
Now our create_env_fn is actually a ParallelEnv instance, and when we query its state dict it must (1) start every worker process (which is not done upon creation of the object, but only when needed, for instance when calling reset or state_dict) and (2) read the constants from one process.
The thing is that once we have queried state_dict, we have also started the processes. They should then be closed.
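The lazy-startup behavior described above can be sketched with a toy class (hypothetical names, not the torchrl implementation): the workers only come to life when `state_dict` or `reset` is queried, which is why a `close` must follow.

```python
# Toy illustration of lazy worker startup (not the actual ParallelEnv code).
class ToyParallelEnv:
    def __init__(self, make_env_fn, num_workers=2):
        self.make_env_fn = make_env_fn  # picklable creation function
        self.num_workers = num_workers
        self.started = False
        self.closed = False

    def _start(self):
        # Workers are created only on first use, not in __init__.
        if not self.started:
            self.envs = [self.make_env_fn() for _ in range(self.num_workers)]
            self.started = True

    def state_dict(self):
        self._start()  # querying the state dict forces the workers to run...
        return {f"worker{i}": {"loc": 0.0, "scale": 1.0}
                for i in range(self.num_workers)}

    def close(self):
        self.closed = True  # ...so they must be closed afterwards


env = ToyParallelEnv(lambda: object())
assert not env.started          # idle after construction
stats = env.state_dict()        # side effect: workers are now running
env.close()                     # hence the create_env_fn.close() in the diff
```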

As a side note, I mentioned that create_env_fn is a parallel env. If you look at its __init__, it stores a (list of) picklable environment-creation functions and stays idle until it is queried. When we create a data collector, we pass it a list of functions/objects to run on each subprocess. If it's a function, each subprocess will create an environment from that function and use it to collect data. If it's an object (e.g. ParallelEnv), each process will use it to collect data. The ParallelEnv can be sent to the workers because it does not contain an instantiated env, only a picklable function that creates envs.
I made this very ugly sketch to show how creating envs in the collector works in the single- and multi-process case, hope you'll like it :p
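The picklability point is the crux of why a ParallelEnv can be shipped to collector workers. A toy demonstration (the `ToyParallelEnv` name is made up for illustration): an object holding only a module-level creation function survives a pickle round trip, and each worker can then call the function to build its own env.

```python
import pickle


def make_env():
    # Module-level function: picklable by reference, unlike a live env
    # holding sockets, pipes, or child processes.
    return object()


class ToyParallelEnv:
    """Stores only a picklable creation fn; no env is instantiated yet."""
    def __init__(self, fn):
        self.fn = fn


p = ToyParallelEnv(make_env)
blob = pickle.dumps(p)        # can be sent to a collector subprocess
restored = pickle.loads(blob)
env = restored.fn()           # each worker builds its own env locally
```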

[attached sketch: env dispatch in the single- and multi-process collector cases]

Contributor

Thanks for the detailed response! Appreciate it :)

@vmoens vmoens added the bug Something isn't working label Apr 26, 2022
@vmoens vmoens merged commit 95f94d9 into pytorch:main Apr 26, 2022