NAN problem in PPO1 and PPO2 #634
Please take a look at the documentation about NaNs. Other than that, I suggest debugging the observations and actions passed between the agent and the environment to see when/if you get non-numeric values like infs and NaNs.
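(For reference, a minimal sketch, not from the original reply, of one way to catch such values early using the `VecCheckNan` wrapper described in that documentation; `CustomEnv` is a placeholder for your own environment.)

```python
# Illustrative sketch only (CustomEnv is a placeholder): wrap the vectorized
# env with VecCheckNan so an exception is raised as soon as a NaN/inf crosses
# the agent/environment boundary.
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecCheckNan

env = DummyVecEnv([lambda: CustomEnv()])      # CustomEnv: hypothetical custom gym.Env
env = VecCheckNan(env, raise_exception=True)  # fail fast on invalid obs/rewards/actions

model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100000)
```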
Maybe related: #340 (try setting the entropy coeff to zero)
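(A one-line sketch of that suggestion; all other hyperparameters are left at their defaults.)

```python
# Disable the entropy bonus, the suspected culprit in #340, by setting ent_coef to 0.
from stable_baselines import PPO2

model = PPO2("MlpPolicy", env, ent_coef=0.0, verbose=1)
```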
I also got this problem with PPO2. I note that in issue #340 the entropy coefficient was to blame; more concretely, the OP suggested it was too high at a value of 0.5, and @araffin noted that this parameter is usually 0.01. However, I am using default values, or values that are known to work for PPO2 in the RL Zoo. I have also tried A2C and get a different but similar error. Having followed the README and guides, I am hoping that this is the correct approach to instantiating the environment:
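(The snippet originally posted here is not reproduced; the following is a rough sketch, with `CustomEnv` and the worker count as placeholders, of the kind of setup the reply below appears to be responding to.)

```python
# Rough illustrative sketch, NOT the poster's actual snippet. Note that a
# single env instance is shared by all workers, which is what the reply
# below points out.
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

env = CustomEnv()                           # hypothetical custom gym.Env
vec_env = DummyVecEnv([lambda: env] * 256)  # the same instance, reused 256 times

model = PPO2("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=1000000)
```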
As you can see, I am using a custom environment, so the problem could lie there. However, it is a very simple environment: the action space is Discrete(2) and the observation space is MultiDiscrete of length 15, with values bounded between 0 and 14 across the dimensions. Any pointers on where to start debugging would be great.
It seems you are using the same env 256 times... you should pass the env id instead.
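(A sketch of that suggestion, assuming the custom environment is registered with gym under an id; `"CustomEnv-v0"` is a hypothetical name.)

```python
# Create a fresh environment instance per worker instead of reusing one object.
import gym
from stable_baselines.common.vec_env import DummyVecEnv

def make_env(env_id):
    def _init():
        return gym.make(env_id)  # each call builds an independent instance
    return _init

vec_env = DummyVecEnv([make_env("CustomEnv-v0") for _ in range(256)])
```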
Try a smaller learning_rate.
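(Illustrative only; the value is arbitrary and simply lower than the PPO2 default of 2.5e-4.)

```python
# Lower the learning rate below the PPO2 default.
model = PPO2("MlpPolicy", vec_env, learning_rate=1e-4, verbose=1)
```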
Thanks for the suggestion, I will try this. However, randomly trying different hyperparameters may not be the best way to debug. I'm after a more systematic approach, i.e. tracing where the issue arises in the code.
For anyone landing here with this problem, it may or may not help, but I had NaNs in PPO2 training that I resolved by reducing very large integers (~1e8) in the observation by a scaling factor. It took some time to find, since I checked for NaN and inf when creating the observations; during testing, some returned obs were inf. I'm guessing these large ints caused an overflow somewhere in the network. (Very informative doc page: https://stable-baselines.readthedocs.io/en/master/guide/checking_nan.html)
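(A sketch of that workaround; the scale factor is illustrative, and a Box observation space is assumed.)

```python
# Rescale large raw observation values before they reach the policy network.
import gym
import numpy as np

class ScaleObservation(gym.ObservationWrapper):
    def __init__(self, env, scale=1e-8):
        super(ScaleObservation, self).__init__(env)
        self.scale = scale
        # Rescale the declared bounds so they match the rescaled observations.
        self.observation_space = gym.spaces.Box(
            low=env.observation_space.low * scale,
            high=env.observation_space.high * scale,
            dtype=np.float32,
        )

    def observation(self, obs):
        return np.asarray(obs, dtype=np.float32) * self.scale
```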
I'm trying to apply PPO1/PPO2 agents to my custom environment, however after some epochs the policy_loss, policy_entropy and approxkl all become NaN. If I use the default policy network (two hidden layers of size 64), training is fine, but not with a bigger network (e.g. two hidden layers of size 256).
Is there a good idea or solution for this problem?
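(An illustrative sketch of the two setups described above; the larger network is specified via `policy_kwargs`/`net_arch`, with all other hyperparameters left at their defaults.)

```python
from stable_baselines import PPO2

# Default MlpPolicy: two hidden layers of 64 units -> reportedly trains fine.
model_small = PPO2("MlpPolicy", env, verbose=1)

# Bigger network: two hidden layers of 256 units -> NaNs appear after some epochs.
model_big = PPO2("MlpPolicy", env, policy_kwargs=dict(net_arch=[256, 256]), verbose=1)
```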