NAN problem in PPO1 and PPO2 #634
Please take a look at the documentation about NaNs. Other than that, I suggest debugging the observations and actions passed between the agent and the environment to see when/if you get non-numeric values like infs and NaNs.
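(For reference, a minimal sketch, not from the original reply, of one way to catch such values early using the `VecCheckNan` wrapper described in that documentation; `CustomEnv` is a placeholder for your own environment.)

```python
# Illustrative sketch only (CustomEnv is a placeholder): wrap the vectorized
# env with VecCheckNan so an exception is raised as soon as a NaN/inf crosses
# the agent/environment boundary.
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecCheckNan

env = DummyVecEnv([lambda: CustomEnv()])      # CustomEnv: hypothetical custom gym.Env
env = VecCheckNan(env, raise_exception=True)  # fail fast on invalid obs/rewards/actions

model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100000)
```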
Maybe related: #340 (try setting the entropy coeff to zero)
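(A one-line sketch of that suggestion; all other hyperparameters are left at their defaults.)

```python
# Disable the entropy bonus, the suspected culprit in #340, by setting ent_coef to 0.
from stable_baselines import PPO2

model = PPO2("MlpPolicy", env, ent_coef=0.0, verbose=1)
```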
I also got this problem with PPO2. I note that in issue #340 the entropy coefficient was to blame; more concretely, the OP suggested it was too high at a value of 0.5, and @araffin noted that this parameter is usually 0.01. However, I am using default values, or values that are known to work for PPO2 in the RL Zoo. I have also tried A2C and get a different but similar error. Having followed the README and guides, I am hoping that this is the correct approach to instantiating the environment:
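(The snippet originally posted here is not reproduced; the following is a rough sketch, with `CustomEnv` and the worker count as placeholders, of the kind of setup the reply below appears to be responding to.)

```python
# Rough illustrative sketch, NOT the poster's actual snippet. Note that a
# single env instance is shared by all workers, which is what the reply
# below points out.
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

env = CustomEnv()                           # hypothetical custom gym.Env
vec_env = DummyVecEnv([lambda: env] * 256)  # the same instance, reused 256 times

model = PPO2("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=1000000)
```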
As you can see, I am using a custom environment, so the problem could lie there. However, it is a very simple environment: the action space is Discrete(2) and the observation space is MultiDiscrete of length 15, with values bounded between 0 and 14 across the dimensions. Any pointers on where to start debugging would be great.
It seems you are using the same env 256 times... you should pass the env id instead.
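(A sketch of that suggestion, assuming the custom environment is registered with gym under an id; `"CustomEnv-v0"` is a hypothetical name.)

```python
# Create a fresh environment instance per worker instead of reusing one object.
import gym
from stable_baselines.common.vec_env import DummyVecEnv

def make_env(env_id):
    def _init():
        return gym.make(env_id)  # each call builds an independent instance
    return _init

vec_env = DummyVecEnv([make_env("CustomEnv-v0") for _ in range(256)])
```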
Try a smaller learning_rate.
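(Illustrative only; the value is arbitrary and simply lower than the PPO2 default of 2.5e-4.)

```python
# Lower the learning rate below the PPO2 default.
model = PPO2("MlpPolicy", vec_env, learning_rate=1e-4, verbose=1)
```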
Thanks for the suggestion, I will try this. However, randomly trying different hyperparameters may not be the best way to debug. I'm after a more systematic approach, i.e. tracing where the issue arises in the code.
For anyone landing here with this problem, it may or may not help, but I had NaNs in PPO2 training that I resolved by reducing very large integers (~1e8) in the observation by a scaling factor. It took some time to find, since I checked for NaN and inf when creating the observations; during testing, some returned obs were inf. I'm guessing these large ints caused an overflow somewhere in the network. (Very informative doc page: https://stable-baselines.readthedocs.io/en/master/guide/checking_nan.html)
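(A sketch of that workaround; the scale factor is illustrative, and a Box observation space is assumed.)

```python
# Rescale large raw observation values before they reach the policy network.
import gym
import numpy as np

class ScaleObservation(gym.ObservationWrapper):
    def __init__(self, env, scale=1e-8):
        super(ScaleObservation, self).__init__(env)
        self.scale = scale
        # Rescale the declared bounds so they match the rescaled observations.
        self.observation_space = gym.spaces.Box(
            low=env.observation_space.low * scale,
            high=env.observation_space.high * scale,
            dtype=np.float32,
        )

    def observation(self, obs):
        return np.asarray(obs, dtype=np.float32) * self.scale
```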
I'm trying to apply PPO1/PPO2 agents to my custom environment, however after some epochs the policy_loss, policy_entropy and approxkl all become NaN. If I use the default policy network (two hidden layers of size 64), training is fine, but not with a bigger network (e.g. two hidden layers of size 256).
Is there a good idea or solution for this problem?
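(An illustrative sketch of the two setups described above; the larger network is specified via `policy_kwargs`/`net_arch`, with all other hyperparameters left at their defaults.)

```python
from stable_baselines import PPO2

# Default MlpPolicy: two hidden layers of 64 units -> reportedly trains fine.
model_small = PPO2("MlpPolicy", env, verbose=1)

# Bigger network: two hidden layers of 256 units -> NaNs appear after some epochs.
model_big = PPO2("MlpPolicy", env, policy_kwargs=dict(net_arch=[256, 256]), verbose=1)
```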