NAN problem in PPO1 and PPO2 #634

Open

MeiJuanLiu opened this issue Dec 28, 2019 · 8 comments
Labels: custom gym env (Issue related to Custom Gym Env), question (Further information is requested)

Comments

@MeiJuanLiu commented Dec 28, 2019

I'm trying to train PPO1/PPO2 agents on my custom environment; however, after some epochs the policy_loss, policy_entropy and approxkl all become nan. If I use the default network (two hidden layers of size 64) as the policy network, it works fine, but not for a bigger network (e.g. two layers of size 256).
Is there any good idea or solution to this problem?

[screenshot of the training log attached]
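For context, a minimal sketch of how such a larger network is usually specified in stable-baselines 2.x (here env stands for the custom environment from the question):

from stable_baselines import PPO2

# net_arch=[256, 256] swaps the default two hidden layers of 64 units
# for two layers of 256 units each.
model = PPO2('MlpPolicy', env, policy_kwargs=dict(net_arch=[256, 256]))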

@Miffyli added the custom gym env and question labels on Dec 28, 2019
@Miffyli (Collaborator) commented Dec 28, 2019

Please take a look at the documentation about NaNs. Other than that, I suggest debugging the observations and actions passed between the agent and the environment to see when/if you get non-numeric values like infs and nans.
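One way to run that check is a small debugging wrapper around the env; a minimal sketch under the old gym 4-tuple step API (the NanGuard name is hypothetical):

import numpy as np
import gym

class NanGuard(gym.Wrapper):
    # Debugging wrapper: fail fast as soon as the env emits a
    # non-finite observation or reward.
    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        assert np.all(np.isfinite(obs)), "non-finite observation: %r" % obs
        assert np.isfinite(reward), "non-finite reward: %r" % reward
        return obs, reward, done, info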

@araffin (Collaborator) commented Jan 19, 2020

Maybe related: #340 (try setting the entropy coeff to zero)
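In PPO2 that coefficient is the ent_coef constructor argument (default 0.01); a minimal sketch of zeroing it, assuming env is already built:

from stable_baselines import PPO2

# ent_coef=0.0 disables the entropy bonus entirely.
model = PPO2('MlpPolicy', env, ent_coef=0.0)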

@jtromans commented Apr 5, 2020

I also got this problem with PPO2. I note that in issue #340 the entropy coefficient was to blame. More concretely, the OP there had set it too high at a value of 0.5, and @araffin suggested that this parameter is usually 0.01. However, I am using default values, or values that are known to work with PPO2 in the RL Zoo.

I have also tried A2C and get a different but similar error.

Having followed the README and guides, I am hoping this is the correct approach to instantiating the environment:

env = gym.make('highcard-v0')
env = make_vec_env(lambda: env, n_envs=256)
env = VecCheckNan(env, raise_exception=True)
env = VecNormalize(env) # when training norm_reward = True

As you can see, I am using a custom environment, so the problem could lie there. However, it is a very simple environment with action space Discrete(2), and the observation space is a MultiDiscrete of length 15 with values bounded between 0 and 14 in every dimension.

Any pointers to start debugging would be great.
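For concreteness, the spaces described above would typically be declared like this (a sketch; the variable names are illustrative):

import gym

# 15 dimensions, each taking integer values 0..14
obs_space = gym.spaces.MultiDiscrete([15] * 15)
act_space = gym.spaces.Discrete(2)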

@araffin (Collaborator) commented Apr 5, 2020

It seems you are using the same env instance 256 times... you should pass the env id instead.
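A minimal sketch of that correction, assuming stable-baselines 2.10 where make_vec_env is importable from common.cmd_util:

from stable_baselines.common.cmd_util import make_vec_env
from stable_baselines.common.vec_env import VecCheckNan, VecNormalize

# Passing the id string lets make_vec_env build 256 independent copies
# instead of wrapping one pre-built instance 256 times.
env = make_vec_env('highcard-v0', n_envs=256)
env = VecCheckNan(env, raise_exception=True)
env = VecNormalize(env)  # norm_reward=True is the default while training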

@jtromans commented Apr 5, 2020

Thanks for catching that - I've corrected it, but I still get the same issue. What would be the appropriate approach for debugging this?

[two screenshots of the training output attached]

@MeiJuanLiu (Author) commented

> Thanks for catching that - I've corrected it, but I still get the same issue. What would be the appropriate approach for debugging this?

Try a smaller learning_rate.
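For reference, a minimal sketch of lowering it (PPO2's default learning_rate is 2.5e-4):

from stable_baselines import PPO2

# A learning rate smaller than the 2.5e-4 default.
model = PPO2('MlpPolicy', env, learning_rate=1e-4)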

@jtromans commented Apr 8, 2020

Thanks for the suggestion; I will try this. However, randomly trying different hyperparameters may not be the most effective way to debug. I'm after a more technical approach, in terms of tracing the issue within the code.

@ghost commented Aug 18, 2020

For anyone landing here with this problem, it may or may not help, but I had this nan problem during PPO2 training and resolved it by reducing very large integers (~1e8) in the observations with a scaling factor. It took some time to find, since I had checked for nan/inf when creating the observations; during testing, some returned observations were inf. I'm guessing these large ints caused an overflow somewhere in the network. (Very informative doc page: https://stable-baselines.readthedocs.io/en/master/guide/checking_nan.html)
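A sketch of a fix along those lines (the wrapper name and scale factor are illustrative):

import numpy as np
import gym

class ScaleObs(gym.ObservationWrapper):
    # Shrink very large raw values into a float32-friendly range
    # before they reach the network.
    def __init__(self, env, scale=1e-8):
        super(ScaleObs, self).__init__(env)
        self.scale = scale

    def observation(self, obs):
        return (np.asarray(obs, dtype=np.float64) * self.scale).astype(np.float32)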
