network returning nan actions [question] #693

Closed
cevans3098 opened this issue Feb 18, 2020 · 9 comments

Labels: custom gym env (Issue related to Custom Gym Env), question (Further information is requested)

Comments

@cevans3098

Describe the bug
I am working with a custom environment and vectorizing it using SubprocVecEnv. Recently, while using PPO2, I have started receiving NaNs.

Illustrated below are two consecutive logging outputs. The first is fine; every training log after it contains NaNs, and the network is sending NaN actions to the environment.

------------------------------------
| approxkl           | 0.05277536  |
| clipfrac           | 0.39106447  |
| explained_variance | 0.0262      |
| fps                | 17          |
| n_updates          | 29          |
| policy_entropy     | 6.770977    |
| policy_loss        | 0.016628642 |
| serial_timesteps   | 59392       |
| time_elapsed       | 2.44e+04    |
| total_timesteps    | 237568      |
| value_loss         | 3735940.5   |
------------------------------------


---------------------------------
| approxkl           | nan      |
| clipfrac           | 0.0      |
| explained_variance | 8.29e-06 |
| fps                | 15       |
| n_updates          | 30       |
| policy_entropy     | nan      |
| policy_loss        | nan      |
| serial_timesteps   | 61440    |
| time_elapsed       | 2.48e+04 |
| total_timesteps    | 245760   |
| value_loss         | nan      |
---------------------------------

I was hoping to use VecCheckNan to help debug the root cause, but I do not get a warning or exception. I'm looking to see whether I am implementing VecCheckNan correctly, or whether it will even help with my problem.

Code example

from stable_baselines.common.policies import MlpPolicy, FeedForwardPolicy, ActorCriticPolicy, register_policy
from stable_baselines.common.vec_env import SubprocVecEnv, DummyVecEnv, VecCheckNan
from stable_baselines.common import set_global_seeds
from stable_baselines import PPO2

if __name__ == '__main__':

    # CustomPolicyDetailed, env_list, the hyperparameter constants and the
    # output paths are defined elsewhere in the full script.
    register_policy('CustomPolicy', CustomPolicyDetailed)
    env = SubprocVecEnv(env_list)
    env = VecCheckNan(env, raise_exception=True)

    model = PPO2(policy='CustomPolicy',
                 env=env,
                 verbose=1,
                 vf_coef=VF_COEFF,
                 noptepochs=EPOCHS,
                 ent_coef=ENT_COEFF,
                 learning_rate=LEARNING_RATE,
                 tensorboard_log=tensorboard_log_location,
                 n_steps=NSTEPS,
                 nminibatches=MINIBATCHES)

    model.save(results_folder + run_name)

    for i in range(number_training_steps):
        logname = run_name + '_' + str(i)
        model.learn(total_timesteps=int(total_timesteps / number_training_steps),
                    reset_num_timesteps=False,
                    tb_log_name=logname)

        env.close()

        path = results_folder + logname
        model.save(path)

I have determined that the network is returning NaN for the actions, but I have been unable to determine the cause. I have included checks in my environment that print to the console if a NaN or infinite value is detected in the observations or reward, but I receive no messages, so I assume the environment is operating correctly. I was hoping VecCheckNan would provide some material guidance, but perhaps I am using it incorrectly.

System Info

  • Stable-Baselines version = 2.8.0
  • Python version = 3.6
  • Tensorflow version = 1.14

Any insights?

@Miffyli
Collaborator

Miffyli commented Feb 18, 2020

I assume you took a look at the docs on this issue?

Looking at your prints, there is a huge value loss of 3735940.5, which probably relates to why the network NaN-ed out. What is the scale of your one-step rewards? They should be in (roughly) the [-1, 1] interval to avoid large value estimates. If episodes are long (thousands of steps) and the reward is often -1/+1, consider a smaller reward (e.g. 0.01 per step), as sketched below.
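A minimal sketch of that suggestion, assuming a standard gym.RewardWrapper can be applied around the custom environment (the wrapper name and the 0.01 factor are illustrative, not from the thread):

import gym

class ScaledRewardEnv(gym.RewardWrapper):
    """Illustrative wrapper: rescale one-step rewards into roughly [-1, 1]."""

    def __init__(self, env, scale=0.01):
        super(ScaledRewardEnv, self).__init__(env)
        self.scale = scale  # pick the factor from the magnitude of your raw rewards

    def reward(self, reward):
        return reward * self.scale

# each entry in env_list would then build a wrapped env, e.g.
# env_list = [lambda: ScaledRewardEnv(YourCustomEnv()) for _ in range(n_envs)]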

@Miffyli Miffyli added question Further information is requested custom gym env Issue related to Custom Gym Env labels Feb 18, 2020
@araffin
Collaborator

araffin commented Feb 18, 2020

Maybe related: #340 (try setting the entropy coeff to zero) and #634
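For reference, ruling out the entropy bonus only needs a one-argument change to the PPO2 constructor shown in the issue code (a sketch, not from the thread):

model = PPO2(policy='CustomPolicy', env=env, ent_coef=0.0, verbose=1)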

@Miffyli
Collaborator

Miffyli commented Feb 18, 2020

Note for the future: issues (and solutions) like these could be documented in the docs, similar to how exporting models is documented. Perhaps even in a more stand-alone fashion, where the user does not have to check GitHub issues for a (possible) solution.

@cevans3098
Author

Thank you for your comments.

I will try the two proposed solutions.

  • My entropy coeff was already relatively small, 0.005, but I will try setting it to zero to eliminate it as a possible culprit.
  • I can also try normalizing the reward

Am I implementing VecCheckNan incorrectly, or am I misunderstanding its function? I would have expected it to throw an exception; instead, NaN actions are sent to the environment.

@araffin
Collaborator

araffin commented Feb 18, 2020

I can also try normalizing the reward

You can also use VecNormalize for that.

Am I implementing VecCheckNan incorrectly?

The use looks good (@hill-a ?). The NaN may come from the training, not the env/actions.
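A minimal sketch of the VecNormalize suggestion on top of the wrappers from the issue code (the clip values are the library defaults, not thread-specific):

from stable_baselines.common.vec_env import SubprocVecEnv, VecCheckNan, VecNormalize

# env_list is the same list of env constructors as in the issue code
env = SubprocVecEnv(env_list)
env = VecCheckNan(env, raise_exception=True)
# running-mean/std normalization of observations and rewards, clipped to +/- 10
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10., clip_reward=10.)

Note that the running statistics of VecNormalize become part of the trained agent and need to be saved and reloaded together with the model.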

@cevans3098
Author

I changed the entropy coeff to zero, and VecCheckNan fired and gave me an exception. I am not sure why the entropy coeff would impact that, or whether this is a bug or something else. I'll test a couple more times with different entropy coefficients.

Nevertheless, I'll look into my error as well. It looks like I am getting some observations that are quite large, which are probably affecting the RL model.

ValueError: found nan in actions.
Originated from the RL model, Last given value was:
        observations=[[ 9.74358974e-02  7.22983257e-05  1.00000000e-02 -2.76467740e+02
   2.73499123e+01  3.30538558e-02 -1.35136922e+04 -2.21475881e+04
  -9.28789806e+01 -7.94638599e+04 -4.57999995e-03 -5.04200011e-02
  -1.30199999e-01 -1.00000000e+00 -1.00000000e+00  6.32699998e-03
   4.91530001e-02  5.52972972e-01  0.00000000e+00  0.00000000e+00
   3.10000003e-04  2.40000011e-03  4.90000006e-03  0.00000000e+00
   0.00000000e+00 -3.80605012e-01 -2.95690489e+00 -1.09279394e+00
   0.00000000e+00  0.00000000e+00  9.95419979e-01  9.49580014e-01
   8.56710017e-01  0.00000000e+00  0.00000000e+00  6.32699998e-03
   4.91530001e-02  5.58996975e-01  0.00000000e+00  0.00000000e+00
   3.10000003e-04  2.40000011e-03  5.22999978e-03  0.00000000e+00
   0.00000000e+00 -3.85437995e-01 -2.96178102e+00 -1.23739100e+00
  -0.00000000e+00 -0.00000000e+00  0.00000000e+00 -1.86945000e-01
  -6.97144983e-02 -1.31175000e-01 -1.81750000e-02 -4.62000000e-03
   0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
   0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
   0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
   0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
   0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
   0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
   0.00000000e+00 -3.26584443e-01  1.46199994e-02  1.06608226e-01
   1.87500000e-01 -4.16625000e-01  3.43125000e-01  2.83000000e-02
  -1.87610718e-01  1.20337435e+00 -1.04462176e+01 -7.18631796e+00
  -5.13140259e+00 -4.10876458e+00 -3.85127280e+00 -4.06299722e+00
  -4.48331901e+00 -4.92743480e+00 -5.35426524e+00 -5.67133712e+00
  -6.00075009e+00 -6.98883675e+00 -6.49344569e+00  2.79469254e+00
   1.96545682e+01  3.01015005e+01  4.37501483e+01  7.16842851e+01
   1.02455046e+02  1.33220271e+02]...
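Given the large raw observations above, one common fix is to clip or rescale them before they reach the model. A minimal sketch with a gym.ObservationWrapper (the wrapper name and bounds are illustrative, not from the thread):

import gym
import numpy as np

class ClippedObsEnv(gym.ObservationWrapper):
    """Illustrative wrapper: clip raw observations into a bounded range."""

    def __init__(self, env, low=-10.0, high=10.0):
        super(ClippedObsEnv, self).__init__(env)
        self.low, self.high = low, high
        self.observation_space = gym.spaces.Box(
            low=low, high=high, shape=env.observation_space.shape, dtype=np.float32)

    def observation(self, observation):
        return np.clip(observation, self.low, self.high).astype(np.float32)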

@cevans3098
Author

Quick update: I normalized the rewards and fixed the observations that were producing large values. I also set the entropy coefficient to zero. I am still receiving NaNs. While this is probably an issue with the environment sending something weird to the RL model, I am surprised that VecCheckNan isn't throwing an exception. It ran over 500,000 steps in the environment last night and the actions were NaN for all of them.

In what case(s) would a nan not throw an exception using VecCheckNan?

---------------------------------
| approxkl           | nan      |
| clipfrac           | 0.0      |
| explained_variance | nan      |
| fps                | 51       |
| n_updates          | 3        |
| policy_entropy     | nan      |
| policy_loss        | nan      |
| serial_timesteps   | 6144     |
| time_elapsed       | 790      |
| total_timesteps    | 786432   |
| value_loss         | nan      |
---------------------------------

@ChengYen-Tang

I once encountered this problem, and tuning the RL model's hyperparameters eventually improved it for me, but the hyperparameter search takes a lot of time.
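The comment above seems to refer to hyperparameter tuning. A minimal sketch of such a search with Optuna (one possible tool, not necessarily what was used; the search ranges and budget are illustrative):

import optuna
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

def objective(trial):
    # hypothetical search space; adjust to your problem
    lr = trial.suggest_loguniform('learning_rate', 1e-5, 1e-3)
    ent = trial.suggest_loguniform('ent_coef', 1e-8, 1e-2)

    env = DummyVecEnv(env_list[:1])  # env_list as in the issue code
    model = PPO2('MlpPolicy', env, learning_rate=lr, ent_coef=ent, verbose=0)
    model.learn(total_timesteps=20000)

    # short rollout to score this trial
    obs, total_reward = env.reset(), 0.0
    for _ in range(1000):
        action, _ = model.predict(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward[0]
    env.close()
    return total_reward

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20)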

@araffin
Collaborator

araffin commented May 9, 2020

Closing this issue, as it seems definitely related to the custom environment.

@araffin araffin closed this as completed May 9, 2020