network returning nan actions [question] #693

Closed
cevans3098 opened this issue Feb 18, 2020 · 9 comments

Labels: custom gym env (Issue related to Custom Gym Env), question (Further information is requested)

Comments

@cevans3098

Describe the bug
I am working with a custom environment and vectorizing it using SubprocVecEnv. Recently, while using PPO2, I have started receiving NaNs.

Illustrated below are two consecutive logging outputs. The first is fine; every training log after it contains NaNs, and the network is sending NaN actions to the environment.

------------------------------------
| approxkl           | 0.05277536  |
| clipfrac           | 0.39106447  |
| explained_variance | 0.0262      |
| fps                | 17          |
| n_updates          | 29          |
| policy_entropy     | 6.770977    |
| policy_loss        | 0.016628642 |
| serial_timesteps   | 59392       |
| time_elapsed       | 2.44e+04    |
| total_timesteps    | 237568      |
| value_loss         | 3735940.5   |
------------------------------------


---------------------------------
| approxkl           | nan      |
| clipfrac           | 0.0      |
| explained_variance | 8.29e-06 |
| fps                | 15       |
| n_updates          | 30       |
| policy_entropy     | nan      |
| policy_loss        | nan      |
| serial_timesteps   | 61440    |
| time_elapsed       | 2.48e+04 |
| total_timesteps    | 245760   |
| value_loss         | nan      |
---------------------------------

I was hoping to use VecCheckNan to help debug the root cause, but I do not get a warning or exception. I'm looking to see whether I am implementing VecCheckNan correctly, or whether it will even help with my problem.

Code example

from stable_baselines.common.policies import MlpPolicy, FeedForwardPolicy, ActorCriticPolicy, register_policy
from stable_baselines.common.vec_env import SubprocVecEnv, DummyVecEnv, VecCheckNan
from stable_baselines.common import set_global_seeds
from stable_baselines import PPO2

if __name__ == '__main__':

    # CustomPolicyDetailed, env_list, the hyperparameter constants and the
    # output paths are defined elsewhere in the full script.
    register_policy('CustomPolicy', CustomPolicyDetailed)
    env = SubprocVecEnv(env_list)
    env = VecCheckNan(env, raise_exception=True)

    model = PPO2(policy='CustomPolicy',
                 env=env,
                 verbose=1,
                 vf_coef=VF_COEFF,
                 noptepochs=EPOCHS,
                 ent_coef=ENT_COEFF,
                 learning_rate=LEARNING_RATE,
                 tensorboard_log=tensorboard_log_location,
                 n_steps=NSTEPS,
                 nminibatches=MINIBATCHES)

    model.save(results_folder + run_name)

    for i in range(number_training_steps):
        logname = run_name + '_' + str(i)
        model.learn(total_timesteps=int(total_timesteps / number_training_steps),
                    reset_num_timesteps=False,
                    tb_log_name=logname)

        env.close()

        path = results_folder + logname
        model.save(path)

I have determined that the network is returning NaN for the actions, but I have been unable to determine the cause. I have included checks in my environment that print to the console if a NaN or infinite value is detected in the observations or reward, but I receive no messages, so I assume the environment is operating correctly. I was hoping VecCheckNan would provide some material guidance, but perhaps I am using it incorrectly.

System Info

  • Stable-Baselines version = 2.8.0
  • Python version = 3.6
  • Tensorflow version = 1.14

Any insights?

@Miffyli
Collaborator

Miffyli commented Feb 18, 2020

I assume you took a look at the docs on this issue?

Looking at your prints, there is a huge value loss of 3735940.5, which probably relates to why the network NaN-ed out. What is the scale of your one-step rewards? They should be in (roughly) the [-1, 1] interval to avoid large value estimates. If episodes are long (thousands of steps) and the reward is often -1/+1, consider a smaller reward (e.g. 0.01 per step), as sketched below.
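A minimal sketch of that suggestion, assuming a standard gym.RewardWrapper can be applied around the custom environment (the wrapper name and the 0.01 factor are illustrative, not from the thread):

import gym

class ScaledRewardEnv(gym.RewardWrapper):
    """Illustrative wrapper: rescale one-step rewards into roughly [-1, 1]."""

    def __init__(self, env, scale=0.01):
        super(ScaledRewardEnv, self).__init__(env)
        self.scale = scale  # pick the factor from the magnitude of your raw rewards

    def reward(self, reward):
        return reward * self.scale

# each entry in env_list would then build a wrapped env, e.g.
# env_list = [lambda: ScaledRewardEnv(YourCustomEnv()) for _ in range(n_envs)]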

@Miffyli Miffyli added question Further information is requested custom gym env Issue related to Custom Gym Env labels Feb 18, 2020
@araffin
Collaborator

araffin commented Feb 18, 2020

Maybe related: #340 (try setting the entropy coeff to zero) and #634
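For reference, ruling out the entropy bonus only needs a one-argument change to the PPO2 constructor shown in the issue code (a sketch, not from the thread):

model = PPO2(policy='CustomPolicy', env=env, ent_coef=0.0, verbose=1)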

@Miffyli
Collaborator

Miffyli commented Feb 18, 2020

Note for the future: issues (and solutions) like these could be documented in the docs, similar to how exporting models is documented. Perhaps even in a more stand-alone fashion, where the user does not have to check GitHub issues for a (possible) solution.

@cevans3098
Author

Thank you for your comments.

I will try the two proposed solutions.

  • My entropy coeff was already relatively small, 0.005, but I will try setting it to zero to eliminate it as a possible culprit.
  • I can also try normalizing the reward

Am I implementing VecCheckNan incorrectly, or am I misunderstanding its function? I would have expected it to throw an exception; instead, NaN actions are sent to the environment.

@araffin
Collaborator

araffin commented Feb 18, 2020

I can also try normalizing the reward

You can also use VecNormalize for that.

Am I implementing VecCheckNan incorrectly?

The use looks good (@hill-a ?). The NaN may come from the training, not the env/actions.
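A minimal sketch of the VecNormalize suggestion on top of the wrappers from the issue code (the clip values are the library defaults, not thread-specific):

from stable_baselines.common.vec_env import SubprocVecEnv, VecCheckNan, VecNormalize

# env_list is the same list of env constructors as in the issue code
env = SubprocVecEnv(env_list)
env = VecCheckNan(env, raise_exception=True)
# running-mean/std normalization of observations and rewards, clipped to +/- 10
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10., clip_reward=10.)

Note that the running statistics of VecNormalize become part of the trained agent and need to be saved and reloaded together with the model.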

@cevans3098
Author

I changed the entropy coeff to zero, and VecCheckNan fired and gave me an exception. I am not sure why the entropy coeff would impact that, or whether this is a bug or something else. I'll test a couple more times with different entropy coefficients.

Nevertheless, I'll look into my error as well. It looks like I am getting some observations that are quite large, which are probably affecting the RL model.

ValueError: found nan in actions.
Originated from the RL model, Last given value was:
        observations=[[ 9.74358974e-02  7.22983257e-05  1.00000000e-02 -2.76467740e+02
   2.73499123e+01  3.30538558e-02 -1.35136922e+04 -2.21475881e+04
  -9.28789806e+01 -7.94638599e+04 -4.57999995e-03 -5.04200011e-02
  -1.30199999e-01 -1.00000000e+00 -1.00000000e+00  6.32699998e-03
   4.91530001e-02  5.52972972e-01  0.00000000e+00  0.00000000e+00
   3.10000003e-04  2.40000011e-03  4.90000006e-03  0.00000000e+00
   0.00000000e+00 -3.80605012e-01 -2.95690489e+00 -1.09279394e+00
   0.00000000e+00  0.00000000e+00  9.95419979e-01  9.49580014e-01
   8.56710017e-01  0.00000000e+00  0.00000000e+00  6.32699998e-03
   4.91530001e-02  5.58996975e-01  0.00000000e+00  0.00000000e+00
   3.10000003e-04  2.40000011e-03  5.22999978e-03  0.00000000e+00
   0.00000000e+00 -3.85437995e-01 -2.96178102e+00 -1.23739100e+00
  -0.00000000e+00 -0.00000000e+00  0.00000000e+00 -1.86945000e-01
  -6.97144983e-02 -1.31175000e-01 -1.81750000e-02 -4.62000000e-03
   0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
   0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
   0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
   0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
   0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
   0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
   0.00000000e+00 -3.26584443e-01  1.46199994e-02  1.06608226e-01
   1.87500000e-01 -4.16625000e-01  3.43125000e-01  2.83000000e-02
  -1.87610718e-01  1.20337435e+00 -1.04462176e+01 -7.18631796e+00
  -5.13140259e+00 -4.10876458e+00 -3.85127280e+00 -4.06299722e+00
  -4.48331901e+00 -4.92743480e+00 -5.35426524e+00 -5.67133712e+00
  -6.00075009e+00 -6.98883675e+00 -6.49344569e+00  2.79469254e+00
   1.96545682e+01  3.01015005e+01  4.37501483e+01  7.16842851e+01
   1.02455046e+02  1.33220271e+02]...
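Given the large raw observations above, one common fix is to clip or rescale them before they reach the model. A minimal sketch with a gym.ObservationWrapper (the wrapper name and bounds are illustrative, not from the thread):

import gym
import numpy as np

class ClippedObsEnv(gym.ObservationWrapper):
    """Illustrative wrapper: clip raw observations into a bounded range."""

    def __init__(self, env, low=-10.0, high=10.0):
        super(ClippedObsEnv, self).__init__(env)
        self.low, self.high = low, high
        self.observation_space = gym.spaces.Box(
            low=low, high=high, shape=env.observation_space.shape, dtype=np.float32)

    def observation(self, observation):
        return np.clip(observation, self.low, self.high).astype(np.float32)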

@cevans3098
Author

Quick update: I normalized the rewards and fixed the observations that were producing large values. I also set the entropy coefficient to zero. I am still receiving NaNs. While this is probably an issue with the environment sending something weird to the RL model, I am surprised that VecCheckNan isn't throwing an exception. It ran over 500,000 steps in the environment last night and the actions were NaN for all of them.

In what case(s) would a nan not throw an exception using VecCheckNan?

---------------------------------
| approxkl           | nan      |
| clipfrac           | 0.0      |
| explained_variance | nan      |
| fps                | 51       |
| n_updates          | 3        |
| policy_entropy     | nan      |
| policy_loss        | nan      |
| serial_timesteps   | 6144     |
| time_elapsed       | 790      |
| total_timesteps    | 786432   |
| value_loss         | nan      |
---------------------------------

@ChengYen-Tang

I once encountered this problem, and tuning the RL model's hyperparameters eventually improved it for me, but the hyperparameter search takes a lot of time.
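The comment above seems to refer to hyperparameter tuning. A minimal sketch of such a search with Optuna (one possible tool, not necessarily what was used; the search ranges and budget are illustrative):

import optuna
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

def objective(trial):
    # hypothetical search space; adjust to your problem
    lr = trial.suggest_loguniform('learning_rate', 1e-5, 1e-3)
    ent = trial.suggest_loguniform('ent_coef', 1e-8, 1e-2)

    env = DummyVecEnv(env_list[:1])  # env_list as in the issue code
    model = PPO2('MlpPolicy', env, learning_rate=lr, ent_coef=ent, verbose=0)
    model.learn(total_timesteps=20000)

    # short rollout to score this trial
    obs, total_reward = env.reset(), 0.0
    for _ in range(1000):
        action, _ = model.predict(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward[0]
    env.close()
    return total_reward

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20)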

@araffin
Collaborator

araffin commented May 9, 2020

Closing this issue, as it seems definitely related to the custom environment.

@araffin araffin closed this as completed May 9, 2020