network returning nan actions [question] #693
Describe the bug
I am working with a custom environment and vectorizing it using SubprocVecEnv. Recently, while using PPO2, I have started receiving NaNs.
Illustrated below are two consecutive results: the first is good, and all training messages after it contain NaNs, with the network sending NaN actions to the environment.
I was hoping to use VecCheckNan to help debug the root cause, but I do not get a warning or an exception. I am looking to see whether I am implementing VecCheckNan correctly, or whether it will even help with my problem.
Code example
I have determined that the network is returning NaN for the actions, but I am unable to determine the cause. I have included checks in my environment that print to the console if a NaN or an infinity is detected in the observations or the reward, but I receive no messages, so I assume the environment is operating correctly. I was hoping VecCheckNan would provide some material guidance, but perhaps I am using it incorrectly.
System Info
Any Insights?
Comments
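For reference, a minimal sketch of the setup described above (not the issue author's actual code), using stable-baselines' PPO2, SubprocVecEnv, and VecCheckNan; `CustomEnv` is a hypothetical stand-in for the custom environment:

```python
from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv, VecCheckNan

def make_env():
    # CustomEnv is hypothetical: a stand-in for the issue's gym.Env subclass.
    return CustomEnv()

if __name__ == "__main__":
    env = SubprocVecEnv([make_env for _ in range(4)])
    # raise_exception defaults to False, in which case VecCheckNan only
    # emits a (one-time) warning; True makes it fail fast on nan/inf.
    env = VecCheckNan(env, raise_exception=True)
    model = PPO2("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=100000)
```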
I assume you took a look at the docs on this issue? Looking at your prints, you see a huge value loss of 3735940.5, which probably relates to why the network NaN-ed out. What is the scale of your one-step rewards? These should be in (roughly) the [-1, 1] interval to avoid large value estimates. If episodes are long (thousands of steps) and the reward is often -1/+1, consider a smaller reward (e.g. 0.01 per step).
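One way to apply that suggestion is a small reward wrapper; a sketch using gym's RewardWrapper, with the 0.01 factor taken from the comment above:

```python
import gym

class ScaleReward(gym.RewardWrapper):
    """Rescale one-step rewards so value targets stay roughly in [-1, 1]."""

    def __init__(self, env, scale=0.01):
        super(ScaleReward, self).__init__(env)
        self.scale = scale

    def reward(self, reward):
        # Called by gym.RewardWrapper.step() on every raw reward.
        return self.scale * reward
```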
Note for the future: issues (and solutions) like these could be documented in the docs, similar to how exporting models is covered, perhaps even in a more stand-alone fashion, so that users do not have to check GitHub issues for a (possible) solution.
Thank you for your comments. I will try the two proposed solutions.
Am I implementing VecCheckNan incorrectly, or am I not understanding its function? I would have expected it to throw an exception; instead, NaN actions are sent to the environment.
You can also use VecNormalize for that.
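A sketch of the usual pattern, applied on top of the already-vectorized env (the argument values shown are VecNormalize's documented defaults):

```python
from stable_baselines.common.vec_env import VecNormalize

# Normalize observations and rewards with running statistics,
# clipping normalized observations to +/- clip_obs.
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.0)
```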
The usage looks good (@hill-a ?). The NaN may come from the training, not the env/actions.
I changed the entropy coefficient to zero, and VecCheckNan fired and gave me an exception. I am not sure why the entropy coefficient would impact that, and not sure if this is a bug or something else. I'll test a couple more times with different entropy coefficients. Nevertheless, I'll look into my own error as well; it looks like I am getting some observations that are quite large, which are probably affecting the RL model.
ValueError: found nan in actions.
Originated from the RL model, Last given value was:
observations=[[ 9.74358974e-02 7.22983257e-05 1.00000000e-02 -2.76467740e+02
2.73499123e+01 3.30538558e-02 -1.35136922e+04 -2.21475881e+04
-9.28789806e+01 -7.94638599e+04 -4.57999995e-03 -5.04200011e-02
-1.30199999e-01 -1.00000000e+00 -1.00000000e+00 6.32699998e-03
4.91530001e-02 5.52972972e-01 0.00000000e+00 0.00000000e+00
3.10000003e-04 2.40000011e-03 4.90000006e-03 0.00000000e+00
0.00000000e+00 -3.80605012e-01 -2.95690489e+00 -1.09279394e+00
0.00000000e+00 0.00000000e+00 9.95419979e-01 9.49580014e-01
8.56710017e-01 0.00000000e+00 0.00000000e+00 6.32699998e-03
4.91530001e-02 5.58996975e-01 0.00000000e+00 0.00000000e+00
3.10000003e-04 2.40000011e-03 5.22999978e-03 0.00000000e+00
0.00000000e+00 -3.85437995e-01 -2.96178102e+00 -1.23739100e+00
-0.00000000e+00 -0.00000000e+00 0.00000000e+00 -1.86945000e-01
-6.97144983e-02 -1.31175000e-01 -1.81750000e-02 -4.62000000e-03
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 -3.26584443e-01 1.46199994e-02 1.06608226e-01
1.87500000e-01 -4.16625000e-01 3.43125000e-01 2.83000000e-02
-1.87610718e-01 1.20337435e+00 -1.04462176e+01 -7.18631796e+00
-5.13140259e+00 -4.10876458e+00 -3.85127280e+00 -4.06299722e+00
-4.48331901e+00 -4.92743480e+00 -5.35426524e+00 -5.67133712e+00
-6.00075009e+00 -6.98883675e+00 -6.49344569e+00 2.79469254e+00
1.96545682e+01 3.01015005e+01 4.37501483e+01 7.16842851e+01
1.02455046e+02 1.33220271e+02]...
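Since the large observations seem to be the suspect here, one option is to bound them before they reach the model; a sketch using gym's ObservationWrapper, with illustrative ±10 bounds (rescaling each feature properly would be better than hard clipping):

```python
import gym
import numpy as np

class ClipObservation(gym.ObservationWrapper):
    """Clip raw observations to a bounded range before the model sees them."""

    def __init__(self, env, low=-10.0, high=10.0):
        super(ClipObservation, self).__init__(env)
        self.low, self.high = low, high

    def observation(self, observation):
        # Called by gym.ObservationWrapper on both reset() and step().
        return np.clip(observation, self.low, self.high)
```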
Quick update: I normalized the rewards and fixed the observations that were producing large values. I also set the entropy coefficient to zero. I am still receiving NaNs. While this is probably an issue with the environment sending something weird to the RL model, I am surprised that VecCheckNan isn't throwing an exception: it ran over 500,000 steps in the environment last night and the actions were NaN for all of them. In what case(s) would a NaN not throw an exception using VecCheckNan?
---------------------------------
| approxkl | nan |
| clipfrac | 0.0 |
| explained_variance | nan |
| fps | 51 |
| n_updates | 3 |
| policy_entropy | nan |
| policy_loss | nan |
| serial_timesteps | 6144 |
| time_elapsed | 790 |
| total_timesteps | 786432 |
| value_loss | nan |
---------------------------------
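One thing worth ruling out: VecCheckNan's raise_exception argument defaults to False, in which case it only warns (and only once, with the default warn_once=True) instead of raising. Independently of the wrapper, numpy can be made to raise at the exact operation that first produces an invalid value, which helps localize the source:

```python
import numpy as np

# Raise FloatingPointError where an invalid value is first produced
# (division by zero, overflow, 0/0, ...) instead of letting nan/inf
# propagate silently into the observations or the loss.
np.seterr(all="raise")
```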
I once encountered this problem, and then I tried tuning the RL model's hyperparameters, which eventually improved things, but hyperparameter tuning takes a lot of time.
Closing this issue, as it seems definitely related to the custom environment.