Could you please explain in detail how you get the Max Average Return in Fig. 5 and Table 1 of your paper?
From what I understand, if I want to evaluate one algorithm (like TD3) on one game and collect statistics, I would do the following:
test_reward[10][1M/5k = 200]
for seed in range(10):
    set the seed
    for every 5k steps collected:
        test the policy 10 times, record the average game return in test_reward[seed][epoch]
Now I have 10*200 = 2000 rewards in total, where each reward is the average return over 10 test trials.
How exactly do you calculate the Max Average Return and its standard deviation?
For example, is it
max average return = (max(mean(test_reward, axis = 0)))
or
max average return = (mean(max(test_reward, axis = 0)))
The same question applies to the standard deviation. Thanks a lot!
It was the first of the two. In other words, we took the average return over all the trials, and then found the time step/evaluation with the highest average return. The std is the std of the trials at that time step. That being said, I don't recommend using this type of evaluation anymore; in my more recent papers we've taken the average over the final 10 evaluations. Hope that helps!
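For anyone reading later, here is a minimal NumPy sketch of the two evaluations described above. It is not the authors' code. It assumes `test_reward` has shape (seeds, evaluations) as in the question, and the function names, the toy data, and the use of the population std (`ddof=0`) are all assumptions made for illustration. How the spread is reported for the final-10 metric is not stated in this thread, so the per-seed std below is only one plausible choice.

```python
import numpy as np


def max_average_return(test_reward):
    """Max over evaluations of the seed-averaged return, i.e.
    max(mean(test_reward, axis=0)), plus the std across seeds
    at that same evaluation."""
    test_reward = np.asarray(test_reward, dtype=float)
    mean_per_eval = test_reward.mean(axis=0)      # average over seeds
    best_eval = int(mean_per_eval.argmax())       # evaluation with the highest average
    # Assumption: population std (ddof=0); the thread does not say which was used.
    std_at_best = test_reward[:, best_eval].std()
    return float(mean_per_eval[best_eval]), float(std_at_best), best_eval


def final_average_return(test_reward, last_k=10):
    """Average over the final `last_k` evaluations. Each seed is averaged
    first, then the mean and std are taken across seeds (an assumption)."""
    test_reward = np.asarray(test_reward, dtype=float)
    per_seed = test_reward[:, -last_k:].mean(axis=1)
    return float(per_seed.mean()), float(per_seed.std())


# Hypothetical toy data: 2 seeds x 3 evaluations.
toy = np.array([[1.0, 5.0, 3.0],
                [3.0, 7.0, 5.0]])

print(max_average_return(toy))
print(final_average_return(toy, last_k=2))
```

With the real data from the question, `test_reward` would be the 10 x 200 array and `last_k` would stay at its default of 10.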