
Figure explanation #34

Closed
ChenDRAG opened this issue Nov 12, 2020 · 1 comment

ChenDRAG commented Nov 12, 2020

Could you please explain in detail how you get the Max Average Return in Fig. 5 and Table 1 of your paper?
From what I understand, if I want to evaluate one algorithm (like TD3) on one game and collect statistics:

test_reward[10][1M / 5k = 200]
for seed in range(10):
    set the seed
    for every 5k environment steps collected:
        test the policy 10 times; record the average episode return in test_reward[seed][epoch]

Now I have 10 * 200 = 2000 values in total; each one is the average return over 10 test trials.
How exactly do you calculate the Max Average Return and the standard deviation?
For example, is it
max average return = max(mean(test_reward, axis=0))
or
max average return = mean(max(test_reward, axis=0))

Same question for the standard deviation. Thanks a lot!
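The two candidate computations can be written out concretely. This is a minimal sketch assuming `test_reward` is a NumPy array of shape (seeds, evaluation points); the toy 2 x 3 array stands in for the real 10 x 200 matrix:

```python
import numpy as np

# Toy stand-in for the evaluation matrix (2 seeds x 3 evaluation
# points here; the real one would be 10 x 200).
test_reward = np.array([[1.0, 5.0, 3.0],
                        [2.0, 4.0, 6.0]])

# Candidate 1: average across seeds at each evaluation point, then
# take the evaluation point with the highest average.
option_1 = test_reward.mean(axis=0).max()   # mean -> [1.5, 4.5, 4.5], max -> 4.5

# Candidate 2: max across seeds at each evaluation point, then
# average those per-point maxima over evaluation points.
option_2 = test_reward.max(axis=0).mean()   # max -> [2, 5, 6], mean -> 4.333...
```

The two can differ substantially: candidate 2 mixes the best seed at every point and so is biased upward relative to candidate 1.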

sfujim (Owner) commented Nov 30, 2020

It was the first of the two. In other words, we took the average return over all the trials, and then found the time step/evaluation with the highest average return. The std is the std across trials at that time step. That said, I don't recommend this type of evaluation anymore; in my more recent papers we've taken the average over the final 10 evaluations. Hope that helps!
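Both procedures described above can be sketched in a few lines of NumPy. This is a sketch, not the authors' actual evaluation script; the random matrix is hypothetical stand-in data with the shape from the question (10 seeds x 200 evaluation points):

```python
import numpy as np

# Hypothetical evaluation matrix: 10 seeds x 200 evaluation points,
# each entry the average return over 10 test episodes.
rng = np.random.default_rng(0)
test_reward = rng.normal(1000.0, 50.0, size=(10, 200))

# Paper's metric: average across seeds at each evaluation point, pick
# the point with the highest average, and report the std across seeds
# at that same point.
mean_per_eval = test_reward.mean(axis=0)            # shape (200,)
best = int(np.argmax(mean_per_eval))
max_average_return = mean_per_eval[best]
std_at_best = test_reward[:, best].std()

# Later-papers alternative: average each seed's final 10 evaluations,
# then report the mean and std across seeds.
final_per_seed = test_reward[:, -10:].mean(axis=1)  # shape (10,)
final_mean = final_per_seed.mean()
final_std = final_per_seed.std()
```

The final-10-evaluations average is less sensitive to a single lucky evaluation point, which is why it is the more robust summary of the two.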

sfujim closed this as completed Nov 30, 2020