
Figure explanation #34

Closed
ChenDRAG opened this issue Nov 12, 2020 · 1 comment

ChenDRAG commented Nov 12, 2020

Could you please explain in detail how you get the Max Average Return in Fig. 5 and Table 1 of your paper?
From what I understand, if I want to evaluate one algorithm (like TD3) on one game and collect statistics:

test_reward[10][1M / 5k = 200]
for seed in range(10):
    set the seed
    for every 5k environment steps collected:
        test the policy 10 times; record the average episode return in test_reward[seed][epoch]

Now I have 10 * 200 = 2000 values in total; each one is the average return over 10 test trials.
How exactly do you calculate the Max Average Return and the standard deviation?
For example, is it
max average return = max(mean(test_reward, axis=0))
or
max average return = mean(max(test_reward, axis=0))

Same question for the standard deviation. Thanks a lot!
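The two candidate computations can be written out concretely. This is a minimal sketch assuming `test_reward` is a NumPy array of shape (seeds, evaluation points); the toy 2 x 3 array stands in for the real 10 x 200 matrix:

```python
import numpy as np

# Toy stand-in for the evaluation matrix (2 seeds x 3 evaluation
# points here; the real one would be 10 x 200).
test_reward = np.array([[1.0, 5.0, 3.0],
                        [2.0, 4.0, 6.0]])

# Candidate 1: average across seeds at each evaluation point, then
# take the evaluation point with the highest average.
option_1 = test_reward.mean(axis=0).max()   # mean -> [1.5, 4.5, 4.5], max -> 4.5

# Candidate 2: max across seeds at each evaluation point, then
# average those per-point maxima over evaluation points.
option_2 = test_reward.max(axis=0).mean()   # max -> [2, 5, 6], mean -> 4.333...
```

The two can differ substantially: candidate 2 mixes the best seed at every point and so is biased upward relative to candidate 1.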

sfujim (Owner) commented Nov 30, 2020

It was the first of the two. In other words, we took the average return over all the trials, and then found the time step/evaluation with the highest average return. The std is the std across trials at that time step. That said, I don't recommend this type of evaluation anymore; in my more recent papers we've taken the average over the final 10 evaluations. Hope that helps!
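Both procedures described above can be sketched in a few lines of NumPy. This is a sketch, not the authors' actual evaluation script; the random matrix is hypothetical stand-in data with the shape from the question (10 seeds x 200 evaluation points):

```python
import numpy as np

# Hypothetical evaluation matrix: 10 seeds x 200 evaluation points,
# each entry the average return over 10 test episodes.
rng = np.random.default_rng(0)
test_reward = rng.normal(1000.0, 50.0, size=(10, 200))

# Paper's metric: average across seeds at each evaluation point, pick
# the point with the highest average, and report the std across seeds
# at that same point.
mean_per_eval = test_reward.mean(axis=0)            # shape (200,)
best = int(np.argmax(mean_per_eval))
max_average_return = mean_per_eval[best]
std_at_best = test_reward[:, best].std()

# Later-papers alternative: average each seed's final 10 evaluations,
# then report the mean and std across seeds.
final_per_seed = test_reward[:, -10:].mean(axis=1)  # shape (10,)
final_mean = final_per_seed.mean()
final_std = final_per_seed.std()
```

The final-10-evaluations average is less sensitive to a single lucky evaluation point, which is why it is the more robust summary of the two.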

sfujim closed this as completed Nov 30, 2020