Refactor value based methods #102
Conversation
Some benchmarked experiments: https://wandb.ai/costa-huang/cleanRL/reports/Regression-Report--VmlldzoxNDI1MTE4
The regression report checks out. @dosssman all good on your end?
I have been looking at the continuous action space methods, which I am more familiar with. I will check the DQN / C51-like discrete action space methods further in the week. In any case, great work as always.
* Fixed 'optimize the midel' typo in all files
* Fixed 'optimize the midel' typo in offline scripts too
* TD3: removed DDPG's update code from the training loop
* Refactored sac_continuous, with preliminary tests working
Thanks @dosssman for the detailed check on TD3. After this fix, TD3's performance (green line on the right) is even better.
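For context on what the TD3 fix above changes structurally: TD3 differs from DDPG by updating the actor and the target networks only every few critic updates, rather than on every step. Below is a minimal, self-contained toy sketch of that delayed-update pattern (not the actual diff in this PR); the network sizes, policy_frequency=2, and the random batches are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

obs_dim, act_dim, tau, policy_frequency = 3, 1, 0.005, 2

# Toy critic and deterministic actor, plus their target copies.
qf = nn.Linear(obs_dim + act_dim, 1)
qf_target = nn.Linear(obs_dim + act_dim, 1)
qf_target.load_state_dict(qf.state_dict())
actor = nn.Linear(obs_dim, act_dim)
actor_target = nn.Linear(obs_dim, act_dim)
actor_target.load_state_dict(actor.state_dict())
q_opt = optim.Adam(qf.parameters(), lr=3e-4)
actor_opt = optim.Adam(actor.parameters(), lr=3e-4)

for global_step in range(10):
    obs = torch.randn(32, obs_dim)       # stand-in for a sampled batch
    act = torch.randn(32, act_dim)
    td_target = torch.randn(32, 1)       # stand-in for the computed TD target

    # The critic is updated on every step.
    q_loss = ((qf(torch.cat([obs, act], 1)) - td_target) ** 2).mean()
    q_opt.zero_grad()
    q_loss.backward()
    q_opt.step()

    # The actor and target networks are updated only every `policy_frequency`
    # steps; in DDPG they are updated every step, which is the code the
    # commit above removes from TD3's loop.
    if global_step % policy_frequency == 0:
        actor_loss = -qf(torch.cat([obs, actor(obs)], 1)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
        # Polyak (soft) updates of the target networks.
        for p, tp in zip(actor.parameters(), actor_target.parameters()):
            tp.data.copy_(tau * p.data + (1 - tau) * tp.data)
        for p, tp in zip(qf.parameters(), qf_target.parameters()):
            tp.data.copy_(tau * p.data + (1 - tau) * tp.data)
```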
Merging as is so that I can introduce formatting pipelines after discussing with @dosssman (great work btw). If there are remaining issues, we can open new PRs.
Continuing on #79, this PR refactors the value-based methods. Specifically, we prefer an explicit `buffer.sample()` call over something like `agent.learn()`, which is much more abstract. To this end, I think it's worth adopting SB3's replay buffer. In the future, when SB3 introduces a prioritized replay buffer ([question] HER and prioritized experience replay hill-a/stable-baselines#751), it will also be easier for us to adopt.
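To make the contrast concrete, here is a minimal sketch of what the explicit buffer API looks like, assuming SB3's ReplayBuffer and gym spaces; the dummy transitions, buffer_size, and batch_size are illustrative, and exact signatures (e.g. the infos argument of add()) vary slightly across SB3 releases.

```python
import numpy as np
from gym import spaces
from stable_baselines3.common.buffers import ReplayBuffer

obs_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
act_space = spaces.Discrete(2)

rb = ReplayBuffer(
    buffer_size=10_000,
    observation_space=obs_space,
    action_space=act_space,
    device="cpu",
    handle_timeout_termination=False,  # keep the toy example simple
)

# Fill with dummy transitions (a real script would store env.step() results).
for _ in range(256):
    obs = obs_space.sample()
    next_obs = obs_space.sample()
    action = np.array(act_space.sample())
    reward = np.array(0.0)
    done = np.array(False)
    rb.add(obs, next_obs, action, reward, done, infos=[{}])

# The explicit call: you see exactly which batch feeds the TD-target
# computation, instead of it being hidden behind an opaque agent.learn().
data = rb.sample(batch_size=128)
print(data.observations.shape, data.actions.shape, data.rewards.shape)
# data.observations / actions / rewards / next_observations / dones are
# torch tensors ready for a DQN/DDPG-style loss.
```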