-
Notifications
You must be signed in to change notification settings - Fork 326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Algorithm] Simpler IQL example #998
Conversation
…o rewrite_iql_example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for this!
Can you merge main into this?
I will try to launch them once you do!
@BY571 I forgot: Can you add the scripts in the tests too? |
Updated online and added offline example tests for IQL and also added them for CQL as they were still missing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful, thanks a mil!
Just a couple of housekeeping comments
|
||
|
||
def make_offline_replay_buffer(rb_cfg): | ||
data = D4RLExperienceReplay( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in our examples, we could default to the version that does not require the d4rl library, wdyt?
It's pretty annoying to install and the dataset works without it.
pred_q2: NestedKey = "pred_q2" | ||
priority: NestedKey = "td_error" | ||
cql_q1_loss: NestedKey = "cql_q1_loss" | ||
cql_q2_loss: NestedKey = "cql_q2_loss" | ||
priority: NestedKey = "td_error" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate
torchrl/objectives/cql.py
Outdated
|
||
td_error = abs(q_pred - target_value) | ||
alpha_prime = torch.clamp(self.log_alpha_prime.exp(), min=0.0, max=1000000.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is 1000000.0
a magic number? What is the purpose of this clamp (to understand how we can code this in a more grounded way)?
No need to clamp an exp to 0 below, we can just do clamp_max
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed the number seems quite random, I took it directly from the official implementation 1.
As far as I could see it in some tests the alpha_prime value can become quite big I guess this is just to make sure its not exploding. But I'll have a look at the paper if they mention something more specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good one minor thing to edit and we're good I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome let's proceed!
Description
Rewrites the IQL online example and adds an offline training example for IQL.
Motivation and Context
Updates the online IQL example similar to #967. Fixes the "correct_for_frame_skip" function for the nested config and adds a simple offline IQL example.
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an
x
in all the boxes that apply.If you are unsure about any of these, don't hesitate to ask. We are here to help!