This isn't really an issue per se, but I found that if you wrap the entire `grad_cache.GradCache(...)` call in torch autocast, you run into weird errors. The Hugging Face trainer does this by default: `training_step` wraps the call to `self.compute_loss()` in an autocast context manager when mixed precision is enabled.
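
To make the gotcha concrete, here's a minimal sketch. The toy encoders, loss, and inputs are placeholders I made up for illustration; the `fp16`/`scaler` arguments are my reading of the mixed-precision path in this repo's README, so treat the exact setup as an assumption rather than a verified repro:

```python
import torch
import torch.nn as nn
from torch.cuda.amp import GradScaler
from grad_cache import GradCache

# Toy stand-ins for real encoders and a contrastive loss.
enc_q, enc_p = nn.Linear(16, 8), nn.Linear(16, 8)

def contrastive_loss(q_reps, p_reps):
    # In-batch negatives: each query matches the passage at its own index.
    logits = q_reps @ p_reps.t()
    labels = torch.arange(logits.size(0), device=logits.device)
    return nn.functional.cross_entropy(logits, labels)

x, y = torch.randn(4, 16), torch.randn(4, 16)

gc = GradCache(models=[enc_q, enc_p], chunk_sizes=2, loss_fn=contrastive_loss)

# Problematic: an outer autocast around the whole GradCache call.
# This is effectively what Trainer.training_step does around
# compute_loss() when mixed precision is on.
with torch.autocast(device_type="cuda"):
    loss = gc(x, y)  # can fail with confusing dtype/graph errors

# Workaround: keep the outer autocast disabled around the call and
# let GradCache drive mixed precision itself (fp16=True + a scaler,
# per the README).
gc_fp16 = GradCache(
    models=[enc_q, enc_p],
    chunk_sizes=2,
    loss_fn=contrastive_loss,
    fp16=True,
    scaler=GradScaler(),
)
with torch.autocast(device_type="cuda", enabled=False):
    loss = gc_fp16(x, y)
```

If you're inside the Hugging Face trainer, the same idea applies: override `compute_loss` and wrap the GradCache call in `torch.autocast(..., enabled=False)`, so the trainer's own autocast doesn't leak into GradCache's multi-pass forward.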
Maybe you can add a note about this gotcha in the readme?