This isn't really an issue per se, but I found that if you wrap the entire `grad_cache.GradCache(...)` call in torch autocast, you run into weird errors. The Hugging Face trainer does this by default: `training_step` wraps the call to `self.compute_loss()` in an autocast context manager when mixed precision is enabled.
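
To make the gotcha concrete, here's a minimal sketch. The toy encoders, loss, and inputs are placeholders I made up for illustration; the `fp16`/`scaler` arguments are my reading of the mixed-precision path in this repo's README, so treat the exact setup as an assumption rather than a verified repro:

```python
import torch
import torch.nn as nn
from torch.cuda.amp import GradScaler
from grad_cache import GradCache

# Toy stand-ins for real encoders and a contrastive loss.
enc_q, enc_p = nn.Linear(16, 8), nn.Linear(16, 8)

def contrastive_loss(q_reps, p_reps):
    # In-batch negatives: each query matches the passage at its own index.
    logits = q_reps @ p_reps.t()
    labels = torch.arange(logits.size(0), device=logits.device)
    return nn.functional.cross_entropy(logits, labels)

x, y = torch.randn(4, 16), torch.randn(4, 16)

gc = GradCache(models=[enc_q, enc_p], chunk_sizes=2, loss_fn=contrastive_loss)

# Problematic: an outer autocast around the whole GradCache call.
# This is effectively what Trainer.training_step does around
# compute_loss() when mixed precision is on.
with torch.autocast(device_type="cuda"):
    loss = gc(x, y)  # can fail with confusing dtype/graph errors

# Workaround: keep the outer autocast disabled around the call and
# let GradCache drive mixed precision itself (fp16=True + a scaler,
# per the README).
gc_fp16 = GradCache(
    models=[enc_q, enc_p],
    chunk_sizes=2,
    loss_fn=contrastive_loss,
    fp16=True,
    scaler=GradScaler(),
)
with torch.autocast(device_type="cuda", enabled=False):
    loss = gc_fp16(x, y)
```

If you're inside the Hugging Face trainer, the same idea applies: override `compute_loss` and wrap the GradCache call in `torch.autocast(..., enabled=False)`, so the trainer's own autocast doesn't leak into GradCache's multi-pass forward.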
Maybe you can add a note about this gotcha in the readme?