Experimental support for accelerate #826
Conversation
As of now, implemented as a mixin class to serve as proof of concept.
Thank you for working on this! I tested on hardware and can confirm this works.
Using a mixin adds another design pattern, which can signal a limitation of the callbacks. The most user-friendly way would be to add a kwarg to NeuralNet, but that adds more complexity to NeuralNet. It's all trade-offs now.
I am happy to try mixins out.
skorch/helper.py
class AccelerateMixin:
    """Mixin class to add support for huggingface's accelerate

    *Experimental*
We can include a link to https://huggingface.co/docs/accelerate/index.html and how to install it in the docstring.
If we agree to proceed with this design, I would add links and more information.
skorch/tests/test_helper.py
def test_fit_fp16(self, net, data):
    X, y = data
    net.fit(X, y)  # does not raise
I do not see a public way to check that accelerate did something without mocking. All module and optimizer state in Accelerator is stored as private attributes.
I will probably add tests with a fake accelerator that sets an attribute on the object (module.accelerated = True) or something like this for testing.
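A minimal sketch of what such a fake accelerator could look like; the AcceleratorMock name and the exact attribute are illustrative, not taken from the PR:

import torch
from accelerate import Accelerator

class AcceleratorMock(Accelerator):
    """Fake accelerator that marks prepared modules so tests can assert on it."""
    def prepare(self, *args):
        prepared = super().prepare(*args)
        # prepare() returns a single object for one argument, a tuple otherwise
        objs = prepared if isinstance(prepared, tuple) else (prepared,)
        for obj in objs:
            if isinstance(obj, torch.nn.Module):
                obj.accelerated = True  # attribute the test can check
        return prepared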
skorch/tests/test_helper.py
@pytest.fixture
def net(self, net_cls, module_cls, accelerator):
    return net_cls(
As a sanity check, it would be good to check if the NeuralNet can be pickled and unpickled with an accelerator.
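For illustration, a sketch of such a sanity check, reusing the net and data fixtures from the snippet above; the test name is hypothetical and whether it passes depends on the accelerator being picklable:

import pickle

def test_net_can_be_pickled(self, net, data):
    X, y = data
    net.fit(X, y)
    dumped = pickle.dumps(net)
    loaded = pickle.loads(dumped)
    loaded.predict(X)  # the unpickled net should still be usable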
Good point.
Nice, thanks for testing this out. I would assume that certain combinations of settings might cause problems, but it will be impossible to check all of them (especially with the existing CI). I would be happy if we could cover the most common use cases; everything else would be almost unfeasible, given the multitude of options available.
This is what I meant when I wrote … but I could have written that more clearly. I agree, the reason I started like this is just to get something out quickly and collect feedback (especially since it might not have worked at all). If we agree with the design, I could move the functionality to NeuralNet.
If we decide to go this way, what do you think would be better: having a couple of …? If we place the logic into …?
@thomasjpfan I discussed this with @ottonemo and we decided to leave the feature as a mixin class for now and not to integrate it into NeuralNet.
Okay, let's go with the mixin.
- Add install instruction to the docstring
- Add section to helper.rst
- Minor fixes
@thomasjpfan I added more documentation for this new feature. This should hopefully help users discover the feature (and provide feedback if there are any issues).
docs/user/helper.rst
    MyModule,
    accelerator=accelerator,
    device=None,
    callbacks__print_log__sink=accelerator.print)
Should we set callbacks__print_log__sink automatically? It feels like more boilerplate for a user to write.
Yes, true. On the other hand, since the accelerator does not exist when we define AccelerateMixin, we cannot set it as a default argument. We would need to do something else, like:
def initialize(self):
super().initialize()
self.set_params(callbacks__print_log__sink=self.accelerator.print)
return self
Not sure if the user would have any reason not to set this, but if they did, this hard-coding would prove annoying.
I think having good defaults is generally better. If a user does not want to set it like this, I'm guessing they are more advanced and would need to override initialize to undo it. It's a toss-up for me.
I tried a different solution which hopefully lets us have our cake and eat it too. Please take a look: 9d3e5e0
This is achieved by setting the default callbacks__print_log__sink to 'auto'. When this value is detected later, it is replaced with self.accelerator.print if available. This way, we get the sane default even though we cannot set it directly as a default argument (because the accelerator instance does not exist yet), while still giving the user the option to set a different argument.
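A rough sketch of how this 'auto' resolution could work; the actual implementation is in commit 9d3e5e0 and may differ, and the method placement here is illustrative:

from skorch.callbacks import PrintLog

class AccelerateMixin:
    # meant to be combined with a NeuralNet subclass; only the relevant part is shown
    def initialize_callbacks(self, *args, **kwargs):
        super().initialize_callbacks(*args, **kwargs)
        for _, cb in self.callbacks_:
            # replace the 'auto' placeholder with the accelerator's print function
            if isinstance(cb, PrintLog) and cb.sink == 'auto':
                cb.sink = self.accelerator.print
        return self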
skorch/helper.py
>>> net = AcceleratedNet(
...     MyModule,
...     accelerator=accelerator,
...     device=None,
We likely can make device="auto" too.
I'm not so sure about this one, as it's not as clear-cut as the other default. Also, changing device is not as obscure as the print log sink.
As with print, I cannot really think of a case where device is not None when using Accelerator.
Okay, you convinced me; I changed the default to None.
It's necessary to assign the prepared components; pure in-place mutation seems to create bugs.
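In code, this commit message roughly corresponds to a pattern like the following; the method name _initialize_accelerator is hypothetical, the assignment back from prepare() is the point:

def _initialize_accelerator(self):
    # assign the objects returned by prepare() back to the net instead of
    # relying on prepare() mutating module_ and optimizer_ in place
    self.module_, self.optimizer_ = self.accelerator.prepare(
        self.module_, self.optimizer_
    )
    return self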
Thanks for the update! I think we can install accelerate on the CI and run the CPU tests.
When using AMP, accelerate applies grad scaling under the hood using GradScaler. That does not support passing the train step as a closure to optimizer.step. Therefore, we need to step explicitly. We could use a more sophisticated approach of trying to figure out if grad scaler is actually being used and only stepping explicitly if needed. However, the need for the closure is quite rare and we want to treat accelerate as a black box instead of relying on implementation details (which we would have to in order to figure out when grad scaling is applied).
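A rough sketch of the explicit-step pattern described here, assuming a train_step override in the mixin; the actual PR code may differ in its details:

def train_step(self, batch, **fit_params):
    # step the optimizer directly instead of passing a closure to
    # optimizer.step, since GradScaler-based stepping does not accept one
    self.optimizer_.zero_grad()
    step = self.train_step_single(batch, **fit_params)
    self.optimizer_.step()
    return step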
This way, it is being tested (as far as possible with only CPU) during CI.
@thomasjpfan I did run into one more bug when I got my hands on an NVIDIA T4 GPU. Namely, when using fp16, accelerate uses GradScaler under the hood, which does not support passing the train step as a closure to optimizer.step (see the commit message above).
I think in the future, we might consider having the 'closure' pattern be an explicit option that the user can opt into via a parameter, instead of doing it implicitly. Then we can also give a proper error when a user tries to use closure with accelerate.
Thanks for the fix with closures!
I ran the tests locally with an Ampere GPU and everything passes. I also experimented with notebooks/MNIST.ipynb to use Accelerator, and fp16 works as expected.
LGTM
Thanks @thomasjpfan for doing another round of testing. So now we know it works on Ampere, Turing, and CPU :D That's probably sufficient testing for an experimental feature.
Description
Proposal to implement huggingface accelerate (#760). If this is adopted, we would probably no longer need to implement AMP support, since AMP is a subset of the functionality offered by accelerate.
For lack of appropriate hardware, I could not test whether this would actually work.
Implementation
The first idea was to implement this as a callback, but that would require adding a lot of new on_-methods that would serve only this callback for now. Therefore, this feature is implemented as a mixin class, to serve as a proof of concept of how this could look if implemented in the NeuralNet class itself.
I'm not sure if we would need to use accelerator.utils.save.
The failing tests seem to be unrelated to this change.
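For reference, a minimal, self-contained sketch of how the mixin could be used, based on the docstring snippet above; MyModule and the toy data are placeholders, and on suitable hardware the Accelerator could be created with mixed_precision='fp16':

import numpy as np
import torch
from accelerate import Accelerator
from skorch import NeuralNetClassifier
from skorch.helper import AccelerateMixin

class MyModule(torch.nn.Module):
    # tiny placeholder module for illustration
    def __init__(self):
        super().__init__()
        self.dense = torch.nn.Linear(20, 2)
        self.softmax = torch.nn.Softmax(dim=-1)

    def forward(self, X):
        return self.softmax(self.dense(X))

class AcceleratedNet(AccelerateMixin, NeuralNetClassifier):
    """NeuralNetClassifier with support for accelerate"""

accelerator = Accelerator()  # e.g. Accelerator(mixed_precision='fp16') on a GPU
net = AcceleratedNet(
    MyModule,
    accelerator=accelerator,
    device=None,  # let accelerate handle device placement
    max_epochs=2,
)

X = np.random.rand(100, 20).astype('float32')
y = np.random.randint(0, 2, size=100).astype('int64')
net.fit(X, y)  # should run without raising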