forked from ray-project/ray
[User guides] Add user guides for DeepSpeed and Accelerate (ray-project#38513)
Signed-off-by: Yunxuan Xiao <yunxuanx@anyscale.com>
Showing 18 changed files with 1,076 additions and 23 deletions.
@@ -0,0 +1,94 @@
.. _train-deepspeed:

Training with DeepSpeed
=======================

The :class:`~ray.train.torch.TorchTrainer` launches your `DeepSpeed <https://www.deepspeed.ai/>`_ training across a distributed Ray cluster.

To use it, run your existing training code inside a ``TorchTrainer``. The final code looks like this:

.. code-block:: python

    import deepspeed
    from deepspeed.accelerator import get_accelerator

    def train_func(config):
        # Instantiate your model and dataset
        model = ...
        train_dataset = ...
        eval_dataset = ...
        collate_fn = ...  # Your batch collation function
        deepspeed_config = {...}  # Your DeepSpeed config

        # Prepare everything for distributed training
        model, optimizer, train_dataloader, lr_scheduler = deepspeed.initialize(
            model=model,
            model_parameters=model.parameters(),
            training_data=train_dataset,
            collate_fn=collate_fn,
            config=deepspeed_config,
        )

        # Define the GPU device for the current worker
        device = get_accelerator().device_name(model.local_rank)

        # Start training
        ...

    from ray.train.torch import TorchTrainer
    from ray.train import ScalingConfig

    trainer = TorchTrainer(
        train_func,
        scaling_config=ScalingConfig(...),
        ...
    )
    trainer.fit()

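The ``deepspeed_config = {...}`` placeholder can be an ordinary Python dictionary. The following is a minimal sketch of a ZeRO-3 configuration, not a recommended setup: the helper name ``build_zero3_config`` and every value in it (batch sizes, learning rate, ``bf16``) are illustrative assumptions, and it assumes one GPU per Ray Train worker.

```python
import json


def build_zero3_config(micro_batch_size, grad_accum_steps, num_workers):
    """Return a minimal ZeRO-3 DeepSpeed config as a plain dict.

    Hypothetical helper: all values are illustrative and need tuning
    for a real model and cluster.
    """
    return {
        # ZeRO stage 3 partitions parameters, gradients, and optimizer state.
        "zero_optimization": {"stage": 3},
        # Assumes hardware with bfloat16 support.
        "bf16": {"enabled": True},
        "optimizer": {"type": "AdamW", "params": {"lr": 2e-5}},
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        # Global batch = micro batch * accumulation steps * number of GPUs.
        "train_batch_size": micro_batch_size * grad_accum_steps * num_workers,
    }


deepspeed_config = build_zero3_config(
    micro_batch_size=8, grad_accum_steps=2, num_workers=4
)
print(json.dumps(deepspeed_config, indent=2))
```

Computing ``train_batch_size`` from the other two batch settings keeps the three values consistent when the number of workers changes.
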
The following is a simple example of ZeRO-3 training that uses DeepSpeed alone, without another framework on top.

.. tabs::

    .. group-tab:: Example with Ray Data

        .. dropdown:: Show Code

            .. literalinclude:: /../../python/ray/train/examples/deepspeed/deepspeed_torch_trainer.py
                :language: python
                :start-after: __deepspeed_torch_basic_example_start__
                :end-before: __deepspeed_torch_basic_example_end__

    .. group-tab:: Example with PyTorch DataLoader

        .. dropdown:: Show Code

            .. literalinclude:: /../../python/ray/train/examples/deepspeed/deepspeed_torch_trainer_no_raydata.py
                :language: python
                :start-after: __deepspeed_torch_basic_example_no_raydata_start__
                :end-before: __deepspeed_torch_basic_example_no_raydata_end__

.. tip::

    To run DeepSpeed with pure PyTorch, you **don't need to** use any additional Ray Train utilities
    like :meth:`~ray.train.torch.prepare_model` or :meth:`~ray.train.torch.prepare_data_loader` in your training function. Instead,
    keep using `deepspeed.initialize() <https://deepspeed.readthedocs.io/en/latest/initialize.html>`_ as usual to prepare everything
    for distributed training.

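Because ``deepspeed.initialize()`` already prepares the model, the training loop inside ``train_func`` can drive the returned engine directly. The following is a rough sketch of one epoch, assuming the wrapped model's forward pass returns a scalar loss; the function name ``train_one_epoch`` is hypothetical, and moving each batch to ``device`` is omitted for brevity.

```python
def train_one_epoch(engine, batches):
    """Run one pass over ``batches`` and return the mean loss as a float.

    Assumes ``engine`` is the first value returned by
    ``deepspeed.initialize()`` and that calling it returns a scalar loss.
    """
    total, steps = 0.0, 0
    for batch in batches:
        loss = engine(batch)   # forward pass through the wrapped model
        engine.backward(loss)  # use instead of loss.backward()
        engine.step()          # use instead of optimizer.step()
        total += float(loss)
        steps += 1
    # Avoid dividing by zero when ``batches`` is empty.
    return total / max(steps, 1)
```

Inside ``train_func``, this would be called as ``train_one_epoch(model, train_dataloader)``, using the names returned by ``deepspeed.initialize()``.
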
Running DeepSpeed with other frameworks
---------------------------------------

Many deep learning frameworks integrate with DeepSpeed, including Lightning, Transformers, and Accelerate. You can run all of these combinations in Ray Train.

See the following examples for details:

.. list-table::
    :header-rows: 1

    * - Framework
      - Example
    * - Accelerate (:ref:`User Guide <train-hf-accelerate>`)
      - `Fine-tune Llama-2 series models with DeepSpeed, Accelerate, and Ray Train. <https://github.com/ray-project/ray/tree/master/doc/source/templates/04_finetuning_llms_with_deepspeed>`_
    * - Transformers (:ref:`User Guide <train-pytorch-transformers>`)
      - :ref:`Fine-tune GPT-J-6b with DeepSpeed and Hugging Face Transformers <gptj_deepspeed_finetune>`
    * - Lightning (:ref:`User Guide <train-pytorch-lightning>`)
      - :ref:`Fine-tune vicuna-13b with DeepSpeed and PyTorch Lightning <vicuna_lightning_deepspeed_finetuning>`
@@ -0,0 +1,8 @@
:orphan:

.. _accelerate_example:

Hugging Face Accelerate Distributed Training Example with Ray Train
===================================================================

.. literalinclude:: /../../python/ray/train/examples/accelerate/accelerate_torch_trainer.py
@@ -0,0 +1,8 @@
:orphan:

.. _deepspeed_example:

DeepSpeed ZeRO-3 Distributed Training Example with Ray Train
============================================================

.. literalinclude:: /../../python/ray/train/examples/deepspeed/deepspeed_torch_trainer.py