[RLlib] Docs do-over (new API stack): Re-write checkpointing rst page. #49504

Merged
merged 45 commits into from
Jan 7, 2025
Changes from 1 commit
Commits
45 commits
c90be66
wip
sven1977 Dec 23, 2024
0b3870c
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Dec 27, 2024
e2a156c
wip
sven1977 Dec 27, 2024
e520678
wip
sven1977 Dec 27, 2024
3c05f4a
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Dec 27, 2024
c6a18b2
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Dec 28, 2024
266f894
wip
sven1977 Dec 28, 2024
5401b6d
wip
sven1977 Dec 29, 2024
6edef91
wip
sven1977 Dec 30, 2024
fd66605
merge
sven1977 Dec 30, 2024
f939ee7
wip
sven1977 Dec 30, 2024
0a3e79e
fix
sven1977 Dec 30, 2024
dadd964
wip
sven1977 Dec 30, 2024
896c1a2
wip
sven1977 Dec 30, 2024
d76edfd
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Dec 30, 2024
44acf78
wip
sven1977 Dec 30, 2024
a8bdbac
wip
sven1977 Dec 30, 2024
fdddd9f
wip
sven1977 Dec 30, 2024
e8fec77
Merge branch 'checkpointing_enhancements_msgpack_and_separation_of_st…
sven1977 Dec 30, 2024
a42ff1e
wip
sven1977 Dec 30, 2024
53967bc
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Dec 31, 2024
db90920
wip
sven1977 Dec 31, 2024
f19b340
merge
sven1977 Jan 2, 2025
46c1388
wip
sven1977 Jan 2, 2025
5554e3b
wip
sven1977 Jan 2, 2025
0598b45
wip
sven1977 Jan 3, 2025
e210dbe
wip
sven1977 Jan 3, 2025
972f355
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jan 3, 2025
78a344c
fix
sven1977 Jan 3, 2025
5bf0ef6
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jan 6, 2025
2f04106
wip
sven1977 Jan 6, 2025
2f9925f
Apply suggestions from code review
sven1977 Jan 6, 2025
5333255
Merge branch 'docs_redo_checkpointing' of https://github.com/sven1977…
sven1977 Jan 6, 2025
d6e186f
fixes
sven1977 Jan 6, 2025
da727c3
wip
sven1977 Jan 6, 2025
00f5a4b
fix
sven1977 Jan 6, 2025
ecf631d
fix
sven1977 Jan 6, 2025
7979dbf
fixes
sven1977 Jan 6, 2025
62736f9
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jan 7, 2025
64d47bc
fixes
sven1977 Jan 7, 2025
d9b769c
wip
sven1977 Jan 7, 2025
f63afc3
wip
sven1977 Jan 7, 2025
dcd4663
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jan 7, 2025
b59401b
Merge branch 'master' of https://github.com/ray-project/ray into docs…
sven1977 Jan 7, 2025
998a544
wip
sven1977 Jan 7, 2025
wip
Signed-off-by: sven1977 <svenmika1977@gmail.com>
sven1977 committed Dec 30, 2024
commit 44acf781eca7ce281481f4dc00bb4e0411bf8511
170 changes: 100 additions & 70 deletions doc/source/rllib/checkpointing.rst
@@ -105,6 +105,8 @@ how to create a checkpoint:
is ``~/ray_results/[your experiment name]``.


.. _rllib-structure-of-checkpoint-dir:

Structure of a checkpoint directory
+++++++++++++++++++++++++++++++++++

@@ -126,9 +128,26 @@ Take a look at what the directory now looks like:
Subdirectories inside a checkpoint dir, like ``env_runner/``, hint at a subcomponent's own checkpoint data.
For example, an :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` always also saves its
:py:class:`~ray.rllib.env.env_runner.EnvRunner` state and :py:class:`~ray.rllib.core.learner.learner_group.LearnerGroup` state.
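
For a quick sanity check, here's a small sketch of listing those subcomponent directories from Python. It assumes ``checkpoint_dir`` is the checkpoint path created earlier in this guide:

.. code-block:: python

    from pathlib import Path

    # List only the subdirectories; each one holds a subcomponent's own
    # checkpoint, for example `env_runner/` and `learner_group/`.
    subcomponents = [p.name for p in Path(checkpoint_dir).iterdir() if p.is_dir()]
    print(subcomponents)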

.. note::
    Each subcomponent's directory itself contains a ``metadata.json`` file, a ``class_and_ctor_args.pkl`` file,
    and a ``.._state.pkl`` file, all serving the same purpose as their counterparts in the main algorithm checkpoint directory.
    For example, inside the ``env_runner/`` subdirectory, you would find the :py:class:`~ray.rllib.env.env_runner.EnvRunner`'s own
    state, construction, and meta information:

    .. code-block:: shell

        $ cd env_runner/
        $ ls -la
        .
        ..
        state.pkl
        class_and_ctor_args.pkl
        metadata.json

See :ref:`here for the complete RLlib component tree <rllib-components-tree>`.

The ``metadata.json`` file exists for your convenience only and RLlib doesn't need it.

.. note::
    The ``metadata.json`` file contains information about the Ray version used to create the checkpoint,
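
For illustration, you could peek at this metadata with plain ``json``. The exact keys written, such as ``ray_version``, are an assumption here; print the whole dict to see what your Ray version actually stores:

.. code-block:: python

    import json
    from pathlib import Path

    # `checkpoint_dir` is the checkpoint created earlier in this guide.
    metadata = json.loads((Path(checkpoint_dir) / "metadata.json").read_text())
    print(metadata)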
@@ -150,25 +169,18 @@ The ``class_and_ctor_args.pkl`` file stores meta information needed to construct
This information, as the filename suggests, contains the class of the saved object and its constructor arguments and keyword arguments.
RLlib uses this file to create the initial new object when calling :py:meth:`~ray.rllib.utils.checkpoints.Checkpointable.from_checkpoint`.
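
If you're curious, you can unpickle this file yourself. Its exact layout is an RLlib implementation detail and may change between versions, so treat the following as a sketch for inspection only:

.. code-block:: python

    import pickle
    from pathlib import Path

    # `checkpoint_dir` is the checkpoint created earlier in this guide.
    with open(Path(checkpoint_dir) / "class_and_ctor_args.pkl", "rb") as f:
        ctor_info = pickle.load(f)

    # The stored object bundles the class with its constructor args and kwargs.
    print(ctor_info)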

Finally, the ``.._state.[pkl|msgpack]`` files contain the pickled or msgpack-encoded state dict of the saved object.
RLlib obtains this state dict when saving a checkpoint by calling the object's
:py:meth:`~ray.rllib.utils.checkpoints.Checkpointable.get_state` method.
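
As a rough sketch of what happens when RLlib writes this file, assuming ``algo`` is an already built :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` instance:

.. code-block:: python

    import pickle

    # `get_state()` returns a (possibly nested) dict describing the current state
    # of the component and its subcomponents.
    state = algo.get_state()

    # Sketch only: roughly what ends up in the `.._state.pkl` file.
    with open("/tmp/state_sketch.pkl", "wb") as f:
        pickle.dump(state, f)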

.. note::
    Support for ``msgpack`` based checkpoints is experimental, but might become the default in the future.
    Unlike ``pickle``, ``msgpack`` has the advantage of being independent of the Python version, thus allowing
    users to recover experiment and model states from old checkpoints that were generated with an older Python
    version.

    The Ray team is working on completely separating state from architecture, where all state information should go into
    the ``state.msgpack`` file and all architecture information should go into the ``class_and_ctor_args.pkl`` file.
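
For example, a msgpack based state file can be read back without unpickling arbitrary Python objects. The following is a sketch only; whether RLlib encodes arrays through ``msgpack_numpy`` or another scheme is an implementation detail, so inspect your own checkpoint before relying on it:

.. code-block:: python

    import msgpack

    # Sketch only: read a msgpack based state file back into plain Python objects.
    with open("state.msgpack", "rb") as f:
        state = msgpack.unpackb(f.read(), raw=False)
    print(type(state))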


.. _rllib-components-tree:
@@ -199,15 +211,80 @@ is the :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` class:
this issue, probably through softlinking to avoid duplicate files and unnecessary disk usage.


Creating instances from a checkpoint with `from_checkpoint`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once you have a checkpoint of either a trained :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` or
any of its :ref:`subcomponents <rllib-components-tree>`, you can create new objects directly
from this checkpoint.

For example, you could create a new :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` from
an existing algorithm checkpoint:

.. testcode::

    # Import the correct class to create from scratch using the checkpoint.
    from ray.rllib.algorithms.algorithm import Algorithm

    # Use the already existing checkpoint in `checkpoint_dir`.
    new_ppo = Algorithm.from_checkpoint(checkpoint_dir)
    # Confirm the `new_ppo` matches the originally checkpointed one.
    assert new_ppo.config.env == "Pendulum-v1"

    # Continue training.
    new_ppo.train()

.. testcode::
    :hide:

    new_ppo.stop()


Using the exact same checkpoint from before and the same ``.from_checkpoint()`` utility,
you could also reconstruct only the RLModule that your Algorithm trained, directly from the algorithm's checkpoint.
This is very useful when deploying trained models to production or evaluating them in a separate
process while training is ongoing.


.. testcode::

    from pathlib import Path
    import torch

    # Import the correct class to create from scratch using the checkpoint.
    from ray.rllib.core.rl_module.rl_module import RLModule

    # Use the already existing checkpoint in `checkpoint_dir`, but go further down
    # into its subdirectory for the single RLModule.
    # See the preceding section on RLlib's components tree for the directory layout.
    rl_module_checkpoint_dir = Path(checkpoint_dir) / "learner_group" / "learner" / "rl_module" / "default_policy"

    # Create the actual RLModule.
    rl_module = RLModule.from_checkpoint(rl_module_checkpoint_dir)

    # Run a forward pass to compute action logits. Use a dummy Pendulum observation
    # tensor (3d) and add a batch dim (B=1).
    results = rl_module.forward_inference(
        {"obs": torch.tensor([0.5, 0.25, -0.3]).unsqueeze(0).float()}
    )
    print(results)
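
To turn this output into an action, here's a minimal sketch. It assumes the default continuous-control setup, in which ``results`` carries an ``action_dist_inputs`` entry holding the mean and log standard deviation of a diagonal Gaussian; print ``results`` first to verify the keys:

.. code-block:: python

    # Assumption: `action_dist_inputs` concatenates mean and log-std for the
    # 1D Pendulum action space.
    dist_inputs = results["action_dist_inputs"]
    mean, log_std = torch.chunk(dist_inputs, 2, dim=-1)

    # Deterministic (greedy) action: use the mean.
    action = mean
    # Stochastic alternative: sample from the Gaussian instead.
    # action = torch.normal(mean, log_std.exp())
    print(action)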


See here for an `example on how to run policy inference after training <https://github.com/ray-project/ray/blob/master/rllib/examples/inference/policy_inference_after_training.py>`__
and another `example on how to run policy inference, but with an LSTM <https://github.com/ray-project/ray/blob/master/rllib/examples/inference/policy_inference_after_training_w_connector.py>`__.


.. hint::

    A few things to note:

    * The checkpoint saves all the information needed to recreate a new object identical to the original one.
    *


Restoring from a checkpoint with `restore_from_path`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
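
A minimal sketch of the intended usage, assuming ``ppo_config`` is the same config used for the original run and ``checkpoint_dir`` is the checkpoint written earlier; check the :py:class:`~ray.rllib.utils.checkpoints.Checkpointable` API for the exact signature of ``restore_from_path``:

.. code-block:: python

    # Sketch only: build a fresh Algorithm from the original config, then load the
    # saved state in place instead of constructing a brand-new instance.
    new_algo = ppo_config.build()
    new_algo.restore_from_path(checkpoint_dir)

    # Continue training from the restored state.
    new_algo.train()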



@@ -383,53 +460,6 @@ to a new function that maps our five agents to only the two remaining policies:
"agent0" and "agent1" to "pol0", all other agents to "pol1".


Model Exports
-------------

Apart from creating checkpoints for your RLlib objects (such as an RLlib
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm` or
an individual RLlib :py:class:`~ray.rllib.policy.policy.Policy`), it may also be very useful
to export only your NN models in their native (non-RLlib dependent) format, for example
as a Keras or PyTorch model.
You could then use the trained NN models outside
of RLlib, e.g. for serving purposes in your production environments.

How do I export my NN Model?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are several ways of creating native Keras or PyTorch model exports.

Here is the example code that illustrates these:

.. literalinclude:: doc_code/checkpointing.py
:language: python
:start-after: __export-models-begin__
:end-before: __export-models-end__

We can now export the Keras NN model (that our PPOTF1Policy inside the PPO Algorithm uses)
to disk ...

1) Using the Policy object:

.. literalinclude:: doc_code/checkpointing.py
:language: python
:start-after: __export-models-1-begin__
:end-before: __export-models-1-end__

2) Via the Policy's checkpointing method:

.. literalinclude:: doc_code/checkpointing.py
:language: python
:start-after: __export-models-2-begin__
:end-before: __export-models-2-end__

3) Via the Algorithm (Policy) checkpoint:

.. literalinclude:: doc_code/checkpointing.py
:language: python
:start-after: __export-models-3-begin__
:end-before: __export-models-3-end__


And what about exporting my NN Models in ONNX format?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
92 changes: 0 additions & 92 deletions doc/source/rllib/doc_code/checkpointing.py
@@ -176,95 +176,3 @@ def new_policy_mapping_fn(agent_id, episode, worker, **kwargs):

# __restore-algorithm-from-checkpoint-with-fewer-policies-end__


# __export-models-begin__
from ray.rllib.algorithms.ppo import PPOConfig

# Create a new Algorithm (which contains a Policy, which contains a NN Model).
# Switch on inclusion of the native model files in the Policy checkpoints.
ppo_config = (
PPOConfig()
.api_stack(
enable_rl_module_and_learner=False,
enable_env_runner_and_connector_v2=False,
)
.environment("Pendulum-v1")
.checkpointing(export_native_model_files=True)
)

# The default framework is TensorFlow, but if you would like to do this example with
# PyTorch, uncomment the following line of code:
# ppo_config.framework("torch")

# Create the Algorithm and train one iteration.
ppo = ppo_config.build()
ppo.train()

# Get the underlying PPOTF1Policy (or PPOTorchPolicy) object.
ppo_policy = ppo.get_policy()

# __export-models-end__

# Export the Keras NN model (that our PPOTF1Policy inside the PPO Algorithm uses)
# to disk ...

# 1) .. using the Policy object:

# __export-models-1-begin__
ppo_policy.export_model("/tmp/my_nn_model")
# .. check /tmp/my_nn_model/ for the model files.

# For Keras, you should be able to recover the model via:
# keras_model = tf.saved_model.load("/tmp/my_nn_model/")
# And pass in a Pendulum-v1 observation:
# results = keras_model(tf.convert_to_tensor(
# np.array([[0.0, 0.1, 0.2]]), dtype=np.float32)
# )

# For PyTorch, do:
# pytorch_model = torch.load("/tmp/my_nn_model/model.pt")
# results = pytorch_model(
# input_dict={
# "obs": torch.from_numpy(np.array([[0.0, 0.1, 0.2]], dtype=np.float32)),
# },
# state=[torch.tensor(0)], # dummy value
# seq_lens=torch.tensor(0), # dummy value
# )

# __export-models-1-end__

# 2) .. via the Policy's checkpointing method:

# __export-models-2-begin__
ppo_policy.export_checkpoint("/tmp/ppo_policy")
# .. check /tmp/ppo_policy/model/ for the model files.
# You should be able to recover the Keras model via:
# keras_model = tf.saved_model.load("/tmp/ppo_policy/model")
# And pass in a Pendulum-v1 observation:
# results = keras_model(tf.convert_to_tensor(
# np.array([[0.0, 0.1, 0.2]]), dtype=np.float32)
# )

# __export-models-2-end__

# 3) .. via the Algorithm (Policy) checkpoint:

# __export-models-3-begin__
checkpoint_dir = ppo.save().checkpoint.path
# .. check `checkpoint_dir` for the Algorithm checkpoint files.
# For Keras, you should be able to recover the model via:
# keras_model = tf.saved_model.load(checkpoint_dir + "/policies/default_policy/model/")
# And pass in a Pendulum-v1 observation
# results = keras_model(tf.convert_to_tensor(
# np.array([[0.0, 0.1, 0.2]]), dtype=np.float32)
# )

# __export-models-3-end__


# __export-models-as-onnx-begin__
# Using the same Policy object, we can also export our NN Model in the ONNX format:
# Pass an ONNX opset version (an int) to trigger the ONNX export.
ppo_policy.export_model("/tmp/my_nn_model", onnx=11)

# __export-models-as-onnx-end__