[RLlib] Docs do-over (new API stack): Re-write checkpointing rst page. #49504

Merged · 45 commits · Jan 7, 2025

Changes from 36 commits

Commits
c90be66  wip (sven1977, Dec 23, 2024)
0b3870c  Merge branch 'master' of https://github.com/ray-project/ray into docs… (sven1977, Dec 27, 2024)
e2a156c  wip (sven1977, Dec 27, 2024)
e520678  wip (sven1977, Dec 27, 2024)
3c05f4a  Merge branch 'master' of https://github.com/ray-project/ray into docs… (sven1977, Dec 27, 2024)
c6a18b2  Merge branch 'master' of https://github.com/ray-project/ray into docs… (sven1977, Dec 28, 2024)
266f894  wip (sven1977, Dec 28, 2024)
5401b6d  wip (sven1977, Dec 29, 2024)
6edef91  wip (sven1977, Dec 30, 2024)
fd66605  merge (sven1977, Dec 30, 2024)
f939ee7  wip (sven1977, Dec 30, 2024)
0a3e79e  fix (sven1977, Dec 30, 2024)
dadd964  wip (sven1977, Dec 30, 2024)
896c1a2  wip (sven1977, Dec 30, 2024)
d76edfd  Merge branch 'master' of https://github.com/ray-project/ray into docs… (sven1977, Dec 30, 2024)
44acf78  wip (sven1977, Dec 30, 2024)
a8bdbac  wip (sven1977, Dec 30, 2024)
fdddd9f  wip (sven1977, Dec 30, 2024)
e8fec77  Merge branch 'checkpointing_enhancements_msgpack_and_separation_of_st… (sven1977, Dec 30, 2024)
a42ff1e  wip (sven1977, Dec 30, 2024)
53967bc  Merge branch 'master' of https://github.com/ray-project/ray into docs… (sven1977, Dec 31, 2024)
db90920  wip (sven1977, Dec 31, 2024)
f19b340  merge (sven1977, Jan 2, 2025)
46c1388  wip (sven1977, Jan 2, 2025)
5554e3b  wip (sven1977, Jan 2, 2025)
0598b45  wip (sven1977, Jan 3, 2025)
e210dbe  wip (sven1977, Jan 3, 2025)
972f355  Merge branch 'master' of https://github.com/ray-project/ray into docs… (sven1977, Jan 3, 2025)
78a344c  fix (sven1977, Jan 3, 2025)
5bf0ef6  Merge branch 'master' of https://github.com/ray-project/ray into docs… (sven1977, Jan 6, 2025)
2f04106  wip (sven1977, Jan 6, 2025)
2f9925f  Apply suggestions from code review (sven1977, Jan 6, 2025)
5333255  Merge branch 'docs_redo_checkpointing' of https://github.com/sven1977… (sven1977, Jan 6, 2025)
d6e186f  fixes (sven1977, Jan 6, 2025)
da727c3  wip (sven1977, Jan 6, 2025)
00f5a4b  fix (sven1977, Jan 6, 2025)
ecf631d  fix (sven1977, Jan 6, 2025)
7979dbf  fixes (sven1977, Jan 6, 2025)
62736f9  Merge branch 'master' of https://github.com/ray-project/ray into docs… (sven1977, Jan 7, 2025)
64d47bc  fixes (sven1977, Jan 7, 2025)
d9b769c  wip (sven1977, Jan 7, 2025)
f63afc3  wip (sven1977, Jan 7, 2025)
dcd4663  Merge branch 'master' of https://github.com/ray-project/ray into docs… (sven1977, Jan 7, 2025)
b59401b  Merge branch 'master' of https://github.com/ray-project/ray into docs… (sven1977, Jan 7, 2025)
998a544  wip (sven1977, Jan 7, 2025)
4 changes: 4 additions & 0 deletions .vale/styles/config/vocabularies/RLlib/accept.txt
@@ -5,6 +5,9 @@
[Aa]utoscal(e|ing)
boolean
[Cc]allables?
+[Cc]heckpoints?(ing)?
+[Cc]heckpointable
+classmethods?
coeff
config
(DQN|dqn)
@@ -27,6 +30,7 @@ RLModules?
rollout
(SAC|sac)
SGD
+[Ss]ubcomponents?
[Tt]ensor[Ff]low
timesteps?
vectorizes?
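As an aside (not part of the diff), Vale vocabulary entries like these are case-sensitive regular expressions matched against whole words. A minimal Python sketch of what the new patterns accept; Vale itself does the matching, this only illustrates the word forms the regexes cover:

import re

# The three word-family patterns this PR adds to accept.txt.
patterns = [r"[Cc]heckpoints?(ing)?", r"[Cc]heckpointable", r"[Ss]ubcomponents?"]

for word in ["Checkpointing", "checkpoints", "Checkpointable", "subcomponents", "CHECKPOINT"]:
    accepted = any(re.fullmatch(p, word) for p in patterns)
    print(f"{word}: {'accepted' if accepted else 'flagged'}")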
467 changes: 467 additions & 0 deletions doc/source/rllib/checkpoints.rst

Large diffs are not rendered by default.
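GitHub collapses the new 467-line checkpoints.rst. Going by the new API stack this PR documents, the page presumably centers on the Checkpointable workflow; the following is a rough, unofficial sketch (method availability assumed from the new API stack, not taken from the unrendered page):

from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1")
algo = config.build()
algo.train()

# Save the Algorithm and all its subcomponents to a directory ...
checkpoint_dir = algo.save_to_path("/tmp/my_algo_checkpoint")

# ... and later recreate an identical Algorithm from that directory.
restored_algo = Algorithm.from_checkpoint(checkpoint_dir)
restored_algo.train()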

@@ -1,65 +1,5 @@
# flake8: noqa

# __create-algo-checkpoint-begin__
# Create a PPO algorithm object using a config object ..
from ray.rllib.algorithms.ppo import PPOConfig

my_ppo_config = (
PPOConfig()
.api_stack(
enable_rl_module_and_learner=False,
enable_env_runner_and_connector_v2=False,
)
.environment("CartPole-v1")
)
my_ppo = my_ppo_config.build()

# .. train one iteration ..
my_ppo.train()
# .. and call `save()` to create a checkpoint.
save_result = my_ppo.save()
path_to_checkpoint = save_result.checkpoint.path
print(
"An Algorithm checkpoint has been created inside directory: "
f"'{path_to_checkpoint}'."
)

# Let's terminate the algo for demonstration purposes.
my_ppo.stop()
# Calling `train()` now would raise an error, because the algo is stopped:
# my_ppo.train()
# __create-algo-checkpoint-end__


# __restore-from-algo-checkpoint-begin__
from ray.rllib.algorithms.algorithm import Algorithm

# Use the Algorithm's `from_checkpoint` utility to get a new algo instance
# with exactly the same state as the one the checkpoint was created from:
my_new_ppo = Algorithm.from_checkpoint(path_to_checkpoint)

# Continue training.
my_new_ppo.train()

# __restore-from-algo-checkpoint-end__

my_new_ppo.stop()

# __restore-from-algo-checkpoint-2-begin__
# Re-build a fresh algorithm.
my_new_ppo = my_ppo_config.build()

# Restore the old (checkpointed) state.
my_new_ppo.restore(save_result)

# Continue training.
my_new_ppo.train()

# __restore-from-algo-checkpoint-2-end__

my_new_ppo.stop()

# __multi-agent-checkpoints-begin__
import os

@@ -235,95 +175,3 @@ def new_policy_mapping_fn(agent_id, episode, worker, **kwargs):
algo_w_2_policies.stop()

# __restore-algorithm-from-checkpoint-with-fewer-policies-end__


# __export-models-begin__
from ray.rllib.algorithms.ppo import PPOConfig

# Create a new Algorithm (which contains a Policy, which contains an NN Model).
# Switch on inclusion of native model files in the Policy checkpoints.
ppo_config = (
PPOConfig()
.api_stack(
enable_rl_module_and_learner=False,
enable_env_runner_and_connector_v2=False,
)
.environment("Pendulum-v1")
.checkpointing(export_native_model_files=True)
)

# The default framework is TensorFlow, but if you would like to do this example with
# PyTorch, uncomment the following line of code:
# ppo_config.framework("torch")

# Create the Algorithm and train one iteration.
ppo = ppo_config.build()
ppo.train()

# Get the underlying PPOTF1Policy (or PPOTorchPolicy) object.
ppo_policy = ppo.get_policy()

# __export-models-end__

# Export the Keras NN model (that our PPOTF1Policy inside the PPO Algorithm uses)
# to disk ...

# 1) .. using the Policy object:

# __export-models-1-begin__
ppo_policy.export_model("/tmp/my_nn_model")
# .. check /tmp/my_nn_model/ for the model files.

# For Keras, you should be able to recover the model via:
# keras_model = tf.saved_model.load("/tmp/my_nn_model/")
# And pass in a Pendulum-v1 observation:
# results = keras_model(tf.convert_to_tensor(
# np.array([[0.0, 0.1, 0.2]]), dtype=np.float32)
# )

# For PyTorch, do:
# pytorch_model = torch.load("/tmp/my_nn_model/model.pt")
# results = pytorch_model(
# input_dict={
# "obs": torch.from_numpy(np.array([[0.0, 0.1, 0.2]], dtype=np.float32)),
# },
# state=[torch.tensor(0)], # dummy value
# seq_lens=torch.tensor(0), # dummy value
# )

# __export-models-1-end__

# 2) .. via the Policy's checkpointing method:

# __export-models-2-begin__
checkpoint_dir = ppo_policy.export_checkpoint("/tmp/ppo_policy")
# .. check /tmp/ppo_policy/model/ for the model files.
# You should be able to recover the Keras model via:
# keras_model = tf.saved_model.load("/tmp/ppo_policy/model")
# And pass in a Pendulum-v1 observation:
# results = keras_model(tf.convert_to_tensor(
# np.array([[0.0, 0.1, 0.2]]), dtype=np.float32)
# )

# __export-models-2-end__

# 3) .. via the Algorithm (Policy) checkpoint:

# __export-models-3-begin__
checkpoint_dir = ppo.save().checkpoint.path
# .. check `checkpoint_dir` for the Algorithm checkpoint files.
# For Keras, you should be able to recover the model via:
# keras_model = tf.saved_model.load(checkpoint_dir + "/policies/default_policy/model/")
# And pass in a Pendulum-v1 observation:
# results = keras_model(tf.convert_to_tensor(
# np.array([[0.0, 0.1, 0.2]]), dtype=np.float32)
# )

# __export-models-3-end__


# __export-models-as-onnx-begin__
# Using the same Policy object, we can also export our NN model in the ONNX format.
# The `onnx` arg takes an ONNX opset version; a falsy value disables ONNX export.
ppo_policy.export_model("/tmp/my_nn_model", onnx=11)

# __export-models-as-onnx-end__
41 changes: 0 additions & 41 deletions doc/source/rllib/doc_code/checkpoints.py

This file was deleted.

2 changes: 1 addition & 1 deletion doc/source/rllib/rllib-callback.rst
@@ -239,7 +239,7 @@
The following example demonstrates how to implement a simple custom function writing buffer
contents to disk from time to time.

You normally don't want to write the contents of buffers along with your
-:ref:`Algorithm checkpoints <rllib-checkpointing-docs>`, so doing this less often, in a more
+:ref:`Algorithm checkpoints <rllib-checkpoints-docs>`, so doing this less often, in a more
controlled fashion through a custom callback could be a good compromise.

.. testcode::
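The testcode body is collapsed in this view. Purely as an illustration of the idea (this is not the PR's snippet; the buffer attribute and hook choice are assumptions), such a callback could look like:

import os
import pickle

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class DumpBufferCallback(DefaultCallbacks):
    """Writes (assumed) replay-buffer contents to disk every 10th iteration."""

    def on_train_result(self, *, algorithm, result, **kwargs):
        iteration = result["training_iteration"]
        if iteration % 10 == 0:
            # `local_replay_buffer` is an assumption here; the actual
            # attribute depends on the algorithm and API stack in use.
            buffer = getattr(algorithm, "local_replay_buffer", None)
            if buffer is not None:
                path = os.path.join("/tmp", f"buffer_iter_{iteration}.pkl")
                with open(path, "wb") as f:
                    pickle.dump(buffer.get_state(), f)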
2 changes: 1 addition & 1 deletion doc/source/rllib/rllib-fault-tolerance.rst
@@ -65,7 +65,7 @@ Fault Tolerance and Recovery Provided by Ray Tune
Ray Tune provides fault tolerance and recovery at the experiment trial level.

When using Ray Tune with RLlib, you can enable
-:ref:`periodic checkpointing <rllib-checkpointing-docs>`,
+:ref:`periodic checkpointing <rllib-checkpoints-docs>`,
which saves the state of the experiment to a user-specified persistent storage location.
If a trial fails, Ray Tune will automatically restart it from the latest
:ref:`checkpointed <tune-fault-tol>` state.
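To make the paragraph concrete, enabling the periodic checkpointing it refers to could look roughly like this sketch (standard Tune/RLlib APIs; the specific settings are illustrative, not taken from this PR):

from ray import train, tune
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment("CartPole-v1")

tuner = tune.Tuner(
    "PPO",
    param_space=config,
    run_config=train.RunConfig(
        storage_path="/tmp/rllib_results",  # user-specified persistent storage
        # Save a trial checkpoint every 10 training iterations.
        checkpoint_config=train.CheckpointConfig(checkpoint_frequency=10),
        # Let Tune restart a failed trial (from the latest checkpoint) up to twice.
        failure_config=train.FailureConfig(max_failures=2),
    ),
)
tuner.fit()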
7 changes: 4 additions & 3 deletions doc/source/rllib/rllib-rlmodule.rst
@@ -211,9 +211,10 @@ The most direct way to construct your :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`


.. note::
-If you have a checkpoint from an :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` or an individual
+If you have a checkpoint of an :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` or an individual
:py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`,
-see :ref:`Checkpointing RLModules <rllib-checkpointing-rl-modules-docs>` for how to create the stored RLModule instance from disk.
+see :ref:`Creating instances with from_checkpoint <rllib-checkpoints-from-checkpoint>` for how to recreate your
+:py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` from disk.
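For reference, recreating just the RLModule from an Algorithm checkpoint might look like this sketch (the subdirectory layout is an assumption based on the new API stack's component structure):

from pathlib import Path

from ray.rllib.core.rl_module.rl_module import RLModule

# Assumed component layout inside an Algorithm checkpoint directory:
# <ckpt>/learner_group/learner/rl_module/<module_id>
module = RLModule.from_checkpoint(
    Path("/tmp/my_algo_checkpoint")
    / "learner_group" / "learner" / "rl_module" / "default_policy"
)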


Construction through RLModuleSpecs
@@ -715,7 +716,7 @@ model hyper-parameters:
would take turns updating the same shared encoder, which would lead to learning instabilities.


-.. _rllib-checkpointing-rl-modules-docs:
+.. _rllib-checkpoints-rl-modules-docs:

Checkpointing RLModules
-----------------------