[RLlib] Docs do-over (new API stack): Remove all old API stack package_ref docs. #49518

Merged
wip
sven1977 committed Dec 27, 2024
commit e83acfd16138480a5986dee3d2f4a809336e60e2
296 changes: 133 additions & 163 deletions doc/source/rllib/algorithm-config.rst

.. _rllib-algo-configuration-docs:

AlgorithmConfig API
===================

You can configure your RLlib :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
in a type-safe fashion by working with the :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` API.

In essence, you first create an instance of :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`
and then call some of its methods to set various configuration options.

RLlib uses the following `black <https://github.com/psf/black>`__-compliant notation
in all parts of the code. Note that you can chain together more than one method call, including
the constructor:

.. testcode::

    from ray.rllib.algorithms.algorithm_config import AlgorithmConfig

    config = (
        # Create an `AlgorithmConfig` instance.
        AlgorithmConfig()
        # Change the learning rate.
        .training(lr=0.0005)
        # Change the number of Learner actors.
        .learners(num_learners=2)
    )

.. hint::

    For value checking and type-safety reasons, you should never set attributes in your
    :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`
    directly, but always go through the proper methods:

    .. testcode::

        # WRONG!
        config.env = "CartPole-v1"  # <- don't set attributes directly

        # CORRECT!
        config.environment(env="CartPole-v1")  # call the proper method


Algorithm specific config classes
---------------------------------

Normally, you should pick the specific :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`
subclass that matches the :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
you would like to run your learning experiments with. For example, if you would like to
use ``IMPALA`` as your algorithm, you should import its specific config class:

.. testcode::

    from ray.rllib.algorithms.impala import IMPALAConfig

    config = (
        # Create an `IMPALAConfig` instance.
        IMPALAConfig()
        # Change the learning rate.
        .training(lr=0.0004)
    )

You can build the :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` directly from the
config object by calling the
:py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.build_algo` method:

.. testcode::

    impala = config.build_algo()
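
Once built, you can train the :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` by repeatedly
calling its ``train()`` method. The following is only a minimal sketch; the number of iterations
and what you do with the returned results dict are arbitrary choices for illustration:

.. code-block:: python

    # Run a few training iterations on the built algorithm.
    for i in range(3):
        results = impala.train()  # runs one training iteration, returns a results dict

    # Release the algorithm's resources (for example, its EnvRunner actors) when done.
    impala.stop()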

The config object stored inside any built :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` instance
is a copy of your original config. Hence, you can further alter your original config object and
build another instance of the algo without affecting the previously built one:

.. testcode::

    # Further alter the config without affecting the previously built IMPALA ...
    config.env_runners(num_env_runners=4)
    # ... and build another algo from it.
    another_impala = config.build_algo()


If you are working with `Ray Tune <https://docs.ray.io/en/latest/tune/index.html>`__,
pass your :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`
instance into the constructor of the :py:class:`~ray.tune.tuner.Tuner`:

.. code-block:: python

    from ray import tune

    tuner = tune.Tuner(
        "IMPALA",
        param_space=config,  # <- your RLlib AlgorithmConfig object
        ..
    )
    # Run the experiment with Ray Tune.
    results = tuner.fit()
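
A common reason to go through Tune is hyperparameter search. As a sketch, with arbitrary example
values, you can place Tune search spaces directly inside the config before passing it to the
:py:class:`~ray.tune.tuner.Tuner` as ``param_space``:

.. code-block:: python

    from ray import tune

    # Sweep over two learning rates; Tune creates one trial per grid value.
    config.training(lr=tune.grid_search([0.0001, 0.0005]))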

Generic config settings
-----------------------

Most config settings are generic and apply to all of RLlib's
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm` classes.
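
Because these settings live on the common
:py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` base class, the same calls work
on any algorithm specific subclass. A brief sketch, using ``PPO`` and ``DQN`` with arbitrary
example values:

.. code-block:: python

    from ray.rllib.algorithms.dqn import DQNConfig
    from ray.rllib.algorithms.ppo import PPOConfig

    # The same generic settings, applied to two different algorithms' configs.
    ppo_config = PPOConfig().environment(env="CartPole-v1").training(lr=0.0003, gamma=0.99)
    dqn_config = DQNConfig().environment(env="CartPole-v1").training(lr=0.0005, gamma=0.99)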

RL Environment
~~~~~~~~~~~~~~
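
A minimal sketch of configuring the RL environment through the
:py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.environment` method. The
``"CartPole-v1"`` ID and the ``env_config`` contents are only illustrative:

.. code-block:: python

    config.environment(
        # A registered (Gymnasium) env ID or your own registered env.
        env="CartPole-v1",
        # Optional dict of settings that RLlib passes through to the env.
        env_config={"some_illustrative_setting": 1},
    )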


Learning rate ``lr``
~~~~~~~~~~~~~~~~~~~~

Set the learning rate for updating your models through the ``lr`` arg to the
:py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.training` method:

.. testcode::

    config.training(lr=0.0001)

Train batch size
~~~~~~~~~~~~~~~~

Set the train batch size, per Learner actor, through the ``train_batch_size_per_learner``
arg to the :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.training` method:

.. testcode::

    config.training(train_batch_size_per_learner=256)

Discount factor ``gamma``
~~~~~~~~~~~~~~~~~~~~~~~~~

Set the `RL discount factor <https://www.envisioning.io/vocab/discount-factor>`__
through the ``gamma`` arg to the :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.training`
method:

.. testcode::

    config.training(gamma=0.995)

Other commonly used generic settings include:

- ``num_learners``
- ``num_env_runners``
- ``num_envs_per_env_runner``
- ``explore``
- ``rollout_fragment_length``
- ``env``
- ``env_config``
- ``policies``
- ``policy_mapping_fn``
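
As a rough sketch of where most of these settings live in the current API, with arbitrary example
values (the grouping by config method reflects the new API stack and may differ slightly between
RLlib versions):

.. code-block:: python

    from ray.rllib.algorithms.algorithm_config import AlgorithmConfig

    config = (
        AlgorithmConfig()
        .environment(env="CartPole-v1", env_config={})
        .env_runners(
            num_env_runners=2,
            num_envs_per_env_runner=1,
            rollout_fragment_length=50,
        )
        .learners(num_learners=1)
        # Only to show where `policies` and `policy_mapping_fn` go; CartPole itself is
        # a single-agent env.
        .multi_agent(
            policies={"p0"},
            policy_mapping_fn=lambda agent_id, episode, **kwargs: "p0",
        )
    )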

Algorithm specific settings
---------------------------

Each RLlib algorithm has its own config class that inherits from ``AlgorithmConfig``.
For instance, to create a ``PPO`` algorithm, you start with a ``PPOConfig`` object; to work
with a ``DQN`` algorithm, you start with a ``DQNConfig`` object, and so on.
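
For example, a ``DQNConfig`` accepts a ``double_q`` argument in its ``training()`` method to
switch double-Q learning on or off, a setting that wouldn't make sense for ``PPO``. A brief
sketch:

.. code-block:: python

    from ray.rllib.algorithms.dqn import DQNConfig

    # Enable double-Q learning, a DQN-specific training setting.
    config = DQNConfig().training(double_q=True)
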
.. note::

    Each algorithm has its specific settings, but most configuration options are shared.
    We discuss the common options below and refer to
    :ref:`the RLlib algorithms guide <rllib-algorithms-doc>` for algorithm-specific
    properties. Algorithms differ mostly in their ``training`` settings.

Below you find the basic signature of the ``AlgorithmConfig`` class, as well as some
advanced usage examples:

.. autoclass:: ray.rllib.algorithms.algorithm_config.AlgorithmConfig
    :noindex:

As RLlib algorithms are fairly complex, they come with many configuration options.
To make things easier, the common properties of algorithms are naturally grouped into
the following categories:

- :ref:`training options <rllib-config-train>`,
- :ref:`environment options <rllib-config-env>`,
- :ref:`deep learning framework options <rllib-config-framework>`,
- :ref:`env runner options <rllib-config-env-runners>`,
- :ref:`evaluation options <rllib-config-evaluation>`,
- :ref:`options for training with offline data <rllib-config-offline_data>`,
- :ref:`options for training multiple agents <rllib-config-multi_agent>`,
- :ref:`reporting options <rllib-config-reporting>`,
- :ref:`options for saving and restoring checkpoints <rllib-config-checkpointing>`,
- :ref:`debugging options <rllib-config-debugging>`,
- :ref:`options for adding callbacks to algorithms <rllib-config-callbacks>`,
- :ref:`resource options <rllib-config-resources>`,
- :ref:`and options for experimental features <rllib-config-experimental>`

Let's discuss each category one by one, starting with training options.