[RLlib] Activate APPO cont. actions release- and CI tests (HalfCheetah-v1 and Pendulum-v1 new in `tuned_examples`). #49068
Conversation
@@ -137,9 +137,9 @@ click the dropdowns below:
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| **High-throughput Architectures** |
+-------------------------------------------------------------------------+----------------+---------------+-------------+------------+-------------+------------------------+
| :ref:`IMPALA (Importance Weighted Actor-Learner Architecture) <impala>` | |single_agent| | |multi_agent| | |discr_act| | | |multi_gpu| | |multi_node_multi_gpu| |
| :ref:`APPO (Asynchronous Proximal Policy Optimization) <appo>` | |single_agent| | |multi_agent| | |discr_act| | |cont_act| | |multi_gpu| | |multi_node_multi_gpu| |
Flipped this to place APPO more prominently than IMPALA.
@@ -30,9 +30,9 @@ as well as multi-GPU training on multi-node (GPU) clusters when using the `Anysc
+-----------------------------------------------------------------------------+------------------------------+------------------------------------+--------------------------------+
| **High-throughput on- and off policy** |
+-----------------------------------------------------------------------------+------------------------------+------------------------------------+--------------------------------+
| :ref:`IMPALA (Importance Weighted Actor-Learner Architecture) <impala>` | |single_agent| |multi_agent| | |multi_gpu| |multi_node_multi_gpu| | |discr_actions| |
| :ref:`APPO (Asynchronous Proximal Policy Optimization) <appo>` | |single_agent| |multi_agent| | |multi_gpu| |multi_node_multi_gpu| | |cont_actions| |discr_actions| |
same
@@ -2552,6 +2552,37 @@
  cluster:
    cluster_compute: 2gpus_64cpus_gce.yaml

- name: rllib_learning_tests_halfcheetah_appo_torch
new HalfCheetah APPO release test
LGTM. Awesome that we now have this fixed and marked for continuous actions. Thanks for the hard work @sven1977 !!
RLlib doesn't always sync the weights back to the EnvRunners right after a new model version is available.
To account for the EnvRunners being off-policy, APPO uses a procedure called v-trace,
`described in the IMPALA paper <https://arxiv.org/abs/1802.01561>`__.
APPO scales out on both axes, supporting multiple EnvRunners for sample collection and multiple GPU- or CPU-based Learners
A monster :)
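For context, a minimal sketch of what "scaling out on both axes" can look like with `APPOConfig` on the new API stack; the environment and the worker/GPU counts below are illustrative assumptions, not values taken from this PR:

```python
from ray.rllib.algorithms.appo import APPOConfig

config = (
    APPOConfig()
    # Illustrative continuous-action env; not necessarily the tuned example's setup.
    .environment("Pendulum-v1")
    # Axis 1: scale out sample collection across several EnvRunner actors.
    .env_runners(num_env_runners=8, num_envs_per_env_runner=10)
    # Axis 2: scale out training across remote Learner actors (GPU- or CPU-based).
    .learners(num_learners=2, num_gpus_per_learner=1)
)
algo = config.build()
print(algo.train())  # async sampling; v-trace corrects for off-policy EnvRunners
```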
@@ -2552,6 +2552,37 @@
  cluster:
    cluster_compute: 2gpus_64cpus_gce.yaml

- name: rllib_learning_tests_halfcheetah_appo_torch
  group: RLlib tests
Do we have to use this in the future for elaborate algorithm tests?
Not sure what you mean?
This will now run nightly under the release pipeline, so it'll be covered and we won't miss it if we break stuff.
.env_runners(
    num_envs_per_env_runner=20,
)
.learners(num_learners=1)
Isn't a single remote learner inefficient?
For CPU-only, it's actually better :p
For 1 GPU, num_learners=0 is better. We still have to explore why exactly, for the CPU-only case, the num_learners=1 setting is better. I'm guessing it has to do with the Learner worker otherwise sharing the same thread/process as the Algorithm, which could slow things down. For the GPU setup, this is NOT a problem, as CUDA is async anyway (it can run the forward/backward passes in parallel to the CPU).
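To make the trade-off concrete, here is a sketch of the two Learner placements being discussed; the environment is an assumption for illustration and `num_gpus_per_learner` is added for the GPU case, while the env-runner settings mirror the diff snippet above:

```python
from ray.rllib.algorithms.appo import APPOConfig

# CPU-only machine: one remote Learner actor (the setting used in this PR's example).
cpu_only_config = (
    APPOConfig()
    .environment("Pendulum-v1")  # assumed env, for illustration only
    .env_runners(num_envs_per_env_runner=20)
    .learners(num_learners=1)
)

# Single-GPU machine: num_learners=0 keeps the Learner local to the Algorithm
# process; CUDA runs the forward/backward passes asynchronously to the CPU.
single_gpu_config = (
    APPOConfig()
    .environment("Pendulum-v1")
    .env_runners(num_envs_per_env_runner=20)
    .learners(num_learners=0, num_gpus_per_learner=1)
)
```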
Activate APPO cont. actions release- and CI tests (HalfCheetah-v1 and Pendulum-v1 new in `tuned_examples`).

Why are these changes needed?

Related issue number

Checks

- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I have added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.