[RLlib] External Env support for new API stack: New example script and custom tcp-capable EnvRunner. #49033
Conversation
LGTM. A couple of questions regarding the basic design. Great PR!!
- `Env Rendering and Recording <https://github.com/ray-project/ray/blob/master/rllib/examples/envs/env_rendering_and_recording.py>`__:
  Illustrates environment rendering and recording setups within RLlib, capturing visual outputs for later review (ex. on WandB), which is essential
  for tracking agent behavior in training.

- `Env with Protobuf Observations <https://github.com/ray-project/ray/blob/master/rllib/examples/envs/env_w_protobuf_observations.py>`__:
Awesome that we have now this example also in new API stack! Great work!
# TODO (sven): We should ideally do this in the LearnerConnector (separation of
# concerns: Only do things on the EnvRunners that are required for computing
# actions, do NOT do anything on the EnvRunners that's only required for a
# training update).
I thought we already moved that from the EnvRunners into a connector?! Yes, ideally it would reside in there.
It is in a connector, but in the env-to-module connector :D on the EnvRunner side, where it should ideally be run on the learner side.
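The separation-of-concerns argument above can be made concrete with a small sketch. This is not the actual RLlib `ConnectorV2` API; all names here are illustrative. The idea: transforms needed to *compute actions* run in the env-to-module pipeline on the EnvRunner, while transforms only needed for the *training update* run in a pipeline on the Learner side.

```python
# Hypothetical sketch of the two connector pipelines (illustrative names only).

def flatten_obs(batch):
    # Needed to compute actions -> belongs on the EnvRunner side.
    batch["obs"] = [o for pair in batch["obs"] for o in pair]
    return batch

def add_advantages(batch):
    # Only needed for the loss -> belongs on the Learner side, not the EnvRunner.
    batch["advantages"] = [r - 0.5 for r in batch["rewards"]]
    return batch

# Each pipeline is just an ordered list of transform pieces.
env_to_module_pipeline = [flatten_obs]   # runs before action computation
learner_pipeline = [add_advantages]      # runs before the loss computation

def run(pipeline, batch):
    for piece in pipeline:
        batch = piece(batch)
    return batch

batch = {"obs": [(1, 2), (3, 4)], "rewards": [1.0, 0.0]}
batch = run(env_to_module_pipeline, batch)  # on the EnvRunner
batch = run(learner_pipeline, batch)        # later, on the Learner
```

Moving `add_advantages` out of the env-to-module pipeline keeps the EnvRunners lean: they only pay for work that action computation actually requires.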
@@ -177,7 +177,11 @@ def forward(self, batch: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    training sample collection (w/ exploration behavior).
    `_forward_train()` to define the forward pass prior to loss computation.
    """
    return self.forward_train(batch, **kwargs)
    # TODO (sven): Experimental to make ONNX exported models work.
    if self.config.inference_only:
We should think about how inference-only should be integrated such that ONNX works well. Maybe it needs a different integration in the modules.
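To illustrate the dispatch the diff above experiments with, here is a minimal sketch (hypothetical class, not RLlib's actual `RLModule`): ONNX export typically traces a module's `forward()`, so an inference-only module must route `forward()` to the inference pass rather than the training pass.

```python
# Hypothetical sketch of inference-only dispatch in forward() (illustrative,
# not the real RLModule API).
class Module:
    def __init__(self, inference_only=False):
        self.inference_only = inference_only

    def forward(self, batch):
        # ONNX tracing captures forward(); route it based on the build mode so
        # the exported graph contains only the inference pass.
        if self.inference_only:
            return self._forward_inference(batch)
        return self._forward_train(batch)

    def _forward_inference(self, batch):
        # Action computation only; no training-specific outputs.
        return {"actions": [0 for _ in batch["obs"]]}

    def _forward_train(self, batch):
        # Training pass additionally produces loss-related outputs.
        return {"actions": [0 for _ in batch["obs"]], "action_logp": [0.0]}
```

The open design question is whether this branch should live in `forward()` at all, or whether inference-only modules need a different integration point altogether.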
# will not have an RLModule, but might still be usable with random actions.
except NotImplementedError:
    self.module = None
self.make_module()
Nice, now the EnvRunners handle this similarly.
This implementation assumes:
- Only one external client ever connects to this env runner.
- The external client performs inference locally through an ONNX model. Thus,
An alternative would be to define an EnvRunner that can execute an external environment inside a thread - so users just define the EnvExecutor that wraps it and defines protocols to start the environment and communicate with it (sending in actions and receiving observations and rewards), and the EnvRunner can then execute it. It would keep the RLModule like the other EnvRunner classes we have. Do we have any cases where this would not work? In my C++ external environment this works quite well. What would be the pros and cons?
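The proposed alternative could be sketched roughly as follows. Everything here is hypothetical (`EnvExecutor` is not an existing RLlib class): the executor wraps the external environment and exposes start/step hooks, while an in-process rollout loop on the EnvRunner drives it and keeps the policy (RLModule) local.

```python
# Hypothetical EnvExecutor protocol and an EnvRunner-side rollout loop
# (illustrative names, not an existing RLlib API).
from typing import Any, Protocol, Tuple

class EnvExecutor(Protocol):
    def start(self) -> Any: ...  # Launch the external env; return first obs.
    def step(self, action: Any) -> Tuple[Any, float, bool]: ...  # obs, reward, done

class CountdownExecutor:
    """Toy stand-in for an external (e.g. C++) environment."""

    def start(self):
        self.t = 3
        return self.t

    def step(self, action):
        self.t -= 1
        return self.t, 1.0, self.t == 0

def rollout(executor, policy):
    # This loop would run on the EnvRunner (e.g. inside a thread); the policy
    # (RLModule) stays on the EnvRunner, unlike in the client-side-ONNX design.
    obs, total, done = executor.start(), 0.0, False
    while not done:
        obs, reward, done = executor.step(policy(obs))
        total += reward
    return total
```

A trade-off worth noting: this keeps inference on the RLlib side (simpler weight handling, no ONNX export needed), but adds a per-step communication round-trip to the external environment.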
self.client_socket,
{
    "type": MessageTypes.SET_CONFIG.name,
    "env_steps_per_sample": self.config.get_rollout_fragment_length(
Ah okay, so per sample request each external environment only collects enough steps for the next batch.
Correct. This is purely for PPO for now. For IMPALA/APPO or offline algos, this doesn't really matter. The most difficult thing is making this work for PPO, b/c in this purely on-policy case, client and server have to meet exactly in the middle, at the handover of the samples and the receiving of the new weights.
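The strict on-policy rendezvous described here can be sketched with two queues (illustrative only, not the PR's actual TCP protocol): the client blocks after handing over its samples until the server returns the weights produced from exactly those samples, so no stale-policy data is ever collected.

```python
# Sketch of the on-policy sample/weights handover (illustrative).
import queue
import threading

samples_q, weights_q = queue.Queue(), queue.Queue()

def client(num_iters):
    version = 0
    for _ in range(num_iters):
        # Hand over exactly one on-policy batch ...
        samples_q.put({"weights_version": version, "steps": 128})
        # ... then block until the freshly updated weights arrive.
        version = weights_q.get()

def server(num_iters):
    for version in range(1, num_iters + 1):
        batch = samples_q.get()  # Wait for the on-policy batch.
        # ... run the PPO update on `batch` here ...
        weights_q.put(version)   # Hand the new weights back to the client.

t = threading.Thread(target=client, args=(3,))
t.start()
server(3)
t.join()
```

For IMPALA/APPO or offline algorithms this lock-step rendezvous is unnecessary, which is why PPO is the hardest case to support.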
"env_steps_per_sample": self.config.get_rollout_fragment_length( | ||
worker_index=self.worker_index | ||
), | ||
"force_on_policy": True, |
Alright, it could be off-policy if a user wants it. Cool!
Correct, then the protocol will change slightly, but not much. Clients can send data any time they want and - from time to time - get state/weights back on some of these requests.
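The looser off-policy variant described here might look roughly like this (illustrative sketch, not the implemented protocol): clients push samples whenever they like, and the server only piggybacks updated weights on some of its acknowledgements rather than on every one.

```python
# Sketch of the off-policy variant: weights ride along on occasional acks
# (illustrative; message fields are hypothetical).
def server_ack(request, step, weights_every=3, weights_version=7):
    ack = {"received": len(request["steps"])}
    # Only every `weights_every`-th request gets fresh state/weights back.
    if step % weights_every == 0:
        ack["weights_version"] = weights_version
    return ack

acks = [server_ack({"steps": [1, 2]}, step) for step in range(1, 7)]
with_weights = [a for a in acks if "weights_version" in a]
```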
client_thread.start()

# Query the EnvRunner.
episodes = env_runner.sample()
Nice! But question: In case I want to run with Tune, could I simply pass TcpClientInferenceEnvRunner into the env_runner_class attribute?
Removed this code. It was an early testing leftover.
collects episodes of experiences, and sends these (in bulk) back to RLlib for training.
Also, from time to time, the updated model weights have to be sent from RLlib (server)
back to the connected clients.
Note that RLlib's new API stack does not yet support individual action requests, where
Why is this? Only because it is built into the TcpEnvRunner, correct?
Correct, this protocol (remote action inference on the RLlib server) is not supported yet. We need just another EnvRunner for that or build this logic into the existing one.
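For contrast, the unsupported mode would look roughly like this (illustrative sketch, not an existing RLlib protocol; message fields are hypothetical): the client sends one observation at a time and the server answers with a single action, meaning the RLModule runs on the RLlib server rather than on the client.

```python
# Sketch of per-step remote action inference on the server (illustrative).
import json

def serve_action_request(raw_msg, policy):
    # The server-side handler: decode one observation, answer with one action.
    msg = json.loads(raw_msg)
    assert msg["type"] == "GET_ACTION"
    return json.dumps({"type": "ACTION", "action": policy(msg["obs"])})

# A client would send one such message per environment step.
reply = serve_action_request(
    json.dumps({"type": "GET_ACTION", "obs": [0.1, 0.2]}),
    policy=lambda obs: int(len(obs) > 1),
)
```

Supporting this would indeed require either another EnvRunner class or folding this per-step request/response logic into the existing TCP-capable one.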
)
.env_runners(
    # Point RLlib to the custom EnvRunner to be used here.
    env_runner_cls=TcpClientInferenceEnvRunner,
)
Yup, question answered :)
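For reference, wiring the custom EnvRunner into a Tune run might look like this. This is a sketch: the `PPOConfig().env_runners(env_runner_cls=...)` call is taken from the diff above, `TcpClientInferenceEnvRunner` comes from this PR's example script (its import path may differ in your Ray version), and the `...` placeholders stand for the externally defined spaces.

```python
# Sketch: passing the custom EnvRunner class to a config that Tune can run
# (assumes TcpClientInferenceEnvRunner is importable from the PR's example
# script; spaces must be supplied since there is no local env to infer them).
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment(
        observation_space=...,  # spaces of the external (client-side) env
        action_space=...,
    )
    .env_runners(
        # Point RLlib to the custom EnvRunner to be used here.
        env_runner_cls=TcpClientInferenceEnvRunner,
    )
)
tune.Tuner("PPO", param_space=config).fit()
```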
Signed-off-by: Sven Mika <sven@anyscale.io>
…d custom tcp-capable EnvRunner. (ray-project#49033)
…d custom tcp-capable EnvRunner. (ray-project#49033) Signed-off-by: Connor Sanders <connor@elastiflow.com>
…d custom tcp-capable EnvRunner. (ray-project#49033) Signed-off-by: hjiang <dentinyhao@gmail.com>
…d custom tcp-capable EnvRunner. (ray-project#49033) Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
External Env support for new API stack:

Why are these changes needed?

Related issue number

Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.