[Rllib] Add timeout to filter synchronization #25959
Oh, I was going to use connectors to implement these filters ... seems like extra complexity if they need to be synced.
Looks great, just the one note about int -> Optional float, then we can merge.
Hey @gjoliver, yeah, filters may still have some features (syncing) that we don't support with connectors yet. But I'm not too sure it's really that important. Even if you use something like MeanStdFilter and the workers don't sync, you'd probably still be able to learn properly, provided the distribution of states covered by each worker's trajectories is roughly the same across workers.
* master: (35 commits)
  [tune/structure] Refactor `suggest` into `search` package (ray-project#26074)
  Add back ray.state in deprecation wrapper; print stack trace on warning (ray-project#26086)
  Enable isort for base directory (ray-project#26085)
  [AIR] Add __init__.py to ray.air.callbacks (ray-project#26088)
  [Serve] [Docs] Create end-to-end documentation example for Serve REST API and CLI (ray-project#25936)
  [AIR] Remove unnecessary pandas from examples (ray-project#26009)
  [Datasets] [Hotfix] Update `ds.to_pandas()` limit error to reflect current limit API (ray-project#26081)
  [Serve] [Docs] Add Serve REST API Schema to Serve API Docs (ray-project#25786)
  [Core][Doc] remove cython section from advanced doc. ray-project#26062
  [Core] Fix check failure from incorrect death cause (ray-project#26007)
  [hotfix] Fix linkcheck (ray-project#26070)
  [RLlib] Add timeout to filter synchronization. (ray-project#25959)
  [tune/structure] Introduce logger package (ray-project#26049)
  [RLlib] introduce serialization for our custom gym space types. (ray-project#25923)
  Fix unit test test_check_env.py and est_check_multi_agent.py. (ray-project#25993)
  [RLlib] Make QMix use the ReplayBufferAPI (ray-project#25560)
  [CI] deflake test_multi_node_3 by increasing its timeout
  [CI] Use BUILDKITE_JOB_ID for better navigation for flaky tracker (ray-project#26021)
  [AIR/Docs] Improve user guide gallery (ray-project#26016)
  🎨 Update type annotations to include options in `ray.remote()` (ray-project#25999)
  ...
Why are these changes needed?
The filters that Rollout Workers apply to metrics are updated regularly as part of the training_step().
This operation includes a ray.get() that is lacking a timeout.
I don't yet know why this call hangs on my local machine when running IMPALA, but I propose setting a timeout here so that training does not get stuck on it unnoticed.
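To illustrate the idea of bounding a blocking wait on worker results, here is a minimal sketch. It is not the code in this PR and does not use Ray: it swaps in the standard library's `concurrent.futures` as a stand-in for remote calls, and the names `fetch_filters` and `sync_filters_with_timeout` are hypothetical. With Ray itself, the analogous knob is the `timeout` argument of `ray.get()`.

```python
import concurrent.futures as cf
import time


def fetch_filters(delay_s):
    """Hypothetical stand-in for a worker returning its filter state."""
    time.sleep(delay_s)
    return {"default_policy": "filter-state"}


def sync_filters_with_timeout(delays, timeout_s):
    """Collect filter states, waiting at most timeout_s seconds in total.

    Returns the results that arrived in time and the number of workers
    that were skipped because they did not answer before the deadline.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(delays))
    futures = [pool.submit(fetch_filters, d) for d in delays]
    # Bounded wait: returns once all futures finish or the timeout elapses.
    done, not_done = cf.wait(futures, timeout=timeout_s)
    # Do not block on stragglers; they finish in the background.
    pool.shutdown(wait=False)
    results = [f.result() for f in futures if f in done]
    return results, len(not_done)


if __name__ == "__main__":
    results, skipped = sync_filters_with_timeout([0.0, 0.5], timeout_s=0.1)
    print(f"synced {len(results)} worker(s), skipped {skipped}")
```

The design choice here is to skip slow workers and report how many were skipped, so the caller can log a warning instead of blocking indefinitely; whether skipping or raising is the right reaction is a judgment call for the training loop.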
Checks
scripts/format.sh to lint the changes in this PR.