This repository has been archived by the owner on Oct 7, 2024. It is now read-only.
Tags: google-deepmind/bsuite
Calculate best episode using full episode return in cartpole_swingup.
Return is non-monotonic in this problem; before this change the code cherry-picked the peak return reached during the episode. The same change is applied to base cartpole for consistency and efficiency, although cartpole return is monotonic (so that was not a bug).
PiperOrigin-RevId: 308033113
Change-Id: I9add00d41f8e87d518e00c3fef9cd9ad7ad18d0b
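The distinction this commit message draws can be sketched in a few lines of Python. This is an illustrative sketch only, not the bsuite implementation: the function names `best_episode_by_peak` and `best_episode_by_full_return` and the reward values are hypothetical. It assumes an episode is just a list of per-step rewards, and compares scoring each episode by the peak of its running return with scoring it by its full return.

```python
from itertools import accumulate


def best_episode_by_peak(episodes):
    """Hypothetical 'before' behaviour: score by the peak of the running return."""
    return max(max(accumulate(rewards)) for rewards in episodes)


def best_episode_by_full_return(episodes):
    """Hypothetical 'after' behaviour: score by the return of the whole episode."""
    return max(sum(rewards) for rewards in episodes)


# Made-up per-step rewards, chosen only to show the effect.
episodes = [
    [1.0, 1.0, -3.0],  # running return 1, 2, -1: peak 2.0, full return -1.0
    [0.5, 0.5, 0.5],   # running return 0.5, 1.0, 1.5: peak 1.5, full return 1.5
]

peak_score = best_episode_by_peak(episodes)
full_score = best_episode_by_full_return(episodes)
print(peak_score, full_score)
```

With these made-up rewards the peak-based score credits the first episode for a value it did not keep, while the full-return score picks the second episode. When every reward is non-negative the running return never decreases, so the two scores coincide, which is consistent with the note that the base cartpole case was not a bug.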
Re-organize baselines into subdirectories according to their provenance/libraries used.
- tf: TensorFlow 2/Sonnet 2/TRFL-based agents.
- jax: JAX/Haiku/rlax-based agents.
- third_party: Agents created by third parties (not DeepMind).
Also adopt more standard naming practice within each agent folder (agent.py).
PiperOrigin-RevId: 305674544
Change-Id: I3d4f076fb96d2e0250cfbb3f1adf163ce6932e97