[Example] A2C simplified example #1076

albertbou92 · 2023-04-20T19:12:08Z

Description

Add a simplified version of the A2C code example, including some plot results.
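For context, A2C (Advantage Actor-Critic) trains a policy ("actor") and a value function ("critic") from the same rollout. Below is a minimal, framework-free sketch of the textbook quantities one A2C update computes: bootstrapped discounted returns, advantages, and the actor and critic losses. It is a generic illustration, not the code in examples/a2c/a2c.py. The function names are hypothetical, it uses plain Python floats instead of PyTorch tensors (so no gradients flow), and it omits any entropy bonus or GAE the actual example may use.

```python
from typing import List, Tuple


def discounted_returns(
    rewards: List[float], dones: List[bool], bootstrap_value: float, gamma: float = 0.99
) -> List[float]:
    """Compute bootstrapped discounted returns, walking the rollout backwards."""
    returns = [0.0] * len(rewards)
    running = bootstrap_value  # critic's value estimate for the state after the last step
    for t in reversed(range(len(rewards))):
        # A terminal step cuts the return: nothing after it is added.
        running = rewards[t] + gamma * running * (0.0 if dones[t] else 1.0)
        returns[t] = running
    return returns


def a2c_losses(
    log_probs: List[float],
    values: List[float],
    rewards: List[float],
    dones: List[bool],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> Tuple[float, float]:
    """Return (actor_loss, critic_loss) for one rollout (hypothetical helper)."""
    returns = discounted_returns(rewards, dones, bootstrap_value, gamma)
    # Advantage: how much better the observed return was than the critic predicted.
    advantages = [g - v for g, v in zip(returns, values)]
    n = len(rewards)
    # Actor: policy-gradient loss; in a real implementation the advantages are detached.
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # Critic: mean squared error between predicted values and returns.
    critic_loss = sum(a * a for a in advantages) / n
    return actor_loss, critic_loss


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], [False, False, True], 5.0, gamma=0.5))
```

With `gamma=0.5`, a terminal flag on the last step makes the bootstrap value irrelevant, so the returns are simply 1, 1 + 0.5·1 and 1 + 0.5·1.5.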

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves the issue #15213

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide (required)
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.

vmoens

LGTM thanks!

vmoens · 2023-04-24T16:24:46Z

examples/a2c/a2c.py

+            with torch.no_grad():
+                test_env.eval()
+                actor.eval()
+                # Generate a complete episode
+                td_test = test_env.rollout(
+                    policy=actor,
+                    max_steps=10_000_000,
+                    auto_reset=True,
+                    auto_cast_to_device=True,
+                    break_when_any_done=True,
+                ).clone()
+                logger.log_scalar(
+                    "reward_testing",
+                    td_test["next"]["reward"].sum().item(),
+                    collected_frames,
+                )
+                actor.train()


Nit: If we use the Recorder class we can do all of this I guess, but it's a bit of a black box so I'm ok with the explicit calls.
Maybe let's use td_test["next", "reward"] when we can (same for all the key indexing in the script) :)

I saw that Recorder requires a number of steps, whereas I wanted to record a single test episode regardless of how many steps it takes. Maybe Recorder could accept a number of episodes to record instead of a number of steps?
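The evaluation block in the diff follows a common pattern: put the policy in eval mode, play one episode until it is done (with a very large step cap as a safety net), sum the rewards, and switch back to training mode. Below is a framework-free sketch of that pattern under stated assumptions: `ToyEnv`, `ToyPolicy` and `evaluate_one_episode` are hypothetical stand-ins, not TorchRL's `rollout` or `Recorder`, and `torch.no_grad()` is omitted because no tensors are involved.

```python
from typing import Tuple


class ToyEnv:
    """Hypothetical environment: fixed-length episodes, reward 1.0 for action 1."""

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        return self.t, reward, self.t >= self.horizon


class ToyPolicy:
    """Hypothetical policy mimicking the train()/eval() flag of torch.nn.Module."""

    def __init__(self) -> None:
        self.training = True

    def train(self, mode: bool = True) -> "ToyPolicy":
        self.training = mode
        return self

    def eval(self) -> "ToyPolicy":
        return self.train(False)

    def __call__(self, obs: int) -> int:
        # Action 1 in eval mode, action 0 in training mode (stand-in for exploration).
        return 0 if self.training else 1


def evaluate_one_episode(env: ToyEnv, policy: ToyPolicy, max_steps: int = 10_000_000) -> float:
    """Play a single episode in eval mode and return its total reward."""
    was_training = policy.training
    policy.eval()
    try:
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):
            obs, reward, done = env.step(policy(obs))
            total += reward
            if done:  # analogous to break_when_any_done=True
                break
        return total
    finally:
        # Restore the previous mode even if the episode raised an exception.
        policy.train(was_training)
```

The `try`/`finally` is a choice of this sketch rather than something the diff does: it guarantees the policy goes back to training mode even if evaluation fails midway.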

albertbou92 added 3 commits April 20, 2023 12:21

adapt script and tests

650477e

adapt script and tests

3d436c7

adapt script and tests

5016434

facebook-github-bot added the CLA Signed label Apr 20, 2023

albertbou92 added 2 commits April 20, 2023 21:16

format

5d7282b

fix

070c374

vmoens added the new algo label Apr 24, 2023

vmoens approved these changes Apr 24, 2023

vmoens reviewed Apr 24, 2023

td indexing fix

8d31816
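The reviewer's suggestion replaces chained lookups such as `td_test["next"]["reward"]` with a single tuple key, `td_test["next", "reward"]`; TensorDict treats a tuple of strings as a path into nested entries. The toy class below is a hypothetical illustration of that lookup rule in plain Python, not TensorDict's implementation, showing that both spellings reach the same entry.

```python
from typing import Any, Dict, Tuple, Union

Key = Union[str, Tuple[str, ...]]


class NestedDict:
    """Hypothetical toy container: a tuple key walks down nested levels."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data

    def __getitem__(self, key: Key) -> Any:
        if isinstance(key, str):
            key = (key,)  # a plain string is a path of length one
        node: Any = self._data
        for part in key:
            node = node[part]
        # Wrap sub-dicts so chained indexing keeps working too.
        return NestedDict(node) if isinstance(node, dict) else node


td = NestedDict({"next": {"reward": [1.0, 2.0]}})
print(td["next", "reward"])  # single nested key
print(td["next"]["reward"])  # equivalent chained lookup
```

A practical upside of the tuple form is that the whole path is one value, so it can be stored in a variable or a list of keys and reused.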

vmoens approved these changes Apr 25, 2023

vmoens mentioned this pull request Apr 25, 2023

[Feature] Allow recorders to collect a number of episodes, not only steps #1088

Open
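Issue #1088 asks for recorders that can stop after a number of episodes rather than only a number of steps. The sketch below shows one hypothetical way to express both stopping rules in a single collection loop; the `record` function, its `num_steps`/`num_episodes` parameters and `ToyEnv` are illustrative assumptions, not TorchRL's `Recorder` API or the design adopted in that issue.

```python
from typing import Callable, List, Optional, Tuple


class ToyEnv:
    """Hypothetical environment: fixed-length episodes, reward 1.0 for action 1."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t += 1
        return self.t, (1.0 if action == 1 else 0.0), self.t >= self.horizon


def record(
    env: ToyEnv,
    policy: Callable[[int], int],
    num_steps: Optional[int] = None,
    num_episodes: Optional[int] = None,
) -> List[float]:
    """Return the returns of completed episodes under a step or an episode budget."""
    if (num_steps is None) == (num_episodes is None):
        raise ValueError("pass exactly one of num_steps or num_episodes")
    returns: List[float] = []
    obs, total, steps = env.reset(), 0.0, 0
    while True:
        if num_steps is not None and steps >= num_steps:
            break  # step budget exhausted; a partially played episode is dropped
        obs, reward, done = env.step(policy(obs))
        steps += 1
        total += reward
        if done:
            returns.append(total)
            if num_episodes is not None and len(returns) >= num_episodes:
                break  # episode budget reached, however many steps that took
            obs, total = env.reset(), 0.0
    return returns
```

With 3-step episodes, a budget of 5 steps yields only one complete episode, while a budget of 2 episodes always yields two, which is the behavior the single-test-episode use case needs.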

vmoens merged commit 6c89a65 into pytorch:main Apr 25, 2023

albertbou92 deleted the a2c_simplified_example branch January 18, 2024 10:08

vmoens left a comment

vmoens Apr 24, 2023

albertbou92 Apr 25, 2023