[Example] A2C simplified example #1076

Merged
merged 6 commits into from
Apr 25, 2023

Contributor

@albertbou92 albertbou92 commented Apr 20, 2023

Description

Add a simplified version of the A2C code example, including some plot results.

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves the issue #15213

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@facebook-github-bot facebook-github-bot added the CLA Signed label Apr 20, 2023
@vmoens vmoens added the new algo label Apr 24, 2023
Contributor

@vmoens vmoens left a comment

LGTM thanks!

Comment on lines 126 to 142
with torch.no_grad():
    test_env.eval()
    actor.eval()
    # Generate a complete episode
    td_test = test_env.rollout(
        policy=actor,
        max_steps=10_000_000,
        auto_reset=True,
        auto_cast_to_device=True,
        break_when_any_done=True,
    ).clone()
    logger.log_scalar(
        "reward_testing",
        td_test["next"]["reward"].sum().item(),
        collected_frames,
    )
    actor.train()
Contributor

Nit: If we use the Recorder class we can do all of this I guess, but it's a bit of a black box so I'm ok with the explicit calls.
Maybe let's use td_test["next", "reward"] when we can (same for all the key indexing in the script) :)

Contributor Author

I saw that in Recorder you need to specify a number of steps, and I wanted to record a single test episode, regardless of how many steps it takes. Maybe Recorder could accept a number of episodes to record instead of a number of steps?
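
To make the suggestion concrete, here is a rough, library-free sketch of what episode-based recording could look like. Everything in it is hypothetical: `ToyEnv`, `record_episodes`, and the `num_episodes` / `max_steps_per_episode` parameters are illustrative names only, not part of TorchRL or of the existing Recorder API.

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment (not a TorchRL class).

    Each step yields a reward of 1.0; an episode ends after `max_len`
    steps or earlier with probability 0.1 per step.
    """

    def __init__(self, max_len=20, seed=0):
        self.max_len = max_len
        self._rng = random.Random(seed)
        self._t = 0

    def reset(self):
        self._t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self._t += 1
        done = self._t >= self.max_len or self._rng.random() < 0.1
        return float(self._t), 1.0, done  # (observation, reward, done)


def record_episodes(env, policy, num_episodes, max_steps_per_episode=10_000):
    """Run `num_episodes` complete episodes and return each episode's summed reward.

    The stopping criterion is the episode count; `max_steps_per_episode`
    is only a safety cap in case an episode never terminates.
    """
    returns = []
    for _ in range(num_episodes):
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps_per_episode):
            obs, reward, done = env.step(policy(obs))
            total += reward
            if done:
                break
        returns.append(total)
    return returns


if __name__ == "__main__":
    # Record a single test episode, however many steps it takes.
    print(record_episodes(ToyEnv(), policy=lambda obs: 0, num_episodes=1))
```

The point of the sketch is only that the caller fixes how many episodes to record, and the number of steps falls out of when each episode ends.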

@vmoens vmoens merged commit 6c89a65 into pytorch:main Apr 25, 2023
@albertbou92 albertbou92 deleted the a2c_simplified_example branch January 18, 2024 10:08