[Example] Distributed Replay Buffer Prototype Example Implementation #615
Conversation
@vmoens I've made ReplayBufferNode subclass TensorDictReplayBuffer now, so it fits in with the rest of the object hierarchy and can be used exactly like any other replay buffer.
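For context, a minimal sketch of that design, assuming the current public torchrl API; the class body and the capacity default are illustrative, not the exact code from this PR:

```python
# Hedged sketch only: `ReplayBufferNode` and its capacity default are
# illustrative; the PR's actual class may differ.
from torchrl.data import LazyMemmapStorage, TensorDictReplayBuffer

class ReplayBufferNode(TensorDictReplayBuffer):
    """A replay buffer intended to live on a dedicated RPC worker.

    Because it subclasses TensorDictReplayBuffer, remote callers can use
    the ordinary `extend` / `sample` methods on it, so it behaves like
    any other replay buffer.
    """

    def __init__(self, capacity: int = 10_000):
        # LazyMemmapStorage keeps stored tensordicts on disk as
        # memory-mapped tensors, which is what lets samples cross
        # process boundaries cheaply.
        super().__init__(storage=LazyMemmapStorage(capacity))
```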
This is soooo cool! For instance, this receives memmap tensors from one process and those tensors are actually of memmap type. But …
Note for future contributions: it's best if you don't develop on your main branch but branch out on your forked repo instead :)
Sure thing - would you prefer I create a new branch for this PR as well?
Codecov Report
@@            Coverage Diff             @@
##             main     #615      +/-   ##
==========================================
- Coverage   87.82%   87.73%   -0.09%
==========================================
  Files         125      126       +1
  Lines       24280    24371      +91
==========================================
+ Hits        21324    21382      +58
- Misses       2956     2989      +33
Great work!
Should we comment the example, or write a separate markdown file in the example directory to explain what this is about?
Personally I'd be in favour of commenting the code using this syntax, which will allow us to port it to the docs later on.
Do you think we can test the distributed replay buffer in the CI? It would be nice to cover that there.
Ah yes, I'll add some comments in the example. I'll investigate how best to test the distributed buffer now.
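One conceivable way to exercise this in CI, sketched with torch.multiprocessing; the worker body and world size are assumptions, and a real test would also need free-port handling and timeouts for flaky CI machines:

```python
import os
import torch.distributed.rpc as rpc
import torch.multiprocessing as mp

def _worker(rank: int, world_size: int) -> None:
    # Assumed single-machine rendezvous settings.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    name = {0: "trainer", 1: "replay_buffer"}.get(rank, f"collector{rank}")
    rpc.init_rpc(name, rank=rank, world_size=world_size)
    # The trainer rank would drive a short extend/sample round-trip
    # against the buffer worker here and assert on the sampled batch.
    rpc.shutdown()

def test_distributed_replay_buffer():
    # Spawn trainer + buffer + one collector; mp.spawn passes the rank
    # as the first argument to `_worker`.
    mp.spawn(_worker, args=(3,), nprocs=3, join=True)
```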
Wonderful! Thanks a million for this!
Description
Prototype example of a distributed replay buffer implementation using LazyMemmapStorage and TensorDictReplayBuffer, with nodes communicating via torch.distributed.rpc. The implementation allows for 1 trainer node, 1 replay buffer node, and N >= 1 data collector nodes.
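A hedged sketch of how that topology might be wired with torch.distributed.rpc follows; the worker names, ranks, port, and the `ReplayBufferNode` class (sketched earlier in this thread) are assumptions for illustration, not the PR's exact code:

```python
import os
import torch.distributed.rpc as rpc

# Assumed rendezvous settings for a single-machine run.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")

NUM_COLLECTORS = 2                 # N >= 1 data collector nodes
WORLD_SIZE = 2 + NUM_COLLECTORS    # plus the trainer and buffer nodes

def run_trainer(rank: int) -> None:
    rpc.init_rpc("trainer", rank=rank, world_size=WORLD_SIZE)
    # Create the buffer on its dedicated worker; the resulting RRef can
    # be handed to collectors so they all extend the same remote buffer.
    # `ReplayBufferNode` is the (hypothetical) subclass sketched above.
    buffer = rpc.remote("replay_buffer", ReplayBufferNode)
    batch = buffer.rpc_sync().sample(32)  # synchronous remote sample
    rpc.shutdown()
```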
Motivation and Context
This investigative example illustrates some patterns with which we can implement distributed RL algorithms with replay buffers using the torchrl framework. It also helps us understand which abstractions and ideas may be missing from the framework, or may need adapting, to make writing distributed RL algorithms natural and performant.
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!