-
-
Notifications
You must be signed in to change notification settings - Fork 610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
updated DataLoader, added a optional keyword argument rng. #1501
Conversation
it seems reasonable to do this |
in order to get a reproducible result, without using the global rng.
Do we gain much by storing the RNG? Could that be avoided? |
I feel like you can't avoid storing it, because it is used in |
I think I'm wondering whether we are better off supporting custom iterators Instead and have these kinds of details handled by the that function instead |
i don't know what you mean, DataLoader is a custom iterator already |
Let me do a rough draft of what I have in mind tonight |
This seems like a small, simple change that would be nice to get in before the next release. It's breaking right? Would be a shame to wait until 0.13 to use user-specified RNGs. |
doesn't look breaking, just another keyword arg with the default argument which was implicitly used already |
After thinking about this a bit more, why not direct users to something like |
I think we should hold-off on pushing MLDataPattern.jl too much until it has been refactored (i.e. PS: wow that sentence was a mouthful |
@@ -74,4 +76,7 @@ | |||
d = DataLoader((X, Y)) | |||
Flux.train!(loss, [θ], ncycle(d, 10), Descent(0.1)) | |||
@test norm(θ .- 1) < 1e-10 | |||
|
|||
# specify the rng | |||
d = map(identity, DataLoader(X, batchsize=2; shuffle=true, rng=Random.seed!(Random.default_rng(), 5))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would need better tests to ensure that the expected behaviour is happening. That the API works is pretty good, but we need to ensure that the functionality works as expected.
I would want the whole DataLoader to be extensible, so we can use it for things like training GANs/ NODEs, tuples of arguments (unlike just working on arrays), sending data to different workers/ devices asynchronously etc. It would be cool to have a package to take care of the heavy lifting here, but failing that, I would want these features before moving ahead with MLData* packages. This should be its own issue by the looks of it. |
bors r+ |
@CarloLucibello there's pending comments about lacking tests and this is an api change. Further, regardless of merging this or #1508, the comments need to be addressed. We were not in a rush for merging this, anyway. |
Build succeeded: |
in order to get a reproducible result, without using the global rng.