Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updated DataLoader, added a optional keyword argument rng. #1501

Merged
merged 1 commit into from
Feb 25, 2021
Merged

updated DataLoader, added a optional keyword argument rng. #1501

merged 1 commit into from
Feb 25, 2021

Conversation

norci
Copy link
Contributor

@norci norci commented Feb 5, 2021

in order to get a reproducible result, without using the global rng.

@CarloLucibello
Copy link
Member

it seems reasonable to do this

in order to get a reproducible result, without using the global rng.
@DhairyaLGandhi
Copy link
Member

Do we gain much by storing the RNG? Could that be avoided?

@darsnack
Copy link
Member

darsnack commented Feb 5, 2021

I feel like you can't avoid storing it, because it is used in Base.iterate. Passing it in as state would be too clunky.

@DhairyaLGandhi
Copy link
Member

I think I'm wondering whether we are better off supporting custom iterators Instead and have these kinds of details handled by the that function instead

@CarloLucibello
Copy link
Member

i don't know what you mean, DataLoader is a custom iterator already

@DhairyaLGandhi
Copy link
Member

Let me do a rough draft of what I have in mind tonight

@darsnack
Copy link
Member

darsnack commented Feb 7, 2021

This seems like a small, simple change that would be nice to get in before the next release. It's breaking right? Would be a shame to wait until 0.13 to use user-specified RNGs.

@CarloLucibello
Copy link
Member

It's breaking right?

doesn't look breaking, just another keyword arg with the default argument which was implicitly used already

@ToucheSir
Copy link
Member

ToucheSir commented Feb 8, 2021

After thinking about this a bit more, why not direct users to something like MLDataPattern.shuffleobs that already takes an RNG argument? DataLoader has always been awkward when it comes to extensibility (i.e. it isn't at all), but I'm not sure the solution is to add more arguments (I'll refrain from rambling about DataLoader vs DataLoaders.DataLoader here 😛).

@darsnack
Copy link
Member

darsnack commented Feb 8, 2021

I think we should hold-off on pushing MLDataPattern.jl too much until it has been refactored (i.e. MLDataPattern has a DataLoader that can replaces DataLoaders.DataLoader). This is fairly simple and easy feature add.

PS: wow that sentence was a mouthful

@@ -74,4 +76,7 @@
d = DataLoader((X, Y))
Flux.train!(loss, [θ], ncycle(d, 10), Descent(0.1))
@test norm(θ .- 1) < 1e-10

# specify the rng
d = map(identity, DataLoader(X, batchsize=2; shuffle=true, rng=Random.seed!(Random.default_rng(), 5)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need better tests to ensure that the expected behaviour is happening. That the API works is pretty good, but we need to ensure that the functionality works as expected.

@DhairyaLGandhi
Copy link
Member

I would want the whole DataLoader to be extensible, so we can use it for things like training GANs/ NODEs, tuples of arguments (unlike just working on arrays), sending data to different workers/ devices asynchronously etc. It would be cool to have a package to take care of the heavy lifting here, but failing that, I would want these features before moving ahead with MLData* packages.

This should be its own issue by the looks of it.

@CarloLucibello
Copy link
Member

bors r+

@DhairyaLGandhi
Copy link
Member

@CarloLucibello there's pending comments about lacking tests and this is an api change. Further, regardless of merging this or #1508, the comments need to be addressed. We were not in a rush for merging this, anyway.

@bors
Copy link
Contributor

bors bot commented Feb 25, 2021

Build succeeded:

@bors bors bot merged commit 49c226d into FluxML:master Feb 25, 2021
@norci norci deleted the feature branch February 26, 2021 02:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants