Parallel training: reset back! function #443

Closed

mleprovost opened this issue Oct 19, 2018 · 1 comment

mleprovost commented Oct 19, 2018

Hello,

I am trying to train the same model in parallel on the CPU, in a genetic-algorithm fashion, with M different optimizers.

For every epoch, I copy the model M times and train each copy independently with its own optimizer.
Then I keep the model with the lowest loss (the best model) and duplicate it for the next epoch.

It seems that the back!() function called inside train! implicitly keeps the last model in memory, so the optimization of each model is not independent.

Is there a way to clear the cache of back! so that each optimizer trains independently, or to explicitly pass the model as an argument to back!?

Simple example with multiple copies of the same model trained with the same optimizer (SGD):

using Flux

X = rand(10, 200)
Y = rand(3, 200)

# Create the model
model = Chain(Dense(10, 3))
# Duplicate the model 6 times
modeltab = [model, model, model, model, model, model]
# Define the loss function
lossPAR(x, y, m::Flux.Chain) = Flux.mse(m(x), y)
loss(m) = (x, y) -> lossPAR(x, y, m)

losstab = zeros(6)
# Train the same initial model with the same optimizer "independently"
for j = 1:6
    opt = Flux.SGD(Flux.params(modeltab[j]), 0.01)

    Flux.train!(loss(modeltab[j]), zip(X, Y), opt)

    losstab[j] = (loss(modeltab[j])(X, Y)).tracker.data
end
losstab

6-element Array{Float64,1}:
0.137147
0.0894572
0.0824021
0.08119
0.0809284
0.0808571

They should all be equal if the training runs were independent.

@MikeInnes
Member

When you write [model, model, model], you don't have three copies of the model, but instead three references to the same model. You can get the same behaviour if you write xs = [1, 2, 3]; ys = xs and then mutate one of them. The easiest way around this is probably to write [deepcopy(model) for _ = 1:6].
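
For the snippet above, that would look roughly like the following. This is a minimal sketch that reuses the Tracker-era Flux API already shown in this issue (Flux.SGD, train!, .tracker.data), so it assumes the same Flux version; the only change is building modeltab with deepcopy so every entry has its own parameters. With independent copies, all six entries of losstab should come out identical.

using Flux

X = rand(10, 200)
Y = rand(3, 200)

model = Chain(Dense(10, 3))
# deepcopy gives each entry its own parameters; [model, model, ...] would
# store six references to the same underlying model.
modeltab = [deepcopy(model) for _ = 1:6]

lossPAR(x, y, m::Flux.Chain) = Flux.mse(m(x), y)
loss(m) = (x, y) -> lossPAR(x, y, m)

losstab = zeros(6)
for j = 1:6
    opt = Flux.SGD(Flux.params(modeltab[j]), 0.01)
    Flux.train!(loss(modeltab[j]), zip(X, Y), opt)
    losstab[j] = (loss(modeltab[j])(X, Y)).tracker.data
end
losstab  # all six values should now agree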
