Parallel training: reset back! function #443

Closed

mleprovost opened this issue Oct 19, 2018 · 1 comment

mleprovost commented Oct 19, 2018

Hello,

I am trying to train the same model in parallel on the CPU, in a genetic-algorithm fashion, with M different optimizers.

For every epoch, I copy the model M times and train each copy independently with its own optimizer.
Then I keep the model with the lowest loss (the best model) and duplicate it for the next epoch.

It seems that the back!() function called inside train! implicitly keeps the last model in memory, so the optimization of each model is not independent.

Is there a way to clear the cache of back! so that each optimizer trains independently, or to explicitly pass the model as an argument to back!?

Simple example with multiple copies of the same model trained with the same optimizer (SGD):

using Flux

X = rand(10, 200)
Y = rand(3, 200)

# Create the model
model = Chain(Dense(10, 3))
# Duplicate the model 6 times
modeltab = [model, model, model, model, model, model]
# Define the loss function
lossPAR(x, y, m::Flux.Chain) = Flux.mse(m(x), y)
loss(m) = (x, y) -> lossPAR(x, y, m)

losstab = zeros(6)
# Train the same initial model with the same optimizer "independently"
for j = 1:6
    opt = Flux.SGD(Flux.params(modeltab[j]), 0.01)

    Flux.train!(loss(modeltab[j]), zip(X, Y), opt)

    losstab[j] = (loss(modeltab[j])(X, Y)).tracker.data
end
losstab

6-element Array{Float64,1}:
0.137147
0.0894572
0.0824021
0.08119
0.0809284
0.0808571

They should all be equal if the training runs were independent.

@MikeInnes
Member

When you write [model, model, model], you don't have three copies of the model, but instead three references to the same model. You can get the same behaviour if you write xs = [1, 2, 3]; ys = xs and then mutate one of them. The easiest way around this is probably to write [deepcopy(model) for _ = 1:6].
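
For the snippet above, that would look roughly like the following. This is a minimal sketch that reuses the Tracker-era Flux API already shown in this issue (Flux.SGD, train!, .tracker.data), so it assumes the same Flux version; the only change is building modeltab with deepcopy so every entry has its own parameters. With independent copies, all six entries of losstab should come out identical.

using Flux

X = rand(10, 200)
Y = rand(3, 200)

model = Chain(Dense(10, 3))
# deepcopy gives each entry its own parameters; [model, model, ...] would
# store six references to the same underlying model.
modeltab = [deepcopy(model) for _ = 1:6]

lossPAR(x, y, m::Flux.Chain) = Flux.mse(m(x), y)
loss(m) = (x, y) -> lossPAR(x, y, m)

losstab = zeros(6)
for j = 1:6
    opt = Flux.SGD(Flux.params(modeltab[j]), 0.01)
    Flux.train!(loss(modeltab[j]), zip(X, Y), opt)
    losstab[j] = (loss(modeltab[j])(X, Y)).tracker.data
end
losstab  # all six values should now agree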
