Description
I have a strange issue with backward(). I have two generators, gen1 and gen2, and I calculate the loss in three ways: loss_1, loss_2, and loss_3.
All computations for gen1 are OK.
Part 1.
let out = gen1.forward(input);
let out2 = gen2.forward(out.clone());
... calculate loss
then: let total_loss = loss_1 + loss_2 + loss_3;
and:
let grads1 = total_loss.backward();
let grads_gen1 = GradientsParams::from_grads(grads1, &gen1);
This works fine.
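For context, here is Part 1 written out end to end. This is a minimal sketch: it assumes an autodiff backend, that gen1 and gen2 expose forward(Tensor<B, 4>) -> Tensor<B, 4>, and it uses MseLoss against a target tensor as an illustrative stand-in for my actual loss terms (input and target are illustrative names):

use burn::nn::loss::{MseLoss, Reduction};
use burn::optim::GradientsParams;

// Part 1: input -> gen1 -> gen2; gradients are collected for gen1.
let out = gen1.forward(input.clone());
let out2 = gen2.forward(out.clone());

// Illustrative stand-ins for the real loss terms.
let loss_1 = MseLoss::new().forward(out.clone(), target.clone(), Reduction::Mean);
let loss_2 = MseLoss::new().forward(out2.clone(), target.clone(), Reduction::Mean);
let loss_3 = MseLoss::new().forward(out2.clone(), target.clone(), Reduction::Mean);

let total_loss = loss_1 + loss_2 + loss_3;
let grads1 = total_loss.backward();
let grads_gen1 = GradientsParams::from_grads(grads1, &gen1);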
And then:
Part 2.
let out3 = gen2.forward(out);
let out = gen1.forward(out3);
... calculate loss
then, in a similar way but with the other arguments: let total_loss = loss_1 + loss_2 + loss_3;
and:
let grads2 = total_loss.backward();
let grads_gen2 = GradientsParams::from_grads(grads2, &gen2);
and update the models:
... .step() ...
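Putting both parts together, one loop iteration has roughly this structure (a sketch: optim1, optim2, and lr are illustrative names for my optimizers and learning rate, and the loss terms are computed as in the Part 1 sketch above; note that Part 2 feeds Part 1's out back into gen2):

use burn::optim::{GradientsParams, Optimizer};

// One training iteration; both parts run in the same loop.
// Part 1: input -> gen1 -> gen2, gradients for gen1.
let out = gen1.forward(input.clone());
let out2 = gen2.forward(out.clone());
// ... calculate loss_1, loss_2, loss_3 for Part 1 ...
let total_loss = loss_1 + loss_2 + loss_3;
let grads1 = total_loss.backward();
let grads_gen1 = GradientsParams::from_grads(grads1, &gen1);

// Part 2: gen2 -> gen1, reusing `out` from Part 1, gradients for gen2.
let out3 = gen2.forward(out);
let out = gen1.forward(out3);
// ... calculate loss_1, loss_2, loss_3 for Part 2 ...
let total_loss = loss_1 + loss_2 + loss_3;
let grads2 = total_loss.backward(); // panics here when loss_3 is included
let grads_gen2 = GradientsParams::from_grads(grads2, &gen2);

// Update both models at the end of the iteration.
gen1 = optim1.step(lr, gen1, grads_gen1);
gen2 = optim2.step(lr, gen2, grads_gen2);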
All of this runs in one loop, as sketched above. And here I have an issue with grads2: when I remove loss_3 from total_loss everything works fine, but with loss_3 I get this error message:
thread 'main' panicked at /home/euuki/.cargo/registry/src/index.crates.io-6f17d22bba15001f/burn-tensor-0.13.2/src/tensor/api/numeric.rs:22:9:
=== Tensor Operation Error ===
Operation: 'Add'
Reason:
1. The provided tensors have incompatible shapes. Incompatible size at dimension '2' => '1278 != 1280', which can't be broadcasted. Lhs tensor shape [2, 1, 1278, 1278], Rhs tensor shape [2, 1, 1280, 1280].
2. The provided tensors have incompatible shapes. Incompatible size at dimension '3' => '1278 != 1280', which can't be broadcasted. Lhs tensor shape [2, 1, 1278, 1278], Rhs tensor shape [2, 1, 1280, 1280].
stack backtrace:
0: 0x577d25479995 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h1e1a1972118942ad
1: 0x577d2549e41b - core::fmt::write::hc090a2ffd6b28c4a
2: 0x577d2547773f - std::io::Write::write_fmt::h8898bac6ff039a23
3: 0x577d2547976e - std::sys_common::backtrace::print::ha96650907276675e
4: 0x577d2547aa29 - std::panicking::default_hook::{{closure}}::h215c2a0a8346e0e0
5: 0x577d2547a76d - std::panicking::default_hook::h207342be97478370
6: 0x577d2547aec3 - std::panicking::rust_panic_with_hook::hac8bdceee1e4fe2c
7: 0x577d2547ada4 - std::panicking::begin_panic_handler::{{closure}}::h00d785e82757ce3c
8: 0x577d25479e59 - std::sys_common::backtrace::__rust_end_short_backtrace::h1628d957bcd06996
9: 0x577d2547aad7 - rust_begin_unwind
10: 0x577d24f95ff3 - core::panicking::panic_fmt::hdc63834ffaaefae5
11: 0x577d25085a41 - core::panicking::panic_display::hd504bfa7a23e079b
12: 0x577d24f6910d - burn_tensor::tensor::api::numeric::<impl burn_tensor::tensor::api::base::Tensor<B,_,K>>::add::panic_cold_display::haa334297998f63f1
13: 0x577d2507fd6f - burn_tensor::tensor::api::numeric::<impl burn_tensor::tensor::api::base::Tensor<B,_,K>>::add::h197481e80e899342
14: 0x577d2503c0b7 - burn_autodiff::grads::Gradients::register::h7f50eb7d3a39e84f
15: 0x577d25018acf - <burn_autodiff::ops::module::<impl burn_tensor::tensor::ops::modules::base::ModuleOps<burn_autodiff::backend::Autodiff<B,C>> for burn_autodiff::backend::Autodiff<B,C>>::conv2d::Conv2DWithBias as burn_autodiff::ops::backward::Backward<B,4_usize,3_usize>>::backward::h0eb6b130b28fdf5d
16: 0x577d25060b8f - <burn_autodiff::ops::base::OpsStep<B,T,SB,_,_> as burn_autodiff::graph::base::Step>::step::hd9a3762bb2722aab
17: 0x577d25446e8d - burn_autodiff::runtime::server::AutodiffServer::backward::h034fbb21f4457df1
18: 0x577d24fcf75a - <burn_autodiff::runtime::mutex::MutexClient as burn_autodiff::runtime::client::AutodiffClient>::backward::h235c73a9c48299ab
19: 0x577d2504a5c2 - burn_test::nn::flowscaller::deg_training::flow_net::h0a08effeaee47cf1
20: 0x577d250a7f28 - burn_test::main::h3db9588c4d2b6cd5
21: 0x577d24fc2a33 - std::sys_common::backtrace::__rust_begin_short_backtrace::h1ab658f13ba31837
22: 0x577d2503da39 - std::rt::lang_start::{{closure}}::ha5d1a6d22c45cdd7
23: 0x577d25472ff0 - std::rt::lang_start_internal::h3ed4fe7b2f419135
24: 0x577d250a8035 - main
25: 0x731bf462a1ca - __libc_start_call_main
at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
26: 0x731bf462a28b - __libc_start_main_impl
at ./csu/../csu/libc-start.c:360:3
27: 0x577d24f96745 - _start
28: 0x0 - <unknown>
where 1280 is the output tensor size for width and height. All the padding etc. should be OK for gen1 and gen2, because otherwise Part 1 wouldn't work. Do you have any idea or suggestion about what could be the cause? I wanted to write on Discord in the #help channel, but with the logs the message exceeded 2000 characters.
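For reference, the forward shapes can be sanity-checked against the sizes in the panic with Tensor::dims(); the expected values below are taken from the error message (a sketch):

// Both generators are expected to preserve the [2, 1, 1280, 1280] shape.
assert_eq!(out3.dims(), [2, 1, 1280, 1280]); // gen2 output in Part 2
assert_eq!(out.dims(), [2, 1, 1280, 1280]); // gen1 output in Part 2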