When running unittests: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #100
Describe the bug
______________________________________________________________________________________________________________ test_train _________________________________________________________________________________________________________________
configuration_conv3d = Configuration(general=General(name='example', description='example configuration'), input_data=InputData(pv=PV(start_d...atches=250, n_validation_batches=0, n_test_batches=10, upload_every_n_batches=16, local_temp_path='~/temp/'), git=None)
    def test_train(configuration_conv3d):
        config_file = "tests/configs/model/conv3d_sat_nwp.yaml"
        config = load_config(config_file)
        dataset_configuration = configuration_conv3d
        dataset_configuration.input_data.nwp.nwp_image_size_pixels = 16
        # start model
        model = Model(**config)
        # create fake data loader
        train_dataset = FakeDataset(configuration=dataset_configuration)
        train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=None)
        # fit model
        trainer = pl.Trainer(gpus=0, max_epochs=1)
>       trainer.fit(model, train_dataloader)
tests/models/conv3d/test_conv3d_model_sat_nwp.py:85:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:737: in fit
self._call_and_handle_interrupt(
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:682: in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:772: in _fit_impl
self._run(model, ckpt_path=ckpt_path)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1195: in _run
self._dispatch()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1274: in _dispatch
self.training_type_plugin.start_training(self)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py:202: in start_training
self._results = trainer.run_stage()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1284: in run_stage
return self._run_train()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1314: in _run_train
self.fit_loop.run()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/base.py:145: in run
self.advance(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py:234: in advance
self.epoch_loop.run(data_fetcher)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/base.py:145: in run
self.advance(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py:193: in advance
batch_output = self.batch_loop.run(batch, batch_idx)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/base.py:145: in run
self.advance(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py:88: in advance
outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/base.py:145: in run
self.advance(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:215: in advance
result = self._run_optimization(
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:266: in _run_optimization
self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:378: in _optimizer_step
lightning_module.optimizer_step(
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py:1651: in optimizer_step
optimizer.step(closure=optimizer_closure)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py:164: in step
trainer.accelerator.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py:336: in optimizer_step
self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py:163: in optimizer_step
optimizer.step(closure=closure, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/torch/optim/optimizer.py:88: in wrapper
return func(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/torch/autograd/grad_mode.py:28: in decorate_context
return func(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/torch/optim/adam.py:92: in step
loss = closure()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py:148: in _wrap_closure
closure_result = closure()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:160: in __call__
self._result = self.closure(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:142: in closure
step_output = self._step_fn()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:435: in _training_step
training_step_output = self.trainer.accelerator.training_step(step_kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py:216: in training_step
return self.training_type_plugin.training_step(*step_kwargs.values())
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py:213: in training_step
return self.model.training_step(*args, **kwargs)
predict_pv_yield/models/base_model.py:151: in training_step
return self._training_or_validation_step(batch, tag="Train")
predict_pv_yield/models/base_model.py:102: in _training_or_validation_step
mse_exp = self.weighted_losses.get_mse_exp(output=y_hat, target=y)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <nowcasting_utils.models.loss.WeightedLosses object at 0x7f3524a0a3a0>
output = tensor([[-0.1790],
[-0.1054],
[-0.2137],
[-0.0856],
[-0.1854],
[-0.2698],
...-0.0591],
[-0.2064],
[-0.1509],
[-0.1668],
[-0.2546]], grad_fn=<ReshapeAliasBackward0>)
target = tensor([[-0.3660],
[-2.0555],
[ 0.6349],
[ 0.7331],
[ 0.8282],
[-1.5825],
...48],
[ 0.7466],
[ 0.0310],
[-0.9390],
[ 0.4475],
[-0.7565],
[ 0.2256]])
    def get_mse_exp(self, output, target):
        """Loss function weighted MSE"""
>       return torch.sum(self.weights * (output - target) ** 2)
E       RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/nowcasting_utils/models/loss.py:57: RuntimeError
============================================================================================================== warnings summary ==============================================================================================================
tests/test_training.py::test_train
tests/test_utils.py::test_utils
/home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'config': Defaults list is missing `_self_`. See https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order for more information
warnings.warn(msg, UserWarning)
tests/test_training.py::test_train
/home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:90: LightningDeprecationWarning: Setting `Trainer(progress_bar_refresh_rate=5)` is deprecated in v1.5 and will be removed in v1.7. Please pass `pytorch_lightning.callbacks.progress.TQDMProgressBar` with `refresh_rate` directly to the Trainer's `callbacks` argument instead. Or, to disable the progress bar pass `enable_progress_bar = False` to the Trainer.
rank_zero_deprecation(
tests/test_training.py::test_train
/home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:167: LightningDeprecationWarning: Setting `Trainer(weights_summary=None)` is deprecated in v1.5 and will be removed in v1.7. Please set `Trainer(enable_model_summary=False)` instead.
rank_zero_deprecation(
tests/test_training.py::test_train
tests/models/baseline/test_baseline_model.py::test_trainer
tests/models/baseline/test_baseline_model_gsp.py::test_trainer
tests/models/baseline/test_baseline_model_gsp.py::test_trainer_validation
tests/models/conv3d/test_conv3d_model.py::test_train
tests/models/conv3d/test_conv3d_model_gsp.py::test_train
tests/models/conv3d/test_conv3d_model_sat_nwp.py::test_train
/home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1579: UserWarning: GPU available but not used. Set the gpus flag in your trainer `Trainer(gpus=1)` or script `--gpus=1`.
rank_zero_warn(
tests/models/baseline/test_baseline_model.py::test_trainer
tests/models/baseline/test_baseline_model_gsp.py::test_trainer
/home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:111: UserWarning: The dataloader, test_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 16 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
tests/models/baseline/test_baseline_model_gsp.py::test_trainer_validation
/home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:111: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 16 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
tests/models/conv3d/test_conv3d_model.py::test_train
tests/models/conv3d/test_conv3d_model_gsp.py::test_train
tests/models/conv3d/test_conv3d_model_sat_nwp.py::test_train
/home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/configuration_validator.py:118: UserWarning: You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.
rank_zero_warn("You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.")
tests/models/conv3d/test_conv3d_model.py::test_train
tests/models/conv3d/test_conv3d_model_gsp.py::test_train
tests/models/conv3d/test_conv3d_model_sat_nwp.py::test_train
/home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:111: UserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 16 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
tests/models/conv3d/test_conv3d_model.py::test_train
/home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:407: UserWarning: The number of training samples (2) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
tests/models/conv3d/test_conv3d_model_gsp.py::test_train
tests/models/conv3d/test_conv3d_model_sat_nwp.py::test_train
/home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:407: UserWarning: The number of training samples (10) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
tests/models/perceiver/test_perceiver.py::test_model_forward
/home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1634272204863/work/aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
-- Docs: https://docs.pytest.org/en/stable/warnings.html
========================================================================================================== short test summary info ===========================================================================================================
FAILED tests/test_training.py::test_train - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/baseline/test_baseline_model.py::test_trainer - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/baseline/test_baseline_model_gsp.py::test_model_validation - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/baseline/test_baseline_model_gsp.py::test_trainer - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/baseline/test_baseline_model_gsp.py::test_trainer_validation - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/conv3d/test_conv3d_model.py::test_train - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/conv3d/test_conv3d_model_gsp.py::test_train - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/conv3d/test_conv3d_model_sat_nwp.py::test_train - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
================================================================================================= 8 failed, 21 passed, 24 warnings in 48.45s =================================================================================================
Epoch 0: 0%| | 0/10 [00:38<?, ?it/s]
Epoch 0: 0%| | 0/10 [00:36<?, ?it/s]
Segmentation fault (core dumped)
(predict_pv_yield) jack@leonardo:~/dev/ocf/predict_pv_yield$
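The traceback points at a device mismatch inside `WeightedLosses.get_mse_exp`: the trainer runs on the CPU (`gpus=0`, and Lightning warns "GPU available but not used"), and the `output` and `target` tensors printed above carry no `device=` suffix so they are CPU tensors, while `self.weights` appears to have ended up on `cuda:0`. A possible fix is sketched below, under the assumption that the weights can simply be moved onto the device of the incoming tensors; the real `WeightedLosses` class in `nowcasting_utils` may be structured differently.

```python
import torch


class WeightedLosses:
    """Minimal sketch of a device-safe weighted MSE (not the actual nowcasting_utils class)."""

    def __init__(self, weights: torch.Tensor):
        # weights may have been created on cuda:0, e.g. if a GPU was detected at construction time
        self.weights = weights

    def get_mse_exp(self, output: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        """Loss function weighted MSE, with the weights moved to the output's device."""
        weights = self.weights.to(output.device)
        return torch.sum(weights * (output - target) ** 2)
```

If `WeightedLosses` were an `nn.Module` (or the weights were registered on the LightningModule via `register_buffer`), Lightning would move them together with the rest of the model, which might be the cleaner long-term fix.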
To Reproduce
Install predict_pv_yield with conda on leonardo, following the repository instructions, then run py.test.
The tests also end in a segmentation fault, but maybe that's related?
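For reference, the underlying RuntimeError can be reproduced outside the test suite with a single mixed-device operation (a minimal sketch, assuming a CUDA-capable machine; it does not use predict_pv_yield at all):

```python
import torch

# Stand-ins for the tensors in the failing test: weights on the GPU, model output/target on the CPU.
weights = torch.ones(5, device="cuda:0")
output = torch.randn(5, 1)
target = torch.randn(5, 1)

# Raises: RuntimeError: Expected all tensors to be on the same device,
# but found at least two devices, cuda:0 and cpu!
loss = torch.sum(weights * (output - target) ** 2)
```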