
When running unittests: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #100

Open · opened by @JackKelly

Description

Describe the bug

______________________________________________________________________________________________________________ test_train _________________________________________________________________________________________________________________

configuration_conv3d = Configuration(general=General(name='example', description='example configuration'), input_data=InputData(pv=PV(start_d...atches=250, n_validation_batches=0, n_test_batches=10, upload_every_n_batches=16, local_temp_path='~/temp/'), git=None)

    def test_train(configuration_conv3d):
    
        config_file = "tests/configs/model/conv3d_sat_nwp.yaml"
        config = load_config(config_file)
    
        dataset_configuration = configuration_conv3d
        dataset_configuration.input_data.nwp.nwp_image_size_pixels = 16
    
        # start model
        model = Model(**config)
    
        # create fake data loader
        train_dataset = FakeDataset(configuration=dataset_configuration)
        train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=None)
    
        # fit model
        trainer = pl.Trainer(gpus=0, max_epochs=1)
>       trainer.fit(model, train_dataloader)

tests/models/conv3d/test_conv3d_model_sat_nwp.py:85: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:737: in fit
    self._call_and_handle_interrupt(
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:682: in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:772: in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1195: in _run
    self._dispatch()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1274: in _dispatch
    self.training_type_plugin.start_training(self)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py:202: in start_training
    self._results = trainer.run_stage()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1284: in run_stage
    return self._run_train()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1314: in _run_train
    self.fit_loop.run()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/base.py:145: in run
    self.advance(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py:234: in advance
    self.epoch_loop.run(data_fetcher)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/base.py:145: in run
    self.advance(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py:193: in advance
    batch_output = self.batch_loop.run(batch, batch_idx)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/base.py:145: in run
    self.advance(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py:88: in advance
    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/base.py:145: in run
    self.advance(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:215: in advance
    result = self._run_optimization(
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:266: in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:378: in _optimizer_step
    lightning_module.optimizer_step(
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py:1651: in optimizer_step
    optimizer.step(closure=optimizer_closure)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py:164: in step
    trainer.accelerator.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py:336: in optimizer_step
    self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py:163: in optimizer_step
    optimizer.step(closure=closure, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/torch/optim/optimizer.py:88: in wrapper
    return func(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/torch/autograd/grad_mode.py:28: in decorate_context
    return func(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/torch/optim/adam.py:92: in step
    loss = closure()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py:148: in _wrap_closure
    closure_result = closure()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:160: in __call__
    self._result = self.closure(*args, **kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:142: in closure
    step_output = self._step_fn()
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:435: in _training_step
    training_step_output = self.trainer.accelerator.training_step(step_kwargs)
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py:216: in training_step
    return self.training_type_plugin.training_step(*step_kwargs.values())
../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py:213: in training_step
    return self.model.training_step(*args, **kwargs)
predict_pv_yield/models/base_model.py:151: in training_step
    return self._training_or_validation_step(batch, tag="Train")
predict_pv_yield/models/base_model.py:102: in _training_or_validation_step
    mse_exp = self.weighted_losses.get_mse_exp(output=y_hat, target=y)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <nowcasting_utils.models.loss.WeightedLosses object at 0x7f3524a0a3a0>
output = tensor([[-0.1790],
        [-0.1054],
        [-0.2137],
        [-0.0856],
        [-0.1854],
        [-0.2698],
    ...-0.0591],
        [-0.2064],
        [-0.1509],
        [-0.1668],
        [-0.2546]], grad_fn=<ReshapeAliasBackward0>)
target = tensor([[-0.3660],
        [-2.0555],
        [ 0.6349],
        [ 0.7331],
        [ 0.8282],
        [-1.5825],
    ...48],
        [ 0.7466],
        [ 0.0310],
        [-0.9390],
        [ 0.4475],
        [-0.7565],
        [ 0.2256]])

    def get_mse_exp(self, output, target):
        """Loss function weighted MSE"""
>       return torch.sum(self.weights * (output - target) ** 2)
E       RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

../../../miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/nowcasting_utils/models/loss.py:57: RuntimeError
============================================================================================================== warnings summary ==============================================================================================================
tests/test_training.py::test_train
tests/test_utils.py::test_utils
  /home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'config': Defaults list is missing `_self_`. See https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order for more information
    warnings.warn(msg, UserWarning)

tests/test_training.py::test_train
  /home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:90: LightningDeprecationWarning: Setting `Trainer(progress_bar_refresh_rate=5)` is deprecated in v1.5 and will be removed in v1.7. Please pass `pytorch_lightning.callbacks.progress.TQDMProgressBar` with `refresh_rate` directly to the Trainer's `callbacks` argument instead. Or, to disable the progress bar pass `enable_progress_bar = False` to the Trainer.
    rank_zero_deprecation(

tests/test_training.py::test_train
  /home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:167: LightningDeprecationWarning: Setting `Trainer(weights_summary=None)` is deprecated in v1.5 and will be removed in v1.7. Please set `Trainer(enable_model_summary=False)` instead.
    rank_zero_deprecation(

tests/test_training.py::test_train
tests/models/baseline/test_baseline_model.py::test_trainer
tests/models/baseline/test_baseline_model_gsp.py::test_trainer
tests/models/baseline/test_baseline_model_gsp.py::test_trainer_validation
tests/models/conv3d/test_conv3d_model.py::test_train
tests/models/conv3d/test_conv3d_model_gsp.py::test_train
tests/models/conv3d/test_conv3d_model_sat_nwp.py::test_train
  /home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1579: UserWarning: GPU available but not used. Set the gpus flag in your trainer `Trainer(gpus=1)` or script `--gpus=1`.
    rank_zero_warn(

tests/models/baseline/test_baseline_model.py::test_trainer
tests/models/baseline/test_baseline_model_gsp.py::test_trainer
  /home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:111: UserWarning: The dataloader, test_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 16 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
    rank_zero_warn(

tests/models/baseline/test_baseline_model_gsp.py::test_trainer_validation
  /home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:111: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 16 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
    rank_zero_warn(

tests/models/conv3d/test_conv3d_model.py::test_train
tests/models/conv3d/test_conv3d_model_gsp.py::test_train
tests/models/conv3d/test_conv3d_model_sat_nwp.py::test_train
  /home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/configuration_validator.py:118: UserWarning: You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.
    rank_zero_warn("You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.")

tests/models/conv3d/test_conv3d_model.py::test_train
tests/models/conv3d/test_conv3d_model_gsp.py::test_train
tests/models/conv3d/test_conv3d_model_sat_nwp.py::test_train
  /home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:111: UserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 16 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
    rank_zero_warn(

tests/models/conv3d/test_conv3d_model.py::test_train
  /home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:407: UserWarning: The number of training samples (2) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
    rank_zero_warn(

tests/models/conv3d/test_conv3d_model_gsp.py::test_train
tests/models/conv3d/test_conv3d_model_sat_nwp.py::test_train
  /home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/pytorch_lightning/trainer/data_loading.py:407: UserWarning: The number of training samples (10) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
    rank_zero_warn(

tests/models/perceiver/test_perceiver.py::test_model_forward
  /home/jack/miniconda3/envs/predict_pv_yield/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /opt/conda/conda-bld/pytorch_1634272204863/work/aten/src/ATen/native/TensorShape.cpp:2157.)
    return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]

-- Docs: https://docs.pytest.org/en/stable/warnings.html
========================================================================================================== short test summary info ===========================================================================================================
FAILED tests/test_training.py::test_train - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/baseline/test_baseline_model.py::test_trainer - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/baseline/test_baseline_model_gsp.py::test_model_validation - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/baseline/test_baseline_model_gsp.py::test_trainer - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/baseline/test_baseline_model_gsp.py::test_trainer_validation - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/conv3d/test_conv3d_model.py::test_train - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/conv3d/test_conv3d_model_gsp.py::test_train - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
FAILED tests/models/conv3d/test_conv3d_model_sat_nwp.py::test_train - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
================================================================================================= 8 failed, 21 passed, 24 warnings in 48.45s =================================================================================================
Epoch 0:   0%|                                                                                                                                                                                                         | 0/10 [00:38<?, ?it/s]
Epoch 0:   0%|                                                                                                                                                                                                         | 0/10 [00:36<?, ?it/s]
Segmentation fault (core dumped)
(predict_pv_yield) jack@leonardo:~/dev/ocf/predict_pv_yield$ 

To Reproduce
Install predict_pv_yield with conda on leonardo, following the install instructions, and run py.test.
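
The device mismatch can also be reproduced outside the test suite in a few lines. In the traceback, `output` and `target` both print without a device tag, so they live on cpu; that means `self.weights` must be the cuda:0 tensor. The snippet below mirrors that placement (the weight values and shapes are made up, and a CUDA-capable machine is needed):

```python
import torch

# Stand-in for WeightedLosses.weights, which apparently ends up on cuda:0.
weights = torch.ones(1, device="cuda:0")

# Stand-ins for the model output and batch target, which a CPU-only
# Trainer(gpus=0) run leaves on cpu.
output = torch.randn(32, 1, requires_grad=True)
target = torch.randn(32, 1)

# Raises: RuntimeError: Expected all tensors to be on the same device,
# but found at least two devices, cuda:0 and cpu!
loss = torch.sum(weights * (output - target) ** 2)
```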

The tests also end in a segmentation fault; maybe that's related?
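
As for the device error itself: if nowcasting_utils really does create the weights eagerly on cuda:0 whenever a GPU is visible, one possible fix is to move them onto the output's device at loss time. This is only a sketch, not the library's actual code: apart from the `get_mse_exp` body shown in the traceback, the class below is a stand-in and the constructor signature is invented for illustration.

```python
import torch


class WeightedLosses:
    """Sketch of a device-safe WeightedLosses.

    Only get_mse_exp's body comes from the traceback; the rest is a
    stand-in to keep the example self-contained.
    """

    def __init__(self, weights: torch.Tensor):
        self.weights = weights  # may have been created on cuda:0

    def get_mse_exp(self, output, target):
        """Loss function weighted MSE"""
        # Follow the output's device, so CPU-only runs (Trainer(gpus=0))
        # no longer mix cuda:0 and cpu tensors.
        weights = self.weights.to(output.device)
        return torch.sum(weights * (output - target) ** 2)
```

A tidier long-term option might be to hold the weights in an `nn.Module` and register them with `register_buffer`, so that `.to()` and Lightning's device placement move them together with the model.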
