Why am I getting a numpy error? #536

Open
kkkwjr opened this issue Nov 29, 2024 · 1 comment

Comments

kkkwjr commented Nov 29, 2024

(CogAgent) (.conda) (base) wpg@node7gpu:/workspace/kkkjr/Item/CogVLM/basic_demo$ torchrun --standalone --nnodes=1 --nproc-per-node=2 cli_demo_sat.py --from_pretrained cogagent-chat --version chat --bf16
/home/wpg/.local/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
cpu = _conversion_method_template(device=torch.device("cpu"))
W1129 05:48:09.364000 64664 torch/distributed/run.py:793]
W1129 05:48:09.364000 64664 torch/distributed/run.py:793] *****************************************
W1129 05:48:09.364000 64664 torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1129 05:48:09.364000 64664 torch/distributed/run.py:793] *****************************************
/home/wpg/.local/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
cpu = _conversion_method_template(device=torch.device("cpu"))
/home/wpg/.local/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
cpu = _conversion_method_template(device=torch.device("cpu"))
Traceback (most recent call last):
  File "/workspace/kkkjr/Item/CogVLM/basic_demo/cli_demo_sat.py", line 7, in <module>
    from sat.model.mixins import CachedAutoregressiveMixin
  File "/home/wpg/.local/lib/python3.10/site-packages/sat/__init__.py", line 1, in <module>
    from .arguments import get_args, update_args_with_file
  File "/home/wpg/.local/lib/python3.10/site-packages/sat/arguments.py", line 23, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
Traceback (most recent call last):
  File "/workspace/kkkjr/Item/CogVLM/basic_demo/cli_demo_sat.py", line 7, in <module>
    from sat.model.mixins import CachedAutoregressiveMixin
  File "/home/wpg/.local/lib/python3.10/site-packages/sat/__init__.py", line 1, in <module>
    from .arguments import get_args, update_args_with_file
  File "/home/wpg/.local/lib/python3.10/site-packages/sat/arguments.py", line 23, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
E1129 05:48:11.400000 64664 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 64829) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/home/wpg/.local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/wpg/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
  File "/home/wpg/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 919, in main
    run(args)
  File "/home/wpg/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/home/wpg/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/wpg/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

cli_demo_sat.py FAILED

Failures:
[1]:
time : 2024-11-29_05:48:11
host : node7gpu
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 64830)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-11-29_05:48:11
host : node7gpu
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 64829)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

image
I already have numpy installed here, so I don't know why this still happens.
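
One thing that may be worth checking: the log above reports the failing binary as /usr/bin/python3 and loads packages from /home/wpg/.local/lib/python3.10/site-packages, so the interpreter that torchrun starts might not be the one from the activated conda environment where numpy is installed. This is only a reading of the log, not a confirmed diagnosis. A minimal sketch of a check script follows; the file name check_env.py and the helper describe_environment are hypothetical and not part of CogVLM or sat.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "numpy") -> dict:
    """Report which interpreter is running and whether it can find a module.

    Hypothetical helper for illustration only.
    """
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "module": module_name,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Saving this as check_env.py and running it once with `python check_env.py` and once with `torchrun --standalone --nproc-per-node=1 check_env.py` should show whether both commands use the same interpreter. If the `executable` values differ, numpy may be installed for one interpreter but not the other.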

@MachineDora

Install numpy 1.26.3 at the highest. Newer versions cause all sorts of strange errors, so the version can't be too high.
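
The version cap suggested above can be checked with a short script. This is a sketch under stated assumptions: the 1.26.3 bound comes only from this reply and is not an official requirement, and the helpers parse_release and numpy_within_cap are hypothetical names introduced for illustration.

```python
from importlib import metadata

# Upper bound suggested in this thread (an assumption, not an official pin).
MAX_NUMPY = (1, 26, 3)


def parse_release(version: str) -> tuple:
    """Turn '1.26.3' or '2.1.0rc1' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at segments such as 'dev0' that have no leading digits
        parts.append(int(digits))
    return tuple(parts)


def numpy_within_cap(version: str, cap: tuple = MAX_NUMPY) -> bool:
    """Return True when the release tuple does not exceed the cap."""
    return parse_release(version) <= cap


if __name__ == "__main__":
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        print("numpy is not installed for this interpreter")
    else:
        ok = numpy_within_cap(installed)
        print(f"numpy {installed}: {'within' if ok else 'above'} the 1.26.3 cap")
```

If the installed version is above the cap, a constraint such as `python -m pip install "numpy<=1.26.3"` pins it. Run that with the same interpreter that launches the demo.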
