Closed
Description
The conda-forge bot is building the new deepspeed packages for 0.8.1. See conda-forge/deepspeed-feedstock#6 (comment) for context.
All of the CUDA 11.2 builds fail with the error below:
/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_build_env/bin/x86_64-conda-linux-gnu-c++ -shared -Wl,--allow-shlib-undefined -Wl,-rpath,/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -Wl,-rpath-link,/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -L/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -Wl,--allow-shlib-undefined -Wl,-rpath,/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -Wl,-rpath-link,/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -L/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined 
-Wl,-rpath,/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -Wl,-rpath-link,/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -L/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/work=/usr/local/src/conda/deepspeed-0.8.1 -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p=/usr/local/src/conda-prefix -isystem /usr/local/cuda/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -isystem 
/usr/local/cuda/include build/temp.linux-x86_64-cpython-311/csrc/lamb/fused_lamb_cuda.o build/temp.linux-x86_64-cpython-311/csrc/lamb/fused_lamb_cuda_kernel.o -L/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.11/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-311/deepspeed/ops/lamb/fused_lamb_op.cpython-311-x86_64-linux-gnu.so
building 'deepspeed.ops.quantizer.quantizer_op' extension
creating build/temp.linux-x86_64-cpython-311/csrc/quantization
/usr/local/cuda/bin/nvcc -Icsrc/includes -I/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.11/site-packages/torch/include -I/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.11/site-packages/torch/include/TH -I/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.11/site-packages/torch/include/THC -I/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/python3.11 -c csrc/quantization/dequantize.cu -o build/temp.linux-x86_64-cpython-311/csrc/quantization/dequantize.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1014\" 
-DTORCH_EXTENSION_NAME=quantizer_op -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -ccbin /home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_build_env/bin/x86_64-conda-linux-gnu-cc -std=c++14
csrc/includes/reduction_utils.h(170): error: no operator "+" matches these operands
operand types are: const __half + const __half
csrc/includes/reduction_utils.h(199): error: no operator "+" matches these operands
operand types are: const __half2 + const __half2
csrc/includes/dequantization_utils.h(165): warning: constexpr if statements are a C++17 feature
detected during:
instantiation of "void dequantize_kernel<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=__half, numBits=8, qType=quantize::Type::Symmetric, unroll=8, threads=512]"
csrc/quantization/dequantize.cu(45): here
instantiation of "void launch_dequantize_kernel(T *, const int8_t *, const float *, quantize::Type, int, int, int, cudaStream_t) [with T=__half]"
csrc/quantization/dequantize.cu(55): here
csrc/includes/quantization_utils.h(124): error: no suitable conversion function from "__half" to "float" exists
detected during:
instantiation of "T quantize::Params<quantize::Type::Asymmetric, numBits>::dequantize<T>(int8_t) [with numBits=8, T=float]"
csrc/includes/dequantization_utils.h(75): here
instantiation of "void dequantize::chunk(T *, const int8_t *, dequantize::Params<qType, numBits>) [with T=float, numBits=8, qType=quantize::Type::Asymmetric]"
csrc/includes/dequantization_utils.h(151): here
instantiation of "void dequantize::_to_global<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=float, numBits=8, qType=quantize::Type::Asymmetric, unroll=8, threads=512]"
csrc/includes/dequantization_utils.h(167): here
instantiation of "void dequantize::to_global<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=float, numBits=8, qType=quantize::Type::Asymmetric, unroll=8, threads=512]"
csrc/quantization/dequantize.cu(18): here
instantiation of "void dequantize_kernel<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=float, numBits=8, qType=quantize::Type::Asymmetric, unroll=8, threads=512]"
csrc/quantization/dequantize.cu(47): here
instantiation of "void launch_dequantize_kernel(T *, const int8_t *, const float *, quantize::Type, int, int, int, cudaStream_t) [with T=float]"
csrc/quantization/dequantize.cu(64): here
csrc/includes/quantization_utils.h(124): error: no suitable conversion function from "__half" to "float" exists
detected during:
instantiation of "T quantize::Params<quantize::Type::Asymmetric, numBits>::dequantize<T>(int8_t) [with numBits=4, T=float]"
csrc/includes/dequantization_utils.h(78): here
instantiation of "void dequantize::chunk(T *, const int8_t *, dequantize::Params<qType, numBits>) [with T=float, numBits=4, qType=quantize::Type::Asymmetric]"
csrc/includes/dequantization_utils.h(151): here
instantiation of "void dequantize::_to_global<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=float, numBits=4, qType=quantize::Type::Asymmetric, unroll=8, threads=512]"
csrc/includes/dequantization_utils.h(167): here
instantiation of "void dequantize::to_global<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=float, numBits=4, qType=quantize::Type::Asymmetric, unroll=8, threads=512]"
csrc/quantization/dequantize.cu(18): here
instantiation of "void dequantize_kernel<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=float, numBits=4, qType=quantize::Type::Asymmetric, unroll=8, threads=512]"
csrc/quantization/dequantize.cu(51): here
instantiation of "void launch_dequantize_kernel(T *, const int8_t *, const float *, quantize::Type, int, int, int, cudaStream_t) [with T=float]"
csrc/quantization/dequantize.cu(64): here
4 errors detected in the compilation of "csrc/quantization/dequantize.cu".
error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1
error: subprocess-exited-with-error
× Running setup.py install for deepspeed did not run successfully.
│ exit code: 1
╰─> See above for output
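One detail that may be worth checking in the log above: the nvcc command line defines `-D__CUDA_NO_HALF_OPERATORS__`, `-D__CUDA_NO_HALF_CONVERSIONS__` and `-D__CUDA_NO_HALF2_OPERATORS__`. In CUDA's `cuda_fp16` headers these macros disable the overloaded operators and implicit conversions for `__half` and `__half2`, which would be consistent with the `no operator "+"` and `no suitable conversion function` errors. The sketch below only lists which of these macros a logged command line defines (`-D`) or undefines (`-U`). The helper name `list_half_macros` and the abbreviated `sample` line are assumptions made for illustration and are not part of the feedstock or of DeepSpeed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read build-log text on stdin and print each distinct
# -D/-U flag that touches the CUDA half-precision operator/conversion macros.
list_half_macros() {
    grep -oE -- '-[DU]__CUDA_NO_(HALF2?|BFLOAT16)_(OPERATORS|CONVERSIONS)__' | sort -u
}

# Abbreviated stand-in for the nvcc line in the log (an assumption, not the full command).
sample='nvcc -c csrc/quantization/dequantize.cu -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -std=c++14'

printf '%s\n' "$sample" | list_half_macros
```

On a real log, the same function could be fed the saved build output, for example `list_half_macros < build.log`, where `build.log` is a hypothetical file name. If no matching `-U` flags show up for the failing extension, that may be a lead, but it does not by itself explain why 0.8.0 built.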
See also the build script below:
# Deepspeed ops cannot be built without CUDA
if [[ ${cuda_compiler_version} != "None" ]]; then
    export DS_BUILD_OPS=1
    # Set the CUDA arch list from
    # https://github.com/conda-forge/pytorch-cpu-feedstock/blob/2be0b38024b3b5601fcefce40596fc2a5fce4ab7/recipe/build_pytorch.sh#L94
    if [[ ${cuda_compiler_version} == 10.* ]]; then
        export TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5+PTX"
    elif [[ ${cuda_compiler_version} == 11.0* ]]; then
        export TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0+PTX"
    elif [[ ${cuda_compiler_version} == 11.1 ]]; then
        export TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6+PTX"
    elif [[ ${cuda_compiler_version} == 11.2 ]]; then
        export TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6+PTX"
    else
        echo "Unsupported cuda version. edit build.sh"
        exit 1
    fi
else
    export DS_BUILD_OPS=0
fi
# Disable sparse_attn since it requires an exact version of triton==1.0.0
export DS_BUILD_SPARSE_ATTN=0
python -m pip install . -vv
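To check locally which arch list the script above would pick for a given CUDA version, the version-to-arch mapping can be pulled into a small function. This is only a sketch: the function name `arch_list_for_cuda` is hypothetical, and the `11.2` default is an assumption added so the snippet runs outside of conda-build, where `cuda_compiler_version` is normally set for you.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the feedstock): print the arch list for a
# CUDA compiler version, or return non-zero when the version is not handled.
arch_list_for_cuda() {
    case "$1" in
        10.*)      echo "6.0;6.1;7.0;7.5+PTX" ;;
        11.0*)     echo "6.0;6.1;7.0;7.5;8.0+PTX" ;;
        11.1|11.2) echo "6.0;6.1;7.0;7.5;8.0;8.6+PTX" ;;
        *)         return 1 ;;
    esac
}

# Fall back to 11.2 only so the sketch can run standalone.
cuda_compiler_version="${cuda_compiler_version:-11.2}"
if TORCH_CUDA_ARCH_LIST="$(arch_list_for_cuda "${cuda_compiler_version}")"; then
    export TORCH_CUDA_ARCH_LIST
    echo "TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST}"
else
    echo "Unsupported CUDA version: ${cuda_compiler_version}" >&2
fi
```

For CUDA 11.2 this selects the same value as the script, `6.0;6.1;7.0;7.5;8.0;8.6+PTX`, which matches the `-gencode` flags seen in the failing nvcc command (sm_60 through sm_86, plus PTX for compute_86).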
The conda builds worked fine for 0.8.0, so I wonder whether a specific change in 0.8.1 could explain this error. Also, is CUDA 11.2 officially supported? I could not find that information in this repo.