Compilation error for 0.8.1 with CUDA 11.2

The conda-forge bot is building the new deepspeed packages for 0.8.1. See https://github.com/conda-forge/deepspeed-feedstock/pull/6#issuecomment-1436120690 for context.

All CUDA 11.2 builds fail with the error below:

```
  /home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_build_env/bin/x86_64-conda-linux-gnu-c++ -shared -Wl,--allow-shlib-undefined -Wl,-rpath,/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -Wl,-rpath-link,/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -L/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -Wl,--allow-shlib-undefined -Wl,-rpath,/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -Wl,-rpath-link,/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -L/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined 
-Wl,-rpath,/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -Wl,-rpath-link,/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -L/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/work=/usr/local/src/conda/deepspeed-0.8.1 -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p=/usr/local/src/conda-prefix -isystem /usr/local/cuda/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -isystem 
/usr/local/cuda/include build/temp.linux-x86_64-cpython-311/csrc/lamb/fused_lamb_cuda.o build/temp.linux-x86_64-cpython-311/csrc/lamb/fused_lamb_cuda_kernel.o -L/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.11/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-311/deepspeed/ops/lamb/fused_lamb_op.cpython-311-x86_64-linux-gnu.so
  building 'deepspeed.ops.quantizer.quantizer_op' extension
  creating build/temp.linux-x86_64-cpython-311/csrc/quantization
  /usr/local/cuda/bin/nvcc -Icsrc/includes -I/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.11/site-packages/torch/include -I/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.11/site-packages/torch/include/TH -I/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.11/site-packages/torch/include/THC -I/home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/python3.11 -c csrc/quantization/dequantize.cu -o build/temp.linux-x86_64-cpython-311/csrc/quantization/dequantize.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1014\" 
-DTORCH_EXTENSION_NAME=quantizer_op -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -ccbin /home/conda/feedstock_root/build_artifacts/deepspeed_1676841729762/_build_env/bin/x86_64-conda-linux-gnu-cc -std=c++14
  csrc/includes/reduction_utils.h(170): error: no operator "+" matches these operands
              operand types are: const __half + const __half

  csrc/includes/reduction_utils.h(199): error: no operator "+" matches these operands
              operand types are: const __half2 + const __half2

  csrc/includes/dequantization_utils.h(165): warning: constexpr if statements are a C++17 feature
            detected during:
              instantiation of "void dequantize_kernel<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=__half, numBits=8, qType=quantize::Type::Symmetric, unroll=8, threads=512]"
  csrc/quantization/dequantize.cu(45): here
              instantiation of "void launch_dequantize_kernel(T *, const int8_t *, const float *, quantize::Type, int, int, int, cudaStream_t) [with T=__half]"
  csrc/quantization/dequantize.cu(55): here

  csrc/includes/quantization_utils.h(124): error: no suitable conversion function from "__half" to "float" exists
            detected during:
              instantiation of "T quantize::Params<quantize::Type::Asymmetric, numBits>::dequantize<T>(int8_t) [with numBits=8, T=float]"
  csrc/includes/dequantization_utils.h(75): here
              instantiation of "void dequantize::chunk(T *, const int8_t *, dequantize::Params<qType, numBits>) [with T=float, numBits=8, qType=quantize::Type::Asymmetric]"
  csrc/includes/dequantization_utils.h(151): here
              instantiation of "void dequantize::_to_global<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=float, numBits=8, qType=quantize::Type::Asymmetric, unroll=8, threads=512]"
  csrc/includes/dequantization_utils.h(167): here
              instantiation of "void dequantize::to_global<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=float, numBits=8, qType=quantize::Type::Asymmetric, unroll=8, threads=512]"
  csrc/quantization/dequantize.cu(18): here
              instantiation of "void dequantize_kernel<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=float, numBits=8, qType=quantize::Type::Asymmetric, unroll=8, threads=512]"
  csrc/quantization/dequantize.cu(47): here
              instantiation of "void launch_dequantize_kernel(T *, const int8_t *, const float *, quantize::Type, int, int, int, cudaStream_t) [with T=float]"
  csrc/quantization/dequantize.cu(64): here

  csrc/includes/quantization_utils.h(124): error: no suitable conversion function from "__half" to "float" exists
            detected during:
              instantiation of "T quantize::Params<quantize::Type::Asymmetric, numBits>::dequantize<T>(int8_t) [with numBits=4, T=float]"
  csrc/includes/dequantization_utils.h(78): here
              instantiation of "void dequantize::chunk(T *, const int8_t *, dequantize::Params<qType, numBits>) [with T=float, numBits=4, qType=quantize::Type::Asymmetric]"
  csrc/includes/dequantization_utils.h(151): here
              instantiation of "void dequantize::_to_global<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=float, numBits=4, qType=quantize::Type::Asymmetric, unroll=8, threads=512]"
  csrc/includes/dequantization_utils.h(167): here
              instantiation of "void dequantize::to_global<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=float, numBits=4, qType=quantize::Type::Asymmetric, unroll=8, threads=512]"
  csrc/quantization/dequantize.cu(18): here
              instantiation of "void dequantize_kernel<T,numBits,qType,unroll,threads>(T *, const int8_t *, const float *, int, int) [with T=float, numBits=4, qType=quantize::Type::Asymmetric, unroll=8, threads=512]"
  csrc/quantization/dequantize.cu(51): here
              instantiation of "void launch_dequantize_kernel(T *, const int8_t *, const float *, quantize::Type, int, int, int, cudaStream_t) [with T=float]"
  csrc/quantization/dequantize.cu(64): here

  4 errors detected in the compilation of "csrc/quantization/dequantize.cu".
  error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1
  error: subprocess-exited-with-error
  
  × Running setup.py install for deepspeed did not run successfully.
  │ exit code: 1
  ╰─> See above for output
```
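One observation, not a confirmed diagnosis: the nvcc command in the log defines `-D__CUDA_NO_HALF_OPERATORS__`, `-D__CUDA_NO_HALF_CONVERSIONS__` and `-D__CUDA_NO_HALF2_OPERATORS__`. These macros disable the `__half`/`__half2` operator overloads and implicit conversions in the CUDA headers, which would be consistent with the `no operator "+"` and `no suitable conversion function` errors. A minimal sketch for pulling such defines out of a captured command line follows; `list_half_defines` is a hypothetical helper name, not something from the feedstock or DeepSpeed.

```bash
# Hypothetical helper: read an nvcc command line on stdin and print each
# -D__CUDA_NO_*__ define it contains, one per line, sorted and de-duplicated.
list_half_defines() {
  grep -o -e '-D__CUDA_NO_[A-Z0-9_]*__' | sort -u
}

# Shortened copy of the flags from the failing nvcc invocation in the log.
nvcc_flags='-D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr'

printf '%s\n' "$nvcc_flags" | list_half_defines
```

Running the same filter over the 0.8.0 build log would show whether the set of defines changed between the two releases.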

See also the build script below:

```
# Deepspeed ops cannot be built without CUDA
if [[ ${cuda_compiler_version} != "None" ]]; then
  export DS_BUILD_OPS=1

  # Set the CUDA arch list from
  # https://github.com/conda-forge/pytorch-cpu-feedstock/blob/2be0b38024b3b5601fcefce40596fc2a5fce4ab7/recipe/build_pytorch.sh#L94

  if [[ ${cuda_compiler_version} == 10.* ]]; then
    export TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5+PTX"
  elif [[ ${cuda_compiler_version} == 11.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0+PTX"
  elif [[ ${cuda_compiler_version} == 11.1 ]]; then
    export TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6+PTX"
  elif [[ ${cuda_compiler_version} == 11.2 ]]; then
    export TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6+PTX"
  else
    echo "Unsupported cuda version. edit build.sh"
    exit 1
  fi

else
  export DS_BUILD_OPS=0
fi

# Disable sparse_attn since it requires an exact version of triton==1.0.0
export DS_BUILD_SPARSE_ATTN=0

python -m pip install . -vv
```
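To check which arch list a given `cuda_compiler_version` selects, the mapping in the script can be sketched as a standalone function. `arch_list_for` is a hypothetical name used only for illustration; the values are copied from the script above, and the function returns non-zero for anything the script would reject (including `None`, which the script handles separately).

```bash
# Hypothetical helper: echo the TORCH_CUDA_ARCH_LIST that the build script
# would export for the given cuda_compiler_version; return 1 if unsupported.
arch_list_for() {
  case "$1" in
    10.*)      echo "6.0;6.1;7.0;7.5+PTX" ;;
    11.0*)     echo "6.0;6.1;7.0;7.5;8.0+PTX" ;;
    11.1|11.2) echo "6.0;6.1;7.0;7.5;8.0;8.6+PTX" ;;
    *)         return 1 ;;
  esac
}

arch_list_for 11.2   # prints 6.0;6.1;7.0;7.5;8.0;8.6+PTX
```

For 11.2 this matches the `-gencode` flags in the failing nvcc command (`sm_60` through `sm_86`, plus `compute_86` PTX), so the arch list itself appears to be applied as intended.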

The conda builds worked fine for 0.8.0. Is there a specific change in 0.8.1 that could explain this error? Also, is CUDA 11.2 officially supported? I could not find that information in this repo.

Compilation error for 0.8.1 with CUDA 11.2 #2858