[vllm fix] CUDA error: the provided PTX was compiled with an unsupported toolchain.
Scenario:
Running inference on qwen3-8b with vllm fails with: torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
The full error output is shown below:
(EngineCore_DP0 pid=3675948) ERROR 01-22 13:05:52 [core.py:936] torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
(EngineCore_DP0 pid=3675948) ERROR 01-22 13:05:52 [core.py:936] Search for `cudaErrorUnsupportedPtxVersion' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(EngineCore_DP0 pid=3675948) ERROR 01-22 13:05:52 [core.py:936] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=3675948) ERROR 01-22 13:05:52 [core.py:936] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=3675948) ERROR 01-22 13:05:52 [core.py:936] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(EngineCore_DP0 pid=3675948) ERROR 01-22 13:05:52 [core.py:936]
Solution:
import os
# Must run before vllm is imported
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"
Alternatively, run this in the terminal before launching vllm:
export VLLM_ATTENTION_BACKEND=FLASHINFER
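Because the variable only takes effect if it is set before vllm is imported, it can help to guard against setting it too late. The following is a minimal sketch, not part of vllm itself: the helper name select_attention_backend is hypothetical, and it assumes that checking sys.modules is a sufficient test for whether vllm has already been loaded in the current process. It also assumes the FlashInfer backend is actually available in your environment.

```python
import os
import sys


def select_attention_backend(name: str = "FLASHINFER") -> bool:
    """Set VLLM_ATTENTION_BACKEND for this process.

    Hypothetical helper (not a vllm API). Returns True if the variable
    was set before vllm was imported, False if vllm was already loaded,
    in which case the setting may be ignored.
    """
    already_imported = "vllm" in sys.modules
    os.environ["VLLM_ATTENTION_BACKEND"] = name
    return not already_imported


applied_in_time = select_attention_backend()
if not applied_in_time:
    # vllm was imported earlier; restart the process and set the variable first.
    print("Warning: vllm is already imported; the backend setting may not apply.")

# Import vllm only after this point, for example:
# from vllm import LLM
```

Calling the helper at the very top of the entry script, before any module that imports vllm, keeps the ordering requirement in one visible place.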