The error is as follows:

docker run -it --gpus all nvcr.io/nvidia/pytorch:24.08-py3 /bin/bash

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Before applying the fix, the host GPU node must already detect its GPUs:

Load the NVIDIA driver, then confirm with:

nvidia-smi
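
As a rough sketch of this pre-check, the function below fails early if nvidia-smi is missing or reports no GPUs. The name check_gpu_driver is made up for illustration; the check relies on `nvidia-smi -L`, which prints one line per detected GPU.

```shell
# Hypothetical helper: succeeds only if nvidia-smi exists and lists at least one GPU.
check_gpu_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: install the NVIDIA driver first" >&2
        return 1
    fi
    # `nvidia-smi -L` prints one line per GPU, each starting with "GPU <index>:".
    gpu_count=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true)
    if [ "$gpu_count" -eq 0 ]; then
        echo "driver tools present but no GPU listed" >&2
        return 1
    fi
    echo "host sees $gpu_count GPU(s)"
}
```

Run check_gpu_driver on the host before touching the Docker setup; a non-zero return means the driver problem has to be solved first.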

The fix:

The nvidia-container-toolkit dependency is missing (install it with yum).

First add the nvidia-container-toolkit repo:

yum config-manager --add-repo https://nvidia.github.io/nvidia-docker/centos8/nvidia-docker.repo

Check the newly added repo:

cat /etc/yum.repos.d/nvidia-docker.repo

Install nvidia-container-toolkit:

yum install nvidia-container-toolkit
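
To confirm the install took effect, something like the following could be used. It is only a sketch: check_toolkit_installed is a made-up name, and the second check assumes the nvidia-container-runtime binary named in daemon.json is expected on PATH, which may depend on the package version.

```shell
# Hypothetical helper: reports whether the toolkit package is installed.
check_toolkit_installed() {
    if ! rpm -q nvidia-container-toolkit >/dev/null 2>&1; then
        echo "nvidia-container-toolkit is not installed" >&2
        return 1
    fi
    echo "package: $(rpm -q nvidia-container-toolkit)"
    # Informational only: daemon.json refers to this binary by name.
    if command -v nvidia-container-runtime >/dev/null 2>&1; then
        echo "runtime binary: $(command -v nvidia-container-runtime)"
    else
        echo "runtime binary: not on PATH"
    fi
}
```

The function returns non-zero only when the package itself is missing; the runtime binary line is a hint, not a hard failure.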

The /etc/docker/daemon.json on this node, with the nvidia runtime registered:

[root@asc2-gn01 yum.repos.d]# cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "registry-mirrors": [
        "https://docker.nju.edu.cn",
        "https://mirror.baidubce.com",
        "https://hub-mirror.c.163.com",
        "https://docker.mirrors.ustc.edu.cn"
    ],
    "data-root": "/gpfs/docker"
}
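
A quick way to check that a daemon.json declares the nvidia runtime might look like this. The helper name has_nvidia_runtime is hypothetical, and the check is a plain text match, so it does not validate the JSON syntax itself.

```shell
# Hypothetical helper: checks that a daemon.json declares a runtime named "nvidia".
# Returns 0 if found, 1 if not found, 2 if the file cannot be read.
has_nvidia_runtime() {
    config_file=${1:-/etc/docker/daemon.json}
    if [ ! -r "$config_file" ]; then
        echo "cannot read $config_file" >&2
        return 2
    fi
    # Match a JSON key "nvidia" followed by a colon.
    grep -q '"nvidia"[[:space:]]*:' "$config_file"
}
```

Called without arguments it inspects /etc/docker/daemon.json; pass another path to check a copy before putting it in place.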

 

Restart the Docker service:

systemctl restart docker
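
After the restart, it can help to confirm that the daemon actually picked up the runtime. The sketch below queries `docker info` with a Go template and looks for an nvidia entry; docker_lists_nvidia_runtime is an illustrative name, not part of Docker.

```shell
# Hypothetical helper: checks whether the running Docker daemon lists an "nvidia" runtime.
# Returns 0 if listed, 1 if not listed, 2 if `docker info` itself fails.
docker_lists_nvidia_runtime() {
    # .Runtimes is a map keyed by runtime name; print it as JSON.
    runtimes=$(docker info --format '{{json .Runtimes}}' 2>/dev/null) || {
        echo "docker info failed: is the daemon running?" >&2
        return 2
    }
    case "$runtimes" in
        *'"nvidia"'*) echo "nvidia runtime registered" ;;
        *) echo "nvidia runtime not registered" >&2; return 1 ;;
    esac
}
```

A return code of 2 usually points at the daemon not having come back up, for example because daemon.json is not valid JSON.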

Start the PyTorch container, passing the flag that exposes the host GPUs to it:

docker run -it --gpus all nvcr.io/nvidia/pytorch:24.08-py3 /bin/bash

Verify that the container detects the host GPUs:

nvidia-smi
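
The same verification can be done without an interactive shell. The following is a minimal sketch, assuming the image used in this post as the default; gpu_smoke_test is a made-up name.

```shell
# Hypothetical helper: starts a throwaway container and lists the GPUs it can see.
gpu_smoke_test() {
    image=${1:-nvcr.io/nvidia/pytorch:24.08-py3}
    # --rm removes the container on exit; `nvidia-smi -L` prints one line per visible GPU.
    docker run --rm --gpus all "$image" nvidia-smi -L
}
```

If the original error comes back here, the toolkit is still not being picked up by the Docker daemon.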
