How NVIDIA GPU driver, CUDA, and container versions relate

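Layer | Core components | Example command | Mutability
Physical machine (host) | GPU hardware, NVIDIA display driver (e.g. 570.xx) | nvidia-smi (see Driver Version) | Immutable from inside a container; it changes only when the host is upgraded.
Physical machine (host) | CUDA version ceiling set by the driver (e.g. 12.8, so nvcc up to 12.8 is supported) | nvidia-smi (see CUDA Version, top right) | Read-only property of the driver that acts as a "ceiling"; it changes only when the host driver is upgraded.
Image | CUDA Toolkit (nvcc), cuDNN | nvcc -V (run inside the container) | The image itself is immutable.
Container | nvcc version from the image, torch version, cuda-runtime version | nvcc -V, pip show torch | Mutable at run time.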
IMPORTANT

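A bottom-up compatibility chain must hold:

  • Ideally keep nvcc, the CUDA version torch was built for, and cuda-runtime on the same CUDA version; this avoids some odd bugs in CUDA compilation and in the runtime libraries.
  • All three may be lower than the CUDA Version shown by nvidia-smi, because that value is the supported upper bound.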
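
The two rules above can be written down as a small check. This is only a sketch: the function names `major_minor` and `check_chain` are made up for illustration, the comparison is on major.minor only, and the `torch_cuda` value is a hypothetical placeholder because these notes do not state which CUDA version torch was built for.

```python
def major_minor(version: str) -> tuple[int, int]:
    """Reduce a version string such as '12.8.90' to (12, 8)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def check_chain(driver_cuda_ceiling: str, components: dict[str, str]) -> dict:
    """Compare component CUDA versions against the driver's ceiling.

    Returns the components above the ceiling and whether all components
    share the same major.minor version.
    """
    ceiling = major_minor(driver_cuda_ceiling)
    reduced = {name: major_minor(v) for name, v in components.items()}
    above = sorted(name for name, v in reduced.items() if v > ceiling)
    return {
        "above_ceiling": above,
        "all_match": len(set(reduced.values())) <= 1,
    }


report = check_chain(
    "12.8",  # CUDA Version from the nvidia-smi header
    {
        "nvcc": "12.8.93",          # from nvcc --version
        "torch_cuda": "12.8",       # hypothetical value, not shown in these notes
        "cuda_runtime": "12.8.90",  # from pip show nvidia-cuda-runtime-cu12
    },
)
print(report)
```

An empty `above_ceiling` list means nothing exceeds what the driver supports; `all_match` being false is not necessarily an error, only a hint that the three versions have drifted apart.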

Commands for checking detailed version information

nvidia-smi

Thu Apr 23 20:50:41 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.148.08             Driver Version: 570.148.08     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    On  |   00000000:18:00.0 Off |                    0 |
| N/A   32C    P0            123W /  700W |  130220MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H200                    On  |   00000000:2A:00.0 Off |                    0 |
| N/A   32C    P0            125W /  700W |  130308MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H200                    On  |   00000000:3A:00.0 Off |                    0 |
| N/A   31C    P0            115W /  700W |  130308MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H200                    On  |   00000000:5D:00.0 Off |                    0 |
| N/A   30C    P0            114W /  700W |  130308MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H200                    On  |   00000000:9A:00.0 Off |                    0 |
| N/A   31C    P0            114W /  700W |  130308MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H200                    On  |   00000000:AB:00.0 Off |                    0 |
| N/A   32C    P0            114W /  700W |  130308MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H200                    On  |   00000000:BA:00.0 Off |                    0 |
| N/A   31C    P0            115W /  700W |  130308MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H200                    On  |   00000000:DB:00.0 Off |                    0 |
| N/A   30C    P0            114W /  700W |  129828MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A          255235      C   sglang::scheduler_TP0                 13021... |
|    1   N/A  N/A          255236      C   sglang::scheduler_TP1                 13029... |
|    2   N/A  N/A          255237      C   sglang::scheduler_TP2                 13029... |
|    3   N/A  N/A          255238      C   sglang::scheduler_TP3                 13029... |
|    4   N/A  N/A          255239      C   sglang::scheduler_TP4                 13029... |
|    5   N/A  N/A          255240      C   sglang::scheduler_TP5                 13029... |
|    6   N/A  N/A          255241      C   sglang::scheduler_TP6                 13029... |
|    7   N/A  N/A          255242      C   sglang::scheduler_TP7                 12981... |
+-----------------------------------------------------------------------------------------+
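
If the two values in the header are needed in a script, one option is to pull them out with a regular expression. The sketch below works on the header line copied from the output above; `parse_smi_header` is a made-up helper name, and the assumption is that the header keeps the "Driver Version: ... CUDA Version: ..." wording.

```python
import re

# Header line copied from the nvidia-smi output above.
HEADER = (
    "| NVIDIA-SMI 570.148.08             Driver Version: 570.148.08"
    "     CUDA Version: 12.8     |"
)


def parse_smi_header(text: str):
    """Return (driver_version, cuda_ceiling); either may be None if absent."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return (
        driver.group(1) if driver else None,
        cuda.group(1) if cuda else None,
    )


driver_version, cuda_ceiling = parse_smi_header(HEADER)
print(driver_version, cuda_ceiling)
```

On a real machine the text could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` instead of the pasted string.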

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
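
The toolkit version sits on the "release" line of this output. A minimal sketch for extracting it, assuming the line keeps the "release X.Y, VX.Y.Z" shape shown above (`parse_nvcc_release` is a hypothetical helper name):

```python
import re

# Output copied from the nvcc --version run above.
NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
"""


def parse_nvcc_release(text: str):
    """Return (release, full_version), or None if no release line is found."""
    match = re.search(r"release (\d+\.\d+), V(\d+\.\d+\.\d+)", text)
    return match.groups() if match else None


nvcc_release = parse_nvcc_release(NVCC_OUTPUT)
print(nvcc_release)
```

The first element is the value to compare against the CUDA Version ceiling reported by nvidia-smi.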

pip list | grep torch

torch                                    2.9.1

pip show torch

Name: torch
Version: 2.9.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org
Author:
Author-email: PyTorch Team <packages@pytorch.org>
License: BSD-3-Clause
Location: /usr/local/lib/python3.12/dist-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-cufile-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-cusparselt-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvshmem-cu12, nvidia-nvtx-cu12, setuptools, sympy, triton, typing-extensions
Required-by: cache_dit, compressed-tensors, flashinfer-python, outlines, quack-kernels, runai-model-streamer, sglang, st_attn, timm, torch_c_dlpack_ext, torchaudio, torchvision, vllm, vsa, xgrammar
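
The pip output gives the torch package version but not the CUDA version it was built for. That value is exposed as `torch.version.cuda`; the sketch below wraps it in a made-up helper, `torch_cuda_version`, so that it also runs where torch is not installed.

```python
def torch_cuda_version():
    """Return the CUDA version torch was built with, or None.

    None is returned both when torch is not installed and when the
    installed torch is a CPU-only build.
    """
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


built_for = torch_cuda_version()
print(built_for)
```

This is the torch-side number to line up with the nvcc release and the cuda-runtime package version.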

pip show nvidia-cuda-runtime-cu12

Name: nvidia-cuda-runtime-cu12
Version: 12.8.90
Summary: CUDA Runtime native Libraries
Home-page: https://developer.nvidia.com/cuda-zone
Author: Nvidia CUDA Installer Team
Author-email: compute_installer@nvidia.com
License: NVIDIA Proprietary Software
Location: /usr/local/lib/python3.12/dist-packages
Requires:
Required-by: torch
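
The same package versions can be read without shelling out to pip, using the standard library's `importlib.metadata`. A minimal sketch, with `installed_versions` as a hypothetical helper name; packages that are not installed are reported as None rather than raising.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(["torch", "nvidia-cuda-runtime-cu12"])
for name, ver in versions.items():
    print(f"{name}: {ver}")
```

Run inside the container, this reports whatever is installed there at that moment, which matters because the container layer is the one that can change at run time.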