Debugging Techniques for LLM Infra Issues

gogongxt · 2025-12-24

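Inaccurate OOM stack traces

GPU kernels are launched asynchronously, so by the time an error is reported the Python stack may no longer point at the call that caused it.

Set the following environment variable to make kernel launches synchronous and get an accurate stack trace (expect the run to be noticeably slower while it is set):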

export CUDA_LAUNCH_BLOCKING=1
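When the process is started from a Python entry point rather than a shell, the same variable can be set from code. This is a minimal sketch, assuming it runs before CUDA is initialized in the process (safest is before `torch` is imported); the helper name `enable_sync_cuda_launches` is made up for illustration.

```python
import os


def enable_sync_cuda_launches() -> None:
    """Request synchronous kernel launches for this process.

    Assumption: this is called before CUDA is initialized; setting the
    variable after that point is not expected to take effect.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


enable_sync_cuda_launches()

# Import torch (or any other CUDA-using library) only after this point, e.g.:
# import torch
```

Child processes started afterwards inherit the variable, which matters for servers that spawn worker processes.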

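When CUDA Graph is enabled for decode, an OOM may only surface during graph replay, where it cannot be traced back to a specific operator. Disable CUDA Graph and reproduce the problem: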

python -m sglang.launch_server --model-path /path/to/model --disable-cuda-graph

To see where a stuck process is waiting on the Python side, dump its stacks with py-spy:

py-spy dump -p <pid>

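If the CPU is blocked in a synchronizing call (such as .tolist()) while GPU utilization sits at a flat 100% with no fluctuation, the GPU work itself is hung.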
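The "flat 100% with no fluctuation" check can be automated by sampling utilization for a few seconds. The sketch below assumes `nvidia-smi` is on `PATH`; the function names, the 10-sample window, and the 1-second interval are illustrative choices, not values from any tool's documentation.

```python
import subprocess
import time


def read_gpu_utilization() -> list[int]:
    """Return the current utilization (percent) of every GPU via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.splitlines() if line.strip()]


def looks_pinned(samples: list[int], threshold: int = 100) -> bool:
    """True if every sample is at or above the threshold (no fluctuation)."""
    return len(samples) > 0 and all(s >= threshold for s in samples)


def gpu_seems_hung(gpu_index: int = 0, n_samples: int = 10,
                   interval_s: float = 1.0) -> bool:
    """Sample one GPU repeatedly and report whether it stayed pinned."""
    samples = []
    for _ in range(n_samples):
        samples.append(read_gpu_utilization()[gpu_index])
        time.sleep(interval_s)
    return looks_pinned(samples)
```

Calling `gpu_seems_hung(0)` on the affected machine returns `True` only if GPU 0 never dropped below 100% during the window. A healthy but busy workload can also produce this result, so treat it as a hint to combine with the py-spy output rather than as proof.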

Attach cuda-gdb to the process and print a backtrace to see where it is stuck:

cuda-gdb -p <pid>
(gdb) bt

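If the backtrace is stuck inside one of CUDA's .so shared libraries, the likely cause is a version mismatch between nvcc, the CUDA driver, and the CUDA runtime that torch uses. Check the installed versions: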

nvcc -V
pip show nvidia-cuda-runtime-cu12

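Keep the versions aligned down to the minor number (for example, all 12.8). After upgrading, rebuild the library that was hanging: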

python setup.py install
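The major.minor comparison described above can be scripted so it runs before each rebuild. This is a sketch under assumptions: it relies on `nvcc -V` printing a `release <major>.<minor>` fragment, and the helper names are hypothetical. The version strings are passed in as plain text so the snippet does not depend on any package being installed.

```python
import re
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract 'major.minor' from the text printed by `nvcc -V`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def major_minor(version: str) -> str:
    """Reduce a version such as '12.8.90' to '12.8'."""
    return ".".join(version.split(".")[:2])


def versions_agree(*versions: str) -> bool:
    """True if all given versions share the same major.minor."""
    return len({major_minor(v) for v in versions}) == 1
```

In practice the inputs would be the output of `nvcc -V`, the `Version:` field from `pip show nvidia-cuda-runtime-cu12`, and `torch.version.cuda` (the CUDA version torch was built against). For instance, `versions_agree("12.8", "12.4.127")` is `False`, which would indicate that one side needs to be upgraded before rebuilding.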