SGLang_GLM_Installation_Guide
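Detailed Guide to Deploying GLM Models with SGLang
Document information
- Target environment: Ubuntu 22.04
- Hardware: 8 CPU cores / 32 GB RAM / 32 GB GPU memory
- Target models: GLM-4-9B-Chat / ChatGLM3-6B
- Deployment framework: SGLang
- Python version: 3.12
- Document version: v1.0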
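1. System Environment Check
1.1 Hardware Verification
# Check the number of CPU cores
lscpu | grep "^CPU(s):"
# Check memory size
free -h
# Check GPU information
nvidia-smi
# Check disk space (at least 50 GB free is recommended)
df -h
Expected results:
- CPU cores ≥ 8
- RAM ≥ 32 GB
- GPU memory ≥ 32 GB
- Free disk space ≥ 50 GB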
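The CPU, RAM, and disk thresholds above can also be checked in one pass. The following is a minimal sketch, not part of the original guide: the function names and the 10% RAM tolerance (a "32 GB" machine usually reports slightly less usable memory) are assumptions, and the GPU check is left to nvidia-smi.

```python
import os
import shutil

# Thresholds taken from the checklist above.
MIN_CPU_CORES = 8
MIN_RAM_GB = 32
MIN_DISK_GB = 50
# Assumption: accept up to 10% less RAM than nominal, because the kernel
# reserves part of the installed memory.
RAM_TOLERANCE = 0.9


def read_total_ram_gb(meminfo_path="/proc/meminfo"):
    """Return total RAM in GiB as reported by the Linux kernel."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                kib = int(line.split()[1])  # the value is reported in kB
                return kib / 1024 ** 2
    raise RuntimeError("MemTotal not found in " + meminfo_path)


def check_host(cpu_cores, ram_gb, disk_free_gb):
    """Return a list of failed checks; an empty list means the host qualifies."""
    failures = []
    if cpu_cores < MIN_CPU_CORES:
        failures.append(f"CPU cores: {cpu_cores} < {MIN_CPU_CORES}")
    if ram_gb < MIN_RAM_GB * RAM_TOLERANCE:
        failures.append(f"RAM: {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if disk_free_gb < MIN_DISK_GB:
        failures.append(f"Free disk: {disk_free_gb:.1f} GB < {MIN_DISK_GB} GB")
    return failures


def collect_host_facts(path="/"):
    """Gather the three measurements on a Linux host."""
    disk_free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return os.cpu_count() or 0, read_total_ram_gb(), disk_free_gb


# Usage on the target server:
#   print(check_host(*collect_host_facts()) or "all checks passed")
```

Keeping `check_host` free of I/O makes it easy to try with hand-entered numbers before running it on the real machine.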
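1.2 Operating System Version
# Check the Ubuntu version
lsb_release -a
# Check the kernel version
uname -r
Expected output:
- Ubuntu 22.04 LTS
- Kernel version ≥ 5.15
1.3 GPU Driver Check
# Check the NVIDIA driver version
nvidia-smi | grep "Driver Version"
# Check the CUDA version (if installed)
nvcc --version 2>/dev/null || echo "CUDA is not installed"
Requirements:
- NVIDIA driver version ≥ 525.60.13
- CUDA version ≥ 12.0 (12.1 or later recommended)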
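2. Scenario 1: Fully Offline Installation
2.1 Preparation (on a Windows machine with internet access)
2.1.1 Create the download directories
Create the download directories in Windows PowerShell:
# Create the download directory structure
New-Item -ItemType Directory -Path "C:\offline_packages\system" -Force
New-Item -ItemType Directory -Path "C:\offline_packages\python\packages" -Force
New-Item -ItemType Directory -Path "C:\offline_packages\models" -Force
# Enter the download directory
cd C:\offline_packages
2.1.2 Download system packages
⚠️ Important: version numbers change regularly
Package version numbers on the Tsinghua mirror change as security updates are published. If a download link below returns a 404 error, find the current version as follows:
- Open the Tsinghua mirror directory: https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/
- Find the latest version number (for example 3.12.3-1ubuntu0.13)
- Replace the version number in the commands below with the latest one
Download all dependencies with wget (Windows)
# Install wget if it is missing:
# download wget.exe from https://eternallybored.org/misc/wget/ and place it in C:\Windows\System32\
# In Windows PowerShell 5.1, "wget" is a built-in alias for Invoke-WebRequest; remove it so the commands below call wget.exe
Remove-Item Alias:wget -Force -ErrorAction SilentlyContinue
# Enter the system directory
cd C:\offline_packages\system
# Download the Python 3.12 packages (for Ubuntu 22.04)
# Note: Ubuntu 22.04 itself ships Python 3.10, and this pool directory is shared by several Ubuntu releases,
# so confirm that the build you pick can be installed on 22.04 (its dependencies, such as libc6, must be satisfiable there)
# Note: version numbers may change; check the Tsinghua mirror for the current ones
# Latest version at the time of writing: 3.12.3-1ubuntu0.13 (2026-04-08)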
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/python3.12_3.12.3-1ubuntu0.13_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/python3.12-minimal_3.12.3-1ubuntu0.13_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/python3.12-dev_3.12.3-1ubuntu0.13_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/libpython3.12-dev_3.12.3-1ubuntu0.13_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/libpython3.12-stdlib_3.12.3-1ubuntu0.13_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/libpython3.12-minimal_3.12.3-1ubuntu0.13_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python-pip/python3-pip_23.0.1+dfsg-1_all.deb
# Download build tools and their dependencies
# Note: the version numbers below may change; check the Tsinghua mirror for the current ones
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/b/build-essential/build-essential_12.9ubuntu3_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/gcc-11/gcc-11_11.4.0-1ubuntu1~22.04.3_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/gcc-11/g++-11_11.4.0-1ubuntu1~22.04.3_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/m/make/make_4.3-4.1build1_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/d/dpkg-dev/dpkg-dev_1.21.1ubuntu2.3_all.deb
# Download other tools
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/git/git_2.34.1-1ubuntu1.17_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/c/curl/curl_7.81.0-1ubuntu1.15_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/w/wget/wget_1.21.2-2ubuntu1_amd64.deb
# Download CUDA Toolkit 12.1
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
# List the downloaded files
dir
Alternative: download the latest versions with a script
If the links above no longer work, the following PowerShell script looks up and downloads the latest versions:
# Create the download script
$script = @'
# Python 3.12 package download script
$baseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/"
$packages = @(
    "python3.12_",
    "python3.12-minimal_",
    "python3.12-dev_",
    "libpython3.12-dev_",
    "libpython3.12-stdlib_",
    "libpython3.12-minimal_"
)
Write-Host "Looking for the latest Python 3.12 packages..." -ForegroundColor Yellow
# Fetch the directory listing
try {
    $response = Invoke-WebRequest -Uri $baseUrl -UseBasicParsing
    $content = $response.Content
    foreach ($pkg in $packages) {
        # Find the amd64 builds of this package
        $pattern = "$pkg(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"
        # Use a name other than $matches, which is a PowerShell automatic variable
        $regexMatches = [regex]::Matches($content, $pattern)
        if ($regexMatches.Count -gt 0) {
            # Take the last match (usually, but not always, the newest version)
            $latest = $regexMatches[$regexMatches.Count - 1].Value
            $url = "$baseUrl$latest"
            $output = ".\$latest"
            Write-Host "Downloading: $latest" -ForegroundColor Green
            # Invoke-WebRequest avoids the wget alias ambiguity in Windows PowerShell
            Invoke-WebRequest -Uri $url -OutFile $output
        }
    }
    Write-Host "`nAll Python 3.12 packages downloaded!" -ForegroundColor Green
} catch {
    Write-Host "Error: $_" -ForegroundColor Red
}
'@
# Save the script
$script | Out-File -FilePath ".\download_python312.ps1" -Encoding UTF8
# Run the script
.\download_python312.ps1
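The script above treats the last regex match as the newest version. If the mirror lists files in plain string order, that assumption can fail: 3.12.3-1ubuntu0.9 sorts after 3.12.3-1ubuntu0.13 as text. The sketch below shows one way to pick the newest file by comparing the numeric parts. It is a simplification (it does not implement the full Debian version-ordering rules), and the function names are hypothetical.

```python
import re


def version_key(filename):
    """Turn 'name_VERSION_arch.deb' into a tuple of integers for comparison.

    Simplification: only the digit runs of the version are compared, which is
    enough for versions that differ in their numeric parts.
    """
    version = filename.split("_")[1]
    return tuple(int(n) for n in re.findall(r"\d+", version))


def newest(filenames):
    """Return the file name with the highest numeric version."""
    return max(filenames, key=version_key)


candidates = [
    "python3.12_3.12.3-1ubuntu0.9_amd64.deb",
    "python3.12_3.12.3-1ubuntu0.13_amd64.deb",
    "python3.12_3.12.3-1ubuntu0.2_amd64.deb",
]
print(newest(candidates))
```

The same idea could be applied inside the PowerShell script by sorting the matches with a numeric key before taking the last one.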
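2.1.3 Download Python packages
⚠️ Important: why must the Linux platform be specified?
Python packages fall into two categories, and the difference matters when they are moved between platforms:
1. Pure-Python packages (platform-independent)
- Examples: transformers, accelerate, modelscope
- Characteristics: contain only Python source (.py files) and no compiled binaries
- Cross-platform: ✅ usable on any operating system; a package downloaded on Windows installs directly on Ubuntu
2. Packages with compiled extensions (platform-specific)
- Examples: torch and flash-attn, and also sentencepiece, tiktoken, tokenizers, and safetensors
- Characteristics: contain compiled binaries (.pyd, .so, .dll, and so on) built for a specific operating system and CPU architecture
- Cross-platform: ❌ a Windows download contains .pyd files (Windows DLL format), while Ubuntu needs .so files (Linux shared-library format)
- File name difference:
# Windows build (wrong)
torch-2.2.1+cu121-cp312-cp312-win_amd64.whl
# Linux build (correct)
torch-2.2.1+cu121-cp312-cp312-manylinux2014_x86_64.whl
Solution: run pip download on Windows with the Linux platform specified
Step 1: create requirements.txt
Create a requirements.txt file in C:\offline_packages\python\ with the following content: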
sglang>=0.2.0
torch==2.2.1
transformers==4.40.0
accelerate==0.28.0
sentencepiece==0.2.0
protobuf==4.25.3
tiktoken==0.6.0
modelscope==1.11.0
safetensors==0.4.3
tokenizers==0.15.2
flash-attn==2.5.6
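Step 2: download the Linux builds of the Python packages
# Make sure Python 3.12+ and pip are installed
python --version
# Enter the python directory
cd C:\offline_packages\python
# Method 1: download everything in one go (recommended)
# --platform manylinux2014_x86_64: target the Linux platform
# --python-version 312: target Python 3.12
# --only-binary=:all: download wheels only (no source distributions)
pip download -r requirements.txt -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all:
# Method 2: if the command above fails, download in separate steps to find the failing package
# Keep the platform flags on every command: sentencepiece, tiktoken, safetensors, and tokenizers contain compiled extensions,
# and without the flags pip would also resolve dependencies to Windows wheels
pip download transformers==4.40.0 accelerate==0.28.0 sentencepiece==0.2.0 protobuf==4.25.3 tiktoken==0.6.0 modelscope==1.11.0 safetensors==0.4.3 tokenizers==0.15.2 -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all:
# Then download the remaining packages
pip download torch==2.2.1 -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all:
pip download sglang -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all:
pip download flash-attn==2.5.6 -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: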
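Step 3: verify the downloaded packages
Verification method 1: check the platform tag in the file name
# List all downloaded packages
dir .\packages
# Check that the torch package is a Linux build
Get-ChildItem .\packages\torch*.whl | Select-Object Name
# ✅ A correct file name contains a "manylinux" tag, for example:
# torch-2.2.1+cu121-cp312-cp312-manylinux2014_x86_64.whl
# ❌ A wrong file name contains a "win" tag (installation on Ubuntu will fail):
# torch-2.2.1+cu121-cp312-cp312-win_amd64.whl
Verification method 2: check every platform-specific package
# Check the platform tag of every .whl file
Get-ChildItem .\packages\*.whl | ForEach-Object {
    $name = $_.Name
    if ($name -match "manylinux") {
        Write-Host "✅ Linux build: $name" -ForegroundColor Green
    } elseif ($name -match "-win") {
        # Match "-win" rather than "win" so package names that merely contain "win" are not flagged
        Write-Host "❌ Windows build (wrong): $name" -ForegroundColor Red
    } else {
        Write-Host "ℹ️ Platform-independent: $name" -ForegroundColor Cyan
    }
}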
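The PowerShell loop above classifies wheels by searching the file name for a substring. A slightly stricter check reads the platform tag itself, which is the last dash-separated field of a wheel file name. The sketch below illustrates that idea; the function names and the category labels are hypothetical.

```python
def wheel_platform(filename):
    """Return the platform tag, i.e. the last '-' field before '.whl'."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    stem = filename[: -len(".whl")]
    return stem.split("-")[-1]


def classify(filename):
    """Map a wheel file name to 'linux', 'windows', 'platform-independent' or 'other'."""
    platform = wheel_platform(filename)
    if platform == "any":
        return "platform-independent"
    if "linux" in platform:  # manylinux1, manylinux2014, manylinux_2_17, ...
        return "linux"
    if platform.startswith("win"):  # win32, win_amd64, win_arm64
        return "windows"
    return "other"


for name in [
    "torch-2.2.1-cp312-cp312-manylinux1_x86_64.whl",
    "tokenizers-0.15.2-cp312-none-win_amd64.whl",
    "transformers-4.40.0-py3-none-any.whl",
]:
    print(f"{classify(name):22s} {name}")
```

Anything reported as "windows" or "other" in the packages directory would need to be downloaded again with the Linux platform flags.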
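Verification method 3: unpack and inspect the package contents (optional)
# Create a temporary directory
mkdir temp_check
cd temp_check
# Unpack the torch wheel
# with 7-Zip or another archive tool
& "C:\Program Files\7-Zip\7z.exe" x ..\packages\torch*.whl
# Look for .so files (Linux shared libraries)
Get-ChildItem -Recurse -Filter "*.so" | Select-Object FullName
# Any output means this is a Linux build ✅
# No .so files but .pyd files instead means this is a Windows build ❌
# Clean up the temporary directory
cd ..
Remove-Item temp_check -Recurse -Force
Troubleshooting:
| Problem | Cause | Solution |
|---|---|---|
| File name contains "win" | The --platform option was not given | Download again with --platform manylinux2014_x86_64 |
| pip download fails | The package cannot be downloaded for the specified platform | Use method 2 to download step by step, or download on Ubuntu |
| "not a supported wheel" error during installation | Platform mismatch | Check the file name and make sure it carries a manylinux tag |
Alternative: download on an Ubuntu machine with internet access (recommended)
If downloading Linux packages on Windows causes problems:
- Find an Ubuntu machine with internet access (a virtual machine, WSL, or a temporary cloud host)
- Run the download commands on Ubuntu:
# Create the directories on Ubuntu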
mkdir -p ~/offline_packages/python/packages
cd ~/offline_packages/python
# Create requirements.txt
cat > requirements.txt << 'EOF'
sglang>=0.2.0
torch==2.2.1
transformers==4.40.0
accelerate==0.28.0
sentencepiece==0.2.0
protobuf==4.25.3
tiktoken==0.6.0
modelscope==1.11.0
safetensors==0.4.3
tokenizers==0.15.2
flash-attn==2.5.6
EOF
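# Download all dependencies (run pip under Python 3.12 so the wheels match the target interpreter)
pip download -r requirements.txt -d ./packages
# Create the archive
cd ~
tar -czf python_packages.tar.gz offline_packages/python/
# Transfer it to Windows, then on to the target Ubuntu server
2.1.4 Download the GLM-4 model
Download with the ModelScope command-line tool
# Install modelscope
pip install modelscope
# Enter the models directory
cd C:\offline_packages\models
# Download GLM-4-9B-Chat (recommended; fits in 32 GB of GPU memory)
modelscope download --model ZhipuAI/glm-4-9b-chat --local_dir .\glm-4-9b-chat
# Or download ChatGLM3-6B (smaller; suitable for testing)
# modelscope download --model ZhipuAI/chatglm3-6b --local_dir .\chatglm3-6b
Verify that the model files are complete:
# Inspect the model files
cd C:\offline_packages\models\glm-4-9b-chat
dir
# The directory should contain the following files:
# config.json
# configuration_chatglm.py
# model-00001-of-00000x.safetensors (several shard files)
# model.safetensors.index.json
# tokenizer.model
# tokenizer_config.json
# tokenization_chatglm.py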
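Listing the directory shows which files exist, but not whether every weight shard arrived. model.safetensors.index.json contains a "weight_map" object that maps each tensor name to the shard file holding it, so the expected shard names can be read from it. The following is a small sketch under that assumption; the function name is hypothetical.

```python
import json
from pathlib import Path


def missing_shards(model_dir):
    """Return the shard files named in the index that are absent from model_dir."""
    model_dir = Path(model_dir)
    index_path = model_dir / "model.safetensors.index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    # weight_map: tensor name -> shard file name (many tensors share one shard)
    shard_names = set(index["weight_map"].values())
    return sorted(name for name in shard_names if not (model_dir / name).is_file())


# Usage, with the directory used in this guide:
#   missing = missing_shards(r"C:\offline_packages\models\glm-4-9b-chat")
#   print("complete" if not missing else f"missing shards: {missing}")
```

This only confirms that the shards are present. It does not detect a truncated or corrupted file, for which a checksum comparison would be needed.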
2.1.5 Package all files
Compress with 7-Zip (recommended; better compression ratio)
# If 7-Zip is installed
& "C:\Program Files\7-Zip\7z.exe" a -tzip C:\sglang_glm_offline_package.zip C:\offline_packages\
# Or create a tar.gz archive (to be extracted on Ubuntu)
& "C:\Program Files\7-Zip\7z.exe" a -ttar C:\sglang_glm_offline_package.tar C:\offline_packages\
& "C:\Program Files\7-Zip\7z.exe" a -tgzip C:\sglang_glm_offline_package.tar.gz C:\sglang_glm_offline_package.tar
# List the archive contents
& "C:\Program Files\7-Zip\7z.exe" l C:\sglang_glm_offline_package.zip
# Show the archive size
Get-Item C:\sglang_glm_offline_package.zip | Select-Object Name, @{Name="Size(GB)";Expression={[math]::Round($_.Length/1GB,2)}}
2.1.6 Complete download script
# Save the code below as download_all.ps1 and run .\download_all.ps1
# SGLang GLM Offline Package Download Script
# For Windows - Download Ubuntu 22.04 dependencies
$ErrorActionPreference = "Continue"
$baseDir = "C:\offline_packages"
$systemDir = "$baseDir\system"
$pythonDir = "$baseDir\python"
$packagesDir = "$pythonDir\packages"
$modelsDir = "$baseDir\models"
Write-Host "========================================" -ForegroundColor Cyan
Write-Host "SGLang GLM Offline Package Download" -ForegroundColor Cyan
Write-Host "========================================" -ForegroundColor Cyan
Write-Host ""
# Create directory structure
Write-Host "[1/5] Creating directories..." -ForegroundColor Yellow
New-Item -ItemType Directory -Path $systemDir -Force | Out-Null
New-Item -ItemType Directory -Path $packagesDir -Force | Out-Null
New-Item -ItemType Directory -Path $modelsDir -Force | Out-Null
Write-Host " Done" -ForegroundColor Green
# Download function
function Download-File {
param(
[string]$Url,
[string]$Output
)
$fileName = Split-Path $Output -Leaf
if (Test-Path $Output) {
Write-Host " Exists: $fileName" -ForegroundColor Gray
return $true
}
try {
Write-Host " Downloading: $fileName" -ForegroundColor Green
Invoke-WebRequest -Uri $Url -OutFile $Output -TimeoutSec 300
return $true
} catch {
Write-Host " Failed: $($_.Exception.Message)" -ForegroundColor Red
return $false
}
}
# Find and download latest package
function Download-LatestPackage {
param(
[string]$BaseUrl,
[string]$Pattern,
[string]$PackageName
)
try {
Write-Host " Finding latest: $PackageName..." -ForegroundColor Yellow
$response = Invoke-WebRequest -Uri $BaseUrl -UseBasicParsing -TimeoutSec 30
$content = $response.Content
$regexMatches = [regex]::Matches($content, $Pattern)
if ($regexMatches.Count -gt 0) {
$latest = $regexMatches[$regexMatches.Count - 1].Value
$url = "$BaseUrl$latest"
$output = Join-Path $systemDir $latest
if (Test-Path $output) {
Write-Host " Exists: $latest" -ForegroundColor Gray
return $true
}
Write-Host " Downloading: $latest" -ForegroundColor Green
Invoke-WebRequest -Uri $url -OutFile $output -TimeoutSec 300
return $true
} else {
Write-Host " Not found" -ForegroundColor Red
return $false
}
} catch {
Write-Host " Error: $($_.Exception.Message)" -ForegroundColor Red
return $false
}
}
# 2. Download system packages
Write-Host ""
Write-Host "[2/5] Downloading system packages..." -ForegroundColor Yellow
# Python 3.12 packages
$python312BaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/"
$python312Packages = @(
@{Name="python3.12"; Pattern="python3\.12_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
@{Name="python3.12-minimal"; Pattern="python3\.12-minimal_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
@{Name="python3.12-dev"; Pattern="python3\.12-dev_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
@{Name="libpython3.12-dev"; Pattern="libpython3\.12-dev_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
@{Name="libpython3.12-stdlib"; Pattern="libpython3\.12-stdlib_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
@{Name="libpython3.12-minimal"; Pattern="libpython3\.12-minimal_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"}
)
foreach ($pkg in $python312Packages) {
Download-LatestPackage -BaseUrl $python312BaseUrl -Pattern $pkg.Pattern -PackageName $pkg.Name
}
# python3-pip
Download-File -Url "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python-pip/python3-pip_23.0.1+dfsg-1_all.deb" -Output "$systemDir\python3-pip_23.0.1+dfsg-1_all.deb"
# GCC packages
$gccBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/gcc-11/"
# Ubuntu 22.04 builds carry a ~22.04 suffix (e.g. 11.4.0-1ubuntu1~22.04.3), so the pattern must allow it
Download-LatestPackage -BaseUrl $gccBaseUrl -Pattern "gcc-11_(\d+\.\d+\.\d+-\d+ubuntu\d+~22\.04(\.\d+)?)_amd64\.deb" -PackageName "gcc-11"
Download-LatestPackage -BaseUrl $gccBaseUrl -Pattern "g\+\+-11_(\d+\.\d+\.\d+-\d+ubuntu\d+~22\.04(\.\d+)?)_amd64\.deb" -PackageName "g++-11"
# build-essential
Download-File -Url "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/b/build-essential/build-essential_12.9ubuntu3_amd64.deb" -Output "$systemDir\build-essential_12.9ubuntu3_amd64.deb"
# make
$makeBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/m/make/"
Download-LatestPackage -BaseUrl $makeBaseUrl -Pattern "make_(\d+\.\d+-\d+\.\d+build\d+)_amd64\.deb" -PackageName "make"
# dpkg-dev
$dpkgBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/d/dpkg-dev/"
Download-LatestPackage -BaseUrl $dpkgBaseUrl -Pattern "dpkg-dev_(\d+\.\d+\.\d+ubuntu\d+\.\d+)_all\.deb" -PackageName "dpkg-dev"
# git
$gitBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/git/"
Download-LatestPackage -BaseUrl $gitBaseUrl -Pattern "git_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb" -PackageName "git"
# curl
$curlBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/c/curl/"
Download-LatestPackage -BaseUrl $curlBaseUrl -Pattern "curl_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb" -PackageName "curl"
# wget
$wgetBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/w/wget/"
Download-LatestPackage -BaseUrl $wgetBaseUrl -Pattern "wget_(\d+\.\d+\.\d+-\d+ubuntu\d+)_amd64\.deb" -PackageName "wget"
Write-Host " System packages done" -ForegroundColor Green
# 3. Create requirements.txt and download Python packages
Write-Host ""
Write-Host "[3/5] Creating requirements.txt and downloading Python packages..." -ForegroundColor Yellow
$requirementsContent = @"
sglang>=0.2.0
torch==2.2.1
transformers==4.40.0
accelerate==0.28.0
sentencepiece==0.2.0
protobuf==4.25.3
tiktoken==0.6.0
modelscope==1.11.0
safetensors==0.4.3
tokenizers==0.15.2
flash-attn==2.5.6
"@
$requirementsPath = "$pythonDir\requirements.txt"
[System.IO.File]::WriteAllText($requirementsPath, $requirementsContent, [System.Text.Encoding]::UTF8)
Write-Host " requirements.txt created" -ForegroundColor Green
# Download the pinned packages for the Linux target
# (several of them, e.g. sentencepiece and tokenizers, ship compiled extensions, so the platform flags are required)
Write-Host " Downloading pinned Python packages (Linux)..." -ForegroundColor Cyan
$pinnedPackages = @(
    "transformers==4.40.0",
    "accelerate==0.28.0",
    "sentencepiece==0.2.0",
    "protobuf==4.25.3",
    "tiktoken==0.6.0",
    "modelscope==1.11.0",
    "safetensors==0.4.3",
    "tokenizers==0.15.2"
)
foreach ($pkg in $pinnedPackages) {
    Write-Host " Downloading: $pkg" -ForegroundColor Green
    & pip download $pkg -d $packagesDir --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: 2>&1 | Out-Null
}
# Download Linux C extension packages
Write-Host " Downloading Linux C extension packages..." -ForegroundColor Cyan
Write-Host " Downloading: torch==2.2.1 (Linux)" -ForegroundColor Green
& pip download torch==2.2.1 -d $packagesDir --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: 2>&1 | Out-Null
Write-Host " Downloading: sglang (Linux)" -ForegroundColor Green
& pip download sglang -d $packagesDir --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: 2>&1 | Out-Null
Write-Host " Downloading: flash-attn==2.5.6 (Linux)" -ForegroundColor Green
& pip download flash-attn==2.5.6 -d $packagesDir --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: 2>&1 | Out-Null
Write-Host " Python packages done" -ForegroundColor Green
# 4. Download GLM model
Write-Host ""
Write-Host "[4/5] Downloading GLM-4-9B-Chat model..." -ForegroundColor Yellow
Write-Host " This will download about 18GB, please wait..." -ForegroundColor Yellow
$modelDir = "$modelsDir\glm-4-9b-chat"
if (Test-Path $modelDir) {
Write-Host " Model directory exists, skipping" -ForegroundColor Gray
} else {
$modelscopeCheck = pip show modelscope 2>&1
if ($LASTEXITCODE -ne 0) {
Write-Host " Installing modelscope..." -ForegroundColor Yellow
& pip install modelscope 2>&1 | Out-Null
}
Write-Host " Starting model download..." -ForegroundColor Green
& modelscope download --model ZhipuAI/glm-4-9b-chat --local_dir $modelDir
}
Write-Host " Model download done" -ForegroundColor Green
# 5. Verify downloads
Write-Host ""
Write-Host "[5/5] Verifying downloads..." -ForegroundColor Yellow
Write-Host ""
Write-Host "System packages:" -ForegroundColor Cyan
$systemFiles = Get-ChildItem $systemDir -Filter "*.deb"
$totalSystemSize = 0
foreach ($file in $systemFiles) {
$sizeMB = [math]::Round($file.Length / 1MB, 2)
$totalSystemSize += $sizeMB
Write-Host " $($file.Name) - $sizeMB MB" -ForegroundColor White
}
Write-Host " Total: $([math]::Round($totalSystemSize, 2)) MB" -ForegroundColor Green
Write-Host ""
Write-Host "Python packages:" -ForegroundColor Cyan
$pythonFiles = Get-ChildItem $packagesDir -Filter "*.whl"
$totalPythonSize = 0
foreach ($file in $pythonFiles) {
$sizeMB = [math]::Round($file.Length / 1MB, 2)
$totalPythonSize += $sizeMB
Write-Host " $($file.Name) - $sizeMB MB" -ForegroundColor White
}
Write-Host " Total: $([math]::Round($totalPythonSize, 2)) MB" -ForegroundColor Green
Write-Host ""
Write-Host "Model files:" -ForegroundColor Cyan
if (Test-Path $modelDir) {
$modelFiles = Get-ChildItem $modelDir
$totalModelSize = 0
foreach ($file in $modelFiles) {
$sizeMB = [math]::Round($file.Length / 1MB, 2)
$totalModelSize += $sizeMB
Write-Host " $($file.Name) - $sizeMB MB" -ForegroundColor White
}
Write-Host " Total: $([math]::Round($totalModelSize, 2)) MB" -ForegroundColor Green
}
Write-Host ""
Write-Host "========================================" -ForegroundColor Green
Write-Host "All downloads completed!" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Green
Write-Host ""
Write-Host "Download directory: $baseDir" -ForegroundColor Yellow
Write-Host ""
Write-Host "Next steps:" -ForegroundColor Yellow
Write-Host "1. Package C:\offline_packages directory" -ForegroundColor White
Write-Host "2. Transfer to target Ubuntu server" -ForegroundColor White
Write-Host "3. Follow SGLang_GLM_Installation_Guide.md for offline installation" -ForegroundColor White
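2.2 Offline Installation Steps
2.2.1 Transfer the files to the target host
Transfer methods:
- USB drive or external disk:
  - Copy sglang_glm_offline_package.zip to the drive
  - Plug the drive into the Ubuntu server and mount it
  - Copy the file to the server
- SCP (if the machines can reach each other over the network):
# Run on the Ubuntu server
scp user@windows-machine:/path/to/sglang_glm_offline_package.zip /tmp/
- An SFTP tool (such as FileZilla or WinSCP):
  - Upload from Windows to the Ubuntu server
Extract on Ubuntu:
# For the ZIP archive
cd /tmp
unzip sglang_glm_offline_package.zip
# For the tar.gz archive
cd /tmp
tar -xzf sglang_glm_offline_package.tar.gz
# Enter the extracted directory
cd offline_packages
Note: differences between Windows and Linux file systems
- File names are case-insensitive on Windows and case-sensitive on Linux
- Line endings are \r\n on Windows and \n on Linux
- Script files may need their line endings converted after the transfer:
# Install dos2unix (on an offline host, its .deb has to be part of the offline package)
sudo apt-get install dos2unix
# Convert line endings
dos2unix *.py *.sh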
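If dos2unix is not available on the offline host, the same conversion can be done with the Python interpreter that is installed anyway. This is a minimal sketch rather than a replacement for dos2unix; the function name is hypothetical, and it should only be pointed at text files such as .py and .sh scripts.

```python
from pathlib import Path


def crlf_to_lf(path):
    """Rewrite a text file with Unix line endings; return True if it changed."""
    path = Path(path)
    data = path.read_bytes()
    converted = data.replace(b"\r\n", b"\n")
    if converted == data:
        return False  # already uses LF, leave the file untouched
    path.write_bytes(converted)
    return True


# Usage: convert every shell and Python script in the current directory.
#   for script in list(Path(".").glob("*.sh")) + list(Path(".").glob("*.py")):
#       print(script, "converted" if crlf_to_lf(script) else "unchanged")
```

Working on bytes avoids guessing the file encoding, so UTF-8 comments in the scripts pass through unchanged.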
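2.2.2 Install system dependencies
# Install the .deb packages
cd system
sudo dpkg -i *.deb
# Repair dependencies (on an offline host this only succeeds if every required .deb is present or a local apt source is configured)
sudo apt-get install -f
# Install CUDA Toolkit (if needed)
sudo sh cuda_12.1.0_530.30.02_linux.run --toolkit --silent
# Configure the CUDA environment variables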
echo 'export PATH=/usr/local/cuda-12.1/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
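2.2.3 Create a Python virtual environment
# Create the environment in the home directory, where the start script in section 4.1 expects it
# (python3.12 -m venv needs the python3.12-venv package, which must also be included in the offline bundle)
python3.12 -m venv ~/sglang_env
source ~/sglang_env/bin/activate
2.2.4 Install the Python packages
cd /tmp/offline_packages/python
pip install --no-index --find-links=./packages -r requirements.txt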
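2.2.5 Deploy the model files
# Create the model directory
sudo mkdir -p /opt/models
sudo chown $USER:$USER /opt/models
# Move the model files out of the extracted package
mv /tmp/offline_packages/models/glm-4-9b-chat /opt/models/
# Verify that the model files are complete
ls -lh /opt/models/glm-4-9b-chat/
Expected model files:
config.json
configuration_chatglm.py
model-00001-of-00000x.safetensors # several shard files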
model.safetensors.index.json
tokenizer.model
tokenizer_config.json
tokenization_chatglm.py
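3. Scenario 2: Installation with Internet Access
3.1 System Preparation
3.1.1 Update the system
sudo apt-get update
sudo apt-get upgrade -y
3.1.2 Install system dependencies
# Ubuntu 22.04 ships Python 3.10 by default; this guide assumes the python3.12 packages come from the deadsnakes PPA
sudo apt-get install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt-get update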
sudo apt-get install -y \
build-essential \
git \
curl \
wget \
python3.12 \
python3.12-venv \
python3.12-dev \
python3-pip
3.1.3 Install CUDA Toolkit 12.1
# Download the CUDA repository pin file
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
# Add the CUDA repository
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
# Install CUDA
sudo apt-get update
sudo apt-get install -y cuda-12-1
# Configure environment variables
echo 'export PATH=/usr/local/cuda-12.1/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
# Verify the installation
nvidia-smi
nvcc --version
3.2 Install SGLang and Its Dependencies
3.2.1 Create a Python virtual environment
python3.12 -m venv ~/sglang_env
source ~/sglang_env/bin/activate
3.2.2 Upgrade pip and setuptools
pip install --upgrade pip setuptools wheel
3.2.3 Install PyTorch (CUDA 12.1 build)
pip install torch==2.2.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
3.2.4 Install SGLang
pip install "sglang[all]"
3.2.5 Install other dependencies
pip install \
transformers==4.40.0 \
accelerate==0.28.0 \
sentencepiece==0.2.0 \
protobuf==4.25.3 \
tiktoken==0.6.0
3.3 Download the GLM-4 Model
3.3.1 Download from ModelScope (recommended in mainland China)
# Install modelscope
pip install modelscope
# Create the model directory
sudo mkdir -p /opt/models
sudo chown $USER:$USER /opt/models
# Download the GLM-4-9B-Chat model
modelscope download \
--model ZhipuAI/glm-4-9b-chat \
--local_dir /opt/models/glm-4-9b-chat
# Or download ChatGLM3-6B (smaller; suitable for testing)
# modelscope download \
# --model ZhipuAI/chatglm3-6b \
# --local_dir /opt/models/chatglm3-6b
3.4 Verify the Installation
# Verify PyTorch CUDA support
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}')"
# Verify the SGLang installation
python -c "import sglang; print(f'SGLang version: {sglang.__version__}')"
# Verify the model files
ls -lh /opt/models/glm-4-9b-chat/
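4. Starting and Verifying the Service
4.1 Create the start script
Create the service start script:
cat > ~/start_sglang_server.sh << 'EOF'
#!/bin/bash
# Activate the virtual environment
source ~/sglang_env/bin/activate
# Set environment variables
export CUDA_VISIBLE_DEVICES=0
# Start the SGLang server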
python -m sglang.launch_server \
--model-path /opt/models/glm-4-9b-chat \
--host 0.0.0.0 \
--port 8000 \
--tp 1 \
--mem-fraction-static 0.9 \
--trust-remote-code
EOF
chmod +x ~/start_sglang_server.sh
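4.2 Start the Service
4.2.1 Foreground (for testing)
~/start_sglang_server.sh
4.2.2 Background (production)
nohup ~/start_sglang_server.sh > ~/sglang_server.log 2>&1 &
# Follow the log
tail -f ~/sglang_server.log
4.2.3 Manage the service with systemd (recommended)
Create the systemd unit file (sudo tee is used because a redirection after "sudo cat" is performed by the unprivileged shell and fails):
sudo tee /etc/systemd/system/sglang-glm.service > /dev/null << 'EOF'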
[Unit]
Description=SGLang GLM-4-9B-Chat Service
After=network.target
[Service]
Type=simple
User=your_username
WorkingDirectory=/home/your_username
Environment="PATH=/home/your_username/sglang_env/bin:/usr/local/cuda-12.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Environment="CUDA_VISIBLE_DEVICES=0"
ExecStart=/home/your_username/sglang_env/bin/python -m sglang.launch_server \
--model-path /opt/models/glm-4-9b-chat \
--host 0.0.0.0 \
--port 8000 \
--tp 1 \
--mem-fraction-static 0.9 \
--trust-remote-code
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
# Replace your_username with the actual user name
sudo sed -i "s/your_username/$USER/g" /etc/systemd/system/sglang-glm.service
# Start the service
sudo systemctl daemon-reload
sudo systemctl start sglang-glm
sudo systemctl enable sglang-glm
# Check the service status
sudo systemctl status sglang-glm
4.3 Verify the Service
4.3.1 Check the service status
# Check that the port is listening
netstat -tlnp | grep 8000
# Or use ss
ss -tlnp | grep 8000
# Check the process
ps aux | grep sglang
4.3.2 Test the API
Test 1: list the models
curl http://localhost:8000/v1/models
Expected output:
{
"object": "list",
"data": [
{
"id": "/opt/models/glm-4-9b-chat",
"object": "model",
"created": 1234567890,
"owned_by": "sglang"
}
]
}
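Loading a 9B model takes a while, so a request sent right after the service starts may be refused. The sketch below polls /v1/models until it returns a non-empty "data" list like the one shown above. It uses only the standard library; the function names, the retry count, and the delay are assumptions to adjust for the actual environment.

```python
import json
import time
import urllib.request


def fetch_models(base_url, timeout=5):
    """GET {base_url}/v1/models and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(base_url, attempts=60, delay=5.0, fetch=fetch_models):
    """Poll the models endpoint; return the model ids once the server answers."""
    for _ in range(attempts):
        try:
            body = fetch(base_url)
            if body.get("data"):
                return [model["id"] for model in body["data"]]
        except (OSError, ValueError):
            # OSError covers refused connections and timeouts (URLError is a
            # subclass); ValueError covers a body that is not valid JSON yet.
            pass
        time.sleep(delay)
    raise TimeoutError(f"{base_url} did not become ready after {attempts} attempts")


# Usage:
#   print(wait_until_ready("http://localhost:8000"))
```

The `fetch` parameter exists so the polling logic can be exercised without a running server.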
Test 2: send a chat request
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "/opt/models/glm-4-9b-chat",
"messages": [
{"role": "user", "content": "Hello, please introduce yourself."}
],
"max_tokens": 100,
"temperature": 0.7
}'
Expected output:
{
"id": "cmpl-xxxxx",
"object": "chat.completion",
"created": 1234567890,
"model": "/opt/models/glm-4-9b-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I am GLM-4, a large language model developed by Zhipu AI..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 50,
"total_tokens": 65
}
}
Test 3: use the Python client
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="dummy"
)
response = client.chat.completions.create(
model="/opt/models/glm-4-9b-chat",
messages=[
{"role": "user", "content": "Write a quicksort implementation in Python"}
],
max_tokens=500,
temperature=0.7
)
print(response.choices[0].message.content)
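4.3.3 Performance benchmark
# Install the benchmark dependency
pip install aiohttp
# Use the benchmark script bundled with SGLang
# (option names differ between SGLang releases; confirm them with: python -m sglang.bench_serving --help)
python -m sglang.bench_serving \
--backend sglang \
--model /opt/models/glm-4-9b-chat \
--host localhost \
--port 8000 \
--dataset-name random \
--num-prompts 100 \
--random-output-len 100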
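4.4 Success Criteria
✅ The service counts as started successfully when:
- The service process runs without crashing
- Port 8000 is listening
- The /v1/models endpoint returns the correct model information
- The chat endpoint returns responses
- GPU memory utilization is between 80% and 95%
- Time to first token for a single request is < 2 seconds
- Throughput is > 20 tokens/second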
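The throughput criterion can be estimated from a single chat response by dividing the "completion_tokens" value in its "usage" block by the measured wall-clock time. The sketch below shows that arithmetic; the function names are hypothetical, and the result is a rough single-request figure that includes prompt processing time. It is not a substitute for the benchmark in section 4.3.3.

```python
MIN_TOKENS_PER_SECOND = 20  # threshold from the success criteria above


def tokens_per_second(usage, elapsed_seconds):
    """Generated tokens divided by wall-clock time for one request."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return usage["completion_tokens"] / elapsed_seconds


def meets_throughput_target(usage, elapsed_seconds):
    """True when the single-request rate exceeds the documented threshold."""
    return tokens_per_second(usage, elapsed_seconds) > MIN_TOKENS_PER_SECOND


# The usage block from the sample response in Test 2, with an assumed
# (illustrative) elapsed time of 2 seconds.
sample_usage = {"prompt_tokens": 15, "completion_tokens": 50, "total_tokens": 65}
print(tokens_per_second(sample_usage, 2.0))
```

In practice the elapsed time would come from wrapping the request in two `time.perf_counter()` calls. Measuring time to first token requires a streaming request and is not covered here.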
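5. Common Problems and Solutions
5.1 CUDA Problems
Problem 1: CUDA out of memory
Error message:
RuntimeError: CUDA out of memory
Solution:
# Lower the share of GPU memory the server reserves (launch option)
--mem-fraction-static 0.8
# Reduce the maximum context length (launch option)
--context-length 4096
# Use a quantized model
pip install auto-gptq
# Download a quantized version of the model
Problem 2: CUDA version mismatch
Error message:
RuntimeError: CUDA version mismatch
Solution:
# Check the CUDA versions
nvidia-smi
nvcc --version
# Reinstall a matching PyTorch build
pip uninstall -y torch
pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121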
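5.2 Model Loading Problems
Problem 3: missing model files
Error message:
FileNotFoundError: Cannot find model files
Solution:
# Check that the model files are complete
ls -lh /opt/models/glm-4-9b-chat/
# Download the missing files again (with ModelScope)
pip install modelscope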
modelscope download \
--model ZhipuAI/glm-4-9b-chat \
--local_dir /opt/models/glm-4-9b-chat
Problem 4: trust_remote_code error
Error message:
ValueError: The repository contains custom code
Solution:
# Add the --trust-remote-code option to the launch command
python -m sglang.launch_server \
--model-path /opt/models/glm-4-9b-chat \
--trust-remote-code
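5.3 Network Problems
Problem 5: dependencies missing in the offline environment
Solution:
# Download all dependencies on a machine with internet access
pip download -r requirements.txt -d ./packages
# Install in the offline environment
pip install --no-index --find-links=./packages -r requirements.txt
Problem 6: model download is slow or fails
Solution:
# Use ModelScope, which is hosted in mainland China
pip install modelscope
# Optionally set the ModelScope cache directory
export MODELSCOPE_CACHE=/opt/models
# Download the model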
modelscope download \
--model ZhipuAI/glm-4-9b-chat \
--local_dir /opt/models/glm-4-9b-chat
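# Interrupted downloads can be resumed:
# ModelScope detects the files that are already downloaded and continues with the rest
5.4 Performance Problems
Problem 7: slow inference
Solution:
# Enable Flash Attention
pip install flash-attn --no-build-isolation
# SGLang uses Flash Attention automatically (if available)
# Adjust the maximum number of concurrently running requests (launch option)
--max-running-requests 256
Problem 8: high time to first token
Solution:
# Warm up the model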
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "/opt/models/glm-4-9b-chat", "messages": [{"role": "user", "content": "hi"}], "max_tokens": 10}'
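# Use RadixAttention (an SGLang-specific optimization)
# SGLang enables it by default
6. Performance Tuning Recommendations
6.1 Hardware Configuration
For the 8-core CPU / 32 GB RAM / 32 GB GPU configuration:
# Tuned launch options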
python -m sglang.launch_server \
--model-path /opt/models/glm-4-9b-chat \
--trust-remote-code \
--host 0.0.0.0 \
--port 8000 \
--tp 1 \
--mem-fraction-static 0.9 \
--max-running-requests 128
6.2 System Parameter Tuning
# Raise the file descriptor limit
ulimit -n 65535
# Make the change permanent
echo "* soft nofile 65535" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65535" | sudo tee -a /etc/security/limits.conf
# Tune kernel parameters
sudo sysctl -w net.core.somaxconn=65535
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65535
6.3 Monitoring and Logs
# Monitor GPU usage in real time
watch -n 1 nvidia-smi
# Follow the service log
tail -f ~/sglang_server.log
# Prometheus monitoring (optional)
pip install prometheus-client
# Start the server with --enable-metrics to expose the metrics endpoint at http://localhost:8000/metrics
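6.4 Load Balancing (multi-instance deployment)
For higher throughput, several instances can be deployed. The example places each instance on its own GPU, so it needs a host with at least two GPUs: the GLM-4-9B weights alone take about 18 GB, which does not fit into a 0.45 share of a single 32 GB card (about 14.4 GB).
# Instance 1 (GPU 0, port 8000)
CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server \
--model-path /opt/models/glm-4-9b-chat \
--trust-remote-code \
--port 8000 \
--mem-fraction-static 0.9
# Instance 2 (GPU 1, port 8001)
CUDA_VISIBLE_DEVICES=1 python -m sglang.launch_server \
--model-path /opt/models/glm-4-9b-chat \
--trust-remote-code \
--port 8001 \
--mem-fraction-static 0.9
# Load-balance with Nginx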
sudo apt-get install nginx
sudo tee /etc/nginx/sites-available/sglang << 'EOF'
upstream sglang_backend {
server localhost:8000;
server localhost:8001;
}
server {
listen 80;
location / {
proxy_pass http://sglang_backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
EOF
sudo ln -s /etc/nginx/sites-available/sglang /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx
7. Appendix
7.1 Complete requirements.txt
sglang>=0.2.0
torch==2.2.1
transformers==4.40.0
accelerate==0.28.0
sentencepiece==0.2.0
protobuf==4.25.3
tiktoken==0.6.0
modelscope==1.11.0
safetensors==0.4.3
tokenizers==0.15.2
flash-attn==2.5.6
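7.2 Quick Command Summary
Quick install with internet access:
# One-line install
pip install "sglang[all]"
Quick verification in the offline environment:
# Verify all components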
python << 'EOF'
import torch
import sglang
import transformers
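print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"SGLang: {sglang.__version__}")
print(f"Transformers: {transformers.__version__}")
print(f"GPU device: {torch.cuda.get_device_name(0)}")
print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
EOF
Notes:
- This document was written against the latest SGLang version available at the time of writing; for other versions, refer to the official documentation
- The GLM-4-9B model needs about 18 GB of GPU memory, so a 32 GB card leaves room for larger batch sizes
- In production, manage the service with systemd so that it restarts automatically
- Check GPU temperature and memory usage regularly to avoid overheating and out-of-memory errors
- Test in a non-production environment first and deploy to production only after all functions are verified
- SGLang is often reported to deliver higher throughput and lower latency than vLLM in high-concurrency scenarios; the actual difference depends on the model, hardware, and workload, so benchmark on the target setup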
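The 18 GB figure in the notes follows from simple arithmetic: roughly 9 billion parameters stored at 2 bytes each (FP16/BF16). The sketch below reproduces that estimate and compares it with the memory share reserved by --mem-fraction-static. The parameter count is the nominal "9B" from the model name and the function names are hypothetical, so treat the numbers as rough estimates.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def static_budget_gb(gpu_memory_gb, mem_fraction_static):
    """GPU memory reserved for weights plus KV cache at a given fraction."""
    return gpu_memory_gb * mem_fraction_static


weights = weight_memory_gb(9e9)       # nominal 9B parameters at 2 bytes each
budget = static_budget_gb(32, 0.9)    # 32 GB card with --mem-fraction-static 0.9
print(f"weights ~ {weights:.1f} GB, budget ~ {budget:.1f} GB, "
      f"left for KV cache ~ {budget - weights:.1f} GB")
```

With these inputs the weights take about 18 GB of a 28.8 GB budget, which leaves roughly 10.8 GB for the KV cache. Lowering the fraction shrinks that remainder first, which is why very small fractions can prevent the model from loading at all.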