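Deploying GLM Large Models with SGLang: A Detailed Guide

Document information

  • Target environment: Ubuntu 22.04
  • Hardware: 8 CPU cores / 32 GB RAM / 32 GB GPU memory
  • Target models: GLM-4-9B-Chat / ChatGLM3-6B
  • Deployment framework: SGLang
  • Python version: 3.12
  • Document version: v1.0

Contents

  1. System environment checks
  2. Scenario 1: fully offline installation
  3. Scenario 2: installation with network access
  4. Starting and verifying the service
  5. Common problems and solutions
  6. Performance tuning recommendations
  7. Appendix

1. System environment checks

1.1 Hardware resource checks

# Check the number of CPU cores
lscpu | grep "^CPU(s):"

# Check the amount of memory
free -h

# Check the GPU
nvidia-smi

# Check disk space (at least 50 GB free is recommended)
df -h

Expected values:

  • CPU cores ≥ 8
  • Memory ≥ 32 GB
  • GPU memory ≥ 32 GB
  • Free disk space ≥ 50 GB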
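
The thresholds above can also be checked by a script. The following is a minimal Python sketch rather than part of the original procedure: the function names `find_shortfalls` and `read_total_memory_gb` are hypothetical, the memory reader assumes a Linux host with /proc/meminfo, and GPU memory is not covered (that still needs nvidia-smi).

```python
import os
import shutil

# Minimums taken from the checklist above.
MIN_CPU_CORES = 8
MIN_MEMORY_GB = 32
MIN_DISK_GB = 50


def find_shortfalls(cpu_cores, memory_gb, disk_gb):
    """Return a list of readable problems; an empty list means every check passed."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU cores: {cpu_cores} < {MIN_CPU_CORES}")
    if memory_gb < MIN_MEMORY_GB:
        problems.append(f"Memory: {memory_gb:.1f} GB < {MIN_MEMORY_GB} GB")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Free disk: {disk_gb:.1f} GB < {MIN_DISK_GB} GB")
    return problems


def read_total_memory_gb(meminfo_path="/proc/meminfo"):
    """Parse MemTotal (reported in kB) from /proc/meminfo and convert it to GiB."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) / 1024**2
    raise RuntimeError("MemTotal not found")


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    free_disk_gb = shutil.disk_usage("/").free / 1024**3
    issues = find_shortfalls(os.cpu_count() or 0, read_total_memory_gb(), free_disk_gb)
    for issue in issues:
        print("FAIL:", issue)
    if not issues:
        print("CPU, memory and disk meet the minimums")
```

A machine sold as "32 GB" usually reports slightly less than 32 GiB in MemTotal because the kernel and firmware reserve some memory, so the memory threshold may need a small allowance in practice.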

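1.2 Operating system version checks

# Check the Ubuntu version
lsb_release -a

# Check the kernel version
uname -r

Expected output:

  • Ubuntu 22.04 LTS
  • Kernel version ≥ 5.15

1.3 GPU driver checks

# Check the NVIDIA driver version
nvidia-smi | grep "Driver Version"

# Check the CUDA version (if installed)
nvcc --version 2>/dev/null || echo "CUDA is not installed"

Requirements:

  • NVIDIA driver version ≥ 525.60.13
  • CUDA version ≥ 12.0 (12.1 or later recommended)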
2. Scenario 1: fully offline installation

2.1 Preparation (on a Windows machine with network access)

2.1.1 Create the download directories

Create the download directories in Windows PowerShell:

# Create the download directory structure
New-Item -ItemType Directory -Path "C:\offline_packages\system" -Force
New-Item -ItemType Directory -Path "C:\offline_packages\python\packages" -Force
New-Item -ItemType Directory -Path "C:\offline_packages\models" -Force

# Change to the download directory
cd C:\offline_packages
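
2.1.2 Download system dependency packages

⚠️ Important: version numbers change regularly

Package versions on the Tsinghua mirror change as security updates are published. If a download link below returns a 404 error, find the current version as follows:

  1. Open the mirror directory: https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/
  2. Find the latest version number (for example, 3.12.3-1ubuntu0.13)
  3. Replace the version number in the commands below with that version

Also check that the packages you pick are built for the Ubuntu release on the target server: Ubuntu 22.04 ships Python 3.10 by default, and the 3.12.3-1ubuntu0.x builds in this pool directory come from a newer Ubuntu release, so confirm on a 22.04 test machine that they install without unmet dependencies.

Download all dependency packages with wget (Windows)

# Install wget if it is missing:
# download wget.exe from https://eternallybored.org/misc/wget/ and place it in C:\Windows\System32\
# In Windows PowerShell, "wget" is an alias for Invoke-WebRequest; remove the alias for this session so that wget.exe runs
Remove-Item Alias:wget -ErrorAction SilentlyContinue

# Change to the system directory
cd C:\offline_packages\system

# Download the Python 3.12 packages for the Ubuntu 22.04 target
# Note: version numbers change; check the Tsinghua mirror for the current version
# Latest version at the time of writing: 3.12.3-1ubuntu0.13 (2026-04-08)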
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/python3.12_3.12.3-1ubuntu0.13_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/python3.12-minimal_3.12.3-1ubuntu0.13_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/python3.12-dev_3.12.3-1ubuntu0.13_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/libpython3.12-dev_3.12.3-1ubuntu0.13_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/libpython3.12-stdlib_3.12.3-1ubuntu0.13_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/libpython3.12-minimal_3.12.3-1ubuntu0.13_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python-pip/python3-pip_23.0.1+dfsg-1_all.deb

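# Download the build tools and their dependencies
# Note: the version numbers below may change; check the Tsinghua mirror for the current versions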
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/b/build-essential/build-essential_12.9ubuntu3_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/gcc-11/gcc-11_11.4.0-1ubuntu1~22.04.3_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/gcc-11/g++-11_11.4.0-1ubuntu1~22.04.3_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/m/make/make_4.3-4.1build1_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/d/dpkg-dev/dpkg-dev_1.21.1ubuntu2.3_all.deb

# Download other tools
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/git/git_2.34.1-1ubuntu1.17_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/c/curl/curl_7.81.0-1ubuntu1.15_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/w/wget/wget_1.21.2-2ubuntu1_amd64.deb

# Download CUDA Toolkit 12.1
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run

# Check the downloaded files
dir

Alternative: download the latest version automatically with a script

If the links above no longer work, the following PowerShell script finds and downloads the latest version automatically:

# Create the download script
$script = @'
# Python 3.12 package download script
$baseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/"
$packages = @(
    "python3.12_",
    "python3.12-minimal_",
    "python3.12-dev_",
    "libpython3.12-dev_",
    "libpython3.12-stdlib_",
    "libpython3.12-minimal_"
)

Write-Host "Looking for the latest Python 3.12 packages..." -ForegroundColor Yellow

# Fetch the directory listing
try {
    $response = Invoke-WebRequest -Uri $baseUrl -UseBasicParsing
    $content = $response.Content
    
    foreach ($pkg in $packages) {
        # Find the amd64 packages for this name
        $pattern = "$pkg(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"
        # $matches is a PowerShell automatic variable, so use a different name
        $regexMatches = [regex]::Matches($content, $pattern)
        
        if ($regexMatches.Count -gt 0) {
            # Take the last match (usually, but not always, the newest version)
            $latest = $regexMatches[$regexMatches.Count - 1].Value
            $url = "$baseUrl$latest"
            $output = ".\$latest"
            
            Write-Host "Downloading: $latest" -ForegroundColor Green
            # Invoke-WebRequest avoids the "wget" alias ambiguity in Windows PowerShell
            Invoke-WebRequest -Uri $url -OutFile $output
        }
    }
    
    Write-Host "`nAll Python 3.12 packages downloaded." -ForegroundColor Green
} catch {
    Write-Host "Error: $_" -ForegroundColor Red
}
'@

# Save the script
$script | Out-File -FilePath ".\download_python312.ps1" -Encoding UTF8

# Run the script
.\download_python312.ps1
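
The script above takes the last match in the directory listing, which is usually but not necessarily the newest build. As a hedged alternative, the Python sketch below picks the file with the numerically highest version. The function names are hypothetical, and comparing only the numeric fields is a simplification of Debian version ordering (epochs and "~" suffixes are not handled).

```python
import re


def version_key(version):
    """Split a Debian-style version into integers so that 0.13 sorts after 0.9."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def newest_deb(listing_html, package, arch="amd64"):
    """Return the .deb file name with the highest version found in a directory listing."""
    pattern = re.compile(
        re.escape(package) + r'_([0-9][^_"<>\s]*)_' + re.escape(arch) + r"\.deb"
    )
    # Map each distinct file name to its sortable version key.
    candidates = {m.group(0): version_key(m.group(1)) for m in pattern.finditer(listing_html)}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

Passing the HTML of the mirror directory and a package name such as "python3.12" returns the file name to append to the base URL.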
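
2.1.3 Download Python packages

⚠️ Important: why the Linux platform must be specified

Python packages fall into two categories, and the difference matters when they are moved between platforms:

1. Pure-Python packages (platform independent)

  • Examples: transformers, accelerate, modelscope
  • Characteristics: contain only Python source (.py files) and no compiled binaries
  • Cross-platform: ✅ usable on any operating system; a wheel downloaded on Windows installs directly on Ubuntu

2. Packages with C extensions (platform dependent)

  • Examples: torch, sglang, flash-attn, and also tokenizers, safetensors, sentencepiece and tiktoken
  • Characteristics: contain compiled binaries (.pyd, .so, .dll and so on) built for a specific operating system and CPU architecture
  • Cross-platform: ❌ a Windows download contains .pyd files (Windows DLL format), whereas Ubuntu needs .so files (Linux shared-library format)
  • File name difference:
    # Windows build (wrong)
    torch-2.2.1+cu121-cp312-cp312-win_amd64.whl
    
    # Linux build (correct)
    torch-2.2.1+cu121-cp312-cp312-manylinux2014_x86_64.whl
    

Solution: run pip download on Windows with the Linux platform specified

Step 1: create requirements.txt

Create a requirements.txt file in C:\offline_packages\python\ with the following content: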

sglang>=0.2.0
torch==2.2.1
transformers==4.40.0
accelerate==0.28.0
sentencepiece==0.2.0
protobuf==4.25.3
tiktoken==0.6.0
modelscope==1.11.0
safetensors==0.4.3
tokenizers==0.15.2
flash-attn==2.5.6

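Step 2: download the Linux builds of the Python packages

# Make sure Python 3.12+ and pip are installed
python --version

# Change to the python directory
cd C:\offline_packages\python

# Method 1: download everything in one command (recommended)
# --platform manylinux2014_x86_64: target the Linux platform
# --python-version 312: target Python 3.12
# --only-binary=:all: download wheels only (no source distributions)
pip download -r requirements.txt -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all:

# Method 2: if the command above fails, download in smaller groups to see which package is the problem
# Keep the platform flags on every command: pip also downloads each package's dependencies,
# and without the flags those would be Windows wheels
pip download transformers==4.40.0 accelerate==0.28.0 modelscope==1.11.0 -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all:
pip download sentencepiece==0.2.0 protobuf==4.25.3 tiktoken==0.6.0 safetensors==0.4.3 tokenizers==0.15.2 -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all:
pip download torch==2.2.1 -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all:
pip download sglang -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all:
# flash-attn is distributed on PyPI as a source package, so --only-binary may reject it;
# if this command fails, obtain a prebuilt Linux wheel from the project's release page or build it on an Ubuntu machine
pip download flash-attn==2.5.6 -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: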

Step 3: verify that the downloaded packages are correct

Check 1: look at the platform tag in the file name

# List all downloaded packages
dir .\packages

# Check whether the torch package is the Linux build
Get-ChildItem .\packages\torch*.whl | Select-Object Name

# ✅ A correct file name contains "manylinux":
# torch-2.2.1+cu121-cp312-cp312-manylinux2014_x86_64.whl

# ❌ A wrong file name contains "win" (installation on Ubuntu will fail):
# torch-2.2.1+cu121-cp312-cp312-win_amd64.whl

Check 2: inspect every platform-specific package

# Report the platform of every .whl file
Get-ChildItem .\packages\*.whl | ForEach-Object {
    $name = $_.Name
    if ($name -match "manylinux") {
        Write-Host "✅ Linux build: $name" -ForegroundColor Green
    } elseif ($name -match "win") {
        Write-Host "❌ Windows build (wrong): $name" -ForegroundColor Red
    } else {
        Write-Host "ℹ️  Platform independent: $name" -ForegroundColor Cyan
    }
}
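
The check above searches the whole file name, so a package whose own name happened to contain "win" would be misreported. A wheel file name always ends with its platform tag, so a stricter check reads only that last field. The following Python sketch illustrates the idea; the function names are hypothetical and the file names in the usage note are illustrative.

```python
def wheel_platform(filename):
    """Return the platform tag, which is the last dash-separated field of a wheel name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    return filename[: -len(".whl")].split("-")[-1]


def classify_wheel(filename):
    """Classify a wheel as 'any', 'linux', 'windows' or 'other' from its platform tag."""
    platform = wheel_platform(filename)
    if platform == "any":
        return "any"
    if platform.startswith(("manylinux", "musllinux", "linux")):
        return "linux"
    if platform.startswith("win"):
        return "windows"
    return "other"
```

With this, `classify_wheel("torch-2.2.1-cp312-cp312-win_amd64.whl")` yields "windows" and a name ending in `-py3-none-any.whl` yields "any"; every file in the packages directory should come back as "linux" or "any".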

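Check 3: unpack a wheel and inspect its contents (optional)

# Create a temporary directory
mkdir temp_check
cd temp_check

# Unpack the torch wheel to inspect it
# (uses 7-Zip; any unzip tool works)
& "C:\Program Files\7-Zip\7z.exe" x ..\packages\torch*.whl

# Look for .so files (Linux shared libraries)
Get-ChildItem -Recurse -Filter "*.so" | Select-Object FullName

# Output here means it is the Linux build ✅
# No .so files but .pyd files instead means it is the Windows build ❌

# Remove the temporary directory
cd ..
Remove-Item temp_check -Recurse -Force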
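
A wheel is a ZIP archive, so the same inspection can be done without unpacking anything to disk. The sketch below uses Python's standard zipfile module; the function name `count_native_libraries` is hypothetical.

```python
import zipfile
from collections import Counter


def count_native_libraries(wheel_path):
    """Count compiled files inside a wheel: Linux .so versus Windows .pyd/.dll."""
    counts = Counter()
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            lower = name.lower()
            # Also catch versioned shared libraries such as libfoo.so.1
            if lower.endswith(".so") or ".so." in lower:
                counts["so"] += 1
            elif lower.endswith((".pyd", ".dll")):
                counts["pyd_or_dll"] += 1
    return counts
```

A Linux build of a compiled package should report a non-zero "so" count and zero "pyd_or_dll"; a pure-Python wheel reports zero for both.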

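Troubleshooting

Problem | Cause | Solution
File name contains "win" | The --platform option was not given | Download again with --platform manylinux2014_x86_64
pip download fails | The package has no wheel for the requested platform | Use method 2 to download in smaller groups, or download on Ubuntu
Installation fails with "not a supported wheel" | Platform mismatch | Check that the file name contains the manylinux tag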

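Alternative: download on an Ubuntu machine with network access (recommended)

If downloading Linux packages from Windows causes problems:

  1. Find an Ubuntu machine with network access (a virtual machine, WSL or a temporary cloud host all work)
  2. Run the download commands on Ubuntu, using the same Python version (3.12) as the target server:
# Create the directories on Ubuntu
mkdir -p ~/offline_packages/python/packages
cd ~/offline_packages/python

# Create requirements.txt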
cat > requirements.txt << 'EOF'
sglang>=0.2.0
torch==2.2.1
transformers==4.40.0
accelerate==0.28.0
sentencepiece==0.2.0
protobuf==4.25.3
tiktoken==0.6.0
modelscope==1.11.0
safetensors==0.4.3
tokenizers==0.15.2
flash-attn==2.5.6
EOF

# Download all dependencies
pip download -r requirements.txt -d ./packages

# Create the archive
cd ~
tar -czf python_packages.tar.gz offline_packages/python/

# Transfer the archive to Windows, then on to the target Ubuntu server

2.1.4 Download the GLM-4 model

Download with the ModelScope command-line tool

# Install modelscope
pip install modelscope

# Change to the models directory
cd C:\offline_packages\models

# Download GLM-4-9B-Chat (recommended; fits in 32 GB of GPU memory)
modelscope download --model ZhipuAI/glm-4-9b-chat --local_dir .\glm-4-9b-chat

# Or download ChatGLM3-6B (smaller; suitable for testing)
# modelscope download --model ZhipuAI/chatglm3-6b --local_dir .\chatglm3-6b

Verify that the model files are complete

# Inspect the model files
cd C:\offline_packages\models\glm-4-9b-chat
dir

# The directory should contain the following files:
# config.json
# configuration_chatglm.py
# model-00001-of-0000x.safetensors (several shard files)
# model.safetensors.index.json
# tokenizer.model
# tokenizer_config.json
# tokenization_chatglm.py
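
Looking at the directory listing by eye does not reveal a missing shard. The file model.safetensors.index.json contains a "weight_map" that names the shard file for every tensor, so the expected shards can be read from it. The Python sketch below does this; the function name and the default list of required files are assumptions made for illustration.

```python
import json
from pathlib import Path


def missing_model_files(
    model_dir,
    required=("config.json", "tokenizer_config.json", "model.safetensors.index.json"),
):
    """Return required files and weight shards that are absent from model_dir."""
    model_dir = Path(model_dir)
    missing = [name for name in required if not (model_dir / name).is_file()]
    index_path = model_dir / "model.safetensors.index.json"
    if index_path.is_file():
        index = json.loads(index_path.read_text(encoding="utf-8"))
        # weight_map maps tensor names to shard file names; collect the distinct shards.
        for shard in sorted(set(index["weight_map"].values())):
            if not (model_dir / shard).is_file():
                missing.append(shard)
    return missing
```

An empty result means every file named in the index is present. It does not prove the files are uncorrupted; that would need a checksum comparison.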

2.1.5 Package all files

Compress with 7-Zip (recommended; better compression ratio)

# If 7-Zip is installed
& "C:\Program Files\7-Zip\7z.exe" a -tzip C:\sglang_glm_offline_package.zip C:\offline_packages\

# Or create a tar.gz archive (to be extracted on Ubuntu)
& "C:\Program Files\7-Zip\7z.exe" a -ttar C:\sglang_glm_offline_package.tar C:\offline_packages\
& "C:\Program Files\7-Zip\7z.exe" a -tgzip C:\sglang_glm_offline_package.tar.gz C:\sglang_glm_offline_package.tar

# List the archive contents to verify them
& "C:\Program Files\7-Zip\7z.exe" l C:\sglang_glm_offline_package.zip

# Show the archive size
Get-Item C:\sglang_glm_offline_package.zip | Select-Object Name, @{Name="Size(GB)";Expression={[math]::Round($_.Length/1GB,2)}}

2.1.6 Complete download script

# Save the code below as download_all.ps1 and run it with .\download_all.ps1
# SGLang GLM Offline Package Download Script
# For Windows - Download Ubuntu 22.04 dependencies

$ErrorActionPreference = "Continue"

$baseDir = "C:\offline_packages"
$systemDir = "$baseDir\system"
$pythonDir = "$baseDir\python"
$packagesDir = "$pythonDir\packages"
$modelsDir = "$baseDir\models"

Write-Host "========================================" -ForegroundColor Cyan
Write-Host "SGLang GLM Offline Package Download" -ForegroundColor Cyan
Write-Host "========================================" -ForegroundColor Cyan
Write-Host ""

# Create directory structure
Write-Host "[1/5] Creating directories..." -ForegroundColor Yellow
New-Item -ItemType Directory -Path $systemDir -Force | Out-Null
New-Item -ItemType Directory -Path $packagesDir -Force | Out-Null
New-Item -ItemType Directory -Path $modelsDir -Force | Out-Null
Write-Host "  Done" -ForegroundColor Green

# Download function
function Download-File {
    param(
        [string]$Url,
        [string]$Output
    )
    
    $fileName = Split-Path $Output -Leaf
    if (Test-Path $Output) {
        Write-Host "  Exists: $fileName" -ForegroundColor Gray
        return $true
    }
    
    try {
        Write-Host "  Downloading: $fileName" -ForegroundColor Green
        Invoke-WebRequest -Uri $Url -OutFile $Output -TimeoutSec 300
        return $true
    } catch {
        Write-Host "  Failed: $($_.Exception.Message)" -ForegroundColor Red
        return $false
    }
}

# Find and download latest package
function Download-LatestPackage {
    param(
        [string]$BaseUrl,
        [string]$Pattern,
        [string]$PackageName
    )
    
    try {
        Write-Host "  Finding latest: $PackageName..." -ForegroundColor Yellow
        $response = Invoke-WebRequest -Uri $BaseUrl -UseBasicParsing -TimeoutSec 30
        $content = $response.Content
        
        $regexMatches = [regex]::Matches($content, $Pattern)
        
        if ($regexMatches.Count -gt 0) {
            $latest = $regexMatches[$regexMatches.Count - 1].Value
            $url = "$BaseUrl$latest"
            $output = Join-Path $systemDir $latest
            
            if (Test-Path $output) {
                Write-Host "  Exists: $latest" -ForegroundColor Gray
                return $true
            }
            
            Write-Host "  Downloading: $latest" -ForegroundColor Green
            Invoke-WebRequest -Uri $url -OutFile $output -TimeoutSec 300
            return $true
        } else {
            Write-Host "  Not found" -ForegroundColor Red
            return $false
        }
    } catch {
        Write-Host "  Error: $($_.Exception.Message)" -ForegroundColor Red
        return $false
    }
}

# 2. Download system packages
Write-Host ""
Write-Host "[2/5] Downloading system packages..." -ForegroundColor Yellow

# Python 3.12 packages
$python312BaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/"
$python312Packages = @(
    @{Name="python3.12"; Pattern="python3\.12_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
    @{Name="python3.12-minimal"; Pattern="python3\.12-minimal_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
    @{Name="python3.12-dev"; Pattern="python3\.12-dev_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
    @{Name="libpython3.12-dev"; Pattern="libpython3\.12-dev_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
    @{Name="libpython3.12-stdlib"; Pattern="libpython3\.12-stdlib_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
    @{Name="libpython3.12-minimal"; Pattern="libpython3\.12-minimal_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"}
)

foreach ($pkg in $python312Packages) {
    Download-LatestPackage -BaseUrl $python312BaseUrl -Pattern $pkg.Pattern -PackageName $pkg.Name
}

# python3-pip
Download-File -Url "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python-pip/python3-pip_23.0.1+dfsg-1_all.deb" -Output "$systemDir\python3-pip_23.0.1+dfsg-1_all.deb"

# GCC packages
$gccBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/gcc-11/"
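# gcc-11 versions on Ubuntu 22.04 look like 11.4.0-1ubuntu1~22.04.3, so the pattern must allow the "~22.04[.N]" suffix
Download-LatestPackage -BaseUrl $gccBaseUrl -Pattern "gcc-11_(\d+\.\d+\.\d+-\d+ubuntu\d+(~\d+\.\d+)?(\.\d+)?)_amd64\.deb" -PackageName "gcc-11"
Download-LatestPackage -BaseUrl $gccBaseUrl -Pattern "g\+\+-11_(\d+\.\d+\.\d+-\d+ubuntu\d+(~\d+\.\d+)?(\.\d+)?)_amd64\.deb" -PackageName "g++-11"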

# build-essential
Download-File -Url "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/b/build-essential/build-essential_12.9ubuntu3_amd64.deb" -Output "$systemDir\build-essential_12.9ubuntu3_amd64.deb"

# make
$makeBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/m/make/"
Download-LatestPackage -BaseUrl $makeBaseUrl -Pattern "make_(\d+\.\d+-\d+\.\d+build\d+)_amd64\.deb" -PackageName "make"

# dpkg-dev
$dpkgBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/d/dpkg-dev/"
Download-LatestPackage -BaseUrl $dpkgBaseUrl -Pattern "dpkg-dev_(\d+\.\d+\.\d+ubuntu\d+\.\d+)_all\.deb" -PackageName "dpkg-dev"

# git
$gitBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/git/"
Download-LatestPackage -BaseUrl $gitBaseUrl -Pattern "git_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb" -PackageName "git"

# curl
$curlBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/c/curl/"
Download-LatestPackage -BaseUrl $curlBaseUrl -Pattern "curl_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb" -PackageName "curl"

# wget
$wgetBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/w/wget/"
Download-LatestPackage -BaseUrl $wgetBaseUrl -Pattern "wget_(\d+\.\d+\.\d+-\d+ubuntu\d+)_amd64\.deb" -PackageName "wget"

Write-Host "  System packages done" -ForegroundColor Green

# 3. Create requirements.txt and download Python packages
Write-Host ""
Write-Host "[3/5] Creating requirements.txt and downloading Python packages..." -ForegroundColor Yellow

$requirementsContent = @"
sglang>=0.2.0
torch==2.2.1
transformers==4.40.0
accelerate==0.28.0
sentencepiece==0.2.0
protobuf==4.25.3
tiktoken==0.6.0
modelscope==1.11.0
safetensors==0.4.3
tokenizers==0.15.2
flash-attn==2.5.6
"@

$requirementsPath = "$pythonDir\requirements.txt"
[System.IO.File]::WriteAllText($requirementsPath, $requirementsContent, [System.Text.Encoding]::UTF8)
Write-Host "  requirements.txt created" -ForegroundColor Green

# Download the general packages as Linux wheels.
# Several of them (tokenizers, safetensors, sentencepiece, tiktoken) contain compiled code, and pip
# also downloads dependencies, so the platform flags are required here as well.
Write-Host "  Downloading general Python packages (Linux)..." -ForegroundColor Cyan
$linuxPackages = @(
    "transformers==4.40.0",
    "accelerate==0.28.0",
    "sentencepiece==0.2.0",
    "protobuf==4.25.3",
    "tiktoken==0.6.0",
    "modelscope==1.11.0",
    "safetensors==0.4.3",
    "tokenizers==0.15.2"
)

foreach ($pkg in $linuxPackages) {
    Write-Host "    Downloading: $pkg" -ForegroundColor Green
    & pip download $pkg -d $packagesDir --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: 2>&1 | Out-Null
    # pip output is suppressed, so report failures explicitly
    if ($LASTEXITCODE -ne 0) { Write-Host "    Failed: $pkg" -ForegroundColor Red }
}

# Download Linux C extension packages
Write-Host "  Downloading Linux C extension packages..." -ForegroundColor Cyan

Write-Host "    Downloading: torch==2.2.1 (Linux)" -ForegroundColor Green
& pip download torch==2.2.1 -d $packagesDir --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: 2>&1 | Out-Null

Write-Host "    Downloading: sglang (Linux)" -ForegroundColor Green
& pip download sglang -d $packagesDir --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: 2>&1 | Out-Null

# flash-attn is distributed on PyPI as a source package, so this step may fail with --only-binary
Write-Host "    Downloading: flash-attn==2.5.6 (Linux)" -ForegroundColor Green
& pip download flash-attn==2.5.6 -d $packagesDir --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: 2>&1 | Out-Null
if ($LASTEXITCODE -ne 0) { Write-Host "    Failed: flash-attn==2.5.6 (obtain a prebuilt Linux wheel separately)" -ForegroundColor Red }

Write-Host "  Python packages done" -ForegroundColor Green

# 4. Download GLM model
Write-Host ""
Write-Host "[4/5] Downloading GLM-4-9B-Chat model..." -ForegroundColor Yellow
Write-Host "  This will download about 18GB, please wait..." -ForegroundColor Yellow

$modelDir = "$modelsDir\glm-4-9b-chat"
if (Test-Path $modelDir) {
    Write-Host "  Model directory exists, skipping" -ForegroundColor Gray
} else {
    $modelscopeCheck = pip show modelscope 2>&1
    if ($LASTEXITCODE -ne 0) {
        Write-Host "  Installing modelscope..." -ForegroundColor Yellow
        & pip install modelscope 2>&1 | Out-Null
    }
    
    Write-Host "  Starting model download..." -ForegroundColor Green
    & modelscope download --model ZhipuAI/glm-4-9b-chat --local_dir $modelDir
}

Write-Host "  Model download done" -ForegroundColor Green

# 5. Verify downloads
Write-Host ""
Write-Host "[5/5] Verifying downloads..." -ForegroundColor Yellow

Write-Host ""
Write-Host "System packages:" -ForegroundColor Cyan
$systemFiles = Get-ChildItem $systemDir -Filter "*.deb"
$totalSystemSize = 0
foreach ($file in $systemFiles) {
    $sizeMB = [math]::Round($file.Length / 1MB, 2)
    $totalSystemSize += $sizeMB
    Write-Host "  $($file.Name) - $sizeMB MB" -ForegroundColor White
}
Write-Host "  Total: $([math]::Round($totalSystemSize, 2)) MB" -ForegroundColor Green

Write-Host ""
Write-Host "Python packages:" -ForegroundColor Cyan
$pythonFiles = Get-ChildItem $packagesDir -Filter "*.whl"
$totalPythonSize = 0
foreach ($file in $pythonFiles) {
    $sizeMB = [math]::Round($file.Length / 1MB, 2)
    $totalPythonSize += $sizeMB
    Write-Host "  $($file.Name) - $sizeMB MB" -ForegroundColor White
}
Write-Host "  Total: $([math]::Round($totalPythonSize, 2)) MB" -ForegroundColor Green

Write-Host ""
Write-Host "Model files:" -ForegroundColor Cyan
if (Test-Path $modelDir) {
    $modelFiles = Get-ChildItem $modelDir
    $totalModelSize = 0
    foreach ($file in $modelFiles) {
        $sizeMB = [math]::Round($file.Length / 1MB, 2)
        $totalModelSize += $sizeMB
        Write-Host "  $($file.Name) - $sizeMB MB" -ForegroundColor White
    }
    Write-Host "  Total: $([math]::Round($totalModelSize, 2)) MB" -ForegroundColor Green
}

Write-Host ""
Write-Host "========================================" -ForegroundColor Green
Write-Host "All downloads completed!" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Green
Write-Host ""
Write-Host "Download directory: $baseDir" -ForegroundColor Yellow
Write-Host ""
Write-Host "Next steps:" -ForegroundColor Yellow
Write-Host "1. Package C:\offline_packages directory" -ForegroundColor White
Write-Host "2. Transfer to target Ubuntu server" -ForegroundColor White
Write-Host "3. Follow SGLang_GLM_Installation_Guide.md for offline installation" -ForegroundColor White

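2.2 Offline installation steps

2.2.1 Transfer the files to the target host

Transfer options:

  1. USB stick or portable hard drive

    • Copy sglang_glm_offline_package.zip to the USB drive
    • Plug the drive into the Ubuntu server and mount it
    • Copy the file to the server
  2. SCP (if the machines can reach each other over the network):

    # Run on the Ubuntu server (requires an SSH server on the Windows machine)
    scp user@windows-machine:/path/to/sglang_glm_offline_package.zip /tmp/
    
  3. An SFTP tool (such as FileZilla or WinSCP):

    • Upload from Windows to the Ubuntu server

Extract on Ubuntu:

# For the ZIP archive
cd /tmp
unzip sglang_glm_offline_package.zip

# For the tar.gz archive
cd /tmp
tar -xzf sglang_glm_offline_package.tar.gz

# Move the extracted directory to your home directory (later steps assume ~/offline_packages)
mv offline_packages ~/
cd ~/offline_packages

Note: differences between Windows and Linux file systems

  • Windows file names are case-insensitive; Linux file names are case-sensitive
  • Windows line endings are \r\n; Linux uses \n
  • Python and shell script files may need their line endings converted after transfer:
    # Install dos2unix (needs network access or a pre-downloaded .deb package)
    sudo apt-get install dos2unix
    
    # Convert line endings
    dos2unix *.py *.sh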
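
On a fully offline server dos2unix may not be installable. Since Python is part of this setup anyway, the conversion can be done with a few lines of standard-library code. This is a sketch with hypothetical function names; it rewrites files in place, so run it on a copy first if in doubt.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows line endings (\\r\\n) with Unix line endings (\\n)."""
    return data.replace(b"\r\n", b"\n")


def convert_tree(root, suffixes=(".py", ".sh")):
    """Convert matching files under root in place and return the paths that changed."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            original = path.read_bytes()
            converted = crlf_to_lf(original)
            if converted != original:
                path.write_bytes(converted)
                changed.append(path)
    return changed
```

Calling `convert_tree(".")` from the extracted directory converts every .py and .sh file below it and leaves other files, including wheels and model weights, untouched.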
    
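2.2.2 Install system dependencies

# Install the .deb packages
cd ~/offline_packages/system
sudo dpkg -i *.deb
sudo apt-get install -f  # Repairs dependencies; offline, it can only use packages that are already available locally

# Install the CUDA Toolkit (if needed)
sudo sh cuda_12.1.0_530.30.02_linux.run --toolkit --silent

# Configure the CUDA environment variables
echo 'export PATH=/usr/local/cuda-12.1/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

2.2.3 Create a Python virtual environment

# Create the environment at ~/sglang_env, the path used by the start-up scripts in section 4
# (on Ubuntu, python3.12 -m venv needs the python3.12-venv package)
python3.12 -m venv ~/sglang_env
source ~/sglang_env/bin/activate

2.2.4 Install Python packages

cd ~/offline_packages/python
pip install --no-index --find-links=./packages -r requirements.txt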
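
Because `--no-index` makes pip fail on the first package it cannot find, it can help to check beforehand that each line of requirements.txt has a matching file in the packages directory. The sketch below compares normalized project names. The function names are hypothetical, and the check covers only the top-level requirements, not their transitive dependencies.

```python
import re


def normalize(name):
    """Normalize a project name: lowercase, with runs of '-', '_' and '.' collapsed to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(requirements_text):
    """Extract normalized project names from the lines of a requirements file."""
    names = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            # The name ends at the first version operator, extras bracket, marker or space.
            names.append(normalize(re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]))
    return names


def uncovered_requirements(requirements_text, filenames):
    """Return requirement names for which no wheel or sdist file is present."""
    available = {
        normalize(f.split("-", 1)[0])
        for f in filenames
        if f.endswith((".whl", ".tar.gz"))
    }
    return [n for n in requirement_names(requirements_text) if n not in available]
```

Feeding it the text of requirements.txt and the result of `os.listdir("packages")` returns the names that still need to be downloaded; an empty list means every top-level requirement has a file.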

2.2.5 Deploy the model files

# Create the model directory
sudo mkdir -p /opt/models
sudo chown $USER:$USER /opt/models

# Move the model files
mv ~/offline_packages/models/glm-4-9b-chat /opt/models/

# Check that the model files are complete
ls -lh /opt/models/glm-4-9b-chat/

Expected model files:

config.json
configuration_chatglm.py
model-00001-of-0000x.safetensors  # several shard files
model.safetensors.index.json
tokenizer.model
tokenizer_config.json
tokenization_chatglm.py
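3. Scenario 2: installation with network access

3.1 Prepare the system environment

3.1.1 Update the system

sudo apt-get update
sudo apt-get upgrade -y

3.1.2 Install system dependencies

# Ubuntu 22.04's default repositories do not provide python3.12; enable a repository that does
# (for example the deadsnakes PPA) before running this command
sudo apt-get install -y \
    build-essential \
    git \
    curl \
    wget \
    python3.12 \
    python3.12-venv \
    python3.12-dev \
    python3-pip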

3.1.3 Install CUDA Toolkit 12.1

# Download the CUDA repository pin file
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600

# Add the CUDA repository
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/

# Install CUDA
sudo apt-get update
sudo apt-get install -y cuda-12-1

# Configure the environment variables
echo 'export PATH=/usr/local/cuda-12.1/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Verify the installation
nvidia-smi
nvcc --version

3.2 Install SGLang and its dependencies

3.2.1 Create a Python virtual environment

python3.12 -m venv ~/sglang_env
source ~/sglang_env/bin/activate

3.2.2 Upgrade pip and setuptools

pip install --upgrade pip setuptools wheel

3.2.3 Install PyTorch (CUDA 12.1 build)

pip install torch==2.2.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

3.2.4 Install SGLang

pip install "sglang[all]"

3.2.5 Install the remaining dependencies

pip install \
    transformers==4.40.0 \
    accelerate==0.28.0 \
    sentencepiece==0.2.0 \
    protobuf==4.25.3 \
    tiktoken==0.6.0

3.3 Download the GLM-4 model

3.3.1 Download from ModelScope (recommended in mainland China)

# Install modelscope
pip install modelscope

# Create the model directory
sudo mkdir -p /opt/models
sudo chown $USER:$USER /opt/models

# Download GLM-4-9B-Chat
modelscope download \
    --model ZhipuAI/glm-4-9b-chat \
    --local_dir /opt/models/glm-4-9b-chat

# Or download ChatGLM3-6B (smaller; suitable for testing)
# modelscope download \
#     --model ZhipuAI/chatglm3-6b \
#     --local_dir /opt/models/chatglm3-6b
3.4 Verify the installation

# Verify PyTorch CUDA support
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}')"

# Verify the SGLang installation
python -c "import sglang; print(f'SGLang version: {sglang.__version__}')"

# Check the model files
ls -lh /opt/models/glm-4-9b-chat/

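4. Starting and verifying the service

4.1 Create the start-up script

Create the service start-up script:

cat > ~/start_sglang_server.sh << 'EOF'
#!/bin/bash

# Activate the virtual environment
source ~/sglang_env/bin/activate

# Set environment variables
export CUDA_VISIBLE_DEVICES=0

# Start the SGLang server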
python -m sglang.launch_server \
    --model-path /opt/models/glm-4-9b-chat \
    --host 0.0.0.0 \
    --port 8000 \
    --tp 1 \
    --mem-fraction-static 0.9 \
    --trust-remote-code
EOF

chmod +x ~/start_sglang_server.sh

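4.2 Start the service

4.2.1 Run in the foreground (for testing)

~/start_sglang_server.sh

4.2.2 Run in the background (production)

nohup ~/start_sglang_server.sh > ~/sglang_server.log 2>&1 &

# Follow the log
tail -f ~/sglang_server.log

4.2.3 Manage the service with systemd (recommended)

Create the systemd unit file. The file is written through sudo tee because sudo does not apply to a shell redirection:

sudo tee /etc/systemd/system/sglang-glm.service > /dev/null << 'EOF'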
[Unit]
Description=SGLang GLM-4-9B-Chat Service
After=network.target

[Service]
Type=simple
User=your_username
WorkingDirectory=/home/your_username
Environment="PATH=/home/your_username/sglang_env/bin:/usr/local/cuda-12.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Environment="CUDA_VISIBLE_DEVICES=0"
ExecStart=/home/your_username/sglang_env/bin/python -m sglang.launch_server \
    --model-path /opt/models/glm-4-9b-chat \
    --host 0.0.0.0 \
    --port 8000 \
    --tp 1 \
    --mem-fraction-static 0.9 \
    --trust-remote-code
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

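# Replace your_username with the actual user name
sudo sed -i "s/your_username/$USER/g" /etc/systemd/system/sglang-glm.service

# Start the service and enable it at boot
sudo systemctl daemon-reload
sudo systemctl start sglang-glm
sudo systemctl enable sglang-glm

# Check the service status
sudo systemctl status sglang-glm

4.3 Verify the service

4.3.1 Check the service status

# Check that the port is listening (netstat is provided by the net-tools package)
netstat -tlnp | grep 8000

# Or use ss
ss -tlnp | grep 8000

# Check the process
ps aux | grep sglang

4.3.2 Test the API

Test 1: list the models

curl http://localhost:8000/v1/models

Expected output: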

{
  "object": "list",
  "data": [
    {
      "id": "/opt/models/glm-4-9b-chat",
      "object": "model",
      "created": 1234567890,
      "owned_by": "sglang"
    }
  ]
}
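
The same check can be scripted so that a deployment script can confirm the expected model is being served. The sketch below uses only the Python standard library and assumes the response has the shape shown above; the function names are hypothetical.

```python
import json
import urllib.request


def model_ids(payload):
    """Extract the model ids from a parsed /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url="http://localhost:8000", timeout=5):
    """Query the running server and return the ids of the models it serves."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

With the server from section 4.2 running, `"/opt/models/glm-4-9b-chat" in fetch_model_ids()` should evaluate to True; a connection error means the server is not yet listening.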

Test 2: send a chat request

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/opt/models/glm-4-9b-chat",
    "messages": [
      {"role": "user", "content": "你好,请介绍一下你自己。"}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'

Expected output:

{
  "id": "cmpl-xxxxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "/opt/models/glm-4-9b-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "你好!我是GLM-4,由智谱AI开发的大型语言模型..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 50,
    "total_tokens": 65
  }
}

Test 3: use the Python client (requires the openai package: pip install openai)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy"
)

response = client.chat.completions.create(
    model="/opt/models/glm-4-9b-chat",
    messages=[
        {"role": "user", "content": "请用Python写一个快速排序算法"}
    ],
    max_tokens=500,
    temperature=0.7
)

print(response.choices[0].message.content)
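
4.3.3 Performance benchmark

# Install the benchmark dependency
pip install aiohttp

# Run the benchmark script that ships with SGLang
# (option names differ between SGLang releases; confirm them with: python -m sglang.bench_serving --help)
python -m sglang.bench_serving \
    --model /opt/models/glm-4-9b-chat \
    --host localhost \
    --port 8000 \
    --num-prompts 100 \
    --max-tokens 100

4.4 Success criteria

The service is considered to have started successfully when:

  1. The server process runs without crashing
  2. Port 8000 is listening
  3. The /v1/models endpoint returns the correct model information
  4. The chat endpoint returns responses
  5. GPU memory utilization is between 80% and 95%
  6. Time to first token for a single request is under 2 seconds
  7. Throughput exceeds 20 tokens per second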
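
Criteria 6 and 7 are simple arithmetic on measurements taken by the client: throughput is the number of completion tokens (the `completion_tokens` field in the response's `usage` block) divided by the elapsed time. The following sketch encodes both thresholds; the function names are hypothetical, and dividing by the whole request time (prefill included) slightly understates pure decoding speed.

```python
def tokens_per_second(completion_tokens, elapsed_seconds):
    """Throughput of one request: generated tokens divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completion_tokens / elapsed_seconds


def meets_targets(first_token_latency_s, completion_tokens, elapsed_seconds,
                  max_latency_s=2.0, min_tokens_per_s=20.0):
    """Check the latency (< 2 s) and throughput (> 20 tokens/s) criteria listed above."""
    throughput = tokens_per_second(completion_tokens, elapsed_seconds)
    return first_token_latency_s < max_latency_s and throughput > min_tokens_per_s
```

For instance, a response with 50 completion tokens that took 2.0 seconds corresponds to 25 tokens per second, which passes criterion 7; the same response taking 5.0 seconds would be 10 tokens per second and fail.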

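5. Common problems and solutions

5.1 CUDA problems

Problem 1: CUDA out of memory

Error message:

RuntimeError: CUDA out of memory

Solution:

# Lower the fraction of GPU memory reserved by the server (launch_server option)
--mem-fraction-static 0.8

# Reduce the maximum sequence length (launch_server option)
--context-length 4096

# Use a quantized model
pip install auto-gptq
# Then download a quantized version of the model

Problem 2: CUDA version mismatch

Error message:

RuntimeError: CUDA version mismatch

Solution:

# Check the CUDA version
nvidia-smi
nvcc --version

# Reinstall the PyTorch build that matches it
pip uninstall torch
pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121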
5.2 Model loading problems

Problem 3: missing model files

Error message:

FileNotFoundError: Cannot find model files

Solution:

# Check that the model files are complete
ls -lh /opt/models/glm-4-9b-chat/

# Download the missing files again (with ModelScope)
pip install modelscope
modelscope download \
    --model ZhipuAI/glm-4-9b-chat \
    --local_dir /opt/models/glm-4-9b-chat

Problem 4: trust_remote_code error

Error message:

ValueError: The repository contains custom code

Solution:

# Add the --trust-remote-code option to the launch command
python -m sglang.launch_server \
    --model-path /opt/models/glm-4-9b-chat \
    --trust-remote-code

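5.3 Network problems

Problem 5: dependencies missing in the offline environment

Solution:

# Download all dependencies on a machine with network access
pip download -r requirements.txt -d ./packages

# Install them in the offline environment
pip install --no-index --find-links=./packages -r requirements.txt

Problem 6: model download is slow or fails

Solution:

# Use ModelScope, which is hosted in mainland China
pip install modelscope

# Optional: set the ModelScope cache directory
export MODELSCOPE_CACHE=/opt/models

# Download the model
modelscope download \
    --model ZhipuAI/glm-4-9b-chat \
    --local_dir /opt/models/glm-4-9b-chat

# Interrupted downloads can be resumed:
# ModelScope detects the files that are already present and continues with the unfinished ones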

5.4 Performance problems

Problem 7: slow inference

Solution:

# Install Flash Attention
pip install flash-attn --no-build-isolation

# SGLang uses Flash Attention automatically when it is available

# Adjust the maximum number of concurrently running requests (launch_server option)
--max-running-requests 256

Problem 8: high time to first token

Solution:

# Warm up the model
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/opt/models/glm-4-9b-chat", "messages": [{"role": "user", "content": "hi"}], "max_tokens": 10}'

# RadixAttention (an SGLang-specific optimization) is enabled by default

6. Performance tuning recommendations

6.1 Hardware-specific tuning

For the 8-core CPU / 32 GB RAM / 32 GB GPU configuration:

# Tuned launch parameters
python -m sglang.launch_server \
    --model-path /opt/models/glm-4-9b-chat \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 8000 \
    --tp 1 \
    --mem-fraction-static 0.9 \
    --max-running-requests 128

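6.2 System parameter tuning

# Raise the file descriptor limit
ulimit -n 65535

# Make the change permanent
# (for the systemd service, set LimitNOFILE=65535 in the [Service] section instead)
echo "* soft nofile 65535" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65535" | sudo tee -a /etc/security/limits.conf

# Tune kernel parameters
sudo sysctl -w net.core.somaxconn=65535
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65535

6.3 Monitoring and logging

# Monitor GPU usage in real time
watch -n 1 nvidia-smi

# Follow the service log
tail -f ~/sglang_server.log

# Monitor with Prometheus (optional)
pip install prometheus-client
# Start the server with --enable-metrics to expose the metrics endpoint at http://localhost:8000/metrics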

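6.4 Load balancing (multi-instance deployment)

For higher throughput, deploy several instances. GLM-4-9B needs about 18 GB for its weights, so two instances do not fit on a single 32 GB GPU; the example below assumes a second GPU and runs one instance per GPU:

# Instance 1 (GPU 0, port 8000)
CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server \
    --model-path /opt/models/glm-4-9b-chat \
    --trust-remote-code \
    --port 8000 \
    --mem-fraction-static 0.9

# Instance 2 (GPU 1, port 8001)
CUDA_VISIBLE_DEVICES=1 python -m sglang.launch_server \
    --model-path /opt/models/glm-4-9b-chat \
    --trust-remote-code \
    --port 8001 \
    --mem-fraction-static 0.9

# Load-balance with Nginx
sudo apt-get install nginx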
sudo tee /etc/nginx/sites-available/sglang << 'EOF'
upstream sglang_backend {
    server localhost:8000;
    server localhost:8001;
}

server {
    listen 80;
    location / {
        proxy_pass http://sglang_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF

sudo ln -s /etc/nginx/sites-available/sglang /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx

7. Appendix

7.1 Complete requirements.txt

sglang>=0.2.0
torch==2.2.1
transformers==4.40.0
accelerate==0.28.0
sentencepiece==0.2.0
protobuf==4.25.3
tiktoken==0.6.0
modelscope==1.11.0
safetensors==0.4.3
tokenizers==0.15.2
flash-attn==2.5.6

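7.2 Quick-start command summary

Quick installation with network access:

# One-line installation
pip install "sglang[all]"

Quick verification in the offline environment:

# Verify all components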
python << 'EOF'
import torch
import sglang
import transformers

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"SGLang: {sglang.__version__}")
print(f"Transformers: {transformers.__version__}")
print(f"GPU device: {torch.cuda.get_device_name(0)}")
print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
EOF

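Notes:

  1. This document was written against the latest SGLang release at the time of writing; for other versions, consult the official documentation
  2. GLM-4-9B needs about 18 GB of GPU memory for its weights, so a 32 GB GPU leaves room for a fairly large batch size
  3. In production, manage the service with systemd so that it restarts automatically
  4. Check GPU temperature and memory usage regularly to avoid overheating and OOM
  5. Test in a non-production environment first, and deploy to production only after all functions have been verified
  6. SGLang is reported to deliver higher throughput and lower latency than vLLM on some workloads, particularly under high concurrency; benchmark your own workload to confirm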
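
The 18 GB figure in note 2 follows from the parameter count and the storage precision. The sketch below works through the arithmetic. It assumes roughly 9.4 billion parameters stored as 16-bit values (both are assumptions made for illustration, not figures from this guide) and treats --mem-fraction-static as the share of GPU memory reserved for weights plus KV cache.

```python
def weight_memory_gib(num_params, bytes_per_param=2):
    """Memory needed for the weights alone, in GiB (2 bytes per parameter for FP16/BF16)."""
    return num_params * bytes_per_param / 1024**3


def static_budget_gib(gpu_memory_gib, mem_fraction_static):
    """GPU memory reserved for weights plus KV cache at the given fraction."""
    return gpu_memory_gib * mem_fraction_static


# Assumed parameter count for GLM-4-9B (illustrative).
weights = weight_memory_gib(9.4e9)
budget = static_budget_gib(32, 0.9)
print(f"weights: {weights:.1f} GiB, budget: {budget:.1f} GiB, "
      f"left for KV cache: {budget - weights:.1f} GiB")
```

Under these assumptions the weights take about 17.5 GiB and a 0.9 fraction of a 32 GB card leaves roughly 11 GiB for the KV cache, whereas a fraction of 0.45 (14.4 GiB) would not even hold the weights.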