SGLang_GLM_Installation_Guide

紫丁香 · Published 2026-04-12 17:16:03

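Detailed Guide to Deploying GLM Large Language Models with SGLang

Document information

Target environment: Ubuntu 22.04
Hardware: 8 CPU cores / 32 GB RAM / 32 GB GPU memory
Target models: GLM-4-9B-Chat / ChatGLM3-6B
Serving framework: SGLang
Python version: 3.12
Document version: v1.0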

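1. System environment check

1.1 Verify hardware resources

# Check the number of CPU cores
lscpu | grep "^CPU(s):"

# Check memory size
free -h

# Check GPU information
nvidia-smi

# Check disk space (at least 50 GB free is recommended)
df -h

The output should meet these minimums:

CPU cores ≥ 8
Memory ≥ 32 GB
GPU memory ≥ 32 GB
Free disk space ≥ 50 GB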
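
The same minimums can be checked from a script, which is handy when several hosts need to be screened. The sketch below is a minimal illustration using only the Python standard library; the function names and the 10% memory tolerance are assumptions of this example (a machine sold as "32 GB" usually reports slightly less in /proc/meminfo), and GPU memory is not covered because it requires nvidia-smi.

```python
# Minimums taken from the checklist above.
MIN_CPUS, MIN_MEM_GB, MIN_DISK_GB = 8, 32, 50


def read_mem_total_gb(meminfo_path="/proc/meminfo"):
    """Return total RAM in GiB by parsing the MemTotal line (Linux only)."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                kib = int(line.split()[1])  # the kernel reports this value in kB
                return kib / 1024**2
    raise RuntimeError("MemTotal not found")


def check_minimums(cpus, mem_gb, disk_gb, mem_tolerance=0.9):
    """Return a list of failure messages; an empty list means every check passed.

    mem_tolerance is an assumption of this sketch: part of the installed RAM
    is reserved by firmware and the kernel, so a strict >= 32 test would
    reject most nominal 32 GB machines.
    """
    failures = []
    if cpus < MIN_CPUS:
        failures.append(f"CPU cores: {cpus} < {MIN_CPUS}")
    if mem_gb < MIN_MEM_GB * mem_tolerance:
        failures.append(f"memory: {mem_gb:.1f} GiB < {MIN_MEM_GB} GB")
    if disk_gb < MIN_DISK_GB:
        failures.append(f"free disk: {disk_gb:.1f} GiB < {MIN_DISK_GB} GB")
    return failures
```

On the target host, the inputs could come from `os.cpu_count()`, `read_mem_total_gb()` and `shutil.disk_usage("/").free / 1024**3`.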

1.2 Verify the operating system version

# Check the Ubuntu version
lsb_release -a

# Check the kernel version
uname -r

Expected output:

Ubuntu 22.04 LTS
Kernel version ≥ 5.15

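1.3 Check the GPU driver

# Check the NVIDIA driver version
nvidia-smi | grep "Driver Version"

# Check the CUDA version (if installed)
nvcc --version 2>/dev/null || echo "CUDA is not installed"

Requirements:

NVIDIA driver version ≥ 525.60.13
CUDA version ≥ 12.0 (12.1 or later recommended)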
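
Version strings such as 525.60.13 have to be compared component by component; a plain string comparison would rank "525.105.17" below "525.60.13". The following sketch shows one way to do the comparison in Python. The helper names are hypothetical and the code only handles purely numeric, dot-separated versions.

```python
def version_tuple(text, width=3):
    """Turn '525.60.13' into (525, 60, 13) so that versions compare numerically.

    Shorter versions are padded with zeros ('12.1' -> (12, 1, 0)) so that
    '12' and '12.0' compare as equal.
    """
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def meets_minimum(found, minimum):
    """Return True when the found version is at least the required minimum."""
    return version_tuple(found) >= version_tuple(minimum)
```

For example, `meets_minimum("530.30.02", "525.60.13")` holds because 530 is greater than 525, regardless of the later components.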

2. Scenario 1: installation in a fully offline environment

2.1 Preparation (on a networked Windows machine)

2.1.1 Create the download directories

Create the download directories in Windows PowerShell:

# Create the download directory structure
New-Item -ItemType Directory -Path "C:\offline_packages\system" -Force
New-Item -ItemType Directory -Path "C:\offline_packages\python\packages" -Force
New-Item -ItemType Directory -Path "C:\offline_packages\models" -Force

# Go to the download directory
cd C:\offline_packages

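2.1.2 Download the system dependency packages

⚠️ Important: version numbers change regularly

Package versions on the Tsinghua mirror change as security updates are published. If a download link below returns a 404 error, find the current version as follows:

1. Open the Tsinghua mirror directory: https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/
2. Look up the latest version number (for example, 3.12.3-1ubuntu0.13)
3. Replace the version number in the commands below with that version

Download all dependency packages with wget (Windows)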

# Install wget (if missing)
# Download wget.exe from https://eternallybored.org/misc/wget/ and place it in C:\Windows\System32\
# In Windows PowerShell 5.1, "wget" is an alias for Invoke-WebRequest, so call wget.exe explicitly

# Go to the system directory
cd C:\offline_packages\system

# Download the Python 3.12 packages
# Note: version numbers change; check the Tsinghua mirror for the current one
# Latest version at the time of writing: 3.12.3-1ubuntu0.13 (2026-04-08)
# Caution: Ubuntu 22.04's own archive ships Python 3.10. The 3.12.3-1ubuntu0.x builds in this
# pool directory appear to target a newer Ubuntu release, so they may fail dependency checks
# (for example on libc6) on 22.04. Inspect a package with "dpkg -I <file>.deb" before relying on it.
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/python3.12_3.12.3-1ubuntu0.13_amd64.deb
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/python3.12-minimal_3.12.3-1ubuntu0.13_amd64.deb
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/python3.12-dev_3.12.3-1ubuntu0.13_amd64.deb
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/libpython3.12-dev_3.12.3-1ubuntu0.13_amd64.deb
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/libpython3.12-stdlib_3.12.3-1ubuntu0.13_amd64.deb
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/libpython3.12-minimal_3.12.3-1ubuntu0.13_amd64.deb
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python-pip/python3-pip_23.0.1+dfsg-1_all.deb

# Download the build tools and their dependencies
# Note: the version numbers below change; check the Tsinghua mirror for the current ones
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/b/build-essential/build-essential_12.9ubuntu3_amd64.deb
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/gcc-11/gcc-11_11.4.0-1ubuntu1~22.04.3_amd64.deb
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/gcc-11/g++-11_11.4.0-1ubuntu1~22.04.3_amd64.deb
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/m/make/make_4.3-4.1build1_amd64.deb
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/d/dpkg-dev/dpkg-dev_1.21.1ubuntu2.3_all.deb

# Download other tools
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/git/git_2.34.1-1ubuntu1.17_amd64.deb
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/c/curl/curl_7.81.0-1ubuntu1.15_amd64.deb
wget.exe https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/w/wget/wget_1.21.2-2ubuntu1_amd64.deb

# Download CUDA Toolkit 12.1
wget.exe https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run

# Check the downloaded files
dir

Alternative: a script that downloads the latest versions automatically

If the links above no longer work, the following PowerShell script looks up the latest versions and downloads them:

# Create the download script
$script = @'
# Python 3.12 package download script
$baseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/"
$packages = @(
    "python3.12_",
    "python3.12-minimal_",
    "python3.12-dev_",
    "libpython3.12-dev_",
    "libpython3.12-stdlib_",
    "libpython3.12-minimal_"
)

Write-Host "Looking for the latest Python 3.12 packages..." -ForegroundColor Yellow

# Fetch the directory listing
try {
    $response = Invoke-WebRequest -Uri $baseUrl -UseBasicParsing
    $content = $response.Content

    foreach ($pkg in $packages) {
        # Find the amd64 builds of this package
        $pattern = [regex]::Escape($pkg) + "(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"
        # $matches is a PowerShell automatic variable, so a different name is used here
        $regexMatches = [regex]::Matches($content, $pattern)

        if ($regexMatches.Count -gt 0) {
            # Take the last match (usually, but not necessarily, the newest version)
            $latest = $regexMatches[$regexMatches.Count - 1].Value
            $url = "$baseUrl$latest"
            $output = ".\$latest"

            Write-Host "Downloading: $latest" -ForegroundColor Green
            Invoke-WebRequest -Uri $url -OutFile $output
        }
    }

    Write-Host "`nAll Python 3.12 packages downloaded!" -ForegroundColor Green
} catch {
    Write-Host "Error: $_" -ForegroundColor Red
}
'@

# Save the script
$script | Out-File -FilePath ".\download_python312.ps1" -Encoding UTF8

# Run the script
.\download_python312.ps1

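2.1.3 Download the Python packages

⚠️ Important: why the Linux platform must be specified

Python packages fall into two categories, and the difference matters when they are moved between platforms:

1. Pure-Python packages (platform-independent)

Examples: transformers, accelerate, modelscope
Characteristics: contain only Python source code (.py files) and no compiled binaries
Cross-platform: ✅ usable on any operating system; a wheel downloaded on Windows (tagged py3-none-any) installs directly on Ubuntu

2. Packages with compiled extensions (platform-specific)

Examples: torch, tokenizers, safetensors, sentencepiece, tiktoken, flash-attn, and the compiled dependencies pulled in by sglang
Characteristics: contain compiled binaries (.pyd, .so, .dll and so on) built for a specific operating system and CPU architecture
Cross-platform: ❌ a Windows download contains .pyd files (Windows DLL format), whereas Ubuntu needs .so files (Linux shared libraries)

The difference shows in the file name (the exact manylinux tag varies between packages):

# Windows build (wrong)
torch-2.2.1+cu121-cp312-cp312-win_amd64.whl

# Linux build (correct)
torch-2.2.1+cu121-cp312-cp312-manylinux2014_x86_64.whl

Solution: on Windows, run pip download with the Linux platform specified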

Step 1: create requirements.txt

Create requirements.txt in C:\offline_packages\python\ with the following content:

sglang>=0.2.0
torch==2.2.1
transformers==4.40.0
accelerate==0.28.0
sentencepiece==0.2.0
protobuf==4.25.3
tiktoken==0.6.0
modelscope==1.11.0
safetensors==0.4.3
tokenizers==0.15.2
flash-attn==2.5.6

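Step 2: download the Linux builds of the Python packages

# Make sure Python 3.12+ and pip are installed
python --version

# Go to the python directory
cd C:\offline_packages\python

# Method 1: download everything in one command (recommended)
# --platform manylinux2014_x86_64: target the Linux platform
# --python-version 312: target Python 3.12
# --only-binary=:all:  download wheels only, no source packages (pip requires this together with --platform)
# Note: flash-attn appears to be published on PyPI only as a source distribution, so this command
# is likely to fail on that entry. If it does, remove flash-attn from requirements.txt and obtain a
# prebuilt Linux wheel separately.
pip download -r requirements.txt -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all:

# Method 2: if the command above fails, download in smaller steps
# Keep the platform flags for every package: tokenizers, safetensors, sentencepiece and tiktoken
# contain compiled extensions, so without --platform pip would fetch their Windows wheels
pip download transformers==4.40.0 accelerate==0.28.0 sentencepiece==0.2.0 protobuf==4.25.3 tiktoken==0.6.0 modelscope==1.11.0 safetensors==0.4.3 tokenizers==0.15.2 -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all:

# Then download the remaining packages the same way
pip download torch==2.2.1 -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all:
pip download sglang -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all:
pip download flash-attn==2.5.6 -d .\packages --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: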

Step 3: verify that the downloaded packages are the right ones

Check 1: look at the platform tag in the file name

# List all downloaded packages
dir .\packages

# Check that the torch package is the Linux build
Get-ChildItem .\packages\torch*.whl | Select-Object Name

# ✅ A correct file name contains a "manylinux" tag:
# torch-2.2.1+cu121-cp312-cp312-manylinux2014_x86_64.whl

# ❌ A wrong file name contains a "win" tag (installation on Ubuntu will fail):
# torch-2.2.1+cu121-cp312-cp312-win_amd64.whl

Check 2: inspect every platform-specific package

# Print the platform of every .whl file
Get-ChildItem .\packages\*.whl | ForEach-Object {
    $name = $_.Name
    if ($name -match "manylinux") {
        Write-Host "✅ Linux build: $name" -ForegroundColor Green
    } elseif ($name -match "win") {
        Write-Host "❌ Windows build (wrong): $name" -ForegroundColor Red
    } else {
        Write-Host "ℹ️  Platform-independent: $name" -ForegroundColor Cyan
    }
}
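
The PowerShell check above matches "win" anywhere in the file name, so a package whose name happens to contain those letters would be misreported. A stricter variant reads the platform tag itself, which is the last dash-separated field of a wheel file name. The sketch below illustrates this in Python; the function names and the category labels are choices of this example.

```python
def wheel_platform(filename):
    """Return the platform tag, the last dash-separated field of a wheel name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    return filename[: -len(".whl")].split("-")[-1]


def classify(filename):
    """Classify a wheel as platform-independent, linux, windows, macos or other."""
    tag = wheel_platform(filename)
    if tag == "any":
        return "platform-independent"
    # glibc-based Linux wheels; a compressed tag set such as
    # "manylinux_2_17_x86_64.manylinux2014_x86_64" also starts with "manylinux"
    if tag.startswith(("manylinux", "linux")):
        return "linux"
    if tag.startswith("win"):
        return "windows"
    if tag.startswith("macosx"):
        return "macos"
    return "other"
```

Anything classified as "windows" or "macos" in the packages directory should be downloaded again with the platform flags.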

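Check 3: unpack a wheel and inspect its contents (optional)

# Create a temporary directory
mkdir temp_check
cd temp_check

# Unpack the torch wheel to look inside
# (7-Zip or any other archive tool works)
& "C:\Program Files\7-Zip\7z.exe" x ..\packages\torch*.whl

# Look for .so files (Linux shared libraries)
Get-ChildItem -Recurse -Filter "*.so" | Select-Object FullName

# Output here means this is the Linux build ✅
# No .so files but .pyd files instead means this is the Windows build ❌

# Remove the temporary directory
cd ..
Remove-Item temp_check -Recurse -Force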

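Troubleshooting:

Problem	Cause	Solution
File name contains "win"	--platform was not specified	Download again with --platform manylinux2014_x86_64
pip download fails	The package has no wheel for the requested platform	Use Method 2 to download step by step, or download on Ubuntu
Installation fails with "not a supported wheel"	Platform mismatch	Check the file name and make sure it carries a manylinux tag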

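Alternative: download on a networked Ubuntu machine (recommended)

If downloading Linux packages from Windows causes problems:

1. Find an Ubuntu machine with network access (a virtual machine, WSL, or a temporary cloud host)
2. Run the download commands on it:

# Create the directories on Ubuntu
mkdir -p ~/offline_packages/python/packages
cd ~/offline_packages/python

# Create requirements.txt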
cat > requirements.txt << 'EOF'
sglang>=0.2.0
torch==2.2.1
transformers==4.40.0
accelerate==0.28.0
sentencepiece==0.2.0
protobuf==4.25.3
tiktoken==0.6.0
modelscope==1.11.0
safetensors==0.4.3
tokenizers==0.15.2
flash-attn==2.5.6
EOF

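# Download all dependencies
# (run pip from a Python 3.12 interpreter so that the wheels match the target's Python version)
pip download -r requirements.txt -d ./packages

# Create the archive
cd ~
tar -czf python_packages.tar.gz offline_packages/python/

# Transfer it to Windows, then on to the target Ubuntu server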

2.1.4 Download the GLM-4 model

Download with the ModelScope command-line tool

# Install modelscope
pip install modelscope

# Go to the models directory
cd C:\offline_packages\models

# Download GLM-4-9B-Chat (recommended; fits in 32 GB of GPU memory)
modelscope download --model ZhipuAI/glm-4-9b-chat --local_dir .\glm-4-9b-chat

# Or download ChatGLM3-6B (smaller; suitable for testing)
# modelscope download --model ZhipuAI/chatglm3-6b --local_dir .\chatglm3-6b

Check that the model files are complete:

# Inspect the model files
cd C:\offline_packages\models\glm-4-9b-chat
dir

# The directory should contain files such as:
# config.json
# configuration_chatglm.py
# model-00001-of-0000x.safetensors (several shard files)
# model.safetensors.index.json
# tokenizer.model
# tokenizer_config.json
# tokenization_chatglm.py
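
Eyeballing a directory listing does not reveal a missing shard. Sharded safetensors checkpoints come with a model.safetensors.index.json whose "weight_map" maps each tensor name to the shard file that stores it, so the expected shard names can be read from there. The sketch below is a minimal check built on that layout; the function name is hypothetical, and it only tests that the files exist, not that they are intact.

```python
import json
from pathlib import Path


def missing_shards(model_dir):
    """Return shard files named in model.safetensors.index.json that are absent on disk."""
    model_dir = Path(model_dir)
    index_path = model_dir / "model.safetensors.index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    # weight_map: tensor name -> shard file name; many tensors share one shard
    expected = set(index["weight_map"].values())
    return sorted(name for name in expected if not (model_dir / name).is_file())
```

An empty list means every shard referenced by the index is present; running the same check again on the Ubuntu server after the transfer catches files lost along the way.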

2.1.5 Package all files

Compress with 7-Zip (recommended; better compression ratio)

# If 7-Zip is installed
& "C:\Program Files\7-Zip\7z.exe" a -tzip C:\sglang_glm_offline_package.zip C:\offline_packages\

# Or create a tar.gz archive (to be extracted on Ubuntu)
& "C:\Program Files\7-Zip\7z.exe" a -ttar C:\sglang_glm_offline_package.tar C:\offline_packages\
& "C:\Program Files\7-Zip\7z.exe" a -tgzip C:\sglang_glm_offline_package.tar.gz C:\sglang_glm_offline_package.tar

# List the archive contents to verify them
& "C:\Program Files\7-Zip\7z.exe" l C:\sglang_glm_offline_package.zip

# Show the archive size
Get-Item C:\sglang_glm_offline_package.zip | Select-Object Name, @{Name="Size(GB)";Expression={[math]::Round($_.Length/1GB,2)}}

2.1.6 Complete download script

# Save this code as download_all.ps1, then run .\download_all.ps1
# SGLang GLM Offline Package Download Script
# For Windows - Download Ubuntu 22.04 dependencies

$ErrorActionPreference = "Continue"

$baseDir = "C:\offline_packages"
$systemDir = "$baseDir\system"
$pythonDir = "$baseDir\python"
$packagesDir = "$pythonDir\packages"
$modelsDir = "$baseDir\models"

Write-Host "========================================" -ForegroundColor Cyan
Write-Host "SGLang GLM Offline Package Download" -ForegroundColor Cyan
Write-Host "========================================" -ForegroundColor Cyan
Write-Host ""

# Create directory structure
Write-Host "[1/5] Creating directories..." -ForegroundColor Yellow
New-Item -ItemType Directory -Path $systemDir -Force | Out-Null
New-Item -ItemType Directory -Path $packagesDir -Force | Out-Null
New-Item -ItemType Directory -Path $modelsDir -Force | Out-Null
Write-Host "  Done" -ForegroundColor Green

# Download function
function Download-File {
    param(
        [string]$Url,
        [string]$Output
    )
    
    $fileName = Split-Path $Output -Leaf
    if (Test-Path $Output) {
        Write-Host "  Exists: $fileName" -ForegroundColor Gray
        return $true
    }
    
    try {
        Write-Host "  Downloading: $fileName" -ForegroundColor Green
        Invoke-WebRequest -Uri $Url -OutFile $Output -TimeoutSec 300
        return $true
    } catch {
        Write-Host "  Failed: $($_.Exception.Message)" -ForegroundColor Red
        return $false
    }
}

# Find and download latest package
function Download-LatestPackage {
    param(
        [string]$BaseUrl,
        [string]$Pattern,
        [string]$PackageName
    )
    
    try {
        Write-Host "  Finding latest: $PackageName..." -ForegroundColor Yellow
        $response = Invoke-WebRequest -Uri $BaseUrl -UseBasicParsing -TimeoutSec 30
        $content = $response.Content
        
        $regexMatches = [regex]::Matches($content, $Pattern)
        
        if ($regexMatches.Count -gt 0) {
            $latest = $regexMatches[$regexMatches.Count - 1].Value
            $url = "$BaseUrl$latest"
            $output = Join-Path $systemDir $latest
            
            if (Test-Path $output) {
                Write-Host "  Exists: $latest" -ForegroundColor Gray
                return $true
            }
            
            Write-Host "  Downloading: $latest" -ForegroundColor Green
            Invoke-WebRequest -Uri $url -OutFile $output -TimeoutSec 300
            return $true
        } else {
            Write-Host "  Not found" -ForegroundColor Red
            return $false
        }
    } catch {
        Write-Host "  Error: $($_.Exception.Message)" -ForegroundColor Red
        return $false
    }
}

# 2. Download system packages
Write-Host ""
Write-Host "[2/5] Downloading system packages..." -ForegroundColor Yellow

# Python 3.12 packages
$python312BaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python3.12/"
$python312Packages = @(
    @{Name="python3.12"; Pattern="python3\.12_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
    @{Name="python3.12-minimal"; Pattern="python3\.12-minimal_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
    @{Name="python3.12-dev"; Pattern="python3\.12-dev_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
    @{Name="libpython3.12-dev"; Pattern="libpython3\.12-dev_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
    @{Name="libpython3.12-stdlib"; Pattern="libpython3\.12-stdlib_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"},
    @{Name="libpython3.12-minimal"; Pattern="libpython3\.12-minimal_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb"}
)

foreach ($pkg in $python312Packages) {
    Download-LatestPackage -BaseUrl $python312BaseUrl -Pattern $pkg.Pattern -PackageName $pkg.Name
}

# python3-pip
Download-File -Url "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/p/python-pip/python3-pip_23.0.1+dfsg-1_all.deb" -Output "$systemDir\python3-pip_23.0.1+dfsg-1_all.deb"

# GCC packages (versions look like 11.4.0-1ubuntu1~22.04.3; match only the ~22.04 builds,
# because the pool directory also holds builds for other Ubuntu releases)
$gccBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/gcc-11/"
Download-LatestPackage -BaseUrl $gccBaseUrl -Pattern "gcc-11_\d+\.\d+\.\d+-\d+ubuntu\d+~22\.04(\.\d+)?_amd64\.deb" -PackageName "gcc-11"
Download-LatestPackage -BaseUrl $gccBaseUrl -Pattern "g\+\+-11_\d+\.\d+\.\d+-\d+ubuntu\d+~22\.04(\.\d+)?_amd64\.deb" -PackageName "g++-11"

# build-essential
Download-File -Url "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/b/build-essential/build-essential_12.9ubuntu3_amd64.deb" -Output "$systemDir\build-essential_12.9ubuntu3_amd64.deb"

# make
$makeBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/m/make/"
Download-LatestPackage -BaseUrl $makeBaseUrl -Pattern "make_(\d+\.\d+-\d+\.\d+build\d+)_amd64\.deb" -PackageName "make"

# dpkg-dev
$dpkgBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/d/dpkg-dev/"
Download-LatestPackage -BaseUrl $dpkgBaseUrl -Pattern "dpkg-dev_(\d+\.\d+\.\d+ubuntu\d+\.\d+)_all\.deb" -PackageName "dpkg-dev"

# git
$gitBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/g/git/"
Download-LatestPackage -BaseUrl $gitBaseUrl -Pattern "git_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb" -PackageName "git"

# curl
$curlBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/c/curl/"
Download-LatestPackage -BaseUrl $curlBaseUrl -Pattern "curl_(\d+\.\d+\.\d+-\d+ubuntu\d+\.\d+)_amd64\.deb" -PackageName "curl"

# wget
$wgetBaseUrl = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/pool/main/w/wget/"
Download-LatestPackage -BaseUrl $wgetBaseUrl -Pattern "wget_(\d+\.\d+\.\d+-\d+ubuntu\d+)_amd64\.deb" -PackageName "wget"

Write-Host "  System packages done" -ForegroundColor Green

# 3. Create requirements.txt and download Python packages
Write-Host ""
Write-Host "[3/5] Creating requirements.txt and downloading Python packages..." -ForegroundColor Yellow

$requirementsContent = @"
sglang>=0.2.0
torch==2.2.1
transformers==4.40.0
accelerate==0.28.0
sentencepiece==0.2.0
protobuf==4.25.3
tiktoken==0.6.0
modelscope==1.11.0
safetensors==0.4.3
tokenizers==0.15.2
flash-attn==2.5.6
"@

$requirementsPath = "$pythonDir\requirements.txt"
[System.IO.File]::WriteAllText($requirementsPath, $requirementsContent, [System.Text.Encoding]::UTF8)
Write-Host "  requirements.txt created" -ForegroundColor Green

# Download the supporting packages. Several of them (tokenizers, safetensors, sentencepiece,
# tiktoken) ship compiled extensions, so Linux wheels are requested for every package
Write-Host "  Downloading supporting Python packages (Linux wheels)..." -ForegroundColor Cyan
$linuxPackages = @(
    "transformers==4.40.0",
    "accelerate==0.28.0",
    "sentencepiece==0.2.0",
    "protobuf==4.25.3",
    "tiktoken==0.6.0",
    "modelscope==1.11.0",
    "safetensors==0.4.3",
    "tokenizers==0.15.2"
)

foreach ($pkg in $linuxPackages) {
    Write-Host "    Downloading: $pkg" -ForegroundColor Green
    & pip download $pkg -d $packagesDir --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: 2>&1 | Out-Null
}

# Download Linux C extension packages
Write-Host "  Downloading Linux C extension packages..." -ForegroundColor Cyan

Write-Host "    Downloading: torch==2.2.1 (Linux)" -ForegroundColor Green
& pip download torch==2.2.1 -d $packagesDir --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: 2>&1 | Out-Null

Write-Host "    Downloading: sglang (Linux)" -ForegroundColor Green
& pip download sglang -d $packagesDir --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: 2>&1 | Out-Null

Write-Host "    Downloading: flash-attn==2.5.6 (Linux)" -ForegroundColor Green
& pip download flash-attn==2.5.6 -d $packagesDir --platform manylinux2014_x86_64 --python-version 312 --only-binary=:all: 2>&1 | Out-Null

Write-Host "  Python packages done" -ForegroundColor Green

# 4. Download GLM model
Write-Host ""
Write-Host "[4/5] Downloading GLM-4-9B-Chat model..." -ForegroundColor Yellow
Write-Host "  This will download about 18GB, please wait..." -ForegroundColor Yellow

$modelDir = "$modelsDir\glm-4-9b-chat"
if (Test-Path $modelDir) {
    Write-Host "  Model directory exists, skipping" -ForegroundColor Gray
} else {
    $modelscopeCheck = pip show modelscope 2>&1
    if ($LASTEXITCODE -ne 0) {
        Write-Host "  Installing modelscope..." -ForegroundColor Yellow
        & pip install modelscope 2>&1 | Out-Null
    }
    
    Write-Host "  Starting model download..." -ForegroundColor Green
    & modelscope download --model ZhipuAI/glm-4-9b-chat --local_dir $modelDir
}

Write-Host "  Model download done" -ForegroundColor Green

# 5. Verify downloads
Write-Host ""
Write-Host "[5/5] Verifying downloads..." -ForegroundColor Yellow

Write-Host ""
Write-Host "System packages:" -ForegroundColor Cyan
$systemFiles = Get-ChildItem $systemDir -Filter "*.deb"
$totalSystemSize = 0
foreach ($file in $systemFiles) {
    $sizeMB = [math]::Round($file.Length / 1MB, 2)
    $totalSystemSize += $sizeMB
    Write-Host "  $($file.Name) - $sizeMB MB" -ForegroundColor White
}
Write-Host "  Total: $([math]::Round($totalSystemSize, 2)) MB" -ForegroundColor Green

Write-Host ""
Write-Host "Python packages:" -ForegroundColor Cyan
$pythonFiles = Get-ChildItem $packagesDir -Filter "*.whl"
$totalPythonSize = 0
foreach ($file in $pythonFiles) {
    $sizeMB = [math]::Round($file.Length / 1MB, 2)
    $totalPythonSize += $sizeMB
    Write-Host "  $($file.Name) - $sizeMB MB" -ForegroundColor White
}
Write-Host "  Total: $([math]::Round($totalPythonSize, 2)) MB" -ForegroundColor Green

Write-Host ""
Write-Host "Model files:" -ForegroundColor Cyan
if (Test-Path $modelDir) {
    $modelFiles = Get-ChildItem $modelDir
    $totalModelSize = 0
    foreach ($file in $modelFiles) {
        $sizeMB = [math]::Round($file.Length / 1MB, 2)
        $totalModelSize += $sizeMB
        Write-Host "  $($file.Name) - $sizeMB MB" -ForegroundColor White
    }
    Write-Host "  Total: $([math]::Round($totalModelSize, 2)) MB" -ForegroundColor Green
}

Write-Host ""
Write-Host "========================================" -ForegroundColor Green
Write-Host "All downloads completed!" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Green
Write-Host ""
Write-Host "Download directory: $baseDir" -ForegroundColor Yellow
Write-Host ""
Write-Host "Next steps:" -ForegroundColor Yellow
Write-Host "1. Package C:\offline_packages directory" -ForegroundColor White
Write-Host "2. Transfer to target Ubuntu server" -ForegroundColor White
Write-Host "3. Follow SGLang_GLM_Installation_Guide.md for offline installation" -ForegroundColor White

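2.2 Offline installation steps

2.2.1 Transfer the files to the target host

Transfer options:

USB stick or portable drive:
- Copy sglang_glm_offline_package.zip to the drive
- Plug it into the Ubuntu server and mount it
- Copy the file to the server

SCP (if the network is reachable):

# Run on the Ubuntu server (requires an SSH server on the Windows machine)
scp user@windows-machine:/path/to/sglang_glm_offline_package.zip /tmp/

SFTP tools (such as FileZilla or WinSCP):
- Upload from Windows to the Ubuntu server

Extract on Ubuntu, into the home directory that the later steps assume:

# ZIP format
cd ~
unzip /tmp/sglang_glm_offline_package.zip

# tar.gz format
cd ~
tar -xzf /tmp/sglang_glm_offline_package.tar.gz

# Enter the extracted directory
cd offline_packages

Note: differences between Windows and Linux file systems

Windows file names are case-insensitive; Linux file names are case-sensitive
Windows uses \r\n line endings; Linux uses \n

Script files may need their line endings converted after the transfer:

# Install dos2unix (needs network access or a pre-downloaded .deb; it is not part of the offline bundle above)
sudo apt-get install dos2unix

# Convert the line endings
dos2unix *.py *.sh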
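
On a host without network access, dos2unix may not be installable. Because the conversion is only a byte replacement, it can also be done with the Python interpreter that is already part of the bundle. The following is a minimal sketch with hypothetical function names; it rewrites a file only when it actually contains \r\n sequences.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows line endings (\\r\\n) with Unix line endings (\\n)."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Convert one file in place; return True if the file was changed."""
    path = Path(path)
    original = path.read_bytes()
    converted = crlf_to_lf(original)
    if converted == original:
        return False
    path.write_bytes(converted)
    return True
```

Calling `convert_file` on each path from `Path(".").glob("*.sh")` (and likewise "*.py") covers the same files as the dos2unix command. Working on bytes keeps any non-ASCII text in the files untouched.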

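2.2.2 Install the system dependencies

# Install the .deb packages
cd system
sudo dpkg -i *.deb
# Attempt to repair dependencies; without network access this can only succeed if every
# required .deb is already in this directory
sudo apt-get install -f

# Install the CUDA Toolkit (if needed)
sudo sh cuda_12.1.0_530.30.02_linux.run --toolkit --silent

# Configure the CUDA environment variables
echo 'export PATH=/usr/local/cuda-12.1/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc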

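2.2.3 Create the Python virtual environment

# The environment is created at ~/sglang_env, the path used by the startup scripts in section 4.
# The venv module needs the python3.12-venv package; add its .deb to the bundle if this step fails.
cd ~/offline_packages
python3.12 -m venv ~/sglang_env
source ~/sglang_env/bin/activate

2.2.4 Install the Python packages

cd python
pip install --no-index --find-links=./packages -r requirements.txt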
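
With `--no-index`, pip fails as soon as one requirement has no matching file in ./packages, and on an offline host the missing file cannot be fetched on the spot. A quick pre-check before leaving the networked machine can catch this. The sketch below compares the names in requirements.txt with the file names in the packages directory; it is an illustration with hypothetical helper names, it checks top-level requirements only (not transitive dependencies), and it ignores versions and platform tags.

```python
import re


def normalize(name):
    """Normalize a project name: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(text):
    """Extract project names from requirements.txt content, skipping options and comments."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:  # lines such as "--find-links ..." or blanks do not match
            names.append(normalize(match.group(0)))
    return names


def packaged_names(filenames):
    """Project names of the wheels (.whl) and source archives (.tar.gz) in a directory listing."""
    found = set()
    for filename in filenames:
        if filename.endswith(".whl"):
            found.add(normalize(filename.split("-")[0]))
        elif filename.endswith(".tar.gz"):
            found.add(normalize(filename[: -len(".tar.gz")].rsplit("-", 1)[0]))
    return found


def missing_requirements(requirements_text, filenames):
    """Return the required project names that have no file in the listing."""
    have = packaged_names(filenames)
    return [name for name in requirement_names(requirements_text) if name not in have]
```

The file listing can come from `os.listdir("packages")`. An empty result does not guarantee a successful install, but a non-empty one means the bundle is certainly incomplete.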

2.2.5 Deploy the model files

# Create the model directory
sudo mkdir -p /opt/models
sudo chown $USER:$USER /opt/models

# Move the model files
mv ~/offline_packages/models/glm-4-9b-chat /opt/models/

# Check that the model files are complete
ls -lh /opt/models/glm-4-9b-chat/

Expected model files:

config.json
configuration_chatglm.py
model-00001-of-0000x.safetensors  # several shard files
model.safetensors.index.json
tokenizer.model
tokenizer_config.json
tokenization_chatglm.py

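3. Scenario 2: installation in a networked environment

3.1 Prepare the system

3.1.1 Update the system

sudo apt-get update
sudo apt-get upgrade -y

3.1.2 Install the system dependencies

# Ubuntu 22.04's default repositories ship Python 3.10. If python3.12 is not found,
# first add a repository that provides it (for example the deadsnakes PPA):
# sudo add-apt-repository ppa:deadsnakes/ppa && sudo apt-get update
sudo apt-get install -y \
    build-essential \
    git \
    curl \
    wget \
    python3.12 \
    python3.12-venv \
    python3.12-dev \
    python3-pip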

3.1.3 Install CUDA Toolkit 12.1

# Download the CUDA repository pin file
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600

# Add the CUDA repository
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/

# Install CUDA
sudo apt-get update
sudo apt-get install -y cuda-12-1

# Configure the environment variables
echo 'export PATH=/usr/local/cuda-12.1/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Verify the installation
nvidia-smi
nvcc --version

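3.2 Install SGLang and its dependencies

3.2.1 Create the Python virtual environment

python3.12 -m venv ~/sglang_env
source ~/sglang_env/bin/activate

3.2.2 Upgrade pip and setuptools

pip install --upgrade pip setuptools wheel

3.2.3 Install PyTorch (CUDA 12.1 build)

pip install torch==2.2.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

3.2.4 Install SGLang

# SGLang declares its own torch requirement, so this step may replace torch 2.2.1 with
# another version; check the result with "pip list"
pip install "sglang[all]"

3.2.5 Install the other dependencies

# If pip reports conflicts with the versions SGLang installed, prefer the versions SGLang requires
pip install \
    transformers==4.40.0 \
    accelerate==0.28.0 \
    sentencepiece==0.2.0 \
    protobuf==4.25.3 \
    tiktoken==0.6.0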

3.3 Download the GLM-4 model

3.3.1 Download from ModelScope (recommended in mainland China)

# Install modelscope
pip install modelscope

# Create the model directory
sudo mkdir -p /opt/models
sudo chown $USER:$USER /opt/models

# Download GLM-4-9B-Chat
modelscope download \
    --model ZhipuAI/glm-4-9b-chat \
    --local_dir /opt/models/glm-4-9b-chat

# Or download ChatGLM3-6B (smaller; suitable for testing)
# modelscope download \
#     --model ZhipuAI/chatglm3-6b \
#     --local_dir /opt/models/chatglm3-6b

3.4 Verify the installation

# Verify PyTorch CUDA support
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}')"

# Verify the SGLang installation
python -c "import sglang; print(f'SGLang version: {sglang.__version__}')"

# Verify the model files
ls -lh /opt/models/glm-4-9b-chat/

4. Starting and verifying the service

4.1 Create a startup script

Create the service startup script:

cat > ~/start_sglang_server.sh << 'EOF'
#!/bin/bash

# Activate the virtual environment
source ~/sglang_env/bin/activate

# Set environment variables
export CUDA_VISIBLE_DEVICES=0

# Start the SGLang server
python -m sglang.launch_server \
    --model-path /opt/models/glm-4-9b-chat \
    --host 0.0.0.0 \
    --port 8000 \
    --tp 1 \
    --mem-fraction-static 0.9 \
    --trust-remote-code
EOF

chmod +x ~/start_sglang_server.sh

4.2 Start the service

4.2.1 Foreground start (for testing)

~/start_sglang_server.sh

4.2.2 Background start

nohup ~/start_sglang_server.sh > ~/sglang_server.log 2>&1 &

# Follow the log
tail -f ~/sglang_server.log

4.2.3 Manage the service with systemd (recommended for production)

Create the systemd unit file. The file is written through "sudo tee" because with "sudo cat > file" the redirection is performed by the unprivileged shell and fails with "Permission denied":

sudo tee /etc/systemd/system/sglang-glm.service > /dev/null << 'EOF'
[Unit]
Description=SGLang GLM-4-9B-Chat Service
After=network.target

[Service]
Type=simple
User=your_username
WorkingDirectory=/home/your_username
Environment="PATH=/home/your_username/sglang_env/bin:/usr/local/cuda-12.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Environment="CUDA_VISIBLE_DEVICES=0"
ExecStart=/home/your_username/sglang_env/bin/python -m sglang.launch_server \
    --model-path /opt/models/glm-4-9b-chat \
    --host 0.0.0.0 \
    --port 8000 \
    --tp 1 \
    --mem-fraction-static 0.9 \
    --trust-remote-code
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

# Replace your_username with the actual user name
sudo sed -i "s/your_username/$USER/g" /etc/systemd/system/sglang-glm.service

# Start the service
sudo systemctl daemon-reload
sudo systemctl start sglang-glm
sudo systemctl enable sglang-glm

# Check the service status
sudo systemctl status sglang-glm

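4.3 Verify the service

4.3.1 Check the service status

# Check that the port is listening
netstat -tlnp | grep 8000

# Or use ss
ss -tlnp | grep 8000

# Check the process
ps aux | grep sglang

4.3.2 Test the API

Test 1: list the models

curl http://localhost:8000/v1/models

Expected output: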

{
  "object": "list",
  "data": [
    {
      "id": "/opt/models/glm-4-9b-chat",
      "object": "model",
      "created": 1234567890,
      "owned_by": "sglang"
    }
  ]
}

Test 2: send a chat request

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/opt/models/glm-4-9b-chat",
    "messages": [
      {"role": "user", "content": "Hello, please introduce yourself."}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'

Expected output (the wording of the reply will vary):

{
  "id": "cmpl-xxxxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "/opt/models/glm-4-9b-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I am GLM-4, a large language model developed by Zhipu AI..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 50,
    "total_tokens": 65
  }
}

Test 3: use the Python client (requires the openai package: pip install openai)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy"
)

response = client.chat.completions.create(
    model="/opt/models/glm-4-9b-chat",
    messages=[
        {"role": "user", "content": "Write a quicksort implementation in Python"}
    ],
    max_tokens=500,
    temperature=0.7
)

print(response.choices[0].message.content)
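
On an offline host the openai package may not be available. Since the endpoint is a plain JSON-over-HTTP API, the same request can be sent with the standard library alone. The sketch below mirrors the curl call from Test 2; the function names are hypothetical, and the response is assumed to have the shape shown in the expected output above.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_text, max_tokens=100, temperature=0.7):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, user_text, timeout=60):
    """Send one chat message and return the assistant's reply text."""
    request = build_chat_request(base_url, model, user_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server from section 4.2 running, `chat("http://localhost:8000/v1", "/opt/models/glm-4-9b-chat", "hi")` should return a short reply.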

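4.3.3 Performance benchmark

# Install the HTTP client library used by the benchmark
pip install aiohttp

# Use the benchmark script bundled with SGLang
# (flag names as in recent SGLang releases; run with --help to confirm them for your version.
# The input and output lengths are example values, and some versions need a dataset download
# for the random workload)
python -m sglang.bench_serving \
    --backend sglang \
    --model /opt/models/glm-4-9b-chat \
    --host localhost \
    --port 8000 \
    --dataset-name random \
    --random-input-len 256 \
    --random-output-len 100 \
    --num-prompts 100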

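4.4 Success criteria

✅ The service counts as successfully started when:

The service process runs without crashing
Port 8000 is listening
The /v1/models endpoint returns the correct model information
The chat endpoint returns responses
GPU memory utilization is between 80% and 95%
Time to first token is below 2 seconds
Throughput is above 20 tokens per second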
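
The last two criteria are simple arithmetic on measured values. The sketch below shows the calculation; the function names are hypothetical and the thresholds are the ones listed above. Note that dividing `completion_tokens` by the total duration of a non-streaming request includes the prompt-processing time, so it slightly underestimates the pure generation speed, and that time to first token can only be measured with a streaming request.

```python
def tokens_per_second(completion_tokens, elapsed_seconds):
    """Generation throughput for one request: generated tokens / wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completion_tokens / elapsed_seconds


def meets_targets(first_token_seconds, tokens_per_sec,
                  max_first_token_seconds=2.0, min_tokens_per_sec=20.0):
    """Check the latency and throughput criteria from the list above."""
    return (first_token_seconds < max_first_token_seconds
            and tokens_per_sec > min_tokens_per_sec)
```

For instance, a response reporting 50 completion tokens that took 2.0 seconds corresponds to 25 tokens per second, which clears the 20 tokens per second bar.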

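5. Common problems and solutions

5.1 CUDA issues

Problem 1: CUDA out of memory

Error message:

RuntimeError: CUDA out of memory

Solution:

# Lower the static GPU memory fraction (launch flag)
--mem-fraction-static 0.8

# Reduce the maximum context length (launch flag)
--context-length 4096

# Use a quantized model
pip install auto-gptq
# then download a quantized variant of the model

Problem 2: CUDA version mismatch

Error message:

RuntimeError: CUDA version mismatch

Solution:

# Check the CUDA versions
nvidia-smi
nvcc --version

# Reinstall a PyTorch build that matches
pip uninstall torch
pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121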

5.2 Model loading issues

Problem 3: missing model files

Error message:

FileNotFoundError: Cannot find model files

Solution:

# Check that the model files are complete
ls -lh /opt/models/glm-4-9b-chat/

# Download the missing files again (with ModelScope)
pip install modelscope
modelscope download \
    --model ZhipuAI/glm-4-9b-chat \
    --local_dir /opt/models/glm-4-9b-chat

Problem 4: trust_remote_code error

Error message:

ValueError: The repository contains custom code

Solution:

# Add --trust-remote-code to the launch command
python -m sglang.launch_server \
    --model-path /opt/models/glm-4-9b-chat \
    --trust-remote-code

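5.3 Network issues

Problem 5: dependencies missing in the offline environment

Solution:

# Download all dependencies in a networked environment
pip download -r requirements.txt -d ./packages

# Install them in the offline environment
pip install --no-index --find-links=./packages -r requirements.txt

Problem 6: model download is slow or fails

Solution:

# Use ModelScope, which is hosted in mainland China
pip install modelscope

# Set the ModelScope cache directory (optional)
export MODELSCOPE_CACHE=/opt/models

# Download the model
modelscope download \
    --model ZhipuAI/glm-4-9b-chat \
    --local_dir /opt/models/glm-4-9b-chat

# Interrupted downloads can be resumed:
# ModelScope detects the files already downloaded and continues with the rest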

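5.4 Performance issues

Problem 7: slow inference

Solution:

# Install Flash Attention
pip install flash-attn --no-build-isolation

# Which attention kernels SGLang actually uses depends on its version and the selected
# attention backend, so check the server's startup log rather than assuming flash-attn is used

# Tune the number of concurrent requests (launch flag)
--max-running-requests 256

Problem 8: high time to first token

Solution:

# Warm up the model
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/opt/models/glm-4-9b-chat", "messages": [{"role": "user", "content": "hi"}], "max_tokens": 10}'

# Rely on RadixAttention (an SGLang-specific prefix-caching optimization)
# SGLang enables it by default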

6. Performance tuning

6.1 Hardware-specific settings

For the 8-core CPU / 32 GB RAM / 32 GB GPU configuration:

# Tuned launch parameters
python -m sglang.launch_server \
    --model-path /opt/models/glm-4-9b-chat \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 8000 \
    --tp 1 \
    --mem-fraction-static 0.9 \
    --max-running-requests 128
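
The value 0.9 for `--mem-fraction-static` can be sanity-checked with rough arithmetic. The sketch below assumes that this fraction bounds the model weights plus the KV cache pool, takes the approximately 18 GB weight size quoted in this guide as given, and treats GB and GiB loosely; the function names are hypothetical and the result is an estimate, not a measurement.

```python
def weights_gib(params_billion, bytes_per_param=2):
    """Approximate weight size in GiB; 2 bytes per parameter corresponds to FP16/BF16."""
    return params_billion * 1e9 * bytes_per_param / 1024**3


def kv_cache_budget_gib(gpu_gib, mem_fraction_static, model_gib):
    """Memory left for the KV cache: the static share of the GPU minus the weights."""
    return gpu_gib * mem_fraction_static - model_gib


# 32 GB GPU, fraction 0.9, about 18 GB of weights: roughly 10.8 GB for the KV cache
budget = kv_cache_budget_gib(32, 0.9, 18)
```

A negative result means the weights alone exceed the static share and the server cannot start with that setting; on a 32 GB GPU this happens for any fraction below about 0.56 with 18 GB of weights.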

6.2 System parameter tuning

# Raise the file descriptor limit
ulimit -n 65535

# Make it permanent
echo "* soft nofile 65535" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65535" | sudo tee -a /etc/security/limits.conf

# Tune kernel parameters
sudo sysctl -w net.core.somaxconn=65535
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65535

6.3 Monitoring and logs

# Watch GPU usage in real time
watch -n 1 nvidia-smi

# Follow the service log
tail -f ~/sglang_server.log

# Monitor with Prometheus (optional)
pip install prometheus-client
# SGLang serves metrics at http://localhost:8000/metrics when the server is started with --enable-metrics

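6.4 Load balancing (multi-instance deployment)

For higher throughput, several instances can be deployed. The example assumes a machine with two GPUs: with about 18 GB of weights per instance, two copies of GLM-4-9B-Chat do not fit on the single 32 GB GPU described above.

# Instance 1 (GPU 0, port 8000)
CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server \
    --model-path /opt/models/glm-4-9b-chat \
    --trust-remote-code \
    --port 8000 \
    --mem-fraction-static 0.9

# Instance 2 (GPU 1, port 8001)
CUDA_VISIBLE_DEVICES=1 python -m sglang.launch_server \
    --model-path /opt/models/glm-4-9b-chat \
    --trust-remote-code \
    --port 8001 \
    --mem-fraction-static 0.9

# Load-balance with Nginx
sudo apt-get install nginx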
sudo tee /etc/nginx/sites-available/sglang << 'EOF'
upstream sglang_backend {
    server localhost:8000;
    server localhost:8001;
}

server {
    listen 80;
    location / {
        proxy_pass http://sglang_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF

sudo ln -s /etc/nginx/sites-available/sglang /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx

7. Appendix

7.1 Complete requirements.txt

sglang>=0.2.0
torch==2.2.1
transformers==4.40.0
accelerate==0.28.0
sentencepiece==0.2.0
protobuf==4.25.3
tiktoken==0.6.0
modelscope==1.11.0
safetensors==0.4.3
tokenizers==0.15.2
flash-attn==2.5.6

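7.2 Quick command reference

Quick install in a networked environment:

# One-line install
pip install "sglang[all]"

Quick verification in an offline environment:

# Verify all components
python << 'EOF'
import torch
import sglang
import transformers

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"SGLang: {sglang.__version__}")
print(f"Transformers: {transformers.__version__}")
print(f"GPU device: {torch.cuda.get_device_name(0)}")
print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
EOF

Notes:

This guide was written against the latest SGLang release at the time of writing; for other versions, consult the official documentation
GLM-4-9B needs about 18 GB of GPU memory for its weights, so a 32 GB GPU leaves room for fairly large batch sizes
In production, manage the service with systemd so that it restarts automatically
Check GPU temperature and memory usage regularly to avoid overheating and out-of-memory errors
Test in a non-production environment first and deploy to production only after everything works
SGLang is often reported to deliver higher throughput and lower latency than vLLM under high concurrency; the difference depends on model, hardware and workload, so benchmark both for your own case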
