Horizontally Scaling a Model Service from a Single Machine to Kubernetes: Highly Available Deployment of a Sales Forecasting Service on K8s

This article is the follow-up to "Automated Deployment and Monitoring of a Sales Forecasting Model with Docker + GitHub Actions". It covers upgrading a single-machine Docker deployment to a Kubernetes cluster deployment with autoscaling and high availability, and includes complete Kubernetes manifests, a Helm Chart template, and production-grade best practices.

I. Why Kubernetes?

The evolution path from Docker Compose to Kubernetes

In the previous article we used Docker Compose to run the API + Prometheus + Grafana stack on a single machine. As the business grows, however, you will hit the following bottlenecks:

| Concern | Docker Compose | Kubernetes |
| --- | --- | --- |
| Scaling | Edit the replica count by hand | Automatic scaling via HorizontalPodAutoscaler |
| Rolling updates | Manual docker-compose down/up | Zero-downtime Rolling Update |
| Load balancing | Relies on an external Nginx | Built-in Service load balancing |
| Failure recovery | Needs monitoring scripts | Automatic restart and rescheduling |
| Multi-host deployment | Needs Docker Swarm | Native multi-node clusters |
| Configuration management | Environment variables scattered around | Centralized ConfigMap + Secret |

Kubernetes (K8s for short) is the container orchestration platform open-sourced by Google. It automates deployment, scaling, load balancing, and failure recovery, giving your service enterprise-grade stability and elasticity.

II. A Quick Tour of Core Kubernetes Concepts

Before diving into practice, let's go over a few core concepts (a minimal Pod manifest follows the list):

  • Pod: the smallest schedulable unit in K8s; a Pod contains one or more containers (usually one)
  • Deployment: an abstraction that manages Pod replica counts, rolling updates, and rollbacks
  • Service: a stable access endpoint for a set of Pods, with load balancing
  • Ingress: manages external HTTP/HTTPS access
  • ConfigMap/Secret: stores configuration and sensitive data
  • HorizontalPodAutoscaler (HPA): automatically adjusts the Pod replica count based on CPU/memory
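
To make these concepts concrete, here is a minimal sketch of a bare Pod manifest (the image name is the placeholder used throughout this article). In practice you almost never create Pods directly; you let a Deployment manage them, as in the manifests later on:

# pod-minimal.yaml: illustrative only; normally a Deployment creates Pods for you
apiVersion: v1
kind: Pod
metadata:
  name: retail-forecast-demo
spec:
  containers:
    - name: api
      image: ghcr.io/yourusername/retail-forecast:latest
      ports:
        - containerPort: 5000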

III. Step 1: Refactor the Application for Cloud-Native Operation

To run well on K8s, the application needs the following changes:

1. Graceful shutdown support

# app.py: add signal handling
import signal
import sys

def graceful_shutdown(signum, frame):
    print("收到终止信号,开始优雅关闭...")
    # 清理资源,如关闭数据库连接
    sys.exit(0)

signal.signal(signal.SIGTERM, graceful_shutdown)
signal.signal(signal.SIGINT, graceful_shutdown)

2. Configurable health-check endpoints

import os

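# `app` (the Flask instance), `jsonify`, and `model` come from the app.py built in the previous article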
liveness_path = os.environ.get('HEALTH_CHECK_PATH', '/health')
readiness_path = os.environ.get('READINESS_CHECK_PATH', '/ready')

@app.route(liveness_path, methods=['GET'])
def liveness():
    """存活探针:应用是否存活"""
    return jsonify({'status': 'ok'})

@app.route(readiness_path, methods=['GET'])
def readiness():
    """就绪探针:应用是否可以接收流量"""
    if model is None:
        return jsonify({'status': 'not ready', 'reason': 'model not loaded'}), 503
    return jsonify({'status': 'ready'})
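
A quick local sanity check of both endpoints (assuming the app listens on port 5000, as in the previous article):

# Start the app locally, then:
curl -s localhost:5000/health   # expect {"status": "ok"}
curl -s localhost:5000/ready    # returns 503 until the model has loaded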

IV. Step 2: Write the Kubernetes Manifests

Directory layout

k8s/
├── namespace.yaml      # Namespace
├── configmap.yaml      # Configuration
├── deployment.yaml     # Deployment
├── service.yaml        # Service
├── ingress.yaml        # Ingress (optional)
├── hpa.yaml            # Autoscaling
└── secret.yaml         # Secrets (use an external secret store in production)

1. Create the namespace (namespace.yaml)

apiVersion: v1
kind: Namespace
metadata:
  name: retail-forecast
  labels:
    app: retail-forecast
    environment: production

2. Configuration management (configmap.yaml)

apiVersion: v1
kind: ConfigMap
metadata:
  name: retail-forecast-config
  namespace: retail-forecast
data:
  HEALTH_CHECK_PATH: "/health"
  READINESS_CHECK_PATH: "/ready"
  LOG_LEVEL: "info"
  MODEL_CACHE_TTL: "3600"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: retail-forecast
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'retail-forecast-api'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true

3. The Deployment (deployment.yaml)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: retail-forecast-api
  namespace: retail-forecast
  labels:
    app: retail-forecast
    tier: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: retail-forecast
  # Rolling update strategy
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # at most 1 Pod above the desired replica count
      maxUnavailable: 0     # keep zero Pods unavailable during the rollout
  
  # Pod template
  template:
    metadata:
      labels:
        app: retail-forecast
        tier: api
      annotations:
        prometheus.io/scrape: "true"  # enables Prometheus auto-discovery
        prometheus.io/port: "5000"
        prometheus.io/path: "/metrics"
    spec:
      # Graceful termination window
      terminationGracePeriodSeconds: 30
      
      containers:
        - name: api
          image: ghcr.io/yourusername/retail-forecast:latest
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 5000
              protocol: TCP
          
          # Resource requests and limits (prevent one Pod from exhausting cluster resources)
          resources:
            requests:
              cpu: "100m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "1Gi"
          
          # Environment variables
          envFrom:
            - configMapRef:
                name: retail-forecast-config
          
          # Health checks
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 10
            periodSeconds: 15
            timeoutSeconds: 5
            failureThreshold: 3
          
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
          
          # Startup probe (waits for the model to load on cold start)
          startupProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 0
            periodSeconds: 5
            failureThreshold: 30  # wait at most 30 * 5 = 150 seconds for startup
          
          # Lifecycle hooks
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]  # 等待kube-proxy更新

4. The Service (service.yaml)

apiVersion: v1
kind: Service
metadata:
  name: retail-forecast-api
  namespace: retail-forecast
  labels:
    app: retail-forecast
spec:
  type: ClusterIP  # cluster-internal access only
  ports:
    - name: http
      port: 80          # Service port
      targetPort: 5000  # Pod port
      protocol: TCP
  selector:
    app: retail-forecast
---
# NodePort Service (for development and testing)
apiVersion: v1
kind: Service
metadata:
  name: retail-forecast-api-nodeport
  namespace: retail-forecast
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 5000
      nodePort: 30080  # fixed NodePort
  selector:
    app: retail-forecast
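
To reach the ClusterIP Service from your workstation without exposing a NodePort, kubectl port-forward works well (a sketch; adjust names to your cluster):

# Terminal 1: forward local port 8080 to the Service
kubectl port-forward svc/retail-forecast-api 8080:80 -n retail-forecast

# Terminal 2: hit the liveness endpoint through the tunnel
curl -s localhost:8080/health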

5. The Ingress (ingress.yaml)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: retail-forecast-ingress
  namespace: retail-forecast
  annotations:
    # Nginx Ingress Controller settings
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/limit-rps: "100"  # rate limit: 100 requests per second
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.forecast.example.com
      secretName: forecast-tls-secret
  rules:
    - host: api.forecast.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: retail-forecast-api
                port:
                  number: 80
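
Before DNS for api.forecast.example.com exists, you can still verify routing end to end with curl's --resolve flag; the IP below is a placeholder for your ingress controller's external address:

# Test TLS termination and routing without real DNS (203.0.113.10 is a placeholder)
curl -sk --resolve api.forecast.example.com:443:203.0.113.10 \
  https://api.forecast.example.com/health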

6. Autoscaling (hpa.yaml)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: retail-forecast-hpa
  namespace: retail-forecast
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: retail-forecast-api
  # Replica count range
  minReplicas: 2
  maxReplicas: 10
  # Scaling metrics
  metrics:
    # CPU utilization (target 70%)
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Memory utilization (target 80%)
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  # Stabilization windows (avoid scaling flapping)
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0    # scale up immediately
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
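
To watch the HPA react, generate sustained load and observe the replica count. The sketch below uses hey as the load generator and the NodePort Service from earlier; the /predict path and payload are assumptions carried over from the previous article's API:

# Terminal 1: watch the HPA scale
kubectl get hpa retail-forecast-hpa -n retail-forecast -w

# Terminal 2: 2 minutes of load at 50 concurrent connections (hey: https://github.com/rakyll/hey)
hey -z 2m -c 50 -m POST -T "application/json" \
  -d '{"store_id": 1, "date": "2024-06-01"}' \
  http://<node-ip>:30080/predict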

V. Step 3: Simplify Deployment with Helm

Helm is the package manager for Kubernetes, analogous to apt/yum; it lets you deploy a complex application with one command.

1. Create the Helm Chart scaffold

helm create retail-forecast

The generated directory layout:

retail-forecast/
├── Chart.yaml
├── values.yaml
├── templates/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── hpa.yaml
│   └── _helpers.tpl
└── .helmignore

2. Tune values.yaml

# values.yaml

replicaCount: 3

image:
  repository: ghcr.io/yourusername/retail-forecast
  pullPolicy: IfNotPresent
  tag: "latest"

imagePullSecrets: []
# - name: ghcr-secret

nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "5000"
  prometheus.io/path: "/metrics"

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  hosts:
    - host: api.forecast.example.com
      paths:
        - path: /
          pathType: Prefix
          service:
            port: 80
  tls:
    - secretName: forecast-tls-secret
      hosts:
        - api.forecast.example.com

resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

nodeSelector: {}

tolerations: []

affinity: {}

# Monitoring components
monitoring:
  enabled: true
  prometheus:
    enabled: true
    namespace: monitoring
  grafana:
    enabled: true
    namespace: monitoring

3. One-command deployment

# Add the monitoring chart repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Deploy the application
helm install retail-forecast ./retail-forecast -n retail-forecast --create-namespace

# Deploy monitoring (optional)
helm install prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace

# Upgrade (e.g. after a configuration change)
helm upgrade retail-forecast ./retail-forecast -n retail-forecast

# Roll back
helm rollback retail-forecast -n retail-forecast
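
Note that helm rollback without an explicit revision reverts to the immediately previous release. To pick a specific revision, list the release history first:

# List revisions, then roll back to a chosen one (revision 2 here is just an example)
helm history retail-forecast -n retail-forecast
helm rollback retail-forecast 2 -n retail-forecast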

VI. Step 4: Deploy to K8s from GitHub Actions

The K8s deployment workflow (deploy-k8s.yml)

# .github/workflows/deploy-k8s.yml
name: Deploy to Kubernetes

on:
  push:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,format=long
            type=raw,value=latest

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}

      - name: Setup Kubeconfig
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Deploy to Kubernetes
        run: |
          # Substitute the image tag with sed.
          # The metadata action's tags output is newline-separated and may list
          # "latest" first, so filter it out to get the sha-based tag.
          TAG=$(echo "${{ steps.meta.outputs.tags }}" | grep -v ':latest$' | head -n1 | cut -d':' -f2)
          sed -i "s|image: .*|image: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${TAG}|" k8s/deployment.yaml
          
          # Apply the manifests
          kubectl apply -f k8s/namespace.yaml
          kubectl apply -f k8s/configmap.yaml
          kubectl apply -f k8s/deployment.yaml
          kubectl apply -f k8s/service.yaml
          kubectl apply -f k8s/ingress.yaml
          kubectl apply -f k8s/hpa.yaml
          
          # Wait for the rollout to complete
          kubectl rollout status deployment/retail-forecast-api -n retail-forecast
          kubectl get pods -n retail-forecast
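
An alternative to patching the manifest with sed is kubectl set image, which updates the live Deployment without mutating files in the checkout (a sketch using the same variables as the step above; the container name api matches the Deployment spec):

# Inside the same run step, instead of the sed + kubectl apply of deployment.yaml:
kubectl set image deployment/retail-forecast-api \
  api=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${TAG} \
  -n retail-forecast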

VII. Step 5: Production-Grade Monitoring and Alerting

PrometheusRule configuration (this CRD requires the Prometheus Operator, which the kube-prometheus-stack chart above installs)

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: retail-forecast-alerts
  namespace: retail-forecast
spec:
  groups:
    - name: retail-forecast
      rules:
        # Pod CPU usage too high
        - alert: HighCPUUsage
          expr: |
            sum(rate(container_cpu_usage_seconds_total{
              pod=~"retail-forecast.*"}[5m])) by (pod)
            / on(pod) group_left()
            kube_pod_container_resource_limits{resource="cpu"}
            > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod CPU使用率超过80%"
            description: "Pod {{ $labels.pod }} CPU使用率过高"

        # Pod restarting too often
        - alert: PodRestartingTooMuch
          expr: |
            sum(kube_pod_container_status_restarts_total{
              pod=~"retail-forecast.*"}) by (pod) > 3
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod重启次数过多"
            description: "Pod {{ $labels.pod }} 在5分钟内重启超过3次"

        # HPA pegged at max replicas (metric names below are from kube-state-metrics v1;
        # v2 renamed them to kube_horizontalpodautoscaler_*)
        - alert: HPAAtMaximumReplicas
          expr: |
            kube_hpa_status_current_replicas{
              name="retail-forecast-hpa"} 
            == kube_hpa_spec_max_replicas{
              name="retail-forecast-hpa"}
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "HPA达到最大副本数"
            description: "HPA已达最大副本数{{ $value }},可能需要扩容"

        # Prediction latency too high
        - alert: HighPredictionLatency
          expr: |
            histogram_quantile(0.99, 
              rate(model_prediction_duration_seconds_bucket[5m])) > 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "预测延迟过高"
            description: "P99延迟超过1秒,当前值: {{ $value }}s"

VIII. FAQ and Pitfalls

Q1: Pod stuck in Pending?

Check whether a PVC (PersistentVolumeClaim) can be bound and whether the nodes have enough free resources: kubectl describe pod <pod-name> -n retail-forecast

Q2: Service unavailable during rolling updates?

Make sure the readinessProbe is configured correctly and that the preStop hook sleeps long enough for kube-proxy to update its endpoints.

Q3: HPA not taking effect?

Make sure metrics-server is installed and running: kubectl top pods -n retail-forecast
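
If metrics-server is missing, the upstream release manifest installs it:

# Install metrics-server from the official release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml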

Q4: How do I view Pod logs?

kubectl logs -f deployment/retail-forecast-api -n retail-forecast
kubectl logs -f <pod-name> -n retail-forecast --previous  # logs from before the last restart

IX. Comparison: Docker Compose vs Kubernetes

| Dimension | Docker Compose | Kubernetes |
| --- | --- | --- |
| Deployment complexity | Simple | Moderate (requires a cluster) |
| Scaling speed | Manual, minutes | Automatic, seconds |
| Failure recovery | Depends on health-check scripts | Built in |
| Max concurrency | Bounded by a single machine | Scales with cluster size |
| Cost | Low (one server) | Medium to high (multi-node cluster) |
| Best fit | Development, small-scale production | Medium to large-scale production |

X. Full Code Repository

Repository: https://github.com/yourusername/retail-forecast-k8s

What's included:

  • Complete Kubernetes manifests
  • Helm Chart templates
  • GitHub Actions K8s deployment pipeline
  • Prometheus alerting rules
  • Full Grafana dashboard JSON

Coming next: "Advanced Model Monitoring: Data Drift Detection and Automated Retraining", teaching the model to "sense" its own performance degradation.

If you run into problems while deploying, feel free to message me!