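ELK Stack Deployment, Optimization, and Security Hardening Guide

1. ELK Stack Overview

The ELK Stack combines three open-source tools, Elasticsearch, Logstash, and Kibana, to collect, store, analyze, and visualize logs.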

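2. System Deployment

2.1 Environment Preparation

System requirements
# Operating system: CentOS/RHEL 7+ or Ubuntu 20.04+
# Hardware:
# - Memory: at least 8GB (16GB+ recommended for production)
# - CPU: 4 cores or more
# - Storage: sized to the log volume; SSD recommended

# Inspect the system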
cat /etc/os-release
free -h
lscpu
df -h
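
The minimums listed above can be turned into a quick preflight check. The following is a minimal sketch, not an official tool: the function name `check_minimums` and the 7680 MB threshold (an allowance for hosts that report slightly less than a full 8GB) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper; the thresholds mirror the minimums listed in this guide.

# Usage: check_minimums <total_memory_mb> <cpu_cores>
# Prints one line per failed check and returns non-zero if any check fails.
check_minimums() {
    local mem_mb="$1" cores="$2"
    local min_mem_mb=7680   # 8 GB hosts report a little under 8192 MB, so allow 7.5 GB
    local min_cores=4
    local failed=0

    if (( mem_mb < min_mem_mb )); then
        echo "memory too low: ${mem_mb} MB (need about 8 GB)"
        failed=1
    fi
    if (( cores < min_cores )); then
        echo "too few CPU cores: ${cores} (need ${min_cores})"
        failed=1
    fi
    return "$failed"
}

# On a Linux host the inputs can be read like this:
#   mem_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
#   check_minimums "$mem_mb" "$(nproc)" && echo "host meets the minimums"
```

Keeping the check as a pure function of two numbers makes it easy to reuse in provisioning scripts where the values come from an inventory rather than the local host.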
Firewall configuration
# CentOS/RHEL
sudo systemctl start firewalld
sudo systemctl enable firewalld
sudo firewall-cmd --permanent --add-port={9200,5601,5044,9600}/tcp
sudo firewall-cmd --reload

# Ubuntu
sudo ufw allow 9200/tcp
sudo ufw allow 5601/tcp
sudo ufw allow 5044/tcp
sudo ufw allow 9600/tcp
sudo ufw enable
System tuning
# 1. Disable swap
sudo swapoff -a
# Disable it permanently by commenting out the swap entries (a backup is written to /etc/fstab.bak)
sudo sed -i.bak -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' /etc/fstab

# 2. Raise the file descriptor limits
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "elasticsearch soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "elasticsearch hard nofile 65536" | sudo tee -a /etc/security/limits.conf

# 3. Raise the virtual memory map limit
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# 4. Raise the thread limit
echo "* soft nproc 4096" | sudo tee -a /etc/security/limits.conf
echo "elasticsearch soft nproc 4096" | sudo tee -a /etc/security/limits.conf

# 5. Create a dedicated user (only needed for archive installs; the RPM/DEB packages create it)
sudo groupadd elasticsearch
sudo useradd -g elasticsearch -m elasticsearch

2.2 Elasticsearch Deployment

2.2.1 Installing Elasticsearch
# CentOS/RHEL
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

cat <<EOF | sudo tee /etc/yum.repos.d/elasticsearch.repo
[elasticsearch-8.x]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF

sudo yum install -y elasticsearch

# Ubuntu/Debian (apt-key is deprecated, so the key goes into a dedicated keyring)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
sudo apt-get install apt-transport-https
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt-get update
sudo apt-get install elasticsearch
2.2.2 Configuring Elasticsearch
# /etc/elasticsearch/elasticsearch.yml

# Cluster settings
cluster.name: production-cluster
node.name: ${HOSTNAME}

# Network settings
network.host: [_local_, _site_]
http.port: 9200

# Discovery (single node)
discovery.type: single-node

# Example for a multi-node production cluster (remove discovery.type above when using these)
# cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
# discovery.seed_hosts: ["host1", "host2", "host3"]

# Data and log paths
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

# Lock process memory so the heap is never swapped out
# (under systemd this also needs LimitMEMLOCK=infinity in a unit override)
bootstrap.memory_lock: true

# JVM heap size: at most 50% of physical memory and below 32GB
# Configured in /etc/elasticsearch/jvm.options

# Enable security features (on by default in 8.x)
xpack.security.enabled: true
xpack.security.enrollment.enabled: true

# Encrypt HTTP and transport traffic
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12

xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12

# Node roles (adjust to your topology)
node.roles: [master, data, ingest]
2.2.3 JVM Configuration
# /etc/elasticsearch/jvm.options
# Size the heap to the server: at most 50% of physical memory and below 32GB

-Xms4g
-Xmx4g

# Garbage collector settings
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
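
The heap rule of thumb above (half of physical memory, kept under 32GB) can be computed rather than guessed. This is a small sketch under stated assumptions: the function name `recommended_heap_gb` is hypothetical, and the 31GB cap is a conservative choice to stay under the 32GB limit, not a value taken from this guide.

```shell
#!/usr/bin/env bash
# Sketch: derive a heap size from total RAM using the rule of thumb in this section.
# The 31 GB cap is an assumption chosen to stay safely under the 32 GB limit.

# Usage: recommended_heap_gb <total_memory_mb>
recommended_heap_gb() {
    local total_mb="$1"
    local heap_gb=$(( total_mb / 2 / 1024 ))   # half of physical memory, in whole GB
    if (( heap_gb > 31 )); then heap_gb=31; fi
    if (( heap_gb < 1 )); then heap_gb=1; fi
    echo "$heap_gb"
}

# Example: print matching -Xms/-Xmx lines for a 16 GB host
heap=$(recommended_heap_gb 16384)
printf -- '-Xms%sg\n-Xmx%sg\n' "$heap" "$heap"
```

The two printed lines can be written to a file such as /etc/elasticsearch/jvm.options.d/heap.options, which keeps local overrides separate from the packaged jvm.options file. Setting -Xms and -Xmx to the same value avoids heap resizing at runtime.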
2.2.4 Starting Elasticsearch
# Set directory ownership
sudo chown -R elasticsearch:elasticsearch /etc/elasticsearch
sudo chown -R elasticsearch:elasticsearch /var/lib/elasticsearch
sudo chown -R elasticsearch:elasticsearch /var/log/elasticsearch

# Start the service
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

# Check the status
sudo systemctl status elasticsearch
sudo journalctl -u elasticsearch -f

# Verify the installation (HTTPS is enabled above; curl prompts for the elastic password)
sudo curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic "https://localhost:9200/"

2.3 Logstash Deployment

2.3.1 Installing Logstash
# CentOS/RHEL
sudo yum install -y logstash

# Ubuntu
sudo apt-get install logstash
2.3.2 Configuring Logstash
# Create the configuration directory
sudo mkdir -p /etc/logstash/conf.d
sudo chown -R logstash:logstash /etc/logstash

# Example pipeline: syslog collection
# /etc/logstash/conf.d/syslog.conf
cat <<'EOF' | sudo tee /etc/logstash/conf.d/syslog.conf
input {
  beats {
    port => 5044
    ssl => false
  }
  
  tcp {
    port => 5000
    type => syslog
  }
  
  udp {
    port => 5000
    type => syslog
  }
}

filter {
  # Apply type-specific parsing
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
  
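  # Drop housekeeping fields that are not needed downstream
  # (note: removing "tags" also hides failure tags such as _grokparsefailure)
  mutate {
    remove_field => ["@version", "tags"]
  }
}

output {
  # Send events to Elasticsearch
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
    user => "logstash_user"
    password => "${LOGSTASH_PASSWORD}"
    ssl => true
    # Verify the server certificate against the cluster CA rather than disabling verification.
    # Assumption: the CA has been copied to this path. Newer plugin versions rename
    # ssl/cacert to ssl_enabled/ssl_certificate_authorities.
    cacert => "/etc/logstash/certs/http_ca.crt"
  }
  
  # Debug output (remove in production)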
  stdout {
    codec => rubydebug
  }
}
EOF

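# Add JMX monitoring options for Logstash
# Warning: these options expose JMX without authentication or TLS; restrict port 9010 to trusted hosts
# /etc/logstash/jvm.options
cat <<'EOF' | sudo tee -a /etc/logstash/jvm.options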
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
EOF
2.3.3 Pipeline Configuration
# /etc/logstash/pipelines.yml
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
  pipeline.workers: 2
  pipeline.batch.size: 125
  pipeline.batch.delay: 50
2.3.4 Starting Logstash
# Provide the password as an environment variable and keep the file private
sudo tee /etc/sysconfig/logstash <<EOF
LOGSTASH_PASSWORD=your_secure_password
EOF
sudo chmod 600 /etc/sysconfig/logstash

# Set ownership
sudo chown -R logstash:logstash /etc/logstash
sudo chown -R logstash:logstash /var/lib/logstash

# Start the service
sudo systemctl enable logstash
sudo systemctl start logstash

# Check the status
sudo systemctl status logstash
sudo tail -f /var/log/logstash/logstash-plain.log

2.4 Kibana Deployment

2.4.1 Installing Kibana
# CentOS/RHEL
sudo yum install -y kibana

# Ubuntu
sudo apt-get install kibana
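2.4.2 Configuring Kibana
# /etc/kibana/kibana.yml

# Server settings
server.port: 5601
server.host: "0.0.0.0"
server.name: "kibana-server"

# Elasticsearch connection
elasticsearch.hosts: ["https://localhost:9200"]
elasticsearch.username: "kibana_system"
elasticsearch.password: "${KIBANA_PASSWORD}"

# TLS settings for the Elasticsearch connection
elasticsearch.ssl.certificateAuthorities: [ "/etc/kibana/certs/ca.crt" ]
elasticsearch.ssl.verificationMode: certificate

# Public URL and session management
server.publicBaseUrl: "https://your-domain.com:5601"
xpack.security.session.lifespan: 7d
xpack.security.session.idleTimeout: 1d

# Feature switches
# Kibana 8.x refuses to start when kibana.yml contains a setting it does not recognize, and several
# per-plugin "enabled" switches from 7.x were removed in 8.0. These features are on by default, so the
# keys are left commented out; confirm each one against your version before enabling it.
# xpack.fleet.enabled: true
# xpack.reporting.enabled: true
# xpack.canvas.enabled: true
# xpack.graph.enabled: true
# xpack.maps.enabled: true
# xpack.monitoring.enabled: true
# xpack.spaces.enabled: true

# Data path (the log file location is set by the file appender below)
path.data: /var/lib/kibana

# Logging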
logging:
  appenders:
    file:
      type: file
      fileName: /var/log/kibana/kibana.log
      layout:
        type: json
  root:
    appenders:
      - default
      - file

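# Performance
# The optimize.lazy* options come from the legacy optimizer and do not appear in the Kibana 8.x settings
# reference; they are left commented out because Kibana refuses to start on settings it does not recognize.
# optimize.lazy: true
# optimize.lazyPort: 5602
# optimize.lazyPrebuild: true
2.4.3 Starting Kibana
# Create the data directory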
sudo mkdir -p /var/lib/kibana
sudo chown -R kibana:kibana /etc/kibana
sudo chown -R kibana:kibana /var/lib/kibana
sudo chown -R kibana:kibana /var/log/kibana

# Start the service
sudo systemctl enable kibana
sudo systemctl start kibana

# Check the status
sudo systemctl status kibana
sudo tail -f /var/log/kibana/kibana.log

2.5 Filebeat Deployment (Optional)

# Install Filebeat
# CentOS
sudo yum install filebeat

# Ubuntu
sudo apt-get install filebeat

# Configure Filebeat
# /etc/filebeat/filebeat.yml
cat <<'EOF' | sudo tee /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log
    - /var/log/syslog
    - /var/log/messages
  
  fields:
    type: syslog
  fields_under_root: true

- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
  fields:
    type: nginx-access
  fields_under_root: true

- type: log
  enabled: true
  paths:
    - /var/log/nginx/error.log
  fields:
    type: nginx-error
  fields_under_root: true

# Processors
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

# Output to Logstash
output.logstash:
  hosts: ["localhost:5044"]

# Monitoring
monitoring.enabled: true
EOF

# Start Filebeat
sudo systemctl enable filebeat
sudo systemctl start filebeat

3. System Optimization

3.1 Elasticsearch Optimization

3.1.1 Index Optimization
// Create an index template (composable templates replace the legacy _template API in 8.x).
// The merge settings tune segment merging, and the index.lifecycle settings attach the ILM policy.
// total_shards_per_node: 2 with 3 primaries and 1 replica assumes at least three data nodes.
PUT _index_template/logstash_template
{
  "index_patterns": ["logstash-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "30s",
      "index.routing.allocation.total_shards_per_node": 2,
      "index.codec": "best_compression",
      "index.translog.durability": "async",
      "index.translog.sync_interval": "5s",
      "index.translog.flush_threshold_size": "512mb",
      "index.merge.scheduler.max_thread_count": 1,
      "index.merge.policy.max_merged_segment": "5gb",
      "index.merge.policy.segments_per_tier": 10,
      "index.lifecycle.name": "logstash_policy",
      "index.lifecycle.rollover_alias": "logstash"
    },
    "mappings": {
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "message": {
          "type": "text"
        }
      }
    }
  }
}

// ILM policy
PUT _ilm/policy/logstash_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "1d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "shrink": {
            "number_of_shards": 1
          },
          "allocate": {
            "number_of_replicas": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "require": {
              "data": "cold"
            }
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
3.1.2 Query Optimization
// Use an index alias
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "logstash-2023.10.01",
        "alias": "logstash-current"
      }
    }
  ]
}

// Configure the search slow log (the thresholds are index-level settings, so they go through the index settings API)
PUT /logstash-*/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.search.slowlog.threshold.fetch.info": "800ms"
}

// Query cache: index.queries.cache.enabled defaults to true and is a static setting,
// so it can only be set when the index is created or while it is closed
PUT /my-index
{
  "settings": {
    "index.queries.cache.enabled": true
  }
}
3.1.3 Cluster Optimization
// Cluster settings: shard balancing weights, concurrent recoveries, disk watermarks, and adaptive replica selection
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.balance.shard": 0.45,
    "cluster.routing.allocation.balance.index": 0.55,
    "cluster.routing.allocation.balance.threshold": 1.0,
    "cluster.routing.allocation.node_concurrent_recoveries": 2,
    "cluster.routing.allocation.node_initial_primaries_recoveries": 4,
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.routing.use_adaptive_replica_selection": true
  }
}
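
To make the three disk watermarks above concrete, the sketch below classifies a disk usage percentage against the same 85/90/95 thresholds. The function name `watermark_state` is hypothetical and the script only illustrates the thresholds; Elasticsearch applies them internally.

```shell
#!/usr/bin/env bash
# Sketch: map a disk usage percentage to the watermark it has crossed,
# using the 85% / 90% / 95% values from the cluster settings in this section.

# Usage: watermark_state <used_percent_integer>
watermark_state() {
    local used="$1"
    if (( used >= 95 )); then
        echo "flood_stage"   # indices with shards on the node get a read-only block
    elif (( used >= 90 )); then
        echo "high"          # Elasticsearch tries to move shards off the node
    elif (( used >= 85 )); then
        echo "low"           # no new shards are allocated to the node
    else
        echo "ok"
    fi
}

# Example: classify a few usage levels
for pct in 70 86 92 97; do
    echo "${pct}% -> $(watermark_state "$pct")"
done
```

A check like this can run against the disk.percent column of the _cat/allocation output to warn before the flood stage is reached.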

# Separating node roles
# Add one of the following to elasticsearch.yml
node.roles: [master, data_hot]  # hot data node
# or
node.roles: [data_warm]  # warm data node
# or
node.roles: [data_cold]  # cold data node
# or
node.roles: [master]     # dedicated master node
# or
node.roles: [ml]         # machine learning node

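3.2 Logstash Optimization

3.2.1 Pipeline Performance
# /etc/logstash/pipelines.yml
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
  pipeline.workers: 4                 # usually the number of CPU cores
  pipeline.batch.size: 125            # events per worker batch
  pipeline.batch.delay: 50            # wait before flushing an undersized batch (ms)
  pipeline.ordered: false             # whether to preserve event order
  
  # Queue settings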
  queue.type: persisted
  queue.max_bytes: 4gb
  queue.checkpoint.acks: 1024
  queue.checkpoint.writes: 1024
  queue.checkpoint.interval: 1000
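3.2.2 Filter Optimization
# Use conditionals to keep filter work to a minimum
filter {
  # Drop unwanted events as early as possible
  if [type] != "application" {
    drop {}
  }
  
  # Tag events that fail to parse, then skip further processing for them
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
    tag_on_failure => ["_grokparsefailure"]
  }
  
  if "_grokparsefailure" in [tags] {
    drop {}
  }
  
  # Load custom patterns from a directory (MY_PATTERN is a placeholder defined there)
  grok {
    patterns_dir => ["/etc/logstash/patterns"]
    match => { "message" => "%{MY_PATTERN}" }
  }
  
  # Prefer the date filter over custom ruby code
  date {
    match => [ "timestamp", "ISO8601" ]
    target => "@timestamp"
  }
}
3.2.3 Output Optimization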
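output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
    
    # Bulk sizing: flush_size and idle_flush_time are obsolete in current plugin versions;
    # the bulk request size follows pipeline.batch.size and pipeline.batch.delay
    
    # Retry policy (failed bulk requests are retried automatically;
    # max_retries and retry_on_failure are not options of this plugin)
    retry_initial_interval => 1
    retry_max_interval => 60
    retry_on_conflict => 3
    
    # HTTP connection pool
    pool_max => 1000
    pool_max_per_route => 100
    
    # Explicit document IDs: set this only when every event carries [@metadata][_id]
    # (for example from a fingerprint filter); otherwise events overwrite one another
    document_id => "%{[@metadata][_id]}"
  }
}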

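3.3 Kibana Optimization

3.3.1 Performance Settings
# /etc/kibana/kibana.yml

# Memory: the Node.js heap limit is not a kibana.yml key; add this line to /etc/kibana/node.options instead
# --max-old-space-size=4096

# Timeouts
elasticsearch.requestTimeout: 30000
elasticsearch.shardTimeout: 30000
elasticsearch.startupTimeout: 30000

# Session management
xpack.security.session.cleanupInterval: "10m"
xpack.security.session.concurrentSessions.maxSessions: 300

# Dashboard loading: the optimize.lazy* options come from the legacy optimizer and do not appear in the
# Kibana 8.x settings reference, so they are left commented out
# optimize:
#   lazy: true
#   lazyPort: 5602
#   lazyPrebuild: true

# Enable compression
server.compression.enabled: true
server.compression.referrerWhitelist: ["/"]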
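3.3.2 Search Optimization
// Tune search-related advanced settings.
// Writing to the .kibana index directly is not supported in 8.x. Change these values under
// Stack Management > Advanced Settings, or send them to Kibana's settings endpoint
// (a Kibana API on port 5601 that needs a kbn-xsrf header; the kbn: prefix works in recent Dev Tools Console versions).
POST kbn:/api/kibana/settings
{
  "changes": {
    "search:includeFrozen": false,
    "search:timeout": 60000,
    "search:queryLanguage": "kuery",
    "metrics:max_buckets": 10000
  }
}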

4. Security Hardening

4.1 Elasticsearch Security Hardening

4.1.1 Network-Layer Security
# /etc/elasticsearch/elasticsearch.yml

# Restrict network access
network.host: _site_
network.bind_host: ["192.168.1.100", "127.0.0.1"]
network.publish_host: 192.168.1.100

# HTTP interface
http.host: _site_
http.port: 9200
http.max_content_length: 100mb
http.max_header_size: 16kb
http.compression: true

# CORS: limit the origins, methods, and headers that browsers may use for cross-origin requests
# (this does not disable HTTP methods for other clients; use roles for that)
http.cors.enabled: true
http.cors.allow-origin: "https://kibana.example.com"
http.cors.allow-methods: "OPTIONS, HEAD, GET, POST"
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, Authorization"
http.cors.allow-credentials: true

# Enable audit logging
xpack.security.audit.enabled: true
xpack.security.audit.logfile.events.include: authentication_failed, access_denied, tampered_request, connection_denied
xpack.security.audit.logfile.events.exclude: authentication_success
4.1.2 TLS/SSL Configuration
# Generate the CA
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil ca

# Generate the node certificate
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert \
  --ca elastic-stack-ca.p12 \
  --name elasticsearch \
  --dns localhost,elasticsearch.example.com \
  --ip 192.168.1.100

# Move the certificates to a protected directory
sudo mkdir -p /etc/elasticsearch/certs
sudo cp elastic-stack-ca.p12 /etc/elasticsearch/certs/
sudo cp elasticsearch.p12 /etc/elasticsearch/certs/
sudo chown -R elasticsearch:elasticsearch /etc/elasticsearch/certs
sudo chmod 600 /etc/elasticsearch/certs/*.p12
4.1.3 User and Permission Management
# Set passwords for the built-in users
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u kibana_system
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u logstash_system

# Create a custom role (curl prompts for the elastic password)
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic \
  -X POST "https://localhost:9200/_security/role/logstash_writer" \
  -H 'Content-Type: application/json' \
  -d '{
    "cluster": ["monitor", "manage_index_templates"],
    "indices": [
      {
        "names": ["logstash-*"],
        "privileges": ["write", "create_index", "manage", "read"],
        "allow_restricted_indices": false
      }
    ]
  }'

# Create a user and assign the role (replace the example password)
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic \
  -X POST "https://localhost:9200/_security/user/logstash_user" \
  -H 'Content-Type: application/json' \
  -d '{
    "password": "SecurePassword123!",
    "roles": ["logstash_writer"],
    "full_name": "Logstash User",
    "email": "logstash@example.com",
    "enabled": true
  }'
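4.1.4 Security Policy Configuration
// Define the policy as a role (Elasticsearch has no _security/policy endpoint; roles live under _security/role)
PUT /_security/role/logstash_policy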
{
  "cluster": [
    "cluster:monitor/main",
    "cluster:admin/ingest/pipeline/get"
  ],
  "indices": [
    {
      "names": ["logstash-*"],
      "privileges": ["read", "write", "create_index", "delete", "manage"]
    }
  ],
  "applications": [
    {
      "application": "kibana-.kibana",
      "privileges": ["feature_canvas.all", "feature_discover.all"],
      "resources": ["*"]
    }
  ]
}

// API key management
POST /_security/api_key
{
  "name": "logstash-api-key",
  "role_descriptors": {
    "logstash_writer": {
      "cluster": ["monitor"],
      "indices": [
        {
          "names": ["logstash-*"],
          "privileges": ["write", "create_index"]
        }
      ]
    }
  },
  "expiration": "7d"
}
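
The create API key request above returns an `id` and an `api_key`. Clients send them in an `Authorization: ApiKey` header as the Base64 encoding of `id:api_key`. The sketch below shows that encoding; the function name `encode_api_key` and the sample values are illustrative only, and recent Elasticsearch versions also return a ready-made `encoded` field that can be used directly.

```shell
#!/usr/bin/env bash
# Sketch: build the credential string for an "Authorization: ApiKey ..." header.
# The id and key values used here are placeholders, not real credentials.

# Usage: encode_api_key <id> <api_key>
encode_api_key() {
    local id="$1" key="$2"
    # Base64-encode "id:api_key"; tr removes the line wrapping some base64 tools add
    printf '%s:%s' "$id" "$key" | base64 | tr -d '\n'
}

# Example with placeholder values
token=$(encode_api_key "abc" "xyz")
echo "Authorization: ApiKey ${token}"

# Typical use against the cluster configured in this guide (not executed here):
#   curl --cacert /etc/elasticsearch/certs/http_ca.crt \
#     -H "Authorization: ApiKey ${token}" "https://localhost:9200/_cluster/health"
```

Because the key in the example above expires after 7 days, whatever consumes it needs a rotation step; storing the encoded value in the Logstash keystore rather than in plain configuration keeps it out of version control.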

4.2 Logstash Security Hardening

4.2.1 Input Security
# Encrypt the Beats input with SSL/TLS
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate_authorities => ["/etc/logstash/certs/ca.crt"]
    ssl_certificate => "/etc/logstash/certs/logstash.crt"
    ssl_key => "/etc/logstash/certs/logstash.key"
    # Require a client certificate signed by the CA above
    ssl_verify_mode => "force_peer"
    
    # Close idle client connections after an hour
    client_inactivity_timeout => 3600
  }
  
  # TLS-protected TCP input
  tcp {
    port => 5000
    mode => "server"
    ssl_enable => true
    ssl_cert => "/etc/logstash/certs/logstash.crt"
    ssl_key => "/etc/logstash/certs/logstash.key"
    ssl_verify => false
    
    # Connection handling
    tcp_keep_alive => true
    
    # The host option only selects the bind address; restrict source IPs
    # (for example 192.168.1.0/24) with the firewall instead
    # host => "0.0.0.0"
  }
}
4.2.2 Data Masking
filter {
  # Remove sensitive fields
  mutate {
    remove_field => [
      "password",
      "credit_card",
      "ssn",
      "token",
      "authorization"
    ]
  }
  
  # Mask the email domain
  if [email] {
    mutate {
      gsub => [
        "email", "@.*", "@example.com"
      ]
    }
  }
  
  # Anonymize IPv4 addresses by zeroing the last octet
  if [client_ip] {
    mutate {
      gsub => [
        "client_ip", "\.\d+$", ".0"
      ]
    }
  }
  
  # Mask credit card numbers, keeping the last four digits
  if [credit_card_number] {
    mutate {
      gsub => [
        "credit_card_number", "\d{12}(\d{4})", "************\1"
      ]
    }
  }
}
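
Before deploying masking rules like the ones above, it helps to try the patterns on sample values. The sketch below reproduces the three substitutions with `sed -E` as an approximation: the function names are hypothetical, and because sed's extended regex syntax has no `\d`, the digit classes are written as `[0-9]`, so behavior can differ from the Ruby regex engine Logstash uses.

```shell
#!/usr/bin/env bash
# Sketch: approximate the mutate/gsub masking rules with sed for quick experiments.
# These helpers are illustrative and are not part of Logstash.

# Replace the email domain with example.com
mask_email_domain() {
    printf '%s\n' "$1" | sed -E 's/@.*/@example.com/'
}

# Zero the last octet of an IPv4 address
anonymize_ip() {
    printf '%s\n' "$1" | sed -E 's/\.[0-9]+$/.0/'
}

# Keep only the last four digits of a 16-digit card number
mask_card() {
    printf '%s\n' "$1" | sed -E 's/[0-9]{12}([0-9]{4})/************\1/'
}

# Example values
mask_email_domain "alice@corp.local"
anonymize_ip "192.168.1.57"
mask_card "4111111111111111"
```

Trying a few inputs this way shows the limits of the rules: a card number written with spaces or dashes does not match the 16-digit pattern, and IPv6 addresses pass through the IP rule unchanged.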
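4.2.3 Security Auditing
# Logstash has no separate audit log; protect the monitoring API with basic auth and emit JSON logs for review
# /etc/logstash/logstash.yml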
api.http.host: "127.0.0.1"
api.http.port: 9600
api.auth.type: basic
api.auth.basic.username: "admin"
api.auth.basic.password: "${LOGSTASH_API_PASSWORD}"

log.level: info
log.format: json

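4.3 Kibana Security Hardening

4.3.1 Access Control
# /etc/kibana/kibana.yml

# Security is controlled by Elasticsearch in 8.x (the Kibana-side xpack.security.enabled key was removed)
xpack.security.encryptionKey: "a random string of at least 32 characters"
xpack.security.session.idleTimeout: "1h"
xpack.security.session.lifespan: "7d"

# Authentication providers
xpack.security.authc.providers:
  basic.basic1:
    order: 0
    description: "Local user login"
  saml.saml1:
    order: 1
    realm: "saml1"
    description: "SAML login"

# Request protection (XSRF protection is on by default; the switch is server.xsrf.disableProtection)
server.xsrf.allowlist: ["/api/status"]
server.compression.enabled: true

# CORS settings
server.cors.enabled: true
server.cors.allowCredentials: true
server.cors.allowOrigin: ["https://your-domain.com"]
server.cors.allowHeaders: ["Authorization", "Content-Type", "kbn-xsrf"]

# Rate limiting
# The Kibana 8.x settings reference does not list an xpack.security.rateLimiters setting, so apply login
# rate limits at the reverse proxy. The intended limits are kept here for reference only.
# xpack.security.rateLimiters:
#   login:
#     enabled: true
#     ip: 10
#     username: 5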
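4.3.2 Spaces and Role Management
// Create a Kibana space (a Kibana API: send it to the Kibana host, or use the kbn: prefix in Dev Tools Console)
POST kbn:/api/spaces/space
{
  "id": "security",
  "name": "Security",
  "description": "Security monitoring space",
  "color": "#aabbcc",
  "disabledFeatures": ["timelion", "maps"]
}

// Create a Kibana role through the Kibana role API ("base" must stay empty when "feature" privileges are listed)
PUT kbn:/api/security/role/kibana_security_viewer
{
  "elasticsearch": {
    "cluster": ["monitor"],
    "indices": [
      {
        "names": ["logs-*", "audit-*"],
        "privileges": ["read"]
      }
    ]
  },
  "kibana": [
    {
      "base": [],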
      "feature": {
        "discover": ["read"],
        "dashboard": ["read"],
        "visualize": ["read"]
      },
      "spaces": ["security"]
    }
  ]
}

4.4 System-Level Security Hardening

4.4.1 Operating System Security
#!/bin/bash
# security_hardening.sh

# 1. Update the system
sudo yum update -y

# 2. Harden SSH
sudo sed -i 's/^#PermitRootLogin yes/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#PasswordAuthentication yes/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo sed -i 's/^#MaxAuthTries 6/MaxAuthTries 3/' /etc/ssh/sshd_config
sudo systemctl restart sshd

# 3. Configure the firewall
sudo firewall-cmd --permanent --new-zone=elk
sudo firewall-cmd --permanent --zone=elk --add-source=192.168.1.0/24
sudo firewall-cmd --permanent --zone=elk --add-port=9200/tcp
sudo firewall-cmd --permanent --zone=elk --add-port=5601/tcp
sudo firewall-cmd --permanent --zone=elk --add-port=5044/tcp
sudo firewall-cmd --reload

# 4. Configure SELinux (CentOS/RHEL)
sudo setsebool -P httpd_can_network_connect 1
sudo semanage port -a -t http_port_t -p tcp 9200
sudo semanage port -a -t http_port_t -p tcp 5601

# 5. Configure auditing
sudo yum install audit -y
sudo tee /etc/audit/rules.d/elk.rules <<EOF
-w /etc/elasticsearch -p wa -k elasticsearch_config
-w /var/lib/elasticsearch -p wa -k elasticsearch_data
-w /etc/kibana -p wa -k kibana_config
-w /var/log/elasticsearch -p wa -k elasticsearch_logs
EOF
sudo service auditd restart

# 6. Configure log rotation
# Elasticsearch already rotates its own logs through log4j2; this rule only caps what stays on disk.
# copytruncate is used because sending SIGHUP to the JVM would shut Elasticsearch down.
sudo tee /etc/logrotate.d/elk <<'EOF'
/var/log/elasticsearch/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    copytruncate
}
EOF
4.4.2 Monitoring and Alerting
# Configure Elasticsearch monitoring
PUT _cluster/settings
{
  "persistent": {
    "xpack.monitoring.collection.enabled": true,
    "xpack.monitoring.collection.interval": "10s",
    "xpack.monitoring.history.duration": "7d"
  }
}

# Configure an alert (Watcher requires a paid subscription or a trial license)
PUT _watcher/watch/cluster_health_watch
{
  "trigger": {
    "schedule": {
      "interval": "1m"
    }
  },
  "input": {
    "http": {
      "request": {
        "scheme": "https",
        "host": "localhost",
        "port": 9200,
        "path": "/_cluster/health",
        "auth": {
          "basic": {
            "username": "elastic",
            "password": "{{password}}"
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.status": {
        "eq": "red"
      }
    }
  },
  "actions": {
    "send_email": {
      "email": {
        "to": "admin@example.com",
        "subject": "Cluster Health Alert",
        "body": "Cluster status is RED!"
      }
    }
  }
}

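5. Backup and Recovery

5.1 Elasticsearch Backup Configuration

# Configure the snapshot repository
# Create the backup directory
sudo mkdir -p /backup/elasticsearch
sudo chown -R elasticsearch:elasticsearch /backup/elasticsearch

# Allow the path in /etc/elasticsearch/elasticsearch.yml, then restart Elasticsearch:
# path.repo: ["/backup/elasticsearch"]

# Register a shared file system repository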
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/backup/elasticsearch",
    "compress": true,
    "max_restore_bytes_per_sec": "100mb",
    "max_snapshot_bytes_per_sec": "100mb"
  }
}

# Create a snapshot policy (the schedule below runs every day at 01:30)
PUT /_slm/policy/daily-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<daily-snap-{now/d}>",
  "repository": "my_backup",
  "config": {
    "indices": ["*"],
    "ignore_unavailable": false,
    "include_global_state": true,
    "metadata": {
      "taken_by": "slm",
      "taken_because": "Scheduled daily snapshot"
    }
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}

# Take a snapshot immediately
PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
{
  "indices": "logstash-*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "admin",
    "taken_because": "backup before upgrade"
  }
}

5.2 Restore Strategy

# List the available snapshots
GET /_snapshot/my_backup/_all

# Restore a snapshot
POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": "logstash-2023.10.*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "logstash-(.+)",
  "rename_replacement": "restored_logstash-$1"
}

# Restore selected indices
POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": ["logstash-2023.10.01", "logstash-2023.10.02"],
  "index_settings": {
    "index.number_of_replicas": 0
  },
  "ignore_index_settings": ["index.refresh_interval"]
}
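
The `rename_pattern` and `rename_replacement` pair used in the first restore request above rewrites index names with a regular expression. The sketch below previews the resulting names with `sed -E`, under the assumption that the simple pattern `logstash-(.+)` behaves the same in both engines; the function name `preview_restore_name` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: preview how rename_pattern "logstash-(.+)" and
# rename_replacement "restored_logstash-$1" would rename an index.
# sed writes the capture group as \1 where Elasticsearch uses $1.

# Usage: preview_restore_name <index_name>
preview_restore_name() {
    printf '%s\n' "$1" | sed -E 's/logstash-(.+)/restored_logstash-\1/'
}

# Example: show the restored names for two snapshot indices
for idx in logstash-2023.10.01 logstash-2023.10.02; do
    echo "${idx} -> $(preview_restore_name "$idx")"
done
```

Restoring under a new name leaves the live indices untouched, which makes it possible to compare document counts before swapping an alias over to the restored copy.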

6. Monitoring and Maintenance

6.1 Health Check Script

#!/bin/bash
# elk_health_check.sh

ELASTICSEARCH_HOST="localhost:9200"
KIBANA_HOST="localhost:5601"
LOGSTASH_HOST="localhost:9600"
ALERT_EMAIL="admin@example.com"
# CA certificate used to verify Elasticsearch (assumed path from the 8.x package defaults)
CA_CERT="/etc/elasticsearch/certs/http_ca.crt"

# Authenticated, certificate-verified GET against Elasticsearch; $1 is the request path
es_get() {
    curl -s --cacert "$CA_CERT" -u "elastic:$ELASTIC_PASSWORD" "https://$ELASTICSEARCH_HOST$1"
}

# Check Elasticsearch
check_elasticsearch() {
    echo "=== Elasticsearch Health Check ==="
    
    # Cluster health
    health=$(es_get "/_cluster/health" | jq -r '.status')
    if [[ "$health" != "green" ]]; then
        echo "ALERT: Elasticsearch cluster status: $health"
        send_alert "Elasticsearch cluster status: $health"
    fi
    
    # Node status
    nodes=$(es_get "/_cat/nodes?v")
    echo "Nodes:"
    echo "$nodes"
    
    # Red indices (no header row is requested, so the output is empty when none are red)
    indices=$(es_get "/_cat/indices?health=red&h=index,health")
    if [[ -n "$indices" ]]; then
        echo "ALERT: Red indices found"
        echo "$indices"
        send_alert "Red indices found"
    fi
    
    # Disk usage
    disk=$(es_get "/_cat/allocation?v")
    echo "Disk allocation:"
    echo "$disk"
}

# Check Kibana (plain HTTP matches the kibana.yml in 2.4.2; switch to https if server.ssl is enabled)
check_kibana() {
    echo "=== Kibana Health Check ==="
    
    status=$(curl -s -o /dev/null -w "%{http_code}" -u "elastic:$ELASTIC_PASSWORD" "http://$KIBANA_HOST/api/status")
    if [[ "$status" != "200" ]]; then
        echo "ALERT: Kibana returned status: $status"
        send_alert "Kibana status: $status"
    fi
    
    # Overall Kibana status
    kibana_status=$(curl -s -u "elastic:$ELASTIC_PASSWORD" "http://$KIBANA_HOST/api/status" | jq -r '.status.overall.level')
    echo "Kibana status: $kibana_status"
}

# Check Logstash (add -u with the API credentials if api.auth.type is set to basic)
check_logstash() {
    echo "=== Logstash Health Check ==="
    
    # Pipeline status
    pipelines=$(curl -s "http://$LOGSTASH_HOST/_node/pipelines?pretty")
    echo "Logstash pipelines:"
    echo "$pipelines" | jq '.pipelines'
    
    # JVM heap usage (skipped when the API does not return a number)
    jvm=$(curl -s "http://$LOGSTASH_HOST/_node/stats/jvm")
    heap_used=$(echo "$jvm" | jq -r '.jvm.mem.heap_used_percent')
    if [[ "$heap_used" =~ ^[0-9]+$ ]] && (( heap_used > 80 )); then
        echo "ALERT: Logstash heap usage high: $heap_used%"
        send_alert "Logstash heap usage: $heap_used%"
    fi
}

# Send an alert
send_alert() {
    local message="$1"
    echo "Sending alert: $message"
    # Send by email
    echo "$message" | mail -s "ELK Stack Alert" "$ALERT_EMAIL"
    # Or post to a Slack webhook
    # curl -X POST -H 'Content-type: application/json' \
    #   --data "{\"text\":\"$message\"}" \
    #   https://hooks.slack.com/services/YOUR/WEBHOOK/URL
}

# Main entry point
main() {
    source /etc/elk_credentials.conf  # load credentials such as ELASTIC_PASSWORD
    
    check_elasticsearch
    check_kibana
    check_logstash
    
    echo "=== Health check completed at $(date) ==="
}

# Schedule the script with cron
# crontab -e
# 0 */2 * * * /path/to/elk_health_check.sh >> /var/log/elk_health.log 2>&1

main "$@"

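7. Troubleshooting

7.1 Common Problems

# 1. Elasticsearch fails to start
# Check the logs
sudo journalctl -u elasticsearch -f
sudo tail -f /var/log/elasticsearch/production-cluster.log

# Check memory locking and the mmap limit
sudo grep -i lock /var/log/elasticsearch/*.log
sudo sysctl vm.max_map_count

# 2. Running out of disk space
# Check disk usage (curl prompts for the elastic password)
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic "https://localhost:9200/_cat/allocation?v"

# Delete old indices by name (8.x rejects wildcard deletes such as logstash-2023.09.*
# unless action.destructive_requires_name is set to false)
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic -X DELETE "https://localhost:9200/logstash-2023.09.01,logstash-2023.09.02"

# 3. Performance problems
# Inspect hot threads
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic "https://localhost:9200/_nodes/hot_threads"

# Inspect index sizes, largest document count first
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic "https://localhost:9200/_cat/indices?v&s=docs.count:desc"

# 4. Connection problems
# Check the firewall
sudo firewall-cmd --list-all

# Check SELinux
sudo ausearch -m avc -ts recent
sudo getsebool -a | grep httpd

# 5. Certificate problems
# Regenerate the certificates
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil ca --pem
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert --ca-cert ca/ca.crt --ca-key ca/ca.key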

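8. Best Practices Summary

8.1 Deployment Best Practices

  1. Separate node roles: run master, data, and ingest nodes on separate hosts
  2. Multi-zone deployment: spread nodes across availability zones to improve availability
  3. Capacity planning: plan shard count and size from the data volume
  4. Monitoring first: set up monitoring and alerting before deploying

8.2 Security Best Practices

  1. Least privilege: give each user and service only the permissions it needs
  2. Network isolation: keep ELK components on an internal network and expose them through a reverse proxy
  3. Regular updates: keep the ELK Stack and the operating system up to date
  4. Audit logs: enable audit logging and review it regularly

8.3 Performance Best Practices

  1. Shard sizing: keep each shard between 20 and 50GB
  2. Index lifecycle: use a hot-warm-cold architecture with sensible phase timings
  3. Caching: make sensible use of the query cache and the request cache
  4. Bulk operations: use the bulk API to reduce the number of requests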
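
The shard sizing guideline above translates into a simple calculation: divide the expected index size by a target shard size and round up. The sketch below assumes a 40GB default target, picked from within the 20 to 50GB range; the function name `shards_needed` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: estimate the primary shard count for an index from its expected size.
# The 40 GB default target is an assumption within the 20-50 GB guideline.

# Usage: shards_needed <index_size_gb> [target_shard_gb]
shards_needed() {
    local index_gb="$1" target_gb="${2:-40}"
    # Integer ceiling division: round up so no shard exceeds the target
    local count=$(( (index_gb + target_gb - 1) / target_gb ))
    if (( count < 1 )); then count=1; fi
    echo "$count"
}

# Example: a 100 GB daily index with the default target
echo "primary shards: $(shards_needed 100)"
```

The result is a starting point for `number_of_shards` in the index template; replicas multiply the storage needed but do not change the primary shard count.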

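8.4 Maintenance Best Practices

  1. Regular backups: automate snapshots with a policy
  2. Capacity monitoring: track disk, memory, and CPU usage
  3. Log rotation: configure a sensible log retention policy
  4. Regular optimization: run force merge and shrink on indices that are no longer written to

This guide covers ELK Stack deployment, optimization, and security hardening, from basic installation through advanced tuning. Adjust the configuration parameters to your environment, and validate everything in a test environment before deploying to production.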
