Automating Operations with Ansible
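I. Ansible Core Concepts and Architecture
1.1 What Is Ansible
Ansible is an open-source automation tool created by Michael DeHaan in 2012 and acquired by Red Hat in 2015. It runs batch operations on remote hosts over SSH and needs no agent software on the managed nodes, which is what an "agentless" architecture means in practice.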
| Core advantage | Description |
|---|---|
| 🔧 Agentless architecture | No agent on managed nodes; SSH access is enough |
| 📝 Simple to learn | Playbooks are written in YAML; the syntax is readable and the learning curve is gentle |
| 🔄 Idempotency | Repeated runs converge on the same result, so re-running is safe |
| 🎯 Role reuse | Roles support code reuse and modular design |
| 🌐 Rich community | Galaxy offers a large catalog of prebuilt roles, with a mature enterprise ecosystem |
| ⚡ Efficient concurrency | Asynchronous execution and parallel batches speed up operations work |
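1.2 Ansible Architecture
Ansible uses a push model rather than a classic client-server design: a control node connects to the managed nodes and runs tasks on them. The control node is an ordinary host with Ansible installed, and no dedicated daemon runs on either side.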
plaintext
Ansible architecture

Control node (the host where Ansible is installed)
  ├── Inventory   list of managed hosts
  ├── Playbook    automation tasks to run
  ├── Modules     built-in module library
  └── Plugins     extension plugin system
        │
        │  SSH (default) / WinRM / API
        ▼
Managed nodes
  Server1 (web)   Server2 (web)   Server3 (db)   Server4 (cache)   Server5 (app)

Note: no agent is installed; managed nodes need only a Python environment and SSH access.
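1.3 Core Concepts at a Glance
| Concept | Description | Analogy |
|---|---|---|
| Control Node | The host where Ansible is installed; it launches and manages tasks | Commander |
| Managed Node | A host managed by Ansible, reached over SSH | Executor |
| Inventory | The host list that defines which hosts are managed | Roster |
| Module | A unit of executable work, such as yum, copy, or service | Trade skill |
| Task | A single action that calls one module with specific arguments | Specific job |
| Play | A set of tasks mapped to a group of hosts | A scene |
| Playbook | A YAML file containing one or more plays | The full script |
| Role | A modular, reusable way to organize tasks | Functional unit |
| Facts | System information collected automatically from remote hosts and exposed as host variables | Intelligence gathering |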
1.4 Ansible Execution Flow
plaintext
Ansible execution flow

1. Load inventory    read the host list and resolve host and group definitions
2. Parse playbook    load the YAML and identify hosts, tasks, and handlers
3. Gather facts      connect to the target hosts and collect system information (optional)
4. Run pre_tasks     run any special tasks that must precede the main tasks
5. Run tasks         run tasks in order, calling the module each one names
     - SSH connection -> run module -> return a JSON result
     - Modules can run synchronously or asynchronously, serially or in parallel
6. Run handlers      run the handlers that were notified (for example, restart a service)
7. Report            summarize the results of the run
II. Core Components in Detail
2.1 Inventory (Host List)
The inventory is the list of hosts that Ansible manages. It can be defined statically or generated dynamically.
2.1.1 INI-Format Inventory
ini
# Basic host definitions
web1.example.com
web2.example.com
db1.example.com
# Hosts with a non-default SSH port
192.168.1.10:2222
192.168.1.11:2222
# Define a host group: [group_name]
[webservers]
web1.example.com
web2.example.com
# Range syntax: expands to 192.168.1.10 through 192.168.1.15 (inclusive)
192.168.1.[10:15]
[dbservers]
db1.example.com
db2.example.com
[appservers]
app1.example.com
app2.example.com
# Group variables
[webservers:vars]
http_port=80
max_connections=1000
# Child groups
[production:children]
webservers
dbservers
appservers
# Variables for every host (the all group)
[all:vars]
ansible_user=admin
ansible_port=22
ansible_ssh_private_key_file=~/.ssh/id_rsa
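To make the range syntax concrete, here is a minimal Python sketch of how a numeric pattern such as `192.168.1.[10:15]` can be expanded into individual host names. The function name `expand_host_range` is hypothetical and this is not Ansible's own implementation; it only illustrates the inclusive bounds and zero-padding behavior.

```python
import re
from typing import List

# Matches one numeric range such as [10:15] or [01:03].
_RANGE = re.compile(r"\[(\d+):(\d+)\]")


def expand_host_range(pattern: str) -> List[str]:
    """Expand numeric [start:end] ranges in a host pattern (bounds inclusive)."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start_text, end_text = match.groups()
    # Keep zero padding when the start value is written with a leading zero.
    width = len(start_text) if start_text.startswith("0") else 0
    hosts: List[str] = []
    for number in range(int(start_text), int(end_text) + 1):
        expanded = (
            pattern[: match.start()] + str(number).zfill(width) + pattern[match.end():]
        )
        # Recurse so that patterns containing several ranges are fully expanded.
        hosts.extend(expand_host_range(expanded))
    return hosts
```

Calling `expand_host_range("192.168.1.[10:15]")` yields six addresses, which is a quick way to sanity-check how many hosts a pattern will add to a group.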
2.1.2 YAML-Format Inventory
yaml
all:
  hosts:
    web1.example.com:
      http_port: 80
    web2.example.com:
      http_port: 80
    db1.example.com:
      db_port: 3306
  children:
    webservers:
      hosts:
        web1.example.com:
        web2.example.com:
      vars:
        nginx_version: "1.24.0"
    dbservers:
      hosts:
        db1.example.com:
      vars:
        mysql_version: "8.0"
    production:
      children:
        webservers:
        dbservers:
      vars:
        env: production
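2.1.3 Dynamic Inventory
For cloud environments or a CMDB, you can write a dynamic inventory script. Ansible calls the script with `--list` to fetch the whole inventory and with `--host <name>` to fetch one host's variables:
python
#!/usr/bin/env python3
# dynamic_inventory.py
"""Dynamic inventory example: prints the inventory as JSON."""
import argparse
import json


def get_dynamic_inventory():
    """
    Fetch the host list from a cloud API or CMDB.
    This example only demonstrates the return format.
    """
    inventory = {
        "webservers": {
            "hosts": ["web1.example.com", "web2.example.com"],
            "vars": {"ansible_user": "ubuntu"}
        },
        "dbservers": {
            "hosts": ["db1.example.com"],
            "vars": {"ansible_user": "ubuntu"}
        },
        "_meta": {
            "hostvars": {
                "web1.example.com": {"internal_ip": "10.0.1.10"},
                "web2.example.com": {"internal_ip": "10.0.1.11"},
                "db1.example.com": {"internal_ip": "10.0.2.10"}
            }
        }
    }
    return inventory


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args()
    if args.host:
        # Host variables are already supplied through _meta, so return an empty mapping.
        print(json.dumps({}))
    else:
        print(json.dumps(get_dynamic_inventory()))
bash
# The script must be executable
chmod +x dynamic_inventory.py
# Use the dynamic inventory
ansible-playbook -i dynamic_inventory.py site.yml
# Or set it in ansible.cfg
# inventory = ./dynamic_inventory.py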
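Before pointing Ansible at a dynamic inventory script, it can help to check the shape of the JSON it prints. The following is a minimal sketch under the assumption that groups use the dictionary form shown above (`hosts`, `vars`) plus a `_meta.hostvars` section; the function name `find_inventory_problems` is hypothetical.

```python
from typing import Any, Dict, List


def find_inventory_problems(inventory: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in inventory JSON."""
    problems: List[str] = []
    hostvars = inventory.get("_meta", {}).get("hostvars")
    if hostvars is None:
        # Without _meta, Ansible falls back to one --host call per host.
        problems.append("missing _meta.hostvars")
        hostvars = {}
    grouped_hosts = set()
    for group, data in inventory.items():
        if group == "_meta":
            continue
        if not isinstance(data, dict):
            problems.append(f"group {group!r} should be a mapping")
            continue
        hosts = data.get("hosts", [])
        if not isinstance(hosts, list):
            problems.append(f"group {group!r}: 'hosts' should be a list")
            continue
        grouped_hosts.update(hosts)
    # Host variables for a host that no group lists are usually a typo.
    for host in sorted(set(hostvars) - grouped_hosts):
        problems.append(f"hostvars defined for {host!r}, which is in no group")
    return problems
```

An empty result means the structure passed these basic checks; it does not prove that the hosts are reachable.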
2.2 Modules
The `ansible` community package ships thousands of modules covering system administration, package management, file operations, cloud services, and more. Since Ansible 2.10 most of them live in collections (for example `community.general`), while `ansible-core` bundles only a small built-in set.
2.2.1 Module Categories
| Category | Example modules (short names; many come from collections) |
|---|---|
| 📦 System | command, shell, script, service, systemd, cron, user |
| 📁 Files | copy, file, template, lineinfile, find, stat |
| 📦 Packages | yum, apt, dnf, pip, gem, npm |
| 🌐 Databases | mysql_db, postgresql_db, mongodb_user |
| ☁️ Cloud | ec2_instance, azure_rm_virtualmachine, gcp_compute_instance, digital_ocean_droplet |
| 🔧 Network | get_url, uri, snmp_facts |
| 🔒 Notification | mail, slack, telegram |
| 🛠️ Clustering | znode (ZooKeeper), consul, etcd3 |
| 📊 Monitoring | datadog_monitor, nagios |
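2.2.2 Common Modules in Practice
package module (cross-platform package management)
yaml
# The package module picks the system package manager automatically
- name: Install nginx
  package:
    name: nginx
    state: present
- name: Install several packages
  package:
    name:
      - vim
      - git
      - htop
    state: present
- name: Install the latest nginx
  package:
    name: nginx
    state: latest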
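yum/apt modules (distribution package management)
yaml
# RedHat/CentOS: yum
- name: Install Apache
  yum:
    name: httpd
    state: present
    enablerepo: epel
  notify: restart apache   # assumes a handler named "restart apache" is defined
- name: Install PHP and related extensions
  yum:
    name:
      - php
      - php-fpm
      - php-mysql
      - php-gd
    state: present
# Debian/Ubuntu: apt
- name: Install Nginx
  apt:
    name: nginx
    state: present
    update_cache: yes        # refresh the package cache first
    cache_valid_time: 3600   # but only if the cache is older than one hour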
copy module (copying files)
yaml
- name: Copy a configuration file
  copy:
    src: ./configs/nginx.conf
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    backup: yes   # keep a backup of the original file
- name: Copy a directory
  copy:
    src: ./webapp/
    dest: /var/www/html/
    owner: www-data
    group: www-data
    mode: '0755'
- name: Write content directly
  copy:
    content: |
      # Generated by Ansible
      DATABASE_HOST={{ db_host }}
      REDIS_HOST={{ redis_host }}
    dest: /etc/app/env.conf
    mode: '0600'
template module (Jinja2 templates)
yaml
# Template file: nginx.conf.j2
user {{ nginx_user }};
worker_processes {{ ansible_processor_vcpus }};
events {
    worker_connections {{ max_connections }};
}
http {
    server_tokens off;
    access_log {{ access_log_path }};
    {% for port in upstream_ports %}
    upstream backend_{{ port }} {
        server 127.0.0.1:{{ port }};
    }
    {% endfor %}
}
# Using the template in a playbook
- name: Generate the Nginx configuration
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
  vars:
    nginx_user: nginx
    max_connections: 1024
    access_log_path: /var/log/nginx/access.log
    upstream_ports:
      - 8000
      - 8001
      - 8002
file module (managing files and directories)
yaml
- name: Create a directory tree
  file:
    path: /var/www/html/app
    state: directory
    owner: www-data
    group: www-data
    mode: '0755'
    recurse: yes
- name: Create a symbolic link
  file:
    src: /opt/app/current
    dest: /var/www/html
    state: link
- name: Create an empty file
  file:
    path: /tmp/lock.file
    state: touch
    mode: '0644'
- name: Delete a file or directory
  file:
    path: /tmp/old_cache
    state: absent
service/systemd modules (managing services)
yaml
- name: Ensure the service is running
  service:
    name: nginx
    state: started
    enabled: yes
- name: Restart the MySQL service
  service:
    name: mysqld
    state: restarted
    sleep: 5   # seconds to wait between stop and start
- name: Use the systemd module
  systemd:
    name: docker
    state: started
    enabled: yes
    daemon_reload: yes
    masked: no
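command/shell modules (running commands)
yaml
# command module: runs the program directly, without a shell
- name: Show the system time
  command: date
  register: result
- name: Run a command with pipes (requires shell)
  # The [n]ginx pattern keeps grep from counting its own process
  shell: ps aux | grep '[n]ginx' | wc -l
  register: nginx_processes
  changed_when: false
- name: Create a user
  command: useradd -m -s /bin/bash deploy
  args:
    creates: /home/deploy   # idempotency: skipped when the directory already exists
# Read the command output
- name: Display output
  debug:
    msg: "Nginx process count: {{ nginx_processes.stdout }}"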
git module (deploying code)
yaml
- name: Clone the code repository
  git:
    repo: https://github.com/example/webapp.git
    dest: /var/www/app
    version: main
    force: yes
    depth: 1   # shallow clone for speed
  become: yes
  become_user: www-data
- name: Check out a specific version tag
  git:
    repo: https://github.com/example/webapp.git
    dest: /var/www/app
    version: v2.1.0
    force: yes
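user module (managing users)
yaml
- name: Create a user
  user:
    name: deploy
    comment: "Deploy User"
    shell: /bin/bash
    home: /home/deploy
    groups: sudo   # use wheel on RHEL-family systems; a listed group must already exist
    # deploy_password is assumed to come from a vault-encrypted variable file
    password: "{{ deploy_password | password_hash('sha512') }}"
    update_password: on_create   # avoid reporting "changed" on every run because of the random salt
    generate_ssh_key: yes
    ssh_key_bits: 4096
- name: Remove a user
  user:
    name: temp_user
    state: absent
    remove: yes   # also delete the home directory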
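lineinfile module (editing single lines)
yaml
- name: Add a line to a file
  lineinfile:
    path: /etc/sysctl.conf
    line: "net.ipv4.tcp_fin_timeout = 30"
    state: present
- name: Ensure a configuration line exists
  # Replaces the last line matching regexp, or appends the line when nothing matches.
  # backrefs is left off: with backrefs enabled, a missing match leaves the file unchanged.
  lineinfile:
    path: /etc/nginx/nginx.conf
    regexp: '^worker_processes'
    line: "worker_processes {{ ansible_processor_vcpus }};"
- name: Delete matching lines
  lineinfile:
    path: /etc/hosts
    regexp: '^192\.168\.1\.100'
    state: absent
- name: Insert after a specific line
  lineinfile:
    path: /etc/profile
    line: "export PATH=$PATH:/opt/app/bin"
    insertafter: '^export PATH'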
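The "replace the matching line, otherwise append" behavior of `lineinfile` with `state: present` can be sketched in a few lines of Python. This is a simplified model, not the module's real code: it ignores `insertafter`, `insertbefore`, and `backrefs`, and the function name `ensure_line` is hypothetical.

```python
import re
from typing import List, Optional


def ensure_line(lines: List[str], line: str, regexp: Optional[str] = None) -> bool:
    """Ensure `line` is present in `lines`; return True when something changed."""
    if regexp is not None:
        pattern = re.compile(regexp)
        matches = [i for i, existing in enumerate(lines) if pattern.search(existing)]
        if matches:
            last = matches[-1]  # only the last matching line is replaced
            if lines[last] == line:
                return False
            lines[last] = line
            return True
    if line in lines:
        return False  # already present, nothing to do
    lines.append(line)
    return True
```

The boolean return value mirrors the `changed` flag: a second call with the same arguments returns False, which is the idempotency the module aims for.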
set_fact module and the register keyword (setting variables)
yaml
- name: Capture remote command output in a variable
  shell: df -h / | tail -1 | awk '{print $5}' | sed 's/%//'
  register: disk_usage
- name: Show disk usage
  debug:
    msg: "Disk usage: {{ disk_usage.stdout }}%"
- name: Set custom facts
  set_fact:
    app_version: "2.1.0"
    deploy_time: "{{ ansible_date_time.iso8601 }}"
    # groups.get avoids an error when no production group exists
    is_production: "{{ inventory_hostname in groups.get('production', []) }}"
get_url module (downloading files)
yaml
- name: Download a package
  get_url:
    url: https://example.com/app.rpm
    dest: /tmp/app.rpm
    mode: '0644'
    checksum: sha256:abc123...
  notify: install app   # assumes a handler named "install app" is defined
- name: Download a configuration file
  get_url:
    url: https://config.example.com/app.conf
    dest: /etc/app.conf
    owner: root
    group: root
    mode: '0600'
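fail/assert modules (guard conditions)
yaml
- name: Abort unless this is the production environment
  fail:
    msg: "This playbook may only run against production"
  when: env != "production"
- name: Validate the configuration
  assert:
    that:
      - mysql_port | int == 3306
      - app_version is defined
      - "'db' in group_names"
    fail_msg: "Configuration validation failed"
    success_msg: "Configuration validation passed"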
2.2.3 Reading Module Help
bash
# Show a module's documentation
ansible-doc copy
ansible-doc -l              # list all modules
ansible-doc -l | grep yum   # search for a module
# Show a playbook snippet for a module
ansible-doc -s yum
2.3 Playbooks
Playbooks are the core of Ansible: they describe the configuration you want and the tasks to run on the target hosts.
2.3.1 Basic Playbook Structure
yaml
---
# Comment: an example Ansible playbook
- name: First play - deploy the web servers
  hosts: webservers        # target hosts
  become: yes              # escalate privileges
  become_user: root
  vars:                    # variable definitions
    nginx_version: "1.24.0"
    app_port: 8080
  vars_files:              # external variable files
    - vars/env_vars.yml
  pre_tasks:               # preparation before the main tasks
    - name: Update the apt cache
      apt:
        update_cache: yes
      when: ansible_os_family == "Debian"
  tasks:                   # main task list
    - name: Install Nginx
      apt:
        name: nginx
        state: present
    - name: Render the Nginx configuration
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: Restart Nginx
    - name: Start the Nginx service
      service:
        name: nginx
        state: started
        enabled: yes
  handlers:                # handlers, triggered by notify
    - name: Restart Nginx
      service:
        name: nginx
        state: restarted
- name: Second play - configure the database servers
  hosts: dbservers
  become: yes
  tasks:
    - name: Install MySQL
      apt:
        name:
          - mysql-server
          - mysql-client
          - python3-mysqldb   # dependency of the Ansible MySQL modules
        state: present
    - name: Start MySQL
      service:
        name: mysql
        state: started
        enabled: yes
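2.3.2 Advanced Playbook Features
Conditional execution
yaml
- name: Install the web server package for this OS family
  package:
    name: "{{ package_name }}"
    state: present
  vars:
    package_name: "{{ 'httpd' if ansible_os_family == 'RedHat' else 'apache2' }}"
- name: Report a skip based on a variable
  debug:
    msg: "Skipping database installation"
  when: skip_db_install | default(false)
- name: Condition based on facts
  block:
    - name: Install the graphical environment
      yum:
        name: "@^graphical-server-environment"
        state: present
  # Compare the major version; the full version string can look like "7.9.2009"
  when: ansible_distribution == "CentOS" and ansible_distribution_major_version | int >= 7
- name: Fail based on a registered variable
  shell: /opt/scripts/check-health.sh
  register: health_check
  failed_when: health_check.stdout != "OK"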
Loops
yaml
# Standard loop (loop replaces the older with_items)
- name: Create several users
  user:
    name: "{{ item }}"
    state: present
    create_home: yes
  loop:
    - alice
    - bob
    - charlie
# Loop over a list of dictionaries
- name: Create several databases
  mysql_db:
    name: "{{ item.dbname }}"
    state: present
    encoding: "{{ item.encoding }}"
  loop:
    - { dbname: 'app1', encoding: 'utf8mb4' }
    - { dbname: 'app2', encoding: 'utf8mb4' }
    - { dbname: 'app3', encoding: 'utf8mb4' }
# Register the results of a loop (items run one after another on each host)
- name: Run several commands
  shell: "echo {{ item }}"
  register: results
  loop:
    - "Server 1"
    - "Server 2"
    - "Server 3"
- name: Show the results
  debug:
    msg: "{{ item.stdout }}"
  loop: "{{ results.results }}"
Task delegation
yaml
- name: Record the deployment in a local log
  shell: echo "Deployed to {{ inventory_hostname }}" >> /tmp/deploy.log
  delegate_to: localhost
- name: Register hosts with the load balancer
  shell: /opt/scripts/register.sh
  delegate_to: lb01.example.com
  run_once: true   # run only once per batch of hosts
- name: Pause before continuing
  pause:
    seconds: 5
  delegate_to: localhost
Asynchronous tasks
yaml
# Run a long task asynchronously
- name: Start a background process
  command: /opt/long-running-task.sh
  async: 3600   # maximum run time in seconds
  poll: 0       # do not wait; continue immediately
  register: long_task
- name: Wait for the task to finish
  async_status:
    jid: "{{ long_task.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 100
  delay: 30
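Batch execution strategy
yaml
- name: Deploy the application in batches
  hosts: webservers
  serial: 5   # five hosts at a time, so the whole group is never updated at once
  tasks:
    - name: Rolling update of the application
      include_tasks: rolling_update.yml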
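The effect of `serial` is easiest to see as plain list slicing. The sketch below is only an illustration of the batching idea with a fixed integer batch size (Ansible also accepts percentages and lists, which are not modeled here); the function name `serial_batches` is hypothetical.

```python
from typing import Iterator, List, Sequence


def serial_batches(hosts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of hosts, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(hosts), batch_size):
        # The last batch may be shorter than batch_size.
        yield list(hosts[start:start + batch_size])
```

With twelve web servers and a batch size of 5, the play would run three times: on five hosts, on the next five, and on the final two.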
2.3.3 Best Practices for Organizing Playbooks
plaintext
project/
├── ansible.cfg              # Ansible configuration file
├── inventory/               # inventory directory
│   ├── prod.yml             # production inventory
│   ├── staging.yml          # staging inventory
│   └── group_vars/          # group variables
│       └── all.yml
├── playbooks/               # playbook directory
│   ├── site.yml             # main entry point
│   ├── webservers.yml       # web server configuration
│   ├── dbservers.yml        # database configuration
│   └── monitoring.yml       # monitoring configuration
├── roles/                   # roles directory
│   └── nginx/
│       ├── tasks/
│       │   └── main.yml
│       ├── handlers/
│       │   └── main.yml
│       ├── templates/
│       │   └── nginx.conf.j2
│       ├── vars/
│       │   └── main.yml
│       └── defaults/
│           └── main.yml
├── vars/                    # global variables
│   └── secrets.yml.enc      # encrypted sensitive variables
├── files/                   # static files
├── library/                 # custom modules
└── filter_plugins/          # custom filters
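2.4 Roles
Roles are Ansible's unit of modular organization: they bundle tasks, handlers, variables, templates, and related files so that the code can be reused.
2.4.1 Standard Directory Layout
plaintext
roles/
└── nginx/
    ├── defaults/            # default variables (lowest precedence)
    │   └── main.yml
    ├── files/               # static files (referenced directly by the copy module)
    │   ├── nginx.conf
    │   └── mime.types
    ├── handlers/            # handlers
    │   └── main.yml
    ├── meta/                # role dependencies and metadata
    │   └── main.yml
    ├── tasks/               # task lists
    │   ├── install.yml
    │   ├── configure.yml
    │   ├── service.yml
    │   └── main.yml         # main entry point
    ├── templates/           # Jinja2 template files
    │   └── nginx.conf.j2
    ├── vars/                # variables (higher precedence than defaults)
    │   └── main.yml
    └── README.md            # role documentation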
2.4.2 A Complete Role Example
tasks/main.yml
yaml
---
# tasks/main.yml - main entry point of the role
- name: Include the installation tasks
  include_tasks: install.yml
- name: Include the configuration tasks
  include_tasks: configure.yml
- name: Include the service tasks
  include_tasks: service.yml
tasks/install.yml
yaml
---
- name: Install Nginx
  package:
    name: nginx
    state: present
- name: Create the service user
  user:
    name: "{{ nginx_user }}"
    system: yes
    create_home: no
    shell: /sbin/nologin
tasks/configure.yml
yaml
---
- name: Create the log directory
  file:
    path: "{{ nginx_log_dir }}"
    state: directory
    owner: "{{ nginx_user }}"
    mode: '0755'
- name: Copy the static configuration file
  copy:
    src: mime.types
    dest: /etc/nginx/mime.types
    owner: root
    group: root
    mode: '0644'
- name: Render the main configuration file
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
  # Handlers run in the order they are defined, so validation precedes the reload
  notify:
    - Validate Nginx configuration
    - Reload Nginx
- name: Deploy the virtual host configuration
  template:
    src: vhost.conf.j2
    dest: "/etc/nginx/sites-available/{{ nginx_domain }}"
  notify: Reload Nginx
  when: nginx_domain is defined
tasks/service.yml
yaml
---
- name: Ensure the Nginx service is running
  service:
    name: nginx
    state: started
    enabled: yes
handlers/main.yml
yaml
---
- name: Validate Nginx configuration
  command: nginx -t
  changed_when: false
- name: Reload Nginx
  systemd:
    name: nginx
    state: reloaded
- name: Restart Nginx
  systemd:
    name: nginx
    state: restarted
defaults/main.yml
yaml
---
nginx_version: "1.24.0"
nginx_user: nginx
nginx_worker_processes: auto
nginx_worker_connections: 1024
nginx_log_dir: /var/log/nginx
nginx_pid_file: /run/nginx.pid
nginx_server_tokens: "off"
templates/nginx.conf.j2
nginx
user {{ nginx_user }};
worker_processes {{ nginx_worker_processes }};
pid {{ nginx_pid_file }};
error_log {{ nginx_log_dir }}/error.log {{ nginx_error_log_level | default('warn') }};
events {
worker_connections {{ nginx_worker_connections }};
use epoll;
multi_accept on;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log {{ nginx_log_dir }}/access.log main;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout {{ nginx_keepalive_timeout | default(65) }};
types_hash_max_size 2048;
server_tokens {{ nginx_server_tokens }};
gzip on;
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_types text/plain text/css text/xml text/javascript application/json;
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
}
meta/main.yml (role dependencies)
yaml
---
dependencies:
  - role: common
    vars:
      timezone: Asia/Shanghai
  - role: firewall
    vars:
      allowed_tcp_ports:
        - 80
        - 443
2.4.3 Using Roles
yaml
# playbooks/deploy.yml
---
- name: Deploy the web server cluster
  hosts: webservers
  become: yes
  roles:
    - role: common          # uses the role's default variables
    - role: nginx
      vars:
        nginx_domain: example.com
        nginx_worker_connections: 2048
    - role: ssl
      when: enable_ssl | default(true)
    - role: monitoring
      tags: [monitoring]
2.5 Variables
Ansible variables hold configuration values that change between hosts or environments. They can come from many sources and have different scopes.
2.5.1 Ways to Define Variables
Defined in a playbook
yaml
vars:
  app_name: myapp
  app_version: "2.1.0"
  db_host: 192.168.1.100
  db_port: 3306
  db_users:
    - name: app
      password: "{{ 'apppass' | password_hash('sha512') }}"
      privileges: "SELECT,INSERT,UPDATE,DELETE"
    - name: readonly
      password: "{{ 'readonly' | password_hash('sha512') }}"
      privileges: "SELECT"
Loaded from variable files
yaml
# Using vars_files
vars_files:
  - vars/app_config.yml
  - vars/secrets.yml
# Contents of vars/app_config.yml
app_name: production-app
app_env: production
log_level: INFO
max_connections: 1000
Passed on the command line
bash
# Use -e or --extra-vars
ansible-playbook site.yml -e "app_version=2.2.0"
ansible-playbook site.yml -e @vars.json
ansible-playbook site.yml -e '{"app_version":"2.2.0","db_host":"10.0.0.1"}'
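2.5.2 Variable Precedence (Lowest to Highest)
| Priority | Source | Notes |
|---|---|---|
| 1 | command line values | Options such as `-u my_user`; these are not variables |
| 2 | role defaults | `defaults/main.yml` in a role |
| 3 | inventory file or script group vars | Group variables defined in the inventory itself |
| 4 | inventory group_vars/all | |
| 5 | playbook group_vars/all | |
| 6 | inventory group_vars/* | Variables for a specific group |
| 7 | playbook group_vars/* | |
| 8 | inventory file or script host vars | Host variables defined in the inventory itself |
| 9 | inventory host_vars/* | Variables for a specific host |
| 10 | playbook host_vars/* | |
| 11 | host facts and cached set_fact values | |
| 12 | play vars | `vars` in a play |
| 13 | play vars_prompt | Values entered interactively |
| 14 | play vars_files | Files loaded through `vars_files` |
| 15 | role vars | `vars/main.yml` in a role |
| 16 | block vars | Apply only to tasks in the block |
| 17 | task vars | Apply only to the task |
| 18 | include_vars | Variables included dynamically |
| 19 | set_fact and registered vars | |
| 20 | role and include_role params | |
| 21 | include params | |
| 22 | extra vars (`-e`) | Always win |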
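One way to reason about precedence is to picture each source as a dictionary and apply them from lowest to highest priority, so that later layers overwrite earlier ones. The sketch below models only that "last writer wins" idea with a handful of layers; the function name `resolve_variables` and the layer labels are hypothetical, and real Ansible resolves variables lazily and per host.

```python
from typing import Any, Dict, List, Tuple


def resolve_variables(
    layers: List[Tuple[str, Dict[str, Any]]]
) -> Dict[str, Tuple[Any, str]]:
    """Merge layers given in ascending priority; keep the winning value and its source."""
    resolved: Dict[str, Tuple[Any, str]] = {}
    for source, variables in layers:
        for name, value in variables.items():
            # A higher-priority layer simply overwrites the earlier entry.
            resolved[name] = (value, source)
    return resolved


example_layers = [
    ("role defaults", {"app_version": "1.0.0", "http_port": 80}),
    ("group_vars", {"http_port": 8080}),
    ("play vars", {"app_version": "2.1.0"}),
    ("extra vars", {"app_version": "2.2.0"}),
]
```

Running `resolve_variables(example_layers)` shows `app_version` coming from extra vars and `http_port` from group_vars, which matches what `-e "app_version=2.2.0"` would do on the command line.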
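2.5.3 Variable Lookup Order
Among inventory variables, the most specific definition wins: host variables override group variables, child-group variables override parent-group variables, and the all group has the lowest priority.
plaintext
For host web1.example.com in group webservers (a child of production), inventory variables are looked up in this order:
1. Host variables for web1.example.com
2. webservers group (child group)
3. production group (parent group)
4. all group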
2.5.4 Using Variables with Jinja2
yaml
- name: Render a template with variables
  template:
    src: app.conf.j2
    dest: /etc/app.conf
# Contents of app.conf.j2
app:
  name: {{ app_name }}
  version: {{ app_version }}
database:
  host: {{ db_host | default('localhost') }}
  port: {{ db_port | default(3306) }}
  name: {{ db_name | upper }}
features:
{% for feature in enabled_features %}
  - {{ feature }}
{% endfor %}
Common filters
yaml
{{ variable | default('default_value') }}
{{ variable | int }}
{{ variable | string }}
{{ variable | bool }}
{{ variable | upper }}
{{ variable | lower }}
{{ variable | trim }}
{{ variable | to_json }}
{{ variable | to_yaml }}
{{ 'password' | password_hash('sha512', 65534 | random(seed=inventory_hostname) | string) }}
{{ list1 | union(list2) }}
{{ list1 | intersect(list2) }}
{{ dictionary | dict2items }}
{{ items | items2dict }}
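The `dict2items` and `items2dict` filters convert between a mapping and a list of key/value records, which is what makes dictionaries usable in `loop`. The Python sketch below mimics that transformation with the default `key` and `value` field names; it is an illustration of the data shapes, not the filters' actual source code.

```python
from typing import Any, Dict, List


def dict2items(mapping: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn {"a": 1} into [{"key": "a", "value": 1}]."""
    return [{"key": key, "value": value} for key, value in mapping.items()]


def items2dict(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Turn [{"key": "a", "value": 1}] back into {"a": 1}."""
    return {item["key"]: item["value"] for item in items}
```

Applying one function after the other returns the original mapping, which is a handy mental check when a loop over `dict2items` output needs to be turned back into a dictionary.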
2.5.5 Encrypting Sensitive Variables
bash
# Encrypt files with ansible-vault
ansible-vault encrypt vars/secrets.yml
ansible-vault decrypt vars/secrets.yml
ansible-vault view vars/secrets.yml
ansible-vault edit vars/secrets.yml
ansible-vault create vars/new_secrets.yml
# Supply the password at run time
ansible-playbook site.yml --ask-vault-pass
ansible-playbook site.yml --vault-password-file ~/.vault_pass.txt
bash
# Vault password file (restrict permissions so that only you can read it)
echo "my_secret_password" > ~/.vault_pass.txt
chmod 600 ~/.vault_pass.txt
ini
# ansible.cfg
[defaults]
vault_password_file = ~/.vault_pass.txt
III. Installation and Basic Configuration
3.1 Installing the Control Node
Method 1: pip (recommended)
bash
# Install Ansible
pip install ansible
# Verify the installation
ansible --version
# Upgrade
pip install --upgrade ansible
Method 2: system package manager
bash
# RHEL/CentOS/Fedora (on RHEL and CentOS the ansible package comes from EPEL)
sudo dnf install ansible
# Ubuntu/Debian
sudo apt update
sudo apt install ansible
# macOS
brew install ansible
Method 3: from source
bash
git clone https://github.com/ansible/ansible.git
cd ansible
source ./hacking/env-setup
pip install -r requirements.txt
3.2 Basic Configuration
3.2.1 ansible.cfg
ini
[defaults]
# Inventory path
inventory = ./inventory
# Default remote user
remote_user = admin
# Connection timeout in seconds
timeout = 30
# Fact gathering and caching
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
# Number of parallel forks
forks = 10
# Output settings (the yaml callback is provided by the community.general collection)
stdout_callback = yaml
bin_ansible_callbacks = yes
# Logging
log_path = /var/log/ansible/ansible.log
# Privilege escalation
[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False
[ssh_connection]
# SSH tuning
pipelining = True
# StrictHostKeyChecking=no skips host key verification: acceptable in a lab, risky in production
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no
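3.2.2 Passwordless SSH
bash
# Generate an SSH key pair
ssh-keygen -t rsa -b 4096 -C "ansible@control-node" -f ~/.ssh/ansible_key
# Push the public key to each target host
ssh-copy-id -i ~/.ssh/ansible_key.pub admin@192.168.1.10
ssh-copy-id -i ~/.ssh/ansible_key.pub admin@192.168.1.11
# Or use Ansible itself (--ask-pass prompts for the SSH password, since no key is installed yet)
ansible all -i inventory -m authorized_key \
  -a "user=admin key='{{ lookup('file', '~/.ssh/ansible_key.pub') }}'" --ask-pass
# Simplify the SSH client configuration.
# Disabling host key checks removes protection against man-in-the-middle attacks,
# so the pattern is limited to the lab subnet used above instead of "Host *".
cat >> ~/.ssh/config << 'EOF'
Host 192.168.1.*
    IdentityFile ~/.ssh/ansible_key
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
EOF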
3.3 Quick Verification
bash
# Check the Ansible environment
ansible --version
# Test connectivity
ansible all -i inventory -m ping
# List all hosts
ansible all -i inventory --list-hosts
# Gather facts from the remote hosts
ansible all -i inventory -m setup
# Show specific facts
ansible all -i inventory -m setup -a "filter=ansible_distribution*"
# Run ad hoc commands
ansible all -i inventory -m command -a "uptime"
ansible all -i inventory -m shell -a "df -h"
IV. Hands-On Case Studies
4.1 Case 1: Batch Deployment of a Web Application
4.1.1 Project Structure
plaintext
webapp-deployment/
├── ansible.cfg
├── inventory/
│ └── prod.yml
├── playbooks/
│ └── deploy.yml
├── roles/
│ ├── common/
│ │ └── tasks/main.yml
│ ├── nginx/
│ │ ├── tasks/main.yml
│ │ ├── handlers/main.yml
│ │ └── templates/nginx.conf.j2
│ ├── app/
│ │ ├── tasks/main.yml
│ │ ├── handlers/main.yml
│ │ └── templates/env.conf.j2
│ └── monitoring/
│ └── tasks/main.yml
└── vars/
└── env.yml
4.1.2 Inventory
yaml
# inventory/prod.yml
all:
  vars:
    ansible_user: deploy
    ansible_ssh_private_key_file: ~/.ssh/deploy_key
    env: production
    # Defined for all hosts because the web servers also need the database address
    db_host: 10.0.1.10
  children:
    webservers:
      hosts:
        web01.prod.internal:
          nginx_port: 80
          app_port: 8000
        web02.prod.internal:
          nginx_port: 80
          app_port: 8001
        web03.prod.internal:
          nginx_port: 80
          app_port: 8002
    dbservers:
      hosts:
        db01.prod.internal:
4.1.3 Variable Definitions
yaml
# vars/env.yml
---
app_name: my-webapp
app_version: "2.1.0"
app_repo: https://github.com/example/my-webapp.git
app_deploy_path: /opt/app
app_user: appuser
app_group: appuser
database:
  host: "{{ db_host }}"
  port: 3306
  name: webapp_db
  user: webapp
  password: "{{ vault_db_password }}"   # stored encrypted
redis:
  host: 10.0.1.20
  port: 6379
4.1.4 Main Playbook
yaml
# playbooks/deploy.yml
---
- name: Prepare the environment - common configuration
  hosts: webservers
  become: yes
  roles:
    - role: common
      tags: [common, prepare]
- name: Deploy the Nginx reverse proxy
  hosts: webservers
  become: yes
  vars_files:
    - ../vars/env.yml
  roles:
    - role: nginx
      tags: [nginx]
      vars:
        # app_domain is assumed to be defined in the inventory or in group_vars
        nginx_domain: "{{ app_domain }}"
        upstream_servers:
          - "127.0.0.1:{{ app_port }}"
- name: Deploy the web application
  hosts: webservers
  # The role creates users and manages systemd units, so it runs as root
  become: yes
  vars_files:
    - ../vars/env.yml
  roles:
    - role: app
      tags: [app]
      vars:
        deploy_strategy: rolling
        health_check_url: "http://localhost:{{ nginx_port }}/health"
- name: Post-deployment verification
  hosts: webservers
  become: yes
  vars_files:
    - ../vars/env.yml
  tasks:
    - name: Check service health
      uri:
        url: "http://{{ inventory_hostname }}/health"
        method: GET
        status_code: 200
      register: health_result
      until: health_result.status == 200
      retries: 10
      delay: 5
    - name: Show the deployment result
      debug:
        msg: "{{ app_name }} v{{ app_version }} deployed successfully!"
4.1.5 Application Role
yaml
# roles/app/tasks/main.yml
---
- name: Create the application user
  user:
    name: "{{ app_user }}"
    comment: "Application User"
    shell: /bin/bash
    home: "{{ app_deploy_path }}"
    create_home: yes
    system: yes
- name: Create the deployment directories
  file:
    path: "{{ item }}"
    state: directory
    owner: "{{ app_user }}"
    group: "{{ app_group }}"
    mode: '0755'
  loop:
    - "{{ app_deploy_path }}"
    - "{{ app_deploy_path }}/releases"
    - "{{ app_deploy_path }}/shared"
    - "{{ app_deploy_path }}/shared/logs"
- name: Clone the code repository
  git:
    repo: "{{ app_repo }}"
    dest: "{{ app_deploy_path }}/releases/{{ deploy_timestamp }}"
    version: "{{ app_version }}"
    force: yes
    depth: 1
  register: git_clone
- name: Point the current symlink at the new release
  file:
    src: "{{ app_deploy_path }}/releases/{{ deploy_timestamp }}"
    dest: "{{ app_deploy_path }}/current"
    state: link
  when: git_clone.changed
- name: Render the environment configuration file
  template:
    src: env.conf.j2
    dest: "{{ app_deploy_path }}/shared/.env"
    owner: "{{ app_user }}"
    group: "{{ app_group }}"
    mode: '0600'
  notify: Restart application   # handler defined in roles/app/handlers/main.yml
- name: Install Python dependencies
  # Failures are no longer ignored: a broken dependency install should stop the deployment
  command: pip install -r requirements.txt
  args:
    chdir: "{{ app_deploy_path }}/current"
  when: ansible_distribution == "Ubuntu"
- name: Start the application
  systemd:
    name: "{{ app_name }}"
    state: started
    enabled: yes
    daemon_reload: yes
  vars:
    ansible_python_interpreter: /usr/bin/python3
# roles/app/vars/main.yml
---
deploy_timestamp: "{{ ansible_date_time.iso8601 }}"
4.2 Case 2: Managing Configuration Files in Bulk
4.2.1 Distributing Configuration Files
yaml
# playbooks/config-management.yml
---
- name: Manage system configuration in bulk
  hosts: all
  become: yes
  vars:
    config_source_dir: ./config_files
    backup_suffix: ".ansible.bak"
  tasks:
    - name: Classify the host
      set_fact:
        is_database_server: "{{ inventory_hostname in groups.get('dbservers', []) }}"
        is_web_server: "{{ inventory_hostname in groups.get('webservers', []) }}"
    - name: Distribute base configuration files
      copy:
        src: "{{ config_source_dir }}/base/{{ item }}"
        dest: "/etc/{{ item }}"
        owner: root
        group: root
        mode: '0644'
        backup: yes
      loop:
        - resolv.conf
      when: not (is_database_server | bool) and not (is_web_server | bool)
    - name: Distribute the limits configuration
      copy:
        src: "{{ config_source_dir }}/limits.conf"
        dest: /etc/security/limits.conf
        owner: root
        group: root
        mode: '0644'
      notify: Verify system limits
    - name: Update sysctl settings
      sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
        sysctl_file: /etc/sysctl.d/99-ansible.conf
      loop:
        - { name: 'net.core.somaxconn', value: '65535' }
        - { name: 'net.ipv4.tcp_fin_timeout', value: '30' }
        - { name: 'vm.swappiness', value: '10' }
        - { name: 'fs.file-max', value: '100000' }
      notify: Reload sysctl
  handlers:
    # Minimal handlers so that the notify calls above resolve (assumed implementations)
    - name: Verify system limits
      shell: ulimit -n
      changed_when: false
    - name: Reload sysctl
      command: sysctl --system
4.2.2 Managing Sensitive Configuration
yaml
# playbooks/secure-config.yml
---
- name: Distribute encrypted configuration
  hosts: dbservers
  become: yes
  vars_files:
    - vars/credentials.yml   # encrypted file; requires the vault password
  tasks:
    - name: Distribute the database configuration file
      template:
        src: mysql.conf.j2
        dest: /etc/mysql/mysql.conf.d/ansible.cnf
        owner: mysql
        group: mysql
        mode: '0600'
        backup: yes
      notify: Restart MySQL   # assumes a handler with this name is defined
    - name: Set the MySQL root password
      mysql_user:
        name: root
        host: "{{ item }}"
        password: "{{ mysql_root_password }}"
        priv: '*.*:ALL,GRANT'
      loop:
        - localhost
        - 127.0.0.1
        - "{{ ansible_fqdn }}"
      when: mysql_root_password is defined
    - name: Create the application database
      mysql_db:
        name: "{{ db_name }}"
        state: present
        encoding: utf8mb4
        collation: utf8mb4_unicode_ci
    - name: Create the application database user
      mysql_user:
        name: "{{ db_app_user }}"
        host: "%"
        password: "{{ db_app_password }}"
        priv: "{{ db_name }}.*:SELECT,INSERT,UPDATE,DELETE,CREATE,DROP"
        state: present
4.3 Case 3: Service Orchestration and Rolling Updates
4.3.1 Rolling Update Strategy
yaml
# playbooks/rolling-update.yml
# app_service, app_deploy_path, app_backup_path, app_repo, app_version and app_port
# are assumed to be defined in the inventory or in group_vars
---
- name: Rolling update of the web application
  hosts: webservers
  become: yes
  vars:
    update_batch_size: 1          # hosts updated per batch
    health_check_retries: 10
    health_check_delay: 5
    pre_deploy_script: /opt/scripts/pre_deploy.sh
    post_deploy_script: /opt/scripts/post_deploy.sh
  serial: "{{ update_batch_size }}"   # key setting: one batch at a time
  pre_tasks:
    - name: Take the node out of the load balancer (placeholder)
      debug:
        msg: "Removing {{ inventory_hostname }} from the load balancer"
      changed_when: true
    - name: Run the pre-deployment script
      script: "{{ pre_deploy_script }}"
      delegate_to: localhost
  tasks:
    - name: Stop the application service
      systemd:
        name: "{{ app_service }}"
        state: stopped
    - name: Back up the current version
      archive:
        path: "{{ app_deploy_path }}/current"
        dest: "{{ app_backup_path }}/{{ inventory_hostname }}_{{ ansible_date_time.epoch }}.tar.gz"
        format: gz
        remove: no
    - name: Update the code
      git:
        repo: "{{ app_repo }}"
        dest: "{{ app_deploy_path }}/current"
        version: "{{ app_version }}"
        force: yes
        depth: 1
    # Dependencies are installed before the service starts, not after the health check
    - name: Update dependencies
      pip:
        requirements: "{{ app_deploy_path }}/current/requirements.txt"
        executable: pip3
      when: update_dependencies | default(true)
    # A failed migration stops the play instead of being silently ignored
    - name: Run database migrations
      command: flask db upgrade
      args:
        chdir: "{{ app_deploy_path }}/current"
      when: run_migrations | default(true)
    - name: Start the application service
      systemd:
        name: "{{ app_service }}"
        state: started
    - name: Health check
      uri:
        url: "http://{{ inventory_hostname }}:{{ app_port }}/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: "{{ health_check_retries }}"
      delay: "{{ health_check_delay }}"
  post_tasks:
    - name: Put the node back into the load balancer (placeholder)
      debug:
        msg: "Adding {{ inventory_hostname }} back to the load balancer"
      changed_when: true
    - name: Send a deployment notification
      slack:
        token: "{{ slack_bot_token }}"
        msg: |
          :white_check_mark: *Deployment succeeded*
          Host: {{ inventory_hostname }}
          Version: {{ app_version }}
          Time: {{ ansible_date_time.iso8601 }}
      when: slack_notification | default(false)
  handlers:
    - name: Validate Nginx configuration
      command: nginx -t
      changed_when: false
    - name: Reload Nginx
      systemd:
        name: nginx
        state: reloaded
4.3.2 One-Step Rollback
yaml
# playbooks/rollback.yml
---
- name: Roll back to the previous version
  hosts: webservers
  become: yes
  vars:
    backup_path: "{{ app_backup_path }}"
  tasks:
    - name: Find the newest backup for this host
      # Backups are named <hostname>_<epoch>.tar.gz, so filter by host name
      shell: |
        ls -t {{ backup_path }}/{{ inventory_hostname }}_*.tar.gz 2>/dev/null | head -1
      register: latest_backup
      changed_when: false
    - name: Check that a backup exists
      assert:
        that:
          - latest_backup.stdout | length > 0
        fail_msg: "No usable backup file was found"
    - name: Stop the application service
      systemd:
        name: "{{ app_service }}"
        state: stopped
    - name: Restore the previous version
      unarchive:
        src: "{{ latest_backup.stdout }}"
        dest: "{{ app_deploy_path }}"
        remote_src: yes
    - name: Start the application service
      systemd:
        name: "{{ app_service }}"
        state: started
    - name: Health check
      uri:
        url: "http://{{ inventory_hostname }}:{{ app_port }}/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 5
      delay: 3
    - name: Rollback notice
      debug:
        msg: "{{ inventory_hostname }} was rolled back to the previous version"
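The "find the newest backup" step of the rollback can also be expressed without parsing `ls` output. The sketch below assumes backups are named `<hostname>_<timestamp>.tar.gz`, as in the rolling-update playbook, and picks the most recently modified one; the function name `latest_backup` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_backup(backup_dir: Path, hostname: str) -> Optional[Path]:
    """Return the most recently modified backup archive for hostname, if any."""
    candidates = list(backup_dir.glob(f"{hostname}_*.tar.gz"))
    if not candidates:
        return None  # the caller should abort the rollback in this case
    # Modification time decides which archive is newest.
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

Filtering by host name matters when several servers write their archives to a shared backup directory; otherwise a rollback could restore another host's files.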
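V. Best Practices and Common Problems
5.1 Coding Conventions
1. Playbook conventions
yaml
# ✅ Recommended: a descriptive name and a valid, declarative state
- name: Ensure the Nginx service is running and enabled at boot
  systemd:
    name: nginx
    state: started
    enabled: yes
# ❌ Avoid: a vague name (also, "start" is not a valid state; use "started")
- name: start nginx
  systemd:
    name: nginx
    state: start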
2. Variable naming conventions
yaml
# Use descriptive variable names
vars:
  # ✅ Good names
  nginx_worker_processes: 4
  app_database_host: db.internal
  # ❌ Avoid
  var1: 4
  host: db.internal
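3. Task comment conventions
yaml
tasks:
  # ✅ A clear explanation of the task
  # Install and configure the Nginx reverse proxy, listening on port 80
  - name: Install Nginx
    apt:
      name: nginx
      state: present
  # ❌ Too terse, or no comment at all
  - name: install nginx
    apt:
      name: nginx
      state: present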
5.2 Performance Tuning
1. Enable SSH connection reuse
ini
# ansible.cfg
[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
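2. Adjust the number of forks
ini
# ansible.cfg
# Tune forks to your network and CPU capacity.
# Keep comments on their own lines: ansible.cfg treats an inline "#" as part of the value.
[defaults]
forks = 20
# Or on the command line:
#   ansible-playbook site.yml -f 20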
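3. Skip unnecessary fact gathering
yaml
# Turn gathering off in the play
- hosts: all
  gather_facts: no
# Gather only the facts you need with an explicit setup task
# (a list value for filter requires a recent Ansible release)
- hosts: all
  gather_facts: no
  tasks:
    - name: Gather selected facts
      setup:
        filter:
          - ansible_fqdn
          - ansible_distribution
          - ansible_processor_vcpus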
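4. Run long tasks asynchronously
yaml
- name: Start several long-running tasks
  # The job names below are illustrative arguments for the script
  shell: "/opt/long-task.sh {{ item }}"
  async: 3600
  poll: 0
  loop:
    - job-a
    - job-b
  register: long_tasks
- name: Wait for all tasks to finish
  async_status:
    jid: "{{ item.ansible_job_id }}"
  loop: "{{ long_tasks.results }}"
  register: job_results
  until: job_results.finished
  retries: 100
  delay: 30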
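5.3 Security Best Practices
1. Managing sensitive information
yaml
# Create the encrypted file from the shell:
#   ansible-vault create vars/secrets.yml
# Keep the secret values in the vault file and reference them from a plain file
# vars/secrets.yml (encrypted; the values below are placeholders)
---
vault_db_password: "CHANGE_ME"
vault_api_key: "CHANGE_ME"
# vars/public.yml (not encrypted)
---
db_host: db.internal
db_port: 3306
db_password: "{{ vault_db_password }}"
api_key: "{{ vault_api_key }}"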
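2. Least privilege
yaml
# Grant only the permissions that are needed
- name: Copy a configuration file
  copy:
    src: app.conf
    dest: /etc/app.conf
    owner: root
    group: root
    mode: '0644'   # do not open permissions wider than necessary
# Avoid running as root in production where possible
- name: Deploy the application
  hosts: webservers
  become: yes
  become_user: appuser   # use a dedicated user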
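3. SSH security settings
ini
# SSH parameters in the inventory
[all:vars]
# accept-new (OpenSSH 7.6+) records unknown host keys on first contact but rejects changed keys.
# StrictHostKeyChecking=no with UserKnownHostsFile=/dev/null disables host verification entirely.
ansible_ssh_common_args='-o StrictHostKeyChecking=accept-new'
ansible_ssh_private_key_file=~/.ssh/deploy_key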
5.4 Common Problems and Solutions
Problem 1: SSH connection timeouts
ini
# Solution: raise the timeout and tune the SSH options
# ansible.cfg
[defaults]
timeout = 60
[ssh_connection]
ssh_args = -o ConnectTimeout=30 -o ServerAliveInterval=60
pipelining = True
# Or set it in the inventory
[all:vars]
ansible_ssh_timeout=60
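Problem 2: Permission denied
yaml
# Solution: configure become correctly
- name: Install software
  yum:
    name: nginx
    state: present
  become: yes           # enable privilege escalation
  become_user: root
  become_method: sudo
ini
# Or set it in the inventory
[webservers:vars]
ansible_become=yes
ansible_become_user=root
ansible_become_method=sudo
# If sudo needs a password, pass --ask-become-pass to ansible-playbook
# (it is a command-line option, not a value for ansible_become_flags)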
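Problem 3: Idempotency
yaml
# Problem: some commands report "changed" on every run
- name: Create a directory
  command: mkdir -p /path/to/dir
  args:
    creates: /path/to/dir   # skip the command when the path already exists
# Or use a purpose-built module
- name: Create a directory
  file:
    path: /path/to/dir
    state: directory        # idempotent by design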
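What "idempotent" means for a task like this can be shown with a small Python function that checks the current state first and reports whether it had to change anything. This is a simplified sketch of the idea, not how the `file` module is implemented; the function name `ensure_directory` is hypothetical.

```python
from pathlib import Path


def ensure_directory(path: Path) -> bool:
    """Make sure path exists as a directory; return True only if it was created."""
    if path.is_dir():
        return False  # desired state already reached, so nothing changes
    path.mkdir(parents=True)
    return True
```

The first call creates the directory and returns True, and every later call returns False. That return value plays the same role as the `changed` status Ansible reports for each task.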
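Problem 4: Undefined template variables
yaml
# Solution: provide a default value
{{ variable | default('default_value') }}
# Strict check for required variables
- name: Validate required variables
  assert:
    that:
      - required_var is defined
      - required_var | length > 0
    fail_msg: "required_var must be defined and must not be empty"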
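Problem 5: Fact caching
ini
# Problem: gathering facts on every run is slow, or cached facts go stale
# Solution: configure fact caching in ansible.cfg (cache lifetime below is 24 hours)
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
bash
# Force a refresh by clearing the fact cache
ansible-playbook site.yml --flush-cache
yaml
# Or gather fresh facts explicitly inside a play
- name: Refresh facts
  setup: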
Problem 6: Managing Windows hosts
bash
# Install the WinRM connector on the control node
pip install pywinrm
ini
# Inventory configuration
[windows]
win01.example.com
[windows:vars]
ansible_user=administrator
ansible_password="{{ vault_windows_password }}"
ansible_connection=winrm
ansible_winrm_server_cert_validation=ignore
ansible_winrm_transport=ntlm
5.5 Debugging Tips
1. Increase output verbosity
bash
# Enable verbose output
ansible-playbook site.yml -v      # verbose
ansible-playbook site.yml -vv     # more verbose
ansible-playbook site.yml -vvv    # includes SSH connection information
ansible-playbook site.yml -vvvv   # includes SSH debugging information
# Check syntax only
ansible-playbook site.yml --syntax-check
# List all tasks without running them
ansible-playbook site.yml --list-tasks
2. Test individual tasks
bash
# Run only specific tasks
ansible-playbook site.yml --tags=install
ansible-playbook site.yml --start-at-task="Install Nginx"
# Dry run (check mode)
ansible-playbook site.yml --check
ansible-playbook site.yml --check --diff   # show file changes
3. Collect debugging information
yaml
- name: Debug a variable
  debug:
    msg: "Variable value: {{ my_variable }}"
    verbosity: 2   # shown only with -vv or higher
- name: Print all facts
  debug:
    var: ansible_facts
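VI. Summary
This article covered Ansible's core concepts, core components, installation and configuration, and hands-on case studies. After working through it, you should be able to do the following:
| Skill | What you should be able to do |
|---|---|
| Core concepts | Explain how the control node, managed nodes, modules, and playbooks work together |
| Inventory | Define inventories fluently in INI and YAML formats |
| Modules | Use the common modules (package, copy, template, service, and so on) |
| Playbooks | Write playbooks that are clearly structured and functionally complete |
| Roles | Use roles to modularize and reuse code |
| Variables | Understand variable precedence and Jinja2 template syntax |
| Practical use | Carry out batch deployments, configuration management, and rolling updates |
| Best practices | Follow coding conventions, performance tuning, and security best practices |
As a core tool of the DevOps era, Ansible can greatly improve operational efficiency and reduce human error. Start with simple scenarios, work up to the advanced features step by step, and build an automation practice that fits your team.
💡 Suggested learning path:
- Get familiar with the basic concepts and inventory configuration first
- Try writing a simple playbook
- Learn how the common modules are used
- Learn to organize code with roles
- Validate in a test environment before applying to production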