【YOLO Series 10】YOLOv8 Explained: Ultralytics' Anchor-Free Redesign
This is the tenth post in the YOLO series. It takes a close look at Ultralytics' YOLOv8, covering the Anchor-Free design, the C2f module, the decoupled detection head, Task-Aligned Learning, and multi-task support.
1. Introduction
In January 2023, Ultralytics released YOLOv8, a major upgrade of the YOLO family. YOLOv8 inherits the engineering strengths of YOLOv5 while adding an Anchor-Free design and more advanced backbone and detection-head components, making it one of the strongest real-time object detectors of its time.
Project information:
- Author: Glenn Jocher (Ultralytics)
- Framework: PyTorch
- Code: https://github.com/ultralytics/ultralytics
- Highlight: unified multi-task framework (detection, segmentation, pose estimation, classification)
2. Core Improvements
YOLOv8's main changes fall into five areas: the Anchor-Free prediction scheme, the C2f feature-fusion module, a decoupled detection head, TAL-based label assignment with the DFL loss, and a unified multi-task framework. The following sections cover each in turn.
3. Anchor-Free Design
3.1 From Anchor-Based to Anchor-Free
| Aspect | Anchor-Based | Anchor-Free |
|---|---|---|
| Prior boxes | Predefined | None |
| Hyperparameters | Anchor sizes and ratios | No anchor-related hyperparameters |
| Prediction | Offsets relative to anchors | Direct regression |
| Matching complexity | Higher | Lower |
3.2 YOLOv8's Prediction Scheme
Each feature-map location directly predicts the distances from that point to the four sides of the bounding box:
$$\text{box} = (x - l \cdot s,\ y - t \cdot s,\ x + r \cdot s,\ y + b \cdot s)$$
where $s$ is the feature-map stride, $(x, y)$ is the feature-point location in image coordinates, and $(l, t, r, b)$ are the predicted distances to the left, top, right, and bottom edges in stride units.
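The decoding step can be sketched in plain Python (`decode_box` is an illustrative helper, not an Ultralytics API):

```python
def decode_box(x, y, ltrb, stride):
    """Convert per-point (l, t, r, b) distances into an (x1, y1, x2, y2) box.

    x, y: feature-point center in image coordinates;
    ltrb: distances predicted in feature-map (stride) units.
    """
    l, t, r, b = ltrb
    return (x - l * stride, y - t * stride, x + r * stride, y + b * stride)

# A point at (100, 100) on the stride-8 level, predicting 2 cells in each direction:
print(decode_box(100, 100, (2.0, 2.0, 2.0, 2.0), 8))  # (84.0, 84.0, 116.0, 116.0)
```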
3.3 DFL (Distribution Focal Loss)
Box regression is modeled as a discrete distribution over reg_max bins; decoding takes the expectation of that distribution:

```python
import torch
import torch.nn as nn


class DFL(nn.Module):
    """Integral module of DFL: decodes a bin distribution into its expected value."""

    def __init__(self, c1=16):
        super().__init__()
        # Fixed (non-trainable) 1x1 conv whose weights are the bin indices 0..c1-1,
        # so the conv computes the expectation of a softmax-normalized distribution.
        self.conv = nn.Conv2d(c1, 1, 1, bias=False).requires_grad_(False)
        x = torch.arange(c1, dtype=torch.float)
        self.conv.weight.data[:] = nn.Parameter(x.view(1, c1, 1, 1))
        self.c1 = c1

    def forward(self, x):
        b, c, a = x.shape  # batch, channels (4 * c1), anchors
        # softmax over the bin dimension, then expectation -> 4 distances per anchor
        return self.conv(x.view(b, 4, self.c1, a).transpose(2, 1).softmax(1)).view(b, 4, a)
```
Advantages:
- More flexible boundary representation
- Models boundary uncertainty explicitly
- Improves localization accuracy
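The core idea, an expectation over softmax-normalized bins, can be shown without any framework (a toy sketch with 8 bins instead of 16):

```python
import math

def dfl_decode(logits):
    """Expected distance from a discrete distribution over bins 0..n-1."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sum(i * p for i, p in enumerate(probs))

# Probability mass concentrated between bins 3 and 4 -> distance near 3.5,
# a sub-bin value no hard argmax could produce.
logits = [0, 0, 0, 5, 5, 0, 0, 0]
print(round(dfl_decode(logits), 2))  # 3.5
```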
4. C2f Module
4.1 Design Evolution
C2f replaces the C3 module used in YOLOv5. Both follow the CSP (Cross Stage Partial) idea of splitting features into a processed branch and a bypass branch, but C2f additionally feeds every intermediate Bottleneck output into the final fusion.
4.2 C2f Structure
The input is projected by a 1×1 conv and split into two halves; one half passes through n chained Bottleneck blocks, and the two halves plus every intermediate output are concatenated before a final 1×1 conv.
4.3 Implementation
```python
class C2f(nn.Module):
    """CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)     # Conv = Conv2d + BN + SiLU
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # fuses split halves + n outputs
        self.m = nn.ModuleList(
            Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0)
            for _ in range(n)
        )

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, 1))   # split into two c-channel halves
        y.extend(m(y[-1]) for m in self.m)  # each Bottleneck feeds the next
        return self.cv2(torch.cat(y, 1))    # concat all branches, then 1x1 conv


class Bottleneck(nn.Module):
    """Standard bottleneck with optional residual connection."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
```
4.4 C2f vs C3
| Aspect | C3 | C2f |
|---|---|---|
| Branches | 2 | 2 + n |
| Feature fusion | Single concat | Multi-level concat |
| Gradient flow | Fewer paths | More paths |
| Parameters | Slightly more | Slightly fewer |
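The channel bookkeeping in C2f's forward pass can be traced in plain Python (a sketch of the tensor shapes only, no actual convolutions):

```python
def c2f_channels(c1, c2, n, e=0.5):
    """Trace the channel counts flowing through a C2f block."""
    c = int(c2 * e)       # hidden channels per branch
    after_cv1 = 2 * c     # cv1 expands to 2c, then chunk(2, 1) -> [c, c]
    branches = 2 + n      # split halves + one output per Bottleneck
    concat = branches * c # channels entering cv2
    return after_cv1, concat, c2  # cv2 projects back to c2

# e.g. a C2f(64, 64, n=2): 64 ch after cv1, 128 ch concatenated, 64 ch out
print(c2f_channels(64, 64, 2))  # (64, 128, 64)
```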
5. Decoupled Detection Head
5.1 Structure
Each detection scale has two parallel convolutional branches: a regression branch (cv2 below) that outputs the 4 × reg_max DFL box distribution, and a classification branch (cv3) that outputs the nc class logits.
5.2 Implementation
```python
class Detect(nn.Module):
    """YOLOv8 Detect head."""

    def __init__(self, nc=80, ch=()):
        super().__init__()
        self.nc = nc                     # number of classes
        self.nl = len(ch)                # number of detection layers
        self.reg_max = 16                # DFL bins per box side
        self.no = nc + self.reg_max * 4  # outputs per anchor
        self.stride = torch.zeros(self.nl)
        c2, c3 = max((16, ch[0] // 4, self.reg_max * 4)), max(ch[0], self.nc)
        self.cv2 = nn.ModuleList(        # regression branch: 4 * reg_max DFL bins
            nn.Sequential(Conv(x, c2, 3), Conv(c2, c2, 3),
                          nn.Conv2d(c2, 4 * self.reg_max, 1))
            for x in ch
        )
        self.cv3 = nn.ModuleList(        # classification branch: nc logits
            nn.Sequential(Conv(x, c3, 3), Conv(c3, c3, 3),
                          nn.Conv2d(c3, self.nc, 1))
            for x in ch
        )
        self.dfl = DFL(self.reg_max)

    def forward(self, x):
        # Training path: concatenate box and class maps per scale. At inference
        # the raw maps are further decoded with DFL and anchor points (omitted).
        for i in range(self.nl):
            x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
        return x
```
5.3 Output Format
Output channels per scale:
$$\text{channels} = 4 \times \text{reg\_max} + \text{num\_classes}$$
By default: $4 \times 16 + 80 = 144$.
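The channel layout can be checked with a small helper (`split_head_channels` is illustrative, not part of the Ultralytics API):

```python
def split_head_channels(num_outputs, reg_max=16, nc=80):
    """Return (box, cls) channel counts for one Detect output map."""
    box = 4 * reg_max        # 4 sides x reg_max DFL bins
    cls = num_outputs - box  # remaining channels are class logits
    assert cls == nc, "channel layout mismatch"
    return box, cls

print(split_head_channels(144))  # (64, 80)
```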
6. TAL (Task-Aligned Learning)
6.1 Task-Aligned Assignment
YOLOv8 assigns labels with TAL, scoring each prediction/ground-truth pair with an alignment metric:
$$t = s^\alpha \cdot u^\beta$$
where:
- $s$: classification score for the ground-truth class
- $u$: IoU between the predicted and ground-truth boxes
- $\alpha, \beta$: weighting hyperparameters (1.0 and 6.0 by default)
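Plugging in example numbers (illustrative values, default $\alpha=1$, $\beta=6$) shows why the metric rewards predictions that are good at both tasks:

```python
def align_metric(s, u, alpha=1.0, beta=6.0):
    """Task-alignment score: high only when classification AND IoU are both good."""
    return (s ** alpha) * (u ** beta)

# Same classification confidence, different localization quality:
print(round(align_metric(0.9, 0.5), 4))  # 0.0141 (well classified, poorly localized)
print(round(align_metric(0.9, 0.9), 4))  # 0.4783 (well classified, well localized)
```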
6.2 Implementation

```python
class TaskAlignedAssigner:
    def __init__(self, topk=13, num_classes=80, alpha=1.0, beta=6.0):
        self.topk = topk
        self.num_classes = num_classes
        self.alpha = alpha
        self.beta = beta

    @torch.no_grad()
    def forward(self, pd_scores, pd_bboxes, anc_points, gt_labels, gt_bboxes):
        # overlaps: pairwise IoU between pd_bboxes and gt_bboxes (computation omitted)
        # Alignment metric combining classification score and localization quality
        align_metric = pd_scores.pow(self.alpha) * overlaps.pow(self.beta)
        # Keep the top-k candidates per ground-truth box
        topk_metrics, topk_idxs = align_metric.topk(self.topk, dim=-1)
        # Dynamically select positives and resolve multi-GT conflicts
        # ...
        return target_labels, target_bboxes, target_scores
```
6.3 Advantages of TAL
| Aspect | Traditional assignment | TAL |
|---|---|---|
| Criteria | IoU only | Classification + IoU |
| Adaptivity | Fixed thresholds | Dynamic selection |
| Training consistency | Tasks treated separately | Tasks aligned |
7. Loss Functions
7.1 Total Loss
$$\mathcal{L} = \lambda_{box}\mathcal{L}_{box} + \lambda_{cls}\mathcal{L}_{cls} + \lambda_{dfl}\mathcal{L}_{dfl}$$
7.2 Classification Loss: BCE with Logits
$$\mathcal{L}_{cls} = \text{BCE}(\hat{p}, p)$$
Varifocal Loss (VFL) can be used instead:
$$\text{VFL}(p, q) = \begin{cases} -q\left(q\log p + (1-q)\log(1-p)\right) & q > 0 \\ -\alpha p^\gamma \log(1-p) & q = 0 \end{cases}$$
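The piecewise definition above can be sketched in plain Python. The defaults α = 0.75, γ = 2.0 are the common VFL settings, an assumption here rather than values fixed by this post:

```python
import math

def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """Varifocal loss for one prediction p given an IoU-aware target q."""
    if q > 0:
        # Positive sample: target-weighted BCE toward the soft label q
        return -q * (q * math.log(p) + (1 - q) * math.log(1 - p))
    # Negative sample: focal down-weighting of easy negatives
    return -alpha * (p ** gamma) * math.log(1 - p)

print(round(varifocal_loss(0.8, 0.9), 4))  # positive branch: 0.3256
print(round(varifocal_loss(0.8, 0.0), 4))  # negative branch: 0.7725
```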
7.3 Box Loss: CIoU + DFL
$$\mathcal{L}_{box} = \mathcal{L}_{CIoU} + \mathcal{L}_{DFL}$$

```python
class BboxLoss(nn.Module):
    # Simplified excerpt of the forward pass
    def forward(self, pred_dist, pred_bboxes, anchor_points, target_bboxes,
                target_scores, target_scores_sum, fg_mask):
        # CIoU loss on positive anchors, weighted by their alignment scores
        iou = bbox_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], CIoU=True)
        loss_iou = ((1.0 - iou) * target_scores[fg_mask]).sum() / target_scores_sum
        # DFL loss: convert target boxes to ltrb distances in bin units
        target_ltrb = bbox2dist(anchor_points, target_bboxes, self.reg_max)
        loss_dfl = self._df_loss(pred_dist[fg_mask], target_ltrb[fg_mask])
        return loss_iou, loss_dfl
```
8. Network Architecture
8.1 Overall Structure

```yaml
# YOLOv8n
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]    # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]   # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]   # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]   # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]     # 9

head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]    # cat backbone P4
  - [-1, 3, C2f, [512]]          # 12
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]    # cat backbone P3
  - [-1, 3, C2f, [256]]          # 15 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]   # cat head P4
  - [-1, 3, C2f, [512]]          # 18 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]]    # cat head P5
  - [-1, 3, C2f, [1024]]         # 21 (P5/32-large)
  - [[15, 18, 21], 1, Detect, [nc]]
```
8.2 Model Variants
| Model | Depth | Width | Params | FLOPs | mAP |
|---|---|---|---|---|---|
| YOLOv8n | 0.33 | 0.25 | 3.2M | 8.7G | 37.3 |
| YOLOv8s | 0.33 | 0.50 | 11.2M | 28.6G | 44.9 |
| YOLOv8m | 0.67 | 0.75 | 25.9M | 78.9G | 50.2 |
| YOLOv8l | 1.00 | 1.00 | 43.7M | 165.2G | 52.9 |
| YOLOv8x | 1.00 | 1.25 | 68.2M | 257.8G | 53.9 |
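The depth and width columns scale a single base YAML into all five variants. A sketch of the standard scaling rule (the round-to-multiple-of-8 behavior and the 1024 max-channel cap are assumptions based on common YOLO scaling practice):

```python
import math

def scale_repeats(n, depth):
    """Scale a block's repeat count by the depth multiplier (minimum 1)."""
    return max(round(n * depth), 1) if n > 1 else n

def scale_channels(c, width, max_channels=1024, divisor=8):
    """Scale channels by the width multiplier, rounded up to a multiple of 8."""
    c = min(c, max_channels) * width
    return int(math.ceil(c / divisor) * divisor)

# YOLOv8n (depth 0.33, width 0.25): a 6-repeat C2f becomes 2 repeats,
# and the 1024-channel P5 stage shrinks to 256 channels.
print(scale_repeats(6, 0.33), scale_channels(1024, 0.25))  # 2 256
```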
9. Multi-Task Support
9.1 Unified API

```python
from ultralytics import YOLO

# Object detection
model = YOLO('yolov8n.pt')
results = model('image.jpg')

# Instance segmentation
model = YOLO('yolov8n-seg.pt')
results = model('image.jpg')

# Pose estimation
model = YOLO('yolov8n-pose.pt')
results = model('image.jpg')

# Image classification
model = YOLO('yolov8n-cls.pt')
results = model('image.jpg')

# Oriented bounding box (OBB) detection
model = YOLO('yolov8n-obb.pt')
results = model('image.jpg')
```
9.2 Segmentation Head

```python
class Segment(Detect):
    """YOLOv8 Segment head: Detect plus mask-coefficient prediction."""

    def __init__(self, nc=80, nm=32, npr=256, ch=()):
        super().__init__(nc, ch)
        self.nm = nm    # number of masks
        self.npr = npr  # number of protos
        self.proto = Proto(ch[0], self.npr, self.nm)  # prototype masks from P3
        self.cv4 = nn.ModuleList(  # per-scale mask-coefficient branch
            nn.Sequential(Conv(x, self.npr, 3), Conv(self.npr, self.npr, 3),
                          nn.Conv2d(self.npr, self.nm, 1))
            for x in ch
        )
```
9.3 Pose Estimation Head

```python
class Pose(Detect):
    """YOLOv8 Pose head: Detect plus keypoint regression."""

    def __init__(self, nc=80, kpt_shape=(17, 3), ch=()):
        super().__init__(nc, ch)
        self.kpt_shape = kpt_shape  # (keypoints, dims per keypoint: x, y, visibility)
        self.nk = kpt_shape[0] * kpt_shape[1]  # total keypoint outputs per anchor
        self.cv4 = nn.ModuleList(  # per-scale keypoint branch
            nn.Sequential(Conv(x, x, 3), Conv(x, x, 3),
                          nn.Conv2d(x, self.nk, 1))
            for x in ch
        )
```
10. Training and Inference
10.1 Training

```bash
# CLI training
yolo detect train data=coco.yaml model=yolov8n.pt epochs=100 imgsz=640
```

```python
# Python training
from ultralytics import YOLO

model = YOLO('yolov8n.pt')  # start from pretrained weights
model.train(data='coco.yaml', epochs=100, imgsz=640)
```
10.2 Inference

```python
# Inference
results = model('image.jpg')

# Parse results
for result in results:
    boxes = result.boxes.xyxy     # bounding boxes (x1, y1, x2, y2)
    masks = result.masks          # segmentation masks (None for plain detection)
    keypoints = result.keypoints  # pose keypoints (None for plain detection)
    probs = result.probs          # classification probabilities (None for detection)
```
10.3 Export Formats

```bash
yolo export model=yolov8n.pt format=onnx    # ONNX
yolo export model=yolov8n.pt format=engine  # TensorRT
yolo export model=yolov8n.pt format=coreml  # CoreML
yolo export model=yolov8n.pt format=tflite  # TFLite
```
11. Results
11.1 COCO val2017
| Model | mAP@50 | mAP@50-95 | Latency (A100 TensorRT) |
|---|---|---|---|
| YOLOv8n | 52.6 | 37.3 | 0.99 ms |
| YOLOv8s | 61.8 | 44.9 | 1.20 ms |
| YOLOv8m | 67.2 | 50.2 | 1.83 ms |
| YOLOv8l | 69.8 | 52.9 | 2.39 ms |
| YOLOv8x | 71.0 | 53.9 | 3.53 ms |
11.2 Comparison with Other Models
| Model | mAP | Params | Latency |
|---|---|---|---|
| YOLOv5s | 37.4 | 7.2M | 1.0 ms |
| YOLOv7-tiny | 37.4 | 6.2M | 1.1 ms |
| YOLOv8n | 37.3 | 3.2M | 0.99 ms |
12. Conclusion
Key improvements in YOLOv8:
| Improvement | Description |
|---|---|
| Anchor-Free | Simpler pipeline, fewer hyperparameters |
| C2f module | Richer gradient flow |
| Decoupled head | Separate classification and regression branches |
| TAL | Task-aligned label assignment |
| DFL | Distribution-based box regression |
| Multi-task | Unified detection / segmentation / pose / classification |
YOLOv8 remains one of the most popular real-time object detection frameworks, combining strong performance with ease of use.
References
- Official repository: https://github.com/ultralytics/ultralytics
- Official documentation: https://docs.ultralytics.com
Previous: 【YOLO Series 09】YOLOv7 Explained: E-ELAN and Auxiliary Training Heads
Next: 【YOLO Series 11】YOLOv9 Explained: GELAN and Programmable Gradient Information