【YOLO Series 10】YOLOv8 Explained: Ultralytics' Anchor-Free Overhaul

This is the tenth post in the YOLO series. It takes a deep look at Ultralytics' YOLOv8, covering the anchor-free design, the C2f module, the decoupled detection head, Task-Aligned Learning, and multi-task support.

1. Introduction

In January 2023, Ultralytics released YOLOv8, a major upgrade to the YOLO family. YOLOv8 inherits the engineering strengths of YOLOv5 while introducing an anchor-free design, a more advanced backbone, and a new detection head, making it one of the strongest real-time object detectors of its time.

Project information

  • Author: Glenn Jocher (Ultralytics)
  • Framework: PyTorch
  • Code: https://github.com/ultralytics/ultralytics
  • Highlights: a unified multi-task framework (detection, segmentation, pose estimation, classification)

2. Key Improvements

Feature highlights

  • Multi-task support
  • Unified API
  • Rich export formats

Training improvements

  • TAL label assignment
  • DFL loss
  • VFL loss

Architecture improvements

  • Anchor-free design
  • C2f module
  • Decoupled head

3. Anchor-Free Design

3.1 From Anchor-Based to Anchor-Free

| Aspect | Anchor-Based | Anchor-Free |
|---|---|---|
| Prior boxes | Predefined | None |
| Hyperparameters | Anchor sizes and ratios | No anchor-related hyperparameters |
| Prediction | Offsets relative to anchors | Direct regression |
| Matching complexity | Higher | Lower |

3.2 How YOLOv8 Predicts Boxes

  • YOLOv5 (anchor-based): each feature point predicts offsets (tx, ty, tw, th) relative to a predefined anchor.
  • YOLOv8 (anchor-free): each feature point directly predicts the distances (l, t, r, b) from the point to the four sides of the box.

Each feature point directly predicts its distances to the four box edges:

$$\text{box} = (x - l \cdot s,\ y - t \cdot s,\ x + r \cdot s,\ y + b \cdot s)$$

where $s$ is the stride of the feature map.
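The decoding above can be checked with a tiny sketch (plain Python; the point coordinates, distances, and stride below are made-up illustrative values):

```python
def dist2box(x, y, l, t, r, b, s):
    """Decode (l, t, r, b) distances at feature point (x, y) into an xyxy box.

    x, y are the point's coordinates in input-image pixels; l, t, r, b are
    distances in feature-map (grid) units, so they are scaled by the stride s.
    """
    return (x - l * s, y - t * s, x + r * s, y + b * s)

# A point at (80, 64) with stride 8 and distances (2.0, 1.5, 3.0, 2.5)
print(dist2box(80.0, 64.0, 2.0, 1.5, 3.0, 2.5, 8.0))  # → (64.0, 52.0, 104.0, 84.0)
```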

3.3 DFL (Distribution Focal Loss)

DFL models bounding-box regression as a discrete probability distribution over reg_max bins:

class DFL(nn.Module):
    def __init__(self, c1=16):
        super().__init__()
        self.conv = nn.Conv2d(c1, 1, 1, bias=False).requires_grad_(False)
        x = torch.arange(c1, dtype=torch.float)
        self.conv.weight.data[:] = nn.Parameter(x.view(1, c1, 1, 1))
        self.c1 = c1

    def forward(self, x):
        b, c, a = x.shape  # batch, channels, anchors
        return self.conv(x.view(b, 4, self.c1, a).transpose(2, 1).softmax(1)).view(b, 4, a)

Advantages

  • More flexible representation of box edges
  • Models uncertainty at object boundaries
  • Improves localization accuracy
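At inference the DFL layer above simply takes the expectation of a softmax distribution over the reg_max bins, which the fixed 1×1 convolution implements. A minimal sketch of that expectation (batch size, anchor count, and logits are illustrative):

```python
import torch

reg_max, b, a = 16, 2, 100               # bins per edge, batch, anchor points
logits = torch.randn(b, 4 * reg_max, a)  # raw per-edge distribution logits

# Softmax over the bins, then take the expected bin index for each edge
probs = logits.view(b, 4, reg_max, a).softmax(2)
bins = torch.arange(reg_max, dtype=torch.float).view(1, 1, reg_max, 1)
dist = (probs * bins).sum(2)             # (b, 4, a) expected (l, t, r, b)

assert dist.shape == (b, 4, a)
# Expectations of a distribution over bins 0..15 stay inside that range
assert float(dist.min()) >= 0.0 and float(dist.max()) <= reg_max - 1
```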

4. The C2f Module

4.1 Design Evolution

YOLOv5 used CSP-style C3 modules; YOLOv8 replaces them with C2f, which keeps the CSP idea but exposes more gradient paths.

4.2 C2f Structure

Input → Conv → split into Part1 and Part2; Part2 passes through a chain of Bottlenecks, and every intermediate output is kept → Concat all branches → Conv → output.

4.3 Implementation

class C2f(nn.Module):
    """CSP Bottleneck with 2 convolutions."""
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)
        self.m = nn.ModuleList(
            Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0)
            for _ in range(n)
        )

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

class Bottleneck(nn.Module):
    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
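The channel bookkeeping in C2f's forward pass can be traced with a stripped-down sketch that swaps ultralytics' Conv/Bottleneck for plain nn.Conv2d stand-ins (the channel counts are illustrative):

```python
import torch
import torch.nn as nn

c1, c2, n, e = 64, 64, 2, 0.5
c = int(c2 * e)                                   # hidden channels: 32

cv1 = nn.Conv2d(c1, 2 * c, 1)                     # produces 2 branches of c channels
cv2 = nn.Conv2d((2 + n) * c, c2, 1)               # fuses (2 + n) branches back to c2
blocks = [nn.Conv2d(c, c, 3, padding=1) for _ in range(n)]  # Bottleneck stand-ins

x = torch.randn(1, c1, 32, 32)
y = list(cv1(x).chunk(2, 1))                      # [Part1, Part2], each (1, c, 32, 32)
for m in blocks:
    y.append(m(y[-1]))                            # keep every intermediate output
out = cv2(torch.cat(y, 1))                        # concat (2 + n) * c = 128 channels

assert torch.cat(y, 1).shape[1] == (2 + n) * c
assert out.shape == (1, c2, 32, 32)
```

This makes the "(2 + n) branches" arithmetic behind `cv2`'s input width concrete.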

4.4 C2f vs. C3

| Aspect | C3 | C2f |
|---|---|---|
| Branches | 2 | 2 + n |
| Feature fusion | Single concat | Multi-level concat |
| Gradient flow | Fewer paths | More paths |
| Parameters | Slightly more | Slightly fewer |

5. Decoupled Detection Head

5.1 Structure

At each scale, the input feature map feeds two parallel branches:

  • Classification branch: Conv 3×3 → Conv 3×3 → 1×1 conv producing nc class logits
  • Regression branch: Conv 3×3 → Conv 3×3 → 1×1 conv producing 4 × reg_max box distribution logits

5.2 Implementation

class Detect(nn.Module):
    """YOLOv8 Detect head."""
    def __init__(self, nc=80, ch=()):
        super().__init__()
        self.nc = nc  # number of classes
        self.nl = len(ch)  # number of detection layers
        self.reg_max = 16  # DFL channels
        self.no = nc + self.reg_max * 4  # number of outputs per anchor
        self.stride = torch.zeros(self.nl)

        c2, c3 = max((16, ch[0] // 4, self.reg_max * 4)), max(ch[0], self.nc)
        self.cv2 = nn.ModuleList(
            nn.Sequential(Conv(x, c2, 3), Conv(c2, c2, 3),
                         nn.Conv2d(c2, 4 * self.reg_max, 1))
            for x in ch
        )
        self.cv3 = nn.ModuleList(
            nn.Sequential(Conv(x, c3, 3), Conv(c3, c3, 3),
                         nn.Conv2d(c3, self.nc, 1))
            for x in ch
        )
        self.dfl = DFL(self.reg_max)

    def forward(self, x):
        for i in range(self.nl):
            x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
        return x

5.3 Output Format

Output channels per scale:

$$\text{channels} = 4 \times \text{reg\_max} + \text{num\_classes}$$

Default: $4 \times 16 + 80 = 144$.
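As a sanity check, this matches `self.no` in the Detect head above with the default settings:

```python
nc, reg_max = 80, 16          # COCO classes, DFL bins (Detect defaults)
no = 4 * reg_max + nc         # outputs per anchor location
assert no == 144
```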

6. TAL (Task-Aligned Learning)

6.1 Task-Aligned Assignment

YOLOv8 assigns labels using the task-aligned metric:

$$t = s^\alpha \cdot u^\beta$$

where:

  • $s$: classification score
  • $u$: IoU between the predicted box and the ground truth
  • $\alpha, \beta$: weighting hyperparameters
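A quick numeric sketch of the metric with the defaults used below (α = 1.0, β = 6.0) shows how the large β makes IoU dominate the ranking; the candidate scores are made up:

```python
import torch

alpha, beta = 1.0, 6.0
s = torch.tensor([0.9, 0.6, 0.3])   # classification scores of 3 candidate anchors
u = torch.tensor([0.5, 0.8, 0.9])   # their IoU with the ground-truth box

t = s.pow(alpha) * u.pow(beta)      # task-aligned metric per candidate
# The best-localized candidate wins despite the lowest classification score
assert t.argmax().item() == 2
```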

6.2 Implementation

class TaskAlignedAssigner:
    def __init__(self, topk=13, num_classes=80, alpha=1.0, beta=6.0):
        self.topk = topk
        self.num_classes = num_classes
        self.alpha = alpha
        self.beta = beta

    @torch.no_grad()
    def forward(self, pd_scores, pd_bboxes, anc_points, gt_labels, gt_bboxes):
        # IoU between predicted and ground-truth boxes
        overlaps = bbox_iou(pd_bboxes, gt_bboxes)

        # Task-aligned metric: t = s^alpha * u^beta
        align_metric = pd_scores.pow(self.alpha) * overlaps.pow(self.beta)

        # Keep the top-k candidates per ground-truth box
        topk_metrics, topk_idxs = align_metric.topk(self.topk, dim=-1)

        # Dynamically select positive samples
        # ...

        return target_labels, target_bboxes, target_scores

6.3 Why TAL Helps

| Aspect | Traditional assignment | TAL |
|---|---|---|
| Criterion | IoU only | Classification + IoU |
| Adaptivity | Fixed threshold | Dynamic selection |
| Train-time consistency | Tasks treated separately | Tasks aligned |

7. Loss Functions

7.1 Total Loss

$$\mathcal{L} = \lambda_{box}\mathcal{L}_{box} + \lambda_{cls}\mathcal{L}_{cls} + \lambda_{dfl}\mathcal{L}_{dfl}$$

7.2 Classification Loss: BCE with Logits

$$\mathcal{L}_{cls} = \text{BCE}(\hat{p}, p)$$

Alternatively, Varifocal Loss can be used:

$$\text{VFL}(p, q) = \begin{cases} -q\,(q\log p + (1-q)\log(1-p)) & q > 0 \\ -\alpha p^\gamma \log(1-p) & q = 0 \end{cases}$$
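A direct transcription of the piecewise definition as a minimal sketch; the defaults α = 0.75 and γ = 2.0 follow the VarifocalNet paper, and the sample p/q values are made up:

```python
import torch

def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """Per-element Varifocal Loss.

    p: predicted score in (0, 1); q: IoU-aware target (q > 0 for positives,
    q == 0 for negatives).
    """
    pos = -q * (q * torch.log(p) + (1 - q) * torch.log(1 - p))  # BCE weighted by q
    neg = -alpha * p.pow(gamma) * torch.log(1 - p)              # focal down-weighting
    return torch.where(q > 0, pos, neg)

p = torch.tensor([0.8, 0.8])
q = torch.tensor([0.9, 0.0])        # one positive (IoU 0.9), one negative
loss = varifocal_loss(p, q)
assert (loss >= 0).all()
assert loss[1] < -torch.log(1 - p[1])  # negatives weigh less than plain BCE
```

The asymmetry is the point: positives are weighted up by their IoU target q, while confident-but-wrong negatives are damped by the focal factor p^γ.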

7.3 Box Loss: CIoU + DFL

$$\mathcal{L}_{box} = \mathcal{L}_{CIoU} + \mathcal{L}_{DFL}$$

def bbox_loss(self, pred_dist, pred_bboxes, anchor_points, target_bboxes, target_scores, fg_mask):
    target_scores_sum = max(target_scores.sum(), 1)

    # CIoU loss, weighted by the target scores of foreground anchors
    iou = bbox_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], CIoU=True)
    loss_iou = ((1.0 - iou) * target_scores[fg_mask]).sum() / target_scores_sum

    # DFL loss on the discretized (l, t, r, b) targets
    target_ltrb = bbox2dist(anchor_points, target_bboxes, self.reg_max)
    loss_dfl = self._df_loss(pred_dist[fg_mask], target_ltrb[fg_mask])

    return loss_iou, loss_dfl

8. Network Architecture

8.1 Overall Structure

# YOLOv8n
backbone:
  - [-1, 1, Conv, [64, 3, 2]]      # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]     # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]     # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]     # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]]    # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]       # 9

head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]
  - [-1, 3, C2f, [512]]            # 12

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]
  - [-1, 3, C2f, [256]]            # 15 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]
  - [-1, 3, C2f, [512]]            # 18 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]]
  - [-1, 3, C2f, [1024]]           # 21 (P5/32-large)

  - [[15, 18, 21], 1, Detect, [nc]]

8.2 Model Variants

| Model | Depth | Width | Params | FLOPs | mAP |
|---|---|---|---|---|---|
| YOLOv8n | 0.33 | 0.25 | 3.2M | 8.7G | 37.3 |
| YOLOv8s | 0.33 | 0.50 | 11.2M | 28.6G | 44.9 |
| YOLOv8m | 0.67 | 0.75 | 25.9M | 78.9G | 50.2 |
| YOLOv8l | 1.00 | 1.00 | 43.7M | 165.2G | 52.9 |
| YOLOv8x | 1.00 | 1.25 | 68.2M | 257.8G | 53.9 |

9. Multi-Task Support

9.1 Unified API

from ultralytics import YOLO

# Object detection
model = YOLO('yolov8n.pt')
results = model('image.jpg')

# Instance segmentation
model = YOLO('yolov8n-seg.pt')
results = model('image.jpg')

# Pose estimation
model = YOLO('yolov8n-pose.pt')
results = model('image.jpg')

# Image classification
model = YOLO('yolov8n-cls.pt')
results = model('image.jpg')

# OBB (oriented bounding box) detection
model = YOLO('yolov8n-obb.pt')
results = model('image.jpg')

9.2 Segmentation Head

class Segment(Detect):
    def __init__(self, nc=80, nm=32, npr=256, ch=()):
        super().__init__(nc, ch)
        self.nm = nm  # number of masks
        self.npr = npr  # number of protos
        self.proto = Proto(ch[0], self.npr, self.nm)
        self.cv4 = nn.ModuleList(
            nn.Sequential(Conv(x, self.npr, 3), Conv(self.npr, self.npr, 3),
                         nn.Conv2d(self.npr, self.nm, 1))
            for x in ch
        )

9.3 Pose Estimation Head

class Pose(Detect):
    def __init__(self, nc=80, kpt_shape=(17, 3), ch=()):
        super().__init__(nc, ch)
        self.kpt_shape = kpt_shape
        self.nk = kpt_shape[0] * kpt_shape[1]
        self.cv4 = nn.ModuleList(
            nn.Sequential(Conv(x, x, 3), Conv(x, x, 3),
                         nn.Conv2d(x, self.nk, 1))
            for x in ch
        )

10. Training and Inference

10.1 Training

# Train from the CLI
yolo detect train data=coco.yaml model=yolov8n.pt epochs=100 imgsz=640

# Train from Python
from ultralytics import YOLO
model = YOLO('yolov8n.pt')
model.train(data='coco.yaml', epochs=100, imgsz=640)

10.2 Inference

# Run inference
results = model('image.jpg')

# Parse results
for result in results:
    boxes = result.boxes.xyxy     # bounding boxes
    masks = result.masks          # segmentation masks (seg models)
    keypoints = result.keypoints  # keypoints (pose models)
    probs = result.probs          # class probabilities (cls models)

10.3 Export Formats

yolo export model=yolov8n.pt format=onnx  # ONNX
yolo export model=yolov8n.pt format=engine  # TensorRT
yolo export model=yolov8n.pt format=coreml  # CoreML
yolo export model=yolov8n.pt format=tflite  # TFLite

11. Results

11.1 COCO val2017

| Model | mAP@50 | mAP@50-95 | Latency (T4, TensorRT) |
|---|---|---|---|
| YOLOv8n | 52.6 | 37.3 | 0.99 ms |
| YOLOv8s | 61.8 | 44.9 | 1.20 ms |
| YOLOv8m | 67.2 | 50.2 | 1.83 ms |
| YOLOv8l | 69.8 | 52.9 | 2.39 ms |
| YOLOv8x | 71.0 | 53.9 | 3.53 ms |

11.2 Comparison with Other Models

| Model | mAP | Params | Latency |
|---|---|---|---|
| YOLOv5s | 37.4 | 7.2M | 1.0 ms |
| YOLOv7-tiny | 37.4 | 6.2M | 1.1 ms |
| YOLOv8n | 37.3 | 3.2M | 0.99 ms |

12. Summary

YOLOv8's key features:

| Improvement | Description |
|---|---|
| Anchor-free | Simpler design, fewer hyperparameters |
| C2f module | Richer gradient flow |
| Decoupled head | Separate classification and regression branches |
| TAL | Task-aligned label assignment |
| DFL | Distribution-based box regression |
| Multi-task | Unified detection / segmentation / pose / classification |

YOLOv8 remains one of the most popular real-time detection frameworks, combining strong performance with ease of use.


References

  • Official repository: https://github.com/ultralytics/ultralytics
  • Official documentation: https://docs.ultralytics.com

Previous: 【YOLO Series 09】YOLOv7 Explained: E-ELAN and Auxiliary Training Heads
Next: 【YOLO Series 11】YOLOv9 Explained: GELAN and Programmable Gradient Information
