YOLO Series 13: YOLO11 Explained - Ultralytics' Latest-Generation Object Detection Model

1. Introduction

YOLO11 is the latest-generation YOLO model, released by Ultralytics in September 2024. As the successor to YOLOv8, it further improves detection accuracy while remaining highly efficient, introduces design innovations such as the C3k2 module and the C2PSA attention mechanism, and brings across-the-board improvements in multi-task support and model deployment.

1.1 Key Features

| Feature | Description |
|---|---|
| C3k2 module | More efficient feature extraction, replacing C2f |
| C2PSA | Feature enhancement with position-sensitive attention |
| Unified multi-task | Detection, segmentation, pose, OBB, and classification |
| Lightweight design | Fewer parameters with higher performance |
| Deployment-friendly | Multiple export formats and edge deployment |

1.2 Performance

Performance on MS COCO:

YOLO11n: 39.5% AP, 2.6M params, 6.5 GFLOPs
YOLO11s: 47.0% AP, 9.4M params, 21.5 GFLOPs
YOLO11m: 51.5% AP, 20.1M params, 68.0 GFLOPs
YOLO11l: 53.4% AP, 25.3M params, 86.9 GFLOPs
YOLO11x: 54.7% AP, 56.9M params, 194.9 GFLOPs

Improvements over YOLOv8:
- YOLO11m vs YOLOv8m: 51.5% vs 50.2% (+1.3% AP)
- About 22% fewer parameters
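The quoted reduction can be checked directly from the medium-model parameter counts given later in this article (25.9M for YOLOv8m vs 20.1M for YOLO11m):

```python
# Parameter counts taken from the YOLOv8m / YOLO11m comparison table in this article
yolov8m_params = 25.9e6
yolo11m_params = 20.1e6

reduction = (yolov8m_params - yolo11m_params) / yolov8m_params
print(f"{reduction:.1%}")  # roughly 22% fewer parameters
```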

2. Architecture Improvements

2.1 Overall Architecture

YOLO11 keeps the YOLO series' backbone-neck-head architecture while optimizing the individual modules:

[Figure: YOLO11 architecture. A 640×640 input flows through the backbone (Conv stem, four C3k2 stages with SPPF at the end, then C2PSA); the neck fuses P3/P4/P5 features through upsample and downsample paths; three detection heads predict at P3, P4, and P5.]

2.2 Architecture Comparison with YOLOv8

| Component | YOLOv8 | YOLO11 | Change |
|---|---|---|---|
| Feature module | C2f | C3k2 | More efficient feature extraction |
| Attention | None | C2PSA | Enhances high-level features |
| Downsampling | Conv(s=2) | Conv(s=2) | Unchanged |
| Head | Decoupled | Decoupled | Lightweight optimization |
| Label assignment | TAL | TAL | Unchanged |

3. The C3k2 Module in Detail

3.1 Module Design

C3k2 is YOLO11's core feature-extraction module, with a better efficiency-accuracy trade-off than C2f:

[Figure: C2f vs C3k2. C2f: input, split, n Bottleneck blocks, concat, output. C3k2: input, 1×1 conv, split, n C3k blocks, concat, 1×1 conv, output. Both follow the same split-transform-concat pattern; C3k2 swaps the plain Bottlenecks for C3k blocks.]

3.2 C3k2 Implementation

import torch
import torch.nn as nn
# Conv and Bottleneck are the standard Ultralytics building blocks
from ultralytics.nn.modules import Conv, Bottleneck


class C3k2(nn.Module):
    """C3k2 module: stacks customizable convolutional blocks."""
    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, shortcut=True):
        """
        Args:
            c1: input channels
            c2: output channels
            n: number of stacked blocks
            c3k: whether to use C3k blocks (3x3 kernels)
            e: channel expansion ratio
            shortcut: whether to use residual connections
        """
        super().__init__()
        self.c = int(c2 * e)  # hidden channels

        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)

        # choose the block type based on c3k
        if c3k:
            self.m = nn.ModuleList(
                C3k(self.c, self.c, 2, shortcut) for _ in range(n)
            )
        else:
            self.m = nn.ModuleList(
                Bottleneck(self.c, self.c, shortcut, e=1.0) for _ in range(n)
            )

    def forward(self, x):
        # initial conv, then split the channels
        y = list(self.cv1(x).chunk(2, 1))

        # run the stacked blocks
        for m in self.m:
            y.append(m(y[-1]))

        # concatenate and project
        return self.cv2(torch.cat(y, 1))


class C3k(nn.Module):
    """C3k block: CSP structure with 3x3 kernels."""
    def __init__(self, c1, c2, n=1, shortcut=True, e=0.5):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)
        self.m = nn.Sequential(
            *[Bottleneck_C3k(c_, c_, shortcut, k=3) for _ in range(n)]
        )

    def forward(self, x):
        return self.cv3(torch.cat([self.m(self.cv1(x)), self.cv2(x)], 1))


class Bottleneck_C3k(nn.Module):
    """Bottleneck built from 3x3 convolutions."""
    def __init__(self, c1, c2, shortcut=True, k=3, e=0.5):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = Conv(c1, c_, k, 1)
        self.cv2 = Conv(c_, c2, k, 1)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
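The channel bookkeeping in `C3k2.forward` can be traced without any framework code. This plain-Python sketch mirrors the list `y` built in the forward pass and confirms why `cv2` expects `(2 + n) * c` input channels:

```python
def c3k2_concat_channels(c2, n, e=0.5):
    """Mirror C3k2.forward: cv1 outputs 2c channels, chunk(2) gives [c, c],
    and each of the n stacked blocks appends another c-channel tensor."""
    c = int(c2 * e)
    y = [c, c]               # after cv1(x).chunk(2, 1)
    for _ in range(n):
        y.append(c)          # each C3k/Bottleneck maps c -> c channels
    return sum(y)            # channels entering cv2

# matches cv2 = Conv((2 + n) * self.c, c2, 1)
assert c3k2_concat_channels(256, 2) == (2 + 2) * int(256 * 0.5)
```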

3.3 Efficiency Analysis

For input channels $C$, hidden channels $C' = C \times e$, and $n$ stacked blocks:

C2f parameter count:
$$P_{C2f} = 2C \cdot C' + (2+n) \cdot C' \cdot C + n \cdot (C'^2 + 9C'^2)$$

C3k2 parameter count:
$$P_{C3k2} = 2C \cdot C' + (2+n) \cdot C' \cdot C + n \cdot (2 \cdot 9C'^2 \cdot e)$$

Under the same configuration, C3k2 typically saves about 15-20% of the parameters.
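Plugging the two estimates above into plain Python makes the comparison concrete. Note these are the article's approximate formulas, not exact layer-by-layer counts, and the exact percentage depends on C, n, and e, so treat the 15-20% figure as a ballpark:

```python
def p_c2f(C, Cp, n):
    # article's estimate: in/out 1x1 convs plus n bottlenecks (1x1 + 3x3 convs)
    return 2 * C * Cp + (2 + n) * Cp * C + n * (Cp**2 + 9 * Cp**2)

def p_c3k2(C, Cp, n, e=0.5):
    # article's estimate: same in/out convs, n blocks of two 3x3 convs at reduced width
    return 2 * C * Cp + (2 + n) * Cp * C + n * (2 * 9 * Cp**2 * e)

C, n = 512, 2
Cp = C // 2
print(p_c2f(C, Cp, n), p_c3k2(C, Cp, n))  # C3k2 comes out smaller
```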

4. The C2PSA Attention Module

4.1 Position-Sensitive Attention

C2PSA (C2 with Position-Sensitive Attention) introduces an attention mechanism into the high-level feature maps:

[Figure: C2PSA. Input features pass through a 1×1 conv and a channel split; the attention branch runs through stacked PSA blocks while the main branch passes through unchanged; the two branches are concatenated and fused by a final 1×1 conv.]

4.2 PSA Block Implementation

import torch
import torch.nn as nn
# Conv is the standard Ultralytics conv-BN-SiLU block
from ultralytics.nn.modules import Conv


class C2PSA(nn.Module):
    """C2 module with position-sensitive attention."""
    def __init__(self, c1, c2, n=1, e=0.5):
        super().__init__()
        self.c = int(c2 * e)
        self.cv1 = Conv(c1, 2 * self.c, 1)
        self.cv2 = Conv(2 * self.c, c2, 1)

        # stacked PSA blocks
        self.m = nn.Sequential(
            *[PSABlock(self.c, attn_ratio=0.5, num_heads=self.c // 64)
              for _ in range(n)]
        )

    def forward(self, x):
        # split channels
        a, b = self.cv1(x).chunk(2, 1)
        # apply PSA to one branch
        b = self.m(b)
        # concatenate and project
        return self.cv2(torch.cat([a, b], 1))


class PSABlock(nn.Module):
    """Position-sensitive attention block."""
    def __init__(self, c, attn_ratio=0.5, num_heads=4):
        super().__init__()
        self.attn = Attention(c, num_heads=num_heads, attn_ratio=attn_ratio)
        self.ffn = nn.Sequential(
            Conv(c, c * 2, 1),
            nn.GELU(),
            Conv(c * 2, c, 1)
        )

    def forward(self, x):
        x = x + self.attn(x)
        x = x + self.ffn(x)
        return x


class Attention(nn.Module):
    """Multi-head self-attention over spatial positions."""
    def __init__(self, dim, num_heads=8, attn_ratio=0.5):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.key_dim = int(self.head_dim * attn_ratio)
        self.scale = self.key_dim ** -0.5

        # QKV projection
        qk_dim = self.key_dim * num_heads
        v_dim = dim
        self.qkv = Conv(dim, qk_dim * 2 + v_dim, 1)

        # output projection
        self.proj = Conv(dim, dim, 1)

        # positional encoding (depthwise conv)
        self.pe = Conv(dim, dim, 3, 1, g=dim)

    def forward(self, x):
        B, C, H, W = x.shape
        N = H * W

        # compute Q, K, V
        qkv = self.qkv(x)
        qk_dim = self.key_dim * self.num_heads
        q, k, v = qkv.split([qk_dim, qk_dim, C], dim=1)

        # reshape to multi-head form: [B, heads, dims, N]
        q = q.view(B, self.num_heads, self.key_dim, N)
        k = k.view(B, self.num_heads, self.key_dim, N)
        v = v.view(B, self.num_heads, self.head_dim, N)

        # attention over spatial positions: [B, heads, N, N]
        attn = (q.transpose(-2, -1) @ k) * self.scale
        attn = attn.softmax(dim=-1)
        x = (v @ attn.transpose(-2, -1)).reshape(B, C, H, W)

        # add positional encoding computed from V (before the attention mix)
        x = x + self.pe(v.reshape(B, C, H, W))

        # output projection
        x = self.proj(x)

        return x

4.3 The Role of the Attention

C2PSA is applied only to the last backbone stage (the P5 features), for three reasons:

  1. High-level features are more abstract and benefit from global information aggregation
  2. Computational efficiency: P5 feature maps are small, so the attention cost stays manageable
  3. Clear accuracy gains: it helps notably with small objects and occluded scenes
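Reason 2 is easy to quantify: self-attention cost grows quadratically with the number of spatial positions, so running it at P3 instead of P5 would be two orders of magnitude more expensive. A rough estimate counting only the two attention matmuls and ignoring the projections (feature sizes for a 640×640 input; the channel width 256 is an illustrative assumption):

```python
def attn_matmul_flops(h, w, d):
    """Rough FLOPs of the two attention matmuls (QK^T and attn @ V)."""
    n = h * w
    return 2 * (2 * n * n * d)   # two N x N x d matmuls, 2 FLOPs per MAC

# For a 640x640 input: P3 is 80x80 (stride 8), P5 is 20x20 (stride 32)
p3 = attn_matmul_flops(80, 80, 256)
p5 = attn_matmul_flops(20, 20, 256)
print(p3 / p5)  # (6400 / 400)^2 = 256x more expensive at P3
```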

5. Network Configuration

5.1 Model Configuration

# Example YOLO11 model configuration
# YOLO11n
nc: 80  # number of classes
scales:
  n: [0.50, 0.25, 1024]  # depth, width, max_channels

backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]           # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]          # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]   # 2
  - [-1, 1, Conv, [256, 3, 2]]          # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]   # 4
  - [-1, 1, Conv, [512, 3, 2]]          # 5-P4/16
  - [-1, 2, C3k2, [512, True]]          # 6
  - [-1, 1, Conv, [1024, 3, 2]]         # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]         # 8
  - [-1, 1, SPPF, [1024, 5]]            # 9
  - [-1, 2, C2PSA, [1024]]              # 10

head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]]           # cat backbone P4
  - [-1, 2, C3k2, [512, False]]         # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]]           # cat backbone P3
  - [-1, 2, C3k2, [256, False]]         # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]]          # cat head P4
  - [-1, 2, C3k2, [512, False]]         # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]]          # cat head P5
  - [-1, 2, C3k2, [1024, True]]         # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]]     # Detect(P3, P4, P5)

5.2 Model Scaling

YOLO11 comes in five sizes, controlled by depth and width multipliers:

| Model | depth | width | max_channels | Params | FLOPs |
|---|---|---|---|---|---|
| YOLO11n | 0.50 | 0.25 | 1024 | 2.6M | 6.5G |
| YOLO11s | 0.50 | 0.50 | 1024 | 9.4M | 21.5G |
| YOLO11m | 0.50 | 1.00 | 512 | 20.1M | 68.0G |
| YOLO11l | 1.00 | 1.00 | 512 | 25.3M | 86.9G |
| YOLO11x | 1.00 | 1.50 | 512 | 56.9M | 194.9G |
import copy

def scale_model(base_config, depth_mul, width_mul, max_channels):
    """Scale a model config by depth and width multipliers."""
    scaled_config = copy.deepcopy(base_config)

    for layer in scaled_config['backbone'] + scaled_config['head']:
        # scale the number of repeats
        if layer[1] > 1:
            layer[1] = max(1, int(layer[1] * depth_mul))

        # scale the channel counts
        if layer[2] in ['Conv', 'C3k2', 'C2PSA', 'SPPF']:
            layer[3][0] = min(int(layer[3][0] * width_mul), max_channels)

    return scaled_config
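As a quick sanity check of the scaling rules above, here is how YOLO11n's multipliers (depth 0.50, width 0.25, max_channels 1024, from the table) act on a backbone entry like `[-1, 2, C3k2, [512, ...]]`:

```python
def scale_repeats(n, depth_mul):
    # repeats shrink with the depth multiplier, but never below 1
    return max(1, int(n * depth_mul))

def scale_channels(c, width_mul, max_channels):
    # channels shrink with the width multiplier, capped at max_channels
    return min(int(c * width_mul), max_channels)

# YOLO11n: depth=0.50, width=0.25, max_channels=1024
print(scale_repeats(2, 0.50))           # 1 repeat
print(scale_channels(512, 0.25, 1024))  # 128 channels
```

The cap matters for the larger models: YOLO11x's width multiplier of 1.50 would push 1024-channel layers to 1536, but max_channels=512 clamps them.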

6. Multi-Task Support

6.1 Supported Task Types

YOLO11 provides a unified multi-task framework:

[Figure: YOLO11 multi-task framework. A shared backbone feeds five task heads: object detection, instance segmentation, pose estimation, oriented bounding box (OBB) detection, and image classification.]

6.2 Detection Head Variants

import torch
import torch.nn as nn
# Conv, DFL, and Proto are standard Ultralytics building blocks
from ultralytics.nn.modules import Conv, DFL, Proto


class Detect(nn.Module):
    """YOLO11 detection head."""
    def __init__(self, nc=80, ch=()):
        super().__init__()
        self.nc = nc  # number of classes
        self.nl = len(ch)  # number of detection levels
        self.reg_max = 16  # DFL bins

        c2, c3 = max(16, ch[0] // 4, self.reg_max * 4), max(ch[0], min(self.nc, 100))

        # regression branch
        self.cv2 = nn.ModuleList(
            nn.Sequential(
                Conv(x, c2, 3),
                Conv(c2, c2, 3),
                nn.Conv2d(c2, 4 * self.reg_max, 1)
            ) for x in ch
        )

        # classification branch
        self.cv3 = nn.ModuleList(
            nn.Sequential(
                Conv(x, c3, 3),
                Conv(c3, c3, 3),
                nn.Conv2d(c3, self.nc, 1)
            ) for x in ch
        )

        # DFL layer
        self.dfl = DFL(self.reg_max)

    def forward(self, x):
        for i in range(self.nl):
            x[i] = torch.cat([self.cv2[i](x[i]), self.cv3[i](x[i])], 1)
        return x


class Segment(Detect):
    """Instance segmentation head."""
    def __init__(self, nc=80, nm=32, npr=256, ch=()):
        super().__init__(nc, ch)
        self.nm = nm  # number of mask coefficients
        self.npr = npr  # number of prototypes

        c4 = max(ch[0] // 4, self.nm)
        self.cv4 = nn.ModuleList(
            nn.Sequential(
                Conv(x, c4, 3),
                Conv(c4, c4, 3),
                nn.Conv2d(c4, self.nm, 1)
            ) for x in ch
        )

        # prototype prediction
        self.proto = Proto(ch[0], self.npr, self.nm)


class Pose(Detect):
    """Pose estimation head."""
    def __init__(self, nc=1, kpt_shape=(17, 3), ch=()):
        super().__init__(nc, ch)
        self.kpt_shape = kpt_shape
        self.nk = kpt_shape[0] * kpt_shape[1]  # total keypoint outputs

        c4 = max(ch[0] // 4, self.nk)
        self.cv4 = nn.ModuleList(
            nn.Sequential(
                Conv(x, c4, 3),
                Conv(c4, c4, 3),
                nn.Conv2d(c4, self.nk, 1)
            ) for x in ch
        )


class OBB(Detect):
    """Oriented bounding box (rotated box) head."""
    def __init__(self, nc=80, ne=1, ch=()):
        super().__init__(nc, ch)
        self.ne = ne  # extra parameters (rotation angle)

        c4 = max(ch[0] // 4, self.ne)
        self.cv4 = nn.ModuleList(
            nn.Sequential(
                Conv(x, c4, 3),
                Conv(c4, c4, 3),
                nn.Conv2d(c4, self.ne, 1)
            ) for x in ch
        )
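`Detect.forward` concatenates the regression branch (4 * reg_max DFL logits) with the classification branch (nc scores), so the per-level output width is fixed by those two hyperparameters. A quick check with the defaults above, plus the anchor count for a 640×640 input:

```python
reg_max, nc = 16, 80               # defaults used above (COCO)
per_level = 4 * reg_max + nc       # channels after torch.cat: 64 box + 80 cls

# anchor (grid-cell) count across P3/P4/P5 for a 640x640 input (strides 8/16/32)
num_anchors = sum((640 // s) ** 2 for s in (8, 16, 32))
print(per_level, num_anchors)      # 144 channels per level, 8400 anchors total
```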

6.3 Per-Task Performance

| Task | Model | Dataset | Metric | Performance |
|---|---|---|---|---|
| Detection | YOLO11m | COCO | mAP50-95 | 51.5% |
| Segmentation | YOLO11m-seg | COCO | mask mAP | 42.0% |
| Pose | YOLO11m-pose | COCO | AP | 66.9% |
| OBB | YOLO11m-obb | DOTAv1 | mAP50 | 79.1% |
| Classification | YOLO11m-cls | ImageNet | Top-1 | 77.3% |

7. Training and Optimization

7.1 Training Configuration

# YOLO11 training configuration
train_config = {
    # basic settings
    'epochs': 500,
    'batch': 16,
    'imgsz': 640,

    # optimizer
    'optimizer': 'SGD',
    'lr0': 0.01,
    'lrf': 0.01,  # final LR = lr0 * lrf
    'momentum': 0.937,
    'weight_decay': 0.0005,
    'warmup_epochs': 3.0,
    'warmup_momentum': 0.8,
    'warmup_bias_lr': 0.1,

    # loss weights
    'box': 7.5,
    'cls': 0.5,
    'dfl': 1.5,

    # data augmentation
    'hsv_h': 0.015,
    'hsv_s': 0.7,
    'hsv_v': 0.4,
    'degrees': 0.0,
    'translate': 0.1,
    'scale': 0.5,
    'shear': 0.0,
    'perspective': 0.0,
    'flipud': 0.0,
    'fliplr': 0.5,
    'mosaic': 1.0,
    'mixup': 0.0,
    'copy_paste': 0.0,
}
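With `lr0=0.01` and `lrf=0.01`, the learning rate decays from 0.01 down to 0.0001 over training. A sketch of the linear schedule Ultralytics applies by default (the scheduler is configurable, so treat this as an approximation of one common setting):

```python
lr0, lrf, epochs = 0.01, 0.01, 500

def lr_at(epoch):
    # linear interpolation from lr0 down to lr0 * lrf over the run
    return lr0 * ((1 - epoch / epochs) * (1 - lrf) + lrf)

print(lr_at(0), lr_at(epochs))  # starts at lr0 = 0.01, ends at lr0 * lrf = 1e-4
```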

7.2 Loss Function

YOLO11 keeps YOLOv8's loss design:

Total loss:
$$\mathcal{L} = \lambda_{box} \mathcal{L}_{box} + \lambda_{cls} \mathcal{L}_{cls} + \lambda_{dfl} \mathcal{L}_{dfl}$$

Bounding-box loss (CIoU):
$$\mathcal{L}_{box} = 1 - CIoU + \lambda_{dfl} \cdot DFL$$

Classification loss (BCE with logits):
$$\mathcal{L}_{cls} = BCE(p, y)$$

Distribution Focal Loss:
$$\mathcal{L}_{dfl} = -\left((y_{i+1} - y)\log(p_i) + (y - y_i)\log(p_{i+1})\right)$$
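The DFL term treats each box offset as a discrete distribution: a continuous target $y$ is split between its two nearest integer bins $y_i$ and $y_{i+1}$ with weights $(y_{i+1} - y)$ and $(y - y_i)$. A minimal sketch of that target construction:

```python
def dfl_targets(y):
    """Split a continuous target y across its two neighboring integer bins."""
    yl = int(y)           # left bin  y_i
    yr = yl + 1           # right bin y_{i+1}
    wl = yr - y           # weight on the left bin:  y_{i+1} - y
    wr = y - yl           # weight on the right bin: y - y_i
    return (yl, wl), (yr, wr)

(yl, wl), (yr, wr) = dfl_targets(2.3)
print(yl, yr)        # bins 2 and 3
print(wl + wr)       # the two weights always sum to 1
```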

import torch
import torch.nn as nn
# make_anchors and dist2bbox come from Ultralytics; TaskAlignedAssigner is
# defined in Section 7.3, and BboxLoss follows the Ultralytics implementation
from ultralytics.utils.tal import make_anchors, dist2bbox
from ultralytics.utils.loss import BboxLoss


class YOLOv11Loss:
    """YOLO11 loss function."""
    def __init__(self, model):
        self.bce = nn.BCEWithLogitsLoss(reduction='none')
        self.hyp = model.hyp
        self.stride = model.stride
        self.nc = model.nc
        self.reg_max = model.reg_max
        self.proj = torch.arange(self.reg_max, dtype=torch.float)

        self.assigner = TaskAlignedAssigner(
            topk=10, num_classes=self.nc, alpha=0.5, beta=6.0
        )
        self.bbox_loss = BboxLoss(self.reg_max - 1)

    def bbox_decode(self, anchor_points, pred_dist):
        """Turn DFL distributions into box coordinates (expectation over bins)."""
        b, a, c = pred_dist.shape
        pred_dist = pred_dist.view(b, a, 4, c // 4).softmax(3).matmul(
            self.proj.type(pred_dist.dtype)
        )
        return dist2bbox(pred_dist, anchor_points, xywh=False)

    def __call__(self, preds, batch):
        loss = torch.zeros(3, device=preds[0].device)
        feats = preds[1] if isinstance(preds, tuple) else preds

        # flatten predictions and split into DFL distributions and class scores
        pred_distri, pred_scores = torch.cat(
            [xi.view(feats[0].shape[0], self.nc + self.reg_max * 4, -1)
             for xi in feats], 2
        ).split((self.reg_max * 4, self.nc), 1)
        pred_scores = pred_scores.permute(0, 2, 1).contiguous()
        pred_distri = pred_distri.permute(0, 2, 1).contiguous()

        # anchors and decoded boxes (in feature-map units)
        anchor_points, stride_tensor = make_anchors(feats, self.stride, 0.5)
        pred_bboxes = self.bbox_decode(anchor_points, pred_distri)

        # prepare ground truth (self.preprocess pads GT to a dense tensor,
        # as in Ultralytics)
        targets = self.preprocess(batch['cls'], batch['bboxes'])
        gt_labels, gt_bboxes, mask_gt = targets.split((1, 4, 1), 2)

        # TAL assignment (in image units)
        target_labels, target_bboxes, target_scores, fg_mask = self.assigner(
            pred_scores.detach().sigmoid(),
            (pred_bboxes.detach() * stride_tensor).type(gt_bboxes.dtype),
            anchor_points * stride_tensor,
            gt_labels, gt_bboxes, mask_gt
        )
        target_bboxes /= stride_tensor

        # box and DFL losses over foreground anchors
        if fg_mask.sum():
            loss[0], loss[2] = self.bbox_loss(
                pred_distri[fg_mask], pred_bboxes[fg_mask],
                target_bboxes[fg_mask], target_scores[fg_mask]
            )

        # classification loss
        loss[1] = self.bce(pred_scores, target_scores).sum() / max(fg_mask.sum(), 1)

        loss[0] *= self.hyp['box']
        loss[1] *= self.hyp['cls']
        loss[2] *= self.hyp['dfl']

        return loss.sum()

7.3 Label Assignment Strategy

YOLO11 uses the Task-Aligned Assigner:

import torch
import torch.nn as nn
from ultralytics.utils.metrics import bbox_iou


class TaskAlignedAssigner(nn.Module):
    """Task-aligned label assigner."""
    def __init__(self, topk=13, num_classes=80, alpha=1.0, beta=6.0):
        super().__init__()
        self.topk = topk
        self.num_classes = num_classes
        self.alpha = alpha
        self.beta = beta

    @torch.no_grad()
    def forward(self, pd_scores, pd_bboxes, anc_points, gt_labels, gt_bboxes, mask_gt):
        """
        Args:
            pd_scores: [B, num_anchors, num_classes]
            pd_bboxes: [B, num_anchors, 4]
            anc_points: [num_anchors, 2]
            gt_labels: [B, max_gt, 1]
            gt_bboxes: [B, max_gt, 4]
            mask_gt: [B, max_gt, 1]
        """
        bs, n_max_boxes = gt_bboxes.shape[:2]

        if n_max_boxes == 0:
            return self.get_empty_targets(bs, pd_scores.device)

        # compute the alignment metric
        align_metric, overlaps = self.get_box_metrics(
            pd_scores, pd_bboxes, gt_labels, gt_bboxes
        )

        # keep the top-k candidates per ground-truth box
        mask_topk = self.select_topk_candidates(align_metric, mask_gt)

        # restrict to anchors inside GT boxes and resolve multi-GT conflicts
        target_gt_idx, fg_mask, mask_pos = self.select_candidates_in_gts(
            anc_points, gt_bboxes, mask_topk
        )

        # build the targets
        target_labels, target_bboxes, target_scores = self.get_targets(
            gt_labels, gt_bboxes, target_gt_idx, fg_mask
        )

        # normalize the alignment metric
        align_metric *= mask_pos
        pos_align_metrics = align_metric.max(-1)[0].unsqueeze(-1)
        pos_overlaps = (overlaps * mask_pos).max(-1)[0].unsqueeze(-1)
        norm_align_metric = (align_metric / (pos_align_metrics + 1e-9)).pow(self.alpha) * \
                           (overlaps / (pos_overlaps + 1e-9)).pow(self.beta)
        target_scores = target_scores * norm_align_metric.max(-1)[0].unsqueeze(-1)

        return target_labels, target_bboxes, target_scores, fg_mask

    def get_box_metrics(self, pd_scores, pd_bboxes, gt_labels, gt_bboxes):
        """Compute the classification/IoU alignment metric."""
        # gather predicted scores at the ground-truth class indices
        ind = gt_labels.long().flatten()
        cls_scores = pd_scores.permute(0, 2, 1).flatten(0, 1)[:, ind].reshape(
            pd_scores.shape[0], -1, gt_labels.shape[1]
        )

        # IoU between predicted and ground-truth boxes
        overlaps = bbox_iou(pd_bboxes.unsqueeze(2), gt_bboxes.unsqueeze(1)).squeeze(3)

        # alignment metric: t = s^alpha * u^beta
        align_metric = cls_scores.pow(self.alpha) * overlaps.pow(self.beta)

        return align_metric, overlaps
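The alignment metric computed in `get_box_metrics` is t = s^α · u^β. With the values used in the loss above (α=0.5, β=6.0), localization quality dominates, as a toy comparison shows:

```python
def align_metric(cls_score, iou, alpha=0.5, beta=6.0):
    # t = s^alpha * u^beta: classification score s, IoU u
    return (cls_score ** alpha) * (iou ** beta)

# A well-localized anchor with a modest score beats a confident but
# poorly-localized one, because beta >> alpha.
print(align_metric(0.5, 0.9))   # ~0.376
print(align_metric(0.9, 0.5))   # ~0.015
```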

8. Model Deployment

8.1 Supported Export Formats

YOLO11 can be exported to a range of deployment formats:

| Format | Use case | Command |
|---|---|---|
| ONNX | General inference | export format=onnx |
| TensorRT | NVIDIA GPUs | export format=engine |
| CoreML | Apple devices | export format=coreml |
| OpenVINO | Intel hardware | export format=openvino |
| TFLite | Mobile | export format=tflite |
| NCNN | Mobile | export format=ncnn |
| Edge TPU | Google Edge TPU | export format=edgetpu |

8.2 Export Examples

from ultralytics import YOLO

# load the model
model = YOLO('yolo11m.pt')

# export to ONNX
model.export(format='onnx', dynamic=True, simplify=True)

# export to TensorRT (requires an NVIDIA GPU)
model.export(format='engine', device=0, half=True)

# export to TFLite (INT8 quantization)
model.export(format='tflite', int8=True, data='coco128.yaml')

8.3 Inference and Deployment

from ultralytics import YOLO

# PyTorch model, used in the batch and video examples below
model = YOLO('yolo11m.pt')

# ONNX inference
model_onnx = YOLO('yolo11m.onnx')
results = model_onnx('image.jpg')

# TensorRT inference
model_trt = YOLO('yolo11m.engine')
results = model_trt('image.jpg')

# batch inference
results = model(['img1.jpg', 'img2.jpg', 'img3.jpg'])

# video inference
results = model('video.mp4', stream=True)
for r in results:
    boxes = r.boxes  # detection boxes
    masks = r.masks  # segmentation masks (if any)
    keypoints = r.keypoints  # keypoints (if any)

9. Experiments and Analysis

9.1 Comparison with the Previous Generation

| Model | Params | FLOPs | mAP50-95 | Latency (ms) |
|---|---|---|---|---|
| YOLOv8n | 3.2M | 8.7G | 37.3% | 1.21 |
| YOLO11n | 2.6M | 6.5G | 39.5% | 1.15 |
| YOLOv8s | 11.2M | 28.6G | 44.9% | 2.33 |
| YOLO11s | 9.4M | 21.5G | 47.0% | 2.28 |
| YOLOv8m | 25.9M | 78.9G | 50.2% | 5.09 |
| YOLO11m | 20.1M | 68.0G | 51.5% | 4.85 |

9.2 Ablation Studies

Effect of the C3k2 module:

| Configuration | Params | AP | Notes |
|---|---|---|---|
| C2f | 25.9M | 50.2% | YOLOv8 baseline |
| C3k2 (c3k=False) | 22.3M | 50.8% | standard C3k2 |
| C3k2 (c3k=True) | 20.1M | 51.5% | with C3k blocks |

Effect of C2PSA:

| Configuration | Params | AP | Notes |
|---|---|---|---|
| No attention | 19.5M | 50.8% | baseline |
| SE attention | 19.8M | 51.0% | Squeeze-and-Excitation |
| C2PSA | 20.1M | 51.5% | position-sensitive attention |

10. Usage Examples

10.1 Quick Start

from ultralytics import YOLO

# load a pretrained model
model = YOLO('yolo11m.pt')

# inference
results = model('https://ultralytics.com/images/bus.jpg')

# show the results
results[0].show()

# save the results
results[0].save('result.jpg')

10.2 Training on Custom Data

from ultralytics import YOLO

# create a new model from a config
model = YOLO('yolo11m.yaml')

# or load a pretrained model
model = YOLO('yolo11m.pt')

# train
results = model.train(
    data='custom_data.yaml',
    epochs=100,
    imgsz=640,
    batch=16,
    device=0,
    workers=8,
    patience=50,  # early stopping
    save=True,
    save_period=10,
)

# validate
metrics = model.val()
print(f"mAP50-95: {metrics.box.map}")

# inference
results = model('test_image.jpg')

10.3 Multi-Task Examples

from ultralytics import YOLO

# instance segmentation
seg_model = YOLO('yolo11m-seg.pt')
results = seg_model('image.jpg')
masks = results[0].masks  # segmentation masks

# pose estimation
pose_model = YOLO('yolo11m-pose.pt')
results = pose_model('image.jpg')
keypoints = results[0].keypoints  # keypoints

# oriented bounding boxes
obb_model = YOLO('yolo11m-obb.pt')
results = obb_model('aerial_image.jpg')
obb_boxes = results[0].obb  # rotated boxes

# classification
cls_model = YOLO('yolo11m-cls.pt')
results = cls_model('image.jpg')
probs = results[0].probs  # class probabilities

11. Summary

11.1 Core Innovations of YOLO11

| Innovation | Description | Effect |
|---|---|---|
| C3k2 module | Customizable, efficient feature extraction | ~22% fewer parameters |
| C2PSA | Position-sensitive attention enhancement | +1.3% AP |
| Unified multi-task | Five vision tasks supported | One framework |
| Deployment optimization | Multi-format export support | Easy deployment |

11.2 Trends

YOLO11 reflects several important trends in object detection:

  1. Efficiency first: higher accuracy with fewer parameters and less compute
  2. Attention fusion: bringing Transformer ideas into CNN architectures
  3. Task unification: one framework supporting multiple vision tasks
  4. Deployment focus: serious attention to real-world deployment needs

11.3 Suitable Scenarios

  • Real-time object detection applications
  • Multi-task vision systems
  • Edge-device deployment
  • Rapid prototyping
