1. YOLOv8 environment setup

1.1 Create a new virtual environment with conda

conda create --name YOLOv8 python=3.9

1.2 List the existing virtual environments

conda env list

1.3 Activate the virtual environment

conda activate YOLOv8

1.4 Upgrade pip

Upgrading pip avoids installation failures caused by an outdated pip version. This step is optional.

pip install --upgrade pip
pip install setuptools wheel

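1.5 Clear the cache

In the author's experience, purging the pip cache speeds up the download noticeably and keeps the OpenCV installation from stalling.

pip cache purge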

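1.6 Install ultralytics

The --verbose flag prints detailed installation progress.

pip install ultralytics -i https://pypi.tuna.tsinghua.edu.cn/simple --verbose

This step installs the libraries YOLOv8 needs, but it does not give you a copy of the YOLOv8 source code to browse. To read the code, download it from the official site.
Troubleshooting: if matplotlib cannot display images inside the YOLOv8 virtual environment, change the matplotlib version from 3.9.2 to 3.4.0. The command below uninstalls matplotlib 3.9.2 automatically and installs matplotlib 3.4.0.

pip install matplotlib==3.4.0 -i https://pypi.tuna.tsinghua.edu.cn/simple --verbose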
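To confirm which versions actually ended up in the environment, a small check like the following can help. It is only a sketch: installed_version is a helper name made up for this note, and the package names in the loop are simply the ones discussed above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names are the ones mentioned in this section.
    for name in ("ultralytics", "matplotlib", "opencv-python"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it inside the activated YOLOv8 environment; a package reported as not installed points to a failed step above.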

2. Object tracking with YOLOv8

from collections import defaultdict

import cv2
import numpy as np

from ultralytics import YOLO

# Load the YOLOv8 model
model = YOLO("yolov8n.pt")

# Open the video file
video_path = "trafic.mp4"
cap = cv2.VideoCapture(video_path)

# Store the track history
track_history = defaultdict(lambda: [])

while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        # Run YOLOv8 tracking on the frame, persisting tracks between frames
        results = model.track(frame, persist=True)

        # Visualize the results on the frame
        annotated_frame = results[0].plot()

        # Display the annotated frame
        cv2.imshow("YOLOv8 Tracking", annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window
cap.release()
cv2.destroyAllWindows()
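The script above creates track_history but never fills it. A minimal sketch of how such a history could be maintained is shown below. It assumes the track IDs and box centers have already been extracted from results[0].boxes (Ultralytics exposes them as boxes.id and boxes.xywh, and id can be None when nothing is being tracked). update_track_history and max_len are hypothetical names chosen for this example.

```python
from collections import defaultdict


def update_track_history(track_history, track_ids, centers, max_len=30):
    """Append each object's latest center to its track and cap the track length."""
    for track_id, center in zip(track_ids, centers):
        track = track_history[track_id]
        track.append((float(center[0]), float(center[1])))
        if len(track) > max_len:
            del track[0]  # drop the oldest point so the track stays short
    return track_history


# Two frames: objects 1 and 2 appear, then only object 1 is seen again.
history = defaultdict(list)
update_track_history(history, [1, 2], [(10, 20), (30, 40)])
update_track_history(history, [1], [(12, 22)])
```

Each entry of history is then a list of (x, y) points that could be drawn as a polyline on the annotated frame.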


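3. Debugging the code

3.1 Download the code from the official site

Download ultralytics-main (download link). The downloaded code is stored in the ultralytics-main folder.

3.2 Code entry point

The entry point is the entrypoint(debug="") function in ultralytics-main/ultralytics/cfg/__init__.py.
(1) Debugging method 1
The end of the __init__.py file contains the code that calls this function.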

if __name__ == "__main__":
    # Example: entrypoint(debug='yolo predict model=yolov8n.pt')
    ## entrypoint(debug="")
    ## For example, a command for debugging the training code
    entrypoint(debug="yolo detect train data=coco8.yaml model=yolov8n.pt epochs=100 imgsz=640")
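To make the structure of the debug string explicit, here is a simplified sketch of how a command of the form yolo TASK MODE key=value ... breaks down into parts. split_yolo_command is a hypothetical helper written for this note. It is not the parser that entrypoint uses internally.

```python
def split_yolo_command(command):
    """Split 'yolo TASK MODE key=value ...' into positional words and overrides."""
    tokens = command.split()
    if tokens and tokens[0] == "yolo":
        tokens = tokens[1:]  # drop the program name
    positional = [t for t in tokens if "=" not in t]
    overrides = dict(t.split("=", 1) for t in tokens if "=" in t)
    return positional, overrides


positional, overrides = split_yolo_command(
    "yolo detect train data=coco8.yaml model=yolov8n.pt epochs=100 imgsz=640"
)
```

Seen this way, the argument names matter: the image size override is spelled imgsz, and the model file is yolov8n.pt.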

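(2) Debugging method 2
Alternatively, create a new file to debug the code.

## In debug.py
##############################  Debugging code  ##############################
from ultralytics import YOLO  ## import the library

## 1.1 Load a model; 'yolov8n.pt' loads the pretrained weights
model = YOLO(model='yolov8n.pt')
## 1.2 Load a model configuration file; 'yolov8n.yaml' builds the model to train from scratch
## task can be detect, segment or classify; if task is not given, it is inferred from the model
# model = YOLO(model='yolov8n.yaml', task='detect')
## 1.3 Run a task: model.train trains the model; model.val validates it; model.predict runs inference; model.export exports to another format; model.track runs tracking
results = model.train(epochs=3)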

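4. Computing the loss

4.1 Workflow

  1. Data processing:
     (1) Process preds: merge the three feature levels into one tensor.
     (2) Generate the anchor center points.
     (3) Process targets: reshape targets from [n_obj, 1+1+4] to [batch_size, counts.max(), 1+4].
  2. Positive sample matching: compute align_metric.
  3. Loss function: compute DFL.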
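The tensor sizes that appear throughout this section follow from simple arithmetic. The sketch below reproduces them for a 640x640 input with strides 8, 16 and 32, 80 classes and 16 distribution bins. head_output_summary is an illustrative helper, not part of ultralytics.

```python
def head_output_summary(img_size=640, strides=(8, 16, 32), num_classes=80, reg_max=16):
    """Compute grid sizes, anchor counts and channel count of the detection head."""
    grid_sizes = [img_size // s for s in strides]    # 80, 40, 20
    anchors_per_level = [g * g for g in grid_sizes]  # one anchor point per grid cell
    channels = 4 * reg_max + num_classes             # 4 sides * 16 bins + class scores
    return {
        "grid_sizes": grid_sizes,
        "anchors_per_level": anchors_per_level,
        "total_anchors": sum(anchors_per_level),
        "channels": channels,
    }


summary = head_output_summary()
```

With the defaults this gives 6400 + 1600 + 400 = 8400 anchor points and 64 + 80 = 144 channels.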

(1) Processing preds
ultralytics/utils/loss.py

# line 207
    def __call__(self, preds, batch):
        """Calculate the sum of the loss for box, cls and dfl multiplied by batch size.
        img: [640, 640]

        preds: list of 3 feature maps [[16,144,80,80],[16,144,40,40],[16,144,20,20]], each [batch_size, channels (80 cls + 16*4 ltrb), h, w]. The 16 is the number of bins (0~15) over which each of the 4 ltrb distances is distributed. Anchor free.

        DFL (Distribution Focal Loss): y_pred = sum(p_i * i), i = 0, 1, 2, ..., 15, i.e. the 16 predicted values (after softmax) are multiplied by 0~15 and summed, which gives ltrb (left, top, right, bottom).

        batch: dict with 7 keys (im_file, ori_shape, resized_shape, img[batch,3,640,640], cls[n_obj,1], bboxes[n_obj,4], batch_idx[n_obj]: the image index each of the n_obj ground-truth boxes belongs to)

        """

        loss = torch.zeros(3, device=self.device)  # holds the box, cls and dfl losses
        feats = preds[1] if isinstance(preds, tuple) else preds
        ## Flatten the features: [16,144,80,80]-->[16,144,6400]; [16,144,40,40]-->[16,144,1600]; [16,144,20,20]-->[16,144,400]
        ## [16,144,6400]+[16,144,1600]+[16,144,400] (concat)-->[16,144,8400]
        ## [16,144,8400] (split 144=64+80)-->[16,64,8400]+[16,80,8400]
        pred_distri, pred_scores = torch.cat([xi.view(feats[0].shape[0], self.no, -1) for xi in feats], 2).split(
            (self.reg_max * 4, self.nc), 1
        )
        ## Box distribution [16,64,8400]-->[16,8400,64]; class scores [16,80,8400]-->[16,8400,80]
        pred_scores = pred_scores.permute(0, 2, 1).contiguous()
        pred_distri = pred_distri.permute(0, 2, 1).contiguous()
        
        ## Generate anchors
        dtype = pred_scores.dtype           ## data type
        batch_size = pred_scores.shape[0]   ## batch_size
        imgsz = torch.tensor(feats[0].shape[2:], device=self.device, dtype=dtype) * self.stride[0]  # image size (h, w): feats[0].shape[2:] = (80, 80); self.stride[0] is the downsampling factor 2**3 = 8; 80 * 8 = 640
        anchor_points, stride_tensor = make_anchors(feats, self.stride, 0.5)   # generate anchor_points

        # Targets
        targets = torch.cat((batch["batch_idx"].view(-1, 1), batch["cls"].view(-1, 1), batch["bboxes"]), 1)  # [n_obj,1+1+4] 1(batch_idx)+1(cls)+4 (bboxes:x,y,w,h)
        targets = self.preprocess(targets.to(self.device), batch_size, scale_tensor=imgsz[[1, 0, 1, 0]])  # [batch_size, counts.max(), 1+4]
        gt_labels, gt_bboxes = targets.split((1, 4), 2)  # [batch_size,counts.max(),1+4] --> cls [batch_size,counts.max(),1], xyxy [batch_size,counts.max(),4]
        mask_gt = gt_bboxes.sum(2, keepdim=True).gt_(0.0)  # sum the four box coordinates (x1+y1+x2+y2); True where the sum is greater than 0, i.e. for real (non-padded) boxes

        # Pboxes
        pred_bboxes = self.bbox_decode(anchor_points, pred_distri)  # xyxy, (b, h*w, 4)

        _, target_bboxes, target_scores, fg_mask, _ = self.assigner(
            pred_scores.detach().sigmoid(),
            (pred_bboxes.detach() * stride_tensor).type(gt_bboxes.dtype),
            anchor_points * stride_tensor,
            gt_labels,
            gt_bboxes,
            mask_gt,
        )

        target_scores_sum = max(target_scores.sum(), 1)

        # Cls loss. Predicted class scores pred_scores [batch_size,8400,80]; class targets target_scores [batch_size,8400,80]
        # loss[1] = self.varifocal_loss(pred_scores, target_scores, target_labels) / target_scores_sum  # VFL way
        loss[1] = self.bce(pred_scores, target_scores.to(dtype)).sum() / target_scores_sum  # BCE

        # Bbox loss
        if fg_mask.sum():  # are there any positive samples?
            target_bboxes /= stride_tensor
            loss[0], loss[2] = self.bbox_loss(
                pred_distri, pred_bboxes, anchor_points, target_bboxes, target_scores, target_scores_sum, fg_mask
            )

        loss[0] *= self.hyp.box  # box gain
        loss[1] *= self.hyp.cls  # cls gain
        loss[2] *= self.hyp.dfl  # dfl gain

        return loss.sum() * batch_size, loss.detach()  # loss(box, cls, dfl)
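The docstring's formula y_pred = sum(p_i * i) and the decoding step behind bbox_decode can be illustrated in plain Python. This is a sketch under assumptions: it handles one anchor at a time, the helper names are invented for this note, and it multiplies by the stride right away, whereas the code above keeps pred_bboxes in grid units and applies stride_tensor later.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_distance(logits):
    """Expected bin index sum(p_i * i) of the softmax distribution over the bins."""
    return sum(p * i for i, p in enumerate(softmax(logits)))


def decode_box(anchor, side_logits, stride):
    """Turn four bin-logit lists (left, top, right, bottom) into an xyxy box in pixels."""
    ax, ay = anchor
    l, t, r, b = (expected_distance(logits) for logits in side_logits)
    return ((ax - l) * stride, (ay - t) * stride, (ax + r) * stride, (ay + b) * stride)
```

A flat distribution over 16 bins decodes to the midpoint 7.5, while a sharply peaked one decodes to the index of its peak.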

(2) Anchor generation code, tal.py

# line 303
def make_anchors(feats, strides, grid_cell_offset=0.5):
    """Generate anchors from features.
    feats: [[16,144,80,80],[16,144,40,40],[16,144,20,20]], the feature maps output by the model
    strides: 2**3, 2**4, 2**5, the downsampling factors
    grid_cell_offset: offset of each anchor point inside its grid cell
    """
    anchor_points, stride_tensor = [], []   ## empty lists
    assert feats is not None
    dtype, device = feats[0].dtype, feats[0].device
    for i, stride in enumerate(strides):
        _, _, h, w = feats[i].shape    ## feature map height and width
        sx = torch.arange(end=w, device=device, dtype=dtype) + grid_cell_offset  # shift x, x-axis coordinates
        sy = torch.arange(end=h, device=device, dtype=dtype) + grid_cell_offset  # shift y, y-axis coordinates
        sy, sx = torch.meshgrid(sy, sx, indexing="ij") if TORCH_1_10 else torch.meshgrid(sy, sx)    ## grids of y and x coordinates
        anchor_points.append(torch.stack((sx, sy), -1).view(-1, 2))
        stride_tensor.append(torch.full((h * w, 1), stride, dtype=dtype, device=device))
    return torch.cat(anchor_points), torch.cat(stride_tensor)  ## [6400+1600+400, 2]; [8400, 1]; anchor_points * stride_tensor --> anchor point coordinates in the input image
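The same anchor layout can be reproduced without torch, which makes the ordering of the points easy to inspect. The sketch below is a plain-Python analogue of make_anchors: it takes grid shapes directly instead of feature tensors, and make_anchor_points is a name chosen for this note.

```python
def make_anchor_points(grid_shapes, strides, offset=0.5):
    """Return (x, y) cell centers in grid units plus the stride of each point."""
    points, stride_per_point = [], []
    for (h, w), stride in zip(grid_shapes, strides):
        for row in range(h):          # y varies slowest, matching meshgrid(..., indexing="ij")
            for col in range(w):      # x varies fastest
                points.append((col + offset, row + offset))
                stride_per_point.append(stride)
    return points, stride_per_point


points, point_strides = make_anchor_points([(80, 80), (40, 40), (20, 20)], (8, 16, 32))
```

Multiplying a point by its stride gives its position in the 640x640 input: the first point (0.5, 0.5) at stride 8 lands on pixel (4, 4).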

(3) Processing y_true

'''loss.py, line 180
targets = self.preprocess(targets.to(self.device), batch_size, scale_tensor=imgsz[[1, 0, 1, 0]])

'''
    def preprocess(self, targets, batch_size, scale_tensor):
        """Preprocesses the target counts and matches with the input batch size to output a tensor.
        targets: [n_obj, 1+1+4]  idx + cls + box (x, y, w, h)
        batch_size
        scale_tensor: [4], the image size [w, h, w, h] = [640, 640, 640, 640], used to map the normalized target coordinates back to the image
        """
        nl, ne = targets.shape
        if nl == 0:   
            out = torch.zeros(batch_size, 0, ne - 1, device=self.device)
        else:
            i = targets[:, 0]   # image index 
            _, counts = i.unique(return_counts=True)  # count identical indices, i.e. the number of ground-truth boxes per image
            counts = counts.to(dtype=torch.int32)     # convert the dtype of counts
            out = torch.zeros(batch_size, counts.max(), ne - 1, device=self.device)  # [n_obj,1+1+4]-->[batch_size, counts.max(), 5]; every image gets the same number of box slots, missing boxes are zero-padded
            for j in range(batch_size):
                matches = i == j
                n = matches.sum()
                if n:
                    out[j, :n] = targets[matches, 1:]  # box values in out are normalized (0~1)
            out[..., 1:5] = xywh2xyxy(out[..., 1:5].mul_(scale_tensor))  # scale the normalized values to image coordinates, (x,y,w,h)-->(x1,y1,x2,y2), the top-left and bottom-right corners
        return out  # [batch_size, counts.max(), 1+4]
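A small worked example makes the padding and the coordinate conversion concrete. The sketch below mirrors the idea of preprocess with plain lists. pad_targets and valid_mask are illustrative names, and the three target rows are made-up data.

```python
def xywh_to_xyxy(box):
    """Convert a center-format box (x, y, w, h) to corner format (x1, y1, x2, y2)."""
    x, y, w, h = box
    return [x - w / 2, y - h / 2, x + w / 2, y + h / 2]


def pad_targets(targets, batch_size, img_w, img_h):
    """Group rows [batch_idx, cls, x, y, w, h] per image, scale to pixels, zero-pad."""
    per_image = [[] for _ in range(batch_size)]
    for batch_idx, cls, x, y, w, h in targets:
        box = xywh_to_xyxy([x * img_w, y * img_h, w * img_w, h * img_h])
        per_image[int(batch_idx)].append([cls] + box)
    max_count = max((len(rows) for rows in per_image), default=0)
    return [rows + [[0.0] * 5 for _ in range(max_count - len(rows))] for rows in per_image]


def valid_mask(padded):
    """True for real boxes, False for zero padding (coordinate sum greater than 0)."""
    return [[sum(row[1:]) > 0 for row in rows] for rows in padded]


# Made-up batch: two boxes in image 0, one box in image 1.
targets = [[0, 2, 0.5, 0.5, 0.25, 0.5], [0, 1, 0.25, 0.25, 0.5, 0.5], [1, 3, 0.5, 0.5, 1.0, 1.0]]
padded = pad_targets(targets, batch_size=2, img_w=640, img_h=640)
mask = valid_mask(padded)
```

Image 1 has only one box, so its second slot is zero padding, and the mask marks it False in the same way mask_gt does.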

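4.2 Positive sample matching: computing align_metric


4.3 Loss function: computing DFL

  1. Decode: combine the predicted ltrb distances with the anchor center (ax, ay) to get x_min, y_min, x_max, y_max (the top-left and bottom-right corners): x_min = (ax - l) * s; x_max = (ax + r) * s; y_min = (ay - t) * s; y_max = (ay + b) * s, where s is the downsampling factor. This yields 8400 predicted boxes.

  2. Initial screening of positives: an anchor_point covered by a ground-truth box is a positive sample; every other anchor_point is a negative sample. The anchor_points on the feature maps are mapped to the input image to find which of them each ground-truth box covers.

  3. Second screening of positives:
     + For each positive sample, get its boxes_score (cls) and its CIoU with the ground-truth box.
     + Compute align_metric (TAL): align_metric = boxes_score^0.5 * CIoU^6.
     + Keep the Top_n samples ranked by align_metric as positives.

  4. If an anchor_point matches several ground-truth boxes, it is assigned to the ground-truth box with the highest CIoU.

  5. Compute the loss from the positive samples and the ground-truth boxes.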
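Steps 2 and 3 can be sketched for a single ground-truth box as follows. This is an illustration under stated assumptions: plain IoU stands in for CIoU to keep the code short, the exponents 0.5 and 6 are taken from the formula above, and every function name is invented for this note. Step 4 (resolving anchors claimed by several boxes) is not covered.

```python
def point_in_box(point, box):
    """Step 2: is the anchor point strictly inside the ground-truth box (xyxy)?"""
    px, py = point
    x1, y1, x2, y2 = box
    return x1 < px < x2 and y1 < py < y2


def iou(a, b):
    """Plain IoU of two xyxy boxes (a stand-in for CIoU in this sketch)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def align_metric(score, overlap, alpha=0.5, beta=6.0):
    """Step 3: score^alpha * overlap^beta."""
    return (score ** alpha) * (overlap ** beta)


def select_positives(anchors, pred_boxes, pred_scores, gt_box, topk=10):
    """Rank the anchors inside gt_box by align_metric and keep the best topk indices."""
    candidates = []
    for idx, (anchor, box, score) in enumerate(zip(anchors, pred_boxes, pred_scores)):
        if point_in_box(anchor, gt_box):
            candidates.append((align_metric(score, iou(box, gt_box)), idx))
    candidates.sort(reverse=True)
    return [idx for _, idx in candidates[:topk]]
```

Because the overlap is raised to the 6th power, a prediction with a high class score but a poorly fitting box ranks far below one whose box matches the ground truth closely.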


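Model


Loss function

loss = 0.5 * BCE + 7.5 * CIoU + 1.5 * DFL
DFL loss
DFL(S_i, S_(i+1)) = -(y_(i+1) - y) * log(S_i) - (y - y_i) * log(S_(i+1))
Here y is the target distance, y_i and y_(i+1) are the integer bins just below and above it, and S_i and S_(i+1) are the predicted probabilities of those two bins.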

# line 65, ultralytics/utils/loss.py
class DFLoss(nn.Module):
    """Criterion class for computing DFL losses during training."""

    def __init__(self, reg_max=16) -> None:
        """Initialize the DFL module."""
        super().__init__()
        self.reg_max = reg_max

    def __call__(self, pred_dist, target):
        """
        Return sum of left and right DFL losses.

        Distribution Focal Loss (DFL) proposed in Generalized Focal Loss
        https://ieeexplore.ieee.org/document/9792391
        pred_dist: [n_p*4, 16], where n_p is the number of positive samples; each positive sample has 4 ltrb distances, and each distance has 16 bin logits
        target: [n_p, 4]
        """
        target = target.clamp_(0, self.reg_max - 1 - 0.01)
        tl = target.long()  # target left: round down to the bin left of the true value, [n_p,4]
        tr = tl + 1  # target right: the bin right of the true value, [n_p,4]
        wl = tr - target  # weight left, [n_p,4]
        wr = 1 - wl  # weight right, [n_p,4]
        return (  ## DFL is the weighted sum of two cross-entropies: one with the left bin as the label, one with the right bin as the label
            F.cross_entropy(pred_dist, tl.view(-1), reduction="none").view(tl.shape) * wl
            + F.cross_entropy(pred_dist, tr.view(-1), reduction="none").view(tl.shape) * wr
        ).mean(-1, keepdim=True)  # [n_p,4]-->[n_p,1]
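For a single distance, the same computation can be followed by hand. The sketch below is a scalar, plain-Python analogue of DFLoss for one side of one box. dfl_single is a name made up for this example, and it skips the final mean over the four sides.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_total for v in logits]


def dfl_single(logits, target, reg_max=16):
    """DFL for one distance: weighted cross-entropy on the two bins around target."""
    target = min(max(target, 0.0), reg_max - 1 - 0.01)  # same clamp as DFLoss
    left = int(target)         # bin just below the target
    right = left + 1           # bin just above the target
    w_left = right - target    # the closer bin gets the larger weight
    w_right = 1.0 - w_left
    log_probs = log_softmax(logits)
    return -(w_left * log_probs[left] + w_right * log_probs[right])
```

For a target of 3.7 the left bin 3 gets weight 0.3 and the right bin 4 gets weight 0.7, so the loss is smallest when the predicted probability mass is split between those two bins in that ratio.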