Visual Recognition Example Using OpenCV with YOLOv5

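This article presents two ways to deploy YOLOv5 object detection with OpenCV: running the .pt file in a PyTorch environment, or running lightweight inference on a .onnx file converted from it. It compares the code, dependencies, and performance characteristics of the two approaches, notes that the .onnx route is generally faster and needs fewer dependencies, and provides complete real-time webcam detection examples as a practical reference for model deployment.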

三伏522 · Published 2026-01-05 15:57:07

Preface

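There are two ways to do visual recognition with OpenCV and YOLOv5: run YOLOv5's .pt file inside its own dependency environment, or use OpenCV alone with a .onnx file converted from the .pt file.

In general, inference with the .onnx file is faster and needs fewer dependencies than inference with the .pt file, so I recommend the .onnx approach.

The example code runs on Windows 10 and also on a Raspberry Pi.

Note: this is example code. It can be used as-is, but it is not efficient, so optimize it for your own needs.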

Prerequisites

Set up the YOLOv5 environment
Download the YOLOv5 source code
Install numpy and opencv-python on Python 3.8 to 3.10 (3.9 recommended)
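
Before running the listings, it can help to confirm that the interpreter and packages match the prerequisites above. The following is only a minimal sketch: the helper names `python_version_supported` and `missing_packages` are hypothetical, and `torch` is included on the assumption that the .pt version will be used (the ONNX version does not need it).

```python
import importlib.util
import sys

# Version range taken from the prerequisites above (3.8 to 3.10)
SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 10)


def python_version_supported(version=None):
    """Return True if (major, minor) lies inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def missing_packages(names=("numpy", "cv2", "torch")):
    """Return the importable module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version OK:", python_version_supported())
print("Missing packages:", missing_packages())
```

Note that opencv-python is imported under the module name `cv2`, which is why the check looks for `cv2` rather than the package name.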

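Visual recognition code

.pt version

The .pt file produced by YOLOv5 must be used inside the YOLOv5 environment, so place this script in the YOLOv5 source directory.

Note: this code must be run from the YOLOv5 source directory, because it imports models.common and utils.general from the repository.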

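import cv2
import torch
from models.common import DetectMultiBackend
from utils.general import non_max_suppression, scale_boxes

# Load the pretrained model (make sure yolov5n.pt is in the current directory or give its path)
# This runs on the CPU; pass torch.device('cuda:0') instead to use GPU 0 if one is available
model = DetectMultiBackend('yolov5n.pt', device=torch.device('cpu'))
model.eval()  # evaluation mode

# Detection thresholds
conf_thres = 0.4  # confidence threshold
iou_thres = 0.7  # NMS IoU threshold

# Open the camera
cap = cv2.VideoCapture(0)  # 0 is the default camera

# Check that the camera opened successfully
if not cap.isOpened():
    print("Cannot open the camera; check the camera connection")
    exit()

print("Real-time detection started; press 'q' to quit")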
  
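while True:
    # Read a frame
    ret, frame = cap.read()
    if not ret:
        print("Cannot read a frame, exiting...")
        break

    # Color conversion: OpenCV uses BGR by default, but the model expects RGB
    # The frame is fed at its native size, so its height and width should be
    # multiples of the model stride (32), for example 640x480
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Layout conversion: (H, W, C) to (C, H, W), the PyTorch format
    img = img.transpose(2, 0, 1)
    # Convert to a tensor and normalize: [0, 255] to [0, 1]
    img = torch.from_numpy(img).to(model.device).float() / 255.0
    # Add the batch dimension: (3, H, W) becomes (1, 3, H, W)
    img = img.unsqueeze(0)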
  
    # Run inference
    with torch.no_grad():
        # The model may return several outputs; [0] is the main detection tensor,
        # e.g. shape [1, 25200, 85] for a 640x640 input with 80 classes.
        # Any further elements (raw feature maps, training-time extras) are not needed here.
        pred = model(img)[0]
        # Non-maximum suppression (NMS) removes redundant overlapping boxes:
        # it keeps the best box and suppresses others that overlap it heavily.
        #   conf_thres drops low-confidence boxes (YOLOv5 default: 0.25)
        #   iou_thres decides when two boxes overlap too much (YOLOv5 default: 0.45)
        # NMS returns one tensor per image, so [0] selects the result for our single image.
        pred = non_max_suppression(pred, conf_thres, iou_thres)[0]
        # Debug output; remove it for a better frame rate
        print(img.shape)
        print(frame.shape)
        print(pred[:, :4])
        print("-" * 30)
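    # If anything was detected
    if pred is not None and len(pred) > 0:
        # Rescale the boxes from the model input size (img1_shape) to the original frame (img0_shape).
        # scale_boxes(img1_shape, boxes, img0_shape, ratio_pad=None):
        #   img1_shape: model input size (height, width)
        #   boxes: predicted boxes, here the [n, 4] xyxy columns
        #   img0_shape: original image size (height, width) or (height, width, channels)
        #   ratio_pad: optional (ratio, pad) tuple that avoids recomputing them
        # It returns the rescaled boxes clipped to the image bounds; .round() then rounds them.
        pred[:, :4] = scale_boxes(img.shape[2:], pred[:, :4], frame.shape).round()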
  
        # Draw the detections
        for *xyxy, conf, cls in pred:
            # *xyxy collects the first four values (x1, y1, x2, y2) into a list;
            # conf (5th value) is the confidence and cls (6th value) is the class index.
            x1, y1, x2, y2 = map(int, xyxy)
            # Draw the bounding box
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            # Look up the class name
            cls_name = model.names[int(cls)] if hasattr(model, 'names') else f'Class {int(cls)}'
            # Draw the class name and confidence
            cv2.putText(frame, f'{cls_name} {conf:.2f}',
                        (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX,
                        0.5, (0, 255, 0), 2)

    # Show the result
    cv2.imshow('YOLOv5 Object Detection', frame)

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()
print("Detection stopped")
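
The NMS step described in the comments above can be illustrated without any dependencies. This is a simplified greedy sketch, not YOLOv5's `non_max_suppression` (which also handles classes, batching, and confidence filtering); the function names `iou` and `nms` and the sample boxes are made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thres=0.45):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box by more than iou_thres."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (illustrative values)
sample_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
sample_scores = [0.9, 0.8, 0.7]
print(nms(sample_boxes, sample_scores, iou_thres=0.45))
```

With the 0.45 threshold the second box is suppressed by the first, while a looser threshold such as the 0.7 used in the listing would keep both, which is why a higher `iou_thres` tends to leave more overlapping boxes on screen.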

ONNX version

Running recognition from a .onnx file does not need the YOLOv5 environment; installing the numpy and opencv-python packages is enough.
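
The article does not show how the .onnx file was produced. One common route is the `export.py` script that ships with the YOLOv5 repository. The sketch below only builds that command line from Python; the helper name `build_export_command` is hypothetical, and the opset value of 12 is an assumption (older OpenCV DNN builds may fail to load models exported with newer opsets).

```python
import subprocess
import sys


def build_export_command(weights="yolov5n.pt", img_size=640, opset=12):
    """Build the command line for YOLOv5's export.py.
    It is meant to be run from the YOLOv5 source directory."""
    return [
        sys.executable, "export.py",
        "--weights", weights,
        "--include", "onnx",
        "--imgsz", str(img_size),
        "--opset", str(opset),
    ]


cmd = build_export_command()
print(" ".join(cmd))
# Uncomment to actually run the export inside the YOLOv5 source directory:
# subprocess.run(cmd, check=True)
```

The export writes the .onnx file next to the weights file, so it may need renaming to match the file name that the listings below load.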

In general, the centered version is the one to use.

This first listing is the top-left-aligned version.

import cv2
import numpy as np
import time

# Timing state for the FPS counter
prev_time = time.time()  # start time
frame_count = 0  # frame counter
fps = 0  # frames per second
font = cv2.FONT_HERSHEY_SIMPLEX  # font for on-screen text

# Open the camera
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()
  
  
  
  
def calculate_fps():
    """
    Compute and update the frame rate (FPS) of the video stream.
    Counts the frames processed, recomputes FPS about once per second,
    and resets the frame counter after each update.
    Globals:
        frame_count: frames processed since the last update
        fps: the most recently computed frame rate
        prev_time: time of the last FPS update
    """
    global frame_count, fps, prev_time
    frame_count += 1  # one more frame processed
    current_time = time.time()  # current system time
    time_diff = current_time - prev_time  # time since the last FPS update
    if time_diff >= 1.0:  # at least one second has passed
        fps = frame_count / time_diff  # FPS = frames / elapsed time
        frame_count = 0  # restart the count for the next interval
        prev_time = current_time  # the next interval starts now
  
  
# Load the ONNX model
net = cv2.dnn.readNetFromONNX("best_640.onnx")
# Class names (the order must match training)
COCO_CLASSES = [
    "cocl"
]
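def opencv_onnx_yolov5(img, net_onnx,img_sizw,conf_thres, nms_thres):
    """
    Run YOLOv5 object detection with OpenCV DNN and an ONNX model.
    Args:
        img: input image
        net_onnx: the loaded ONNX network
        img_sizw: target (square) size used for preprocessing
        conf_thres: confidence threshold for dropping weak detections
        nms_thres: NMS threshold for removing overlapping boxes
    Returns:
        the number of boxes kept after NMS
        boxes_buf: list of box corner pairs [(x1, y1), (x2, y2)]
        class_ids_buf: class ID of each box
        scores_buf: confidence score of each box
    """
    # Height and width of the original image
    h0, w0 = img.shape[:2]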
    # ---------------- letterbox (top-left aligned) ----------------
    # Scale factor that preserves the aspect ratio
    scale = min(img_sizw / w0, img_sizw / h0)
    nw, nh = int(w0 * scale), int(h0 * scale)  # width and height after scaling
    # Resize with the aspect ratio preserved so objects are not distorted
    img_resized = cv2.resize(img, (nw, nh))
    # Gray (114, 114, 114) canvas with the resized image placed at its top-left corner
    canvas = np.full((img_sizw, img_sizw, 3), 114, dtype=np.uint8)
    canvas[:nh, :nw] = img_resized
    # ---------------- blob ----------------
    # blobFromImage is OpenCV DNN's preprocessing helper: here it scales pixels
    # to [0, 1] and swaps BGR to RGB to build the network input
    blob = cv2.dnn.blobFromImage(  
        canvas, 1 / 255.0, (img_sizw, img_sizw),  
        swapRB=True, crop=False  
    )  
    # ---------------- inference ----------------
    net_onnx.setInput(blob)  # set the network input
    output = net_onnx.forward()  # e.g. (1, 25200, 85) for a 640x640 input with 80 classes
    # Drop the batch dimension; frames are processed one at a time
    pred = output[0]
    # ---------------- post-processing ----------------
    boxes = []  # boxes as [x, y, w, h] (top-left corner plus size)
    scores = []  # confidence scores
    class_ids = []  # class IDs
    # Number of valid detections before NMS
    valid_detections = 0
    # Confidence filtering
    for det in pred:
        obj_conf = det[4]  # objectness confidence
        if obj_conf < conf_thres:
            continue  # skip detections below the objectness threshold
        class_scores = det[5:]  # probabilities for every class
        class_id = np.argmax(class_scores)  # class with the highest score
        score = obj_conf * class_scores[class_id]  # combined confidence = objectness x class probability
        if score < conf_thres:
            continue  # skip detections whose combined confidence is too low
        # Box center and size, relative to the img_sizw x img_sizw canvas
        cx, cy, w, h = det[:4]
        # Map to original image coordinates (top-left aligned, so dividing by scale is enough)
        x1 = (cx - w / 2) / scale
        y1 = (cy - h / 2) / scale
        x2 = (cx + w / 2) / scale
        y2 = (cy + h / 2) / scale
        # Keep the coordinates inside the image
        x1 = max(0, min(x1, w0))
        y1 = max(0, min(y1, h0))
        x2 = max(0, min(x2, w0))
        y2 = max(0, min(y2, h0))
        w = x2 - x1
        h = y2 - y1
        # Drop invalid boxes
        if w <= 0 or h <= 0 or w > w0 or h > h0:
            continue
        valid_detections += 1
        boxes.append([int(x1), int(y1), int(w), int(h)])  # box
        scores.append(float(score))  # combined confidence
        class_ids.append(class_id)  # class ID
    boxes_buf = []  # boxes kept after NMS
    class_ids_buf = []  # class IDs of the kept boxes
    scores_buf = []  # confidence scores of the kept boxes
    # ---------------- NMS ----------------
    if len(boxes) > 0:
        # Non-maximum suppression: among heavily overlapping boxes, keep the highest-scoring one
        indices = cv2.dnn.NMSBoxes(boxes, scores, conf_thres, nms_thres)
        # Collect the surviving boxes
        if len(indices) > 0:
            for i_buf in indices.flatten():
                x, y, w, h = boxes[i_buf]
                # Keep the coordinates inside the image
                x = max(0, min(x, w0 - 1))
                y = max(0, min(y, h0 - 1))
                w = min(w, w0 - x)
                h = min(h, h0 - y)
                boxes_buf.append([(x, y), (x + w, y + h)])
                class_ids_buf.append(class_ids[i_buf])
                scores_buf.append(scores[i_buf])
        return len(indices),boxes_buf, class_ids_buf, scores_buf
    return 0,[],[],[]
  
  
  
while True:
    # Grab a frame from the camera
    ret, frame = cap.read()
    # Stop if the frame is invalid
    if not ret:
        break
    num_onnx, boxes_onnx, id_onnx, scores_onnx = opencv_onnx_yolov5(frame, net, 640, 0.5, 0.5)
    for i in range(num_onnx):
        cv2.rectangle(frame, boxes_onnx[i][0], boxes_onnx[i][1], (0, 255, 0), 2)
        cv2.putText(frame, str(COCO_CLASSES[id_onnx[i]]), boxes_onnx[i][0], cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255),2)
        # print(scores_onnx[i])
    # Update the FPS value
    calculate_fps()
    # Draw the FPS on the frame
    cv2.putText(frame, f"FPS: {int(fps)}", (10, 30), font, 1, (0, 255, 0), 2)
    cv2.imshow('q', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()
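
The top-left listing above and the centered listing that follows differ only in where the resized frame sits on the square canvas, and therefore in the padding offset that must be subtracted when boxes are mapped back to the frame. A minimal sketch of that mapping, using hypothetical helper names `letterbox_params` and `to_original` and made-up box values:

```python
def letterbox_params(w0, h0, img_size, center=True):
    """Return (scale, dw, dh) for fitting a w0 x h0 frame into a square canvas.
    dw and dh are the left and top padding; both are 0 for top-left alignment."""
    scale = min(img_size / w0, img_size / h0)
    nw, nh = int(round(w0 * scale)), int(round(h0 * scale))
    dw = (img_size - nw) // 2 if center else 0
    dh = (img_size - nh) // 2 if center else 0
    return scale, dw, dh


def to_original(cx, cy, w, h, scale, dw, dh):
    """Map a canvas-space (cx, cy, w, h) box to (x1, y1, x2, y2) in the original frame:
    subtract the padding first, then undo the scaling."""
    x1 = (cx - w / 2 - dw) / scale
    y1 = (cy - h / 2 - dh) / scale
    x2 = (cx + w / 2 - dw) / scale
    y2 = (cy + h / 2 - dh) / scale
    return x1, y1, x2, y2


# A 640x480 camera frame on a 640x640 canvas (illustrative values)
for center in (False, True):
    scale, dw, dh = letterbox_params(640, 480, 640, center=center)
    print(center, (scale, dw, dh), to_original(320, 320, 100, 50, scale, dw, dh))
```

The model must be fed the same alignment that the mapping assumes. Mixing a centered canvas with the top-left mapping shifts every box by the padding offset.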

This is the centered version (YOLOv5's default letterbox).

import cv2
import numpy as np
import time

# ===================== FPS counter =====================
prev_time = time.time()
frame_count = 0
fps = 0
font = cv2.FONT_HERSHEY_SIMPLEX

def calculate_fps():
    global frame_count, fps, prev_time
    frame_count += 1
    current_time = time.time()
    time_diff = current_time - prev_time
    if time_diff >= 1.0:
        fps = frame_count / time_diff
        frame_count = 0
        prev_time = current_time

# ===================== Load the model =====================
net = cv2.dnn.readNetFromONNX("best_640.onnx")

# Your class names (the order must match training)
COCO_CLASSES = ["cocl"]

def opencv_onnx_yolov5(img, net_onnx, img_size, conf_thres, nms_thres):
    """
    OpenCV DNN + ONNX YOLOv5 inference (centered letterbox, the YOLOv5 default)
    Returns:
        num: number of boxes kept after NMS
        boxes_buf: [ [(x1,y1),(x2,y2)], ... ]
        class_ids_buf: [cls_id, ...]
        scores_buf: [score, ...]
    """
    h0, w0 = img.shape[:2]

    # ---------------- letterbox (centered padding, YOLOv5 default) ----------------
    scale = min(img_size / w0, img_size / h0)
    nw, nh = int(round(w0 * scale)), int(round(h0 * scale))
    img_resized = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_LINEAR)

    canvas = np.full((img_size, img_size, 3), 114, dtype=np.uint8)
    dw = (img_size - nw) // 2
    dh = (img_size - nh) // 2
    canvas[dh:dh + nh, dw:dw + nw] = img_resized

    # ---------------- blob ----------------
    blob = cv2.dnn.blobFromImage(
        canvas, 1 / 255.0, (img_size, img_size),
        swapRB=True, crop=False
    )

    # ---------------- inference ----------------
    net_onnx.setInput(blob)
    output = net_onnx.forward()   # typically (1, 25200, 5+nc)
    pred = output[0]              # (N, 5+nc)

    boxes = []      # xywh for NMSBoxes
    scores = []
    class_ids = []

    # ---------------- post-processing ----------------
    for det in pred:
        obj_conf = float(det[4])
        if obj_conf < conf_thres:
            continue

        class_scores = det[5:]
        class_id = int(np.argmax(class_scores))
        cls_conf = float(class_scores[class_id])
        score = obj_conf * cls_conf
        if score < conf_thres:
            continue

        cx, cy, w, h = det[:4]

        # xyxy on the canvas
        x1 = cx - w / 2
        y1 = cy - h / 2
        x2 = cx + w / 2
        y2 = cy + h / 2

        # Remove the padding, then divide by scale to map back to the original frame
        x1 = (x1 - dw) / scale
        y1 = (y1 - dh) / scale
        x2 = (x2 - dw) / scale
        y2 = (y2 - dh) / scale

        # clamp
        x1 = max(0, min(x1, w0 - 1))
        y1 = max(0, min(y1, h0 - 1))
        x2 = max(0, min(x2, w0 - 1))
        y2 = max(0, min(y2, h0 - 1))

        bw = x2 - x1
        bh = y2 - y1
        if bw <= 1 or bh <= 1:
            continue

        boxes.append([int(x1), int(y1), int(bw), int(bh)])
        scores.append(float(score))
        class_ids.append(class_id)

    if len(boxes) == 0:
        return 0, [], [], []

    # ---------------- NMS ----------------
    indices = cv2.dnn.NMSBoxes(boxes, scores, conf_thres, nms_thres)
    if indices is None or len(indices) == 0:
        return 0, [], [], []

    boxes_buf = []
    class_ids_buf = []
    scores_buf = []

    for i in indices.flatten():
        x, y, w, h = boxes[i]
        x = max(0, min(x, w0 - 1))
        y = max(0, min(y, h0 - 1))
        w = min(w, w0 - x)
        h = min(h, h0 - y)

        boxes_buf.append([(x, y), (x + w, y + h)])
        class_ids_buf.append(class_ids[i])
        scores_buf.append(scores[i])

    return len(boxes_buf), boxes_buf, class_ids_buf, scores_buf

# ===================== Camera =====================
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

# Uncomment the lines below to limit the camera resolution
# cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
# cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

last_print_time = time.time()

while True:
    ret, frame = cap.read()
    if not ret:
        break

    num_onnx, boxes_onnx, id_onnx, scores_onnx = opencv_onnx_yolov5(
        frame, net, 640, 0.5, 0.5
    )

    for i in range(num_onnx):
        cv2.rectangle(frame, boxes_onnx[i][0], boxes_onnx[i][1], (0, 255, 0), 2)
        cls_name = COCO_CLASSES[id_onnx[i]] if id_onnx[i] < len(COCO_CLASSES) else str(id_onnx[i])
        cv2.putText(frame, cls_name, boxes_onnx[i][0],
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)

    # FPS
    calculate_fps()
    cv2.putText(frame, f"FPS: {int(fps)}", (10, 30), font, 1, (0, 255, 0), 2)

    # (Optional) print the FPS once per second so that print does not slow the loop
    if time.time() - last_print_time >= 1.0:
        print(f"FPS: {fps:.1f}, det: {num_onnx}")
        last_print_time = time.time()

    cv2.imshow('q', frame)

    # waitKey(1) waits only about 1 ms for a key press, so it barely limits the frame rate
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
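
As noted in the preface, the listings are not optimized. A likely hotspot is the Python `for det in pred` loop, which visits every candidate row (about 25200 for a 640x640 input). The sketch below shows one possible vectorized replacement for the confidence-filtering part using NumPy masks; the function name `filter_predictions` and the sample array are made up for illustration, and coordinate mapping and `cv2.dnn.NMSBoxes` would still follow as in the listings.

```python
import numpy as np


def filter_predictions(pred, conf_thres):
    """Vectorized confidence filtering for a YOLOv5 output of shape (N, 5 + nc).
    Each row is [cx, cy, w, h, obj_conf, class scores...].
    Returns (boxes_cxcywh, scores, class_ids) for the rows that pass both checks."""
    # 1) Drop rows whose objectness is below the threshold
    pred = pred[pred[:, 4] >= conf_thres]
    # 2) Best class per remaining row and the combined confidence
    class_ids = np.argmax(pred[:, 5:], axis=1)
    scores = pred[:, 4] * pred[np.arange(len(pred)), 5 + class_ids]
    # 3) Drop rows whose combined confidence is below the threshold
    keep = scores >= conf_thres
    return pred[keep, :4], scores[keep], class_ids[keep]


# Three fake predictions for a two-class model (illustrative values)
sample = np.array([
    [320, 320, 100, 50, 0.9, 0.8, 0.1],   # passes both checks
    [10, 10, 5, 5, 0.2, 0.9, 0.1],        # fails the objectness check
    [100, 100, 20, 20, 0.6, 0.3, 0.7],    # fails the combined check
], dtype=np.float32)
boxes, scores, class_ids = filter_predictions(sample, 0.5)
print(boxes, scores, class_ids)
```

The returned boxes are still center-format canvas coordinates, so the padding and scale correction from the listing must be applied before drawing. Whether this is faster on a given device is worth measuring with the FPS counter rather than assuming.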
