INT8 and mixed-precision quantization of YOLOv8 with TensorRT on NVIDIA Jetson
Related posts: Converting a YOLOv8 model to an engine and running inference with TensorRT 10 - Zhihu (zhihu.com)
Installing TensorRT 10.0.0.6 on Windows 10 and its API changes - Zhihu (zhihu.com)
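The converted models run inference correctly. Both the INT8 engine and the mixed-precision engine are much faster; judging by eye, accuracy drops slightly but not by much.
I tested TensorRT on a yolov8x.pt model that I trained myself, averaging the inference time over 10,000 images. The PyTorch model runs at about 36 FPS, the FP16 engine at about 72 FPS, the INT8 engine at about 96 FPS, and the mixed-precision engine at about 106 FPS.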
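To put those figures side by side, here is a small sketch that turns the reported FPS values into speedups over PyTorch and into average latency per image. The dictionary keys and helper names are my own; only the four FPS numbers come from the measurements above. It works out to roughly 2.0x for FP16, 2.7x for INT8 and 2.9x for mixed precision.

```python
# FPS figures reported above for the custom-trained yolov8x model.
reported_fps = {
    "pytorch": 36,
    "fp16": 72,
    "int8": 96,
    "mixed": 106,
}


def speedup_table(fps, baseline="pytorch"):
    """Return each backend's speedup relative to the baseline backend."""
    base = fps[baseline]
    return {name: value / base for name, value in fps.items()}


def latency_ms(fps):
    """Convert frames per second into average milliseconds per image."""
    return 1000.0 / fps


for name, ratio in speedup_table(reported_fps).items():
    print(f"{name:8s} {ratio:.2f}x  {latency_ms(reported_fps[name]):.1f} ms/image")
```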
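I also ran the same experiment with the official models, and the results were still quite good.
The INT8 model is compressed by a factor of 2x2; the mixed-precision model has a similar compression ratio and a similar speed.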
INT8
INT8 means that both the weights and the inputs are stored as 8-bit integers, i.e. values in the range [-128, 127].
INT8 needs one switch turned on: half=False, int8=True.
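To make the [-128, 127] range concrete, here is a minimal sketch of symmetric INT8 quantization in plain Python. It is only an illustration of the idea, not what TensorRT does internally; the function names and the sample weights are made up for the example.

```python
def compute_scale(values, qmax=127):
    """Symmetric scale so that the largest magnitude maps to qmax."""
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest integer step and clamp into [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map the integers back to approximate floating-point values."""
    return [q * scale for q in q_values]


# Made-up example weights.
weights = [0.02, -1.27, 0.64, 1.0]
scale = compute_scale(weights)
q = quantize(weights, scale)
restored = dequantize(q, scale)
print(q)
print([round(r, 3) for r in restored])
```

Each restored value differs from the original by at most half a quantization step, which is where the small accuracy loss of INT8 comes from.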
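Mixed precision
Mixed precision means that some layers are kept in FP16 (float16) while others are kept in INT8, so different layers are compressed in different ways.
In general, the first few layers and the last few layers should stay in FP16, and the remaining layers can be INT8.
The setup below keeps the first two layers and the last layer in FP16, which makes the accuracy drop much smaller. The first and last few layers matter the most: several QAT papers note that these layers should not be quantized to a low bit width and are best kept in float16 or float32.
Mixed precision is configured by layer type and by the layer names in the ONNX graph; typically, certain convolution layers are pinned to FP16.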
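The name-based selection can be sketched without TensorRT. The patterns are the ones enabled in the export code of this post; the (name, type) tuples and the example layer names are hypothetical stand-ins for a real TensorRT network.

```python
# Substrings taken from the export code in this post; a convolution whose
# name contains any of them is kept in FP16.
FP16_PATTERNS = ("/model.0/", "/model.1/", "/model.22/dfl/conv/Conv")


def pick_fp16_layers(layers, patterns=FP16_PATTERNS):
    """Return names of convolution layers that should be pinned to FP16.

    `layers` is a list of (name, layer_type) tuples. This plain-Python
    structure is a stand-in for a TensorRT network.
    """
    return [
        name
        for name, layer_type in layers
        if layer_type == "CONVOLUTION" and any(p in name for p in patterns)
    ]


# Hypothetical layer names, shaped like the ones an ONNX export produces.
example_layers = [
    ("/model.0/conv/Conv", "CONVOLUTION"),
    ("/model.0/act/Sigmoid", "ACTIVATION"),
    ("/model.1/conv/Conv", "CONVOLUTION"),
    ("/model.10/conv/Conv", "CONVOLUTION"),
    ("/model.22/dfl/conv/Conv", "CONVOLUTION"),
]
print(pick_fp16_layers(example_layers))
```

The trailing slash in "/model.1/" matters: without it the pattern would also match "/model.10/" and pin more layers to FP16 than intended.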
Mixed precision needs two switches turned on: half=True, int8=True.
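Code
Before running, apply the changes from this pull request:
https://github.com/ultralytics/ultralytics/pull/9969
Below is the code actually used for the conversion; yolov8x.pt can be changed to yolov8l.pt, yolov8n.pt, and so on.
Both INT8 conversion and mixed-precision conversion need calibration images; see https://github.com/NVIDIA/TensorRT/blob/main/samples/python/detectron2
1. Download COCO val2017 and unzip it; this is the calib_input = r'E:\work\codeRepo\deploy\jz\val2017' used below.
2. Use the code as shown below.
3. Before running, also comment out line 322 of ultralytics\cfg\__init__.py, raise SyntaxError(string + CLI_HELP_MSG) from e, to avoid an error.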
Converting the model
When exporting, set half=False, int8=True for INT8, and half=True, int8=True for mixed precision.
At inference time neither switch needs to be set, i.e. both stay at half=False, int8=False.
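The switch combinations can be summarised in a tiny helper. This is only a sketch of the decision logic: the function name and the returned labels are my own, and the real exporter additionally checks builder.platform_has_fast_fp16 and builder.platform_has_fast_int8 before enabling a mode.

```python
def precision_mode(half=False, int8=False):
    """Map the two export switches to the kind of engine that gets built."""
    if half and int8:
        return "mixed"  # INT8 engine with selected layers pinned to FP16
    if int8:
        return "int8"
    if half:
        return "fp16"
    return "fp32"


for half, int8 in [(False, False), (True, False), (False, True), (True, True)]:
    print(f"half={half!s:5} int8={int8!s:5} -> {precision_mode(half, int8)}")
```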
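The engines converted from the official models can be downloaded here:
https://www.alipan.com/s/FdfFoPDGCWH
Reference samples from NVIDIA/TensorRT:
https://github.com/NVIDIA/TensorRT/blob/main/samples/python/detectron2
https://github.com/NVIDIA/TensorRT/blob/main/samples/python/efficientdet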
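Setting the batch size
According to the documentation (https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/#enable_int8_c), a larger batch_size during export generally gives a more accurate calibration. The calibration data should also resemble the training data and be shuffled as much as possible. In the words of the guide: "To avoid this issue, calibrate with as large a single batch as possible, and ensure that calibration batches are well randomized and have similar distribution."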
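As a rough illustration of that advice, the sketch below shuffles a list of image paths with a fixed seed and cuts it into full batches, dropping the incomplete tail. It follows the same idea as the ImageBatcher used later in this post, but the function name, the parameters and the file names are assumptions made for the example.

```python
import random


def plan_calibration_batches(image_paths, batch_size, max_images=None, seed=0):
    """Shuffle the images and split them into full batches only."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)  # fixed seed keeps runs reproducible
    if max_images is not None and 0 < max_images < len(paths):
        paths = paths[:max_images]
    # Drop the tail so that every batch is exactly batch_size images.
    usable = batch_size * (len(paths) // batch_size)
    if usable < 1:
        raise ValueError("Not enough images to fill one batch")
    return [paths[i:i + batch_size] for i in range(0, usable, batch_size)]


# Hypothetical file names standing in for the val2017 directory listing.
paths = [f"img_{i:03d}.jpg" for i in range(45)]
batches = plan_calibration_batches(paths, batch_size=20)
print(len(batches), [len(b) for b in batches])
```

With 45 images and a batch size of 20, five images are left out; choosing a batch size that divides the number of calibration images avoids wasting data.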
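Inference
For inference, point to the converted engine file and call model.predict as usual; no extra arguments or steps are needed.
Input images should preferably be float32, and the output float32 as well. This is already the default, so just keep the defaults shown below.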
import os
import gc
import sys

sys.path.append(r'E:\work\codeRepo\deploy\common\zj\ultralytics')
from ultralytics import YOLO  # newest version from "git clone and git pull"

os.environ['CUDA_VISIBLE_DEVICES'] = '0'
'''
Platform: Windows 11
Ultralytics YOLOv8.1.44 Python-3.9.18 torch-2.2.1+cu118 CUDA:0 (NVIDIA GeForce RTX 4070 Ti, 12282MiB)
onnx 1.16.0 opset 17
TensorRT 10.0.0b6:
https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.0.0/zip/TensorRT-10.0.0.6.Windows10.win10.cuda-11.8.zip
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
'''

if __name__ == '__main__':
    file = r'yolov8x.pt'
    model = YOLO(file)  # load the pretrained model to convert
    calib_input = r'E:\work\codeRepo\deploy\jz\val2017'
    cache_file = r'E:\work\codeRepo\deploy\jz\calibration.cache'
    # Remove a stale calibration cache so that calibration runs again.
    if os.path.exists(cache_file):
        os.remove(cache_file)
    '''
    https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/#enable_int8_c
    To avoid this issue, calibrate with as large a single batch as possible,
    and ensure that calibration batches are well randomized and have similar distribution.
    '''
    # half=True, int8=True builds a mixed-precision engine;
    # half=False, int8=True builds a plain INT8 engine.
    results0 = model.export(format='engine', simplify=True,
                            half=True,
                            int8=True,
                            calib_batch_size=20,
                            calib_num_images=len(os.listdir(calib_input)),
                            calib_input=calib_input,
                            cache_file=cache_file,
                            device='cuda:0')
    del model
    gc.collect()

    # Load the exported engine and run a normal prediction.
    model = YOLO(r"E:\work\%s" % (file.replace(".pt", ".engine")))
    result = model.predict(
        'https://ultralytics.com/images/bus.jpg',
        save_dir=r'.//',
        save=True)
Put the following ultralytics/engine/tensorrt_int8/calibrator.py in the corresponding location.
It has been modified accordingly: mainly, the CUDA part was removed and PyTorch is used directly to allocate the device memory and arrays.
According to the documentation (Developer Guide :: NVIDIA Deep Learning TensorRT Documentation), the calibrator base class can be swapped for another one, e.g. trt.IInt8EntropyCalibrator2, trt.IInt8MinMaxCalibrator, trt.IInt8EntropyCalibrator, or trt.IInt8LegacyCalibrator.
The code below uses trt.IInt8MinMaxCalibrator; trt.IInt8EntropyCalibrator2 is also worth trying. According to the documentation, trt.IInt8EntropyCalibrator2 is better suited to CNNs, while IInt8MinMaxCalibrator is better suited to BERT and other NLP networks.
IInt8EntropyCalibrator2: Entropy calibration chooses the tensor's scale factor to optimize the quantized tensor's information-theoretic content, and usually suppresses outliers in the distribution. This is the current and recommended entropy calibrator and is required for DLA. Calibration happens before Layer fusion by default. Calibration batch size may impact the final result. It is recommended for CNN-based networks.
IInt8MinMaxCalibrator: This calibrator uses the entire range of the activation distribution to determine the scale factor. It seems to work better for NLP tasks. Calibration happens before Layer fusion by default. This is recommended for networks such as NVIDIA BERT (an optimized version of Google's official implementation).
IInt8EntropyCalibrator: This is the original entropy calibrator. It is less complicated to use than the LegacyCalibrator and typically produces better results. Calibration batch size may impact the final result. Calibration happens after Layer fusion by default.
IInt8LegacyCalibrator: This calibrator is for compatibility with TensorRT 2.0 EA. This calibrator requires user parameterization and is provided as a fallback option if the other calibrators yield poor results. Calibration happens after Layer fusion by default. You can customize this calibrator to implement percentile max, for example, 99.99% percentile max is observed to have best accuracy for NVIDIA BERT and NeMo ASR model QuartzNet.
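To see why the choice of calibrator matters, here is a toy comparison between a min-max scale and a percentile-max scale on synthetic activations that contain a single outlier. This is an illustration of the idea only: it does not reproduce TensorRT's calibrators (the entropy calibrators in particular work differently), and the data and function names are invented for the example.

```python
import math


def minmax_scale(activations, qmax=127):
    """Scale derived from the full range, as a min-max style calibrator would."""
    return max(abs(a) for a in activations) / qmax


def percentile_scale(activations, percentile, qmax=127):
    """Scale derived from a percentile of the magnitudes, ignoring the top tail."""
    magnitudes = sorted(abs(a) for a in activations)
    # Nearest-rank percentile, clamped to a valid 1-based rank.
    rank = math.ceil(percentile / 100.0 * len(magnitudes))
    rank = max(1, min(rank, len(magnitudes)))
    return magnitudes[rank - 1] / qmax


# Synthetic activations in [-1, 1) plus one large outlier.
activations = [((i % 200) - 100) / 100.0 for i in range(9999)] + [50.0]

full_range = minmax_scale(activations)
clipped = percentile_scale(activations, percentile=99.9)
print(f"min-max scale:    {full_range:.5f}")
print(f"percentile scale: {clipped:.5f}")
```

With the min-max scale the single outlier stretches the quantization step, so the bulk of the values share only a few integer levels; the percentile scale clips the outlier and keeps much finer resolution for typical values.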
#
# SPDX-FileCopyrightText: Copyright (c) 1993-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# https://github.com/NVIDIA/TensorRT/blob/release/10.0/samples/python/efficientdet/build_engine.py
import os
import sys

import tensorrt as trt

from ultralytics.utils import LOGGER

sys.path.insert(1, os.path.join(os.path.dirname(os.path.realpath(__file__)), os.pardir))
from ultralytics.engine.tensorrt_int8.image_batcher import ImageBatcher

log = LOGGER


# Alternative base classes that can be tried instead:
# class EngineCalibrator(trt.IInt8EntropyCalibrator2):
# class EngineCalibrator(trt.IInt8EntropyCalibrator):
# class EngineCalibrator(trt.IInt8LegacyCalibrator):
class EngineCalibrator(trt.IInt8MinMaxCalibrator):
    """Implements the INT8 Entropy Calibrator 2 or IInt8MinMaxCalibrator."""

    def __init__(self, cache_file, device="cuda:0"):
        """
        :param cache_file: The location of the cache file.
        :param device: The device on which the calibration batches are allocated.
        """
        super().__init__()
        self.cache_file = cache_file
        self.image_batcher = None
        self.batch_allocation = None
        self.batch_generator = None
        self.device = device

    def set_image_batcher(self, image_batcher: ImageBatcher):
        """
        Define the image batcher to use, if any.
        If using only the cache file, an image batcher doesn't need to be defined.
        :param image_batcher: The ImageBatcher object
        """
        self.image_batcher = image_batcher
        self.batch_generator = self.image_batcher.get_batch()

    def get_batch_size(self):
        """
        Overrides the TensorRT calibrator method.
        Get the batch size to use for calibration.
        :return: Batch size.
        """
        if self.image_batcher:
            return self.image_batcher.batch_size
        return 1

    def get_batch(self, names):
        """
        Overrides the TensorRT calibrator method.
        Get the next batch to use for calibration, as a list of device memory pointers.
        :param names: The names of the inputs, if useful to define the order of inputs.
        :return: A list of int-casted memory pointers.
        """
        if not self.image_batcher:
            return None
        try:
            batch = next(self.batch_generator)
            LOGGER.info(
                "Calibrating image {} / {}".format(self.image_batcher.image_index, self.image_batcher.num_images)
            )
            # The original sample copied a host array into a CUDA allocation:
            # common.memcpy_host_to_device(self.batch_allocation, np.ascontiguousarray(batch))
            # return [int(self.batch_allocation)]
            # Here the batch is already a torch tensor on the GPU. Keep a reference to it
            # so its memory is not released while TensorRT reads from the returned pointer.
            self.batch_allocation = batch
            return [int(batch.data_ptr())]
        except StopIteration:
            LOGGER.info("Finished calibration batches")
            return None

    def read_calibration_cache(self):
        """
        Overrides the TensorRT calibrator method.
        Read the calibration cache file stored on disk, if it exists.
        :return: The contents of the cache file, if any.
        """
        if self.cache_file is not None and os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                LOGGER.info("Using calibration cache file: {}".format(self.cache_file))
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        """
        Overrides the TensorRT calibrator method.
        Store the calibration cache to a file on disk.
        :param cache: The contents of the calibration cache to store.
        """
        if self.cache_file is None:
            return
        with open(self.cache_file, "wb") as f:
            LOGGER.info("Writing calibration cache data to: {}".format(self.cache_file))
            f.write(cache)
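The interplay between the cache file and the batches is easier to follow in a TensorRT-free sketch. The class below mirrors the read/write/get_batch methods of the calibrator above; the run_calibration driver is a hypothetical stand-in for what the builder does, namely asking for the cache first and only consuming batches when there is none.

```python
import os
import tempfile


class CacheBackedCalibrator:
    """TensorRT-free stand-in that mirrors the cache and batch protocol."""

    def __init__(self, cache_file, batches=None):
        self.cache_file = cache_file
        self.batch_generator = iter(batches) if batches is not None else None

    def get_batch(self):
        """Return the next batch, or None once the batches are exhausted."""
        if self.batch_generator is None:
            return None
        try:
            return next(self.batch_generator)
        except StopIteration:
            return None

    def read_calibration_cache(self):
        if self.cache_file is not None and os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        if self.cache_file is None:
            return
        with open(self.cache_file, "wb") as f:
            f.write(cache)


def run_calibration(calibrator):
    """Hypothetical driver: reuse the cache if present, else consume all batches."""
    cached = calibrator.read_calibration_cache()
    if cached is not None:
        return cached, 0
    seen = 0
    while calibrator.get_batch() is not None:
        seen += 1
    cache = f"batches={seen}".encode()  # placeholder for real scale data
    calibrator.write_calibration_cache(cache)
    return cache, seen


cache_path = os.path.join(tempfile.mkdtemp(), "calibration.cache")
print(run_calibration(CacheBackedCalibrator(cache_path, batches=[[1], [2], [3]])))
print(run_calibration(CacheBackedCalibrator(cache_path, batches=[[4]])))
```

The second call never touches its batches because the cache already exists, which is why the conversion script deletes calibration.cache before exporting with new settings.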
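Put the following ultralytics/engine/tensorrt_int8/image_batcher.py in the corresponding location.
It has been modified accordingly: mainly, the CUDA part was removed and PyTorch is used directly to allocate the device memory and arrays.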
#
# SPDX-FileCopyrightText: Copyright (c) 1993-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# https://github.com/NVIDIA/TensorRT/blob/release/10.0/samples/python/efficientnet/build_engine.py
import torch
import cv2
import os
import sys
import numpy as np
from PIL import Image

from ultralytics.data.augment import LetterBox


class ImageBatcher:
    """Creates batches of pre-processed images."""

    def __init__(
        self,
        input,
        shape,
        dtype,
        max_num_images=None,
        exact_batches=False,
        config_file=None,
        shuffle_files=True,
        device="cuda:0",
    ):
        """
        :param input: The input directory to read images from.
        :param shape: The tensor shape of the batch to prepare, either in NCHW or NHWC format.
        :param dtype: The (numpy) datatype to cast the batched data to.
        :param max_num_images: The maximum number of images to read from the directory.
        :param exact_batches: This defines how to handle a number of images that is not an exact multiple of the batch
        size. If false, it will pad the final batch with zeros to reach the batch size. If true, it will *remove* the
        last few images in excess of a batch size multiple, to guarantee batches are exact (useful for calibration).
        :param config_file: Not used in this version; kept from the original sample's signature.
        :param shuffle_files: Whether to shuffle the image list (with a fixed seed) before batching.
        :param device: The torch device on which the batches are allocated.
        """
        # Find images in the given input path.
        input = os.path.realpath(input)
        self.images = []

        extensions = [".jpg", ".jpeg", ".png", ".bmp", ".ppm"]

        def is_image(path):
            return os.path.isfile(path) and os.path.splitext(path)[1].lower() in extensions

        if os.path.isdir(input):
            self.images = [os.path.join(input, f) for f in os.listdir(input) if is_image(os.path.join(input, f))]
            self.images.sort()
            if shuffle_files:
                np.random.seed(999999999)
                np.random.shuffle(self.images)
        elif os.path.isfile(input):
            if is_image(input):
                self.images.append(input)
        self.num_images = len(self.images)
        if self.num_images < 1:
            print("No valid {} images found in {}".format("/".join(extensions), input))
            sys.exit(1)

        # Map the numpy dtype to the matching torch dtype.
        if dtype == np.float32:
            self.dtype = torch.float32
        elif dtype == np.float16:
            self.dtype = torch.float16
        elif dtype == np.int8:
            self.dtype = torch.int8
        else:
            raise ValueError("Unsupported dtype: {}".format(dtype))

        # Handle Tensor Shape.
        self.shape = shape
        assert len(self.shape) == 4
        self.batch_size = shape[0]
        assert self.batch_size > 0
        self.format = None
        self.width = -1
        self.height = -1
        if self.shape[1] == 3:
            self.format = "NCHW"
            self.height = self.shape[2]
            self.width = self.shape[3]
        elif self.shape[3] == 3:
            self.format = "NHWC"
            self.height = self.shape[1]
            self.width = self.shape[2]
        assert all([self.format, self.width > 0, self.height > 0])

        # Adapt the number of images as needed.
        if max_num_images and 0 < max_num_images < len(self.images):
            self.num_images = max_num_images
        if exact_batches:
            self.num_images = self.batch_size * (self.num_images // self.batch_size)
        if self.num_images < 1:
            print("Not enough images to create batches")
            sys.exit(1)
        self.images = self.images[0 : self.num_images]

        # Subdivide the list of images into batches.
        self.num_batches = 1 + int((self.num_images - 1) / self.batch_size)
        self.batches = []
        for i in range(self.num_batches):
            start = i * self.batch_size
            end = min(start + self.batch_size, self.num_images)
            self.batches.append(self.images[start:end])

        # Indices.
        self.image_index = 0
        self.batch_index = 0

        self.newshape = [self.height, self.width]
        self.device = device
        self.LetterBox = LetterBox(self.newshape, scaleup=False)

    def preprocess_image(self, image_path):
        """
        The image preprocessor loads an image from disk and prepares it as needed for batching. This includes padding,
        resizing, normalization, data type casting, and transposing.
        This Image Batcher implements one algorithm for now:
        * Resizes and pads the image to fit the input size.
        :param image_path: The path to the image on disk to load.
        :return: A torch tensor in CHW layout holding the image sample, ready to be copied into the batch.
        """
        image = Image.open(image_path)
        image = image.convert(mode="RGB")
        image = np.asarray(image, dtype=np.float32).copy()
        # Resize and pad (letterbox) the image to the network input size, without upscaling.
        image = self.LetterBox(labels=None, image=image)
        # cv2.imwrite(r'E:\work\codeRepo\deploy\jz\image_batcher\%d.jpg'%np.random.randint(999999999), image)
        # Change HWC -> CHW and scale the pixel values to [0, 1].
        image = np.transpose(image, (2, 0, 1)) / 255.0
        # Convert to a torch tensor of the requested dtype.
        image = torch.from_numpy(np.ascontiguousarray(image)).to(self.dtype)
        return image

    def get_batch(self):
        """
        Retrieve the batches.
        This is a generator object, so you can use it within a loop as:
        for batch in batcher.get_batch():
            ...
        Or outside of a loop with the next() function.
        :return: A generator yielding one item per iteration: a torch tensor on self.device holding a batch of images.
        """
        for i, batch_images in enumerate(self.batches):
            batch_data = torch.zeros(tuple(self.shape), dtype=self.dtype, device=self.device)
            for j, image in enumerate(batch_images):
                self.image_index += 1
                batch_data[j] = self.preprocess_image(image)
            self.batch_index += 1
            yield batch_data
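The batcher relies on LetterBox(self.newshape, scaleup=False) to fit each image into the network input. The sketch below shows only the arithmetic behind that kind of resize: one ratio for both axes, never larger than 1 when scaleup is off, and the remainder filled with padding. It is a simplified illustration with my own function name, not the Ultralytics implementation.

```python
def letterbox_geometry(height, width, new_height, new_width, scaleup=False):
    """Return (ratio, (resized_h, resized_w), (pad_h, pad_w)) for a letterbox resize."""
    # One ratio for both axes keeps the aspect ratio unchanged.
    ratio = min(new_height / height, new_width / width)
    if not scaleup:
        ratio = min(ratio, 1.0)  # only shrink, never enlarge
    resized_h = round(height * ratio)
    resized_w = round(width * ratio)
    # Whatever is left over is filled with padding.
    pad_h = new_height - resized_h
    pad_w = new_width - resized_w
    return ratio, (resized_h, resized_w), (pad_h, pad_w)


for h, w in [(480, 640), (1280, 960), (100, 100)]:
    print((h, w), "->", letterbox_geometry(h, w, 640, 640))
```

Calibration images should go through the same preprocessing as inference images, otherwise the activation ranges collected during calibration will not match what the engine sees later.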
Then the file ultralytics/engine/exporter.py also needs to be modified:
    @try_export
    def export_engine(self, prefix=colorstr("TensorRT:")):
        """YOLOv8 TensorRT export https://developer.nvidia.com/tensorrt."""
        assert self.im.device.type != "cpu", "export running on CPU but must be on GPU, i.e. use 'device=0'"
        self.args.simplify = True
        f_onnx, _ = self.export_onnx()  # run before trt import https://github.com/ultralytics/ultralytics/issues/7016

        try:
            import tensorrt as trt  # noqa
        except ImportError:
            if LINUX:
                check_requirements("nvidia-tensorrt", cmds="-U --index-url https://pypi.ngc.nvidia.com")
            import tensorrt as trt  # noqa
        check_version(trt.__version__, "7.0.0", hard=True)  # require tensorrt>=7.0.0

        LOGGER.info(f"\n{prefix} starting export with TensorRT {trt.__version__}...")
        is_trt10 = int(trt.__version__.split(".")[0]) >= 10  # is TensorRT >= 10
        assert Path(f_onnx).exists(), f"failed to export ONNX file: {f_onnx}"
        f = self.file.with_suffix(".engine")  # TensorRT engine file
        logger = trt.Logger(trt.Logger.INFO)
        if self.args.verbose:
            logger.min_severity = trt.Logger.Severity.VERBOSE

        builder = trt.Builder(logger)
        config = builder.create_builder_config()
        workspace = int(self.args.workspace * (1 << 30))
        if is_trt10:
            config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace)
        else:  # TensorRT versions 7, 8
            config.max_workspace_size = workspace
        flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
        network = builder.create_network(flag)
        parser = trt.OnnxParser(network, logger)
        if not parser.parse_from_file(f_onnx):
            raise RuntimeError(f"failed to load ONNX file: {f_onnx}")

        inputs = [network.get_input(i) for i in range(network.num_inputs)]
        outputs = [network.get_output(i) for i in range(network.num_outputs)]
        for inp in inputs:
            LOGGER.info(f'{prefix} input "{inp.name}" with shape{inp.shape} {inp.dtype}')
        for out in outputs:
            LOGGER.info(f'{prefix} output "{out.name}" with shape{out.shape} {out.dtype}')

        if self.args.dynamic:
            shape = self.im.shape
            if shape[0] <= 1:
                LOGGER.warning(f"{prefix} WARNING ⚠️ 'dynamic=True' model requires max batch size, i.e. 'batch=16'")
            profile = builder.create_optimization_profile()
            min_shape = (1, shape[1], 32, 32)  # minimum input shape
            opt_shape = (max(1, shape[0] // 2), *shape[1:])  # optimal input shape
            max_shape = (*shape[:2], *(max(1, self.args.workspace) * d for d in shape[2:]))  # max input shape
            for inp in inputs:
                profile.set_shape(inp.name, min_shape, opt_shape, max_shape)
            config.add_optimization_profile(profile)

        half = builder.platform_has_fast_fp16 and self.args.half
        int8 = builder.platform_has_fast_int8 and self.args.int8
        mix_precision = half and int8
        if mix_precision:
            # https://github.com/NVIDIA/TensorRT/tree/main/samples/python/efficientdet
            # Experimental precision mode.
            # Enable mixed-precision mode. When set, the layers defined here will be forced to FP16 to maximize INT8
            # inference accuracy, while having minimal impact on latency.
            config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
            config.set_flag(trt.BuilderFlag.DIRECT_IO)
            config.set_flag(trt.BuilderFlag.REJECT_EMPTY_ALGORITHMS)

            # The convolutions in the first two blocks (/model.0/, /model.1/) and the final DFL convolution are
            # pinned to FP16. These layers were chosen manually as a middle point between INT8 and FP16 accuracy,
            # while maintaining almost the same latency as a normal INT8 engine.
            # To experiment with other datasets, or a different balance between accuracy/latency, you may
            # add or remove blocks (see the commented-out patterns below).
            collect = []
            for i in range(network.num_layers):
                layer = network.get_layer(i)
                collect.append([layer.name, layer])
            for i in range(network.num_layers):
                layer = network.get_layer(i)
                if (
                    layer.type == trt.LayerType.CONVOLUTION
                    and any(
                        [
                            "/model.0/" in layer.name,
                            "/model.1/" in layer.name,
                            # "/model.2/" in layer.name,
                            # "/model.3/" in layer.name,
                            # "/model.4/m.0/" in layer.name,
                            # "/model.4/m.1/" in layer.name,
                            # "/model.22/cv2.0/" in layer.name,
                            # "/model.22/cv2.1/" in layer.name,
                            # "/model.22/cv2.2/" in layer.name,
                            # "/model.22/cv3.0/" in layer.name,
                            # "/model.22/cv3.1/" in layer.name,
                            # "/model.22/cv3.2/" in layer.name,
                            # "/model.22/cv2.0/cv2.0.2/Conv" in layer.name,
                            # "/model.22/cv3.0/cv3.0.2/Conv" in layer.name,
                            # "/model.22/cv2.1/cv2.1.2/Conv" in layer.name,
                            # "/model.22/cv3.1/cv3.1.2/Conv" in layer.name,
                            # "/model.22/cv2.2/cv2.2.2/Conv" in layer.name,
                            # "/model.22/cv3.2/cv3.2.2/Conv" in layer.name,
                            "/model.22/dfl/conv/Conv" in layer.name,
                        ]
                    )
                ) or (
                    any(
                        [
                            # "/model.22/Sigmoid" in layer.name,
                            # "/model.22/Mul_2" in layer.name,
                        ]
                    )
                ):
                    network.get_layer(i).precision = trt.DataType.HALF
                    LOGGER.info("Mixed-Precision Layer {} set to HALF STRICT data type".format(layer.name))
            LOGGER.info(f"{prefix} building a Mix Precision with FP16 and INT8 engine as {f}")
        if half:
            LOGGER.info(f"{prefix} building FP16 engine as {f}")
            config.set_flag(trt.BuilderFlag.FP16)
        if int8:
            # https://github.com/NVIDIA/TensorRT/tree/main/samples/python/efficientdet
            LOGGER.info(f"{prefix} building INT8 engine as {f}")
            from ultralytics.engine.tensorrt_int8.calibrator import EngineCalibrator
            from ultralytics.engine.tensorrt_int8.image_batcher import ImageBatcher

            # https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/#enable_int8_c
            # To avoid this issue, calibrate with as large a single batch as possible,
            # and ensure that calibration batches are well randomized and have similar distribution.

            # The batch size for the calibration process
            calib_batch_size = self.args.calib_batch_size
            # The maximum number of images to use for calibration, default: len(os.listdir(calib_input))
            calib_num_images = self.args.calib_num_images
            # The directory holding images to use for calibration
            calib_input = self.args.calib_input
            cache_file = self.args.cache_file
            if calib_num_images is None:
                calib_num_images = len(os.listdir(calib_input))
            config.set_flag(trt.BuilderFlag.INT8)
            config.int8_calibrator = EngineCalibrator(cache_file)
            # Only prepare calibration images when no cache file can be reused.
            if cache_file is None or not os.path.exists(cache_file):
                calib_shape = [calib_batch_size] + list(inputs[0].shape[1:])
                calib_dtype = trt.nptype(inputs[0].dtype)
                imagebatcher = ImageBatcher(
                    calib_input,
                    calib_shape,
                    calib_dtype,
                    max_num_images=calib_num_images,
                    exact_batches=True,
                    shuffle_files=True,
                )
                imagebatcher.newshape = inputs[0].shape[2:]
                config.int8_calibrator.set_image_batcher(imagebatcher)

        # Free CUDA memory
        del self.model
        torch.cuda.empty_cache()

        # Write file
        build = builder.build_serialized_network if is_trt10 else builder.build_engine
        with build(network, config) as engine, open(f, "wb") as t:
            # Metadata
            meta = json.dumps(self.metadata)
            t.write(len(meta).to_bytes(4, byteorder="little", signed=True))
            t.write(meta.encode())
            # Model
            t.write(engine if is_trt10 else engine.serialize())

        return f, None
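The last few lines of export_engine define the on-disk layout of the .engine file: a 4-byte little-endian length, the JSON metadata, then the serialized engine. The round trip below is a sketch of that layout using an in-memory stream; the helper names, the metadata keys and the fake engine bytes are placeholders, not part of Ultralytics.

```python
import io
import json


def write_engine_file(stream, metadata, engine_bytes):
    """Write: 4-byte little-endian metadata length, JSON metadata, engine bytes."""
    meta = json.dumps(metadata).encode()
    stream.write(len(meta).to_bytes(4, byteorder="little", signed=True))
    stream.write(meta)
    stream.write(engine_bytes)


def read_engine_file(stream):
    """Read the header back and return (metadata, engine_bytes)."""
    meta_len = int.from_bytes(stream.read(4), byteorder="little", signed=True)
    metadata = json.loads(stream.read(meta_len).decode())
    engine_bytes = stream.read()
    return metadata, engine_bytes


# Placeholder metadata and payload, only for the round trip.
buffer = io.BytesIO()
write_engine_file(buffer, {"stride": 32, "names": {"0": "person"}}, b"\x00fake-engine\x01")
buffer.seek(0)
print(read_engine_file(buffer))
```

Because of this header, a file written this way cannot be passed directly to tools that expect a raw TensorRT engine; the metadata prefix has to be skipped first.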