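(Most complete) PyTorch neural networks: printing and saving all weights + state + activations (runtime intermediate values) + quantized weights and activations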
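Embedded targets and new hardware often need bare weights and activations (runtime intermediate values), without the rest of the framework. This post shows a minimal way to get them.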
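Assume you already have a model `model` and a .pt file. Create a weights folder in the current directory and run the code below to get the model's weights in both text and binary form. Be sure to use state_dict() rather than named_parameters(), named_children() and the like: some data, such as BN's running_mean and running_var, are not parameters but are still in the dict. This non-weight data is also required during inference!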
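# state_dict is assumed to have been loaded from the .pt file already
model.load_state_dict(state_dict)
global_index = 0
for name, state in model.state_dict().items():
    print(name, state.size())
    array = state.detach().cpu().numpy()  # numpy() fails on CUDA tensors, so move to CPU first
    # Text dump; note that NumPy abbreviates large arrays with "..." when printing
    with open(f"weights/{global_index}-{name}.txt", "w") as f:
        print(array, file=f)
    # Raw binary dump in the tensor's own dtype (fp32 for ordinary float weights)
    array.tofile(f"weights/{global_index}-{name}.bin")
    global_index += 1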
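To make the state_dict() point concrete, here is a minimal sketch on a hypothetical toy model (`toy` is not from the original post) that lists the entries found in state_dict() but missing from named_parameters():

```python
import torch.nn as nn

# Hypothetical toy model, used only for illustration
toy = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8))

param_names = {name for name, _ in toy.named_parameters()}
state_names = set(toy.state_dict().keys())

# Entries that exist only in state_dict(): the BatchNorm buffers
buffers_only = sorted(state_names - param_names)
print(buffers_only)
# ['1.num_batches_tracked', '1.running_mean', '1.running_var']
```

A dump based on named_parameters() alone would silently skip these three entries.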
For the binary files, you can view the corresponding floating-point values with od -t f4 <binary file name>, where f4 means fp32.
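The same check can be done from Python. The sketch below writes a small fp32 array with tofile() and reads it back with np.fromfile(); the file name is hypothetical and a temporary directory is used so nothing is left behind. Keep in mind that tofile() stores neither shape nor dtype, so the reader has to supply the dtype and reshape using the size printed during the dump.

```python
import os
import tempfile
import numpy as np

values = np.array([0.5, -1.25, 3.0], dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")   # hypothetical file name
    values.tofile(path)                        # same call the dump loop uses
    # The dtype must be given explicitly; the file holds raw bytes only
    restored = np.fromfile(path, dtype=np.float32)

print(restored)
```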
Dumping the intermediate values of forward (this much complexity really is necessary):
import torch

# The activations folder must already exist in the current directory.
global_index = 0

def hook_fn(module, input, output):
    global global_index
    # Use the module's repr, stripped of spaces and newlines, as its name
    module_name = str(module).replace(" ", "").replace("\n", "")
    intermediate_outputs = {}
    # input is a tuple; output is assumed to be a single tensor
    for i, inp in enumerate(input):
        intermediate_outputs[f"{global_index}-{module_name}-input-{i}"] = inp
    intermediate_outputs[f"{global_index}-{module_name}-output"] = output
    module_name = module_name[0:200]  # keep the full file name within 255 characters
    print(intermediate_outputs)
    print("Size input:", end=" ")
    if isinstance(input, tuple):
        for i, inp in enumerate(input):
            if isinstance(inp, torch.Tensor):
                print(f"{i}-th Size: {inp.size()}", end=", ")
                # detach().cpu() so numpy() also works for CUDA or grad-tracking tensors
                inp.detach().cpu().numpy().tofile(f"activations/{global_index}-{module_name}-input-{i}.bin")
            else:
                print(f"{i}-th : {inp}", end=", ")
        print()  # end the "Size input:" line
    elif isinstance(input, torch.Tensor):
        print(f"Size: {input.size()}")
        input.detach().cpu().numpy().tofile(f"activations/{global_index}-{module_name}-input.bin")
    print(f"Size output: {output.size()}")
    output.detach().cpu().numpy().tofile(f"activations/{global_index}-{module_name}-output.bin")
    global_index += 1

def register_hooks(model):
    for name, layer in model.named_children():
        # print(name, layer)  # dump all layers, > layers.txt
        # Register the hook on the current layer
        layer.register_forward_hook(hook_fn)
        # Recursively apply the same to all submodules
        register_hooks(layer)

register_hooks(model)
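Here register_hooks is essentially equivalent to the following, which needs no recursion. Note that named_modules() also yields the model itself, so the root module gets a hook as well.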
def register_hooks(model):
    for name, layer in model.named_modules():
        # print(name, layer)  # dump all layers
        layer.register_forward_hook(hook_fn)
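As a quick way to try the hook mechanism before writing files, the following sketch captures outputs into a dict instead. The toy model, the `captured` dict and the `make_hook` helper are all hypothetical names introduced for illustration; only register_forward_hook() and named_modules() come from the code above.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, for illustration only
toy = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
captured = {}

def make_hook(name):
    # Close over the dotted module name so the hook can use it as a key
    def hook(module, inputs, output):
        captured[name] = output.detach().cpu().numpy()
    return hook

# named_modules() also yields the root module under the empty name ""
handles = [
    module.register_forward_hook(make_hook(name or "root"))
    for name, module in toy.named_modules()
]

with torch.no_grad():          # inference only, no autograd graph needed
    toy(torch.randn(1, 4))

for handle in handles:
    handle.remove()            # detach the hooks once the dump is done

print({name: arr.shape for name, arr in captured.items()})
```

Keying by the dotted name from named_modules() gives shorter, more stable file names than str(module), at the cost of losing the layer type that the quantization step below filters on.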
Next, you may also want to quantize the weights and activations (dynamic quantization).
# Quantize data per tensor
import numpy as np
import os
import re

# Read from the weights dir
weight_binaries = os.listdir("weights")
# Keep only files ending with 'weight.bin'
weight_binaries = [w for w in weight_binaries if w.endswith('weight.bin')]
print(weight_binaries)
num_bit = 8
# Note: this is symmetric quantization
# Quantize weights
for w in weight_binaries:
    weight = np.fromfile(f"weights/{w}", dtype=np.float32)
    max_weight_abs = np.max(np.abs(weight))
    fmax = 2**(num_bit-1)-1
    # Assumes the tensor is not all zeros; otherwise scale is 0 and the division below is invalid
    scale = (max_weight_abs / fmax).astype(np.float32)
    print(scale)
    weight = np.round(weight/scale).astype(np.int8)
    print(weight)
    # Insert "-q" (quantized values) or "-s" (scale) in the file name before ".bin"
    prefix, extension = os.path.splitext(w)
    weight.tofile(f"weights/{prefix}-q{extension}")
    scale.tofile(f"weights/{prefix}-s{extension}")

# Quantize activations
activation_binaries = os.listdir("activations")
# Input files match the pattern -input-<n>.bin
input_activation = [a for a in activation_binaries if bool(re.search(r'-input-(\d+)\.bin', a))]
output_activation = [a for a in activation_binaries if a.endswith('output.bin')]
for ia in input_activation:
    # Only quantize GEMM and CONV layers right now
    if '-Linear' not in ia and '-Conv' not in ia:
        continue
    acts = np.fromfile(f"activations/{ia}", dtype=np.float32)
    max_acts_abs = np.max(np.abs(acts))
    fmax = 2**(num_bit-1)-1
    scale = (max_acts_abs / fmax).astype(np.float32)
    print(scale)
    acts = np.round(acts/scale).astype(np.int8)
    print(acts)
    # Insert "-q" (quantized values) or "-s" (scale) in the file name before ".bin"
    prefix, extension = os.path.splitext(ia)
    acts.tofile(f"activations/{prefix}-q{extension}")
    scale.tofile(f"activations/{prefix}-s{extension}")
for oa in output_activation:
    # Only quantize GEMM and CONV layers right now
    if '-Linear' not in oa and '-Conv' not in oa:
        continue
    acts = np.fromfile(f"activations/{oa}", dtype=np.float32)
    max_acts_abs = np.max(np.abs(acts))
    fmax = 2**(num_bit-1)-1
    scale = (max_acts_abs / fmax).astype(np.float32)
    print(scale)
    acts = np.round(acts/scale).astype(np.int8)
    print(acts)
    # Insert "-q" (quantized values) or "-s" (scale) in the file name before ".bin"
    prefix, extension = os.path.splitext(oa)
    acts.tofile(f"activations/{prefix}-q{extension}")
    scale.tofile(f"activations/{prefix}-s{extension}")
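To sanity-check the quantized files, it helps to dequantize and look at the error. The sketch below applies the same per-tensor symmetric rule as the script above (scale = max|x| / (2^(num_bit-1) - 1), q = round(x / scale)) to a small in-memory array. The helper name `quantize_symmetric` and the all-zero guard are assumptions added for illustration; they are not part of the original script.

```python
import numpy as np

def quantize_symmetric(x, num_bit=8):
    """Per-tensor symmetric quantization, mirroring the script above."""
    qmax = 2 ** (num_bit - 1) - 1              # 127 for 8 bits
    max_abs = float(np.max(np.abs(x)))
    # Guard for an all-zero tensor (an assumption, not in the original script)
    scale = np.float32(max_abs / qmax) if max_abs > 0 else np.float32(1.0)
    q = np.round(x / scale).astype(np.int8)
    return q, scale

x = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_symmetric(x)

# Dequantize: multiply the int8 values by the stored scale
x_hat = q.astype(np.float32) * scale
max_err = float(np.max(np.abs(x - x_hat)))
print(q, scale, max_err)
```

Because values are rounded to the nearest step, the reconstruction error per element should stay within about half of `scale`, which is a convenient check when comparing against a hardware implementation.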