Verilog模块递归:参数化加法器树
使用Verilog递归实现加法器树的参数化
文章目录
前言
在GitHub上看到一个使用递归实现的参数化加法器树,很受启发,在此依葫芦画瓢。
GitHub链接: https://github.com/raiker/tarsier/tree/master/src/
原代码使用System Verilog编写,这里用Verilog写一下。
该作者按照是否有符号、是否加入流水线将加法器树分为四种情况,考虑到有无符号的写法基本一样,这里只写一下有符号的,无符号的把signed去掉就可以了。
一、无流水线加法器树
这种情况比较简单,直接上代码。
原SV代码:
/* This Source Code Form is subject to the terms of the Mozilla Public
* License, v. 2.0. If a copy of the MPL was not distributed with this
* file, You can obtain one at http://mozilla.org/MPL/2.0/. */
module AdderTree(in_addends, out_sum);
parameter DATA_WIDTH = 8;
parameter LENGTH = 42;
localparam OUT_WIDTH = DATA_WIDTH + $clog2(LENGTH);
localparam LENGTH_A = LENGTH / 2;
localparam LENGTH_B = LENGTH - LENGTH_A;
localparam OUT_WIDTH_A = DATA_WIDTH + $clog2(LENGTH_A);
localparam OUT_WIDTH_B = DATA_WIDTH + $clog2(LENGTH_B);
input signed [DATA_WIDTH-1:0] in_addends [LENGTH];
output signed [OUT_WIDTH-1:0] out_sum;
generate
if (LENGTH == 1) begin
assign out_sum = in_addends[0];
end else begin
wire signed [OUT_WIDTH_A-1:0] sum_a;
wire signed [OUT_WIDTH_B-1:0] sum_b;
logic signed [DATA_WIDTH-1:0] addends_a [LENGTH_A];
logic signed [DATA_WIDTH-1:0] addends_b [LENGTH_B];
always_comb begin
for (int i = 0; i < LENGTH_A; i++) begin
addends_a[i] = in_addends[i];
end
for (int i = 0; i < LENGTH_B; i++) begin
addends_b[i] = in_addends[i + LENGTH_A];
end
end
//divide set into two chunks, conquer
AdderTree #(
.DATA_WIDTH(DATA_WIDTH),
.LENGTH(LENGTH_A)
) subtree_a (
.in_addends(addends_a),
.out_sum(sum_a)
);
AdderTree #(
.DATA_WIDTH(DATA_WIDTH),
.LENGTH(LENGTH_B)
) subtree_b (
.in_addends(addends_b),
.out_sum(sum_b)
);
assign out_sum = sum_a + sum_b;
end
endgenerate
endmodule
这段代码的功能是求LENGTH个位宽为DATA_WIDTH的数的和。在LENGTH>1时,将LENGTH个数分为两块,每块分别递归例化一个同样的子模块,模块的输出为两个子模块输出的和;在LENGTH==1时,递归终止,模块的输出等于输入。
Verilog代码:
module AdderTree(in_addends, out_sum);
parameter DATA_WIDTH = 8;
parameter LENGTH = 4;
localparam OUT_WIDTH = DATA_WIDTH + $clog2(LENGTH);
localparam LENGTH_A = LENGTH / 2;
localparam LENGTH_B = LENGTH - LENGTH_A;
localparam OUT_WIDTH_A = DATA_WIDTH + $clog2(LENGTH_A);
localparam OUT_WIDTH_B = DATA_WIDTH + $clog2(LENGTH_B);
input signed [DATA_WIDTH*LENGTH-1:0] in_addends;
output signed [OUT_WIDTH-1:0] out_sum;
generate
if (LENGTH == 1) begin
assign out_sum = in_addends;
end else begin
wire signed [OUT_WIDTH_A-1:0] sum_a;
wire signed [OUT_WIDTH_B-1:0] sum_b;
reg signed [DATA_WIDTH*LENGTH_A-1:0] addends_a;
reg signed [DATA_WIDTH*LENGTH_B-1:0] addends_b;
always@(*) begin
addends_a = in_addends[DATA_WIDTH*LENGTH_A-1:0];
addends_b = in_addends[DATA_WIDTH*LENGTH-1:DATA_WIDTH*LENGTH_A];
end
//divide set into two chunks, conquer
AdderTree #(
.DATA_WIDTH(DATA_WIDTH),
.LENGTH(LENGTH_A)
) subtree_a (
.in_addends(addends_a),
.out_sum(sum_a)
);
AdderTree #(
.DATA_WIDTH(DATA_WIDTH),
.LENGTH(LENGTH_B)
) subtree_b (
.in_addends(addends_b),
.out_sum(sum_b)
);
assign out_sum = sum_a + sum_b;
end
endgenerate
endmodule
把LENGTH改成4,综合一下:
并行计算高位两数的和与低位两数的和,再将所得结果相加,没有产生多余的逻辑。
二、流水线加法器树
当待求和的数据较多时,往往需要在加法器之间插入寄存器形成流水线。很容易想到,将LENGTH>1时的输出组合逻辑
assign out_sum = sum_a + sum_b;
改为时序逻辑
always@(posedge clk)begin
out_sum <= sum_a + sum_b;
end
即可实现在每个加法器后面插入一个寄存器。
但是这样编译器会报错,因为我们在同一个模块中既把out_sum当做wire类型来用,又把它当做reg类型来用。在SV中则没有这个问题,因为SV中reg和wire被统一为logic,编译器根据logic的使用方式判断它是线还是寄存器。在Verilog中我们只能绕个弯子:
reg signed [OUT_WIDTH-1:0] out_sum_buf;
assign out_sum = out_sum_buf;
always@(posedge clk)begin
out_sum_buf <= sum_a + sum_b;
end
如果LENGTH是2的整数次幂,这样就足够了。但是当LENGTH不是2的整数次幂时,就会出现下面这种情况(LENGTH==3):
当LENGTH不是2的整数次幂时,每条路径上的加法器数目可能会不同。在纯组合逻辑中,它会导致输出产生毛刺,我们可以通过在输出端加一级寄存器将其去除。而在时序逻辑中,如果我们只是简单地在每个加法器后面插一个寄存器,会直接导致时序错位。
以上图为例,输入in_addends的高16位(高位数)经过一级寄存器后到达第二个加法器,而低8位(低位数)直接到达第二个加法器,于是输出变成了输入的低位数与上一周期的高位数的和。
正确的电路应当是这样的:
在LENGTH等于1时,同样需要判断是否需要插入寄存器。好在判断的逻辑并不复杂。考虑到在递归开始前我们就可以确定每条路径应当有几级寄存器,即 log 2 \log_2 log2(LENGTH),另外,子模块的寄存器级数比它的上一级少1。我们可以增加一个参数DELAY_STAGE,在最高层它的值为$clog2(LENGTH),每次例化子模块时使用的值为DELAY_STAGE-1。这样,在LENGTH等于1时,DELAY_STAGE的数值就是该模块还需要插入的寄存器级数。
插入DELAY_STAGE级寄存器可以用移位寄存器实现:
if (LENGTH == 1) begin
if (DELAY_STAGE > 1) begin
reg [DELAY_STAGE * DATA_WIDTH - 1 : 0]delay_fifo;
always@(posedge clk) begin
delay_fifo <= {delay_fifo[(DELAY_STAGE-1)*DATA_WIDTH-1:0],in_addends};
end
assign out_sum = delay_fifo[(DELAY_STAGE-1)*DATA_WIDTH-1:0];
end
else if (DELAY_STAGE == 1) begin
reg [DATA_WIDTH - 1 : 0]delay_fifo;
always@(posedge clk) begin
delay_fifo <= in_addends;
end
assign out_sum = delay_fifo;
end
else begin
assign out_sum = in_addends;
end
end
这里额外加了个DELAY_STAGE 等于1的分支,是因为DELAY_STAGE 等于1时delay_fifo[(DELAY_STAGE-1)*DATA_WIDTH-1:0]变成了delay_fifo[-1:0],这也是编译器不允许的。
完整的Verilog代码如下:
module AdderTreePipelined(in_addends, out_sum, clk);
parameter DATA_WIDTH = 8;
parameter LENGTH = 3;
parameter DELAY_STAGE = $clog2(LENGTH);
localparam OUT_WIDTH = DATA_WIDTH + $clog2(LENGTH);
localparam LENGTH_A = LENGTH / 2;
localparam LENGTH_B = LENGTH - LENGTH_A;
localparam OUT_WIDTH_A = DATA_WIDTH + $clog2(LENGTH_A);
localparam OUT_WIDTH_B = DATA_WIDTH + $clog2(LENGTH_B);
input clk ;
input [DATA_WIDTH*LENGTH-1:0] in_addends ;
output signed [OUT_WIDTH-1:0] out_sum ;
reg signed[OUT_WIDTH-1:0] out_sum_pip;
generate
genvar i;
if (LENGTH == 1) begin
if (DELAY_STAGE > 1) begin
reg [DELAY_STAGE * DATA_WIDTH - 1 : 0]delay_fifo;
always@(posedge clk) begin
delay_fifo <= {delay_fifo[(DELAY_STAGE-1)*DATA_WIDTH-1:0],in_addends};
end
assign out_sum = delay_fifo[(DELAY_STAGE-1)*DATA_WIDTH-1:0];
end
else if (DELAY_STAGE == 1) begin
reg [DATA_WIDTH - 1 : 0]delay_fifo;
always@(posedge clk) begin
delay_fifo <= in_addends;
end
assign out_sum = delay_fifo;
end
else begin
assign out_sum = in_addends;
end
end
else begin
wire signed[OUT_WIDTH_A-1:0] sum_a;
wire signed[OUT_WIDTH_B-1:0] sum_b;
wire [DATA_WIDTH * LENGTH_A - 1:0] addends_a;
wire [DATA_WIDTH * LENGTH_B - 1:0] addends_b;
assign addends_a = in_addends[DATA_WIDTH * LENGTH_A - 1: 0];
assign addends_b = in_addends[LENGTH * DATA_WIDTH - 1: DATA_WIDTH * LENGTH_A];
//divide set into two chunks, conquer
AdderTreePipelined #(
.DATA_WIDTH(DATA_WIDTH),
.LENGTH(LENGTH_A),
.DELAY_STAGE(DELAY_STAGE - 1)
) subtree_a (
.in_addends(addends_a),
.out_sum(sum_a),
.clk(clk)
);
AdderTreePipelined #(
.DATA_WIDTH(DATA_WIDTH),
.LENGTH(LENGTH_B),
.DELAY_STAGE(DELAY_STAGE - 1)
) subtree_b (
.in_addends(addends_b),
.out_sum(sum_b),
.clk(clk)
);
always@(posedge clk) begin
out_sum_pip <= sum_a + sum_b;
end
assign out_sum = out_sum_pip;
end
endgenerate
endmodule
代码比较繁琐,但是综合出的电路没有多余的逻辑。
这样的加法器树可以达到较高的吞吐率,但是相当费资源。
加流水线的写法和原作者写法有差异,如有纰漏,欢迎指出。
更多推荐
所有评论(0)