3D Point Cloud Deep Learning: PointNet Source Code Walkthrough, transform_nets.py (T-Net)

夜晓岚渺渺 · Published 2019-10-24 20:58:46

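Reference: https://blog.csdn.net/u014636245/article/details/82763269

transform_nets.py implements the T-Net. It contains two networks: one transforms the input point cloud and the other transforms the extracted features.

The T-Net is a mini network that predicts a transformation matrix used to normalize the point cloud against geometric changes such as rotation. This transform/alignment network is itself a miniature PointNet: it takes the raw point cloud as input and outputs a 3×3 transformation matrix. The matrix is not constrained to be a pure rotation, and because a 3×3 matrix acting on xyz coordinates is a linear map, it cannot express a translation by itself.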
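To make the alignment step concrete, here is a minimal NumPy sketch of how a predicted 3×3 matrix could be applied to a batch of point clouds. The batch size, point count and variable names are assumptions chosen for illustration, and the matrices are hand-written stand-ins for what the T-Net would predict.

```python
import numpy as np

# Hypothetical small batch: B=2 point clouds with N=4 points each (x, y, z).
B, N = 2, 4
rng = np.random.default_rng(0)
point_cloud = rng.normal(size=(B, N, 3)).astype(np.float32)

# Stand-in for the T-Net output: one 3x3 matrix per cloud.
# Cloud 0 gets the identity, cloud 1 a quarter-turn about the z axis.
quarter_turn_z = np.array([[0, -1, 0],
                           [1,  0, 0],
                           [0,  0, 1]], dtype=np.float32)
transform = np.stack([np.eye(3, dtype=np.float32), quarter_turn_z])  # (B, 3, 3)

# Batched matrix product: (B, N, 3) @ (B, 3, 3) -> (B, N, 3).
# Each point is treated as a row vector multiplied by its cloud's matrix.
aligned = np.matmul(point_cloud, transform)

print(aligned.shape)
```

The identity leaves the first cloud unchanged, while the second cloud has its x and y coordinates swapped with one sign flipped, and z is untouched.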

# Imports; tf_util is a utility module written by the PointNet authors
import tensorflow as tf
import numpy as np
import sys
import os
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(BASE_DIR)
sys.path.append(os.path.join(BASE_DIR, '../utils'))
import tf_util

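I. Annotated code for the input transform.

# K=3 means the input is the raw point cloud; K is the dimension of each point (x, y, z).
# point_cloud is a Tensor with the following properties:
# point_cloud=Tensor("Placeholder:0", shape=(32, 1024, 3), dtype=float32, device=/device:GPU:0)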
def input_transform_net(point_cloud, is_training, bn_decay=None, K=3):
    """ Input (XYZ) Transform Net, input is BxNx3 gray image
        Return:
            Transformation matrix of size 3xK """
    batch_size = point_cloud.get_shape()[0].value # number of point clouds in one batch (32 in PointNet)
    num_point = point_cloud.get_shape()[1].value  # number of points in each cloud (1024 in PointNet)

    input_image = tf.expand_dims(point_cloud, -1) # append a trailing axis: BxNx3 becomes BxNx3x1 (rank-3 tensor --> rank-4 tensor)

    # point_cloud has 3 axes (B×N×3). tf.expand_dims(point_cloud, -1) appends an axis of size 1,
    # giving input_image (B×N×3×1), so input_image has a single channel.

    # After this layer net is a 4-D Tensor of shape B×N×1×64, e.g. (32, 1024, 1, 64).
    # 64 is the number of output channels (1 channel becomes 64).
    # [1,3] is the kernel size (1 row, 3 columns); it turns B×N×3×1 into B×N×1×64.
    # stride=[1,1]: the kernel moves one step at a time along each axis.
    # padding='VALID': no padding is added around the input.
    # bn=True: apply batch normalization.
    # is_training=is_training: selects training or inference mode (used by batch normalization).
    # bn_decay=bn_decay: decay rate for the batch normalization moving averages.
    net = tf_util.conv2d(input_image, 64, [1,3],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv1', bn_decay=bn_decay)

    # 128 is the number of output channels.
    # [1,1] is a 1x1 kernel; it turns B×N×1×64 into B×N×1×128.
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv2', bn_decay=bn_decay)

    # 1024 is the number of output channels.
    # [1,1] is a 1x1 kernel; it turns B×N×1×128 into B×N×1×1024.
    net = tf_util.conv2d(net, 1024, [1, 1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv3', bn_decay=bn_decay)

    # Max-pool the previous output over the N points: B×N×1×1024 becomes B×1×1×1024.
    net = tf_util.max_pool2d(net, [num_point, 1], padding='VALID', scope='tmaxpool')

    # Use the 1024-dimensional global feature to produce a 256-dimensional feature vector.
    # First flatten Bx1x1x1024 to Bx1024.
    net = tf.reshape(net, [batch_size, -1])

    # Bx1024 becomes Bx512
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training,
                                  scope='tfc1', bn_decay=bn_decay)

    # Bx512 becomes Bx256
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training,
                                  scope='tfc2', bn_decay=bn_decay)

    with tf.variable_scope('transform_XYZ') as sc:
        assert(K==3)
        # weights: a variable of shape [256, 9], dtype=tf.float32, initialized to zero
        weights = tf.get_variable('weights', [256, 3*K],
                                  initializer=tf.constant_initializer(0.0),
                                  dtype=tf.float32)

        #<tf.Variable 'transform_net1/transform_XYZ/biases:0' shape=(9,) dtype=float32_ref>
        biases = tf.get_variable('biases', [3*K],
                                 initializer=tf.constant_initializer(0.0),
                                 dtype=tf.float32)

        #<tf.Variable 'transform_net1/transform_XYZ/biases:0' shape=(9,) dtype=float32_ref>
        # after adding the flattened 3x3 identity, becomes
        #Tensor("transform_net1/transform_XYZ/add:0", shape=(9,), dtype=float32, device=/device:GPU:0)
        biases += tf.constant([1,0,0,0,1,0,0,0,1], dtype=tf.float32)


        # net has shape (32, 256) and weights has shape (256, 9), so net x weights gives transform of shape (32, 9)
        # Tensor("transform_net1/transform_XYZ/MatMul:0", shape=(32, 9), dtype=float32, device=/device:GPU:0)
        transform = tf.matmul(net, weights)

        # Tensor("transform_net1/transform_XYZ/MatMul:0", shape=(32, 9), dtype=float32, device=/device:GPU:0)
        # becomes
        # Tensor("transform_net1/transform_XYZ/BiasAdd:0", shape=(32, 9), dtype=float32, device= / device: GPU:0)
        transform = tf.nn.bias_add(transform, biases)


    # From Tensor("transform_net1/transform_XYZ/BiasAdd:0", shape=(32, 9), dtype=float32, device=/device:GPU:0)
    # to
    # Tensor("transform_net1/Reshape_1:0", shape=(32, 3, 3), dtype=float32, device=/device:GPU:0)
    transform = tf.reshape(transform, [batch_size, 3, K])

    return transform
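The shape changes annotated above can be traced with a small NumPy sketch. This is not the PointNet implementation: batch normalization is omitted, the weights are random, and `pointwise_layer` is a hypothetical helper standing in for `tf_util.conv2d` and `tf_util.fully_connected`. It relies on the fact that a [1,3] kernel with 'VALID' padding over a B×N×3×1 input covers exactly one point's (x, y, z), so it behaves like a 3-to-64 linear map applied to each point.

```python
import numpy as np

B, N = 2, 16  # hypothetical small batch size and point count
rng = np.random.default_rng(0)
points = rng.normal(size=(B, N, 3))

def pointwise_layer(x, out_channels):
    """Linear map on the last axis followed by ReLU (batch norm omitted)."""
    w = rng.normal(size=(x.shape[-1], out_channels)) * 0.1
    return np.maximum(x @ w, 0.0)

net = pointwise_layer(points, 64)    # tconv1: (B, N, 64)
net = pointwise_layer(net, 128)      # tconv2: (B, N, 128)
net = pointwise_layer(net, 1024)     # tconv3: (B, N, 1024)

# tmaxpool: the max over the N points gives one global feature per cloud.
net = net.max(axis=1)                # (B, 1024)

net = pointwise_layer(net, 512)      # tfc1: (B, 512)
net = pointwise_layer(net, 256)      # tfc2: (B, 256)

# transform_XYZ: zero-initialized weights plus a bias holding the flattened identity.
weights = np.zeros((256, 9))
biases = np.zeros(9) + np.array([1, 0, 0, 0, 1, 0, 0, 0, 1], dtype=np.float64)
transform = (net @ weights + biases).reshape(B, 3, 3)

print(transform.shape)
print(transform[0])
```

Because the weights start at zero, the predicted matrix is the identity for every input at initialization, so the network begins by leaving the point cloud unchanged and only learns a non-trivial alignment during training.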

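Notes on some of the functions used in the code:

`1. tf.reshape()`

Conceptually, reshape flattens the tensor t into one dimension and then refills the new shape with those elements in the same order. At most one entry of the target shape may be -1, which means "infer this dimension": its size is computed from the total number of elements and the other dimensions. The product of the target dimensions must always equal the number of elements in the tensor.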
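The -1 rule can be checked with NumPy, whose `reshape` follows the same convention as `tf.reshape` here. The array below is only a stand-in for the pooled B×1×1×1024 feature with an assumed batch size of 4.

```python
import numpy as np

# Stand-in for the pooled feature of shape Bx1x1x1024 (B=4 is an assumption).
pooled = np.arange(4 * 1024, dtype=np.float32).reshape(4, 1, 1, 1024)

# -1 asks reshape to infer the remaining dimension: 4*1*1*1024 / 4 = 1024.
flat = np.reshape(pooled, (4, -1))
print(flat.shape)  # (4, 1024)

# Element order is preserved: row b of flat is pooled[b] flattened.
same_order = np.array_equal(flat[2], pooled[2].ravel())

# A target shape that does not divide the element count is rejected.
try:
    np.reshape(pooled, (3, -1))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(same_order, mismatch_rejected)
```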

`2. tf.expand_dims`

Reference blog: https://blog.csdn.net/qq_41853758/article/details/82718493

tf.expand_dims(input,
               axis=None,
               name=None,
               dim=None)

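Inserts a dimension of size 1 at index axis of the input's shape. The index is zero-based; a negative axis counts backward from the end. This is useful for adding a batch dimension to a single element. For example, given a single image of shape [height, width, channels], expand_dims(image, 0) turns it into a batch of one image with shape [1, height, width, channels].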

For example:

# 't' is a tensor of shape [2]

shape(expand_dims(t, 0)) ==> [1, 2]

shape(expand_dims(t, 1)) ==> [2, 1]

shape(expand_dims(t, -1)) ==> [2, 1]

# 't2' is a tensor of shape [2, 3, 5]

shape(expand_dims(t2, 0)) ==> [1, 2, 3, 5]

shape(expand_dims(t2, 2)) ==> [2, 3, 1, 5]

shape(expand_dims(t2, 3)) ==> [2, 3, 5, 1]

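Args:
input: A Tensor.
axis: 0-D (scalar). The dimension index at which to expand the shape of input.
name: The name of the output Tensor.
dim: 0-D (scalar). Equivalent to axis; deprecated.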
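Applied to the point cloud in this file, the call adds the channel axis that `conv2d` expects. The sketch below uses `np.expand_dims`, which takes the same axis argument, on a zero-filled array of the shape quoted earlier (32×1024×3); the array contents are placeholders.

```python
import numpy as np

point_cloud = np.zeros((32, 1024, 3), dtype=np.float32)  # B x N x 3

# axis=-1 appends the new axis at the end: B x N x 3 x 1.
input_image = np.expand_dims(point_cloud, -1)
print(input_image.shape)  # (32, 1024, 3, 1)

# axis=0 would instead add a leading axis: 1 x B x N x 3.
leading = np.expand_dims(point_cloud, 0)
print(leading.shape)  # (1, 32, 1024, 3)
```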

`3. tf.get_variable()`

Gets an existing variable or creates a new one.

get_variable(name,
                 shape=None,
                 dtype=None,
                 initializer=None,
                 regularizer=None,
                 trainable=None,
                 collections=None,
                 caching_device=None,
                 partitioner=None,
                 validate_shape=True,
                 use_resource=None,
                 custom_getter=None,
                 constraint=None,
                 synchronization=VariableSynchronization.AUTO,
                 aggregation=VariableAggregation.NONE)

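Args:

name: The name of the new or existing variable.

shape: The shape of the new or existing variable.

dtype: The type of the new or existing variable (defaults to DT_FLOAT).

initializer: The initializer for the variable, used if one is created.

regularizer: A (Tensor -> Tensor or None) function; the result of applying it to a newly created variable is added to the collection tf.GraphKeys.REGULARIZATION_LOSSES and can be used for regularization.

trainable: If True, also adds the variable to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).

collections: List of graph collection keys to add the variable to. Defaults to [GraphKeys.GLOBAL_VARIABLES] (see tf.Variable).

caching_device: Optional device string or function describing where the variable should be cached for reading. Defaults to the variable's own device. If not None, the variable is cached on another device. A typical use is to cache on the device where the Ops that use the variable reside, to deduplicate copying through Switch and other conditional statements.

partitioner: Optional callable that accepts a fully defined TensorShape and the dtype of the variable to be created, and returns a list of partitions for each axis (currently only one axis can be partitioned).

validate_shape: If False, allows the variable to be initialized with a value of unknown shape. If True (the default), the shape of initial_value must be known.

use_resource: If False, creates a regular Variable. If True, creates an experimental ResourceVariable with well-defined semantics. Defaults to False (will later change to True). In Eager mode this argument is always forced to True.

custom_getter: A callable that takes the true getter as its first argument and allows overriding the internal get_variable method. Its signature should match that of this method, but the most future-proof form allows for changes: def custom_getter(getter, *args, **kwargs). Direct access to all get_variable parameters is also allowed: def custom_getter(getter, name, *args, **kwargs). A simple identity custom getter that only creates variables with modified names is: def custom_getter(getter, name, *args, **kwargs): return getter(name + '_suffix', *args, **kwargs)

Note: if initializer is None (the default), the initializer defined in variable_scope() is used; if that is also None, glorot_uniform_initializer is used. The initializer can also be a Tensor, in which case the variable takes that tensor's value and shape.

The regularizer defaults to None. If it is not specified, the regularizer defined in variable_scope() is used; if that is also None, no regularization is applied.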

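4. tf.nn.bias_add

In plain terms:

A vector called bias is added to a matrix called value: the vector is added to every row of the matrix, and the result has the same shape as value. The bias must be 1-D and its size must match the last dimension of value.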

import tensorflow as tf
a=tf.constant([[1,1],[2,2],[3,3]],dtype=tf.float32)
b=tf.constant([1,-1],dtype=tf.float32)
c=tf.constant([1],dtype=tf.float32)
 
with tf.Session() as sess:
    print('a:')
    print(sess.run(a))
    print('b:')
    print(sess.run(b))
    print('bias_add:')
    print(sess.run(tf.nn.bias_add(a, b)))
    # The next line raises an error: c has size 1, but the last dimension of a is 2
    #print(sess.run(tf.nn.bias_add(a, c)))

Output:
a:
[[1. 1.]
 [2. 2.]
 [3. 3.]]
b:
[ 1. -1.]
bias_add:
[[2. 0.]
 [3. 1.]
 [4. 2.]]

5. tf.matmul() multiplies matrix a by matrix b, producing a * b.

Reference: https://blog.csdn.net/mumu_1233/article/details/78887068

Function: tf.matmul

matmul(
    a,
    b,
    transpose_a=False,
    transpose_b=False,
    adjoint_a=False,
    adjoint_b=False,
    a_is_sparse=False,
    b_is_sparse=False,
    name=None
)

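Args:
a: A Tensor of type float16, float32, float64, int32, complex64 or complex128, with rank > 1.
b: A Tensor of the same type as a.
transpose_a: If True, a is transposed before multiplication.
transpose_b: If True, b is transposed before multiplication.
adjoint_a: If True, a is conjugated and transposed before multiplication.
adjoint_b: If True, b is conjugated and transposed before multiplication.
a_is_sparse: If True, a is treated as a sparse matrix.
b_is_sparse: If True, b is treated as a sparse matrix.
name: Name for the operation (optional).
Returns: A Tensor of the same type as a and b, in which each inner-most matrix is the product of the corresponding matrices in a and b.
Notes:
(1) The inputs must be matrices (or tensors of rank > 2, representing batches of matrices) whose inner dimensions match after any transposition.
(2) Both matrices must have the same type. Supported types: float16, float32, float64, int32, complex64, complex128.
Raises:
ValueError: if transpose_a and adjoint_a, or transpose_b and adjoint_b, are both set to True.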

import tensorflow as tf
a = tf.constant([1, 2, 3, 4, 5, 6], shape=[3, 2])
b = tf.constant([7, 8, 9, 10], shape=[2, 2])
# Note: the shapes of a and b must be compatible for matrix multiplication.
c = tf.matmul(a, b)
with tf.Session() as sess:
    print('a:')
    print(sess.run(a))
    print('b:')
    print(sess.run(b))
    print('matmul:')
    print(sess.run(c))

Output:
a:
[[1 2]
 [3 4]
 [5 6]]
b:
[[ 7  8]
 [ 9 10]]
matmul:
[[ 25  28]
 [ 57  64]
 [ 89 100]]



import tensorflow as tf
import numpy as np

# 3-D tensor `a`
  # [[[ 1,  2,  3],
  #   [ 4,  5,  6]],
  #  [[ 7,  8,  9],
  #   [10, 11, 12]]]
a = tf.constant([1,2,3,4,5,6,7,8,9,10,11,12],shape=[2,2,3])
print(a.shape)
  # 3-D tensor `b`
  # [[[13, 14],
  #   [15, 16],
  #   [17, 18]],
  #  [[19, 20],
  #   [21, 22],
  #   [23, 24]]]
b= tf.constant(np.arange(13, 25, dtype=np.int32),
                  shape=[2, 3, 2])
print(b.shape)

  # `a` * `b`
  # [[[ 94, 100],
  #   [229, 244]],
  #  [[508, 532],
  #   [697, 730]]]
c = tf.matmul(a, b)
print(c.shape)
print("\n")

with tf.Session() as sess:
    print("a:")
    print(sess.run(a))
    print("b:")
    print(sess.run(b))
    print("c:")
    print(sess.run(c))

Output:
(2, 2, 3)
(2, 3, 2)
(2, 2, 2)


a:
[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
b:
[[[13 14]
  [15 16]
  [17 18]]

 [[19 20]
  [21 22]
  [23 24]]]
c:
[[[ 94 100]
  [229 244]]

 [[508 532]
  [697 730]]]

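6. tf.multiply() multiplies the corresponding elements of two tensors

Format: tf.multiply(x, y, name=None)
Args:
x: A Tensor of type half, float32, float64, uint8, int8, uint16, int16, int32, int64, complex64 or complex128.
y: A Tensor of the same type as x.
Returns: x * y element-wise.
Notes:
(1) multiply performs element-wise multiplication: each element is multiplied by the element in the same position. It is not matrix multiplication, so do not confuse it with tf.matmul.
(2) The two operands must have the same data type, otherwise an error is raised.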

import tensorflow as tf
a1 = tf.constant([1, 2, 3, 4], shape=[2, 2])
b1 = tf.constant([7, 8, 9, 10], shape=[2, 2])
# Note: for element-wise multiplication a1 and b1 must have the same shape (or broadcastable shapes).
c1 = tf.multiply(a1, b1)
with tf.Session() as sess:
    print('a1:')
    print(sess.run(a1))
    print('b1:')
    print(sess.run(b1))
    print('multiply:')
    print(sess.run(c1))

Output:
a1:
[[1 2]
 [3 4]]
b1:
[[ 7  8]
 [ 9 10]]
multiply:
[[ 7 16]
 [27 40]]
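The difference between the element-wise product and the matrix product is easiest to see on the same pair of matrices. The sketch below uses NumPy's `np.multiply` and `np.matmul` as stand-ins for `tf.multiply` and `tf.matmul`; the input values are the ones from the example above.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[7, 8], [9, 10]])

# Element-wise: each entry is a[i, j] * b[i, j].
elementwise = np.multiply(a, b)

# Matrix product: each entry is the dot product of row i of a and column j of b.
matrix_product = np.matmul(a, b)

print(elementwise)
print(matrix_product)
```

The element-wise result is [[7, 16], [27, 40]], matching the tf.multiply output above, while the matrix product of the same inputs is [[25, 28], [57, 64]].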

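Below are a few related functions; see the blog https://blog.csdn.net/mieleizhi0522/article/details/80416668

7. tf.add(x, y, name=None):

In plain terms:

There are several cases. The most common one is adding a matrix x and a scalar y: y is added to every element of x, and the result has the same shape as x. More generally, x and y are broadcast against each other before the addition.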

import tensorflow as tf    
  
x=tf.constant([[1,2],[1,2]])    
y=tf.constant([[1,1],[1,2]])  
z=tf.add(x,y)  
  
x1=tf.constant(1)  
y1=tf.constant(2)  
z1=tf.add(x1,y1)  
  
x2=tf.constant(2)  
y2=tf.constant([1,2])  
z2=tf.add(x2,y2)  
  
x3=tf.constant([[1,2],[3,4]])    
y3=tf.constant([[1,1]])  
z3=tf.add(x3,y3)  
  
with tf.Session() as sess:  
    z_result,z1_result,z2_result,z3_result=sess.run([z,z1,z2,z3])  
    print('z =\n%s'%(z_result))  
    print('z1 =%s'%(z1_result))  
    print('z2 =%s'%(z2_result))  
    print('z3 =%s'%(z3_result))


The output is:

z =
[[2 3]
 [2 4]]
z1 =3
z2 =[3 4]
z3 =[[2 3]
 [4 5]]

tf.add_n(inputs,name=None)

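In plain terms: add_n sums the elements of a list. The input is a list whose elements can be vectors, matrices and so on. Unlike tf.add it does not broadcast, so all elements must have the same shape.

import tensorflow as tf

input1 = tf.constant([1.0, 2.0, 3.0])
input2 = tf.Variable(tf.random_uniform([3]))
output = tf.add_n([input1, input2])  # note that the input is a list

with tf.Session() as sess:
    # tf.global_variables_initializer() replaces the deprecated tf.initialize_all_variables()
    sess.run(tf.global_variables_initializer())
    print(sess.run(input1 + input2))
    print(sess.run(output))

Output (the values differ between runs because input2 is random):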

[1.4135424 2.694611 3.2243743]
[1.4135424 2.694611 3.2243743]

II. The feature transform code. It is read the same way as the input transform code above.

# The input is a Tensor: inputs = Tensor("conv2/Relu:0", shape=(32, 1024, 1, 64), dtype=float32, device=/device:GPU:0)
def feature_transform_net(inputs, is_training, bn_decay=None, K=64):
    """ Feature Transform Net, input is BxNx1xK
        Return:
            Transformation matrix of size KxK """
    batch_size = inputs.get_shape()[0].value
    num_point = inputs.get_shape()[1].value

    net = tf_util.conv2d(inputs, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv2', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 1024, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv3', bn_decay=bn_decay)
    net = tf_util.max_pool2d(net, [num_point,1],
                             padding='VALID', scope='tmaxpool')

    net = tf.reshape(net, [batch_size, -1])
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training,
                                  scope='tfc1', bn_decay=bn_decay)
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training,
                                  scope='tfc2', bn_decay=bn_decay)

    with tf.variable_scope('transform_feat') as sc:
        weights = tf.get_variable('weights', [256, K*K],
                                  initializer=tf.constant_initializer(0.0),
                                  dtype=tf.float32)
        biases = tf.get_variable('biases', [K*K],
                                 initializer=tf.constant_initializer(0.0),
                                 dtype=tf.float32)
        biases += tf.constant(np.eye(K).flatten(), dtype=tf.float32)
        transform = tf.matmul(net, weights)
        transform = tf.nn.bias_add(transform, biases)

    transform = tf.reshape(transform, [batch_size, K, K])
    return transform
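The only structural differences from the input transform are the K×K output and the bias built from `np.eye(K).flatten()`. The NumPy sketch below checks that this flattened identity reshapes back to a K×K identity and shows one way such a matrix could be applied to per-point features. The shapes B and N and the use of a batched matmul on B×N×K features are assumptions made for illustration.

```python
import numpy as np

K = 64
B, N = 2, 8  # hypothetical batch size and point count
rng = np.random.default_rng(0)

# The bias offset used in transform_feat: a K*K vector that is 1 at the diagonal positions.
identity_bias = np.eye(K).flatten()          # shape (4096,)

# With zero-initialized weights the layer output equals the bias for any input.
net = rng.normal(size=(B, 256))
weights = np.zeros((256, K * K))
transform = (net @ weights + identity_bias).reshape(B, K, K)

# Apply the per-cloud matrix to per-point features: (B, N, K) @ (B, K, K) -> (B, N, K).
features = rng.normal(size=(B, N, K))
aligned = np.matmul(features, transform)

print(identity_bias.shape, transform.shape, aligned.shape)
```

At initialization the features pass through unchanged, which mirrors the behaviour of the 3×3 input transform.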
