Below is the reference answer to the first homework assignment of the 2020 Artificial Neural Networks course.
 

01 Reference Answer to Problem 1


1. Weight-learning formulas for the two network structures

(1) Structure 1

Structure 1 is a standard layered feed-forward network (two layers in this assignment). Using the BP algorithm, the update formulas for the weights of each layer can be written out from the error back-propagation relations.

The lecture notes give the update formula for the weight $w_{ij}^n$ of the $n$-th layer in a network with $h+1$ layers:
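Written out in the notation defined below, the update takes the standard BP form (a reconstruction from the quantities introduced next, since the original slide is an image; treat the exact notation as an assumption rather than a verbatim copy of the lecture slide):

$\Delta w_{ij}^{n} = \eta \cdot \delta_j^{n} \cdot y_i^{n-1}, \qquad \delta_j^{n} = \Bigl(\sum_k \delta_k^{n+1}\, w_{jk}^{n+1}\Bigr) \cdot y_j^{\prime\, n}$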

Here $w_{ij}^n$ denotes the weight connecting the $i$-th neuron of layer $n-1$ to the $j$-th neuron of layer $n$.

▲ Schematic of a feed-forward network with $h+1$ layers

In the formula:

  • $\delta_k^{n+1}$: the learning signal of the $k$-th neuron in layer $n+1$, propagated backwards from the output layer of the network. For the output layer, the learning signal $\delta_k^{o}$ is defined as the error: $\delta_k^{o} = d_k - o_k$.

  • $y_j^{\prime\, n}$: the derivative of the output of the $j$-th neuron in layer $n$;

  • $y_i^{n-1}$: the output of the $i$-th neuron in layer $n-1$.

Essentially, all of the weight-update formulas above follow the unified neuron learning rule

$\Delta \bar w = \eta \cdot r(\bar x, \bar w, d) \cdot \bar x$

that is, the adjustment $\Delta \bar w$ of a neuron's weight vector $\bar w$ is proportional to the product of the learning signal $r(\bar x, \bar w, d)$ and the input vector $\bar x$.

Using the formulas above, the update formula for every weight in Structure 1 can be written out. The following conventions are used for the network variables:

  • $\eta$ is the learning rate.
  • $o_1, o_2, o_3$ are the outputs of neurons 1, 2, 3 of the network;
  • the outputs of input nodes 4 and 5 are $x_1$ and $x_2$ respectively.

All neurons in Structure 1 use the sigmoid activation function: $f(x) = \dfrac{1}{1 + e^{-x}}$

The corresponding derivative is:
$f'(x) = \dfrac{e^{-x}}{\left(1 + e^{-x}\right)^2} = f(x) \cdot \left[1 - f(x)\right]$

Thus the output derivatives of neurons 1, 2 and 3 can be written as:

$o_i \cdot (1 - o_i), \quad i = 1, 2, 3$
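As a quick numerical sanity check of this identity (an added illustration, not part of the original answer), the analytic derivative $f(x)\,[1 - f(x)]$ can be compared against a central finite difference:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)
h = 1e-6
numeric  = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)   # central-difference estimate of f'(x)
analytic = sigmoid(x) * (1.0 - sigmoid(x))                 # f(x) * (1 - f(x))
print(np.max(np.abs(numeric - analytic)))                  # difference is only finite-difference error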

  • Output-layer neuron weights:

Learning signal of neuron 1: $\delta_1 = (d - o_1) \cdot o_1 \cdot (1 - o_1)$

Its three weight-update formulas are then:
$\Delta w_{12} = \eta \cdot (d - o_1) \cdot o_1 (1 - o_1) \cdot o_2$

$\Delta w_{13} = \eta \cdot (d - o_1) \cdot o_1 (1 - o_1) \cdot o_3$

$\Delta w_{10} = -\eta \cdot (d - o_1) \cdot o_1 (1 - o_1)$

Note: for the weight $w_{10}$, the corresponding network input is fixed at $-1$, so its update formula carries a minus sign $(-1)$. The same applies to $w_{20}$ and $w_{30}$ below.

  • Hidden-layer neuron weights:
    The learning signals of hidden neurons 2 and 3 are, respectively:
    $\delta_2 = w_{12} \cdot \delta_1 \cdot o_2 \cdot (1 - o_2)$

$\delta_3 = w_{13} \cdot \delta_1 \cdot o_3 \cdot (1 - o_3)$

Therefore the update formulas for the hidden-layer weights are:

$\Delta w_{24} = \eta \cdot w_{12} \cdot \delta_1 \cdot o_2 \cdot (1 - o_2) \cdot x_1$

$\Delta w_{25} = \eta \cdot w_{12} \cdot \delta_1 \cdot o_2 \cdot (1 - o_2) \cdot x_2$

$\Delta w_{34} = \eta \cdot w_{13} \cdot \delta_1 \cdot o_3 \cdot (1 - o_3) \cdot x_1$

$\Delta w_{35} = \eta \cdot w_{13} \cdot \delta_1 \cdot o_3 \cdot (1 - o_3) \cdot x_2$

$\Delta w_{20} = -\eta \cdot w_{12} \cdot \delta_1 \cdot o_2 \cdot (1 - o_2)$

$\Delta w_{30} = -\eta \cdot w_{13} \cdot \delta_1 \cdot o_3 \cdot (1 - o_3)$
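Collecting these formulas, one on-line training step for Structure 1 can be sketched directly in Python/NumPy. This is a minimal illustration of the update rules above; the variable names mirror the symbols used here and are not taken from the appendix program:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def structure1_step(w, x1, x2, d, eta=0.5):
    # Forward pass: hidden neurons 2 and 3 receive x1, x2 and a bias input fixed at -1.
    o2 = sigmoid(w['w24'] * x1 + w['w25'] * x2 - w['w20'])
    o3 = sigmoid(w['w34'] * x1 + w['w35'] * x2 - w['w30'])
    # Output neuron 1 receives o2, o3 and a bias input fixed at -1.
    o1 = sigmoid(w['w12'] * o2 + w['w13'] * o3 - w['w10'])

    # Learning signals delta_1, delta_2, delta_3 from the formulas above.
    d1 = (d - o1) * o1 * (1.0 - o1)
    d2 = w['w12'] * d1 * o2 * (1.0 - o2)
    d3 = w['w13'] * d1 * o3 * (1.0 - o3)

    # Weight updates; the bias weights pick up the extra (-1) factor.
    w['w12'] += eta * d1 * o2
    w['w13'] += eta * d1 * o3
    w['w10'] += -eta * d1
    w['w24'] += eta * d2 * x1
    w['w25'] += eta * d2 * x2
    w['w20'] += -eta * d2
    w['w34'] += eta * d3 * x1
    w['w35'] += eta * d3 * x2
    w['w30'] += -eta * d3
    return w, o1

One epoch of on-line learning over the four XOR samples then simply calls structure1_step once per sample.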

(2) Structure 2

Compared with Structure 1, Structure 2 is not a strictly layered network: the input signals $x_3, x_4$ bypass the hidden layer and connect directly to the output layer.

The network contains only two neurons, 1 and 2, whose outputs are denoted $o_1, o_2$. The outputs of input nodes 3 and 4 are simply the input signals $x_3, x_4$.

The neurons use the hyperbolic-tangent (bipolar sigmoid) transfer function:

$f(x) = \dfrac{1 - e^{-x}}{1 + e^{-x}}, \qquad f'(x) = \dfrac{1}{2}\left[1 - f^2(x)\right]$

Since the Structure-2 network is fairly simple, the update formulas for the weights of its two neurons can be written directly from formula (1-2):

  • Neuron 1:
    Learning signal: $\delta_1 = (d - o_1) \cdot \dfrac{1}{2}\left(1 - o_1^2\right)$
    Its four weight-update formulas are:
    $\Delta w_{13} = \eta \cdot \delta_1 \cdot x_3$

$\Delta w_{14} = \eta \cdot \delta_1 \cdot x_4$

$\Delta w_{12} = \eta \cdot \delta_1 \cdot o_2$

$\Delta w_{10} = \eta \cdot \delta_1 \cdot (-1)$

  • Neuron 2:

Learning signal:
$\delta_2 = w_{12} \cdot \delta_1 \cdot \dfrac{1}{2}\left(1 - o_2^2\right)$

Its three weight-update formulas are:

$\Delta w_{23} = \eta \cdot \delta_2 \cdot x_3$

$\Delta w_{24} = \eta \cdot \delta_2 \cdot x_4$

$\Delta w_{20} = \eta \cdot \delta_2 \cdot (-1)$
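Analogously, one on-line update step for Structure 2 implementing exactly the formulas above (both neurons using the bipolar sigmoid and bias inputs fixed at $-1$) can be sketched as follows; note that the appendix program instead uses a linear output for neuron 1:

import numpy as np

def f_bipolar(x):
    # f(x) = (1 - e^-x) / (1 + e^-x), the bipolar sigmoid used above (equal to tanh(x/2))
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

def structure2_step(w, x3, x4, d, eta=0.5):
    # Forward pass: neuron 2 sees only the inputs; neuron 1 sees the inputs plus o2.
    o2 = f_bipolar(w['w23'] * x3 + w['w24'] * x4 - w['w20'])
    o1 = f_bipolar(w['w13'] * x3 + w['w14'] * x4 + w['w12'] * o2 - w['w10'])

    # Learning signals, using f'(x) = (1 - f(x)^2) / 2.
    d1 = (d - o1) * 0.5 * (1.0 - o1 ** 2)
    d2 = w['w12'] * d1 * 0.5 * (1.0 - o2 ** 2)

    # Weight updates; the bias weights use the fixed input -1.
    w['w13'] += eta * d1 * x3
    w['w14'] += eta * d1 * x4
    w['w12'] += eta * d1 * o2
    w['w10'] += eta * d1 * (-1.0)
    w['w23'] += eta * d2 * x3
    w['w24'] += eta * d2 * x4
    w['w20'] += eta * d2 * (-1.0)
    return w, o1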

2. Implementing the above algorithms in a programming language

Since Structure 1 is a typical feed-forward network, the algorithm can be implemented on a standard neural-network platform such as MATLAB, TensorFlow, Keras, PaddlePaddle, and so on.
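For instance, a minimal sketch of Structure 1 in Keras (assuming TensorFlow 2.x; the hidden-layer size, linear output, MSE loss and learning rate follow the description in this answer and are not a prescribed reference implementation):

import numpy as np
from tensorflow import keras

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
Y = np.array([0, 1, 1, 0], dtype=float)

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(2, activation='sigmoid'),   # hidden layer (neurons 2, 3)
    keras.layers.Dense(1),                         # linear output (neuron 1)
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.5), loss='mse')
model.fit(X, Y, epochs=5000, verbose=0)
print(model.predict(X).ravel())                    # typically approaches [0, 1, 1, 0]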

Structure 2, however, is not a standard feed-forward network, so implementing it requires a general-purpose programming language such as C or Python.

(1) Implementing the first structure in Python

The Python program for the first structure is given in the appendix: Program for the first structure (BP network).

Network parameters:

  • Hidden layer with 2 nodes; the output node uses a linear transfer function.
  • Learning rate: 0.5.

The figure below shows how the network error decreases during training.
▲ Training-error convergence curve

After training, the computed results for the XOR problem are:

Sample            S1           S2           S3           S4
Expected output   0            1            1            0
Network output    0.0404143    0.94139378   0.92436339   0.09499248

Discussion 1: Using a linear transfer function at the output is equivalent, as far as the output-layer gradient is concerned, to not using the mean-squared error $L_p = \dfrac{1}{m}\sum\limits_{i=1}^{m}\left(\bar y_i - y_i\right)^2$ as the loss function,
but instead using the log-likelihood loss:
$L\left(\bar y, y\right) = -\left[\, y \cdot \log \bar y + (1 - y) \cdot \log\left(1 - \bar y\right)\,\right]$
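To make this equivalence explicit (a short derivation added for clarity, not part of the original answer): for a sigmoid output $\bar y = f(z)$ trained with the log-likelihood loss,

$\dfrac{\partial L}{\partial z} = \left(-\dfrac{y}{\bar y} + \dfrac{1 - y}{1 - \bar y}\right) \cdot \bar y\,(1 - \bar y) = \bar y - y$

which, up to a constant factor, is the same output-layer error term $(A_2 - Y)$ produced by a linear output trained with the squared error; this is exactly the term computed by the line dZ2 = (A2 - Y) in the appendix program.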

Discussion 2: If the output transfer function is a sigmoid while the loss is still the mean-squared error, the network above fails to converge on the XOR problem during training. For the logical AND it can converge to a result with a minimum error of roughly 0.05.

(2) Implementing the second structure in Python

Neuron 1 uses a linear output transfer function; neuron 2 uses the hyperbolic-tangent transfer function.

The Python program is given in the appendix at the end of this article: Program for the second structure.

Below is the training-error convergence when the learning rate is 0.5 and the XOR samples are represented with (-1, 1):
▲ Error convergence curve when training with bipolar samples

Network outputs:

Sample            S1             S2            S3            S4
Expected output   -1             1             1             -1
Actual output     -0.99503755    0.99212455    0.99503753    -0.99503758
  • Discussion 1: With the XOR logic represented by (0, 1), the training above does not converge; with the (-1, 1) representation, training converges quickly.

 

※ Programs for Assignment 1-1


1. Program for the first structure (BP network)

#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# HWDATA.PY                    -- by Dr. ZhuoQing 2020-11-17
#
# Note:
#============================================================

from headm import *
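# headm is the author's own helper module; it is assumed to re-export the numpy
# names used below (array, dot, exp, zeros, sum, arange, random, ...),
# matplotlib.pyplot as plt, and a printf helper.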

#------------------------------------------------------------
# Samples data construction

xor_x = array([[0,0],[1,0],[0,1],[1,1]])       # row->sample
xor_y = array([0, 1, 1, 0]).reshape(1, -1)     # col->sample

xor_x0 = array([[-1,-1],[1,-1],[-1,1],[1,1]])  # row->sample
xor_y0 = array([-1, 1, 1, -1]).reshape(1, -1)  # col->sample

#------------------------------------------------------------
def shuffledata(X, Y):
    id = list(range(X.shape[0]))
    random.shuffle(id)
    return X[id], (Y.T[id]).T

#------------------------------------------------------------
# Define and initialize the NN
def initialize_parameters(n_x, n_h, n_y):
    random.seed(2)

    W1 = random.randn(n_h, n_x) * 0.5          # dot(W1,X.T)
    W2 = random.randn(n_y, n_h) * 0.5          # dot(W2,Z1)
    b1 = zeros((n_h, 1))                       # Column vector
    b2 = zeros((n_y, 1))                       # Column vector

    parameters = {'W1':W1,
                  'b1':b1,
                  'W2':W2,
                  'b2':b2}

    return parameters

#------------------------------------------------------------
# Forward propagation
# X:row->sample;
# Z2:col->sample
def forward_propagate(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    Z1 = dot(W1, X.T) + b1                    # X:row-->sample; Z1:col-->sample
    A1 = 1/(1+exp(-Z1))

    Z2 = dot(W2, A1) + b2                     # Z2:col-->sample
#    A2 = 1/(1+exp(-Z2))                       # A:col-->sample
    A2 = Z2                                   # Linear output

    cache = {'Z1':Z1,
             'A1':A1,
             'Z2':Z2,
             'A2':A2}
    return Z2, cache

#------------------------------------------------------------
# Calculate the cost
# A2,Y: col->sample
def calculate_cost(A2, Y, parameters):
    err = A2 - Y
    cost = dot(err, err.T) / Y.shape[1]
    return cost

#------------------------------------------------------------
# Backward propagation
def backward_propagate(parameters, cache, X, Y):
    m = X.shape[0]                  # Number of the samples

    W1 = parameters['W1']
    W2 = parameters['W2']
    A1 = cache['A1']
    A2 = cache['A2']

    dZ2 = (A2 - Y) #* (A2 * (1-A2))
    dW2 = dot(dZ2, A1.T) / m
    db2 = sum(dZ2, axis=1, keepdims=True) / m

    dZ1 = dot(W2.T, dZ2) * (A1 * (1-A1))
    dW1 = dot(dZ1, X) / m
    db1 = sum(dZ1, axis=1, keepdims=True) / m

    grads = {'dW1':dW1,
             'db1':db1,
             'dW2':dW2,
             'db2':db2}

    return grads

#------------------------------------------------------------
# Update the parameters
def update_parameters(parameters, grads, learning_rate):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']

    W1 = W1 - learning_rate * dW1
    W2 = W2 - learning_rate * dW2
    b1 = b1 - learning_rate * db1
    b2 = b2 - learning_rate * db2

    parameters = {'W1':W1,
                  'b1':b1,
                  'W2':W2,
                  'b2':b2}

    return parameters

#------------------------------------------------------------
# Define the training
def train(X, Y, num_iterations, learning_rate, print_cost=False):
#    random.seed(3)

    n_x = 2
    n_y = 1
    n_h = 3

    lr = learning_rate

    parameters = initialize_parameters(n_x, n_h, n_y)
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    XX,YY = shuffledata(X, Y)

    costdim = []

    for i in range(0, num_iterations):
        A2, cache = forward_propagate(XX, parameters)
        cost = calculate_cost(A2, YY, parameters)
        grads = backward_propagate(parameters, cache, XX, YY)
        parameters = update_parameters(parameters, grads, lr)

        if print_cost and i % 50 == 0:
            printf('Cost after iteration:%i: %f'%(i, cost))
            costdim.append(cost[0][0])

            if cost < 0.01:
                break

#            XX, YY = shuffledata(X, Y)

    return parameters, costdim

#------------------------------------------------------------
parameter,costdim = train(xor_x, xor_y, 10000, 0.5, True)

A2, cache = forward_propagate(xor_x, parameter)
printf(A2, xor_y)

plt.plot(arange(len(costdim))*50, costdim)
plt.xlabel("Step")
plt.ylabel("Cost")
plt.grid(True)
plt.tight_layout()
plt.show()

#------------------------------------------------------------
#        END OF FILE : HWDATA.PY
#============================================================

2. Python program for the second structure

#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# HWXOR1.PY                    -- by Dr. ZhuoQing 2020-11-17
#
# Note:
#============================================================

from headm import *
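# headm is the author's own helper module; it is assumed to re-export the numpy
# names used below (array, dot, exp, power, sum, arange, random, ...),
# matplotlib.pyplot as plt, and a printf helper.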

#------------------------------------------------------------
# Samples data construction

xor_x = array([[0,0],[1,0],[0,1],[1,1]])       # row->sample
xor_y = array([0, 1, 1, 0]).reshape(1, -1)     # col->sample

xor_x0 = array([[-1,-1],[1,-1],[-1,1],[1,1]])  # row->sample
xor_y0 = array([-1, 1, 1, -1]).reshape(1, -1)  # col->sample

#------------------------------------------------------------
def shuffledata(X, Y):
    id = list(range(X.shape[0]))
    random.shuffle(id)
    return X[id], (Y.T[id]).T

#------------------------------------------------------------
# Define and initialize the NN
def initialize_parameters():
    random.seed(2)

    w10 = random.randn(1) * 0.1
    w20 = random.randn(1) * 0.1
    w13 = random.randn(1) * 0.1
    w12 = random.randn(1) * 0.1
    w14 = random.randn(1) * 0.1
    w23 = random.randn(1) * 0.1
    w24 = random.randn(1) * 0.1

    parameters = {'w10':w10, 'w20':w20,
                  'w13':w13, 'w12':w12, 'w14':w14,
                  'w23':w23, 'w24':w24}

    return parameters

#------------------------------------------------------------
# Forward propagation
# X:row->sample;
# Z2:col->sample
def forward_propagate(X, parameters):
    w10 = parameters['w10']
    w20 = parameters['w20']
    w13 = parameters['w13']
    w12 = parameters['w12']
    w14 = parameters['w14']
    w23 = parameters['w23']
    w24 = parameters['w24']

    W2 = array([w23, w24])
    W1 = array([w13, w14])
    Z2 = dot(W2.T, X.T) - w20
#    A2 = 1/(1+exp(-Z2))
    A2 = (1-exp(-Z2)) / (1+exp(-Z2))

    Z1 = dot(W1.T, X.T) + w12 * A2 - w10
    A1 = Z1

    cache = {'Z1':Z1,
             'A1':A1,
             'Z2':Z2,
             'A2':A2}
    return Z1, cache

#------------------------------------------------------------
# Calculate the cost
# A2,Y: col->sample
def calculate_cost(A2, Y, parameters):
    err = A2 - Y
    cost = dot(err, err.T) / Y.shape[1]
    return cost

#------------------------------------------------------------
# Backward propagation
def backward_propagate(parameters, cache, X, Y):
    m = X.shape[0]                  # Number of the samples

    w10 = parameters['w10']
    w20 = parameters['w20']
    w13 = parameters['w13']
    w12 = parameters['w12']
    w14 = parameters['w14']
    w23 = parameters['w23']
    w24 = parameters['w24']

    A1 = cache['A1']
    A2 = cache['A2']

    dZ1 = A1 - Y
    d10 = -1 * sum(dZ1, axis=1, keepdims=True) / m
    d13 = dot(dZ1, X.T[0].T) / m
    d12 = dot(dZ1, A2.T) / m
    d14 = dot(dZ1, X.T[1].T) / m

    dZ2 = w12 * dZ1 * (1 - power(A2, 2))
    d23 = dot(dZ2, X.T[0].T) / m
    d24 = dot(dZ2, X.T[1].T) / m
    d20 = -1 * sum(dZ2, axis=1, keepdims=True) / m

    grads = {'d10':d10, 'd20':d20,
             'd13':d13, 'd12':d12, 'd14':d14,
             'd23':d23, 'd24':d24}

    return grads

#------------------------------------------------------------
# Update the parameters
def update_parameters(parameters, grads, learning_rate):
    w10 = parameters['w10']
    w20 = parameters['w20']
    w13 = parameters['w13']
    w12 = parameters['w12']
    w14 = parameters['w14']
    w23 = parameters['w23']
    w24 = parameters['w24']

    d10 = grads['d10']
    d20 = grads['d20']
    d13 = grads['d13']
    d12 = grads['d12']
    d14 = grads['d14']
    d23 = grads['d23']
    d24 = grads['d24']

    w10 = w10 - learning_rate * d10
    w20 = w20 - learning_rate * d20
    w13 = w13 - learning_rate * d13
    w12 = w12 - learning_rate * d12
    w14 = w14 - learning_rate * d14
    w23 = w23 - learning_rate * d23
    w24 = w24 - learning_rate * d24

    parameters = {'w10':w10, 'w20':w20,
                  'w13':w13, 'w12':w12, 'w14':w14,
                  'w23':w23, 'w24':w24}

    return parameters

#------------------------------------------------------------
# Define the training
def train(X, Y, num_iterations, learning_rate, print_cost=False):
#    random.seed(3)

    lr = learning_rate

    parameters = initialize_parameters()

    XX,YY = X, Y #shuffledata(X, Y)
    costdim = []

    for i in range(0, num_iterations):
        A2, cache = forward_propagate(XX, parameters)
        cost = calculate_cost(A2, YY, parameters)
        grads = backward_propagate(parameters, cache, XX, YY)
        parameters = update_parameters(parameters, grads, lr)

        if print_cost and i % 50 == 0:
            printf('Cost after iteration:%i: %f'%(i, cost))
            costdim.append(cost[0][0])

            if cost < 0.01:
                break

#            XX, YY = shuffledata(X, Y)

    return parameters, costdim

#------------------------------------------------------------
parameter,costdim = train(xor_x0, xor_y0, 1000, 0.5, True)

A2, cache = forward_propagate(xor_x0, parameter)
printf(A2, xor_y0)

plt.plot(arange(len(costdim))*50, costdim)
plt.xlabel("Step")
plt.ylabel("Cost")
plt.grid(True)
plt.tight_layout()
plt.show()

#------------------------------------------------------------
#        END OF FILE : HWXOR1.PY
#============================================================