
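  This post introduces the Seq2Seq model and how to deploy it after training. The deep learning framework is TensorFlow 2.1 and the GPU is a Tesla P100 (freeloaded from Kaggle). Because the site limits session time, I trained for only two epochs before deploying, so the bot is still pretty dumb for now.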


1. Model overview

  Seq2Seq is short for Sequence to Sequence, the model usually called a sequence-to-sequence model. It is an RNN (Recurrent Neural Network) variant built on the Encoder-Decoder framework. By introducing the Encoder-Decoder framework, Seq2Seq improves a neural network's ability to extract information from long text and achieves better results than using an LSTM (Long Short-Term Memory) network alone. Two concepts matter most in Seq2Seq: the Encoder-Decoder framework mentioned above, and the Attention mechanism. Both are briefly introduced here.

1.1 The Encoder-Decoder framework

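  Encoder-Decoder, also called the encoder-decoder model, has two parts as the name suggests: an encoder and a decoder. It is a framework for many-to-many text prediction problems in which the input and output lengths differ, and it provides an effective mechanism for extracting text features and predicting the output.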

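  The decoder can work in one of several ways:

  • Direct decoding: the predicted text is obtained by inverting the procedure the encoder used.
  • Recurrent decoding: the encoded vector output by the encoder is the input at the first time step, and each output then becomes the input at the next time step, decoding step by step.
  • Enhanced recurrent decoding: on top of recurrent decoding, the encoder's encoded vector is added as an extra input at every time step.
  • Attention-based decoding: on top of enhanced recurrent decoding, an attention mechanism is added. This trains the decoder to focus on the useful features among its many inputs, which strengthens its ability to pick up features and gives better decoding results.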
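
  The recurrent decoding idea in the list above can be sketched in a few lines. This is a toy illustration rather than the model from this post: `toy_step` and the `NEXT_TOKEN` table are hypothetical stand-ins for a trained decoder step, and the tokens are made up. The context is passed to every step, as in the enhanced recurrent scheme, although the toy step ignores it.

```python
# Toy sketch of recurrent (greedy) decoding.
# `toy_step` and NEXT_TOKEN are hypothetical stand-ins for a trained decoder.

# Made-up lookup table: previous token -> next token.
NEXT_TOKEN = {'<start>': 'hello', 'hello': 'there', 'there': '<end>'}


def toy_step(prev_token, context):
    """Return the next token given the previous one (context is unused here)."""
    return NEXT_TOKEN.get(prev_token, '<end>')


def greedy_decode(context, max_steps=10):
    """Feed each output back in as the next input until '<end>' or max_steps."""
    tokens = []
    prev = '<start>'
    for _ in range(max_steps):
        nxt = toy_step(prev, context)
        if nxt == '<end>':
            break
        tokens.append(nxt)
        prev = nxt
    return tokens


decoded = greedy_decode(context=[0.1, 0.2])
print(decoded)
```

  The `max_steps` cap plays the same role as the fixed loop length in the `predict` function later in this post: it guarantees that decoding stops even if the end marker is never produced.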

1.2 The Attention mechanism

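  Models with an Encoder-Decoder structure have achieved very good results in machine translation, speech recognition, text generation, and many other fields, but they also have a weakness. The encoder encodes the input sequence into a fixed-length vector, and the decoder then decodes that vector into the output sequence. The representational capacity of a fixed-length vector is limited, and the decoder is constrained by it. When the input text sequence is long, the encoder struggles to pack all the important information into this fixed-length vector, and the quality of the model's output suffers badly.
  The Attention mechanism addresses the difficulty of capturing the true meaning of a long input sequence. When processing long text sequences, the information that affects the current state may be hidden in much earlier time steps, and under the Markov assumption that information may be ignored. Take the sentence "我快饿死了,今天搬了一天的砖,我要大吃一顿" ("I'm starving, I hauled bricks all day today, I'm going to have a big meal"). We know that "I'm going to have a big meal" follows from "I'm starving". Under the Markov assumption, however, "I hauled bricks all day today" is closer in time to "I'm going to have a big meal", so it is treated as having more influence on it than "I'm starving" does. Real NLP (Natural Language Processing) does not work that way. The example shows that a neural network model cannot reliably capture language information whose order is inverted in time. Solving this requires training the model to build the association between "I'm going to have a big meal" and "I'm starving" automatically, and that is the Attention mechanism.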
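
  To make the idea concrete, here is a minimal numeric sketch of additive (Bahdanau-style) attention for a single example, written with plain Python lists. It is an illustration under assumptions, not the code used in this post: the function name `additive_attention`, the identity weight matrices, and all the numbers are made up, and the bias terms a real Dense layer would add are left out.

```python
import math


def additive_attention(query, values, w1, w2, v):
    """Additive attention for one example, using plain lists.

    query:  list of length d (the decoder hidden state)
    values: list of T lists of length d (the encoder outputs)
    w1, w2: u x d matrices given as lists of rows; v: list of length u
    """
    def matvec(matrix, vector):
        return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

    q_proj = matvec(w1, query)
    scores = []
    for h in values:
        h_proj = matvec(w2, h)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, h_proj)]
        scores.append(sum(a * b for a, b in zip(v, hidden)))

    # Softmax over the time steps (subtract the max for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: weighted sum of the encoder outputs.
    dim = len(values[0])
    context = [sum(w * h[i] for w, h in zip(weights, values)) for i in range(dim)]
    return context, weights


# Made-up numbers: 3 encoder steps, hidden size 2, 2 attention units.
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
context, weights = additive_attention(query, values, identity, identity, [1.0, 1.0])
print(weights, context)
```

  The weights always sum to 1, so the context vector is a weighted average of the encoder outputs. A distant time step can receive a large weight if its score is high, which is how attention gets around the "nearest step wins" problem described above.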

1.3 Code implementation

class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_size):
        super(Encoder, self).__init__()

        self.batch_size = batch_size
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim)
        self.gru = tf.keras.layers.GRU(units=self.enc_units, recurrent_initializer='glorot_uniform',
                                       return_sequences=True, return_state=True)

    def call(self, x, hidden):
        # Forward pass: embed the input ids, then run them through the GRU
        x = self.embedding(x)
        output, state = self.gru(inputs=x, initial_state=hidden)
        return output, state

    def initialize_hidden_state(self):
        return tf.zeros(shape=(self.batch_size, self.enc_units))

class BahdanauAttention(tf.keras.Model):
    """Bahdanau Attention"""
    def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units=units)
        self.W2 = tf.keras.layers.Dense(units=units)
        self.V = tf.keras.layers.Dense(units=1)

    def call(self, query, values):
        # query is the previous hidden state (initially the Encoder's final hidden state),
        # shape (batch_size, hidden_size)
        # values is the Encoder output, i.e. the hidden state at every time step,
        # shape (batch_size, max_length, hidden_size)
        # Add a time axis to query so it broadcasts against values:
        # (batch_size, hidden_size) ==> (batch_size, 1, hidden_size)
        query = tf.expand_dims(input=query, axis=1)

        # Compute the score (similarity) with a small MLP, i.e. an extra network
        # whose only job is to compute the score
        # score has shape (batch_size, max_length, 1)
        score = self.V(
            inputs=tf.nn.tanh(self.W1(inputs=query) + self.W2(inputs=values)))

        # Compute attention_weights by normalizing over the time axis
        # attention_weights has shape (batch_size, max_length, 1)
        attention_weights = tf.nn.softmax(logits=score, axis=1)

        # Weight each Encoder output
        # The product has shape (batch_size, max_length, hidden_size)
        context_vector = attention_weights * values
        # Weighted sum over the time axis
        # After the sum the shape is (batch_size, hidden_size)
        context_vector = tf.reduce_sum(input_tensor=context_vector, axis=1)

        return context_vector, attention_weights

class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, dec_units, batch_size):
        super(Decoder, self).__init__()

        self.batch_size = batch_size
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim)
        self.gru = tf.keras.layers.GRU(units=self.dec_units, recurrent_initializer='glorot_uniform',
                                       return_sequences=True, return_state=True)
        self.fc = tf.keras.layers.Dense(units=vocab_size)
        self.attention = BahdanauAttention(units=self.dec_units)

    def call(self, x, hidden, enc_output):
        # Get the context vector and the attention weights
        context_vector, attention_weights = self.attention(hidden, enc_output)

        # After embedding, x has shape (batch_size, 1, embedding_dim)
        x = self.embedding(inputs=x)

        # Concatenate context_vector with the input x
        # The result has shape (batch_size, 1, embedding_dim + hidden_size)
        # hidden_size here is the length of the context_vector
        x = tf.concat(values=[tf.expand_dims(input=context_vector, axis=1), x], axis=-1)

        # Feed the concatenated tensor into the GRU
        output, state = self.gru(inputs=x)
        # print("Decoder output shape: {}".format(output.shape))
        # print("Decoder state shape: {}".format(state.shape))

        # (batch_size, 1, hidden_size) ==> (batch_size, hidden_size)
        output = tf.reshape(tensor=output, shape=(-1, output.shape[2]))

        # x has shape (batch_size, vocab_size)
        x = self.fc(inputs=output)

        return x, state, attention_weights

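  I only started using TensorFlow 2 this semester; before that I always used TensorFlow 1.13.1. If anything in the code is unclear, see the documentation 《简单粗暴 TensorFlow 2》 (A Concise Handbook of TensorFlow 2).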
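
  For reference, the computation in `BahdanauAttention.call` above can be written out as equations. The symbols are notation introduced here for illustration and do not appear in the code: $s$ is the query, $h_t$ is the Encoder output at time step $t$, $T$ is the padded input length, and the bias terms of the Dense layers are omitted.

```latex
\begin{aligned}
e_t &= V \tanh\left(W_1 s + W_2 h_t\right), \\
\alpha_t &= \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)}, \\
c &= \sum_{t=1}^{T} \alpha_t h_t .
\end{aligned}
```

  Here $e_t$ corresponds to `score`, $\alpha_t$ to `attention_weights`, and $c$ to `context_vector`.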

2. Installing dependencies

  • Install TensorFlow 2.1
	pip3 install tensorflow==2.1.0
  • Install jieba
	pip3 install jieba


3. Model deployment








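# views.py
# Import the model's prediction interface
from tencent.chatRobot import predict

# Fragment of the view function: recMsg, toUser, fromUser and TextMsg
# come from the surrounding view code, which is not shown here.
try:
    input_info = recMsg.Content.decode('utf-8')
    content = predict(sentence=input_info)
except Exception as err:
    # Fallback reply when prediction fails: "Xiaoyou didn't understand what you meant~"
    content = '小悠没理解主银的意思~'
replyMsg = TextMsg(toUser, fromUser, content)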


# chatRobot.py
# -*- coding: utf-8 -*-
# @Time    : 2021/1/4 22:47
# @Author  : XiaYouRan
# @Email   : youran.xia@foxmail.com
# @File    : chatRobot.py
# @Software: PyCharm

import tensorflow as tf
import jieba
import os

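def preprocess_sentence(sentence):
    """Wrap a sentence with the <start> and <end> markers.

    :param sentence: a space-separated, already segmented sentence
    """
    sentence = '<start> ' + sentence + ' <end>'
    return sentence

def max_length(tensor):
    """Return the length of the longest sequence.

    :param tensor: a 2-D collection of id sequences
    """
    return max([len(t) for t in tensor])

def tokenize(sentences):
    """Turn sentences into padded id sequences.

    :param sentences: an iterable of space-separated sentences
    """
    # Initialize the tokenizer and build the vocabulary
    sentence_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')
    sentence_tokenizer.fit_on_texts(texts=sentences)

    # Use the vocabulary to convert the text to ids
    # The result is also two-dimensional
    tensor = sentence_tokenizer.texts_to_sequences(texts=sentences)

    # Pad the data to a uniform length
    # Without maxlen, the default is the length of the longest sentence; here it is fixed at 30
    # Converts a list of nb_samples sequences into a 2-D numpy array of shape (nb_samples, nb_timesteps)
    tensor = tf.keras.preprocessing.sequence.pad_sequences(tensor, maxlen=30, padding='post')

    return tensor, sentence_tokenizer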

def load_dataset(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
        q = ''
        a = ''
        qa_pairs = []
        # len(lines) is the total number of lines
        # Lines come in groups of three: question, answer, and a third line that is skipped
        for i in range(len(lines)):
            if i % 3 == 0:
                q = ' '.join(jieba.cut(lines[i].strip()))
            elif i % 3 == 1:
                a = ' '.join(jieba.cut(lines[i].strip()))
                # Combine the question with its answer
                pair = [preprocess_sentence(q), preprocess_sentence(a)]
                qa_pairs.append(pair)

    # Unzip the pairs into questions and answers
    q_sentences, a_sentences = zip(*qa_pairs)

    # Question dataset (ids) and its tokenizer vocabulary
    q_tensor, q_tokenizer = tokenize(q_sentences)
    # Answer dataset (ids) and its tokenizer vocabulary
    a_tensor, a_tokenizer = tokenize(a_sentences)

    return q_tensor, a_tensor, q_tokenizer, a_tokenizer

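# The three class bodies are identical to those in section 1.3 and are omitted here.
class Encoder(tf.keras.Model):
    ...  # same definition as in section 1.3

class BahdanauAttention(tf.keras.Model):
    """Bahdanau Attention"""
    ...  # same definition as in section 1.3

class Decoder(tf.keras.Model):
    ...  # same definition as in section 1.3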

# Use the Adam optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

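def predict(sentence):
    # Load the model from the latest checkpoint
    # The keyword names encoder and decoder are assumed to match
    # the names used when the checkpoint was saved during training
    checkpoint = tf.train.Checkpoint(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                                     encoder=encoder, decoder=decoder)
    checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

    # Segment the input with jieba and add the <start>/<end> markers
    sentence = ' '.join(jieba.cut(sentence.strip()))
    sentence = preprocess_sentence(sentence=sentence)

    # Map words to ids; a word missing from the vocabulary raises KeyError,
    # which the caller is expected to catch
    inputs = [q_tokenizer.word_index[i] for i in sentence.split(' ')]
    inputs = tf.keras.preprocessing.sequence.pad_sequences(sequences=[inputs], maxlen=30, padding='post')
    inputs = tf.convert_to_tensor(value=inputs)

    result = ''

    # Encode the question, starting from an all-zero hidden state
    hidden = [tf.zeros(shape=(1, units))]
    enc_out, enc_hidden = encoder(inputs, hidden)

    # The decoder starts from the encoder's final state and the <start> token
    dec_hidden = enc_hidden
    dec_input = tf.expand_dims(input=[a_tokenizer.word_index['<start>']], axis=0)

    for t in range(q_tesor_length):
        predictions, dec_hidden, attention_weights = decoder(dec_input, dec_hidden, enc_out)

        # Greedy decoding: pick the most likely word at each step
        predicted_id = tf.argmax(predictions[0]).numpy()
        result += a_tokenizer.index_word[predicted_id] + ' '

        # Stop once the end marker has been generated
        if a_tokenizer.index_word[predicted_id] == '<end>':
            break

        # Feed the predicted id back in as the next input
        dec_input = tf.expand_dims(input=[predicted_id], axis=0)

    # print("Q: %s" % sentence[8:-6].replace(' ', ''))
    # print("A: {}".format(result[:-6].replace(' ', '')))
    # print("A: {}".format(result.replace(' ', '')))

    # Drop the trailing '<end> ' (6 characters) and the spaces between words
    return result[:-6].replace(' ', '')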

file_path = os.path.dirname(__file__)
corpus_path = os.path.join(file_path, 'dataset/corpus.txt')

checkpoint_dir = os.path.join(file_path, 'model/train_checkpoints')

q_tensor, a_tensor, q_tokenizer, a_tokenizer = load_dataset(file_path=corpus_path)

q_tesor_length = max_length(q_tensor)
a_tesor_length = max_length(a_tensor)

buffer_size = len(q_tensor)
batch_size = 32
steps_per_epoch = len(q_tensor) // batch_size
embedding_dim = 128
units = 256

# q_tokenizer.word_index is a dict of (word, id); add 1 because id 0 is reserved for padding
vocab_q_size = len(q_tokenizer.word_index) + 1
vocab_a_size = len(a_tokenizer.word_index) + 1

# Initialize the models
encoder = Encoder(vocab_size=vocab_q_size, embedding_dim=embedding_dim, enc_units=units, batch_size=batch_size)
attention_layer = BahdanauAttention(units=10)
decoder = Decoder(vocab_size=vocab_a_size, embedding_dim=embedding_dim, dec_units=units, batch_size=batch_size)

if __name__ == '__main__':
    input_sentence = "Start chatting..."
    print(input_sentence)
    # Type "stop" to end the chat
    while input_sentence != "stop":
        input_sentence = input()
        try:
            print(predict(sentence=input_sentence))
        except Exception as err:
            print('Test model error info: ', err)
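
  The data preparation done by `load_dataset` and `tokenize` above can be hard to picture, so here is a dependency-free sketch of the same steps. It is a simplified stand-in, not the code this post runs: `parse_corpus`, `build_word_index`, and `encode_and_pad` are hypothetical helpers, the sample lines are made up and already split on spaces (no jieba), the skipped third line of each group is assumed to be a blank separator, and ids are assigned by first appearance rather than by word frequency as the Keras Tokenizer does.

```python
def parse_corpus(lines):
    """Group lines in threes: question, answer, skipped line (mirrors load_dataset)."""
    pairs = []
    question = ''
    for i, line in enumerate(lines):
        if i % 3 == 0:
            question = line.strip()
        elif i % 3 == 1:
            pairs.append((question, line.strip()))
    return pairs


def build_word_index(sentences):
    """Assign ids from 1 upward in order of first appearance; 0 is kept for padding."""
    word_index = {}
    for sentence in sentences:
        for word in sentence.split(' '):
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index


def encode_and_pad(sentence, word_index, maxlen):
    """Map words to ids, keep the last maxlen ids, then pad with zeros at the end."""
    ids = [word_index[word] for word in sentence.split(' ')]
    ids = ids[-maxlen:]
    return ids + [0] * (maxlen - len(ids))


# Made-up corpus: two question/answer pairs, each followed by a blank line.
raw_lines = [
    'how are you\n', 'fine thanks\n', '\n',
    'who are you\n', 'a chat bot\n', '\n',
]
pairs = parse_corpus(raw_lines)
questions = ['<start> ' + q + ' <end>' for q, _ in pairs]
word_index = build_word_index(questions)
padded = encode_and_pad(questions[0], word_index, maxlen=8)
print(pairs)
print(padded)
```

  Because id 0 is reserved for padding, the vocabulary size passed to the Embedding layer is `len(word_index) + 1`, which is why `vocab_q_size` and `vocab_a_size` add 1 in the script above.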

4. Testing






