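Training and Testing on the Fashion MNIST Dataset in PyTorch with a Random Forest and the MobileNet V1, ResNet-18, VGG16, and DLA-34 Convolutional Neural Networks (Baidu Netdisk share link at the end of the article)

This article trains and tests a random forest and the MobileNet-V1, ResNet-18, VGG16, and DLA-34 network models on the Fashion MNIST dataset in a PyTorch environment. It first briefly introduces the dataset and the basic principles of the five models, then builds each algorithm and model structure for the Fashion MNIST image classification task, and finally analyzes the models and the dataset in terms of test accuracy, loss, training speed, and parameter count.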

月下逢759

Published 2023-11-18 13:29:10


Algorithm	Accuracy (30 training epochs)
Random Forest	85.56%
MobileNet V1	91.93%
ResNet-18	92.08%
VGG16	94.33%
DLA-34	94.90%

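1. Introduction to the Fashion MNIST Dataset

Fashion MNIST is widely used in machine learning and deep learning, and its format closely mirrors that of the MNIST handwritten digit dataset. Its main characteristics are:

(1) Classes: Fashion MNIST contains 10 classes: T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot.

(2) Number of images: the dataset contains 70,000 images in total, 60,000 for training and 10,000 for testing. Each class has 6,000 training images and 1,000 test images.

(3) Image size: each image is a 28*28 pixel grayscale (single-channel) image.

The garments in Fashion MNIST vary in texture, style, shade, and shape. Some classes are easy to confuse, such as T-shirt/top and shirt, or sneaker and ankle boot, and the grayscale format further limits the feature information available. Compared with the classic MNIST digit dataset, it therefore places higher demands on the model and is closer to a real image classification task. The download is also small, which makes it well suited to beginners learning CNNs, and some image classification networks use it as one of their benchmarks. The figure below shows some of the images in Fashion MNIST: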

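2. Classification Algorithm Design and Training

2.1 Random Forest

2.1.1 Introduction to Decision Trees

Decision trees are used for both classification and regression. The model is expressed as a binary or multiway tree: the root and internal nodes hold test rules on feature attributes, and each leaf node holds a decision result. Using a decision tree means starting from the root, testing the relevant feature of the sample at each node, following the branch that matches its value, and repeating until a leaf node outputs the result. The figure below shows a decision tree that rates air quality from the concentration of chemicals in the air:

The central concept in decision trees is the impurity function; information entropy and the Gini index are the most common choices. The impurity function is the criterion by which the tree judges how good a split is. The lower the entropy, the purer the dataset. Information gain is computed from entropy: it is the entropy before splitting on a feature minus the (weighted) entropy after the split. The larger the information gain, the purer the subsets obtained by splitting on that feature.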
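The impurity measures above can be made concrete with a minimal sketch. The function names and the toy labels below are illustrative assumptions, not code from the article; the formulas are the standard definitions of Shannon entropy, Gini index, and information gain.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy example (hypothetical labels): 4 shirts and 4 coats
parent = ["shirt"] * 4 + ["coat"] * 4
pure_split = [["shirt"] * 4, ["coat"] * 4]      # each subset holds one class
mixed_split = [["shirt", "shirt", "coat", "coat"]] * 2  # subsets as mixed as the parent

print("parent entropy:", entropy(parent))
print("parent gini:", gini(parent))
print("gain of pure split:", information_gain(parent, pure_split))
print("gain of mixed split:", information_gain(parent, mixed_split))
```

A split that separates the classes completely recovers all of the parent's entropy as gain, while a split that leaves the class mix unchanged has zero gain, which is why the tree prefers the former.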

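2.1.2 Introduction to Random Forests

Ensemble methods fall into two families, Bagging and Boosting. Random forest is the representative Bagging model and is a supervised learning algorithm. It works as follows:

Given a training set of m samples, draw one sample at random with replacement, then return it to the set. After m draws we obtain a bootstrap subset that also contains m samples. Each draw selects a given sample with probability 1/m, so the probability that this sample is never selected in m draws is (1 - 1/m)^m. As m tends to infinity:

$$\lim_{m\to\infty}\left(1-\frac{1}{m}\right)^{m}=\frac{1}{e}\approx 0.368$$

Hence about 63.2% of the original training samples appear in each bootstrap subset. Repeating the procedure T times yields T training sets of m samples each. Each of them is used to train one decision tree, and the results of these weak learners are combined by voting to produce the final prediction. Random forests are simple in principle, easy to deploy, light on memory, and perform well on both regression and classification problems.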
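The 63.2% figure can be checked numerically. The sketch below is an illustration only (the function names and the seed are assumptions): it compares the closed-form probability of never being drawn with 1/e, and simulates one bootstrap draw of the same size as the Fashion MNIST training set.

```python
import math
import random


def never_drawn_probability(m):
    """Probability that a given sample is never picked in m draws with replacement."""
    return (1 - 1 / m) ** m


def bootstrap_unique_fraction(m, seed=0):
    """Fraction of distinct samples that appear in one bootstrap subset of size m."""
    rng = random.Random(seed)
    drawn = {rng.randrange(m) for _ in range(m)}
    return len(drawn) / m


m = 60000  # size of the Fashion MNIST training set
print("P(never drawn):", never_drawn_probability(m))
print("1/e:           ", math.exp(-1))
print("simulated fraction of samples in the subset:", bootstrap_unique_fraction(m))
```

The simulated fraction should land close to 1 - 1/e, i.e. roughly 0.632, with small run-to-run variation depending on the seed.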

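2.1.3 Model Training and Data Analysis with the Random Forest

After the dataset is loaded, the tensors are converted to NumPy ndarrays and a random forest model is created. A grid search (GridSearchCV) looks for the best parameter combination: the number of trees is 100, 200, or 300, the maximum tree depth is 10, 20, 30, or unlimited (None), and 5-fold cross-validation is used to evaluate each combination.

Finding the best parameters: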

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
from torchvision import datasets, transforms


def main():
    # 1. Prepare the data
    transform = transforms.Compose([
        transforms.ToTensor()  # convert images to tensors
    ])

    # Download the Fashion MNIST dataset
    train_dataset = datasets.FashionMNIST(
        root="./dataset/mnist",
        train=True,
        transform=transform,
        download=True
    )

    test_dataset = datasets.FashionMNIST(
        root="./dataset/mnist",
        train=False,
        transform=transform,
        download=True
    )

    # Extract features and labels. `.data` holds the raw uint8 pixels (0-255),
    # so the transform above is not applied to these arrays.
    X_train = train_dataset.data.numpy().reshape(-1, 28 * 28)  # tensor -> ndarray, flattened to 784 features
    y_train = train_dataset.targets.numpy()
    X_test = test_dataset.data.numpy().reshape(-1, 28 * 28)
    y_test = test_dataset.targets.numpy()

    # 2. Create the random forest model
    rf_classifier = RandomForestClassifier(random_state=0)

    # 3. Tune the hyperparameters with a grid search
    param_grid = {
        'n_estimators': [100, 200, 300],  # number of trees
        'max_depth': [10, 20, 30, None],  # maximum tree depth (None = unlimited)
    }

    # n_jobs=-1 uses all available CPU cores
    grid_search = GridSearchCV(estimator=rf_classifier, param_grid=param_grid, cv=5, n_jobs=-1)
    grid_search.fit(X_train, y_train)

    # Print the best hyperparameter combination
    print("Best Parameters: ", grid_search.best_params_)

    # best_estimator_ has already been refitted on the full training set
    # (refit=True is the GridSearchCV default), so no extra fit() call is needed
    best_rf_classifier = grid_search.best_estimator_

    # 4. Evaluate the model
    y_pred = best_rf_classifier.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print("Accuracy:", accuracy)


if __name__ == '__main__':
    main()

Obtaining the random forest classification report and the feature importances:

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Load the Fashion MNIST dataset from OpenML (all 70,000 images)
fashion_mnist = fetch_openml(data_id=40996)

# Extract the image data and the labels
X = fashion_mnist.data.astype('float32')
y = fashion_mnist.target.astype('int')

# Split the data into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the random forest classifier
rf_classifier = RandomForestClassifier(n_estimators=300, max_depth=30, random_state=0)

# Train the random forest classifier
rf_classifier.fit(X_train, y_train)

# Predict labels for the test set
y_pred = rf_classifier.predict(X_test)

# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

# Print the classification report
print(classification_report(y_test, y_pred))

# Get the feature importances (one value per pixel)
feature_importance = rf_classifier.feature_importances_

# Plot the feature importances
plt.figure(figsize=(12, 6))
plt.bar(range(len(feature_importance)), feature_importance)
plt.xlabel("Feature Index")
plt.ylabel("Feature Importance")
plt.title("Random Forest Feature Importance")
plt.savefig('./Random Forest Feature Importance.png')
plt.show()

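The classification report of the random forest model on the dataset is:


Class	Precision	Recall	F1-score	Support
0	0.83	0.86	0.84	1394
1	0.99	0.97	0.98	1402
2	0.80	0.83	0.81	1407
3	0.89	0.92	0.90	1449
4	0.77	0.85	0.81	1357
5	0.97	0.96	0.96	1449
6	0.74	0.60	0.67	1407
7	0.94	0.94	0.94	1359
8	0.96	0.95	0.97	1342
9	0.95	0.96	0.95	1434

Principal component analysis is applied to reduce the dimensionality of the data, and the random forest classification of the test set is shown below:

The random forest takes about 5 min to run (test machine: GTX 1650), the shortest of the five classification methods, but its accuracy is also the lowest, below 90%. The results above show that classes 1, 5, 7, 8, and 9 have high precision and recall, meaning that predictions for these classes are mostly correct and most of their samples are recognized. Class 6 has comparatively low precision (0.74), so many of the samples predicted as class 6 are wrong, and its recall (0.60) is also low, so a large share of the true class 6 samples are missed. Overall, the random forest performs unevenly across classes.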
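For readers unfamiliar with the columns of the report, the sketch below shows how precision, recall, and F1-score follow from per-class counts of true positives, false positives, and false negatives. The counts are hypothetical and chosen only for illustration; they are not taken from the article's experiment.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class from its confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical class: 60 correct hits, 20 false alarms, 40 misses
p, r, f1 = precision_recall_f1(tp=60, fp=20, fn=40)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of the two, it always lies between precision and recall and is pulled toward the lower one, which is why a class with weak recall also shows a weak F1-score.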

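2.2 MobileNet-V1

2.2.1 Introduction to the MobileNet-V1 Model

MobileNet-V1 is a lightweight convolutional neural network architecture designed to run efficiently in resource-constrained environments such as mobile devices and embedded systems. Its core idea is to use depthwise separable convolution to reduce the number of parameters and the computational cost while keeping accuracy high. A depthwise separable convolution consists of two steps:

(1) Depthwise convolution: a separate convolution is applied to each input channel.

(2) Pointwise convolution: a 1x1 convolution is applied to the output of the depthwise convolution to linearly combine information across channels and to set the number of output channels.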
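The saving from the two steps above can be estimated with a minimal parameter count. This is a sketch under simplifying assumptions: bias terms and batch normalization parameters are ignored, and the helper names are hypothetical.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights of a standard kernel*kernel convolution."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights of a depthwise (kernel*kernel per channel) plus pointwise (1x1) convolution."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


kernel, c_in, c_out = 3, 512, 512  # one of the 512-channel blocks
std = standard_conv_params(kernel, c_in, c_out)
dsc = depthwise_separable_params(kernel, c_in, c_out)
print("standard conv weights:       ", std)
print("depthwise separable weights: ", dsc)
print("ratio:", dsc / std)  # equals 1/c_out + 1/kernel**2
```

For 3*3 kernels the ratio approaches 1/9 as the number of output channels grows, which is where the roughly 8 to 9 times reduction usually quoted for MobileNet comes from.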

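By reducing the number of parameters and the amount of computation, depthwise separable convolution makes deep learning models lighter and more efficient, so they can deliver good performance on devices with limited resources. The table below shows the network structure of MobileNet-V1 (assuming a 224*224*3 three-channel RGB input):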

Table 1-1 Network structure of MobileNet-V1

Operation / stride, padding	Kernel (kernel_size*out_channels)	Input size (H*W*channels)
Conv/s=2,p=1	3*3*32	224*224*3
Conv dw/s=1,p=1	3*3*32 dw	112*112*32
Conv/s=1,p=0	1*1*64	112*112*32
Conv dw/s=2,p=1	3*3*64 dw	112*112*64
Conv/s=1,p=0	1*1*128	56*56*64
Conv dw/s=1,p=1	3*3*128 dw	56*56*128
Conv/s=1,p=0	1*1*128	56*56*128
Conv dw/s=2,p=1	3*3*128 dw	56*56*128
Conv/s=1,p=0	1*1*256	28*28*128
Conv dw/s=1,p=1	3*3*256 dw	28*28*256
Conv/s=1,p=0	1*1*256	28*28*256
Conv dw/s=2,p=1	3*3*256 dw	28*28*256
Conv/s=1,p=0	1*1*512	14*14*256
5*(Conv dw/s=1,p=1 + Conv/s=1,p=0)	3*3*512 dw + 1*1*512	14*14*512
Conv dw/s=2,p=1	3*3*512 dw	14*14*512
Conv/s=1,p=0	1*1*1024	7*7*512
Conv dw/s=1,p=1	3*3*1024 dw	7*7*1024
Conv/s=1,p=0	1*1*1024	7*7*1024
Avg Pool/s=1,p=0	Pool 7*7	7*7*1024
FC	1024*1000	1*1*1024
Softmax	Classifier	1*1*1000

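MobileNet-V1 is widely used in computer vision, especially for real-time image processing and vision tasks on embedded and mobile devices. Depthwise separable convolution effectively reduces the size and the computational load of the model while keeping its performance high.

The output feature size of a convolution layer can be computed as:

$$O=\left\lfloor\frac{I+2p-k}{s}\right\rfloor+1$$

where I is the input size, k the kernel size, p the padding, and s the stride.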
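As a quick check of the feature sizes used in this section, the standard output-size relation for a convolution, floor((I + 2p - k) / s) + 1, can be sketched as follows. The helper name is an assumption; the layer settings are the ones described in the article for 28*28 input (one 3*3 convolution with padding 3, then five stride-2 stages with 3*3 kernels and padding 1).

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# 28*28 input -> 3*3 conv with padding=3 -> 32*32
size = conv_output_size(28, kernel=3, stride=1, padding=3)
trace = [28, size]

# Five stride-2 stages (3*3 kernel, padding=1) halve the feature map each time
for _ in range(5):
    size = conv_output_size(size, kernel=3, stride=2, padding=1)
    trace.append(size)

print(" -> ".join(str(s) for s in trace))
```

Stride-1 layers with 3*3 kernels and padding 1, or 1*1 kernels and padding 0, leave the size unchanged, so only the strided layers need to be traced.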

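2.2.2 MobileNet-V1 Model Training and Data Analysis

To make Fashion MNIST easier to train on MobileNet-V1, a 3*3 convolution layer with padding=3 is added in front of the network, turning each image into a 3-channel 32*32 image, followed by batch normalization and a ReLU layer. For 28*28 single-channel input, the network structure and parameters are: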

Operation / stride, padding	Kernel (kernel_size*out_channels)	Input size (H*W*channels)
Conv/s=1,p=3	3*3*3	28*28*1
Conv/s=2,p=1	3*3*32	32*32*3
Conv dw/s=1,p=1	3*3*32 dw	16*16*32
Conv/s=1,p=0	1*1*64	16*16*32
Conv dw/s=2,p=1	3*3*64 dw	16*16*64
Conv/s=1,p=0	1*1*128	8*8*64
Conv dw/s=1,p=1	3*3*128 dw	8*8*128
Conv/s=1,p=0	1*1*128	8*8*128
Conv dw/s=2,p=1	3*3*128 dw	8*8*128
Conv/s=1,p=0	1*1*256	4*4*128
Conv dw/s=1,p=1	3*3*256 dw	4*4*256
Conv/s=1,p=0	1*1*256	4*4*256
Conv dw/s=2,p=1	3*3*256 dw	4*4*256
Conv/s=1,p=0	1*1*512	2*2*256
5*(Conv dw/s=1,p=1 + Conv/s=1,p=0)	3*3*512 dw + 1*1*512	2*2*512
Conv dw/s=2,p=1	3*3*512 dw	2*2*512
Conv/s=1,p=0	1*1*1024	1*1*512
Conv dw/s=1,p=1	3*3*1024 dw	1*1*1024
Conv/s=1,p=0	1*1*1024	1*1*1024
Avg Pool/s=1,p=0	Pool 1*1	1*1*1024
FC	1024*512, 512*256, 256*256, 256*10	1*1*1024
Softmax	Classifier	1*1*10

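Because MobileNet is a lightweight convolutional neural network and Fashion MNIST is a small dataset, fewer measures against overfitting are needed. Data preprocessing therefore consists only of padding by 4 pixels on each side, random cropping, random horizontal flipping, and normalization. So that the learning rate decays as training proceeds, the model is trained for 30 epochs with an initial learning rate of 0.1, which is multiplied by 0.1 every 15 epochs. The MobileNet V1 model is built as follows: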


import torch
import torch.nn as nn


def conv_bn(in_channel, out_channel, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_channel, out_channel, 3, stride, 1),
        nn.BatchNorm2d(out_channel),
        nn.ReLU6(inplace=True)
    )

def conv_dsc(in_channel, out_channel, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_channel, in_channel, 3, stride, 1, groups=in_channel),
        nn.Conv2d(in_channel, out_channel, 1, 1, 0),
        nn.BatchNorm2d(out_channel),
        nn.ReLU6(inplace=True),
    )


class MobileNetV1(nn.Module):
    def __init__(self):
        super(MobileNetV1, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 3, 3, 1, 3),  # 28*28*1->32*32*3
            nn.BatchNorm2d(3),
            nn.ReLU6(inplace=True),
            conv_bn(3, 32, 2),  # 32*32*3->16*16*32
            conv_dsc(32, 64, 1),  # 16*16*32->(DW)16*16*32->16*16*64
            conv_dsc(64, 128, 2),  # 16*16*64->(DW)8*8*64->8*8*128
            conv_dsc(128, 128, 1),  # 8*8*128->8*8*128->8*8*128
            conv_dsc(128, 256, 2),  # 8*8*128->(DW)4*4*128->4*4*256
            conv_dsc(256, 256, 1),  # 4*4*256->4*4*256->4*4*256
        )

        self.layer2 = nn.Sequential(
            conv_dsc(256, 512, 2),  # 4*4*256->(DW)2*2*256->2*2*512
            conv_dsc(512, 512, 1),  # 2*2*512->2*2*512->2*2*512
            conv_dsc(512, 512, 1),  # 2*2*512->2*2*512->2*2*512
            conv_dsc(512, 512, 1),  # 2*2*512->2*2*512->2*2*512
            conv_dsc(512, 512, 1),  # 2*2*512->2*2*512->2*2*512
            conv_dsc(512, 512, 1),  # 2*2*512->2*2*512->2*2*512
        )

        self.layer3 = nn.Sequential(
            conv_dsc(512, 1024, 2),  # 2*2*512->(DW)1*1*512->1*1*1024
            conv_dsc(1024, 1024, 1),  # 1*1*1024->1*1*1024->1*1*1024
        )
        self.layer4 = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten(),
            nn.Linear(1024, 512),
            nn.Dropout(p=0.5),  # nn.Dropout is the layer intended for flattened (N, C) features
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.Dropout(p=0.5),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.Dropout(p=0.5),
            nn.ReLU(),
        )
        self.layer5 = nn.Sequential(
            nn.Linear(256, 10)
        )
        self.model = nn.Sequential(
            self.layer1,
            self.layer2,
            self.layer3,
            self.layer4,
            self.layer5,
        )
    def forward(self, x):
        x = self.model(x)
        return x

Train with the model above:

import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torchvision
from torch import nn
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader

from model import MobileNetV1  # the model defined above, stored in model.py

plt.rcParams['font.sans-serif'] = ['SimHei']  # font with CJK glyphs for plot labels
plt.rcParams['axes.unicode_minus'] = False

train_dataset_transform = torchvision.transforms.Compose([
    torchvision.transforms.RandomCrop(28, padding=4),  # pad 4 px per side, then crop back to 28*28
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),  # scales pixel values to [0, 1]
])
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # use the GPU when available, otherwise the CPU
train_data = torchvision.datasets.FashionMNIST("./dataset/mnist", train=True, transform=train_dataset_transform, download=True)  # download the training set and apply the transform
test_data = torchvision.datasets.FashionMNIST("./dataset/mnist", train=False, transform=torchvision.transforms.ToTensor(), download=True)  # download the test set and apply the transform


train_data_size = len(train_data)  # sizes of the training and test sets
test_data_size = len(test_data)
print("Training set size: {}".format(train_data_size))
print("Test set size: {}".format(test_data_size))

train_dataloader = DataLoader(train_data, batch_size=64)  # 64 images per batch
test_dataloader = DataLoader(test_data, batch_size=64)

mnist_model = MobileNetV1()  # the convolutional neural network
mnist_model = mnist_model.to(device)  # move the model to the selected device

loss_fn = nn.CrossEntropyLoss()  # cross-entropy loss
loss_fn = loss_fn.to(device)  # move the loss function to the selected device

learning_rate = 0.1  # initial learning rate
optimizer = torch.optim.SGD(mnist_model.parameters(), lr=learning_rate)  # optimizer for the model parameters
scheduler = StepLR(optimizer, step_size=15, gamma=0.1)  # multiply the learning rate by 0.1 every 15 epochs

train_step = 0  # number of training iterations
test_step = 0
epoch = 30  # number of training epochs
train_accuracy = np.zeros((epoch, 1), float)  # per-epoch accuracy, loss and time
test_accuracy = np.zeros((epoch, 1), float)
train_loss = np.zeros((epoch, 1), float)
test_loss = np.zeros((epoch, 1), float)
train_time = np.zeros((epoch, 1), float)
for i in range(epoch):  # i is the epoch index
    start_time = time.time()  # start of the epoch
    print("Epoch {} started".format(i+1))
    mnist_model.train()  # switch to training mode
    for data in train_dataloader:  # one epoch is one full pass over the training set
        imgs, targets = data
        imgs = imgs.to(device)  # move the batch to the selected device
        targets = targets.to(device)
        outputs = mnist_model(imgs)  # forward pass
        loss = loss_fn(outputs, targets)  # cross-entropy between the outputs and the targets
        train_loss[i] += loss.item()  # accumulate the total loss of this epoch
        accuracy = (outputs.argmax(1) == targets).sum()  # number of correct predictions in the batch
        train_accuracy[i] += accuracy.item()
        optimizer.zero_grad()  # reset the gradients
        loss.backward()  # backpropagation
        optimizer.step()  # update the parameters
        train_step += 1
    end_time = time.time()
    train_accuracy[i] = round(train_accuracy[i].item() / train_data_size, 4)
    train_time[i] = round(end_time - start_time, 3)
    print("---------Epoch {} finished, time: {} s---------".format(i+1, train_time[i].item()))
    print("Training set Loss: {}, Accuracy: {}".format(train_loss[i].item(), train_accuracy[i].item()))

    scheduler.step()  # update the learning rate

    mnist_model.eval()  # switch to evaluation mode
    with torch.no_grad():  # no gradients are needed for evaluation
        for data in test_dataloader:  # one forward pass over the whole test set
            imgs, targets = data
            imgs = imgs.to(device)  # move the batch to the selected device
            targets = targets.to(device)
            outputs = mnist_model(imgs)  # forward pass
            loss = loss_fn(outputs, targets)  # compute the loss
            test_loss[i] += loss.item()  # accumulate the total loss of this epoch
            accuracy = (outputs.argmax(1) == targets).sum()  # number of correct predictions in the batch
            test_accuracy[i] += accuracy.item()
        test_accuracy[i] = round(test_accuracy[i].item() / test_data_size, 4)
        print("Test set Loss: {}, Accuracy: {}".format(test_loss[i].item(), test_accuracy[i].item()))  # report loss and accuracy
    test_step += 1

epo = np.arange(1, epoch+1, 1)
plt.figure(1)
plt.plot(epo, train_accuracy, linestyle='--', marker='o', color='k', markersize=5, label='Training accuracy')
plt.plot(epo, test_accuracy, linestyle='-', marker='*', color='r', markersize=5, label='Test accuracy')
plt.axis([0, 30, 0, 1])
plt.xlabel('Epoch')  # x-axis label
plt.ylabel('Accuracy')  # y-axis label
plt.legend(loc='upper left', bbox_to_anchor=(0.65, 0.15))
plt.savefig('./Accuracy.png')
plt.show()

plt.figure(2)
plt.plot(epo, train_loss, linestyle='--', marker='o', color='k', markersize=5, label='Training loss')
plt.plot(epo, test_loss, linestyle='-', marker='*', color='r', markersize=5, label='Test loss')
plt.axis([0, 30, 0, 1000])
plt.xlabel('Epoch')  # x-axis label
plt.ylabel('Total loss')  # y-axis label
plt.legend(loc='upper left', bbox_to_anchor=(0, 1.0))
plt.savefig('./Loss.png')
plt.show()

# Flatten the epoch indices, accuracies and timings to 1-D arrays
epo = epo.ravel()
test_accuracy = test_accuracy.ravel()
train_accuracy = train_accuracy.ravel()
train_time = train_time.ravel()

# Collect them in a DataFrame
result_df = pd.DataFrame({'Epoch': epo, 'Test accuracy': test_accuracy, 'Training accuracy': train_accuracy, 'Time (s)': train_time})

# Write the DataFrame to a sheet of an Excel file. The context manager saves and
# closes the file (ExcelWriter.save() is no longer available in pandas 2.x).
with pd.ExcelWriter('Result.xlsx') as writer:
    result_df.to_excel(writer, sheet_name='page_1', index=False)

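The accuracy on the training and test sets and the training time at selected epochs are:

Dataset / epoch	1	5	10	15	20	25	30
Training set	0.5699	0.8366	0.8714	0.8884	0.9084	0.9122	0.9149
Test set	0.7541	0.8482	0.8709	0.8843	0.9128	0.9153	0.9193
Training time (s)	34.124	23.096	23.859	26.143	27.171	25.909	27.201

The accuracy on the training and test sets as a function of the epoch is shown below:

The total loss on the training and test sets as a function of the epoch is shown below:

The loss of MobileNet-V1 on Fashion MNIST drops sharply early in training and slows down from about epoch 10. Around epoch 15 the test loss and accuracy are about to start oscillating; the learning rate is reduced at that point, which makes the loss drop abruptly, after which the loss decreases slowly and the accuracy rises slowly. The final test accuracy is 91.93%. For most of the training the test accuracy is at or above the training accuracy, so there is no sign of overfitting.

The average time per epoch is 26.257 s, shorter than for ResNet-18 and VGG-16. Although the accuracy is slightly lower, MobileNet-V1 is not prone to overfitting on Fashion MNIST and runs fast.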
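The step schedule used in this section (initial learning rate 0.1, multiplied by 0.1 every 15 epochs) can be written out as a small function. This is a sketch of the decay rule only, with a hypothetical helper name; it mirrors the rule that a step scheduler with step_size=15 and gamma=0.1 follows when it is stepped once per epoch, and it does not call PyTorch.

```python
def step_lr(initial_lr, epoch_index, step_size=15, gamma=0.1):
    """Learning rate used during the given 0-based epoch under step decay."""
    return initial_lr * gamma ** (epoch_index // step_size)


# Learning rate at the first epoch, just before and just after the decay, and at the last epoch
for epoch_index in (0, 14, 15, 29):
    print("epoch", epoch_index + 1, "lr", step_lr(0.1, epoch_index))
```

With 30 epochs there is exactly one decay, after epoch 15, which matches the abrupt drop in the loss curve described above.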

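2.3 ResNet-18: Training Results and Data Analysis

2.3.1 Introduction to the ResNet-18 Model

ResNet-18 (Residual Network-18) is one of the smaller models in the ResNet family. The core idea of ResNet is the residual block (the BasicBlock in the shallower variants and the Bottleneck block in the deeper ones), which was introduced to ease the training of deep networks, in particular the vanishing and exploding gradient problems. The family mainly includes ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152, which differ in the number and type of residual blocks they use. Assuming a 224*224*3 three-channel RGB input, the structure and parameters of ResNet-18 are shown in the figure below:

ResNet-18 is made up of several residual blocks, each containing several convolution layers. A typical residual block has two branches:
(1) Main path: a series of 3*3 convolution layers that extract features.
(2) Skip connection: a connection that bypasses the main path and adds the input features directly to the output of the main path. It helps preserve the original features of the image and avoids information loss.

The output of a residual block is the sum of the features learned by the main path and the original input. If the input and output have the same dimensions, the input can be added to the main-path output directly. If the dimensions differ, a convolution layer is inserted into the skip connection to adjust them. ResNet-18 has 18 weighted layers (17 convolution layers and one fully connected layer), with its residual blocks organized into 4 stages of 2 BasicBlocks each. ResNet-50, ResNet-101, and the other larger variants contain more convolution layers and residual blocks and can handle more complex tasks with deeper architectures.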
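Where the "18" in ResNet-18 comes from can be checked with a short count. The sketch assumes the usual counting convention for the standard architecture (one stem convolution, two 3*3 convolutions per BasicBlock, one fully connected layer, and 1*1 shortcut convolutions not counted); the function name is hypothetical, and the article's 28*28 variant adds one extra input convolution on top of this.

```python
def weighted_layer_count(blocks_per_stage, convs_per_block=2):
    """Stem conv + convolutions inside the residual blocks + final fully connected layer."""
    stem_conv = 1
    fully_connected = 1
    return stem_conv + convs_per_block * sum(blocks_per_stage) + fully_connected


print("ResNet-18:", weighted_layer_count([2, 2, 2, 2]))
print("ResNet-34:", weighted_layer_count([3, 4, 6, 3]))
```

ResNet-18 and ResNet-34 use the same BasicBlock and differ only in how many blocks each of the four stages holds.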

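2.3.2 ResNet-18 Model Training and Data Analysis

To make Fashion MNIST easier to train on ResNet-18, a 3*3 convolution layer with padding=3 is added in front of the network, turning each image into a 3-channel 32*32 image. For 28*28 single-channel input, the network structure and parameters are shown in the figure below:

During training, the accuracy on the training set gradually rose above the accuracy on the test set: as training proceeds, the model fits the training data more and more closely, which hurts classification on the test set. Therefore, before training, the images are padded by 6 pixels on each side, randomly cropped back to 28*28, randomly flipped horizontally, and normalized. The model is trained for 30 epochs with an initial learning rate of 0.1, which is multiplied by 0.1 every 15 epochs.

The ResNet-18 model is built from BasicBlock modules.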

The BasicBlock module is defined as follows:

import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """BasicBlock: two 3*3 convolutions plus a shortcut connection."""
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channel, out_channel, kernel_size=3, padding=1, stride=stride)
        self.bn1 = nn.BatchNorm2d(out_channel)    # BN sits between the conv layer and the ReLU
        self.conv2 = nn.Conv2d(out_channel, out_channel, kernel_size=3, padding=1, stride=1)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU(inplace=True)
        # 1*1 conv on the shortcut so that the input X matches the main-branch output
        # in channel count and, when stride != 1, in spatial size
        self.Downsample = nn.Conv2d(in_channel, out_channel, kernel_size=1, padding=0, stride=stride)

    # Forward pass
    def forward(self, X):
        Y = self.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        # Note: the shortcut reuses bn2 here; giving the shortcut its own
        # BatchNorm2d is the more common design
        identity = self.bn2(self.Downsample(X))

        return self.relu(Y + identity)

Train with the model above:

import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torchvision
from torch import nn
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader

from model import ResNet18  # the ResNet-18 model, stored in model.py

plt.rcParams['font.sans-serif'] = ['SimHei']  # font with CJK glyphs for plot labels
plt.rcParams['axes.unicode_minus'] = False

train_dataset_transform = torchvision.transforms.Compose([
    torchvision.transforms.RandomCrop(28, padding=6),  # pad 6 px per side, then crop back to 28*28
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),  # scales pixel values to [0, 1]
])
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # use the GPU when available, otherwise the CPU
train_data = torchvision.datasets.FashionMNIST("./dataset/mnist", train=True, transform=train_dataset_transform, download=True)  # download the training set and apply the transform
test_data = torchvision.datasets.FashionMNIST("./dataset/mnist", train=False, transform=torchvision.transforms.ToTensor(), download=True)  # download the test set and apply the transform


train_data_size = len(train_data)  # sizes of the training and test sets
test_data_size = len(test_data)
print("Training set size: {}".format(train_data_size))
print("Test set size: {}".format(test_data_size))

train_dataloader = DataLoader(train_data, batch_size=64)  # 64 images per batch
test_dataloader = DataLoader(test_data, batch_size=64)

mnist_model = ResNet18()  # the convolutional neural network
mnist_model = mnist_model.to(device)  # move the model to the selected device

loss_fn = nn.CrossEntropyLoss()  # cross-entropy loss
loss_fn = loss_fn.to(device)  # move the loss function to the selected device

learning_rate = 0.1  # initial learning rate
optimizer = torch.optim.SGD(mnist_model.parameters(), lr=learning_rate)  # optimizer for the model parameters
scheduler = StepLR(optimizer, step_size=15, gamma=0.1)  # multiply the learning rate by 0.1 every 15 epochs

train_step = 0  # number of training iterations
test_step = 0
epoch = 30  # number of training epochs
train_accuracy = np.zeros((epoch, 1), float)  # per-epoch accuracy, loss and time
test_accuracy = np.zeros((epoch, 1), float)
train_loss = np.zeros((epoch, 1), float)
test_loss = np.zeros((epoch, 1), float)
train_time = np.zeros((epoch, 1), float)
for i in range(epoch):  # i is the epoch index
    start_time = time.time()  # start of the epoch
    print("Epoch {} started".format(i+1))
    mnist_model.train()  # switch to training mode
    for data in train_dataloader:  # one epoch is one full pass over the training set
        imgs, targets = data
        imgs = imgs.to(device)  # move the batch to the selected device
        targets = targets.to(device)
        outputs = mnist_model(imgs)  # forward pass
        loss = loss_fn(outputs, targets)  # cross-entropy between the outputs and the targets
        train_loss[i] += loss.item()  # accumulate the total loss of this epoch
        accuracy = (outputs.argmax(1) == targets).sum()  # number of correct predictions in the batch
        train_accuracy[i] += accuracy.item()
        optimizer.zero_grad()  # reset the gradients
        loss.backward()  # backpropagation
        optimizer.step()  # update the parameters
        train_step += 1
    end_time = time.time()
    train_accuracy[i] = round(train_accuracy[i].item() / train_data_size, 4)
    train_time[i] = round(end_time - start_time, 3)
    print("---------Epoch {} finished, time: {} s---------".format(i+1, train_time[i].item()))
    print("Training set Loss: {}, Accuracy: {}".format(train_loss[i].item(), train_accuracy[i].item()))

    scheduler.step()  # update the learning rate

    mnist_model.eval()  # switch to evaluation mode
    with torch.no_grad():  # no gradients are needed for evaluation
        for data in test_dataloader:  # one forward pass over the whole test set
            imgs, targets = data
            imgs = imgs.to(device)  # move the batch to the selected device
            targets = targets.to(device)
            outputs = mnist_model(imgs)  # forward pass
            loss = loss_fn(outputs, targets)  # compute the loss
            test_loss[i] += loss.item()  # accumulate the total loss of this epoch
            accuracy = (outputs.argmax(1) == targets).sum()  # number of correct predictions in the batch
            test_accuracy[i] += accuracy.item()
        test_accuracy[i] = round(test_accuracy[i].item() / test_data_size, 4)
        print("Test set Loss: {}, Accuracy: {}".format(test_loss[i].item(), test_accuracy[i].item()))  # report loss and accuracy
    test_step += 1

epo = np.arange(1, epoch+1, 1)
plt.figure(1)
plt.plot(epo, train_accuracy, linestyle='--', marker='o', color='k', markersize=5, label='Training accuracy')
plt.plot(epo, test_accuracy, linestyle='-', marker='*', color='r', markersize=5, label='Test accuracy')
plt.axis([0, 30, 0, 1])
plt.xlabel('Epoch')  # x-axis label
plt.ylabel('Accuracy')  # y-axis label
plt.legend(loc='upper left', bbox_to_anchor=(0.65, 0.15))
plt.savefig('./Accuracy.png')
plt.show()

plt.figure(2)
plt.plot(epo, train_loss, linestyle='--', marker='o', color='k', markersize=5, label='Training loss')
plt.plot(epo, test_loss, linestyle='-', marker='*', color='r', markersize=5, label='Test loss')
plt.axis([0, 30, 0, 1000])
plt.xlabel('Epoch')  # x-axis label
plt.ylabel('Total loss')  # y-axis label
plt.legend(loc='upper left', bbox_to_anchor=(0, 1.0))
plt.savefig('./Loss.png')
plt.show()

# Flatten the epoch indices, accuracies and timings to 1-D arrays
epo = epo.ravel()
test_accuracy = test_accuracy.ravel()
train_accuracy = train_accuracy.ravel()
train_time = train_time.ravel()

# Collect them in a DataFrame
result_df = pd.DataFrame({'Epoch': epo, 'Test accuracy': test_accuracy, 'Training accuracy': train_accuracy, 'Time (s)': train_time})

# Write the DataFrame to a sheet of an Excel file. The context manager saves and
# closes the file (ExcelWriter.save() is no longer available in pandas 2.x).
with pd.ExcelWriter('Result.xlsx') as writer:
    result_df.to_excel(writer, sheet_name='page_1', index=False)

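The accuracy of ResNet-18 on the training and test sets and the training time at selected epochs are:

Dataset / epoch	1	5	10	15	20	25	30
Training set	0.7003	0.8663	0.8928	0.8915	0.9245	0.9276	0.9315
Test set	0.7504	0.8748	0.8898	0.8915	0.9199	0.9197	0.9208
Training time (s)	33.444	31.056	29.554	29.742	29.517	29.559	29.460

The accuracy on the training and test sets as a function of the epoch is shown below:

The total loss on the training and test sets as a function of the epoch is shown below:

The loss of ResNet-18 on Fashion MNIST drops sharply early in training and slows down from about epoch 10. At epoch 15 the learning rate is reduced, which makes the loss drop abruptly, after which the loss decreases slowly and the accuracy rises slowly. The final test accuracy is 92.08%. Because ResNet-18 is a fairly deep network, the test accuracy stays below the training accuracy late in training. This indicates slight overfitting, but the effect on test accuracy is small.

The average time per epoch is 32.0945 s, about 6 s longer than MobileNet-V1, or roughly 3 min more over 30 epochs (based on the two per-epoch averages). The accuracy improves by only 0.15 percentage points, which is a modest gain for the extra time.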

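2.4 VGG16

2.4.1 Introduction to the VGG16 Model

VGG (Visual Geometry Group) is a deep convolutional neural network architecture proposed in 2014 by researchers at the University of Oxford. VGG achieves better performance on image classification and recognition mainly by increasing network depth. The family includes VGG11, VGG13, VGG16, and VGG19, which differ mainly in depth. The table below shows the network structure of VGG16 (assuming a 224*224*3 three-channel RGB input):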

Operation / stride, padding	Kernel (kernel_size*out_channels)	Input size (H*W*channels)
Conv/s=1,p=1	3*3*64	224*224*3
Conv/s=1,p=1	3*3*64	224*224*64
MaxPool/s=2,p=0	2*2	224*224*64
Conv/s=1,p=1	3*3*128	112*112*64
Conv/s=1,p=1	3*3*128	112*112*128
MaxPool/s=2,p=0	2*2	112*112*128
Conv/s=1,p=1	3*3*256	56*56*128
Conv/s=1,p=1	3*3*256	56*56*256
Conv/s=1,p=1	3*3*256	56*56*256
MaxPool/s=2,p=0	2*2	56*56*256
Conv/s=1,p=1	3*3*512	28*28*256
Conv/s=1,p=1	3*3*512	28*28*512
Conv/s=1,p=1	3*3*512	28*28*512
MaxPool/s=2,p=0	2*2	28*28*512
Conv/s=1,p=1	3*3*512	14*14*512
Conv/s=1,p=1	3*3*512	14*14*512
Conv/s=1,p=1	3*3*512	14*14*512
MaxPool/s=2,p=0	2*2	14*14*512
FC	4096	7*7*512
FC	4096	1*1*4096
FC	1000	1*1*4096
Softmax	Classifier	1*1*1000

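In practice VGG16 and VGG19 are the most widely used variants, and VGG19 performs about the same as VGG16 on Fashion MNIST, so this article uses VGG16 for training and testing.

VGG uses relatively small convolution kernels (3x3) and builds the network by stacking many convolution layers. The network consists of convolution layers, pooling layers, and fully connected layers. The main characteristics of VGG are:

(1) VGG uses simple kernel sizes and pooling operations, which makes the structure easy to understand and adjust;

(2) VGG improves performance by increasing network depth, at the price of many parameters and a high computational cost;

(3) VGG performs well on large datasets but may overfit on small ones.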
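The preference for stacked 3x3 kernels is commonly motivated by two facts: a stack of stride-1 3x3 convolutions covers the same receptive field as one larger kernel, and it does so with fewer weights. The sketch below illustrates this under simplifying assumptions (equal input and output channel counts, no bias terms); the helper names are hypothetical.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weights of num_layers convolutions with `channels` input and output channels."""
    return num_layers * kernel * kernel * channels * channels


channels = 256
for layers, big_kernel in ((2, 5), (3, 7)):
    field = stacked_receptive_field(layers)
    stacked = conv_weights(3, channels, layers)
    single = conv_weights(big_kernel, channels)
    print(f"{layers} x 3x3: receptive field {field}x{field}, {stacked} weights; "
          f"one {big_kernel}x{big_kernel}: {single} weights")
```

The stacked version also inserts a nonlinearity after every layer, so it is both cheaper and more expressive than the single large kernel it replaces.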

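2.4.2 VGG16 Model Training and Data Analysis

To make Fashion MNIST easier to train on VGG-16, a convolution layer with padding=3 is placed in front of the network, which enlarges each 28*28 single-channel image to a 32*32 feature map. For 28*28 single-channel input, the network structure and parameters are: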

Operation / stride, padding	Kernel (kernel_size*out_channels)	Input size (H*W*channels)
Conv/s=1,p=3	3*3*64	28*28*1
Conv/s=1,p=1	3*3*64	32*32*64
MaxPool/s=2,p=0	2*2	32*32*64
Conv/s=1,p=1	3*3*128	16*16*64
Conv/s=1,p=1	3*3*128	16*16*128
MaxPool/s=2,p=0	2*2	16*16*128
Conv/s=1,p=1	3*3*256	8*8*128
Conv/s=1,p=1	3*3*256	8*8*256
Conv/s=1,p=1	3*3*256	8*8*256
MaxPool/s=2,p=0	2*2	8*8*256
Conv/s=1,p=1	3*3*512	4*4*256
Conv/s=1,p=1	3*3*512	4*4*512
Conv/s=1,p=1	3*3*512	4*4*512
MaxPool/s=2,p=0	2*2	4*4*512
Conv/s=1,p=1	3*3*512	2*2*512
Conv/s=1,p=1	3*3*512	2*2*512
Conv/s=1,p=1	3*3*512	2*2*512
MaxPool/s=2,p=0	2*2	2*2*512
FC	512*256	1*1*512
FC	256*256	1*1*256
FC	256*256	1*1*256
FC	256*10	1*1*256
Softmax	Classifier	1*1*10

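VGG-16 suits classification tasks with large amounts of data. Its convolutional structure is simple but has many layers, so it overfits easily on a relatively simple dataset such as Fashion MNIST. Therefore, before training, the images are padded by 6 pixels on each side, randomly cropped back to 28*28, and randomly flipped horizontally. The model is trained for 30 epochs with an initial learning rate of 0.1, which is multiplied by 0.1 every 15 epochs.

The VGG16 model is built as follows: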

# Build the VGG16 model
from torch import nn


class mnist_Model(nn.Module):
    def __init__(self):
        super(mnist_Model, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 64, 3, 1, 3),  # 28*28*1->32*32*64
            nn.BatchNorm2d(num_features=64),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, 1, 1),  # 32*32*64->32*32*64
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)  # 32*32*64->16*16*64
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(64, 128, 3, 1, 1),  # 16*16*64->16*16*128
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, 128, 3, 1, 1),  # 16*16*128->16*16*128
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)  # 16*16*128->8*8*128
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(128, 256, 3, 1, 1),  # 8*8*128->8*8*256
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 256, 3, 1, 1),  # 8*8*256->8*8*256
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 256, 3, 1, 1),  # 8*8*256->8*8*256
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)  # 8*8*256->4*4*256
        )
        self.layer4 = nn.Sequential(
            nn.Conv2d(256, 512, 3, 1, 1),  # 4*4*256->4*4*512
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, 1, 1),  # 4*4*512->4*4*512
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, 1, 1),  # 4*4*512->4*4*512
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)  # 4*4*512->2*2*512
        )
        self.layer5 = nn.Sequential(
            nn.Conv2d(512, 512, 3, 1, 1),  # 2*2*512->2*2*512
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, 1, 1),  # 2*2*512->2*2*512
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, 1, 1),  # 2*2*512->2*2*512
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)  # 2*2*512->1*1*512
        )
        self.layer6 = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512, 256),
            nn.Linear(256, 256),
            nn.Dropout(p=0.5),  # element-wise dropout; Dropout2d is meant for 3D/4D feature maps
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.Dropout(p=0.5),
            nn.ReLU(),
        )
        self.layer7 = nn.Sequential(
            nn.Linear(256, 10)
        )
        self.model = nn.Sequential(
            self.layer1,
            self.layer2,
            self.layer3,
            self.layer4,
            self.layer5,
            self.layer6,
            self.layer7,
        )

    def forward(self, x):
        x = self.model(x)
        return x
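
Since parameter count is one of the criteria used to compare the models, the size of this network can be estimated directly from the layer shapes. The sketch below is a hand count based on the listing above (13 convolutions, each followed by BatchNorm2d, plus four fully connected layers); the helper names are illustrative and the channel lists are transcribed from the listing, not produced by PyTorch.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k*k convolution."""
    return k * k * c_in * c_out + c_out


def bn_params(c):
    """Learnable scale and shift of a BatchNorm2d layer."""
    return 2 * c


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# (in_channels, out_channels) of the 13 convolutions in the listing above
convs = [(1, 64), (64, 64),
         (64, 128), (128, 128),
         (128, 256), (256, 256), (256, 256),
         (256, 512), (512, 512), (512, 512),
         (512, 512), (512, 512), (512, 512)]
# (in_features, out_features) of the fully connected layers
linears = [(512, 256), (256, 256), (256, 256), (256, 10)]

conv_total = sum(conv_params(i, o) + bn_params(o) for i, o in convs)
fc_total = sum(linear_params(i, o) for i, o in linears)
print("conv + BN parameters:", conv_total)
print("fully connected parameters:", fc_total)
print("total:", conv_total + fc_total)
```

With PyTorch installed, the same total can be cross-checked with `sum(p.numel() for p in mnist_Model().parameters())`.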

Train with the model above:

import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torchvision
from torch import nn
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader

from model import *  # provides mnist_Model defined above

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

show_data = torchvision.datasets.FashionMNIST("./dataset/mnist", train=True, download=True)
for i in range(64):
    plt.subplot(8, 8, i+1)
    img, target = show_data[i]

    plt.imshow(img)
    plt.axis('off')
plt.savefig('./data.png')
plt.show()

train_dataset_transform = torchvision.transforms.Compose([
    torchvision.transforms.RandomCrop(28, padding=4),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
])
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # use the GPU if available, otherwise the CPU
train_data = torchvision.datasets.FashionMNIST("./dataset/mnist", train=True, transform=train_dataset_transform, download=True)  # download the training set and apply the augmentation
test_data = torchvision.datasets.FashionMNIST("./dataset/mnist", train=False, transform=torchvision.transforms.ToTensor(), download=True)  # download the test set and convert it to tensors


train_data_size = len(train_data)  # sizes of the training and test sets
test_data_size = len(test_data)
print("Training set size: {}".format(train_data_size))
print("Test set size: {}".format(test_data_size))

train_dataloader = DataLoader(train_data, batch_size=64)  # 64 images per batch
test_dataloader = DataLoader(test_data, batch_size=64)

mnist_model = mnist_Model()  # the convolutional network defined above
mnist_model = mnist_model.to(device)  # move the model to the selected device

loss_fn = nn.CrossEntropyLoss()  # cross-entropy loss
loss_fn = loss_fn.to(device)  # move the loss function to the selected device

learning_rate = 0.1  # initial learning rate
optimizer = torch.optim.SGD(mnist_model.parameters(), lr=learning_rate)  # SGD optimizer over the model parameters
scheduler = StepLR(optimizer, step_size=15, gamma=0.1)  # multiply the learning rate by 0.1 every 15 epochs

train_step = 0  # number of training iterations (batches)
test_step = 0
epoch = 30  # number of training epochs
train_accuracy = np.zeros((epoch, 1), float)
test_accuracy = np.zeros((epoch, 1), float)  # per-epoch accuracy and loss buffers
train_loss = np.zeros((epoch, 1), float)
test_loss = np.zeros((epoch, 1), float)
train_time = np.zeros((epoch, 1), float)
for i in range(epoch):  # i is the epoch index
    start_time = time.time()  # epoch start time
    print("Epoch {} started".format(i+1))
    mnist_model.train()  # switch to training mode
    for data in train_dataloader:  # one epoch is one full pass over the training set
        imgs, targets = data
        imgs = imgs.to(device)  # move the batch to the selected device
        targets = targets.to(device)
        outputs = mnist_model(imgs)  # forward pass
        loss = loss_fn(outputs, targets)  # cross-entropy between the outputs and the targets
        train_loss[i] += loss.item()  # accumulate the total loss of this epoch
        accuracy = (outputs.argmax(1) == targets).sum()  # number of correct predictions in the batch
        train_accuracy[i] += accuracy.item()
        optimizer.zero_grad()  # clear the old gradients
        loss.backward()  # backpropagation
        optimizer.step()  # update the parameters
        train_step += 1
    end_time = time.time()
    train_accuracy[i] = round(train_accuracy[i].item() / train_data_size, 4)
    train_time[i] = round(end_time - start_time, 3)
    print("--------- Epoch {} finished, time: {} s ---------".format(i+1, train_time[i].item()))
    print("Training set: Loss: {}, Accuracy: {}".format(train_loss[i].item(), train_accuracy[i].item()))

    scheduler.step()  # update the learning rate

    mnist_model.eval()  # switch to evaluation mode
    with torch.no_grad():  # no gradients are needed for evaluation
        for data in test_dataloader:  # one forward pass over the whole test set
            imgs, targets = data
            imgs = imgs.to(device)  # move the batch to the selected device
            targets = targets.to(device)
            outputs = mnist_model(imgs)  # forward pass
            loss = loss_fn(outputs, targets)  # compute the loss
            test_loss[i] += loss.item()  # accumulate the total loss of this epoch
            accuracy = (outputs.argmax(1) == targets).sum()  # number of correct predictions in the batch
            test_accuracy[i] += accuracy.item()
        test_accuracy[i] = round(test_accuracy[i].item() / test_data_size, 4)
        print("Test set: Loss: {}, Accuracy: {}".format(test_loss[i].item(), test_accuracy[i].item()))  # report loss and accuracy
    test_step += 1

epo = np.arange(1, epoch+1, 1)
plt.figure(1)
plt.plot(epo, train_accuracy, linestyle='--', marker='o', color='k', markersize=5, label='Training accuracy')
plt.plot(epo, test_accuracy, linestyle='-', marker='*', color='r', markersize=5, label='Test accuracy')
plt.axis([0, 30, 0, 1])
plt.xlabel('Epoch')  # x-axis label
plt.ylabel('Accuracy')  # y-axis label
plt.legend(loc='upper left', bbox_to_anchor=(0.65, 0.15))
plt.savefig('./Accuracy.png')
plt.show()

plt.figure(2)
plt.plot(epo, train_loss, linestyle='--', marker='o', color='k', markersize=5, label='Training loss')
plt.plot(epo, test_loss, linestyle='-', marker='*', color='r', markersize=5, label='Test loss')
plt.axis([0, 30, 0, 1000])
plt.xlabel('Epoch')  # x-axis label
plt.ylabel('Total loss')  # y-axis label
plt.legend(loc='upper left', bbox_to_anchor=(0, 1.0))
plt.savefig('./Loss.png')
plt.show()

# Flatten the epoch index, accuracies and timings to 1-D arrays
epo = epo.ravel()
test_accuracy = test_accuracy.ravel()
train_accuracy = train_accuracy.ravel()
train_time = train_time.ravel()

# Collect the results in a DataFrame
result_df = pd.DataFrame({'Epoch': epo, 'Test accuracy': test_accuracy, 'Training accuracy': train_accuracy, 'Time (s)': train_time})

# Write the DataFrame to a sheet of an Excel file; the context manager saves and closes the file
with pd.ExcelWriter('Result.xlsx') as writer:
    result_df.to_excel(writer, sheet_name='page_1', index=False)

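The accuracy on the training and test sets at selected epochs, together with the training time of each of those epochs, is shown in the table below:

Dataset / epoch	1	5	10	15	20	25	30
Training set	0.7095	0.8868	0.9171	0.9304	0.9527	0.9549	0.9587
Test set	0.8249	0.8994	0.9210	0.9277	0.9413	0.9423	0.9433
Training time (s)	80.844	61.353	61.106	61.038	60.738	60.926	60.956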

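The accuracy on the training and test sets versus the number of epochs is plotted below:

The total loss on the training and test sets versus the number of epochs is plotted below: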

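On Fashion MNIST, the loss of VGG-16 drops sharply in the early epochs. The test accuracy first rises sharply and then dips slightly; at epochs 11 and 15 it drops sharply and recovers immediately. From epoch 10 onward the loss decreases more slowly. The learning-rate adjustment at epoch 15 makes the loss drop abruptly, after which the loss decreases slowly and the accuracy rises slowly. The final test accuracy is 94.33%. Because VGG-16 is deeper and ends with several fully connected layers, the model relies too heavily on the training set and overfits, although this has little effect on the test accuracy.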

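The average time per epoch is 60.8092 s, much longer than for the two preceding network models. Over 30 epochs, training takes about 15 min longer than ResNet-18, while the test accuracy improves by 2.25 percentage points (from 92.08% to 94.33%). The gain is substantial, but the time cost is high.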

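2.5 The DLA-34 algorithm

2.5.1 Introduction to the DLA-34 algorithm

DLA (Deep Layer Aggregation) is a deep learning architecture designed to handle multi-scale feature information, and it is widely used in computer vision. A DLA model stacks multiple IDA (Iterative Deep Aggregation) and HDA (Hierarchical Deep Aggregation) structures. IDA follows the backbone from shallow to deep and fuses features across resolutions and scales as they are extracted: features of different resolutions are brought to a common scale so that deeper features can be fused further. IDA fuses features from different stages effectively, but it does not fuse the features inside each stage. HDA is therefore used to aggregate the feature maps of each layer into modules at different levels of a hierarchy, and these modules are then fused in turn.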

Name	Block	Stage1	Stage2	Stage3	Stage4	Stage5	Stage6
DLA-34	BASIC	16	32	1-64	2-128	2-256	1-512
DLA-46-C	Bottleneck	16	32	1-64	2-64	2-128	1-256
DLA-60	Bottleneck	16	32	1-128	2-256	3-512	1-1024
DLA-102	Bottleneck	16	32	1-128	3-256	4-512	1-1024
DLA-169	Bottleneck	16	32	2-128	3-256	5-512	1-1024
DLA-X-46-C	Split	16	32	1-64	2-64	2-128	1-256
DLA-X-60-C	Split	16	32	1-64	2-64	3-128	1-256
DLA-X-60	Split	16	32	1-128	2-256	3-512	1-1024
DLA-X-102	Split	16	32	1-128	3-256	4-512	1-1024

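The figures below show the IDA and HDA structures:

IDA structure

HDA structure

In the figures, each Block contains several layers, and a convolutional neural network consists of several Stages. Through multi-scale feature aggregation, the DLA network fuses image features and spatial features more effectively, which improves model performance and makes predictions more accurate.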

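DLA-34 is a deep network built around feature fusion; training it takes a long time and it overfits easily. Before training, each image is padded by 6 pixels on every side, randomly cropped back to 28*28, and randomly flipped horizontally, and batches are drawn from the training set in random order. Training runs for 40 epochs with an initial learning rate of 0.1, and the learning rate is multiplied by 0.9 after every epoch. A Dropout layer is added between the fully connected layers to reduce overfitting.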
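
The two learning-rate schedules used in this article (step decay every 15 epochs for VGG16, decay after every epoch for DLA-34) are both instances of the same closed form, lr = base_lr * gamma^(epoch // step_size). The sketch below evaluates that formula in plain Python; `step_lr` is a hypothetical helper written for this example, not the PyTorch scheduler itself.

```python
def step_lr(base_lr, step_size, gamma, epoch):
    """Learning rate used during `epoch` (0-based) under step decay."""
    return base_lr * gamma ** (epoch // step_size)


# VGG16 setting: multiply by 0.1 every 15 epochs, 30 epochs in total
vgg_schedule = [step_lr(0.1, 15, 0.1, e) for e in range(30)]
# DLA-34 setting: multiply by 0.9 after every epoch, 40 epochs in total
dla_schedule = [step_lr(0.1, 1, 0.9, e) for e in range(40)]

print("VGG16 :", vgg_schedule[0], vgg_schedule[15])
print("DLA-34:", dla_schedule[0], dla_schedule[-1])
```

Under the per-epoch schedule the learning rate shrinks smoothly and ends at roughly 0.0016 in the last epoch, whereas the step schedule changes only once, at epoch 15.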

The DLA-34 model is built as follows:


import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(
            in_planes, planes, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class Root(nn.Module):  # Root(2*64,64)
    def __init__(self, in_channels, out_channels, kernel_size=1):
        super(Root, self).__init__()
        self.conv = nn.Conv2d(
            in_channels, out_channels, kernel_size,
            stride=1, padding=(kernel_size - 1) // 2)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, xs):
        x = torch.cat(xs, 1)  # concatenate the input tensors along the channel dimension
        out = F.relu(self.bn(self.conv(x)))
        return out


class Tree(nn.Module):  # Tree(block,  64, 128, level=2, stride=2)
    def __init__(self, block, in_channels, out_channels, level=1, stride=1):
        super(Tree, self).__init__()
        self.root = Root(2*out_channels, out_channels)
        if level == 1:
            self.left_tree = block(in_channels, out_channels, stride=stride)
            self.right_tree = block(out_channels, out_channels, stride=1)
        else:
            self.left_tree = Tree(block, in_channels,
                                  out_channels, level=level-1, stride=stride)
            self.right_tree = Tree(block, out_channels,
                                   out_channels, level=level-1, stride=1)

    def forward(self, x):
        out1 = self.left_tree(x)
        out2 = self.right_tree(out1)
        out = self.root([out1, out2])
        return out


class SimpleDLA(nn.Module):
    def __init__(self, block=BasicBlock, num_classes=10):
        super(SimpleDLA, self).__init__()
        self.base = nn.Sequential(
            nn.Conv2d(1, 3, kernel_size=3, stride=1, padding=3),  # 28*28*1->32*32*3
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),  # 32*32*3->32*32*16
            nn.BatchNorm2d(16),
            nn.ReLU(True)
        )

        self.layer1 = nn.Sequential(
            nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),  # 32*32*16->32*32*16
            nn.BatchNorm2d(16),
            nn.ReLU(True)
        )

        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),  # 32*32*16->32*32*32
            nn.BatchNorm2d(32),
            nn.ReLU(True)
        )

        self.layer3 = Tree(block,  32,  64, level=1, stride=1)
        self.layer4 = Tree(block,  64, 128, level=2, stride=2)
        self.layer5 = Tree(block, 128, 256, level=2, stride=2)
        self.layer6 = Tree(block, 256, 512, level=1, stride=2)
        self.linear = nn.Sequential(
            nn.Linear(512, 256),
            nn.Dropout(p=0.5),  # element-wise dropout on the flattened features
            nn.Linear(256, num_classes)
        )

    def forward(self, x):
        out = self.base(x)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.layer5(out)
        out = self.layer6(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out


def test():
    net = SimpleDLA()
    print(net)
    x = torch.randn(1, 1, 28, 28)
    y = net(x)
    print(y.size())


if __name__ == '__main__':
    test()
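
The recursive Tree class determines how many basic blocks and aggregation (Root) nodes each stage contains. The sketch below mirrors that recursion in plain Python to count them; `count_tree` and the `stages` list are illustrative names, with the levels and channel widths taken from layer3 to layer6 in the listing above.

```python
def count_tree(level):
    """Return (basic_blocks, root_nodes) of a Tree with the given level.

    Mirrors the recursion of the Tree class: a level-1 tree holds two
    blocks and one root; a deeper tree holds two sub-trees and one root.
    """
    if level == 1:
        return 2, 1
    blocks, roots = count_tree(level - 1)
    return 2 * blocks, 2 * roots + 1


# (level, out_channels) of layer3..layer6 in SimpleDLA
stages = [(1, 64), (2, 128), (2, 256), (1, 512)]
for level, channels in stages:
    blocks, roots = count_tree(level)
    print(f"level={level}, channels={channels}: {blocks} blocks, {roots} roots")
```

Each extra level doubles the number of blocks, which is one reason the deeper DLA variants in the table above grow quickly in depth and training cost.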

Train and test with the model above:

import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torchvision
from torch import nn
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader

from model import *  # provides SimpleDLA defined above

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

train_dataset_transform = torchvision.transforms.Compose([
    torchvision.transforms.RandomCrop(28, padding=6),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
])
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # use the GPU if available, otherwise the CPU
train_data = torchvision.datasets.FashionMNIST("./dataset/mnist", train=True, transform=train_dataset_transform, download=True)  # download the training set and apply the augmentation
test_data = torchvision.datasets.FashionMNIST("./dataset/mnist", train=False, transform=torchvision.transforms.ToTensor(), download=True)  # download the test set and convert it to tensors


train_data_size = len(train_data)  # sizes of the training and test sets
test_data_size = len(test_data)
print("Training set size: {}".format(train_data_size))
print("Test set size: {}".format(test_data_size))

train_dataloader = DataLoader(train_data, batch_size=64, shuffle=True)  # 64 images per batch, drawn in random order
test_dataloader = DataLoader(test_data, batch_size=64)

mnist_model = SimpleDLA()  # the DLA-34 network defined above
mnist_model = mnist_model.to(device)  # move the model to the selected device

loss_fn = nn.CrossEntropyLoss()  # cross-entropy loss
loss_fn = loss_fn.to(device)  # move the loss function to the selected device

learning_rate = 0.1  # initial learning rate
optimizer = torch.optim.SGD(mnist_model.parameters(), lr=learning_rate)  # SGD optimizer over the model parameters
scheduler = StepLR(optimizer, step_size=1, gamma=0.9)  # multiply the learning rate by 0.9 after every epoch

train_step = 0  # number of training iterations (batches)
test_step = 0
epoch = 40  # number of training epochs
train_accuracy = np.zeros((epoch, 1), float)
test_accuracy = np.zeros((epoch, 1), float)  # per-epoch accuracy and loss buffers
train_loss = np.zeros((epoch, 1), float)
test_loss = np.zeros((epoch, 1), float)
train_time = np.zeros((epoch, 1), float)
for i in range(epoch):  # i is the epoch index
    start_time = time.time()  # epoch start time
    print("Epoch {} started".format(i+1))
    mnist_model.train()  # switch to training mode
    for data in train_dataloader:  # one epoch is one full pass over the training set
        imgs, targets = data
        imgs = imgs.to(device)  # move the batch to the selected device
        targets = targets.to(device)
        outputs = mnist_model(imgs)  # forward pass
        loss = loss_fn(outputs, targets)  # cross-entropy between the outputs and the targets
        train_loss[i] += loss.item()  # accumulate the total loss of this epoch
        accuracy = (outputs.argmax(1) == targets).sum()  # number of correct predictions in the batch
        train_accuracy[i] += accuracy.item()
        optimizer.zero_grad()  # clear the old gradients
        loss.backward()  # backpropagation
        optimizer.step()  # update the parameters
        train_step += 1
    end_time = time.time()
    train_accuracy[i] = round(train_accuracy[i].item() / train_data_size, 4)
    train_time[i] = round(end_time - start_time, 3)
    print("--------- Epoch {} finished, time: {} s ---------".format(i+1, train_time[i].item()))
    print("Training set: Loss: {}, Accuracy: {}".format(train_loss[i].item(), train_accuracy[i].item()))

    scheduler.step()  # update the learning rate

    mnist_model.eval()  # switch to evaluation mode
    with torch.no_grad():  # no gradients are needed for evaluation
        for data in test_dataloader:  # one forward pass over the whole test set
            imgs, targets = data
            imgs = imgs.to(device)  # move the batch to the selected device
            targets = targets.to(device)
            outputs = mnist_model(imgs)  # forward pass
            loss = loss_fn(outputs, targets)  # compute the loss
            test_loss[i] += loss.item()  # accumulate the total loss of this epoch
            accuracy = (outputs.argmax(1) == targets).sum()  # number of correct predictions in the batch
            test_accuracy[i] += accuracy.item()
        test_accuracy[i] = round(test_accuracy[i].item() / test_data_size, 4)
        print("Test set: Loss: {}, Accuracy: {}".format(test_loss[i].item(), test_accuracy[i].item()))  # report loss and accuracy
    test_step += 1

epo = np.arange(1, epoch+1, 1)
plt.figure(1)
plt.plot(epo, train_accuracy, linestyle='--', marker='o', color='k', markersize=5, label='Training accuracy')
plt.plot(epo, test_accuracy, linestyle='-', marker='*', color='r', markersize=5, label='Test accuracy')
plt.axis([0, 40, 0, 1])
plt.xlabel('Epoch')  # x-axis label
plt.ylabel('Accuracy')  # y-axis label
plt.legend(loc='upper left', bbox_to_anchor=(0.65, 0.15))
plt.savefig('./Accuracy.png')
plt.show()

plt.figure(2)
plt.plot(epo, train_loss, linestyle='--', marker='o', color='k', markersize=5, label='Training loss')
plt.plot(epo, test_loss, linestyle='-', marker='*', color='r', markersize=5, label='Test loss')
plt.axis([0, 40, 0, 1000])
plt.xlabel('Epoch')  # x-axis label
plt.ylabel('Total loss')  # y-axis label
plt.legend(loc='upper left', bbox_to_anchor=(0, 1.0))
plt.savefig('./Loss.png')
plt.show()

# Flatten the epoch index, accuracies and timings to 1-D arrays
epo = epo.ravel()
test_accuracy = test_accuracy.ravel()
train_accuracy = train_accuracy.ravel()
train_time = train_time.ravel()

# Collect the results in a DataFrame
result_df = pd.DataFrame({'Epoch': epo, 'Test accuracy': test_accuracy, 'Training accuracy': train_accuracy, 'Time (s)': train_time})

# Write the DataFrame to a sheet of an Excel file; the context manager saves and closes the file
with pd.ExcelWriter('Result.xlsx') as writer:
    result_df.to_excel(writer, sheet_name='page_1', index=False)

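2.5.2 DLA-34 model training and data analysis

The accuracy on the training and test sets at selected epochs, together with the training time of each of those epochs, is shown in the table below:

Dataset / epoch	1	5	10	15	20	25	30	35	40
Training set	0.7873	0.9141	0.9328	0.9435	0.9616	0.9664	0.9700	0.9711	0.9734
Test set	0.8313	0.9150	0.9217	0.9302	0.9460	0.9461	0.9466	0.9478	0.9490
Training time (s)	187.50	181.41	181.48	182.56	182.00	182.46	181.64	181.35	181.43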

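The accuracy on the training and test sets versus the number of epochs is plotted below:

The total loss on the training and test sets versus the number of epochs is plotted below:

Figure 3-9 Loss of DLA-34 on the training and test sets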

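On Fashion MNIST, the loss of DLA-34 drops sharply in the early epochs. Because the learning rate decreases after every epoch, the loss then declines steadily and the training accuracy rises steadily. The test accuracy dips slightly at some epochs, either because the model at that epoch depends more heavily on the training set or because the training and test data differ. The final test accuracy is 94.90%. Although the preprocessing adds padding with random cropping and the model includes a Dropout layer, in the later epochs the test accuracy remains about 2 percentage points below the training accuracy (1.6 to 2.4 points from epoch 20 onward in the table above). The model has many parameters and many layers, and it overfits slightly.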
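
The size of the overfitting gap can be read directly off the table above. The short sketch below recomputes the difference between training and test accuracy at each listed epoch; the values are copied from the table, and the variable names are illustrative.

```python
epochs = [1, 5, 10, 15, 20, 25, 30, 35, 40]
train_acc = [0.7873, 0.9141, 0.9328, 0.9435, 0.9616, 0.9664, 0.9700, 0.9711, 0.9734]
test_acc = [0.8313, 0.9150, 0.9217, 0.9302, 0.9460, 0.9461, 0.9466, 0.9478, 0.9490]

# training accuracy minus test accuracy, in percentage points
gaps = [round((tr - te) * 100, 2) for tr, te in zip(train_acc, test_acc)]
for e, g in zip(epochs, gaps):
    print(f"epoch {e:2d}: gap = {g:+.2f} points")
```

The gap is negative in the first epochs, which is plausible because the training accuracy is accumulated on augmented images while the weights are still changing within the epoch, and it settles at around 2 points from epoch 25 onward.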

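The average time per epoch is 181.396 s, longer than for the three preceding network models and about 2 min more per epoch than VGG16. At 30 epochs the test accuracy is 0.33 percentage points higher than that of VGG16 (94.66% versus 94.33%), and after 40 epochs it is 0.57 points higher (94.90%). The improvement is noticeable, but the time cost is very high.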
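
The trade-off between accuracy and training time can be made explicit with a few lines of arithmetic. The sketch below uses only the figures reported in this article (average seconds per epoch and test accuracies); the total training times are estimates that assume the reported per-epoch average holds for every epoch.

```python
vgg = {"sec_per_epoch": 60.8092, "acc_at_30": 0.9433}
dla = {"sec_per_epoch": 181.396, "acc_at_30": 0.9466, "acc_at_40": 0.9490}

extra_sec = dla["sec_per_epoch"] - vgg["sec_per_epoch"]     # extra time per epoch
gain_30 = (dla["acc_at_30"] - vgg["acc_at_30"]) * 100       # points, both at 30 epochs
gain_40 = (dla["acc_at_40"] - vgg["acc_at_30"]) * 100       # points, final results
total_vgg_min = 30 * vgg["sec_per_epoch"] / 60              # estimated total, 30 epochs
total_dla_min = 40 * dla["sec_per_epoch"] / 60              # estimated total, 40 epochs

print(f"extra time per epoch: {extra_sec:.1f} s")
print(f"accuracy gain at 30 epochs: {gain_30:.2f} points")
print(f"accuracy gain of the final models: {gain_40:.2f} points")
print(f"estimated total training time: VGG16 {total_vgg_min:.1f} min, DLA-34 {total_dla_min:.1f} min")
```

On these numbers, DLA-34 needs roughly four times the total training time of VGG16 for a gain of about half a percentage point.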

Link: https://pan.baidu.com/s/1Gh3GMx1cHOyBlaJwHC0uGA  Extraction code: 1234


Operation/stride,padding	Kernel parameters (kernel_size*channels)	Input size (H*W*channels)
Conv/s=2,p=3	3*3*3*32	224*224*3
Conv dw/s=1,p=1	3*3*32 dw	112*112*32
Conv/s=1,p=0	1*1*32*64	112*112*32
Conv dw/s=2,p=1	3*3*64 dw	112*112*64
Conv/s=1,p=0	1*1*128	56*56*64
Conv dw/s=1,p=1	3*3*128 dw	56*56*128
Conv/s=1,p=0	1*1*128	56*56*128
Conv dw/s=2,p=1	3*3*128 dw	56*56*128
Conv/s=1,p=0	1*1*256	56*56*128
Conv dw/s=1,p=1	3*3*256 dw	56*56*256
Conv/s=1,p=0	1*1*256	56*56*256
Conv dw/s=2,p=1	3*3*256 dw	56*56*256
Conv/s=1,p=0	1*1*512	56*56*256
5 * (Conv dw/s=1,p=1; Conv/s=1,p=0)	3*3*512 dw; 1*1*512	14*14*512; 14*14*512
Conv dw/s=2,p=1	3*3*512 dw	14*14*512
Conv/s=1,p=0	1*1*1024	7*7*512
Conv dw/s=2,p=1	3*3*1024 dw	7*7*1024
Conv/s=1,p=0	1*1*1024	7*7*1024
Avg Pool/s=1,p=0	Pool 7*7	7*7*1024
FC	1024*1000	7*7*1024
Softmax	Classifier	1000