Deep Learning: Predicting Wine Type with a Neural Network Model in Keras

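This column collects up-to-date machine learning projects suitable for beginners. Each project tackles a different kind of problem, covering supervised and unsupervised learning, classification, regression, and clustering. The series also covers building deep learning models, handling unstructured data, and working with more complex models such as convolutional neural networks, gated recurrent units, large language models, and reinforcement learning models. This article uses Keras, a deep learning library, to get familiar with the basics of neural networks. The wine quality dataset is freely available from the UCI Machine Learning Repository.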

清波渚 · Published 2024-04-22 23:20:03

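Preface

Series column: [Deep Learning: Algorithm Projects in Practice] ✨︎
The series spans healthcare, finance, retail, food and beverage, sports and fitness, transportation, environmental science, social media, and text and image processing. It discusses a range of deep neural network ideas, such as convolutional neural networks, recurrent neural networks, generative adversarial networks, gated recurrent units, long short-term memory, natural language processing, deep reinforcement learning, large language models, and transfer learning.

This article uses Keras to get familiar with the basics of neural networks.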

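1. Dataset overview

The wine quality dataset is freely available from the UCI Machine Learning Repository. The data contains 12 variables; some of them are described below: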

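Fixed acidity: Total acidity is divided into two groups, volatile acids and non-volatile (fixed) acids. This variable is expressed in g/dm³.
Volatile acidity: The amount of volatile acids (mainly acetic acid) in the wine; high levels give the wine a vinegar-like taste. This variable is expressed in g/dm³.
Citric acid: One of the fixed acids found in wine. This variable is expressed in g/dm³.
Residual sugar: The sugar remaining after fermentation stops. This variable is expressed in g/dm³.
Chlorides: Can be a major contributor to a salty taste in wine. This variable is expressed in g/dm³.
Free sulfur dioxide: The portion of the sulfur dioxide added to the wine that remains unbound. This variable is expressed in mg/dm³.
Total sulfur dioxide: The sum of bound and free sulfur dioxide. This variable is expressed in mg/dm³.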

1.1 Loading the data

# Import Required Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read in white wine data
white = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv", sep =';')

# Read in red wine data
red = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep =';')

1.2 First five rows of the red wine data

# First rows of `red`
red.head()

1.3 Last five rows of the white wine data

# Last rows of `white`
white.tail()

Take a random sample of five rows from the red wine data:

# Take a sample of five rows of `red`
red.sample(5)

1.4 Descriptive statistics

# Describe `white`
white.describe()

1.5 Checking the red wine data for null values

# Count null values per column in `red` (all zeros means nothing is missing)
red.isnull().sum()

2. Distribution of alcohol (visual exploratory analysis)

2.1 Creating the histograms

# Create Histogram
fig, ax = plt.subplots(1, 2)

ax[0].hist(red.alcohol, 10, facecolor='red',
           alpha=0.5, label="Red wine")

ax[1].hist(white.alcohol, 10, facecolor='white',
           ec="black", lw=0.5, alpha=0.5,
           label="White wine")

fig.subplots_adjust(left=0, right=1, bottom=0,
                    top=0.5, hspace=0.05, wspace=1)

ax[0].set_ylim([0, 1000])
ax[0].set_xlabel("Alcohol in % Vol")
ax[0].set_ylabel("Frequency")
ax[1].set_ylim([0, 1000])
ax[1].set_xlabel("Alcohol in % Vol")
ax[1].set_ylabel("Frequency")

fig.suptitle("Distribution of Alcohol in % Vol")
plt.show()

2.2 Splitting the dataset for training and validation

# Add `type` column to `red` with value one
red['type'] = 1

# Add `type` column to `white` with value zero
white['type'] = 0

# Concat `white` with `red`
wines = pd.concat([red,white], ignore_index = True)

# Import `train_test_split` from `sklearn.model_selection`
from sklearn.model_selection import train_test_split
X = wines.iloc[:, 0:11]
y = np.ravel(wines.type)

# Splitting the data set for training and validating 
X_train, X_test, y_train, y_test = train_test_split(
		X, y, test_size = 0.34, random_state = 45)
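
As a rough sanity check on the split, the sketch below reproduces the size arithmetic in plain Python. The row counts (1,599 red and 4,898 white) are an assumption based on the commonly cited sizes of the two UCI files and are not shown in this article. `split_sizes` is a hypothetical helper that, to my understanding, mirrors how `train_test_split` rounds a fractional `test_size` up and gives the remainder to training.

```python
import math

# Assumed row counts of the two UCI files (not shown in the article).
N_RED, N_WHITE = 1599, 4898


def split_sizes(n_samples, test_size):
    """Hypothetical helper: test set gets ceil(test_size * n), the rest is training."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_total = N_RED + N_WHITE
n_train, n_test = split_sizes(n_total, 0.34)

# Share of the majority class (white, type 0), i.e. the accuracy
# obtained by always guessing "white".
baseline_accuracy = N_WHITE / n_total

print(n_train, n_test, round(baseline_accuracy, 3))
```

Under these assumptions roughly three quarters of the wines are white, so any accuracy reported later is worth comparing against that majority-class baseline.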

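3. Building the neural network model

3.1 Defining the network architecture

from keras.models import Sequential
from keras.layers import Dense, Input

# Initialize the constructor
model = Sequential()
# Input layer: one value per feature column (11 features)
model.add(Input(shape = (11, )))
# Two hidden layers with ReLU activations
model.add(Dense(16, activation ='relu'))
model.add(Dense(8, activation ='relu'))
# Output layer: sigmoid gives the probability of type 1 (red)
model.add(Dense(1, activation ='sigmoid'))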

# Model config
model.get_config()

# List all weight tensors
model.get_weights()

3.2 Defining the loss and optimizer

model.compile(loss ='binary_crossentropy', optimizer ='adam', metrics =['accuracy'])
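
To make the `binary_crossentropy` loss a little more concrete, here is a minimal plain-Python sketch of the formula it is based on, the mean of -(y·log p + (1-y)·log(1-p)). The function name, the clipping constant `eps`, and the toy numbers are illustrative assumptions; Keras' own implementation may differ in detail.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Toy labels (1 = red, 0 = white) and made-up predicted probabilities.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
uninformed = [0.5, 0.5, 0.5, 0.5]

loss_confident = binary_crossentropy(labels, confident)
loss_uninformed = binary_crossentropy(labels, uninformed)
print(loss_confident, loss_uninformed)
```

A model that always outputs 0.5 scores ln 2 (about 0.693), so a training loss well below that value suggests the network is learning something useful.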

model.summary()

Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense (Dense)                        │ (None, 16)                  │             192 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ (None, 8)                   │             136 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_2 (Dense)                      │ (None, 1)                   │               9 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 337 (1.32 KB)
 Trainable params: 337 (1.32 KB)
 Non-trainable params: 0 (0.00 B)
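
The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The short sketch below (with a hypothetical helper `dense_params`) redoes that arithmetic for the 11 → 16 → 8 → 1 network.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Layer widths from the model: 11 input features -> 16 -> 8 -> 1.
widths = [11, 16, 8, 1]
per_layer = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
total = sum(per_layer)

print(per_layer, total)  # [192, 136, 9] 337
```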

4. Training the model

# Training Model
history = model.fit(X_train, y_train, epochs = 50, batch_size = 10, verbose = 1, validation_data = (X_test, y_test))

4.1 Plotting the loss curve

history_df = pd.DataFrame(history.history)

plt.plot(history_df.loc[:, ['loss']], "#BDE2E2", label='Training loss')
plt.plot(history_df.loc[:, ['val_loss']],"#C2C4E2", label='Validation loss')
plt.title('Training and Validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(loc="best")

plt.show()

Figure: loss curve

4.2 Plotting the accuracy curve

history_df = pd.DataFrame(history.history)

plt.plot(history_df.loc[:, ['accuracy']], "#BDE2E2", label='Training accuracy')
plt.plot(history_df.loc[:, ['val_accuracy']], "#C2C4E2", label='Validation accuracy')

plt.title('Training and Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

5. Model evaluation

Compute the predictions for the test set:

# Predicting the test set results
predictions_prob = model.predict(X_test)
predictions = (predictions_prob > 0.5)
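
The thresholding step can be illustrated on a few made-up probabilities. The sketch assumes, as is typical for a single sigmoid output, that `model.predict` returns an array of shape `(n_samples, 1)`; the values below are invented for illustration.

```python
import numpy as np

# Made-up sigmoid outputs with shape (n_samples, 1), as assumed above.
probs = np.array([[0.92], [0.07], [0.51], [0.50]])

# Strictly greater than 0.5 counts as class 1 (red); flatten to a 1-D label vector.
labels = (probs > 0.5).astype(int).ravel()

print(labels)
```

Note that a probability of exactly 0.5 falls into class 0 because the comparison is strict.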

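5.1 Plotting the confusion matrix

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cf_matrix = confusion_matrix(y_test, predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cf_matrix)
disp.plot()
plt.show()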
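
To read the matrix, it helps to see how its four cells turn into the usual metrics. The following plain-Python sketch uses a small invented set of labels, not the article's results, and a hypothetical helper `confusion_counts`. It treats type 1 (red) as the positive class and follows the scikit-learn layout, where rows are true labels and columns are predicted labels.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, with 1 as the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Invented labels for illustration only.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
matrix = [[tn, fp], [fn, tp]]  # same layout as sklearn's confusion_matrix

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(matrix, accuracy, precision, recall)
```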

5.2 Plotting the ROC curve and AUC

from sklearn.metrics import roc_curve, auc, RocCurveDisplay

# Compute FPR, TPR, and thresholds
fpr, tpr, thresholds = roc_curve(y_test, predictions_prob)
# Compute the AUC value
roc_auc = auc(fpr, tpr)
display = RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc,
                          estimator_name='ROC-AUC')
display.plot()
plt.show()
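
One way to build intuition for the AUC value is its pairwise interpretation: it equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below is a plain-Python illustration of that view on four invented scores; `pairwise_auc` is a hypothetical helper, not how scikit-learn computes the value internally.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Invented labels and scores for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc_value = pairwise_auc(y_true, scores)
print(auc_value)  # 0.75
```

Three of the four positive-negative pairs are ordered correctly here, which gives 0.75; a perfect ranking would give 1.0 and random scores about 0.5.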
