What Is Cosine Annealing

Cosine annealing is a learning-rate scheduling technique commonly used when training neural networks and other gradient-based models. Its key idea is to adjust the learning rate dynamically over the course of training, so that the model converges faster and reaches better final performance.

The principle is straightforward: a cosine function drives the learning rate. Concretely, the schedule lowers the learning rate from its initial value down to a minimum value by following the first half of a cosine curve; if the scheduler keeps stepping past that point, the periodic cosine brings the rate back up again, which the warm-restart variants of the method exploit. Because the decay is smooth and gradual, cosine annealing avoids the oscillation caused by gradient steps that stay too large late in training, improving both training stability and the model's generalization.
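Concretely, PyTorch's CosineAnnealingLR implements the schedule from the SGDR paper (Loshchilov & Hutter, 2016): after $T_{cur}$ scheduler steps, the learning rate is

$$\eta_t = \eta_{\min} + \frac{1}{2}\bigl(\eta_{\max} - \eta_{\min}\bigr)\left(1 + \cos\!\left(\frac{T_{cur}}{T_{\max}}\,\pi\right)\right)$$

where $\eta_{\max}$ is the initial learning rate, $\eta_{\min}$ is the floor (the eta_min argument), and $T_{\max}$ is the number of steps over which one half-cosine is traced.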

Advantages of Cosine Annealing

The main advantages of cosine annealing include the following:

Faster convergence: compared with a fixed learning rate, cosine annealing often converges faster, because the smaller learning rate late in training lets it fine-tune the model parameters.

Better generalization: the smooth decay avoids the oscillation that overly large gradient steps cause, which improves training stability and generalization.

Fewer hyperparameters to tune: it needs only a handful of settings, the initial learning rate, the schedule length (T_max), and the minimum learning rate (eta_min), so it is easy to adjust and optimize.

Broad applicability: cosine annealing is not specific to neural networks; any model fitted by iterative gradient-based optimization, such as a linear SVM or a regression model trained with SGD, can use it.

In short, cosine annealing is a simple yet effective scheduling technique that helps models train faster and perform better. The sketch below evaluates the schedule in isolation.
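To get a feel for the curve before wiring it into a training loop, here is a minimal pure-Python sketch (the helper cosine_annealing_lr is our own name, introduced for illustration) that evaluates the formula above at a few epochs:

import math

def cosine_annealing_lr(t, eta_max=0.1, eta_min=1e-6, T_max=50):
    """Learning rate at epoch t under cosine annealing."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / T_max))

for t in (0, 10, 25, 40, 50):
    print(f"epoch {t:2d}: lr = {cosine_annealing_lr(t):.6f}")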

Core Code

# Define the optimizer and the cosine annealing learning rate scheduler
optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs, eta_min=1e-6)

In this example we pair cosine annealing with the stochastic gradient descent (SGD) optimizer. optim.SGD() creates the optimizer object that performs the gradient updates: model.parameters() passes every trainable parameter to it, lr=learning_rate sets the initial learning rate, and momentum=0.9 sets the momentum factor that accelerates convergence. optim.lr_scheduler.CosineAnnealingLR() then wraps the optimizer: T_max=epochs traces the half-cosine over the whole training run, and eta_min=1e-6 is the floor the learning rate is annealed down to.
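The full training loop appears in the next section; as a quick self-contained sketch of how the pieces interact (a dummy one-parameter model, no data), the scheduler is stepped once per epoch and the current rate read back with get_last_lr():

import torch
import torch.optim as optim

model = torch.nn.Linear(1, 1)  # dummy stand-in for a real network
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-6)

for epoch in range(5):
    optimizer.step()                          # stands in for one full training epoch
    scheduler.step()                          # advance the cosine schedule by one epoch
    print(epoch, scheduler.get_last_lr()[0])  # lr the optimizer will use next epoch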

Complete Project Code

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Set the hyperparameters
batch_size = 128
learning_rate = 0.1
epochs = 50

# Load and preprocess the MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transforms.ToTensor())

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Define the model architecture
class NN(nn.Module):
    def __init__(self):
        super(NN, self).__init__()
        self.fc1 = nn.Linear(784, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 10)

    def forward(self, x):
        x = x.view(-1, 784)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = NN().to(device)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)

# Anneal the learning rate from learning_rate down to eta_min over T_max epochs
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs, eta_min=1e-6)

# Train the model
for epoch in range(epochs):
    # Set the model to training mode
    model.train()

    # Loop over the training set
    for i, (images, labels) in enumerate(train_loader):
        # Move the data to the training device
        images = images.to(device)
        labels = labels.to(device)

        # Zero the gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

    # Update the learning rate once per epoch
    scheduler.step()
    lr = scheduler.get_last_lr()[0]   # Read back the learning rate for the next epoch

    # Print the loss and learning rate at the end of each epoch
    print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}, Learning Rate: {lr:.8f}')

# Evaluate the model on the test set
model.eval()

with torch.no_grad():
    correct = 0
    total = 0

    for images, labels in test_loader:
        # Move the data to the training device
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)

        # Compute the accuracy
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    accuracy = correct / total
    print(f'The accuracy of the model is {accuracy:.2f}')

Output

Epoch [1/50], Loss: 0.0548, Learning Rate: 0.07285561
Epoch [2/50], Loss: 0.1027, Learning Rate: 0.02929003
Epoch [3/50], Loss: 0.0285, Learning Rate: 0.00429028
Epoch [4/50], Loss: 0.0534, Learning Rate: 0.00000100
Epoch [5/50], Loss: 0.0237, Learning Rate: 0.02929003
Epoch [6/50], Loss: 0.0175, Learning Rate: 0.17070997
Epoch [7/50], Loss: 0.0317, Learning Rate: 0.14571022
Epoch [8/50], Loss: 0.0373, Learning Rate: 0.11715712
Epoch [9/50], Loss: 0.0273, Learning Rate: 0.07285561
Epoch [10/50], Loss: 0.0039, Learning Rate: 0.02929003
Epoch [11/50], Loss: 0.0049, Learning Rate: 0.00429028
Epoch [12/50], Loss: 0.0021, Learning Rate: 0.00000100
Epoch [13/50], Loss: 0.0009, Learning Rate: 0.02929003
Epoch [14/50], Loss: 0.0008, Learning Rate: 0.17070997
Epoch [15/50], Loss: 0.0023, Learning Rate: 0.14571022
Epoch [16/50], Loss: 0.0003, Learning Rate: 0.11715712
Epoch [17/50], Loss: 0.0634, Learning Rate: 0.07285561
Epoch [18/50], Loss: 0.0021, Learning Rate: 0.02929003
Epoch [19/50], Loss: 0.0002, Learning Rate: 0.00429028
Epoch [20/50], Loss: 0.0005, Learning Rate: 0.00000100
Epoch [21/50], Loss: 0.0006, Learning Rate: 0.02929003
Epoch [22/50], Loss: 0.0002, Learning Rate: 0.17070997
Epoch [23/50], Loss: 0.0004, Learning Rate: 0.14571022
Epoch [24/50], Loss: 0.0002, Learning Rate: 0.11715712
Epoch [25/50], Loss: 0.0004, Learning Rate: 0.07285561
Epoch [26/50], Loss: 0.0001, Learning Rate: 0.02929003
Epoch [27/50], Loss: 0.0003, Learning Rate: 0.00429028
Epoch [28/50], Loss: 0.0002, Learning Rate: 0.00000100
Epoch [29/50], Loss: 0.0001, Learning Rate: 0.02929003
Epoch [30/50], Loss: 0.0001, Learning Rate: 0.17070997
Epoch [31/50], Loss: 0.0005, Learning Rate: 0.14571022
Epoch [32/50], Loss: 0.0000, Learning Rate: 0.11715712
Epoch [33/50], Loss: 0.0002, Learning Rate: 0.07285561
Epoch [34/50], Loss: 0.0001, Learning Rate: 0.02929003
Epoch [35/50], Loss: 0.0001, Learning Rate: 0.00429028
Epoch [36/50], Loss: 0.0003, Learning Rate: 0.00000100
Epoch [37/50], Loss: 0.0001, Learning Rate: 0.02929003
Epoch [38/50], Loss: 0.0001, Learning Rate: 0.17070997
Epoch [39/50], Loss: 0.0001, Learning Rate: 0.14571022
Epoch [40/50], Loss: 0.0001, Learning Rate: 0.11715712
Epoch [41/50], Loss: 0.0001, Learning Rate: 0.07285561
Epoch [42/50], Loss: 0.0000, Learning Rate: 0.02929003
Epoch [43/50], Loss: 0.0000, Learning Rate: 0.00429028
Epoch [44/50], Loss: 0.0001, Learning Rate: 0.00000100
Epoch [45/50], Loss: 0.0002, Learning Rate: 0.02929003
Epoch [46/50], Loss: 0.0001, Learning Rate: 0.17070997
Epoch [47/50], Loss: 0.0000, Learning Rate: 0.14571022
Epoch [48/50], Loss: 0.0000, Learning Rate: 0.11715712
Epoch [49/50], Loss: 0.0001, Learning Rate: 0.07285561
Epoch [50/50], Loss: 0.0001, Learning Rate: 0.02929003
The accuracy of the model is 0.98