14.14. Dog Breed Identification (ImageNet Dogs) on Kaggle
In this section, we will practice the dog breed identification problem on Kaggle. The web address of this competition is https://www.kaggle.com/c/dog-breed-identification
In this competition, we will recognize 120 different breeds of dogs. In fact, the dataset for this competition is a subset of the ImageNet dataset. Unlike the images in the CIFAR-10 dataset in Section 14.13, the images in the ImageNet dataset are both higher and wider in varying dimensions. Fig. 14.14.1 shows the information on the competition's webpage. You need a Kaggle account to submit your results.

Fig. 14.14.1 The dog breed identification competition website. The competition dataset can be obtained by clicking the "Data" tab.
import os
import torch
import torchvision
from torch import nn
from d2l import torch as d2l
import os
from mxnet import autograd, gluon, init, npx
from mxnet.gluon import nn
from d2l import mxnet as d2l
npx.set_np()
14.14.1. Obtaining and Organizing the Dataset
The competition dataset is divided into a training set and a test set, which contain 10222 and 10357 JPEG images of three RGB (color) channels, respectively. Among the training dataset, there are 120 breeds of dogs such as Labradors, Poodles, Dachshunds, Samoyeds, Huskies, Chihuahuas, and Yorkshire Terriers.
14.14.1.1. Downloading the Dataset
After logging into Kaggle, you can click the "Data" tab on the competition webpage shown in Fig. 14.14.1 and download the dataset by clicking the "Download All" button. After unzipping the downloaded file in ../data, you will find the entire dataset in the following paths:
../data/dog-breed-identification/labels.csv
../data/dog-breed-identification/sample_submission.csv
../data/dog-breed-identification/train
../data/dog-breed-identification/test
You may have noticed that the above structure is similar to that of the CIFAR-10 competition in Section 14.13, where the folders train/ and test/ contain training and testing dog images, respectively, and labels.csv contains the labels for the training images. Similarly, to make it easier to get started, we provide a small sample of the dataset mentioned above: train_valid_test_tiny.zip. If you are going to use the full dataset for the Kaggle competition, you need to change the demo variable below to False.
#@save
d2l.DATA_HUB['dog_tiny'] = (d2l.DATA_URL + 'kaggle_dog_tiny.zip',
                            '0cb91d09b814ecdc07b50f31f8dcad3e81d6a86d')

# If you use the full dataset downloaded for the Kaggle competition, change
# the variable below to `False`
demo = True
if demo:
    data_dir = d2l.download_extract('dog_tiny')
else:
    data_dir = os.path.join('..', 'data', 'dog-breed-identification')
Downloading ../data/kaggle_dog_tiny.zip from http://d2l-data.s3-accelerate.amazonaws.com/kaggle_dog_tiny.zip...
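Having downloaded and extracted the data, we can take a quick look at labels.csv. The following is a minimal sketch (not part of the original competition code); with the full dataset it should report the 10222 training examples and 120 breeds mentioned above, while the tiny demo sample yields smaller counts.
import collections

# Read the file-name -> breed mapping and count examples per breed
labels = d2l.read_csv_labels(os.path.join(data_dir, 'labels.csv'))
counter = collections.Counter(labels.values())
print('# training examples:', len(labels))
print('# breeds:', len(counter))
print('most common breed:', counter.most_common(1))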
14.14.1.2. Organizing the Dataset
We can organize the dataset similarly to what we did in Section 14.13, namely splitting out a validation set from the original training set and moving images into subfolders grouped by labels.
The reorg_dog_data function below reads the training data labels, splits out the validation set, and organizes the training set.
def reorg_dog_data(data_dir, valid_ratio):
    labels = d2l.read_csv_labels(os.path.join(data_dir, 'labels.csv'))
    d2l.reorg_train_valid(data_dir, labels, valid_ratio)
    d2l.reorg_test(data_dir)

batch_size = 32 if demo else 128
valid_ratio = 0.1
reorg_dog_data(data_dir, valid_ratio)
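After the reorganization, the images live under train_valid_test/ in subfolders named by label (the test images go into a single unknown subfolder). A quick check of the resulting layout (a hedged sketch we add here, not from the original notebook):
# List how many label subfolders each split contains
for folder in ['train', 'valid', 'train_valid', 'test']:
    path = os.path.join(data_dir, 'train_valid_test', folder)
    print(folder, '->', len(os.listdir(path)), 'label subfolders')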
14.14.2. Image Augmentation
Recall that this dog breed dataset is a subset of the ImageNet dataset, whose images are larger than those of the CIFAR-10 dataset in Section 14.13. The following lists a few image augmentation operations that might be useful for relatively larger images.
transform_train = torchvision.transforms.Compose([
    # Randomly crop the image to obtain an image with an area of 0.08 to 1 of
    # the original area and height-to-width ratio between 3/4 and 4/3. Then,
    # scale the image to create a new 224 x 224 image
    torchvision.transforms.RandomResizedCrop(224, scale=(0.08, 1.0),
                                             ratio=(3.0/4.0, 4.0/3.0)),
    torchvision.transforms.RandomHorizontalFlip(),
    # Randomly change the brightness, contrast, and saturation
    torchvision.transforms.ColorJitter(brightness=0.4,
                                       contrast=0.4,
                                       saturation=0.4),
    torchvision.transforms.ToTensor(),
    # Standardize each channel of the image
    torchvision.transforms.Normalize([0.485, 0.456, 0.406],
                                     [0.229, 0.224, 0.225])])
transform_train = gluon.data.vision.transforms.Compose([
    # Randomly crop the image to obtain an image with an area of 0.08 to 1 of
    # the original area and height-to-width ratio between 3/4 and 4/3. Then,
    # scale the image to create a new 224 x 224 image
    gluon.data.vision.transforms.RandomResizedCrop(224, scale=(0.08, 1.0),
                                                   ratio=(3.0/4.0, 4.0/3.0)),
    gluon.data.vision.transforms.RandomFlipLeftRight(),
    # Randomly change the brightness, contrast, and saturation
    gluon.data.vision.transforms.RandomColorJitter(brightness=0.4,
                                                   contrast=0.4,
                                                   saturation=0.4),
    # Add random noise
    gluon.data.vision.transforms.RandomLighting(0.1),
    gluon.data.vision.transforms.ToTensor(),
    # Standardize each channel of the image
    gluon.data.vision.transforms.Normalize([0.485, 0.456, 0.406],
                                           [0.229, 0.224, 0.225])])
During prediction, we only use image preprocessing operations without randomness.
transform_test = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    # Crop a 224 x 224 square area from the center of the image
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize([0.485, 0.456, 0.406],
                                     [0.229, 0.224, 0.225])])
transform_test = gluon.data.vision.transforms.Compose([
    gluon.data.vision.transforms.Resize(256),
    # Crop a 224 x 224 square area from the center of the image
    gluon.data.vision.transforms.CenterCrop(224),
    gluon.data.vision.transforms.ToTensor(),
    gluon.data.vision.transforms.Normalize([0.485, 0.456, 0.406],
                                           [0.229, 0.224, 0.225])])
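As a quick sanity check (a sketch we add here, not from the original notebook), we can apply the deterministic test-time pipeline to one raw training image and confirm that it yields a 3 x 224 x 224 tensor. This uses the PyTorch variant of transform_test and assumes the raw train folder from the extracted archive:
from PIL import Image

# Open an arbitrary training image and push it through the test pipeline
img_name = os.listdir(os.path.join(data_dir, 'train'))[0]
img = Image.open(os.path.join(data_dir, 'train', img_name))
print(transform_test(img).shape)  # expected: torch.Size([3, 224, 224])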
14.14.3. Reading the Dataset
As in Section 14.13, we can read the organized dataset consisting of raw image files.
train_ds, train_valid_ds = [torchvision.datasets.ImageFolder(
    os.path.join(data_dir, 'train_valid_test', folder),
    transform=transform_train) for folder in ['train', 'train_valid']]

valid_ds, test_ds = [torchvision.datasets.ImageFolder(
    os.path.join(data_dir, 'train_valid_test', folder),
    transform=transform_test) for folder in ['valid', 'test']]
train_ds, valid_ds, train_valid_ds, test_ds = [
    gluon.data.vision.ImageFolderDataset(
        os.path.join(data_dir, 'train_valid_test', folder))
    for folder in ('train', 'valid', 'train_valid', 'test')]
Below we create data iterator instances in the same way as in Section 14.13.
train_iter, train_valid_iter = [torch.utils.data.DataLoader(
    dataset, batch_size, shuffle=True, drop_last=True)
    for dataset in (train_ds, train_valid_ds)]

valid_iter = torch.utils.data.DataLoader(valid_ds, batch_size, shuffle=False,
                                         drop_last=True)

test_iter = torch.utils.data.DataLoader(test_ds, batch_size, shuffle=False,
                                        drop_last=False)
train_iter, train_valid_iter = [gluon.data.DataLoader(
    dataset.transform_first(transform_train), batch_size, shuffle=True,
    last_batch='discard') for dataset in (train_ds, train_valid_ds)]

valid_iter = gluon.data.DataLoader(
    valid_ds.transform_first(transform_test), batch_size, shuffle=False,
    last_batch='discard')

test_iter = gluon.data.DataLoader(
    test_ds.transform_first(transform_test), batch_size, shuffle=False,
    last_batch='keep')
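Before training, it can be worth pulling one minibatch to verify the iterator output. A minimal sketch (PyTorch variant; a check we add here): each minibatch should consist of batch_size images of shape 3 x 224 x 224 with one integer label per image.
# Fetch a single minibatch and inspect its shapes
X, y = next(iter(train_iter))
print(X.shape)  # expected: torch.Size([batch_size, 3, 224, 224])
print(y.shape)  # expected: torch.Size([batch_size])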
14.14.4. Fine-Tuning a Pretrained Model
Again, the dataset for this competition is a subset of the ImageNet dataset. Therefore, we can use the approach discussed in Section 14.2 to select a model pretrained on the full ImageNet dataset and use it to extract image features to be fed into a custom small-scale output network. High-level APIs of deep learning frameworks provide a wide range of models pretrained on the ImageNet dataset. Here, we choose a pretrained ResNet-34 model, where we simply reuse the input of this model's output layer (i.e., the extracted features). Then we can replace the original output layer with a small custom output network that can be trained, such as two stacked fully connected layers. Different from the experiment in Section 14.2, the following does not retrain the pretrained model used for feature extraction. This reduces training time and the memory needed for storing gradients.
Recall that we standardized images using the means and standard deviations of the three RGB channels of the full ImageNet dataset. In fact, this is also consistent with the standardization operation of the model pretrained on ImageNet.
def get_net(devices):
    finetune_net = nn.Sequential()
    finetune_net.features = torchvision.models.resnet34(pretrained=True)
    # Define a new output network (there are 120 output categories)
    finetune_net.output_new = nn.Sequential(nn.Linear(1000, 256),
                                            nn.ReLU(),
                                            nn.Linear(256, 120))
    # Move the model to devices
    finetune_net = finetune_net.to(devices[0])
    # Freeze parameters of feature layers
    for param in finetune_net.features.parameters():
        param.requires_grad = False
    return finetune_net
def get_net(devices):
    finetune_net = gluon.model_zoo.vision.resnet34_v2(pretrained=True)
    # Define a new output network
    finetune_net.output_new = nn.HybridSequential(prefix='')
    finetune_net.output_new.add(nn.Dense(256, activation='relu'))
    # There are 120 output categories
    finetune_net.output_new.add(nn.Dense(120))
    # Initialize the output network
    finetune_net.output_new.initialize(init.Xavier(), ctx=devices)
    # Distribute the model parameters to the CPUs or GPUs used for computation
    finetune_net.collect_params().reset_ctx(devices)
    return finetune_net
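To confirm that only the small custom output network will be updated, we can count trainable versus frozen parameters. The following sketch (PyTorch variant; added here, not part of the original text) should show that gradients are required only for the two fully connected layers:
# Build the network and count parameters by requires_grad status
devices = d2l.try_all_gpus()
net = get_net(devices)
num_frozen = sum(p.numel() for p in net.parameters() if not p.requires_grad)
num_trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(f'frozen: {num_frozen}, trainable: {num_trainable}')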
Before calculating the loss, we first obtain the input of the pretrained model's output layer, i.e., the extracted features. Then we use these features as input for our small custom output network to calculate the loss.
loss = nn.CrossEntropyLoss(reduction='none')

def evaluate_loss(data_iter, net, devices):
    l_sum, n = 0.0, 0
    for features, labels in data_iter:
        features, labels = features.to(devices[0]), labels.to(devices[0])
        outputs = net(features)
        l = loss(outputs, labels)
        l_sum += l.sum()
        n += labels.numel()
    return l_sum / n
loss = gluon.loss.SoftmaxCrossEntropyLoss()

def evaluate_loss(data_iter, net, devices):
    l_sum, n = 0.0, 0
    for features, labels in data_iter:
        X_shards, y_shards = d2l.split_batch(features, labels, devices)
        output_features = [net.features(X_shard) for X_shard in X_shards]
        outputs = [net.output_new(feature) for feature in output_features]
        ls = [loss(output, y_shard).sum() for output, y_shard
              in zip(outputs, y_shards)]
        l_sum += sum([float(l.sum()) for l in ls])
        n += labels.size
    return l_sum / n
14.14.5. Defining the Training Function
We will select the model and tune hyperparameters according to the model's performance on the validation set. The model training function train only iterates parameters of the small custom output network.
def train(net, train_iter, valid_iter, num_epochs, lr, wd, devices, lr_period,
          lr_decay):
    # Only train the small custom output network
    net = nn.DataParallel(net, device_ids=devices).to(devices[0])
    trainer = torch.optim.SGD((param for param in net.parameters()
                               if param.requires_grad), lr=lr,
                              momentum=0.9, weight_decay=wd)
    scheduler = torch.optim.lr_scheduler.StepLR(trainer, lr_period, lr_decay)
    num_batches, timer = len(train_iter), d2l.Timer()
    legend = ['train loss']
    if valid_iter is not None:
        legend.append('valid loss')
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                            legend=legend)
    for epoch in range(num_epochs):
        metric = d2l.Accumulator(2)
        for i, (features, labels) in enumerate(train_iter):
            timer.start()
            features, labels = features.to(devices[0]), labels.to(devices[0])
            trainer.zero_grad()
            output = net(features)
            l = loss(output, labels).sum()
            l.backward()
            trainer.step()
            metric.add(l, labels.shape[0])
            timer.stop()
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (metric[0] / metric[1], None))
        measures = f'train loss {metric[0] / metric[1]:.3f}'
        if valid_iter is not None:
            valid_loss = evaluate_loss(valid_iter, net, devices)
            animator.add(epoch + 1, (None, valid_loss.detach().cpu()))
        scheduler.step()
    if valid_iter is not None:
        measures += f', valid loss {valid_loss:.3f}'
    print(measures + f'\n{metric[1] * num_epochs / timer.sum():.1f}'
          f' examples/sec on {str(devices)}')
def train(net, train_iter, valid_iter, num_epochs, lr, wd, devices, lr_period,
          lr_decay):
    # Only train the small custom output network
    trainer = gluon.Trainer(net.output_new.collect_params(), 'sgd',
                            {'learning_rate': lr, 'momentum': 0.9, 'wd': wd})
    num_batches, timer = len(train_iter), d2l.Timer()
    legend = ['train loss']
    if valid_iter is not None:
        legend.append('valid loss')
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                            legend=legend)
    for epoch in range(num_epochs):
        metric = d2l.Accumulator(2)
        if epoch > 0 and epoch % lr_period == 0:
            trainer.set_learning_rate(trainer.learning_rate * lr_decay)
        for i, (features, labels) in enumerate(train_iter):
            timer.start()
            X_shards, y_shards = d2l.split_batch(features, labels, devices)
            output_features = [net.features(X_shard) for X_shard in X_shards]
            with autograd.record():
                outputs = [net.output_new(feature)
                           for feature in output_features]
                ls = [loss(output, y_shard).sum() for output, y_shard
                      in zip(outputs, y_shards)]
            for l in ls:
                l.backward()
            trainer.step(batch_size)
            metric.add(sum([float(l.sum()) for l in ls]), labels.shape[0])
            timer.stop()
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (metric[0] / metric[1], None))
        if valid_iter is not None:
            valid_loss = evaluate_loss(valid_iter, net, devices)
            animator.add(epoch + 1, (None, valid_loss))
    measures = f'train loss {metric[0] / metric[1]:.3f}'
    if valid_iter is not None:
        measures += f', valid loss {valid_loss:.3f}'
    print(measures + f'\n{metric[1] * num_epochs / timer.sum():.1f}'
          f' examples/sec on {str(devices)}')
14.14.6. Training and Validating the Model
Now we can train and validate the model. The following hyperparameters are all tunable. For example, the number of epochs can be increased. Because lr_period and lr_decay are set to 2 and 0.9, respectively, the learning rate of the optimization algorithm will be multiplied by 0.9 after every 2 epochs.
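To make the schedule concrete, here is a minimal sketch of the decayed learning rate over 10 epochs (illustration only; the starting value matches the PyTorch run below):
# lr is multiplied by lr_decay once every lr_period epochs
lr, lr_period, lr_decay = 1e-4, 2, 0.9
for epoch in range(1, 11):
    print(f'epoch {epoch}: '
          f'lr = {lr * lr_decay ** ((epoch - 1) // lr_period):.3e}')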
devices, num_epochs, lr, wd = d2l.try_all_gpus(), 10, 1e-4, 1e-4
lr_period, lr_decay, net = 2, 0.9, get_net(devices)
train(net, train_iter, valid_iter, num_epochs, lr, wd, devices, lr_period,
      lr_decay)
train loss 1.240, valid loss 1.545
577.5 examples/sec on [device(type='cuda', index=0), device(type='cuda', index=1)]
devices, num_epochs, lr, wd = d2l.try_all_gpus(), 10, 5e-3, 1e-4
lr_period, lr_decay, net = 2, 0.9, get_net(devices)
net.hybridize()
train(net, train_iter, valid_iter, num_epochs, lr, wd, devices, lr_period,
      lr_decay)
train loss 0.956, valid loss 0.958
251.1 examples/sec on [gpu(0), gpu(1)]
14.14.7. Classifying the Testing Set and Submitting Results on Kaggle
Similar to the final step in Section 14.13, in the end all the labeled data (including the validation set) are used for training the model and classifying the testing set. We will use the trained custom output network for classification.
net = get_net(devices)
train(net, train_valid_iter, None, num_epochs, lr, wd, devices, lr_period,
      lr_decay)

preds = []
for data, label in test_iter:
    output = torch.nn.functional.softmax(net(data.to(devices[0])), dim=1)
    preds.extend(output.cpu().detach().numpy())
ids = sorted(os.listdir(
    os.path.join(data_dir, 'train_valid_test', 'test', 'unknown')))
with open('submission.csv', 'w') as f:
    f.write('id,' + ','.join(train_valid_ds.classes) + '\n')
    for i, output in zip(ids, preds):
        f.write(i.split('.')[0] + ',' + ','.join(
            [str(num) for num in output]) + '\n')
train loss 1.217
742.7 examples/sec on [device(type='cuda', index=0), device(type='cuda', index=1)]
net = get_net(devices)
net.hybridize()
train(net, train_valid_iter, None, num_epochs, lr, wd, devices, lr_period,
      lr_decay)

preds = []
for data, label in test_iter:
    output_features = net.features(data.as_in_ctx(devices[0]))
    output = npx.softmax(net.output_new(output_features))
    preds.extend(output.asnumpy())
ids = sorted(os.listdir(
    os.path.join(data_dir, 'train_valid_test', 'test', 'unknown')))
with open('submission.csv', 'w') as f:
    f.write('id,' + ','.join(train_valid_ds.synsets) + '\n')
    for i, output in zip(ids, preds):
        f.write(i.split('.')[0] + ',' + ','.join(
            [str(num) for num in output]) + '\n')
train loss 0.848
294.4 examples/sec on [gpu(0), gpu(1)]
The above code will generate a submission.csv file, which can be submitted to Kaggle in the same way as described in Section 5.7.
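Before uploading, a quick format check can save a failed submission. The sketch below (added here; not part of the original code) verifies that each row of submission.csv has an id followed by one probability per breed. With the full dataset both counts should be 121 ('id' plus 120 breed columns); the tiny demo sample may contain fewer breeds.
# Inspect the header and the first prediction row of the submission file
with open('submission.csv') as f:
    header = f.readline().strip().split(',')
    first_row = f.readline().strip().split(',')
print(len(header), len(first_row))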
14.14.8. Summary
Images in the ImageNet dataset are larger (with varying dimensions) than CIFAR-10 images. Image augmentation operations may need to be modified for tasks on a different dataset.
To classify a subset of the ImageNet dataset, we can leverage models pretrained on the full ImageNet dataset to extract features and only train a custom small-scale output network. This leads to less computational time and memory cost.