9.6. Concise Implementation of Recurrent Neural Networks
Like most of our from-scratch implementations, Section 9.5 was designed to give you insight into how each component works. But when you use RNNs every day or write production code, you will want to rely more on libraries that cut down on both implementation time (by supplying library code for common models and functions) and computation time (by heavily optimizing these library implementations). This section will show you how to implement the same language model more efficiently using the high-level APIs provided by deep learning frameworks. We begin, as before, by loading *The Time Machine* dataset.
# PyTorch
import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l

# MXNet
from mxnet import np, npx
from mxnet.gluon import nn, rnn
from d2l import mxnet as d2l
npx.set_np()

# JAX / Flax
from flax import linen as nn
from jax import numpy as jnp
from d2l import jax as d2l
# TensorFlow
import tensorflow as tf
from d2l import tensorflow as d2l
9.6.1. Defining the Model
We define the following class to implement the RNN model using high-level APIs.
class RNN(d2l.Module):  #@save
    """The RNN model implemented with high-level APIs."""
    def __init__(self, num_inputs, num_hiddens):
        super().__init__()
        self.save_hyperparameters()
        self.rnn = nn.RNN(num_inputs, num_hiddens)

    def forward(self, inputs, H=None):
        return self.rnn(inputs, H)
Specifically, to initialize the hidden state we invoke the member method begin_state. It returns a list that contains an initial hidden state for each example in the minibatch, whose shape is (number of hidden layers, batch size, number of hidden units). For some models to be introduced later (e.g., long short-term memory), this list will also contain other information.
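The same shape convention holds for PyTorch's nn.RNN, which by default expects time-major inputs. Below is a minimal shape check (the dimensions are chosen arbitrarily for illustration):

```python
import torch
from torch import nn

# With PyTorch's default time-major layout, inputs have shape
# (num_steps, batch_size, num_inputs) and the returned state has shape
# (num_layers, batch_size, num_hiddens); dimensions here are arbitrary.
rnn = nn.RNN(input_size=8, hidden_size=16)
X = torch.randn(5, 4, 8)      # (num_steps, batch_size, num_inputs)
outputs, H = rnn(X)
print(tuple(outputs.shape))   # (5, 4, 16)
print(tuple(H.shape))         # (1, 4, 16)
```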
class RNN(d2l.Module):  #@save
    """The RNN model implemented with high-level APIs."""
    def __init__(self, num_hiddens):
        super().__init__()
        self.save_hyperparameters()
        self.rnn = rnn.RNN(num_hiddens)

    def forward(self, inputs, H=None):
        if H is None:
            H, = self.rnn.begin_state(inputs.shape[1], ctx=inputs.ctx)
        outputs, (H, ) = self.rnn(inputs, (H, ))
        return outputs, H
As of now, Flax does not provide an RNNCell for a concise implementation of vanilla RNNs. More advanced RNN variants such as LSTMs and GRUs are available in Flax's linen API.
class RNN(nn.Module):  #@save
    """The RNN model implemented with high-level APIs."""
    num_hiddens: int

    @nn.compact
    def __call__(self, inputs, H=None):
        raise NotImplementedError
class RNN(d2l.Module):  #@save
    """The RNN model implemented with high-level APIs."""
    def __init__(self, num_hiddens):
        super().__init__()
        self.save_hyperparameters()
        self.rnn = tf.keras.layers.SimpleRNN(
            num_hiddens, return_sequences=True, return_state=True,
            time_major=True)

    def forward(self, inputs, H=None):
        outputs, H = self.rnn(inputs, H)
        return outputs, H
The RNNLM class below, which inherits from the RNNLMScratch class of Section 9.5, defines a complete RNN-based language model. Note that we need to create a separate fully connected output layer.
class RNNLM(d2l.RNNLMScratch):  #@save
    """The RNN-based language model implemented with high-level APIs."""
    def init_params(self):
        self.linear = nn.LazyLinear(self.vocab_size)

    def output_layer(self, hiddens):
        return self.linear(hiddens).swapaxes(0, 1)
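The swapaxes(0, 1) call converts the time-major RNN outputs into batch-major logits. A quick sketch of the axis swap (shapes chosen arbitrarily for illustration):

```python
import torch

# swapaxes(0, 1) turns time-major (num_steps, batch_size, vocab_size)
# hidden-layer outputs into batch-major (batch_size, num_steps, vocab_size)
# logits; the concrete shapes below are arbitrary.
hiddens = torch.zeros(5, 4, 28)   # (num_steps, batch_size, vocab_size)
logits = hiddens.swapaxes(0, 1)
print(tuple(logits.shape))        # (4, 5, 28)
```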
class RNNLM(d2l.RNNLMScratch):  #@save
    """The RNN-based language model implemented with high-level APIs."""
    def init_params(self):
        self.linear = nn.Dense(self.vocab_size, flatten=False)
        self.initialize()

    def output_layer(self, hiddens):
        return self.linear(hiddens).swapaxes(0, 1)
class RNNLM(d2l.RNNLMScratch):  #@save
    """The RNN-based language model implemented with high-level APIs."""
    training: bool = True

    def setup(self):
        self.linear = nn.Dense(self.vocab_size)

    def output_layer(self, hiddens):
        return self.linear(hiddens).swapaxes(0, 1)

    def forward(self, X, state=None):
        embs = self.one_hot(X)
        rnn_outputs, _ = self.rnn(embs, state, self.training)
        return self.output_layer(rnn_outputs)
class RNNLM(d2l.RNNLMScratch):  #@save
    """The RNN-based language model implemented with high-level APIs."""
    def init_params(self):
        self.linear = tf.keras.layers.Dense(self.vocab_size)

    def output_layer(self, hiddens):
        return tf.transpose(self.linear(hiddens), (1, 0, 2))
9.6.2. Training and Predicting
Before training the model, let's make a prediction with a model initialized with random weights. Given that we have not trained the network, it will generate nonsensical predictions.
data = d2l.TimeMachine(batch_size=1024, num_steps=32)
rnn = RNN(num_inputs=len(data.vocab), num_hiddens=32)
model = RNNLM(rnn, vocab_size=len(data.vocab), lr=1)
model.predict('it has', 20, data.vocab)
'it hasoadd dd dd dd dd dd '
data = d2l.TimeMachine(batch_size=1024, num_steps=32)
rnn = RNN(num_hiddens=32)
model = RNNLM(rnn, vocab_size=len(data.vocab), lr=1)
model.predict('it has', 20, data.vocab)
'it hasxlxlxlxlxlxlxlxlxlxl'
data = d2l.TimeMachine(batch_size=1024, num_steps=32)
rnn = RNN(num_hiddens=32)
model = RNNLM(rnn, vocab_size=len(data.vocab), lr=1)
model.predict('it has', 20, data.vocab)
'it hasretsnrnrxnrnrgczntgq'
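Under the hood, predict performs greedy character-level decoding: it warms up the hidden state on the prefix and then repeatedly appends the most likely next character. The following self-contained sketch mimics that loop with an untrained PyTorch RNN; the function name, vocabulary, and details here are illustrative assumptions, not the book's exact code.

```python
import torch
from torch import nn
from torch.nn import functional as F

def greedy_predict(prefix, num_preds, rnn, linear, vocab):
    """Greedy decoding sketch: warm up on the prefix, then argmax-decode."""
    state, outputs = None, [vocab.index(prefix[0])]
    for i in range(len(prefix) + num_preds - 1):
        # One-hot encode the latest character: shape (1, 1, vocab_size).
        X = F.one_hot(torch.tensor([[outputs[-1]]]), len(vocab)).float()
        hiddens, state = rnn(X, state)
        if i < len(prefix) - 1:   # warm-up: force the next prefix character
            outputs.append(vocab.index(prefix[i + 1]))
        else:                     # prediction: take the most likely character
            outputs.append(int(linear(hiddens).argmax(axis=2).item()))
    return ''.join(vocab[i] for i in outputs)

vocab = list(' abcdefghijklmnopqrstuvwxyz')
rnn = nn.RNN(len(vocab), 32)          # untrained, so output is nonsense
linear = nn.Linear(32, len(vocab))
result = greedy_predict('it has', 5, rnn, linear, vocab)
print(len(result))                    # 11 (6 prefix chars + 5 predictions)
```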
Next, we leverage the high-level API to train our model.
trainer = d2l.Trainer(max_epochs=100, gradient_clip_val=1, num_gpus=1)
trainer.fit(model, data)
trainer = d2l.Trainer(max_epochs=100, gradient_clip_val=1, num_gpus=1)
trainer.fit(model, data)
with d2l.try_gpu():
    trainer = d2l.Trainer(max_epochs=100, gradient_clip_val=1)
    trainer.fit(model, data)
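The gradient_clip_val argument rescales gradients whenever their global L2 norm exceeds the given threshold, which stabilizes RNN training. A minimal sketch of that rescaling (the function name is illustrative, and d2l.Trainer's internals may differ in detail):

```python
import torch

def clip_gradients(params, theta):
    """Rescale gradients so their global L2 norm does not exceed theta."""
    norm = torch.sqrt(sum(torch.sum(p.grad ** 2) for p in params))
    if norm > theta:
        for p in params:
            p.grad[:] *= theta / norm

w = torch.ones(4, requires_grad=True)
(w * w).sum().backward()          # grad is [2, 2, 2, 2], global norm 4
clip_gradients([w], theta=1.0)
print(float(w.grad.norm()))       # 1.0
```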
Compared with Section 9.5, this model achieves comparable perplexity but runs faster, owing to the optimized implementations. As before, we can generate predicted tokens following a specified prefix string.
model.predict('it has', 20, data.vocab, d2l.try_gpu())
'it has and the trave the t'
model.predict('it has', 20, data.vocab, d2l.try_gpu())
'it has and the time the ti'
model.predict('it has', 20, data.vocab)
'it has and the pas an and '
9.6.3. Summary
High-level APIs in deep learning frameworks provide implementations of standard RNNs. These libraries help you avoid wasting time reimplementing standard models. Moreover, framework implementations are usually highly optimized, leading to significant (computational) performance gains compared with implementations from scratch.