(PyTorch Deep Learning Series) Avoiding Overfitting in PyTorch: Implementing Dropout - Study Notes
Avoiding overfitting in PyTorch: implementing dropout
Consider a multilayer perceptron with a single hidden layer, 4 inputs, and 5 hidden units, where hidden unit $h_i$ ($i = 1, \ldots, 5$) is computed as:
$$h_i = \phi\left(x_1 w_{1i} + x_2 w_{2i} + x_3 w_{3i} + x_4 w_{4i} + b_i\right)$$
Here $\phi$ is the activation function, $x_1, \ldots, x_4$ are the inputs, the weight parameters of hidden unit $i$ are $w_{1i}, \ldots, w_{4i}$, and its bias parameter is $b_i$. When dropout is applied to this hidden layer, each of its hidden units is dropped with some probability. Let the dropout probability be $p$: with probability $p$, $h_i$ is set to zero, and with probability $1-p$, $h_i$ is divided by $1-p$ (i.e. stretched). The dropout probability is a hyperparameter of dropout. Specifically, let the random variable $\xi_i$ be 0 with probability $p$ and 1 with probability $1-p$. With dropout, we compute a new hidden unit $h_i'$ as
$$h_i' = \frac{\xi_i}{1-p} h_i$$
(that is, $h_i'$ equals $\frac{h_i}{1-p}$ with probability $1-p$, and 0 with probability $p$)
Since $E(\xi_i) = 1-p$, we have
$$E(h_i') = \frac{E(\xi_i)}{1-p} h_i = h_i$$
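As a concrete check, take $p = 0.5$: then $h_i'$ equals $2h_i$ with probability $0.5$ and $0$ with probability $0.5$, so $E(h_i') = 0.5 \cdot 2h_i + 0.5 \cdot 0 = h_i$.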
That is, dropout does not change the expected value of its input. Suppose we apply dropout to the hidden layer and, as one possible outcome, $h_2$ and $h_5$ are zeroed out. The output then no longer depends on $h_2$ and $h_5$, and during backpropagation the gradients of the weights associated with these two hidden units are all 0. Because the hidden units are dropped at random during training, any of $h_1, \ldots, h_5$ may be zeroed out, so the output layer cannot rely too heavily on any single one of them. This acts as regularization during training and helps combat overfitting. At test time, dropout is generally not used, so that the results are more deterministic.
Implementing dropout to avoid overfitting
Define the dropout function:
%matplotlib inline
import sys
import torch
import torch.nn as nn
import torchvision
import numpy as np

def dropout(X, drop_prob):
    X = X.float()
    assert 0 <= drop_prob <= 1
    keep_prob = 1 - drop_prob
    # In this case every element is dropped
    if keep_prob == 0:
        return torch.zeros_like(X)
    # Keep each element with probability keep_prob, then rescale the survivors by 1/keep_prob
    mask = (torch.rand(X.shape) < keep_prob).float()
    return mask * X / keep_prob
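As a quick sanity check (the input values here are just illustrative), we can apply the function to a small tensor and observe the zeroing and rescaling:

X = torch.arange(16).view(2, 8)
print(dropout(X, 0))    # nothing dropped, values unchanged
print(dropout(X, 0.5))  # roughly half the entries zeroed, survivors scaled by 2
print(dropout(X, 1))    # everything dropped, all zeros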
Define the model parameters:
num_inputs, num_outputs, num_hiddens1, num_hiddens2 = 784, 10, 256, 256
W1 = torch.tensor(np.random.normal(0, 0.01, size=(num_inputs, num_hiddens1)), dtype=torch.float, requires_grad=True)
b1 = torch.zeros(num_hiddens1, requires_grad=True)
W2 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens1, num_hiddens2)), dtype=torch.float, requires_grad=True)
b2 = torch.zeros(num_hiddens2, requires_grad=True)
W3 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens2, num_outputs)), dtype=torch.float, requires_grad=True)
b3 = torch.zeros(num_outputs, requires_grad=True)
params = [W1, b1, W2, b2, W3, b3]
Define the model: chain the fully connected layers and the ReLU activation functions, and apply dropout to the output of each activation. Each layer gets its own dropout probability; the usual advice is to use a smaller dropout probability for layers closer to the input. In this experiment we set the dropout probability of the first hidden layer to 0.2 and that of the second hidden layer to 0.5. The parameter is_training indicates whether the model is running in training or evaluation mode, and dropout is applied only during training.
drop_prob1, drop_prob2 = 0.2, 0.5

def net(X, is_training=True):
    X = X.view(-1, num_inputs)
    H1 = (torch.matmul(X, W1) + b1).relu()
    if is_training:  # only apply dropout when training the model
        H1 = dropout(H1, drop_prob1)  # dropout after the first fully connected layer
    H2 = (torch.matmul(H1, W2) + b2).relu()
    if is_training:
        H2 = dropout(H2, drop_prob2)  # dropout after the second fully connected layer
    return torch.matmul(H2, W3) + b3
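A minimal forward-pass check (with a made-up random batch, just to confirm the output shape and that evaluation mode skips dropout):

X = torch.rand(2, 1, 28, 28)
print(net(X).shape)                      # torch.Size([2, 10]), dropout active
print(net(X, is_training=False).shape)   # torch.Size([2, 10]), dropout skipped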
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        if isinstance(net, torch.nn.Module):
            net.eval()  # evaluation mode, which turns off dropout
            acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            net.train()  # switch back to training mode
        else:  # a custom model, such as the net function above
            if 'is_training' in net.__code__.co_varnames:  # if it takes an is_training argument
                # set is_training to False to disable dropout
                acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item()
            else:
                acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
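Before touching real data, evaluate_accuracy can be exercised on a small synthetic iterator (a hypothetical example; the actual Fashion-MNIST iterators are built below). An untrained net should score around random-guessing accuracy, roughly 0.1 for 10 classes:

fake_data = torch.utils.data.TensorDataset(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,)))
fake_iter = torch.utils.data.DataLoader(fake_data, batch_size=16)
print(evaluate_accuracy(fake_iter, net))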
Train and test the model:
num_epochs, lr, batch_size = 5, 100.0, 256
loss = torch.nn.CrossEntropyLoss()
def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    """Download the Fashion-MNIST dataset and then load it into memory."""
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())
    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)
    if sys.platform.startswith('win'):
        num_workers = 0  # 0 means no extra worker processes are used to speed up data loading
    else:
        num_workers = 4
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_iter, test_iter
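train_ch3 below calls an sgd helper that is not defined in this post. A minimal sketch consistent with how it is called here (mini-batch SGD that divides the gradient by the batch size, in the style of the d2l textbook code this follows, which is also why the learning rate above is set to 100.0):

def sgd(params, lr, batch_size):
    # in-place mini-batch stochastic gradient descent update
    for param in params:
        param.data -= lr * param.grad / batch_size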
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            # zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
            l.backward()
            if optimizer is None:
                sgd(params, lr, batch_size)
            else:
                optimizer.step()  # used in the "concise implementation of softmax regression" section
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))
train_iter, test_iter = load_data_fashion_mnist(batch_size)
train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)