欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

PyTorch实现L1,L2正则化以及Dropout

程序员文章站 2022-03-06 21:23:58
...

了解知道Dropout原理

Dropout可以看做是一种模型平均,所谓模型平均,顾名思义,就是把来自不同模型的估计或者预测通过一定的权重平均起来,在一些文献中也称为模型组合,它一般包括组合估计和组合预测。

Dropout中哪里体现了“不同模型”;这个奥秘就是我们随机选择忽略隐层节点,在每个批次的训练过程中,由于每次随机忽略的隐层节点都不同,这样就使每次训练的网络都是不一样的,每次训练都可以单做一个“新”的模型;此外,隐含节点都是以一定概率随机出现,因此不能保证每2个隐含节点每次都同时出现,这样权值的更新不再依赖于有固定关系隐含节点的共同作用,阻止了某些特征仅仅在其它特定特征下才有效果的情况。

这样dropout过程就是一个非常有效的神经网络模型平均方法,通过训练大量的不同的网络,来平均预测概率。不同的模型在不同的训练集上训练(每个批次的训练数据都是随机选择),最后在每个模型用相同的权重来“融合”,介个有点类似boosting算法。

用代码实现正则化(L1、L2、Dropout)

L1

regularization_loss = 0
for param in model.parameters():
    regularization_loss += torch.sum(abs(param))

calssify_loss = criterion(pred,target)
loss = classify_loss + lamda * regularization_loss

optimizer.zero_grad()
loss.backward()
optimizer.step()

L2

optimizer = torch.optim.SGD(model.parameters(),lr=0.01,weight_decay=0.001)

Dropout

torch.manual_seed(1)    # Sets the seed for generating random numbers.reproducible

N_SAMPLES = 20
N_HIDDEN = 300

# training data
x = torch.unsqueeze(torch.linspace(-1, 1, N_SAMPLES), 1)
print('x.size()',x.size())

# torch.normal(mean, std, out=None) → Tensor
y = x + 0.3*torch.normal(torch.zeros(N_SAMPLES, 1), torch.ones(N_SAMPLES, 1))

# test data
test_x = torch.unsqueeze(torch.linspace(-1, 1, N_SAMPLES), 1)
test_y = test_x + 0.3*torch.normal(torch.zeros(N_SAMPLES, 1), torch.ones(N_SAMPLES, 1))

# show data
plt.scatter(x.data.numpy(), y.data.numpy(), c='magenta', s=50, alpha=0.5, label='train')
plt.scatter(test_x.data.numpy(), test_y.data.numpy(), c='cyan', s=50, alpha=0.5, label='test')
plt.legend(loc='upper left')
plt.ylim((-2.5, 2.5))
plt.show()
x.size() torch.Size([20, 1])


net_overfitting = torch.nn.Sequential(
    torch.nn.Linear(1,N_HIDDEN),
    torch.nn.ReLU(),
    torch.nn.Linear(N_HIDDEN,N_HIDDEN),
    torch.nn.ReLU(),
    torch.nn.Linear(N_HIDDEN,1),
)

net_dropped = torch.nn.Sequential(
    torch.nn.Linear(1,N_HIDDEN),
    torch.nn.Dropout(0.5), # 0.5的概率失活
    torch.nn.ReLU(),
    torch.nn.Linear(N_HIDDEN,N_HIDDEN),
    torch.nn.Dropout(0.5),
    torch.nn.ReLU(),
    torch.nn.Linear(N_HIDDEN,1),
)



optimizer_ofit = torch.optim.Adam(net_overfitting.parameters(),lr=0.001)
optimizer_drop = torch.optim.Adam(net_dropped.parameters(),lr=0.01)
loss = torch.nn.MSELoss()

for epoch in range(500):
    pred_ofit= net_overfitting(x)
    pred_drop= net_dropped(x)

    loss_ofit = loss(pred_ofit,y)
    loss_drop = loss(pred_drop,y)

    optimizer_ofit.zero_grad()
    optimizer_drop.zero_grad()

    loss_ofit.backward()
    loss_drop.backward()

    optimizer_ofit.step()
    optimizer_drop.step()

    if epoch%50 ==0 :
        net_overfitting.eval() # 将神经网络转换成测试形式,此时不会对神经网络dropout
        net_dropped.eval() # 此时不会对神经网络dropout

        test_pred_ofit = net_overfitting(test_x)
        test_pred_drop = net_dropped(test_x)

        # show data
        plt.scatter(x.data.numpy(), y.data.numpy(), c='magenta', s=50, alpha=0.5, label='train')
        plt.scatter(test_x.data.numpy(), test_y.data.numpy(), c='cyan', s=50, alpha=0.5, label='test')
        plt.plot(test_x.data.numpy(), test_pred_ofit.data.numpy(), 'r-', lw=3, label='overfitting')
        plt.plot(test_x.data.numpy(), test_pred_drop.data.numpy(), 'b--', lw=3, label='dropout(50%)')
        plt.text(0, -1.2, 'overfitting loss=%.4f' % loss(test_pred_ofit, test_y).data.numpy(), fontdict={'size': 20, 'color':  'red'})
        plt.text(0, -1.5, 'dropout loss=%.4f' % loss(test_pred_drop, test_y).data.numpy(), fontdict={'size': 20, 'color': 'blue'})
        plt.legend(loc='upper left')
        plt.ylim((-2.5, 2.5))
        plt.pause(0.1)


        net_overfitting.train()
        net_dropped.train()

plt.ioff()
plt.show()

Dropout的numpy实现

p = 0.5 # **神经元的概率. p值更高 = 随机失活更弱

def train_step(X):
    # 3层neural network的前向传播
    H1 = np.maximum(0, np.dot(W1, X) + b1)
    U1 = (np.random.rand(*H1.shape) < p) / p # 第一个dropout mask. 注意/p!
    H1 *= U1 # drop!
    H2 = np.maximum(0, np.dot(W2, H1) + b2)
    U2 = (np.random.rand(*H2.shape) < p) / p # 第二个dropout mask. 注意/p!
    H2 *= U2 # drop!
    out = np.dot(W3, H2) + b3
    # 反向传播:计算梯度... (略)
    # 进行参数更新... (略)

def predict(X):
# 前向传播时模型集成
H1 = np.maximum(0, np.dot(W1, X) + b1) # 不用数值范围调整了
H2 = np.maximum(0, np.dot(W2, H1) + b2)
out = np.dot(W3, H2) + b3

PyTorch中实现dropout

class Model(nn.Module):
    def __init__(self):
        super(Model,self).__init__()
        # 定义多层神经网络
        self.fc1 = torch.nn.Linear(8,6)
        self.fc2 = torch.nn.Linear(6,4)
        self.fc3 = torch.nn.Linear(4,1)
        
    def forward(self,x):
        x = F.relu(self.fc1(x))            # 8->6
        x = F.dropout(x,p=0.5)             #dropout 1 此处为dropout
        x = F.relu(self.fc2(x))            #-6->4
        x = F.dropout(x,p=0.5)             # dropout 2  #此处为drouout
        y_pred = torch.sigmoid(self.fc3(x))         # 4->1 ->sigmoid 
        # warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead."
        return y_pred

参考资料

PyTorch 中文文档
机器学习——Dropout原理介绍 「深度学习思考者」 CSDN
https://blog.csdn.net/u010402786/article/details/46812677
PyTorch1.0实现L1,L2正则化及Dropout (附dropout原理的python实现) 知乎 勿用
https://zhuanlan.zhihu.com/p/62393636

相关标签: pytorch Dropout