How initializing unused modules affects a PyTorch model (nn.Module)
In PyTorch, a model is defined by subclassing nn.Module: the submodules are usually created in __init__ and then called in forward. Sometimes a submodule is initialized in __init__ but never actually used, either because it was never deleted or because forward never calls it. Such dead modules can slow down the network's convergence. An example:
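The post does not spell out the mechanism, so here is one plausible reading (my assumption, not the author's claim): constructing extra submodules both registers dead parameters on the model and consumes draws from the global random number generator, so everything sampled after model construction (remaining initializations, dropout masks, shuffling) differs between the two variants. A minimal self-contained sketch with made-up module names:

import torch
import torch.nn as nn

class Lean(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(8, 8)

    def forward(self, x):
        return self.used(x)

class WithUnused(Lean):
    def __init__(self):
        super().__init__()
        self.unused = nn.Linear(8, 8)  # registered, never called in forward

torch.manual_seed(0)
print(sum(p.numel() for p in Lean().parameters()))        # 72
torch.manual_seed(0)
print(sum(p.numel() for p in WithUnused().parameters()))  # 144: dead weights

# Initializing the extra layer also advanced the global RNG, so any later
# random draw diverges between the two runs:
torch.manual_seed(0)
Lean()
a = torch.rand(1)
torch.manual_seed(0)
WithUnused()
b = torch.rand(1)
print(torch.equal(a, b))  # False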
Case 1: self.attention and self.decoder are left in __init__ even though forward never uses them. Convergence is slower:
import torch
import torch.nn as nn

# Transformer, Parallel_Attention and Two_Stage_Decoder are defined
# elsewhere in the author's project.
class Bert_Ocr(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.transformer = Transformer(cfg, cfg.attention_layers)
        self.attention = Parallel_Attention(cfg)  # registered but never used
        self.decoder = Two_Stage_Decoder(cfg)     # registered but never used

    def forward(self, x, mask):
        x1 = self.transformer(x, mask)
        # x_atten = self.attention(x1)
        # glimpses = torch.bmm(x_atten.permute(0, 2, 1), x)
        # res1 = self.decoder(glimpses)
        return x1
>>>
>[500/300000] valid loss: 0.88969 accuracy: 0.000, norm_ED: 29.00
>[1000/300000] valid loss: 0.30434 accuracy: 39.773, norm_ED: 9.86
>[1500/300000] valid loss: 0.14993 accuracy: 70.455, norm_ED: 4.29
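Before deleting anything, it is easy to confirm which submodules are actually dead: after one forward/backward pass, parameters that autograd never touched still have .grad set to None. A hedged diagnostic sketch (the toy Net below is made up for illustration; it is not the author's Bert_Ocr):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(8, 8)
        self.unused = nn.Linear(8, 8)  # defined but never called

    def forward(self, x):
        return self.used(x)

net = Net()
net(torch.randn(2, 8)).sum().backward()

for name, p in net.named_parameters():
    if p.grad is None:  # never reached by backprop: a dead module
        print("dead parameter:", name)
# dead parameter: unused.weight
# dead parameter: unused.bias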
Case 2: the unused modules are removed (here, commented out) in __init__:
class Bert_Ocr(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.transformer = Transformer(cfg, cfg.attention_layers)
        # self.attention = Parallel_Attention(cfg)
        # self.decoder = Two_Stage_Decoder(cfg)

    def forward(self, x, mask):
        x1 = self.transformer(x, mask)
        # x_atten = self.attention(x1)
        # glimpses = torch.bmm(x_atten.permute(0, 2, 1), x)
        # res1 = self.decoder(glimpses)
        return x1
>>>
>[500/300000] valid loss: 0.46041 accuracy: 27.273, norm_ED: 14.14
>[1000/300000] valid loss: 0.03799 accuracy: 93.182, norm_ED: 0.86
Summary: promptly delete unused modules from __init__.
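If editing __init__ is inconvenient, the attribute can also be deleted right after construction: nn.Module unregisters a submodule when its attribute is deleted, so it disappears from parameters() and state_dict(). A small sketch on a made-up toy module (again not the author's Bert_Ocr):

import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.keep = nn.Linear(4, 4)
        self.extra = nn.Linear(4, 4)  # to be pruned

    def forward(self, x):
        return self.keep(x)

net = Net()
del net.extra  # nn.Module.__delattr__ drops it from the module registry
print([name for name, _ in net.named_children()])  # ['keep']
print(sum(p.numel() for p in net.parameters()))    # 20 instead of 40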