DropBlock: A regularization method for convolutional networks
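I. Paper overview
DropBlock regularizes convolutional layers to prevent overfitting.
The main contribution is as follows (others may have proposed something similar earlier):
- A regularization module for convolutional layers, analogous to what dropout does for FC layers.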
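II. Module details
2.1 Overview of the idea
- Standard DropOut randomly deactivates units in FC layers. How can the same be done for convolutional layers?
- Following the DropOut idea directly, one can randomly zero individual activations of the convolutional feature map, as shown in panel (b) of the figure. This does not work well in experiments. The authors conjecture that convolutional features are locally correlated, so when isolated points are dropped, the information they carried is still preserved in neighbouring activations and the regularization effect is weak.
- DropBlock carries the DropOut idea over to convolutions by randomly dropping contiguous local blocks. In the experiments, block-wise dropping clearly outperforms point-wise dropping.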
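One rough way to see the intuition behind the last two bullets is to measure how many dropped positions still have a kept neighbour. The sketch below is only an illustration: the helper `leak_fraction`, the 3x3 neighbourhood, and the values of `size`, `drop_prob`, and `block` are assumptions made for this example and do not come from the paper.

```python
import torch
import torch.nn.functional as F


def leak_fraction(keep_mask: torch.Tensor) -> float:
    """Fraction of dropped pixels with at least one kept pixel in their 3x3 neighbourhood."""
    m = keep_mask[None, None].float()
    # 3x3 max pooling is 1 wherever any pixel inside the window is kept
    any_kept_nearby = F.max_pool2d(m, kernel_size=3, stride=1, padding=1)[0, 0]
    dropped = keep_mask == 0
    if dropped.sum() == 0:
        return 0.0
    return any_kept_nearby[dropped].mean().item()


torch.manual_seed(0)
size, drop_prob, block = 128, 0.1, 7  # illustrative values only

# Point-wise dropping: every pixel is dropped independently.
point_keep = (torch.rand(size, size) >= drop_prob).float()

# Block-wise dropping: sample sparse block centres, then grow each one into a block.
seeds = (torch.rand(size, size) < drop_prob / block ** 2).float()
block_drop = F.max_pool2d(seeds[None, None], kernel_size=block,
                          stride=1, padding=block // 2)[0, 0]
block_keep = 1 - block_drop

point_leak = leak_fraction(point_keep)
block_leak = leak_fraction(block_keep)
print(f"point-wise: {point_leak:.2f}, block-wise: {block_leak:.2f}")
```

With point-wise dropping almost every dropped pixel sits next to a kept one, while the interior of a dropped block has no kept neighbours at all.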
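2.2 Implementation
2.2.1 Implementation details
Given the analysis above, the procedure is easy to guess.
Parameters and steps:
- The size of the dropped block, written like a convolution kernel: \(Kernel = K \times K\)
- Create a \(Mask\) \(M\) sampled from a \(Bernoulli\) distribution
- Loop over \(Mask\): for every point with \(M_{ij}=0\), also set the \(K \times K\) points of the \(Kernel\) around it to 0
- Apply \(Mask\) to the \(feature\): \(feature = M * feature\)
- Normalize the feature map: \(feature = feature * count(M) / count\_ones(M)\)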
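The steps above can be written almost literally as a loop. The following is a minimal sketch under stated assumptions, not the paper's reference code: the function name `dropblock_naive` is made up for this example, it handles a single `(C, H, W)` feature map on the CPU, it shares one mask across channels, and it uses the simplified seed rate `drop_prob / K^2` that the listing in this post also uses.

```python
import torch


def dropblock_naive(feature, drop_prob, k):
    """Loop-based DropBlock on one (C, H, W) feature map. Returns (output, mask)."""
    _, h, w = feature.shape
    gamma = drop_prob / (k * k)  # simplified seed rate (assumption, see lead-in)
    # Step 2: Bernoulli mask, where M_ij = 0 marks the centre of a block to drop
    m = torch.bernoulli(torch.full((h, w), 1.0 - gamma))
    mask = torch.ones(h, w)
    half = k // 2
    # Step 3: zero the K x K neighbourhood around every M_ij = 0
    for i, j in (m == 0).nonzero().tolist():
        top, left = max(i - half, 0), max(j - half, 0)
        mask[top:i - half + k, left:j - half + k] = 0  # slicing clips at the far border
    # Step 4: apply the mask (broadcast over channels)
    out = feature * mask
    # Step 5: rescale by count(M) / count_ones(M)
    kept = mask.sum()
    if kept > 0:
        out = out * mask.numel() / kept
    return out, mask


torch.manual_seed(0)
x = torch.ones(3, 16, 16)
y, mask = dropblock_naive(x, drop_prob=0.2, k=3)
print(int((mask == 0).sum()), "of", mask.numel(), "positions dropped")
```

The Python loop is slow for large feature maps, which is one reason to prefer a vectorized formulation in practice.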
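The procedure above is simple: it is the block-wise dropping described in Section 2.1, and any implementation of it works. The code below mainly uses \(maxpooling\) to build the \(block\)s; everything else is the same.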
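Before the full listing, here is a small sketch of why max pooling can stand in for the loop over blocks: with stride 1 and padding `block_size // 2`, every output position whose window contains a seed becomes 1, so each seed grows into a `block_size x block_size` square. The helper name `expand_seeds` and the 7x7 toy input are assumptions for this illustration. Note that here 1 marks a position to drop, so the keep mask is `1 - dropped`.

```python
import torch
import torch.nn.functional as F


def expand_seeds(seed: torch.Tensor, block_size: int) -> torch.Tensor:
    """Grow every seed point (value 1) of an (H, W) map into a block of ones."""
    pooled = F.max_pool2d(seed[None, None], kernel_size=block_size,
                          stride=1, padding=block_size // 2)
    if block_size % 2 == 0:
        # an even kernel with this padding yields H+1 x W+1, so trim one row and column
        pooled = pooled[:, :, :-1, :-1]
    return pooled[0, 0]


seed = torch.zeros(7, 7)
seed[3, 3] = 1.0                  # a single block centre in the middle
dropped = expand_seeds(seed, 3)   # 1 = position will be dropped
keep_mask = 1 - dropped           # 1 = position is kept
print(keep_mask)
```

For even block sizes the block is not centred on the seed, because no true centre exists; the trimming only restores the original spatial size.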
import torch
import torch.nn.functional as F
from torch import nn


class DropBlock2D(nn.Module):
    r"""Randomly zeroes 2D spatial blocks of the input tensor.

    As described in the paper
    `DropBlock: A regularization method for convolutional networks`_ ,
    dropping whole blocks of feature map allows to remove semantic
    information as compared to regular dropout.

    Args:
        drop_prob (float): probability of an element to be dropped.
        block_size (int): size of the block to drop

    Shape:
        - Input: `(N, C, H, W)`
        - Output: `(N, C, H, W)`

    .. _DropBlock: A regularization method for convolutional networks:
       https://arxiv.org/abs/1810.12890
    """

    def __init__(self, drop_prob, block_size):
        super().__init__()
        self.drop_prob = drop_prob
        self.block_size = block_size

    def forward(self, x):
        # shape: (bsize, channels, height, width)
        assert x.dim() == 4, \
            "Expected input with 4 dimensions (bsize, channels, height, width)"

        if not self.training or self.drop_prob == 0.:
            return x
        else:
            # get gamma value
            gamma = self._compute_gamma(x)
            # sample the seed mask directly on the input device (1 = block centre)
            mask = (torch.rand(x.shape[0], *x.shape[2:], device=x.device) < gamma).float()
            # compute block mask (1 = keep, 0 = drop)
            block_mask = self._compute_block_mask(mask)
            # apply block mask, shared across channels
            out = x * block_mask[:, None, :, :]
            # scale output (normalization): total positions / kept positions
            out = out * block_mask.numel() / block_mask.sum()
            return out

    def _compute_block_mask(self, mask):
        # use max pooling to grow every sampled point into a block
        block_mask = F.max_pool2d(input=mask[:, None, :, :],
                                  kernel_size=(self.block_size, self.block_size),
                                  stride=(1, 1),
                                  padding=self.block_size // 2)  # because of the padding, the drop probability near the borders is not exact
        if self.block_size % 2 == 0:
            # an even kernel produces one extra row and column; trim them
            block_mask = block_mask[:, :, :-1, :-1]
        block_mask = 1 - block_mask.squeeze(1)
        return block_mask

    def _compute_gamma(self, x):
        return self.drop_prob / (self.block_size ** 2)


class DropBlock3D(DropBlock2D):
    r"""Randomly zeroes 3D spatial blocks of the input tensor.

    An extension to the concept described in the paper
    `DropBlock: A regularization method for convolutional networks`_ ,
    dropping whole blocks of feature map allows to remove semantic
    information as compared to regular dropout.

    Args:
        drop_prob (float): probability of an element to be dropped.
        block_size (int): size of the block to drop

    Shape:
        - Input: `(N, C, D, H, W)`
        - Output: `(N, C, D, H, W)`

    .. _DropBlock: A regularization method for convolutional networks:
       https://arxiv.org/abs/1810.12890
    """

    def __init__(self, drop_prob, block_size):
        super().__init__(drop_prob, block_size)

    def forward(self, x):
        # shape: (bsize, channels, depth, height, width)
        assert x.dim() == 5, \
            "Expected input with 5 dimensions (bsize, channels, depth, height, width)"

        if not self.training or self.drop_prob == 0.:
            return x
        else:
            # get gamma value
            gamma = self._compute_gamma(x)
            # sample the seed mask directly on the input device (1 = block centre)
            mask = (torch.rand(x.shape[0], *x.shape[2:], device=x.device) < gamma).float()
            # compute block mask (1 = keep, 0 = drop)
            block_mask = self._compute_block_mask(mask)
            # apply block mask, shared across channels
            out = x * block_mask[:, None, :, :, :]
            # scale output (normalization): total positions / kept positions
            out = out * block_mask.numel() / block_mask.sum()
            return out

    def _compute_block_mask(self, mask):
        # use 3D max pooling to grow every sampled point into a cube
        block_mask = F.max_pool3d(input=mask[:, None, :, :, :],
                                  kernel_size=(self.block_size, self.block_size, self.block_size),
                                  stride=(1, 1, 1),
                                  padding=self.block_size // 2)
        if self.block_size % 2 == 0:
            # an even kernel produces one extra slice per spatial axis; trim them
            block_mask = block_mask[:, :, :-1, :-1, :-1]
        block_mask = 1 - block_mask.squeeze(1)
        return block_mask

    def _compute_gamma(self, x):
        return self.drop_prob / (self.block_size ** 3)


if __name__ == "__main__":
    x = torch.ones(size=(10, 256, 64, 64), dtype=torch.float32)
    layer = DropBlock2D(0.1, 5)
    y = layer(x)
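The `_compute_gamma` methods above use the simplified rate `drop_prob / block_size ** 2` (or `** 3` in 3D), and the comment in `_compute_block_mask` already notes that the border probability is inexact. For comparison, the formula in the DropBlock paper adds a correction factor for the region in which a full block fits. The sketch below places the two side by side; the function names are hypothetical, `feat_size` follows the paper's notation for the side length of a square feature map, and `drop_prob` stands for the paper's `1 - keep_prob`.

```python
def gamma_simplified(drop_prob: float, block_size: int) -> float:
    """Seed rate as used by the listing in this post."""
    return drop_prob / block_size ** 2


def gamma_paper(drop_prob: float, block_size: int, feat_size: int) -> float:
    """Seed rate with the paper's correction for the valid seed region."""
    valid = feat_size - block_size + 1  # side length of the region where a full block fits
    return drop_prob / block_size ** 2 * feat_size ** 2 / valid ** 2


# Same setting as the __main__ example: drop_prob=0.1, block_size=5, 64x64 feature map
simple = gamma_simplified(0.1, 5)
corrected = gamma_paper(0.1, 5, 64)
print(simple, corrected)
```

The two values get closer as the feature map grows relative to the block. Whether the correction matters for the padded max-pooling approach used here is a judgment call, since this listing samples seeds over the whole map instead of only the valid region.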
III. References
- Original paper
- Code