ShuffleNet

程序员文章站 2022-03-17 14:13:45


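1. Overview

ShuffleNetv1

ShuffleNet is a CNN model designed specifically for mobile devices. It has two main features:

1. pointwise ($1\times1$) group convolution

2. channel shuffle

Together they reduce the amount of computation while maintaining accuracy.

Other common ways to shrink or speed up a network include pruning, compressing, and low-bit representation.

ShuffleNet uses pointwise group convolution to lower the computational complexity of $1\times1$ convolutions. To overcome the side effect of group convolution (group conv), it proposes an efficient operation, channel shuffle, which improves the flow of information across feature channels.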
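
To make the saving concrete, here is a minimal sketch that counts multiply-adds for a $1\times1$ convolution with and without grouping. The function name `pointwise_madds` and the layer sizes are illustrative assumptions, not values from the original post.

```python
def pointwise_madds(h, w, c_in, c_out, groups=1):
    """Multiply-adds of a 1x1 convolution on an h x w feature map.

    Each group maps c_in/groups input channels to c_out/groups output
    channels, so the total cost is divided by `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    return h * w * (c_in // groups) * (c_out // groups) * groups


# Hypothetical layer: 52x52 feature map, 240 -> 60 channels.
dense = pointwise_madds(52, 52, 240, 60)
grouped = pointwise_madds(52, 52, 240, 60, groups=3)
print(dense, grouped, dense // grouped)  # the grouped version is 3x cheaper
```

With `groups=3` the cost drops by a factor of 3, which is the motivation for grouping the pointwise convolutions.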

fig.1. (a) group convolution (GConv); (b), (c) channel shuffle

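1.1 Channel shuffle

Channel shuffle can be implemented directly by swapping axes: reshape the channel dimension into (groups, channels per group), transpose the two axes, and flatten back.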
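
As a rough illustration of the axis swap, the sketch below computes which source channel ends up at each position after a shuffle, using plain Python lists instead of tensors. The helper name `shuffled_channel_order` is made up for this example.

```python
def shuffled_channel_order(num_channels, groups):
    """Return the source channel index for every position after a shuffle.

    Equivalent to reshaping the channel axis to (groups, channels_per_group),
    swapping the two axes, and flattening back.
    """
    if num_channels % groups:
        raise ValueError("num_channels must be divisible by groups")
    per_group = num_channels // groups
    # Reshape: row g holds the channels of group g.
    grid = [[g * per_group + i for i in range(per_group)] for g in range(groups)]
    # Swap axes (transpose), then flatten row by row.
    transposed = zip(*grid)
    return [c for row in transposed for c in row]


print(shuffled_channel_order(6, 3))  # [0, 2, 4, 1, 3, 5]
```

Channels 0 and 1 started in the same group; after the shuffle they sit in different groups, next to channels from the other groups.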

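Why channel shuffle is needed

In ResNeXt only the $3\times3$ convolutions are grouped, so the pointwise convolutions, i.e. the $1\times1$ convolutions, end up accounting for $93.4\%$ of the multiply-add operations. In small networks, these expensive pointwise convolutions limit how complex the network can be and ultimately reduce accuracy. An obvious fix is to apply group convolution, or even depthwise convolution, to the $1\times1$ convolutions as well. This has a side effect, however: when several group convolutions are stacked, a given channel in a later layer receives input from only a small fraction of the channels produced by earlier layers. This weakens the flow of information between channels and reduces the model's representational power.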
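
The side effect can be seen in a small dependency-tracking sketch. It is a toy model, not code from the post: each channel carries the set of original input channels it depends on, and the helper names and the 9-channel, 3-group sizes are assumptions chosen for illustration.

```python
def group_conv_sources(sources, groups):
    """Propagate dependency sets through one group convolution.

    `sources[c]` is the set of original input channels that channel c
    depends on. Every output channel of a group depends on all input
    channels of that same group; the channel count is kept unchanged here.
    """
    per_group = len(sources) // groups
    out = []
    for g in range(groups):
        merged = set().union(*sources[g * per_group:(g + 1) * per_group])
        out.extend(set(merged) for _ in range(per_group))
    return out


def shuffle(sources, groups):
    """Channel shuffle applied to the list of dependency sets."""
    per_group = len(sources) // groups
    return [sources[g * per_group + i] for i in range(per_group) for g in range(groups)]


channels, groups = 9, 3
start = [{c} for c in range(channels)]

# Two stacked group convolutions, without and with a shuffle in between.
no_shuffle = group_conv_sources(group_conv_sources(start, groups), groups)
with_shuffle = group_conv_sources(shuffle(group_conv_sources(start, groups), groups), groups)

print(sorted(no_shuffle[0]))    # [0, 1, 2]
print(sorted(with_shuffle[0]))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Without the shuffle, an output channel only ever sees its own group; with it, the second group convolution mixes information from every group.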

1.2 ShuffleNet Unit

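fig.2 (a) bottleneck unit with depthwise convolution (DWConv); (b) ShuffleNet unit with pointwise group convolution and channel shuffle; (c) ShuffleNet unit with stride 2.

(a) is a residual block whose $3\times3$ convolution is depthwise (ShuffleNet does not use this unit as-is); (b) is a residual block in which the $1\times1$ convolutions are replaced by group convolutions, followed by a channel shuffle; (c) is the stride-2 unit, which adds an AVG Pool on the shortcut and concatenates (cat) the two outputs at the end (similar to a dense block?), halving the feature map's H and W. In the bottleneck, the channel count is set to 1/4 of the output channels of each ShuffleNet unit.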

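From the above we can extract the basic design rules of ShuffleNet:

1️⃣ Among the convolutions, the $1\times1$ ones are group convolutions and the $3\times3$ ones are depthwise convolutions, i.e. groups=in_channels=out_channels;

2️⃣ With 'add', the unit is a residual structure, so the feature map size is the same at input and output, i.e. stride=1, padding=1. With 'cat', the feature map is halved, so stride=2, padding=1. The residual ('add') structure therefore extracts features, while the 'cat' structure downsamples.

3️⃣ The $1\times1$ group convolution before the channel shuffle is followed by BN+ReLU; all other convolutions are followed by BN only, and the remaining ReLU is applied after the 'add' or 'cat'.

4️⃣ The input and output channels of the $3\times3$ convolution are 1/4 of the stage's output channels;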

Feature map size formula: $out=\left\lfloor\frac{in-ksize+2p}{s}\right\rfloor+1$, where the division is rounded down.
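
A quick way to check the formula is to evaluate it for the kernel, padding, and stride values used in this post. The function name `conv_out_size` is an assumption made for this sketch.

```python
def conv_out_size(in_size, ksize, padding, stride):
    """out = floor((in - ksize + 2p) / s) + 1"""
    return (in_size - ksize + 2 * padding) // stride + 1


print(conv_out_size(416, ksize=3, padding=1, stride=2))  # 208
print(conv_out_size(208, ksize=3, padding=1, stride=1))  # 208
print(conv_out_size(13, ksize=3, padding=1, stride=2))   # 7
```

With a $3\times3$ kernel and padding 1, stride 1 preserves the size and stride 2 halves it (rounding up for odd inputs).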

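2. Building ShuffleNet

2.1 Building the basic units of ShuffleNet

1️⃣ Building the $1\times1$ convolution

The $1\times1$ convolution comes in two variants, with and without ReLU activation, but both have a BN layer. The variant with ReLU activation is followed by a channel shuffle.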

import torch
import torch.nn as nn


class conv1x1(nn.Module):
    '''1x1 group conv with BN, or with BN + ReLU followed by a channel shuffle'''

    def __init__(self, inchannel, outchannel, group, relu=True, bias=False):
        super(conv1x1, self).__init__()
        self.relu=relu
        self.group=group
        if self.relu:
            self.conv1x1 = nn.Sequential(
                nn.Conv2d(in_channels=inchannel, out_channels=outchannel,
                          kernel_size=1, stride=1, bias=bias, groups=self.group),
                nn.BatchNorm2d(outchannel),
                nn.ReLU()
            )
        else:
            self.conv1x1 = nn.Sequential(
                nn.Conv2d(in_channels=inchannel, out_channels=outchannel,
                          kernel_size=1, stride=1, bias=bias, groups=self.group),
                nn.BatchNorm2d(outchannel),
            )

    def forward(self, x):
        if self.relu:
            out=self.conv1x1(x)
            return channel_shuffle(out,self.group)
        return self.conv1x1(x)

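2️⃣ channel shuffle

This uses the implementation that ships with PyTorch (the same function appears in torchvision's ShuffleNetV2 model code).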

def channel_shuffle(x, groups):
    # type: (torch.Tensor, int) -> torch.Tensor
    batchsize, num_channels, height, width = x.size()
    channels_per_group = num_channels // groups

    # reshape
    x = x.view(batchsize, groups,
               channels_per_group, height, width)

    x = torch.transpose(x, 1, 2).contiguous()

    # flatten
    x = x.view(batchsize, -1, height, width)

    return x

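3️⃣ Building the $3\times3$ convolution

None of them use ReLU activation;
the stride takes one of two values (1 or 2);
padding=1;
all are depthwise convolutions, i.e. groups=channel;
the input and output channel counts are equal, but vary from unit to unit.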

class conv3x3(nn.Module):
    '''3x3 depthwise conv followed by BN (no ReLU)'''
    def __init__(self,channel,stride,bias=False):
        super(conv3x3, self).__init__()
        self.conv3x3=nn.Sequential(
            nn.Conv2d(in_channels=channel,out_channels=channel,kernel_size=3
                      ,stride=stride,padding=1,groups=channel,bias=bias),
            nn.BatchNorm2d(channel)
        )
    def forward(self,x):
        return self.conv3x3(x)

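4️⃣ Building the ShuffleNet unit

In the 'add' case the unit is a residual block, so the input channel count of the first 1x1 convolution equals the unit's output channel count;
in the 'cat' case the conv3x3 uses s=2 and the input goes through an AVG Pool, so the first 1x1 convolution of the 'add' unit that follows sees the previous stage's output channels plus the channels produced by the 'cat' branch;
in both cases the 3x3 convolution uses 1/4 of the channel count passed in by the stage.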

class ShuffleUnit(nn.Module):
    '''Two kinds of ShuffleUnit:
    1. 'cat' unit: stride 2, downsamples and concatenates with the pooled input
    2. 'add' unit: stride 1, residual addition (needs in_channel == out_channel)
    '''

    def __init__(self, in_channel, out_channel, group, commbine):
        '''commbine='add' or 'cat' '''
        super(ShuffleUnit, self).__init__()
        self.in_channel = in_channel
        self.out_channel = out_channel
        # the bottleneck uses 1/4 of the channels produced by the branch
        self.bottleneck_channel = out_channel // 4
        self.group = group
        self.commbine = commbine
        if self.commbine == 'add':
            self.shuffle_unit = self._make_shuffle_unit(stride=1)
        elif self.commbine == 'cat':
            self.shuffle_unit = self._make_shuffle_unit(stride=2)
        else:
            raise ValueError("commbine must be 'add' or 'cat', not '{}'".format(self.commbine))

    def forward(self, x):
        residual = x
        if self.commbine == 'add':
            return nn.functional.relu(residual + self.shuffle_unit(x))
        # 'cat': average-pool the shortcut so it matches the stride-2 branch
        residual = nn.functional.avg_pool2d(residual, kernel_size=3, stride=2, padding=1)
        out = self.shuffle_unit(x)
        out = torch.cat((residual, out), dim=1)  # [N,C,H,W], concatenate along C
        return nn.functional.relu(out)

    def _make_shuffle_unit(self, stride):
        shuffle_unit = nn.Sequential(
            conv1x1(self.in_channel, self.bottleneck_channel, self.group),
            conv3x3(self.bottleneck_channel, stride=stride),
            conv1x1(self.bottleneck_channel, self.out_channel, self.group, relu=False)
        )
        return shuffle_unit

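2.2 Building ShuffleNet

My motivation for building ShuffleNet was to replace DarkNet53, the original backbone of YOLOv3, so the ShuffleNet built here differs from the one in the original paper.

The implementation proceeds as follows:

step1: downsample the original image from 416x416 to 208x208;

step2: use one ShuffleUnit to go from 208x208 to 104x104, then repeat the 'add' unit 3 times;

step3: use one ShuffleUnit to go from 104x104 to 52x52, then repeat the 'add' unit 7 times; output the 52x52 feature map;

step4: use one ShuffleUnit to go from 52x52 to 26x26, then repeat the 'add' unit 7 times; output the 26x26 feature map;

step5: use one ShuffleUnit to go from 26x26 to 13x13, then repeat the 'add' unit 3 times; output the 13x13 feature map;

Each ShuffleUnit has 3 layers, and the network has 73 layers in total.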
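
The spatial sizes and the layer count can be reproduced with a few lines of arithmetic. This is a sketch under the assumption that the 73 counts convolution layers: three per ShuffleUnit plus the initial downsampling convolution. The helper name `downsample` is made up here.

```python
stage_repeat = [3, 7, 7, 3]   # 'add' units per stage, as listed in the steps above


def downsample(size, ksize=3, padding=1, stride=2):
    """Output size of a stride-2, 3x3, padding-1 convolution (or pooling)."""
    return (size - ksize + 2 * padding) // stride + 1


size = downsample(416)        # initial downsampling convolution
sizes = [size]
for _ in stage_repeat:
    size = downsample(size)   # the stride-2 'cat' unit that opens each stage
    sizes.append(size)
print(sizes)                  # [208, 104, 52, 26, 13]

units = sum(stage_repeat) + len(stage_repeat)   # 'add' units plus one 'cat' unit per stage
layers = 3 * units + 1                          # 3 convolutions per unit plus the first conv
print(units, layers)                            # 24 73
```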

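There is one pitfall here: the output channels of the 'cat' unit's branch should be this stage's output channels minus the previous stage's output channels, so that the concatenated result has this stage's output channel count (a bit convoluted). Only this guarantees that the residual and the output of the following 'add' units have the same number of channels.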
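
The channel bookkeeping can be checked with a short sketch. The channel list is the one used in the ShuffleNet class in this post; the function name `cat_branch_channels` is an assumption for illustration.

```python
def cat_branch_channels(output_channels):
    """Channels each stage's 'cat' branch must produce.

    The pooled shortcut keeps the previous stage's channels, so the branch
    only has to supply the difference to reach the stage's output width.
    """
    branch = []
    for stage in range(1, len(output_channels)):
        prev_out = output_channels[stage - 1]
        stage_out = output_channels[stage]
        branch_out = stage_out - prev_out
        # shortcut channels + branch channels == stage output channels
        assert prev_out + branch_out == stage_out
        branch.append(branch_out)
    return branch


output_channels = [24, 120, 240, 480, 960]   # first conv, then stages 1-4
print(cat_branch_channels(output_channels))  # [96, 120, 240, 480]
```

All of these branch widths, and their 1/4 bottlenecks, are divisible by the group count of 3, which the grouped $1\times1$ convolutions require.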

from collections import OrderedDict


class ShuffleNet(nn.Module):
    '''
    ShuffleNet used as the backbone of YOLOv3.
    input: 416*416 images
    output: 52*52, 26*26, 13*13 feature maps
    '''
    def __init__(self):
        super(ShuffleNet, self).__init__()
        self.stage_repeat=[-1,3,7,7,3]
        self.output_channles=[24,120,240,480,960]
        self.groups=3
        self.conv1=nn.Conv2d(3,self.output_channles[0],kernel_size=3,stride=2,padding=1,bias=False)
        self.stage1=self._make_stage(1)
        self.stage2=self._make_stage(2)#=>output 52*52 feature map
        self.stage3=self._make_stage(3)#=>output 26*26 feature map
        self.stage4=self._make_stage(4)#=>output 13*13 feature map

    def forward(self,x):
        out=self.conv1(x)
        out=self.stage1(out)
        out52=self.stage2(out)
        out26=self.stage3(out52)
        out13=self.stage4(out26)

        return out52,out26,out13

    def _make_stage(self,stage):
        # validate before indexing, and report the offending stage number
        if stage<1 or stage>4:
            raise ValueError('stage should be one of [1,2,3,4], but got {}'.format(stage))
        module=OrderedDict()
        repeat_shuffle=self.stage_repeat[stage]
        stage_name = 'shuffle_unit[{}]'.format(stage)
        head_module=ShuffleUnit(
            in_channel=self.output_channles[stage-1],
            # the 'cat' branch outputs (stage channels - previous channels) so that, after
            # concatenation, the next 'add' units see matching residual and output channels
            out_channel=self.output_channles[stage]-self.output_channles[stage-1],
            group=self.groups,
            commbine='cat'
        )
        head_name=stage_name+'[downsample]'
        module[head_name]=head_module
        for i in range(repeat_shuffle):
            repeat_name=stage_name+'[shuffle_{}]'.format(i)
            repeat_module=ShuffleUnit(
                in_channel=self.output_channles[stage],
                out_channel=self.output_channles[stage],
                group=self.groups,
                commbine='add'
            )
            module[repeat_name]=repeat_module
        return nn.Sequential(module)

Finally, a quick test:

model=ShuffleNet()
data=torch.randn(1,3,416,416)
print(model)
_52,_26,_13=model(data)
print(_52.shape)
print(_26.shape)
print(_13.shape)
-->
torch.Size([1, 240, 52, 52])
torch.Size([1, 480, 26, 26])
torch.Size([1, 960, 13, 13])

It works.
