【ReID】Salience-Guided Cascaded Suppression Network for Person Re-identification


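CVPR 2020: Salience-Guided Cascaded Suppression Network for Person Re-identification [1]. This is excellent work. The paper proposes the Salience-Guided Cascaded Suppression Network, which uses a cascaded strategy to extract different latent features stage by stage, with each stage integrating these features into the final representation. The model reached a new SOTA on four benchmarks.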

Paper at a glance:

Pain points

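Global features mainly capture appearance and position changes. Attention mechanisms compensate for the global feature's lack of attention to fine-grained local features; this is a question of whether local detail is present at all. Many works therefore use attention to combine local and global features for ReID. However, ReID may depend on diverse clues that are masked by the most salient features in different situations (features that are not easily noticed), such as shoes or clothes. In other words, once the network has learned the most salient features, it may neglect other important ones or become biased toward one particular feature. This is no longer a question of presence; it is a question of controlling the degree of feature extraction on top of what is already there. Everything needs moderation, after all (a neat insight for framing the paper). Hence: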

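1) The paper proposes the Salience-guided Cascaded Suppression Network (SCSN hereafter), which mines diverse salient features and integrates them into the final representation in a cascaded manner. It extracts different salient features efficiently and combines them in the way most beneficial to the ReID task.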

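2) The paper proposes the Salient Feature Extraction (SFE) unit, which suppresses the salient features learned in the previous cascaded stage and adaptively extracts other potential salient features, capturing different clues about a pedestrian.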

3) The paper designs an efficient feature aggregation strategy, consisting of the Residual Dual Attention Module (RDAM) and the Non-local Multi-stage Feature Fusion (NMFF) block, which fully improves the network's capacity for salient features.

Model

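The model consists of SCSN and the feature aggregation strategy. The aggregation strategy contains the Residual Dual Attention Module (RDAM) and a Non-local Multi-stage Feature Fusion (NMFF) block that aggregates low-level and high-level features. A cascaded suppression head equipped with Salient Feature Extraction (SFE) units then extracts salient features through cascaded suppression updates.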

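Figure 1 below shows SCSN. To increase information flow in the feature suppression mechanism, the salient feature learned at a given stage is first integrated with the global feature to improve that stage's discriminative power. That salient feature is then suppressed, so the next stage receives an input feature free of the salient information already found.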

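During training, every stage is guided by the gradient of the loss. At test time, the features of all stages are fused into one final representation. Because each stage suppresses the salient features already extracted by earlier stages and extracts others instead, the fused feature contains diverse information about the pedestrian.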

[Figure 1: overview of SCSN]

Let the input feature of stage t be $X^{t}$, the enhanced feature map be $Y^{t}$, and the suppressed input feature map of stage t+1 be $X^{t+1}$.


Residual Dual Attention Module (RDAM)

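The Residual Dual Attention Module (RDAM) contains a channel attention module and a spatial attention module with a residual connection, which extract channel attention and spatial attention respectively. These are fairly basic components, so I won't go into detail.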

Channel attention is unchanged from the standard form:

[Equation: channel attention map; image not preserved]

$W_1$ and $W_2$ are the parameters of the FC layers, $\sigma$ is the sigmoid function, and $\delta$ is ReLU.
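To make the roles of $W_1$, $W_2$, $\sigma$ and $\delta$ concrete, here is a minimal sketch of a squeeze-and-excitation style channel attention. It assumes the usual form (global average pooling, FC, ReLU, FC, sigmoid); the exact equation in the paper is not reproduced here. Plain Python lists stand in for tensors, and the weights `w1` and `w2` are made-up values.

```python
import math

def channel_attention(x, w1, w2):
    """x: list of C channels, each an H x W nested list.
    w1: (C/r) x C matrix and w2: C x (C/r) matrix, the two FC layers."""
    # Global average pooling: one scalar per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # First FC followed by ReLU (delta).
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    # Second FC followed by sigmoid (sigma): one weight in (0, 1) per channel.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Toy input with C=2, H=W=2; the weights are hypothetical.
x = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, 0.0]]          # reduces 2 channels to 1 hidden unit
w2 = [[2.0], [-2.0]]       # expands back to 2 channels
att = channel_attention(x, w1, w2)
print(att)
```

Each channel of the feature map would then be scaled by its entry in `att`.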

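Spatial attention is almost unchanged. After the average-pooling map and the max-pooling map pass through a conv to produce the spatial feature, the paper adds the previous stage's spatial feature to the current stage's spatial feature through a special residual connection, then feeds the sum to a sigmoid to obtain the spatial attention map.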

The cross-stage residual connection on the spatial feature map is:

[Equation: cross-stage residual spatial attention; image not preserved]

Here $\beta$ is set to 1 and $A^{s}_{0}$ is initialized to 0.
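The cross-stage residual can be sketched as follows. This is only an illustration under assumptions: the conv layer is replaced by a per-pixel linear mix of the average and max maps (`w_avg` and `w_max` are hypothetical), and the residual is taken to add the previous stage's pre-sigmoid spatial feature scaled by `beta`, as the prose above describes. Starting with no previous feature plays the role of $A^{s}_{0}=0$.

```python
import math

def spatial_feature(x, w_avg=1.0, w_max=1.0):
    """Channel-wise average and max pooling, then a per-pixel linear mix.
    The linear mix is a stand-in for the conv layer (an assumption)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    feat = []
    for i in range(H):
        row = []
        for j in range(W):
            vals = [x[c][i][j] for c in range(C)]
            row.append(w_avg * sum(vals) / C + w_max * max(vals))
        feat.append(row)
    return feat

def residual_spatial_attention(x, prev_feat=None, beta=1.0):
    """Returns (attention map, pre-sigmoid feature to pass to the next stage)."""
    feat = spatial_feature(x)
    if prev_feat is not None:
        # Cross-stage residual: add the previous stage's spatial feature.
        feat = [[f + beta * p for f, p in zip(fr, pr)]
                for fr, pr in zip(feat, prev_feat)]
    att = [[1.0 / (1.0 + math.exp(-f)) for f in row] for row in feat]
    return att, feat

x = [[[0.0, 2.0]], [[0.0, 0.0]]]                 # C=2, H=1, W=2
att1, feat1 = residual_spatial_attention(x)       # stage 1, no previous feature
att2, feat2 = residual_spatial_attention(x, prev_feat=feat1)  # stage 2
print(att1, att2)
```

With the same input at both stages, the second stage's attention at the active pixel is pushed closer to 1, which shows how the residual accumulates evidence across stages.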

Non-local Multi-stage Feature Fusion (NMFF)

NMFF considers two non-local information sources, as shown in Figure 5 below.

[Figure 5: the NMFF block]

The high-level feature $F_{h}\in\mathbb{R}^{C_{h}\times H_{h}\times W_{h}}$ and the low-level feature $F_{l}\in\mathbb{R}^{C_{l}\times H_{l}\times W_{l}}$ are passed through the 1x1 convs $\psi_{q}$, $\psi_{v}$ and $\psi_{k}$:

[Equation: query, key and value projections; image not preserved]

where

[Equation: shapes of the projected features; image not preserved]

S denotes the pixels produced by pyramid average pooling. Figure 3 below illustrates how pyramid average pooling works:

[Figure 3: pyramid average pooling]
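As a rough illustration of pyramid average pooling, the sketch below pools a single-channel map into several grid sizes and concatenates the results, so the S sampled positions are far fewer than the original H x W pixels. The bin sizes `(1, 2)` are an assumption chosen for the toy example, not the paper's setting.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool an H x W map into a bins x bins grid, returned flattened."""
    H, W = len(fmap), len(fmap[0])
    pooled = []
    for bi in range(bins):
        r0, r1 = (bi * H) // bins, ((bi + 1) * H + bins - 1) // bins
        for bj in range(bins):
            c0, c1 = (bj * W) // bins, ((bj + 1) * W + bins - 1) // bins
            cell = [fmap[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            pooled.append(sum(cell) / len(cell))
    return pooled

def pyramid_avg_pool(fmap, bin_sizes=(1, 2)):
    """Concatenate the pooled outputs of every pyramid level."""
    out = []
    for b in bin_sizes:
        out.extend(adaptive_avg_pool(fmap, b))
    return out

# 4 x 4 map with values 0..15: 16 pixels are summarized by S = 1 + 4 = 5 values.
fmap = [[float(4 * i + j) for j in range(4)] for i in range(4)]
pooled = pyramid_avg_pool(fmap)
print(pooled)  # [7.5, 2.5, 4.5, 10.5, 12.5]
```

Sampling keys and values this way keeps the non-local attention matrix small, since its second dimension is S instead of the number of low-level pixels.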

Then:

[Equation: non-local attention output; image not preserved]

This gives the NMFF output for one stage. With multiple stages, the final fused feature is:

[Equation: multi-stage fused feature; image not preserved]

$\phi$ is a 1x1 conv.
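The fusion step can be approximated by a generic non-local attention, sketched below. This is not the paper's exact formulation: it assumes dot-product attention with a softmax over the S pooled low-level positions and a residual connection back to the queries, and it treats the 1x1 convs $\psi_{q}$, $\psi_{k}$, $\psi_{v}$ and $\phi$ as identity for brevity.

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def nonlocal_fuse(q, k, v):
    """q: N x C queries from the high-level feature.
    k, v: S x C keys and values from the pooled low-level feature.
    Returns an N x C fused feature (attention output plus residual)."""
    fused = []
    for qi in q:
        # Similarity of this query to every pooled low-level position.
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) for kj in k])
        # Weighted sum of the values.
        agg = [sum(w * vj[c] for w, vj in zip(weights, v))
               for c in range(len(v[0]))]
        # Residual connection keeps the original high-level information.
        fused.append([a + b for a, b in zip(qi, agg)])
    return fused

q = [[1.0, 0.0]]                       # one high-level position, C=2
k = [[0.0, 0.0], [0.0, 0.0]]           # S=2 pooled positions
v = [[2.0, 0.0], [0.0, 2.0]]
out = nonlocal_fuse(q, k, v)
print(out)  # [[2.0, 1.0]]
```

Here both keys score equally, so the query receives the average of the two values before the residual is added.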

Salient Feature Extraction (SFE) Unit

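As shown in Figure 4 below, the feature map is first split into K stripes, each of shape (C, H/K, W). Each stripe passes through Conv+BN+ReLU to give a compact feature descriptor of shape (1, H/K, W), followed by global average pooling (GAP). Since there are K stripes, the K pooled values are collected into a (K, 1) vector.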

[Figure 4: the SFE unit]

The (K, 1) vector is passed through softmax to obtain a channel-wise salience description, the salience-sensitive weight $W$, which is multiplied element-wise with the input $X^{t}$:

$Sal(X^{t}) = W \odot X^{t}$

The feature $Sal(X^{t})$ emphasized at stage t will be suppressed at stage t+1.
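The stripe, descriptor, GAP and softmax steps can be sketched as below. The Conv+BN+ReLU is replaced by a channel mean followed by ReLU, which is purely a stand-in so the example runs without learned weights; the function names are hypothetical.

```python
import math

def sfe_weights(x, K):
    """x: C x H x W nested lists, with H divisible by K.
    Returns K softmax-normalized stripe weights."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    h = H // K
    scores = []
    for s in range(K):
        total = 0.0
        for i in range(s * h, (s + 1) * h):
            for j in range(W):
                # Stand-in for Conv+BN+ReLU: channel mean, then ReLU.
                total += max(0.0, sum(x[c][i][j] for c in range(C)) / C)
        scores.append(total / (h * W))      # global average pooling
    m = max(scores)
    e = [math.exp(v - m) for v in scores]
    return [v / sum(e) for v in e]

def apply_salience(x, weights):
    """Element-wise product of each stripe with its weight: Sal(X) = W * X."""
    h = len(x[0]) // len(weights)
    return [[[val * weights[i // h] for val in row]
             for i, row in enumerate(ch)] for ch in x]

L = math.log(3.0)
x = [[[0.0], [0.0], [L], [L]]]              # C=1, H=4, W=1, two stripes
weights = sfe_weights(x, K=2)               # approximately [0.25, 0.75]
sal = apply_salience(x, weights)
print(weights)
```

The stripe with the stronger response gets the larger weight, so it dominates $Sal(X^{t})$.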

Salience-Guided Cascaded Suppression Network (SCSN)

The backbone is ResNet50, with the downsampling stride of stage3 and stage4 changed to 1. The backbone feature map is fed to SCSN, and the salience-enhanced feature is computed as:

[Equation: salience-enhanced feature $Y^{t}$; image not preserved]

To discover other salient features, the paper applies a salience mask to the output of stage t to suppress $Sal(X^{t})$, giving the input of stage t+1:

[Equation: suppressed input $X^{t+1}$; image not preserved]

where $B$ is a binary mask, so that the influence of $Sal(X^{t})$ is alleviated in the next stage.
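A toy version of the suppression step is sketched below. The text does not say how the binary mask is constructed, so this example assumes a simple hypothetical rule: zero out the stripe with the largest salience weight and keep everything else.

```python
def suppress_top_stripe(x, weights):
    """x: C x H x W nested lists; weights: K stripe salience weights.
    Builds a binary row mask that zeroes the most salient stripe
    (a hypothetical rule) and applies it element-wise."""
    K, H = len(weights), len(x[0])
    h = H // K
    top = max(range(K), key=lambda s: weights[s])
    mask_rows = [0.0 if i // h == top else 1.0 for i in range(H)]
    return [[[val * mask_rows[i] for val in row]
             for i, row in enumerate(ch)] for ch in x]

x = [[[1.0], [2.0], [3.0], [4.0]]]          # C=1, H=4, W=1
weights = [0.25, 0.75]                      # stripe 1 is the most salient
x_next = suppress_top_stripe(x, weights)
print(x_next)  # [[[1.0], [2.0], [0.0], [0.0]]]
```

The next stage then sees only the previously less salient region, which is what forces it to look for different clues.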

Figure 6 below shows the attention regions of each stage:

[Figure 6: attention regions of each stage]

Experiments

Comparison with state-of-the-art methods on Market-1501:

[Table: state-of-the-art comparison on Market-1501]

Comparison with state-of-the-art methods on DukeMTMC-ReID:

[Table: state-of-the-art comparison on DukeMTMC-ReID]

Comparison with state-of-the-art methods on CUHK03:

[Table: state-of-the-art comparison on CUHK03]

Comparison with state-of-the-art methods on MSMT17:

[Table: state-of-the-art comparison on MSMT17]

Ablation study of the individual components on Duke:

[Table: component ablation on DukeMTMC-ReID]

CAM is the channel attention module, RSAM is the residual spatial attention module, SAM is plain spatial attention, and B&A is the backbone equipped with the residual dual attention module.

Ablation study of the NMFF feature fusion strategy:

[Table: NMFF feature fusion ablation]

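Here N denotes an NMFF block and B denotes the baseline without feature fusion. The number of components in each combination corresponds to the fused features from different ResNet50 stages.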

Writing

“(Abstract, opening sentences) Employing attention mechanisms to model both global and local features as a final pedestrian representation has become a trend for person re-identification (Re-ID) algorithms. A potential limitation of these methods is that they focus on the most salient features, but the re-identification of a person may rely on diverse clues masked by the most salient features in different situations, e.g., body, clothes or even shoes.”

I'd call this brilliant; it is so well written.

“(Introduction, paragraph 3, sentence 1) Nevertheless, one crucial limitation of these global-local methods, including attention-based and part-based, is the lack of exploration of how to effectively extract discriminative potential salience features of different pedestrians.”

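As noted under Pain points, the authors peel the problem apart layer by layer and state clearly where this work sits among the many attention-based works and which problem it targets. This cannot be written without rich experience and academic maturity.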

References

[1] Chen X, Fu C, Zhao Y, et al. Salience-Guided Cascaded Suppression Network for Person Re-Identification[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 3300-3310.

Article URL: https://blog.csdn.net/fisherish/article/details/107876325
