
[OCR Text Detection] Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

程序员文章站 2024-01-09 23:34:40

I have recently been updating a text-detection training model, so I read PAN, the upgraded version of PSE.

Paper: https://arxiv.org/abs/1908.05900v1

Code: https://github.com/WenmuZhou/PAN.pytorch

Contents

1. Overview of the paper's method

1.1 Overall framework

1.2 Main components, block by block

1.2.0 The role of Fr

1.2.1 FPEM (Feature Pyramid Enhancement Module)

1.2.2 FFM (Feature Fusion Module)

1.2.3 The role of Ff

1.3 Loss computation

1.3.1 Pixel Aggregation

1.3.2 Loss function

2. Similarities and differences between PAN and PSE

3. A close reading of the PAN code

4. Practical experience training PAN


1. Overview of the paper's method

1.1 Overall framework

[Fig. 3: overall PAN pipeline]

The overall framework is shown in Fig. 3. Functionally it consists of roughly the following parts: (b) a lightweight backbone; (c) Fr; (d)-(e) FPEM and FFM; (f) Ff; and the loss computation.

The paper uses resnet18 as the backbone. Because the backbone is small, inference is fast, but a small network also has a small final receptive field and therefore weaker feature representations. The authors thus propose a segmentation head that makes efficient use of the lightweight backbone's (resnet18's) features. The segmentation head consists of FPEM (Feature Pyramid Enhancement Module) and FFM (Feature Fusion Module).

FPEM: its key property is low computational cost; through cascading, it makes the input features at different scales deeper and more expressive.

FFM: fuses the features of different depths produced by the FPEMs.

PAN predicts the text regions (Fig. 3) and the kernels (Fig. 3(h)) used to distinguish different text instances; it also predicts a similarity vector for each text pixel, so that the distance between a pixel and the kernel of the same text instance is as small as possible.

1.2 Main components, block by block

1.2.0 The role of Fr

In Fig. 3, the lightweight backbone (b) produces the convolutional features conv2, conv3, conv4, and conv5, downsampled by factors of 4, 8, 16, and 32 relative to the input image (so the input size should preferably be a multiple of 32). Fr applies a 1x1 convolution to bring every one of conv2, ..., conv5 down to 128 channels. In code:

conv_out = 128
# reduce layers: 1x1 conv + BN + ReLU bring each backbone stage down to 128 channels
self.reduce_conv_c2 = nn.Sequential(
    nn.Conv2d(in_channels=backbone_out_channels[0], out_channels=conv_out, kernel_size=1),
    nn.BatchNorm2d(conv_out),
    nn.ReLU()
)

Question 1: can this 128 be changed? What effect would making it smaller or larger have?

The authors use 128 to obtain a thin feature pyramid Fr; for reference, the channel counts of c2, c3, c4, c5 at (b) are 64, 128, 256, and 512.
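To make the channel reduction concrete, here is a minimal, hypothetical sketch of the Fr step applied to all four backbone stages. The `reduce_layers` list, the 640x640 input size, and the fake backbone features are illustrative assumptions, not the repo's code:

```python
import torch
import torch.nn as nn

# resnet18 stage channels and a target of 128, as in the paper
backbone_out_channels = [64, 128, 256, 512]
conv_out = 128

# One 1x1-conv reduce block per stage (illustrative; the repo defines them one by one)
reduce_layers = nn.ModuleList([
    nn.Sequential(
        nn.Conv2d(c, conv_out, kernel_size=1),
        nn.BatchNorm2d(conv_out),
        nn.ReLU(),
    )
    for c in backbone_out_channels
])

# Fake backbone features for a 640x640 input: strides 4, 8, 16, 32
feats = [torch.randn(1, c, 640 // s, 640 // s)
         for c, s in zip(backbone_out_channels, [4, 8, 16, 32])]

# After Fr, every map has 128 channels; spatial sizes are unchanged
reduced = [layer(f) for layer, f in zip(reduce_layers, feats)]
```

This is what gives the "thin" pyramid: channel width is uniform at 128, while the four spatial resolutions are preserved.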

1.2.1 FPEM (Feature Pyramid Enhancement Module)

[Figure: FPEM structure — up-scale and down-scale enhancement]

FPEM consists of two functional stages: up-scale enhancement and down-scale enhancement. It uses separable convolutions, so in short FPEM has two advantages: 1) it cascades features of different scales, strengthening the fusion of low-level and high-level features; 2) its computational cost is low.

In the paper, the same FPEM operation is applied twice (cascaded).

The implementation has two stages:

Stage 1 (up-scale enhancement):

[Figure: FPEM up-scale enhancement]

Stage 2 (down-scale enhancement):

[Figure: FPEM down-scale enhancement]

Implementation:

        self.fpems = nn.ModuleList()
        fpem_repeat = 2
        for i in range(fpem_repeat):
            self.fpems.append(FPEM(conv_out))

        # FPEM: run the cascaded FPEMs and sum their outputs for the later FFM step
        for i, fpem in enumerate(self.fpems):
            c2, c3, c4, c5 = fpem(c2, c3, c4, c5)
            if i == 0:
                c2_ffm = c2
                c3_ffm = c3
                c4_ffm = c4
                c5_ffm = c5
            else:
                c2_ffm += c2
                c3_ffm += c3
                c4_ffm += c4
                c5_ffm += c5
class FPEM(nn.Module):
    def __init__(self, in_channels=128):
        super().__init__()
        self.up_add1 = SeparableConv2d(in_channels, in_channels, 1)
        self.up_add2 = SeparableConv2d(in_channels, in_channels, 1)
        self.up_add3 = SeparableConv2d(in_channels, in_channels, 1)
        self.down_add1 = SeparableConv2d(in_channels, in_channels, 2)
        self.down_add2 = SeparableConv2d(in_channels, in_channels, 2)
        self.down_add3 = SeparableConv2d(in_channels, in_channels, 2)

    def forward(self, c2, c3, c4, c5):
        # up-scale enhancement stage
        c4 = self.up_add1(self._upsample_add(c5, c4))
        c3 = self.up_add2(self._upsample_add(c4, c3))
        c2 = self.up_add3(self._upsample_add(c3, c2))

        # down-scale enhancement stage
        c3 = self.down_add1(self._upsample_add(c3, c2))
        c4 = self.down_add2(self._upsample_add(c4, c3))
        c5 = self.down_add3(self._upsample_add(c5, c4))
        return c2, c3, c4, c5

    def _upsample_add(self, x, y):
        return F.interpolate(x, size=y.size()[2:], mode='bilinear') + y

 

1.2.2 FFM (Feature Fusion Module)

[Figure: FFM — upsampling and concatenation of the FPEM outputs]

FFM's job is essentially upsample + concat. Implementation:

        # FFM: upsample c3_ffm..c5_ffm to c2_ffm's resolution, then concatenate
        c5 = F.interpolate(c5_ffm, c2_ffm.size()[-2:], mode='bilinear')
        c4 = F.interpolate(c4_ffm, c2_ffm.size()[-2:], mode='bilinear')
        c3 = F.interpolate(c3_ffm, c2_ffm.size()[-2:], mode='bilinear')
        Fy = torch.cat([c2_ffm, c3, c4, c5], dim=1)
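Since each FPEM output has 128 channels, the concatenated map Fy has 4 x 128 = 512 channels at c2's (1/4) resolution. A self-contained sketch with fake inputs (the 640x640 input size is an assumption for illustration):

```python
import torch
import torch.nn.functional as F

conv_out = 128
# Fake FPEM outputs at strides 4, 8, 16, 32 for a 640x640 input
c2, c3, c4, c5 = [torch.randn(1, conv_out, 640 // s, 640 // s)
                  for s in (4, 8, 16, 32)]

# Upsample everything to c2's resolution, then concatenate along channels
target = c2.size()[-2:]
c3 = F.interpolate(c3, target, mode='bilinear', align_corners=False)
c4 = F.interpolate(c4, target, mode='bilinear', align_corners=False)
c5 = F.interpolate(c5, target, mode='bilinear', align_corners=False)
Fy = torch.cat([c2, c3, c4, c5], dim=1)  # (1, 512, 160, 160)
```

The 512-channel Fy is then fed to the final prediction convolutions (Ff).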

1.3 Loss computation

1.3.1 Pixel Aggregation (PA)

Goal 1: the distance between the pixels of a text instance and its kernel should be as small as possible.

L_agg = (1/N) Σ_{i=1..N} (1/|T_i|) Σ_{p∈T_i} ln(D(p, K_i) + 1)    (1)

D(p, K_i) = max(||F(p) - g(K_i)|| - δ_agg, 0)²    (2)

In (1): N is the number of text instances, T_i is the i-th text instance, p is a text pixel, and K_i is the kernel of T_i.

In (2): F(p) is the pixel similarity vector and g(K_i) is the kernel similarity vector; δ_agg is a constant, set to 0.5, used to filter out easy samples.

Goal 2: the kernels of different text instances should keep a certain distance from each other.

L_dis = (1/(N(N-1))) Σ_{i=1..N} Σ_{j=1..N, j≠i} ln(D(K_i, K_j) + 1)    (3)

D(K_i, K_j) = max(δ_dis - ||g(K_i) - g(K_j)||, 0)²    (4)

L_dis keeps the distance between kernels from falling below δ_dis, which the paper sets to 3.
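Below is a hedged, self-contained sketch of how L_agg and L_dis could be computed from per-pixel similarity vectors. The function name `pan_agg_dis_losses` and the mask-list interface are my own illustrative choices, not the repo's actual implementation:

```python
import torch

def pan_agg_dis_losses(sim, text_masks, kernel_masks,
                       delta_agg=0.5, delta_dis=3.0):
    """Sketch of L_agg / L_dis (paper constants: delta_agg=0.5, delta_dis=3).

    sim:          (C, H, W) similarity vectors F(p) for every pixel
    text_masks:   list of N boolean (H, W) masks, one per text instance T_i
    kernel_masks: list of N boolean (H, W) masks, one per kernel K_i
    """
    kernels = []  # g(K_i): mean similarity vector over each kernel's pixels
    l_agg = sim.new_zeros(())
    for t_mask, k_mask in zip(text_masks, kernel_masks):
        g_k = sim[:, k_mask].mean(dim=1)                    # (C,)
        kernels.append(g_k)
        # D(p, K_i) = max(||F(p) - g(K_i)|| - delta_agg, 0)^2
        dist = (sim[:, t_mask] - g_k[:, None]).norm(dim=0)  # per-pixel distance
        d = (dist - delta_agg).clamp(min=0).pow(2)
        l_agg = l_agg + torch.log(d + 1).mean()
    l_agg = l_agg / len(text_masks)

    l_dis = sim.new_zeros(())
    n = len(kernels)
    if n > 1:
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                # D(K_i, K_j) = max(delta_dis - ||g(K_i) - g(K_j)||, 0)^2
                gap = (delta_dis - (kernels[i] - kernels[j]).norm()).clamp(min=0)
                l_dis = l_dis + torch.log(gap.pow(2) + 1)
        l_dis = l_dis / (n * (n - 1))
    return l_agg, l_dis
```

Note how both losses vanish on easy cases: pixels within δ_agg of their kernel contribute nothing to L_agg, and kernel pairs farther apart than δ_dis contribute nothing to L_dis.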

1.3.2 Loss function

The paper's objective function:

L = L_tex + α·L_ker + β·(L_agg + L_dis)    (5)

L_tex is the text-region loss and L_ker is the kernel loss; the weights are α = 0.5 and β = 0.25.

L_tex and L_ker use the dice loss, in order to balance the usually very unequal sizes of text and non-text regions. The formula:

L_tex = 1 - 2·Σ_i P_tex(i)·G_tex(i) / (Σ_i P_tex(i)² + Σ_i G_tex(i)²)    (6)

P_tex(i) and G_tex(i) denote the segmentation result and the ground truth for the i-th pixel, respectively. The ground truth is a binary 0/1 mask (0: non-text, 1: text). L_tex uses online hard example mining (OHEM).
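A minimal sketch of the dice loss above, without the OHEM pixel selection that the full implementation applies on top; `dice_loss` and the `eps` stabilizer are illustrative:

```python
import torch

def dice_loss(pred, gt, eps=1e-6):
    """Dice loss: 1 - 2*sum(P_i*G_i) / (sum(P_i^2) + sum(G_i^2)).

    pred: (H, W) sigmoid scores P(i); gt: (H, W) binary 0/1 mask G(i).
    """
    p, g = pred.flatten(), gt.flatten().float()
    inter = (p * g).sum()
    return 1 - 2 * inter / (p.pow(2).sum() + g.pow(2).sum() + eps)
```

Because the loss normalizes the overlap by the total mass of prediction and ground truth, a tiny text region weighs as much as a huge background, which is exactly the class-imbalance property the paper wants.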

2. Similarities and differences between PAN and PSE

Overall architecture of PAN:

[Figure: overall PAN architecture]

Overall architecture of PSENet:

[Figure: overall PSENet architecture]

1) Backbone

PAN uses the more lightweight backbone resnet18 to improve inference speed.

2) Feature processing

Without FPEM and FFM, PAN's feature-processing stage would be quite similar to PSENet's.

3) Loss computation

PAN adds the L_agg and L_dis losses.

4) Post-processing

PAN uses only one kernel, while PSENet uses 6. If test performance on your own data allows, PSENet might also try a single kernel to speed up the model.

3. A close reading of the PAN code

(to be added later)

4. Practical experience training PAN

(to be added later)

Tags: ocr