
[Causal Learning] VC R-CNN (CVPR 2020) Code


The author builds on the maskrcnn-benchmark framework (the predecessor of Detectron2). Inspired by Bottom-Up and Top-Down Attention for Image Captioning and VQA, Mask R-CNN is used as the bottom-up backbone to provide image features for downstream tasks such as Image Captioning and VQA.

As the paper notes, the RPN is removed and the GT bboxes are used as input, and the training loss is modified to:

(Image of the modified loss formula; judging from the loss computation below, it is the sum of the self predictor's classification loss and the context predictor's causal loss, i.e. L = L_self + L_cxt.)

At test time, the model turns into a feature extractor: the features output by ROI_HEAD are taken as the VC Features.

The config file is e2e_mask_rcnn_R_101_FPN_1x.yaml. Compared with Mask R-CNN, the author lowers BASE_LR from 0.02 to 0.005, raises MAX_ITER from 90k to 240k, and trains from scratch. The main files involved are in ROI_BOX_HEAD: FPN2MLPFeatureExtractor and FPNPredictor. The former does ROI Align, then flatten plus two fc+ReLU layers, outputting a 1024-d feature; the latter, FPNPredictor, handles class prediction and box regression.
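
For orientation, below is a minimal sketch of what FPN2MLPFeatureExtractor computes; it hard-codes sizes that the real class reads from the config, and assumes a pooler object implementing ROI Align over the FPN levels.

import torch.nn.functional as F
from torch import nn

class FPN2MLPFeatureExtractorSketch(nn.Module):
    """Simplified stand-in: ROI Align (pooler) + flatten + two fc+ReLU layers."""
    def __init__(self, pooler, input_size, representation_size=1024):
        super().__init__()
        self.pooler = pooler                 # ROI Align over the FPN levels
        self.fc6 = nn.Linear(input_size, representation_size)
        self.fc7 = nn.Linear(representation_size, representation_size)

    def forward(self, x, proposals):
        x = self.pooler(x, proposals)        # (num_boxes, C, 7, 7)
        x = x.view(x.size(0), -1)            # flatten
        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))              # (num_boxes, 1024) feature
        return x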

 

In ROIBoxHead() in box_head.py, causal_predictor and feature_save_path are added.

For predictor(), the box_regression branch is removed, so only classification is performed. class_logits and class_logits_causal_list are fed into loss_evaluator(), and at test time save_object_feature_gt_bu() is executed, as sketched below.
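
A hedged sketch of the resulting forward pass; the names follow maskrcnn-benchmark, and the exact control flow and post-processor signature in the VC R-CNN repo may differ.

def forward(self, features, proposals, targets=None):
    # ROI Align + flatten + fc6/fc7 -> (num_boxes, 1024) features
    x = self.feature_extractor(features, proposals)

    # self predictor: class logits only (the box_regression branch is removed)
    class_logits = self.predictor(x)
    # context predictor: one (N*N, num_classes) logit tensor per image
    class_logits_causal_list = self.causal_predictor(x, proposals)

    if not self.training:
        # boxes are the GT boxes, so no box decoding is needed; attach the
        # features to each BoxList (simplified here) and dump them to disk
        result = self.post_processor(x, class_logits, proposals)
        self.save_object_feature_gt_bu(x, result, targets)
        return x, result, {}

    loss_classifier, loss_causal = self.loss_evaluator(
        [class_logits], class_logits_causal_list, proposals
    )
    return x, proposals, dict(loss_classifier=loss_classifier, loss_causal=loss_causal)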

CausalPredictor() is added in roi_box_predictors.py:

import numpy as np
import torch
import torch.nn.functional as F
from torch import nn

from maskrcnn_benchmark.modeling import registry


@registry.ROI_BOX_PREDICTOR.register("CausalPredictor")
class CausalPredictor(nn.Module):
    def __init__(self, cfg, in_channels):
        super(CausalPredictor, self).__init__()

        num_classes = cfg.MODEL.ROI_BOX_HEAD.NUM_CLASSES
        self.embedding_size = cfg.MODEL.ROI_BOX_HEAD.EMBEDDING
        representation_size = in_channels

        # context predictor head: scores each concatenated [y; z] pair
        self.causal_score = nn.Linear(2*representation_size, num_classes)
        # projections used for the attention between features and the dictionary
        self.Wy = nn.Linear(representation_size, self.embedding_size)
        self.Wz = nn.Linear(representation_size, self.embedding_size)

        nn.init.normal_(self.causal_score.weight, std=0.01)
        nn.init.normal_(self.Wy.weight, std=0.02)
        nn.init.normal_(self.Wz.weight, std=0.02)
        nn.init.constant_(self.Wy.bias, 0)
        nn.init.constant_(self.Wz.bias, 0)
        nn.init.constant_(self.causal_score.bias, 0)

        self.feature_size = representation_size
        # confounder dictionary (entry 0 is dropped) and the class prior P(z)
        self.dic = torch.tensor(np.load(cfg.DIC_FILE)[1:], dtype=torch.float)
        self.prior = torch.tensor(np.load(cfg.PRIOR_PROB), dtype=torch.float)

    def forward(self, x, proposals):
        device = x.get_device()  # assumes the features live on a CUDA device
        dic_z = self.dic.to(device)
        prior = self.prior.to(device)

        # split the batched box features back into per-image chunks
        box_size_list = [proposal.bbox.size(0) for proposal in proposals]
        feature_split = x.split(box_size_list)
        xzs = [self.z_dic(feature_pre_obj, dic_z, prior) for feature_pre_obj in feature_split]

        causal_logits_list = [self.causal_score(xz) for xz in xzs]

        return causal_logits_list

    def z_dic(self, y, dic_z, prior):
        """
        Note that the intervention is computed over the whole batch here,
        rather than for a single object as in the main paper.
        """
        length = y.size(0)
        if length == 1:
            print('debug')  # leftover debug hook for single-box images
        # attention of each object feature y over the dictionary entries
        attention = torch.mm(self.Wy(y), self.Wz(dic_z).t()) / (self.embedding_size ** 0.5)
        attention = F.softmax(attention, 1)
        # prior-weighted expectation over the dictionary: z is (length, feature_size)
        z_hat = attention.unsqueeze(2) * dic_z.unsqueeze(0)
        z = torch.matmul(prior.unsqueeze(0), z_hat).squeeze(1)
        # pair every y_i with every z_j: (length * length, 2 * feature_size)
        xz = torch.cat((y.unsqueeze(1).repeat(1, length, 1), z.unsqueeze(0).repeat(length, 1, 1)), 2).view(-1, 2*y.size(1))

        # detect NaNs
        if torch.isnan(xz).sum():
            print(xz)
        return xz
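
To see what z_dic builds, here is a self-contained toy version with random stand-ins for the dictionary and prior; the Wy/Wz projections are dropped for brevity and all sizes are made up:

import torch

n_box, feat, n_dic = 3, 1024, 80
y = torch.randn(n_box, feat)                       # per-box features
dic_z = torch.randn(n_dic, feat)                   # confounder dictionary
prior = torch.softmax(torch.randn(n_dic), 0)       # class prior P(z)

attention = torch.softmax(y @ dic_z.t() / feat ** 0.5, dim=1)   # (3, 80)
z_hat = attention.unsqueeze(2) * dic_z.unsqueeze(0)             # (3, 80, 1024)
z = torch.matmul(prior.unsqueeze(0), z_hat).squeeze(1)          # (3, 1024)
xz = torch.cat((y.unsqueeze(1).repeat(1, n_box, 1),
                z.unsqueeze(0).repeat(n_box, 1, 1)), 2).view(-1, 2 * feat)
print(xz.shape)  # torch.Size([9, 2048]): one row per (y_i, z_j) pair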

In loss.py, the __call__ function of FastRCNNLossComputation() is modified:

    def __call__(self, class_logits, causal_logits_list, proposals):
        """
        Computes the loss for VC R-CNN; the box-regression branch is removed.
        This requires that the subsample method has been called beforehand.

        Arguments:
            class_logits (list[Tensor])
            causal_logits_list (list[Tensor]): one (N*N, num_classes) tensor per image
            proposals (list[BoxList])

        Returns:
            classification_loss (Tensor)
            causal_loss (Tensor)
        """

        class_logits = cat(class_logits, dim=0)
        device = class_logits.device

        labels = [proposal.get_field("labels").to(dtype=torch.int64) for proposal in proposals]
        labels_self = cat(labels, dim=0)

        # self predictor loss: each box classifies its own label
        classification_loss = F.cross_entropy(class_logits, labels_self)

        # context predictor loss: pair (i, j) predicts the label of object j
        causal_loss = 0.
        for causal_logit, label in zip(causal_logits_list, labels):
            # target grid: row i holds the labels of all context objects j
            mask_label = label.unsqueeze(0).repeat(label.size(0), 1)
            # zero out the diagonal so an object never predicts itself
            mask = 1 - torch.eye(mask_label.size(0)).to(device)
            loss_causal = F.cross_entropy(causal_logit, mask_label.view(-1), reduction='none')
            loss_causal = loss_causal * mask.view(-1)
            causal_loss += torch.mean(loss_causal)

        return classification_loss, causal_loss
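
A quick toy check of the target construction (made-up labels), showing that pair (i, j) is supervised with the label of object j while self-pairs are zeroed out:

import torch

label = torch.tensor([2, 5, 7])                # GT classes of 3 boxes
mask_label = label.unsqueeze(0).repeat(3, 1)   # every row is [2, 5, 7]
mask = 1 - torch.eye(3)                        # zeros on the diagonal
print(mask_label.view(-1))  # tensor([2, 5, 7, 2, 5, 7, 2, 5, 7])
print(mask.view(-1))        # tensor([0., 1., 1., 1., 0., 1., 1., 1., 0.])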

A function is added to ROIBoxHead() in box_head.py to save the features during testing:

    def save_object_feature_gt_bu(self, x, result, targets):
        # requires `import os` and `import numpy as np` at module level
        for i, image in enumerate(result):
            # per-image feature matrix attached to the BoxList at test time
            feature_pre_image = image.get_field("features").cpu().numpy()
            try:
                # sanity check: one feature row per GT box
                assert image.get_field("num_box")[0] == feature_pre_image.shape[0]
                image_id = str(image.get_field("image_id")[0].cpu().numpy())
                path = os.path.join(self.feature_save_path, image_id) + '.npy'
                np.save(path, feature_pre_image)
            except Exception:
                print(image)

Overall, the author removes everything related to bbox regression. The Mask R-CNN used here needs GT bboxes at test time; in effect it only performs a classification task and carries no localization information, so the VC Features cannot be used on their own and must be concatenated with the Up-Down features.

This differs from the Faster R-CNN in the Up-Down paper, which can also be used for object detection by itself.
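
As a hedged sketch of the typical downstream usage (the file names and the 2048-d Up-Down feature size are assumptions, not part of the repo): load the saved VC Features and concatenate them per box with the usual bottom-up features before feeding a captioning/VQA model.

import numpy as np

image_id = "391895"                                     # hypothetical COCO image id
vc_feat = np.load(f"vc_features/{image_id}.npy")        # (num_box, 1024) VC Features
bu_feat = np.load(f"updown_features/{image_id}.npy")    # (num_box, 2048) Up-Down features

assert vc_feat.shape[0] == bu_feat.shape[0]             # same boxes, same order
fused = np.concatenate([bu_feat, vc_feat], axis=1)      # (num_box, 3072) fused feature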


The test results in the paper also show that "Only VC" performs worse than "Origin".

(Image: test-result table from the paper.)
