Caffe Loss Layer - SigmoidCrossEntropyLoss: Derivation and Python Implementation

程序员文章站 2024-03-14 21:38:11


[Original article - Caffe custom sigmoid cross entropy loss layer].

A very clear introduction, worth working through.

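1. Sigmoid Cross Entropy Loss Derivation

The Sigmoid Cross Entropy Loss is defined from the following expression (written here as the log-likelihood; the quantity that is minimized is its negative, $-L$):

$L = t \ln(P) + (1 - t) \ln(1 - P)$

where

$t$ - the target or label;
$P$ - the Sigmoid score, $P = \frac{1}{1 + e^{-x}}$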

Substituting $P$ gives:

$L = t \ln\left(\frac{1}{1 + e^{-x}}\right) + (1 - t) \ln\left(1 - \frac{1}{1 + e^{-x}}\right)$

Expanding step by step:

$L = t \ln\left(\frac{1}{1 + e^{-x}}\right) + (1 - t) \ln\left(\frac{e^{-x}}{1 + e^{-x}}\right)$

$L = t \ln\left(\frac{1}{1 + e^{-x}}\right) + \ln\left(\frac{e^{-x}}{1 + e^{-x}}\right) - t \ln\left(\frac{e^{-x}}{1 + e^{-x}}\right)$

$L = t [\ln 1 - \ln(1 + e^{-x})] + [\ln(e^{-x}) - \ln(1 + e^{-x})] - t [\ln(e^{-x}) - \ln(1 + e^{-x})]$

$L = [- t \ln(1 + e^{-x})] + \ln(e^{-x}) - \ln(1 + e^{-x}) - t \ln(e^{-x}) + [t \ln(1 + e^{-x})]$

Combining like terms (the two bracketed terms cancel):

$L = \ln(e^{-x}) - \ln(1 + e^{-x}) - t \ln(e^{-x})$

$L = - x \ln(e) - \ln(1 + e^{-x}) + t x \ln(e)$

$L = - x - \ln(1 + e^{-x}) + x t$

That is:

$L = x t - x - \ln(1 + e^{-x})$ <1>
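
The simplification to <1> can be sanity-checked numerically. The following is a minimal sketch in NumPy; the function names are illustrative and not part of Caffe. It evaluates the original definition and form <1> at moderate values of $x$, where neither expression overflows.

```python
import numpy as np

def log_likelihood_definition(x, t):
    """L = t*ln(P) + (1 - t)*ln(1 - P), with P = sigmoid(x)."""
    p = 1.0 / (1.0 + np.exp(-x))
    return t * np.log(p) + (1.0 - t) * np.log(1.0 - p)

def log_likelihood_form1(x, t):
    """Simplified form <1>: L = x*t - x - ln(1 + e^{-x})."""
    return x * t - x - np.log(1.0 + np.exp(-x))

# Moderate scores only, so that both expressions are well behaved.
x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
t = np.array([0.0, 1.0, 1.0, 0.0, 1.0])

max_gap = np.max(np.abs(log_likelihood_definition(x, t) - log_likelihood_form1(x, t)))
print(max_gap)  # should be at floating-point rounding level
```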

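Behavior of $e^{-x}$ (left) and $e^{x}$ (right):
[Figure: plots of $e^{-x}$ (left) and $e^{x}$ (right)]

$e^{-x}$ decreases as $x$ increases. When $x$ is a large negative number, $e^{-x}$ becomes extremely large and can easily overflow, so the implementation has to avoid evaluating $e^{-x}$ for such inputs.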
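
The overflow is easy to reproduce. The snippet below is a small illustration, assuming float64 arithmetic in NumPy; the value $x = -1000$ is chosen only for demonstration.

```python
import numpy as np

x = np.float64(-1000.0)

# e^{-x} = e^{1000} cannot be represented in float64 and overflows to inf.
with np.errstate(over="ignore"):  # silence the overflow warning for this demo
    naive = np.exp(-x)

# e^{-|x|} = e^{-1000} underflows harmlessly to 0.0 instead.
safe = np.exp(-np.abs(x))

print(naive, safe)
```

Keeping the argument of the exponential non-positive is the idea behind the rewritten loss.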

To avoid overflow, the loss function $L$ is rewritten for the case $x < 0$ in terms of $e^{x}$:

Original loss function: $L = x t - x - \ln(1 + e^{-x})$ <1>

which can be written as: $L = x t - x + \ln\left(\frac{1}{1 + e^{-x}}\right)$

Multiply the numerator and denominator inside the last term by $e^{x}$:

$L = x t - x + \ln\left(\frac{1 \cdot e^{x}}{(1 + e^{-x}) \cdot e^{x}}\right)$

$L = x t - x + \ln\left(\frac{e^{x}}{1 + e^{x}}\right)$

$L = x t - x + [\ln(e^{x}) - \ln(1 + e^{x})]$

$L = x t - x + x \ln e - \ln(1 + e^{x})$

which gives:

$L = x t - \ln(1 + e^{x})$ <2>

From <1> and <2>, the final loss function is:

$L = x t - x - \ln(1 + e^{-x}), \quad (x > 0)$

$L = x t - 0 - \ln(1 + e^{x}), \quad (x < 0)$

Merging the two cases into one (at $x = 0$ both expressions give $-\ln 2$):

$L = x t - \max(x, 0) - \ln(1 + e^{-|x|}), \quad \text{for all } x$
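
The merged expression can be sketched in NumPy as follows. The function names are hypothetical, and `np.log1p` is used in place of `np.log(1 + ...)` as a small precision improvement that is not part of the original derivation. The check compares the stable form against form <1> at moderate scores and confirms that it stays finite at extreme scores.

```python
import numpy as np

def log_likelihood_stable(x, t):
    """L = x*t - max(x, 0) - ln(1 + e^{-|x|}); exp only sees non-positive arguments."""
    x = np.asarray(x, dtype=np.float64)
    t = np.asarray(t, dtype=np.float64)
    return x * t - np.maximum(x, 0.0) - np.log1p(np.exp(-np.abs(x)))

def log_likelihood_form1(x, t):
    """Form <1>; mathematically valid for all x, but overflows for very negative x."""
    return x * t - x - np.log(1.0 + np.exp(-x))

# At moderate scores the two forms agree.
x_mod = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
t_mod = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
gap = np.max(np.abs(log_likelihood_stable(x_mod, t_mod) - log_likelihood_form1(x_mod, t_mod)))

# At extreme scores the stable form remains finite.
x_ext = np.array([-1000.0, 1000.0])
t_ext = np.array([1.0, 0.0])
extreme = log_likelihood_stable(x_ext, t_ext)
print(gap, extreme)
```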

2. Sigmoid Cross Entropy Loss Derivative

When $x > 0$, $L = x t - x - \ln(1 + e^{-x})$,

so:

$\frac{\partial L}{\partial x} = \frac{\partial (x t - x - \ln(1 + e^{-x}))}{\partial x}$

$\frac{\partial L}{\partial x} = \frac{\partial x t}{\partial x} - \frac{\partial x}{\partial x} - \frac{\partial (\ln(1 + e^{-x}))}{\partial x}$

$\frac{\partial L}{\partial x} = t - 1 - \frac{1}{1 + e^{-x}} \cdot \frac{\partial (1 + e^{-x})}{\partial x}$

$\frac{\partial L}{\partial x} = t - 1 - \frac{1}{1 + e^{-x}} \cdot \frac{\partial (e^{-x})}{\partial x}$

$\frac{\partial L}{\partial x} = t - 1 + \frac{e^{-x}}{1 + e^{-x}}$

which simplifies to:

$\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^{-x}}$

The second term is the Sigmoid function $P = \frac{1}{1 + e^{-x}}$, hence

$\frac{\partial L}{\partial x} = t - P$

When $x < 0$, $L = x t - \ln(1 + e^{x})$,

$\frac{\partial L}{\partial x} = \frac{\partial (x t - \ln(1 + e^{x}))}{\partial x}$

$\frac{\partial L}{\partial x} = \frac{\partial x t}{\partial x} - \frac{\partial (\ln(1 + e^{x}))}{\partial x}$

$\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^{x}} \cdot \frac{\partial (e^{x})}{\partial x}$

$\frac{\partial L}{\partial x} = t - \frac{e^{x}}{1 + e^{x}}$

$\frac{\partial L}{\partial x} = t - \frac{e^{x} \cdot e^{-x}}{(1 + e^{x}) \cdot e^{-x}}$

$\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^{-x}}$

The second term is the Sigmoid function $P = \frac{1}{1 + e^{-x}}$, hence

$\frac{\partial L}{\partial x} = t - P$

As can be seen, the derivative is the same for $x > 0$ and $x < 0$: the difference between the target value and the Sigmoid value.
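
The result $\frac{\partial L}{\partial x} = t - P$ can be verified with a central finite difference. This is a minimal sketch with illustrative names; the step size `eps` and the test points are arbitrary choices.

```python
import numpy as np

def log_likelihood(x, t):
    """Stable form: L = x*t - max(x, 0) - ln(1 + e^{-|x|})."""
    return x * t - np.maximum(x, 0.0) - np.log1p(np.exp(-np.abs(x)))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Points on both sides of zero, with both label values.
x = np.array([-2.0, -0.3, 0.7, 2.5])
t = np.array([1.0, 0.0, 1.0, 0.0])

eps = 1e-6
numeric = (log_likelihood(x + eps, t) - log_likelihood(x - eps, t)) / (2 * eps)
analytic = t - sigmoid(x)

grad_gap = np.max(np.abs(numeric - analytic))
print(grad_gap)  # should be small compared with the gradient values
```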

3. Custom Caffe Loss Layer in Python

Caffe officially provides a demo of a custom EuclideanLossLayer written in Python.

Here, following the derivation above, we create a Python-based Caffe SigmoidCrossEntropyLossLayer.
Caffe itself ships a C++ implementation, SigmoidCrossEntropyLossLayer; see Caffe Loss Layer - SigmoidCrossEntropyLossLayer.

Assume $Labels \in \{0, 1\}$.

3.1 SigmoidCrossEntropyLossLayer Implementation

import caffe
import numpy as np
import scipy.special

class CustomSigmoidCrossEntropyLossLayer(caffe.Layer):

    def setup(self, bottom, top):
        # check for all inputs
        if len(bottom) != 2:
            raise Exception("Need two inputs (scores and labels) to compute sigmoid crossentropy loss.")

    def reshape(self, bottom, top):
        # check input dimensions match between the scores and labels
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # difference would be the same shape as any input
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # layer output is a scalar loss (summed over all elements, not averaged)
        top[0].reshape(1)

    def forward(self, bottom, top):
        score = bottom[0].data
        label = bottom[1].data

        # the minimized loss is -L = max(x, 0) - x*t + ln(1 + e^{-|x|})
        first_term = np.maximum(score, 0)
        second_term = -1 * score * label
        third_term = np.log(1 + np.exp(-1 * np.absolute(score)))

        loss = np.sum(first_term + second_term + third_term)
        top[0].data[...] = loss
        # gradient of -L with respect to the score: P - t
        sig = scipy.special.expit(score)
        self.diff = sig - label
        if np.isnan(loss):
            raise FloatingPointError("Sigmoid cross entropy loss is NaN.")

    def backward(self, top, propagate_down, bottom):
        # only the scores (bottom[0]) receive a gradient; the labels do not
        if propagate_down[0]:
            bottom[0].diff[...] = self.diff
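
The forward arithmetic can be checked without Caffe. The sketch below copies the three-term sum into a plain function (a hypothetical helper, not part of the layer) and compares it with an equivalent reference: since $\max(x, 0) + \ln(1 + e^{-|x|}) = \ln(1 + e^{x})$, the loss equals $\ln(1 + e^{x}) - x t$, which NumPy can evaluate stably with `np.logaddexp(0, x)`. The random scores and labels are made up for the test.

```python
import numpy as np

def layer_forward_loss(score, label):
    """Same arithmetic as the layer's forward(), without Caffe blobs."""
    first_term = np.maximum(score, 0)
    second_term = -1 * score * label
    third_term = np.log(1 + np.exp(-1 * np.absolute(score)))
    return np.sum(first_term + second_term + third_term)

def reference_loss(score, label):
    """Sum of ln(1 + e^x) - x*t, with ln(1 + e^x) computed by np.logaddexp(0, x)."""
    return np.sum(np.logaddexp(0.0, score) - score * label)

rng = np.random.default_rng(0)
score = rng.normal(scale=5.0, size=(4, 3))                  # made-up scores
label = rng.integers(0, 2, size=(4, 3)).astype(np.float64)  # labels in {0, 1}

layer_loss = layer_forward_loss(score, label)
loss_gap = abs(layer_loss - reference_loss(score, label))
print(layer_loss, loss_gap)
```

Because each element contributes a negative log-probability, the summed loss is non-negative, which is a quick sign check for the layer output.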

3.2 Definition in the prototxt

layer {
  type: 'Python'
  name: 'loss'
  top: 'loss_opt'
  bottom: 'score'
  bottom: 'label'
  python_param {
    # the module name -- usually the filename -- that needs to be in $PYTHONPATH
    module: 'loss_layers'
    # the layer name -- the class name in the module
    layer: 'CustomSigmoidCrossEntropyLossLayer'
  }
  include {
    phase: TRAIN
  }
  # set loss weight so Caffe knows this is a loss layer.
  # since PythonLayer inherits directly from Layer, this isn't automatically
  # known to Caffe
  loss_weight: 1
}
