A Summary of PyTorch Loss Functions

程序员文章站 2022-05-27 09:39:52

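I recently read through the PyTorch loss function documentation and wrote up my own understanding of it, reformatting the formulas along the way, for future reference.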

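Note that many loss functions take two boolean parameters, size_average and reduce, which need some explanation. Loss functions generally operate on a whole batch at once, so the raw loss is a vector of shape (batch_size, ).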

If reduce = False, size_average is ignored and the loss is returned as a vector;
If reduce = True, the loss is returned as a scalar:
- if size_average = True, loss.mean() is returned;
- if size_average = False, loss.sum() is returned.

For that reason, the examples below usually set both parameters to False, which makes the original definition of each loss function easier to see.
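The rule for reduce and size_average can be written out in a few lines of plain Python. This is only a sketch of the behaviour described here, not PyTorch's actual implementation, and the helper name `apply_reduction` is made up for illustration. Newer PyTorch releases fold both flags into a single `reduction` argument that takes 'none', 'mean', or 'sum'.

```python
def apply_reduction(per_sample_loss, reduce=True, size_average=True):
    """Mimic the reduce / size_average rule on a plain list of floats."""
    if not reduce:
        # size_average is ignored; the per-sample losses come back unchanged
        return list(per_sample_loss)
    total = sum(per_sample_loss)
    if size_average:
        return total / len(per_sample_loss)  # corresponds to loss.mean()
    return total  # corresponds to loss.sum()


losses = [0.5, 1.5, 4.0]
vector = apply_reduction(losses, reduce=False, size_average=True)
mean_loss = apply_reduction(losses, reduce=True, size_average=True)
sum_loss = apply_reduction(losses, reduce=True, size_average=False)
```

With reduce=False the value of size_average makes no difference, which is why the examples in this post can pass False for both.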

The common loss functions follow.

nn.L1Loss

$$ \text{loss}(x_i, y_i) = |x_i - y_i| $$

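The documentation is still not very clear on this point: x and y must have the same shape (vectors or matrices), and the resulting loss has that same shape. The subscript $i$ denotes the $i$-th element.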

import torch

# reduce=False keeps the element-wise loss instead of collapsing it to a scalar
loss_fn = torch.nn.L1Loss(reduce=False, size_average=False)
input = torch.autograd.Variable(torch.randn(3, 4))
target = torch.autograd.Variable(torch.randn(3, 4))
loss = loss_fn(input, target)
print(input); print(target); print(loss)
# all three have shape (3, 4)
print(input.size(), target.size(), loss.size())

nn.SmoothL1Loss

Also known as Huber loss: a squared loss when the error lies in (-1, 1), and an L1 loss otherwise.

$$
\text{loss}(x_i, y_i) =
\begin{cases}
\frac{1}{2}(x_i - y_i)^2, & \text{if } |x_i - y_i| < 1 \\
|x_i - y_i| - \frac{1}{2}, & \text{otherwise}
\end{cases}
$$

As with L1Loss above, this is an element-wise operation; the subscript $i$ denotes the $i$-th element.

loss_fn = torch.nn.SmoothL1Loss(reduce=False, size_average=False)
input = torch.autograd.Variable(torch.randn(3,4))
target = torch.autograd.Variable(torch.randn(3,4))
loss = loss_fn(input, target)
print(input); print(target); print(loss)
print(input.size(), target.size(), loss.size())
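To see how the two branches behave, here is a minimal plain-Python sketch that evaluates the L1 and SmoothL1 formulas element by element. The function names are illustrative, and the threshold is fixed at 1 as in the formula above.

```python
def l1(x, y):
    """Element-wise L1 loss |x - y|."""
    return abs(x - y)


def smooth_l1(x, y):
    """Element-wise SmoothL1 loss with the threshold fixed at 1."""
    diff = abs(x - y)
    if diff < 1:
        return 0.5 * diff ** 2  # quadratic near zero
    return diff - 0.5  # shifted L1 for larger errors


inputs = [0.0, 0.5, 3.0]
targets = [0.0, 0.0, 0.0]
l1_values = [l1(x, y) for x, y in zip(inputs, targets)]
smooth_values = [smooth_l1(x, y) for x, y in zip(inputs, targets)]
```

For small errors the smooth version is well below the L1 value, while for large errors it differs from L1 only by the constant 1/2.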

nn.MSELoss

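$$ \text{loss}(x_i, y_i) = (x_i - y_i)^2 $$

The mean squared error loss. Usage is the same as above: loss, x, and y have the same shape (vectors or matrices), and $i$ is the element index.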

loss_fn = torch.nn.MSELoss(reduce=False, size_average=False)
input = torch.autograd.Variable(torch.randn(3,4))
target = torch.autograd.Variable(torch.randn(3,4))
loss = loss_fn(input, target)
print(input); print(target); print(loss)
print(input.size(), target.size(), loss.size())

nn.BCELoss

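The cross entropy for binary classification. A Sigmoid function must be applied before this layer. For the definition of cross entropy, see the Wikipedia page: Cross Entropy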

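The discrete cross entropy is defined as $H(p, q) = -\sum_i p_i \log q_i$. In the binary case this reduces to

$$ \text{loss}(x_i, y_i) = -w_i \left[ y_i \log x_i + (1 - y_i) \log(1 - x_i) \right] $$

where $x_i$ is the predicted probability, $y_i$ is the label, and $w_i$ is the weight of that term. As the formula shows, loss, x, y, and w all have the same shape.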

import torch
from torch.autograd import Variable

loss_fn = torch.nn.BCELoss(reduce=False, size_average=False)
input = Variable(torch.randn(3, 4))
target = Variable(torch.FloatTensor(3, 4).random_(2))  # random 0/1 labels
loss = loss_fn(torch.sigmoid(input), target)  # BCELoss expects probabilities, not raw scores
print(input); print(target); print(loss)
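The binary cross entropy itself is easy to check by hand. The sketch below applies a sigmoid and then the weighted binary cross entropy to a few raw scores in plain Python; the function names and sample values are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(prob, label, weight=1.0):
    """Weighted binary cross entropy for one predicted probability."""
    return -weight * (label * math.log(prob) + (1 - label) * math.log(1 - prob))


logits = [0.0, 2.0, -1.0]
labels = [1.0, 1.0, 0.0]
bce_values = [bce(sigmoid(z), y) for z, y in zip(logits, labels)]
```

A raw score of 0 maps to a probability of 0.5, so its loss is log 2 regardless of the label.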

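What is a little odd here is that the weight is not of size 2 (one entry per class) but has the same shape as x and y. When positive and negative samples are imbalanced, an extra line may be needed: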

import torch
from torch.autograd import Variable

class_weight = Variable(torch.FloatTensor([1, 10]))  # positives are rarer here, so they get a larger weight
target = Variable(torch.FloatTensor(3, 4).random_(2))
weight = class_weight[target.long()]  # per-element weights, shape (3, 4)
loss_fn = torch.nn.BCELoss(weight=weight, reduce=False, size_average=False)
# ...

The drawback is that if the batch size changes from batch to batch, loss_fn has to be redefined every time. I do not know whether there is a better solution.
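One way to picture the per-batch weight lookup is the plain-Python sketch below, where the weights are derived from the labels on every call, so the batch size is free to change. The function name `weighted_bce_batch` and the sample values are illustrative assumptions. In PyTorch itself, the functional form `torch.nn.functional.binary_cross_entropy` accepts a `weight` argument on each call, which may avoid rebuilding the module for every batch.

```python
import math


def weighted_bce_batch(probs, labels, class_weight):
    """BCE per element, with each weight looked up from the element's label."""
    losses = []
    for p, y in zip(probs, labels):
        w = class_weight[int(y)]  # label 0 -> class_weight[0], label 1 -> class_weight[1]
        losses.append(-w * (y * math.log(p) + (1 - y) * math.log(1 - p)))
    return losses


class_weight = [1.0, 10.0]  # positives weighted 10x
batch_a = weighted_bce_batch([0.5, 0.5], [0, 1], class_weight)
batch_b = weighted_bce_batch([0.5, 0.5, 0.5], [1, 1, 0], class_weight)  # a different batch size
```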

nn.BCEWithLogitsLoss

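nn.BCELoss above requires a Sigmoid layer to be added by hand. This loss combines the two, which lets it use the log-sum-exp trick to make the result numerically more stable. This is the recommended loss function.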

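Note that the documentation lists only two parameters, weight and size_average, but in practice the reduce parameter works as well. In addition, both loss functions require the target to be a FloatTensor, and the target is not necessarily limited to the two values 0 and 1; other values should be accepted too.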
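The stability argument can be illustrated without PyTorch. The sketch below compares a naive "sigmoid, then cross entropy" computation with a commonly used rearranged form, max(x, 0) - x*y + log(1 + exp(-|x|)). This is an illustration of the idea under that assumption, not a copy of PyTorch's source, and the function names are made up.

```python
import math


def bce_naive(logit, label):
    """Sigmoid first, then binary cross entropy."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def bce_with_logits(logit, label):
    """Same quantity, rewritten so exp() never receives a large positive argument."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


moderate = (bce_naive(2.0, 1.0), bce_with_logits(2.0, 1.0))
extreme = bce_with_logits(-800.0, 1.0)  # bce_naive would overflow in exp(800)
```

For moderate scores the two agree; for extreme scores only the rearranged form still returns a finite number.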

nn.CrossEntropyLoss

The cross entropy loss for multi-class classification. No Softmax layer is needed in front of this loss.

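In principle this loss should also take the form of the original cross entropy formula. However, the target is restricted to type torch.LongTensor and the problem is not multi-label, which means the label is effectively one-hot encoded: exactly one position is 1 and all others are 0. Substituting this into the cross entropy formula and simplifying gives the form below. See the derivation of the Softmax loss in the cs231n assignments.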

$$ \text{loss}(x, label) = -w_{label} \log \frac{e^{x_{label}}}{\sum_{j=1}^{N} e^{x_j}} = w_{label} \left[ -x_{label} + \log \sum_{j=1}^{N} e^{x_j} \right] $$

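Here $x \in \mathbb{R}^N$ is the vector of raw scores (no Softmax applied), label is the index of the correct class, and $w$ is a vector holding the weight of each label. For classes with few samples, consider setting the weight higher.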

import torch
from torch.autograd import Variable

weight = torch.Tensor([1, 2, 1, 1, 10])  # one weight per class
loss_fn = torch.nn.CrossEntropyLoss(reduce=False, size_average=False, weight=weight)
input = Variable(torch.randn(3, 5))  # (batch_size, C)
target = Variable(torch.LongTensor(3).random_(5))  # class indices must be a LongTensor
loss = loss_fn(input, target)
print(input); print(target); print(loss)
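The two forms of the formula can be checked against each other in plain Python. The sketch below computes the weighted cross entropy for a single sample both ways; the function names and the sample scores and weights are illustrative assumptions, and labels are 0-based as in Python.

```python
import math


def log_sum_exp(scores):
    """log(sum(exp(s))), computed after subtracting the maximum for stability."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def cross_entropy(scores, label, weight):
    """w[label] * (-x[label] + log sum_j exp(x[j])) for one sample."""
    return weight[label] * (-scores[label] + log_sum_exp(scores))


scores = [1.0, 2.0, 3.0]
weight = [1.0, 2.0, 1.0]
ce = cross_entropy(scores, 1, weight)

# first form of the formula: -w[label] * log(softmax(x)[label])
softmax_1 = math.exp(scores[1]) / sum(math.exp(s) for s in scores)
ce_direct = -weight[1] * math.log(softmax_1)
```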

nn.NLLLoss

The negative log likelihood loss for multi-class classification.

$$ \text{loss}(x, label) = -x_{label} $$

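Placing an nn.LogSoftmax layer in front of it makes it equivalent to the cross entropy loss. In fact, nn.CrossEntropyLoss calls this function as well. Note that $x_{label}$ here is the value after the log operation, not the raw score.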
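The equivalence with cross entropy can be verified numerically. The following plain-Python sketch applies a log-softmax and then picks out the negative entry for the label; the helper names and sample scores are illustrative assumptions.

```python
import math


def log_softmax(scores):
    """Return log(softmax(scores)) as a list."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def nll(log_probs, label):
    """Negative log likelihood: pick out -x[label]."""
    return -log_probs[label]


scores = [0.5, -1.0, 2.0]
label = 2
nll_value = nll(log_softmax(scores), label)
# the same number from the closed-form cross entropy expression
ce_value = -scores[label] + math.log(sum(math.exp(s) for s in scores))
```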

nn.NLLLoss2d

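Similar to the above but with a few more dimensions; typically used on images. In current PyTorch versions it has been merged into the function above.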

input, (N, C, H, W)
target, (N, H, W)

For example, when a fully convolutional network is used for semantic segmentation, a class label is predicted for every pixel of the output image.

nn.KLDivLoss

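The KL divergence, also called relative entropy, measures the distance between two distributions; the more similar they are, the closer it is to zero.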

$$ \text{loss}(x, y) = \frac{1}{N} \sum_{i=1}^{N} \left[ y_i \cdot (\log y_i - x_i) \right] $$

Note that $x_i$ here is a log-probability; at first I thought the API had it wrong.
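A small plain-Python sketch makes the role of the log-probability input concrete. It follows the formula above, averaging over the N elements; the function name and the sample distributions are illustrative assumptions.

```python
import math


def kl_div(x, y):
    """Mean over elements of y_i * (log y_i - x_i); x holds log-probabilities."""
    terms = [yi * (math.log(yi) - xi) for xi, yi in zip(x, y) if yi > 0]
    return sum(terms) / len(y)


target = [0.5, 0.25, 0.25]
same = kl_div([math.log(t) for t in target], target)
uniform = [1.0 / 3.0] * 3
different = kl_div([math.log(u) for u in uniform], target)
```

Passing the log of the target itself gives exactly zero, while any other distribution gives a positive value.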

nn.MarginRankingLoss

A loss for evaluating similarity.

$$ \text{loss}(x_1, x_2, y) = \max(0, -y \cdot (x_1 - x_2) + \text{margin}) $$

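All three quantities here are scalars. y can only be 1 or -1: 1 means x1 should be larger than x2, and -1 means x2 should be larger. The margin parameter says the two values must be at least margin apart; otherwise the loss is positive. margin defaults to zero.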
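The three typical situations can be checked with a direct transcription of the formula into plain Python; the function name and sample values are illustrative.

```python
def margin_ranking(x1, x2, y, margin=0.0):
    """max(0, -y * (x1 - x2) + margin), with y equal to 1 or -1."""
    return max(0.0, -y * (x1 - x2) + margin)


correct_order = margin_ranking(2.0, 0.5, 1)  # x1 should be larger, and it is
wrong_order = margin_ranking(0.5, 2.0, 1)  # x1 should be larger, but it is not
too_close = margin_ranking(2.0, 1.5, 1, margin=1.0)  # right order, gap below the margin
```

Only the first case has zero loss; the third is penalized even though the order is right, because the gap of 0.5 is smaller than the margin of 1.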

nn.MultiMarginLoss

The Hinge loss for multi-class classification:

$$ \text{loss}(x, y) = \frac{1}{N} \sum_{i=1, i \neq y}^{N} \max(0, \text{margin} - x_y + x_i)^p $$

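where $1 \le y \le N$ is the label. Both p and margin default to 1, but other values can be used. See the derivation of the SVM loss in the cs231n assignments.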
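Written out in plain Python for a single sample, the multi-class hinge loss looks roughly like the sketch below. The function name is made up, and y is a 0-based index here (Python style) rather than the 1-based label used in the formula.

```python
def multi_margin(x, y, p=1, margin=1.0):
    """Multi-class hinge loss for one sample; y is the 0-based index of the true class."""
    total = 0.0
    for i, score in enumerate(x):
        if i == y:
            continue  # the true class is skipped
        total += max(0.0, margin - x[y] + score) ** p
    return total / len(x)


scores = [0.1, 0.2, 0.4, 0.8]
loss_true_3 = multi_margin(scores, 3)  # the highest score belongs to the true class
loss_true_0 = multi_margin(scores, 0)  # the lowest score belongs to the true class
```

The loss is much larger when the true class has the lowest score, as expected.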

nn.MultiLabelMarginLoss

The Hinge loss for multi-class, multi-label classification (multi-class multi-classification), an extension of MultiMarginLoss above to multiple labels. It also fixes p = 1 and margin = 1.

$$ \text{loss}(x, y) = \frac{1}{N} \sum_{i=1, i \neq y_j}^{n} \sum_{j=1}^{y_j \neq 0} \left[ \max(0, 1 - (x_{y_j} - x_i)) \right] $$

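This interface is a bit of a trap: it was copied directly from Torch (see the description of MultiLabelMarginCriterion). Lua indexing differs from Python's: Lua arrays start at 1, so 0 is used as the placeholder. There are a few pitfalls to watch for: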

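Here x and y are vectors of the same size, and the zero acts as a terminator. For instance, with a target such as y = [5, 3, 0, 4], the sample is treated as belonging to classes 5 and 3, while 4 is ignored because it comes after the zero.
The formula and explanation above are written that way only to stay consistent with the documentation. When actually calling the interface, -1 is the placeholder and 0 is the first class.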

For example:

import torch
loss = torch.nn.MultiLabelMarginLoss()
x = torch.autograd.Variable(torch.FloatTensor([[0.1, 0.2, 0.4, 0.8]]))
y = torch.autograd.Variable(torch.LongTensor([[3, 0, -1, 1]]))  # -1 ends the list of labels
print(loss(x, y))  # will give 0.8500

Following the explanation above, classes 3 and 0 are the correct ones and classes 1 and 2 are not, so:

$$
\begin{aligned}
\text{loss} &= \frac{1}{4} \sum_{i=1,2} \sum_{j=3,0} \left[ \max(0, 1 - (x_j - x_i)) \right] \\
&= \frac{1}{4} \left[ (1 - (0.8 - 0.2)) + (1 - (0.1 - 0.2)) + (1 - (0.8 - 0.4)) + (1 - (0.1 - 0.4)) \right] \\
&= \frac{1}{4} \left[ 0.4 + 1.1 + 0.6 + 1.3 \right] \\
&= 0.85
\end{aligned}
$$

*Note: in the second line of this derivation, the max(0, x) notation is omitted for brevity.
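The hand calculation can be reproduced with a short plain-Python sketch. It follows the calling convention described above (-1 as the placeholder, 0 as the first class); the function name is made up, and this is not PyTorch's implementation.

```python
def multi_label_margin(x, y):
    """Multi-label hinge loss for one sample; y lists target classes and ends at the first -1."""
    targets = []
    for label in y:
        if label == -1:
            break  # everything after the placeholder is ignored
        targets.append(label)
    others = [i for i in range(len(x)) if i not in targets]
    total = sum(max(0.0, 1.0 - (x[j] - x[i])) for j in targets for i in others)
    return total / len(x)


x = [0.1, 0.2, 0.4, 0.8]
y = [3, 0, -1, 1]
result = multi_label_margin(x, y)
```

The trailing 1 in y comes after the -1 and is therefore ignored, so class 1 counts as a non-target class, just as in the derivation.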

nn.SoftMarginLoss

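A loss for multi-label binary classification: each of the N terms is a binary classification problem with a label $y_i$ of 1 or -1. It is closely related to the loss in the next section; only the form of y differs.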

$$ \text{loss}(x, y) = \sum_{i=1}^{N} \log(1 + e^{-y_i x_i}) $$
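Evaluating the individual terms shows how the sign of $y_i x_i$ drives the loss. This is a minimal plain-Python sketch with an illustrative function name; it computes the per-element terms and their sum, as in the formula, and leaves any averaging aside.

```python
import math


def soft_margin_terms(x, y):
    """Per-element terms log(1 + exp(-y_i * x_i)), with each y_i equal to 1 or -1."""
    return [math.log1p(math.exp(-yi * xi)) for xi, yi in zip(x, y)]


terms = soft_margin_terms([2.0, -2.0, 0.0], [1, 1, -1])
total = sum(terms)
```

A score that agrees with its label (2.0 with y = 1) contributes little, while a score that disagrees (-2.0 with y = 1) contributes a lot; a score of 0 always contributes log 2.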

nn.MultiLabelSoftMarginLoss

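The multi-class version of the loss above: a multi-label one-versus-all loss based on max-entropy, where y holds the per-class binary labels (0 or 1 in the PyTorch documentation).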

nn.CosineEmbeddingLoss

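A loss based on cosine similarity, whose purpose is to bring two vectors as close together as possible. Note that both vectors carry gradients.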

loss (x, y) = {
