

程序员文章站 2022-06-12 17:08:42



1. torch.nn.Linear(in_features, out_features, bias=True)

2. torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')

3. torch.nn.Sequential(*args)

4. torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

5. torch.nn.ReLU(inplace=False)

6. torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

7. torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

8. torch.nn.Tanh

9. torch.nn.Sigmoid



1. torch.nn.Linear(in_features, out_features, bias=True)

Applies a linear transformation to the incoming data: y = xA^T + b.


  • in_features – size of each input sample
  • out_features – size of each output sample
  • bias – If set to False, the layer will not learn an additive bias. Default: True
  • Input: (N, ∗, in_features), where ∗ means any number of additional dimensions.
  • Output: (N, ∗, out_features), where all but the last dimension have the same shape as the input.
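    import torch
    import torch.nn as nn

    m = nn.Linear(20, 30)          # 20 input features -> 30 output features
    input = torch.randn(128, 20)   # a batch of 128 samples
    output = m(input)
    print(output.size())           # torch.Size([128, 30])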
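In `nn.Linear`, the weight A has shape (out_features, in_features), which is why the formula uses its transpose. The sketch below spells out that arithmetic in plain Python so it runs without PyTorch. The helper name `linear_forward` and the nested-list layout are illustrative assumptions, not part of torch.nn.

```python
def linear_forward(x, weight, bias=None):
    """Hypothetical helper: y = x @ weight^T + bias, using plain Python lists.

    x:      batch of samples, shape (N, in_features)
    weight: shape (out_features, in_features), same layout as nn.Linear.weight
    bias:   shape (out_features,), or None (analogous to bias=False)
    """
    out = []
    for sample in x:
        row = []
        for j, w_row in enumerate(weight):
            # dot product with row j of weight == column j of weight^T
            value = sum(xi * wi for xi, wi in zip(sample, w_row))
            if bias is not None:
                value += bias[j]
            row.append(value)
        out.append(row)
    return out


# 2 samples, in_features=3, out_features=2
x = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 1.0]]
weight = [[1.0, 0.0, 0.0],   # output 0 picks feature 0
          [1.0, 1.0, 1.0]]   # output 1 sums all features
bias = [0.5, -1.0]
y = linear_forward(x, weight, bias)
print(y)  # [[1.5, 5.0], [0.5, -1.0]]
```

Each output row has out_features entries, matching the (N, ∗, out_features) output shape described above.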


2. torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')

Creates a criterion that measures the mean squared error (squared L2 norm) between each element of the input and the target.


  • size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True. Avoid using it.
  • reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True. Avoid using it.
  • reduction (string, optional) – Specifies the reduction to apply to the output: ‘none’ | ‘mean’ | ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the sum of the output will be divided by the number of elements in the output, ‘sum’: the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: ‘mean’
import torch
import torch.nn as nn

loss = nn.MSELoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
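With the default reduction='mean', the squared errors are averaged over every element. For the 3 × 5 tensors above, the sum is therefore divided by 15, not by the batch size 3. The sketch below shows the three reduction modes on flat Python lists. `mse_loss` here is a hypothetical helper for illustration, not PyTorch's implementation.

```python
def mse_loss(pred, target, reduction="mean"):
    """Hypothetical pure-Python version of MSELoss's three reduction modes."""
    # element-wise squared errors (x_i - y_i)^2
    errors = [(x - y) ** 2 for x, y in zip(pred, target)]
    if reduction == "none":
        return errors                      # one loss per element
    if reduction == "sum":
        return sum(errors)                 # total squared error
    if reduction == "mean":
        return sum(errors) / len(errors)   # divided by the number of elements
    raise ValueError(f"unknown reduction: {reduction!r}")


pred = [1.0, 2.0, 3.0, 4.0]
target = [1.0, 0.0, 3.0, 6.0]
print(mse_loss(pred, target, "none"))  # [0.0, 4.0, 0.0, 4.0]
print(mse_loss(pred, target, "sum"))   # 8.0
print(mse_loss(pred, target))          # 2.0
```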

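3. torch.nn.Sequential(*args)

A sequential container. Modules are added to it in the order they are passed to the constructor; alternatively, an OrderedDict of modules can be passed in.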


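from collections import OrderedDict
import torch.nn as nn

# Example of using Sequential: layers run in the order they are listed
model = nn.Sequential(
          nn.Conv2d(1,20,5),
          nn.ReLU(),
          nn.Conv2d(20,64,5),
          nn.ReLU()
        )

# Example of using Sequential with OrderedDict: same layers, each with a name
model = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(1,20,5)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(20,64,5)),
          ('relu2', nn.ReLU())
        ]))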
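Conceptually, a Sequential container calls each module in order and feeds each output into the next. The minimal pure-Python sketch below shows only that chaining idea. `SequentialSketch` is a hypothetical class, and the plain functions stand in for layers; nothing here is torch.nn's actual implementation.

```python
from collections import OrderedDict


class SequentialSketch:
    """Hypothetical stand-in for nn.Sequential: applies callables in order."""

    def __init__(self, *args):
        # Accept either positional callables or one OrderedDict of named callables,
        # mirroring the two construction styles shown above.
        if len(args) == 1 and isinstance(args[0], OrderedDict):
            self.steps = OrderedDict(args[0])
        else:
            self.steps = OrderedDict((str(i), f) for i, f in enumerate(args))

    def __call__(self, x):
        for step in self.steps.values():
            x = step(x)  # the output of one step is the input of the next
        return x


def double(x):
    return 2 * x


def relu(x):
    return max(0, x)


model = SequentialSketch(double, relu)
named = SequentialSketch(OrderedDict([("double", double), ("relu", relu)]))
print(model(3), model(-3))  # 6 0
print(list(named.steps))    # ['double', 'relu']
```

Positional steps get the names "0", "1", … here, which matches how nn.Sequential names unnamed submodules.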

4. torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

Applies a 2D convolution over an input signal composed of several input planes.


  • stride controls the stride for the cross-correlation; it is a single number or a tuple.

  • padding controls the amount of implicit zero-padding added to both sides of each spatial dimension: padding[0] rows at the top and bottom, and padding[1] columns at the left and right (a single int applies to both).

  • dilation controls the spacing between the kernel points; this is also known as the à trous algorithm. With dilation d, neighboring kernel taps are d input positions apart, so a kernel of size k spans d * (k - 1) + 1 input positions.

  • groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example:

  • At groups=1, all inputs are convolved to all outputs.
  • At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, with both outputs subsequently concatenated.
  • At groups=in_channels, each input channel is convolved with its own set of filters, of size ⌊out_channels / in_channels⌋.
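One visible effect of groups is the size of the weight tensor. Each filter only sees in_channels / groups input channels, so `Conv2d.weight` has shape (out_channels, in_channels / groups, kH, kW). The small pure-Python helper below computes that shape and the resulting weight count. `conv2d_weight_shape` and `num_weights` are hypothetical names, not PyTorch functions.

```python
import math


def conv2d_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """Hypothetical helper: shape of nn.Conv2d's weight for the given settings."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    kh, kw = (kernel_size, kernel_size) if isinstance(kernel_size, int) else kernel_size
    # each output channel's filter spans only in_channels // groups input channels
    return (out_channels, in_channels // groups, kh, kw)


def num_weights(shape):
    return math.prod(shape)


for g in (1, 2, 4):
    shape = conv2d_weight_shape(4, 8, 3, groups=g)
    print(g, shape, num_weights(shape))
# 1 (8, 4, 3, 3) 288
# 2 (8, 2, 3, 3) 144
# 4 (8, 1, 3, 3) 72
```

With groups=4 (equal to in_channels), each of the 4 input channels gets its own 2 filters, and the weight count drops by a factor of 4 relative to groups=1.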



  • in_channels (int) – Number of channels in the input image
  • out_channels (int) – Number of channels produced by the convolution
  • kernel_size (int or tuple) – Size of the convolving kernel
  • stride (int or tuple, optional) – Stride of the convolution. Default: 1
  • padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
  • dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
  • bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
>>> # With square kernels and equal stride
>>> m = nn.Conv2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> # non-square kernels and unequal stride and with padding and dilation
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = m(input)
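The spatial size of the output follows the shape formula in the PyTorch documentation: H_out = ⌊(H_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride⌋ + 1, and likewise for W. The pure-Python sketch below applies that formula to the three layers above with a 50 × 100 input. The helpers `_pair` and `conv2d_output_hw` are hypothetical names.

```python
def _pair(v):
    # int -> (v, v); tuples pass through, like Conv2d's int-or-tuple arguments
    return (v, v) if isinstance(v, int) else tuple(v)


def conv2d_output_hw(h_in, w_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output (H, W) of a 2D convolution, following the documented shape formula."""
    out = []
    for size, k, s, p, d in zip((h_in, w_in), _pair(kernel_size), _pair(stride),
                                _pair(padding), _pair(dilation)):
        # a dilated kernel covers d * (k - 1) + 1 input positions
        out.append((size + 2 * p - d * (k - 1) - 1) // s + 1)
    return tuple(out)


print(conv2d_output_hw(50, 100, 3, stride=2))                            # (24, 49)
print(conv2d_output_hw(50, 100, (3, 5), stride=(2, 1), padding=(4, 2)))  # (28, 100)
print(conv2d_output_hw(50, 100, (3, 5), stride=(2, 1), padding=(4, 2),
                       dilation=(3, 1)))                                 # (26, 100)
```

So for the last layer in the example, `output` should have shape (20, 33, 26, 100): batch size 20, 33 output channels, and the spatial size computed above.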

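5. torch.nn.ReLU(inplace=False)

Applies the rectified linear unit function element-wise: ReLU(x) = max(0, x). If inplace=True, the input tensor is overwritten instead of a new output being allocated.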


>>> m = nn.ReLU()
>>> input = torch.randn(2)
>>> output = m(input)
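The difference between inplace=False and inplace=True can be mimicked on a Python list. The sketch below is illustrative only: `relu` returns a new list, while `relu_` overwrites its argument. Both names are hypothetical, and the trailing underscore mirrors PyTorch's naming convention for in-place methods.

```python
def relu(values):
    """Return a new list with max(0, x) applied element-wise (like inplace=False)."""
    return [max(0.0, x) for x in values]


def relu_(values):
    """Overwrite the list in place (analogous to inplace=True) and return it."""
    for i, x in enumerate(values):
        values[i] = max(0.0, x)
    return values


data = [-1.5, 0.0, 2.0]
out = relu(data)
print(out, data)  # [0.0, 0.0, 2.0] [-1.5, 0.0, 2.0]
relu_(data)
print(data)       # [0.0, 0.0, 2.0]
```

In-place saves memory, but the original values are lost, which matters if something else still needs them.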

6. torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

Applies a 2D max pooling over an input signal composed of several input planes.


If padding is non-zero, the input is implicitly padded on both sides with padding points; for max pooling the padded values act as negative infinity, so they never become the maximum. dilation controls the spacing between the kernel points: with dilation d, a window of size k spans d * (k - 1) + 1 input positions.

The parameters kernel_size, stride, padding, and dilation can either be:

  • a single int – in which case the same value is used for the height and width dimension
  • a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension

  • kernel_size – the size of the window to take a max over
  • stride – the stride of the window. Default value is kernel_size
  • padding – implicit padding (treated as negative infinity) added on both sides
  • dilation – a parameter that controls the stride of elements in the window
  • return_indices – if True, will return the max indices along with the outputs. Useful for torch.nn.MaxUnpool2d later.
  • ceil_mode – when True, will use ceil instead of floor to compute the output shape
>>> # pool of square window of size=3, stride=2
>>> m = nn.MaxPool2d(3, stride=2)
>>> # pool of non-square window
>>> m = nn.MaxPool2d((3, 2), stride=(2, 1))
>>> input = torch.randn(20, 16, 50, 32)
>>> output = m(input)
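MaxPool2d uses the same size formula as Conv2d, except that ceil_mode switches floor to ceil. Newer PyTorch documentation also notes that, with ceil_mode, a window that would start in the right padding is ignored. The pure-Python sketch below implements both the size rule and a naive max pool (square window, no padding or dilation). `pool_output_size` and `max_pool2d` are hypothetical helpers, not torch functions.

```python
def pool_output_size(size, kernel, stride=None, padding=0, dilation=1, ceil_mode=False):
    """Output length along one dimension for MaxPool2d-style pooling (sketch)."""
    stride = kernel if stride is None else stride  # default stride is kernel_size
    span = size + 2 * padding - dilation * (kernel - 1) - 1
    if ceil_mode:
        out = -(-span // stride) + 1               # integer ceil division
        # drop a last window that would start inside the right padding
        if (out - 1) * stride >= size + padding:
            out -= 1
    else:
        out = span // stride + 1                   # integer floor division
    return out


def max_pool2d(grid, kernel, stride=None):
    """Naive max pooling over a 2-D list: square window, no padding or dilation."""
    stride = kernel if stride is None else stride
    rows = pool_output_size(len(grid), kernel, stride)
    cols = pool_output_size(len(grid[0]), kernel, stride)
    return [[max(grid[r * stride + i][c * stride + j]
                 for i in range(kernel) for j in range(kernel))
             for c in range(cols)]
            for r in range(rows)]


grid = [[1, 2, 5, 0],
        [3, 4, 1, 1],
        [0, 0, 7, 8],
        [9, 1, 6, 2]]
print(max_pool2d(grid, 2))                                       # [[4, 5], [9, 8]]
print(pool_output_size(50, 3, 2), pool_output_size(32, 2, 1))    # 24 31
print(pool_output_size(5, 2, 2), pool_output_size(5, 2, 2, ceil_mode=True))  # 2 3
```

For the example above, a window of (3, 2) with stride (2, 1) on a 50 × 32 input gives 24 × 31. The `output` there should therefore have shape (20, 16, 24, 31).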

7. torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')


This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.

It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.

The losses are averaged across observations for each minibatch.

  • weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C, where C is the number of classes.
  • size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
  • ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets. Default: -100
  • reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
  • reduction (string, optional) – Specifies the reduction to apply to the output: ‘none’ | ‘mean’ | ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the sum of the output will be divided by the number of elements in the output, ‘sum’: the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: ‘mean’
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
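The "LogSoftmax followed by NLLLoss" description can be checked by hand. With reduction='mean' and a weight tensor, PyTorch divides by the sum of the weights of the non-ignored targets rather than by the batch size. The pure-Python sketch below reproduces that computation for a batch of logits. The function names are hypothetical, and this is not PyTorch's implementation.

```python
import math


def log_softmax(logits):
    # subtract the max before exponentiating, for numerical stability
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy(batch_logits, targets, weight=None, ignore_index=-100):
    """Pure-Python sketch of CrossEntropyLoss with reduction='mean'.

    loss_i = -weight[y_i] * log_softmax(x_i)[y_i]; the mean divides by the sum
    of the weights of the non-ignored targets.
    """
    total, norm = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        if y == ignore_index:
            continue  # ignored targets add neither loss nor weight
        w = 1.0 if weight is None else weight[y]
        total += -w * log_softmax(logits)[y]  # NLLLoss on the LogSoftmax output
        norm += w
    return total / norm


# A uniform prediction over C classes costs log(C); the ignored row changes nothing.
print(math.isclose(cross_entropy([[0.0, 0.0, 0.0]], [1]), math.log(3)))  # True
print(math.isclose(cross_entropy([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]], [1, -100]),
                   math.log(3)))                                          # True
```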

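8. torch.nn.Tanh

Applies the hyperbolic tangent element-wise: Tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)), so outputs lie in (-1, 1).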




>>> m = nn.Tanh()
>>> input = torch.randn(2)
>>> output = m(input)
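The formula can be checked numerically against Python's `math.tanh`. The sketch below also verifies the identity tanh(x) = 2 * sigmoid(2x) - 1, which links Tanh to Sigmoid. The helper functions are hypothetical and written for moderate |x| only; for large |x| the direct exp formula can overflow, and `math.tanh` is the safer choice.

```python
import math


def tanh(x):
    # (e^x - e^-x) / (e^x + e^-x); fine for moderate |x|
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    # both the definition and the sigmoid identity agree with math.tanh
    assert math.isclose(tanh(x), math.tanh(x), abs_tol=1e-12)
    assert math.isclose(2 * sigmoid(2 * x) - 1, math.tanh(x), abs_tol=1e-12)

print(tanh(0.0))  # 0.0
```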

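9. torch.nn.Sigmoid

Applies the logistic sigmoid element-wise: Sigmoid(x) = 1 / (1 + exp(-x)), so outputs lie in (0, 1).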



>>> m = nn.Sigmoid()
>>> input = torch.randn(2)
>>> output = m(input)
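Written naively, 1 / (1 + exp(-x)) raises an OverflowError in plain Python for large negative x, because exp(-x) becomes too large. The sketch below uses the common two-branch formulation to avoid that. The function is a hypothetical illustration, not how torch.nn.Sigmoid is implemented internally.

```python
import math


def sigmoid(x):
    """Element-wise 1 / (1 + exp(-x)), arranged so exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # for negative x, exp(-x) could overflow; use the equivalent exp(x) / (1 + exp(x))
    e = math.exp(x)
    return e / (1.0 + e)


values = [-1000.0, -1.0, 0.0, 1.0, 1000.0]
out = [sigmoid(v) for v in values]
print(out[0], out[2], out[4])  # 0.0 0.5 1.0
```

The two branches are algebraically identical; the split only chooses the form whose exponent is non-positive.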