
PyTorch Official Tutorial Study Notes (2)


Autograd: Automatic Differentiation

Central to all neural networks in PyTorch is the autograd package.
Let’s first briefly visit this, and we will then go to training our
first neural network.

The autograd package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework, which means that your backprop is
defined by how your code is run, and that every single iteration can be
different.
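To make "define-by-run" concrete, here is a minimal sketch of my own (not from the tutorial): the graph is rebuilt on every forward pass, so ordinary Python control flow can change it from one iteration to the next.

import torch

x = torch.randn(4, requires_grad=True)
for step in range(3):
    y = x
    for _ in range(step + 1):   # the number of ops, and hence the graph, depends on `step`
        y = y * 2
    y.sum().backward()          # backprop through whatever graph this iteration built
    print(step, x.grad)
    x.grad.zero_()              # clear the accumulated gradient before the next pass

Each iteration prints a different gradient (2, 4, then 8 per element) because the graph gets one multiplication deeper every time.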

Let us see this in simpler terms with some examples.

Tensor

If you set a torch.Tensor's attribute .requires_grad to True, all operations on
that tensor are tracked. Once the computation is finished, call .backward() and
the gradients are computed automatically; the gradient for the tensor is
accumulated into its .grad attribute.

Calling .detach() detaches a tensor from its computation history, so subsequent
computations on it are no longer tracked.
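The tutorial does not show .detach() in code, so here is a minimal sketch (variable names are my own):

import torch

x = torch.ones(3, requires_grad=True)
y = x * 2                # y is part of the graph and has a grad_fn
z = y.detach()           # z shares the same values but is cut off from the graph
print(y.requires_grad)   # True
print(z.requires_grad)   # False
print(z.grad_fn)         # None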

You can also prevent tracking (and the associated memory use) by wrapping a code
block in the context manager with torch.no_grad():. This is particularly useful
when evaluating a model, where the trainable parameters have requires_grad=True
but gradients are not needed during evaluation; a rough sketch follows below.
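As an illustration of that evaluation use case (the tiny model and data below are made up for the sketch):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)          # its weight and bias have requires_grad=True
inputs = torch.randn(8, 4)

with torch.no_grad():            # no computation graph is built inside this block
    outputs = model(inputs)
print(outputs.requires_grad)     # False: there is nothing to backprop through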

There’s one more class which is very important for autograd
implementation - a Function.

Every tensor has a .grad_fn attribute that references the Function that created
it (for tensors created directly by the user, this attribute is None).

Call .backward() on a Tensor to compute its gradients. If the tensor is a
scalar, .backward() needs no arguments; otherwise you must pass a gradient
argument, a tensor of matching shape.

import torch

Create a tensor and set requires_grad=True to track computation with it

x = torch.ones(2, 2, requires_grad=True)
print(x)
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Do an operation on the tensor:

y = x + 2
print(y)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward>)

y was created as a result of an operation, so it has a grad_fn.

print(y.grad_fn)
<AddBackward object at 0x000001C98CA1A748>

Do more operations on y

z = y * y * 3
out = z.mean()

print(z, out)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward>) tensor(27., grad_fn=<MeanBackward1>)

.requires_grad_( ... ) changes an existing Tensor’s requires_grad
flag in-place. The input flag defaults to False if not given.

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)
print(b)
False
True
<SumBackward0 object at 0x000001C98CA1AE10>
tensor(4.0653, grad_fn=<SumBackward0>)

Gradients

Let’s backprop now.
Because out contains a single scalar, out.backward() is
equivalent to out.backward(torch.tensor(1.)).

out.backward()  # out is a scalar, so .backward() needs no arguments

print gradients d(out)/dx

print("out关于x的梯度为:\n", x.grad)
out关于x的梯度为:
 tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

You should have got a matrix of 4.5. Let’s call the out
Tensor "$o$".
We have that $o = \frac{1}{4}\sum_i z_i$,
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.
Therefore,
$\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$, hence
$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
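As a quick sanity check (my own addition, not part of the tutorial), recomputing out from the formula and comparing it with the analytic gradient:

import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()   # the same o as above
out.backward()
print(x.grad)                     # tensor filled with 4.5000
print(1.5 * (x.detach() + 2))     # analytic gradient 3/2 * (x + 2), also 4.5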

You can do many crazy things with autograd!

x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)  # y is not a scalar, so a gradient argument must be passed to .backward()
y.backward(gradients)

print(x.grad)
# An example where the gradient argument of .backward() must be specified

x1 = torch.ones([3, 3], requires_grad=True)  # requires_grad must be True here; otherwise the tensor is not added to the computation graph and no gradient can be computed for it
print(x1)
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], requires_grad=True)
y1 = x1 ** 2
print(y1)
print(y1.grad_fn)
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], grad_fn=<PowBackward0>)
<PowBackward0 object at 0x000001C98CA27978>
gradients1 = torch.ones([3, 3]) * 3
print(gradients1)

y1.backward(gradients1)
print(x1.grad)
tensor([[3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.]])
tensor([[6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.]])

The gradient of y1 = x1 ** 2 with respect to x1 is 2 * x1, i.e. 2 for inputs of 1; multiplied by the upstream gradient of 3 passed to .backward(), this gives 6.

You can also stop autograd from tracking history on Tensors
with .requires_grad=True by wrapping the code block in
with torch.no_grad():

print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)
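The first two prints show True (operations on x are tracked), while the print inside the torch.no_grad() block shows False, because no graph is built there.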