This series works through the official PyTorch code to become familiar with how CNNs are implemented in the framework and with the overall training workflow.

【PyTorch official tutorial notes, part 3】autograd

  • This post is a detailed annotation of, and my personal notes on, the official tutorial PyTorch: Tensors and autograd; feedback and discussion are welcome.
  • Overview
    The first two posts in this series implemented a CNN by hand on top of numpy and raw tensors. For deep networks that approach is tedious and error-prone; the autograd package that ships with PyTorch handles automatic differentiation for us. When autograd is used, the forward pass of the network defines a computational graph: the nodes are tensors and the edges are functions that produce output tensors from input tensors. Back-propagating through this graph then makes gradient computation straightforward. If x is a tensor created with x.requires_grad = True, then x.grad is another tensor that holds the gradient of some scalar value with respect to x (see the minimal sketch right after this list).
  • Example
    Define a two-layer neural network that relies on autograd instead of hand-coded backward-pass updates.
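Before the full two-layer network, here is a minimal sketch (the values are made up purely for illustration) of how requires_grad, backward() and .grad interact:

import torch

# A leaf tensor that autograd should track (illustrative values).
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Forward pass: building y records a computational graph rooted at x.
y = (x ** 2).sum()          # scalar output

# Backward pass: populates x.grad with dy/dx = 2 * x.
y.backward()
print(x.grad)               # tensor([2., 4., 6.])

With that in mind, the full example from the tutorial follows.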
# -*- coding: utf-8 -*-
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10   # batch size, input dimension, hidden-layer dimension and output dimension

# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)  # sample from a standard normal distribution to stand in for the training inputs
y = torch.randn(N, D_out, device=device, dtype=dtype) # sample from a standard normal distribution to stand in for the ground-truth targets

# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)  # initialise the weights and have autograd compute their gradients in the backward pass
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6  # set the learning rate
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors. These are
    # exactly the same operations we used to compute the forward pass with raw
    # Tensors, but we no longer need to keep references to intermediate values,
    # since we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)    # linear layer -> ReLU (clamp at 0) -> linear layer
 
    # Compute and print loss using operations on Tensors. Now loss is a Tensor
    # of shape (1,) and loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())  # loss.item() returns the Python number held in the one-element loss tensor

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call, w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()  # autograd computes d(loss)/d(w1) and d(loss)/d(w2)

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd. An alternative way is to operate on weight.data and
    # weight.grad.data. Recall that tensor.data gives a tensor that shares the
    # storage with tensor, but doesn't track history. You can also use
    # torch.optim.SGD to achieve this (see the sketch after this example).
    with torch.no_grad():   # update the weights by gradient descent without recording these ops in autograd
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating the weights
        w1.grad.zero_()
        w2.grad.zero_()
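
The comment above mentions torch.optim.SGD as an alternative to the hand-written update. A minimal sketch of that variant, assuming x, y, w1, w2 and learning_rate are defined as in the example above, could look like this:

# Reuse x, y, w1, w2 and learning_rate from the example above.
optimizer = torch.optim.SGD([w1, w2], lr=learning_rate)

for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    optimizer.zero_grad()   # replaces the manual w.grad.zero_() calls
    loss.backward()         # autograd fills in w1.grad and w2.grad
    optimizer.step()        # replaces the manual update inside torch.no_grad()

The optimizer keeps references to w1 and w2, so step() applies the same w -= lr * w.grad update as the manual loop, just without the explicit no_grad() block.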