Learning Notes | PyTorch Tutorial 16 (Optimizers, Part 1)
These notes are mainly excerpted from the "深度之眼" course and summarized here for easy reference.
PyTorch version used: 1.2
- What an optimizer is
- Attributes of an optimizer
- Methods of an optimizer
I. What is an optimizer
- A PyTorch optimizer manages and updates the learnable parameters of a model so that the model's output moves closer to the ground-truth labels.
- Derivative: the rate of change of a function along a given coordinate axis.
- Directional derivative: the rate of change along a specified direction.
- Gradient: a vector whose direction is the one in which the directional derivative attains its maximum value; gradient descent steps against it (see the sketch below).
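To make this concrete, here is a minimal hand-written gradient-descent step on y = x^2 (an illustrative sketch, not part of the original notes); an optimizer automates exactly this kind of update:

import torch

x = torch.tensor([3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()                # dy/dx = 2x = 6
with torch.no_grad():
    x -= 0.1 * x.grad       # x_new = x - lr * grad = 3.0 - 0.6 = 2.4
print(x)                    # tensor([2.4000], requires_grad=True)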
II. Attributes of an optimizer
Basic attributes:
- defaults: the optimizer's hyperparameters
- state: per-parameter caches, e.g. momentum buffers
- param_groups: the parameter groups being managed
- _step_count: counts how many updates have been performed; used for learning-rate scheduling
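A minimal sketch of inspecting these attributes on a bare SGD optimizer:

import torch
import torch.optim as optim

w = torch.randn((2, 2), requires_grad=True)
optimizer = optim.SGD([w], lr=0.1, momentum=0.9)

print(optimizer.defaults)      # hyperparameters: {'lr': 0.1, 'momentum': 0.9, ...}
print(optimizer.param_groups)  # list of managed parameter groups
print(optimizer.state)         # per-parameter buffers; empty before the first step()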
III. Methods of an optimizer
Basic methods:
- zero_grad(): clears the gradients of all managed parameters
- step(): performs a single update step
- add_param_group(): adds a parameter group
- state_dict(): returns a dictionary with the optimizer's current state
- load_state_dict(): loads a state dictionary
A PyTorch characteristic: tensor gradients are not cleared automatically; they accumulate across backward() calls, which is why zero_grad() is needed (see the sketch below).
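A minimal sketch of this accumulation behavior (illustrative values):

import torch

w = torch.tensor([1.0], requires_grad=True)

(w * 2).sum().backward()
print(w.grad)        # tensor([2.])

(w * 2).sum().backward()
print(w.grad)        # tensor([4.]): the second backward() added to the old gradient

w.grad.zero_()       # what optimizer.zero_grad() does for every managed parameter
print(w.grad)        # tensor([0.])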
For the complete code, see Learning Notes | PyTorch Tutorial 05 (Dataloader and Dataset).
Set a breakpoint at optimizer = optim.SGD(net.parameters(), lr=LR, momentum=0.9) and step into it.
Then step into super(SGD, self).__init__(params, defaults).
Step out and inspect the optimizer:
SGD (
Parameter Group 0
dampening: 0
lr: 0.01
momentum: 0.9
nesterov: False
weight_decay: 0
)
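The same summary is available without a debugger, since torch.optim.Optimizer defines __repr__:

print(optimizer)   # prints the Parameter Group summary shown above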
1. Testing the step() function
import os
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
import torch
import torch.optim as optim
from tools.common_tools import set_seed

set_seed(1)  # set the random seed for reproducibility

weight = torch.randn((2, 2), requires_grad=True)
weight.grad = torch.ones((2, 2))   # pretend backward() produced an all-ones gradient

optimizer = optim.SGD([weight], lr=0.1)

# ----------------------------------- step -----------------------------------
# flag = 0
flag = 1
if flag:
    print("weight before step:{}".format(weight.data))
    optimizer.step()        # try lr=1 vs. lr=0.1 and compare the results
    print("weight after step:{}".format(weight.data))
Output:
weight before step:tensor([[0.6614, 0.2669],
[0.0617, 0.6213]])
weight after step:tensor([[ 0.5614, 0.1669],
[-0.0383, 0.5213]])
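The numbers check out by hand: plain SGD (no momentum, no weight decay) updates each element as w_new = w - lr * grad; the gradient here is all ones and lr = 0.1, so every element drops by exactly 0.1. A quick verification sketch:

import torch

lr = 0.1
before = torch.tensor([[0.6614, 0.2669],
                       [0.0617, 0.6213]])
grad = torch.ones((2, 2))
print(before - lr * grad)   # matches the "weight after step" values above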
2. Testing zero_grad()
# ----------------------------------- zero_grad -----------------------------------
# flag = 0
flag = 1
if flag:
    print("weight before step:{}".format(weight.data))
    optimizer.step()        # try lr=1 vs. lr=0.1 and compare the results
    print("weight after step:{}".format(weight.data))

    # the optimizer holds a reference to the same tensor object as `weight`
    print("weight in optimizer:{}\nweight in weight:{}\n".format(id(optimizer.param_groups[0]['params'][0]), id(weight)))

    print("weight.grad is {}\n".format(weight.grad))
    optimizer.zero_grad()
    print("after optimizer.zero_grad(), weight.grad is\n{}".format(weight.grad))
Output:
weight before step:tensor([[0.6614, 0.2669],
[0.0617, 0.6213]])
weight after step:tensor([[ 0.5614, 0.1669],
[-0.0383, 0.5213]])
weight in optimizer:1879538736088
weight in weight:1879538736088
weight.grad is tensor([[1., 1.],
[1., 1.]])
after optimizer.zero_grad(), weight.grad is
tensor([[0., 0.],
[0., 0.]])
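The identical id values confirm that the optimizer stores references to the very same tensor objects, not copies, so optimizer.zero_grad() clears weight.grad directly. This is why a typical training iteration interleaves the three calls as below (a generic sketch; the Linear model and random data are stand-ins):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)                      # stand-in model
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for _ in range(3):                           # stand-in for iterating over a DataLoader
    inputs, labels = torch.randn(8, 4), torch.randn(8, 2)
    optimizer.zero_grad()                    # clear gradients accumulated so far
    loss = criterion(model(inputs), labels)  # forward pass
    loss.backward()                          # populate .grad on every parameter
    optimizer.step()                         # apply the update rule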
3. Testing add_param_group()
# ----------------------------------- add_param_group -----------------------------------
# flag = 0
flag = 1
if flag:
    print("optimizer.param_groups is\n{}".format(optimizer.param_groups))

    w2 = torch.randn((3, 3), requires_grad=True)
    optimizer.add_param_group({"params": w2, 'lr': 0.0001})

    print("optimizer.param_groups is\n{}".format(optimizer.param_groups))
Output:
optimizer.param_groups is
[{'params': [tensor([[0.6614, 0.2669],
[0.0617, 0.6213]], requires_grad=True)], 'lr': 0.1, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False}]
optimizer.param_groups is
[{'params': [tensor([[0.6614, 0.2669],
[0.0617, 0.6213]], requires_grad=True)], 'lr': 0.1, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False}, {'params': [tensor([[-0.4519, -0.1661, -1.5228],
[ 0.3817, -1.0276, -0.5631],
[-0.8923, -0.0583, -0.1955]], requires_grad=True)], 'lr': 0.0001, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False}]
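Per-group settings are the main use case for add_param_group: for example, finetuning a pretrained backbone with a small learning rate while training a new head with a larger one. A minimal sketch (the two Linear layers are stand-ins for real modules):

import torch.nn as nn
import torch.optim as optim

backbone = nn.Linear(10, 10)   # stand-in for a pretrained feature extractor
head = nn.Linear(10, 2)        # stand-in for a freshly initialized classifier

optimizer = optim.SGD(backbone.parameters(), lr=1e-4, momentum=0.9)
optimizer.add_param_group({"params": head.parameters(), "lr": 1e-2})

print([g["lr"] for g in optimizer.param_groups])   # [0.0001, 0.01]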
4. Testing state_dict()
# ----------------------------------- state_dict -----------------------------------
# flag = 0
flag = 1
if flag:
    optimizer = optim.SGD([weight], lr=0.1, momentum=0.9)
    opt_state_dict = optimizer.state_dict()

    print("state_dict before step:\n", opt_state_dict)

    for i in range(10):
        optimizer.step()

    print("state_dict after step:\n", optimizer.state_dict())

    torch.save(optimizer.state_dict(), os.path.join(BASE_DIR, "optimizer_state_dict.pkl"))
Output:
state_dict before step:
{'state': {}, 'param_groups': [{'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [1879518556552]}]}
state_dict after step:
{'state': {1879518556552: {'momentum_buffer': tensor([[6.5132, 6.5132],
[6.5132, 6.5132]])}}, 'param_groups': [{'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [1879518556552]}]}
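The 6.5132 in momentum_buffer can be verified by hand: with momentum = 0.9, dampening = 0, and a constant all-ones gradient, SGD's buffer follows v = 0.9 * v + grad, so after 10 steps it equals the geometric sum 1 + 0.9 + ... + 0.9^9 ≈ 6.5132:

v = 0.0
for _ in range(10):
    v = 0.9 * v + 1.0    # momentum buffer update with constant grad = 1
print(v)                 # 6.513215599..., matching the 6.5132 above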
5. Testing load_state_dict()
# ----------------------------------- load state_dict -----------------------------------
# flag = 0
flag = 1
if flag:
    optimizer = optim.SGD([weight], lr=0.1, momentum=0.9)
    state_dict = torch.load(os.path.join(BASE_DIR, "optimizer_state_dict.pkl"))

    print("state_dict before load state:\n", optimizer.state_dict())
    optimizer.load_state_dict(state_dict)
    print("state_dict after load state:\n", optimizer.state_dict())
Output:
state_dict before load state:
{'state': {}, 'param_groups': [{'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [1879538735128]}]}
state_dict after load state:
{'state': {1879538735128: {'momentum_buffer': tensor([[6.5132, 6.5132],
[6.5132, 6.5132]])}}, 'param_groups': [{'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [1879538735128]}]}
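In practice, state_dict()/load_state_dict() are combined with the model's own state dict to checkpoint and resume training. A generic sketch (the Linear model and file name are illustrative):

import torch
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(4, 2)              # stand-in model
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9)

# save model and optimizer state together
torch.save({"model": net.state_dict(),
            "optimizer": optimizer.state_dict()}, "checkpoint.pkl")

# ... later: restore both before resuming training
ckpt = torch.load("checkpoint.pkl")
net.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])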