Reinforcement Learning Practice in Python: Landing the Lunar Lander with PPO in PyTorch

程序员文章站 2022-06-24 19:16:04
Overview

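Starting today we open a new series in which we learn (and get sucked into) reinforcement learning together. In reinforcement learning, an agent interacts with an environment: it observes the environment's state, takes actions, and learns to choose actions that maximize its future cumulative reward.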

Types of reinforcement learning algorithms

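On-policy vs. off-policy:

  • on-policy: the training data are generated by the current agent itself as it keeps interacting with the environment
  • off-policy: the agent being trained is not the agent that interacts with the environment; in other words, someone else interacts with the environment and supplies the training data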

PPO algorithm

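PPO (Proximal Policy Optimization) is an on-policy algorithm. Each batch of freshly collected experience is reused for several small optimization steps, and every update is constrained (in this article, by clipping the probability ratio between the new and the old policy) so that the new policy cannot drift too far from the old one. This addresses a common training problem: when the policy changes too much in a single update, learning becomes unstable and hard to recover.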
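To see what the clipping does, the per-sample PPO objective min(r * A, clip(r, 1 - eps, 1 + eps) * A) can be evaluated on plain numbers, where r is the probability ratio pi_new(a|s) / pi_old(a|s) and A is the advantage. The pure-Python sketch below is illustrative only: `clip` and `clipped_surrogate` are hypothetical helper names, and eps = 0.2 matches the `eps_clip` hyperparameter used later in the article. It mirrors the `surr1`/`surr2` lines of the `update` method.

```python
def clip(x, low, high):
    """Clamp x into the interval [low, high]."""
    return max(low, min(x, high))


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped objective (to be maximized).

    ratio: pi_new(a|s) / pi_old(a|s)
    advantage: estimated advantage of action a in state s
    eps: clipping range (0.2, like eps_clip in the article)
    """
    surr1 = ratio * advantage                          # unclipped term
    surr2 = clip(ratio, 1 - eps, 1 + eps) * advantage  # clipped term
    return min(surr1, surr2)                           # pessimistic bound


# Good action (A > 0): pushing the ratio past 1.2 earns nothing extra.
print(clipped_surrogate(1.5, 2.0))   # 1.2 * 2.0
# Bad action (A < 0): pushing the ratio below 0.8 earns nothing extra.
print(clipped_surrogate(0.5, -1.0))  # 0.8 * -1.0
```

In both cases the clipped term caps what the objective can gain from moving the ratio outside [0.8, 1.2], which is what keeps each update close to the old policy.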

Actor-critic algorithm

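An actor-critic algorithm has two parts. The first is the policy function, the actor, which produces actions and interacts with the environment; the second is the value function, the critic, which evaluates how well the actor is doing.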
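How the two parts cooperate can be shown with a toy calculation. In the sketch below (plain Python, invented numbers, hypothetical helper name `advantages`), the critic's value estimate V(s) is subtracted from the observed return G to give the advantage A = G - V(s): a positive A tells the actor to make that action more likely, a negative A less likely. The article's `update` method computes the same quantity as `rewards - state_values.detach()`.

```python
def advantages(returns, values):
    """Advantage A = G - V(s) for each visited state.

    returns: observed (discounted) returns G from the actor's rollouts
    values: the critic's estimates V(s) for the same states
    """
    return [g - v for g, v in zip(returns, values)]


# Invented numbers: the critic expected 5.0 in both states.
adv = advantages(returns=[8.0, 2.0], values=[5.0, 5.0])
for a in adv:
    verdict = "reinforce" if a > 0 else "discourage"
    print(a, verdict)
```

Measuring the actor against the critic's expectation, rather than against raw returns, is what lets the same return count as good news in one state and bad news in another.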

gym

gym is a package widely used in reinforcement learning; it bundles many game environments. Below we use LunarLander-v2 to build an automatic "Apollo moon landing".

Installation:

pip install gym

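If you see this error:

AttributeError: module 'gym.envs.box2d' has no attribute 'LunarLander'

install the Box2D extra (the quotes stop shells such as zsh from treating the brackets as a glob pattern):

pip install "gym[box2d]"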

LunarLander-v2

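LunarLander-v2 simulates a lunar lander. The landing pad is always at coordinates (0, 0), and the coordinates are the first two numbers of the state vector. Moving from the top of the screen to the landing pad and coming to rest earns roughly 100 to 140 points. If the lander crashes or comes to rest, the episode ends with an extra -100 or +100 points respectively. Each leg in contact with the ground earns +10, and firing the main engine costs 0.3 points per frame. An episode scoring 200 points counts as solved. The full 8-number state is: x and y position, x and y velocity, angle, angular velocity, and two flags for left and right leg contact. There are 4 discrete actions: do nothing, fire the left orientation engine, fire the main engine, fire the right orientation engine.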
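To make the scoring concrete, here is a back-of-the-envelope tally that uses only the figures quoted above. It is a simplified sketch, not the environment's actual reward function (which also has continuous shaping terms and a small side-engine cost); the function name `rough_episode_score` and the example episode are hypothetical.

```python
def rough_episode_score(approach_reward, legs_down, main_engine_frames, landed):
    """Rough LunarLander score from the numbers quoted in the text.

    approach_reward: reward for flying from the top to the pad (~100-140)
    legs_down: number of legs touching the ground (0, 1 or 2)
    main_engine_frames: frames during which the main engine fired
    landed: True for coming to rest (+100), False for a crash (-100)
    """
    score = approach_reward
    score += 10 * legs_down              # +10 per leg in contact
    score -= 0.3 * main_engine_frames    # -0.3 per main-engine frame
    score += 100 if landed else -100     # terminal bonus or penalty
    return score


# Hypothetical episode: 120 for the approach, both legs down,
# 60 frames of main engine, then coming to rest on the pad.
print(rough_episode_score(120, 2, 60, True))  # 120 + 20 - 18 + 100, about 222
```

The tally shows why fuel matters: the same landing with 300 frames of main engine would lose 90 points and fall below the 200-point "solved" mark.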

Launching the lander

Code:

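import gym

# Create the environment (classic gym API, gym < 0.26: reset() returns only
# the observation and step() returns 4 values)
env = gym.make("LunarLander-v2")

# Reset the environment
env.reset()

# Run 180 steps
for i in range(180):

    # Render the environment
    env.render()

    # Take a random action
    observation, reward, done, info = env.step(env.action_space.sample())

    if i % 10 == 0:
        # Debug output
        print("Observation:", observation)
        print("Reward:", reward)

    # Start a new episode once the lander has crashed or come to rest,
    # instead of stepping an episode that is already over
    if done:
        env.reset()

# Close the render window
env.close()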

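Output (one random run; a reward of -100 means the lander crashed and the episode ended. This run did not reset after the crash, so every later step keeps returning -100):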

Observation: [ 0.00861025 1.4061487 0.42930993 -0.11858992 -0.00789343 -0.05729095
0. 0. ]
Reward: 0.4097546298543773
Observation: [ 0.04917412 1.3876126 0.41002613 -0.13066985 -0.06578191 -0.12604967
0. 0. ]
Reward: -1.0858669952763478
Observation: [ 0.08917055 1.3429415 0.43598312 -0.2890789 -0.17471936 -0.23913136
0. 0. ]
Reward: -2.9339827504803666
Observation: [ 0.1326253 1.2450166 0.44708318 -0.5567949 -0.32039645 -0.28250334
0. 0. ]
Reward: -2.2779730990326357
Observation: [ 0.18323365 1.1110108 0.615291 -0.61922276 -0.43743232 -0.2921057
0. 0. ]
Reward: -3.107298313736037
Observation: [ 0.24544087 0.94960684 0.66677517 -0.7835077 -0.5929364 -0.2968613
0. 0. ]
Reward: -0.5472611013563438
Observation: [ 0.3148238 0.75122666 0.7238519 -0.98458177 -0.72915816 -0.26130882
0. 0. ]
Reward: -2.5665300894414416
Observation: [ 0.38628978 0.49828076 0.74157137 -1.2624744 -0.85754734 -0.37227553
0. 0. ]
Reward: -3.2562193227533087
Observation: [ 0.46820658 0.18855602 0.92624503 -1.4677961 -1.08614 -0.4508995
0. 0. ]
Reward: -4.017106927961208
Observation: [ 0.57930076 -0.09440845 1.4345247 -0.693939 -2.0783656 -5.4039164
1. 0. ]
Reward: -100
Observation: [ 0.7383894 -0.08930686 1.4662493 -0.13461255 -3.653495 -3.109081
0. 0. ]
Reward: -100
Observation: [ 0.859124 -0.08471288 0.9377837 0.21408719 -3.8998525 0.10151418
0. 0. ]
Reward: -100
Observation: [ 9.3801367e-01 -4.6761338e-02 6.5999150e-01 1.4583524e-01
-3.9281998e+00 -4.7179851e-06 0.0000000e+00 1.0000000e+00]
Reward: -100
Observation: [ 0.9879366 -0.04012476 0.33624884 0.08859511 -4.253908 -1.0233303
0. 0. ]
Reward: -100
Observation: [ 1.0056045 -0.03840658 0.0733737 0.01812508 -4.6796274 -0.6103991
0. 0. ]
Reward: -100
Observation: [ 1.0112988 -0.03921754 0.07890484 -0.00624387 -4.845023 -0.17111658
0. 0. ]
Reward: -100
Observation: [ 1.0234139 -0.04488504 0.15701209 -0.0331554 -4.829875 0.07602684
0. 0. ]
Reward: -100
Observation: [ 1.0306002e+00 -4.8987642e-02 -1.1189224e-02 8.7506004e-04
-4.8712435e+00 -1.5446089e-01 0.0000000e+00 0.0000000e+00]
Reward: -100
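Each observation printed above is an 8-number state vector whose entries are, in order, x and y position, x and y velocity, angle, angular velocity, and the left and right leg-contact flags. The small helper below is a hypothetical, pure-Python illustration (the names `STATE_FIELDS` and `describe_observation` are ours) that labels the entries; feeding it the first printed row with reward -100 shows the left-leg flag set to 1.0.

```python
STATE_FIELDS = (
    "x", "y",                  # position relative to the landing pad
    "vx", "vy",                # linear velocity
    "angle", "angular_vel",    # orientation and its rate of change
    "left_leg", "right_leg",   # 1.0 if that leg touches the ground
)


def describe_observation(obs):
    """Map an 8-element LunarLander observation to named fields."""
    if len(obs) != len(STATE_FIELDS):
        raise ValueError(f"expected 8 values, got {len(obs)}")
    return dict(zip(STATE_FIELDS, obs))


# The first printed row whose reward is -100, copied from the output above.
row = [0.57930076, -0.09440845, 1.4345247, -0.693939,
       -2.0783656, -5.4039164, 1.0, 0.0]
state = describe_observation(row)
print(state["y"], state["left_leg"], state["right_leg"])
```

Naming the fields makes it easier to see, for example, that the angle in the later rows keeps growing in magnitude while the lander lies on the ground.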

Implementing the lunar lander with PPO

ppo.py

import torch
import torch.nn as nn
from torch.distributions import Categorical

# Use the GPU if one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)


class Memory:
    def __init__(self):
        """Rollout buffer"""
        self.actions = []  # actions taken (4 possible values)
        self.states = []  # states, 8 numbers each
        self.logprobs = []  # log-probabilities of the chosen actions
        self.rewards = []  # rewards
        self.is_terminals = []  # whether the episode ended at this step

    def clear_memory(self):
        """Empty the buffer"""
        del self.actions[:]
        del self.states[:]
        del self.logprobs[:]
        del self.rewards[:]
        del self.is_terminals[:]


class ActorCritic(nn.Module):
    def __init__(self, state_dim, action_dim, n_latent_var):
        super(ActorCritic, self).__init__()

        # Actor: outputs a probability for each action
        self.action_layer = nn.Sequential(
            # [b, 8] => [b, 64]
            nn.Linear(state_dim, n_latent_var),
            nn.Tanh(),  # activation

            # [b, 64] => [b, 64]
            nn.Linear(n_latent_var, n_latent_var),
            nn.Tanh(),  # activation

            # [b, 64] => [b, 4]
            nn.Linear(n_latent_var, action_dim),
            nn.Softmax(dim=-1)
        )

        # Critic: outputs a single state value
        self.value_layer = nn.Sequential(
            # [b, 8] => [b, 64]
            nn.Linear(state_dim, n_latent_var),
            nn.Tanh(),  # activation

            # [b, 64] => [b, 64]
            nn.Linear(n_latent_var, n_latent_var),
            nn.Tanh(),

            # [b, 64] => [b, 1]
            nn.Linear(n_latent_var, 1)
        )

    def forward(self):
        """Forward pass; replaced by act() and evaluate()"""

        raise NotImplementedError

    def act(self, state, memory):
        """Choose an action"""

        # Convert to a tensor
        state = torch.from_numpy(state).float().to(device)

        # Probabilities of the 4 actions
        action_probs = self.action_layer(state)

        # Sample an action from the categorical distribution (not the argmax)
        dist = Categorical(action_probs)
        action = dist.sample()

        # Store in memory
        memory.states.append(state)
        memory.actions.append(action)
        memory.logprobs.append(dist.log_prob(action))

        # Return the action as a Python int
        return action.item()

    def evaluate(self, state, action):
        """
        Evaluate a batch of stored states and actions
        :param state: states, a batch of 2000, shape [2000, 8]
        :param action: actions, a batch of 2000, shape [2000]
        :return: log-probabilities of the actions, state values, entropy
        """

        # Action probabilities
        action_probs = self.action_layer(state)
        dist = Categorical(action_probs)  # categorical distribution

        # Log-probability of each taken action, log(p)
        action_logprobs = dist.log_prob(action)

        # Entropy of the action distribution
        dist_entropy = dist.entropy()

        # Critic
        state_value = self.value_layer(state)
        state_value = torch.squeeze(state_value)  # [2000, 1] => [2000]

        # Return log-probabilities, state values and entropy
        return action_logprobs, state_value, dist_entropy


class PPO:
    def __init__(self, state_dim, action_dim, n_latent_var, lr, betas, gamma, k_epochs, eps_clip):
        self.lr = lr  # learning rate
        self.betas = betas  # Adam betas
        self.gamma = gamma  # discount factor
        self.eps_clip = eps_clip  # clipping range for the probability ratio
        self.k_epochs = k_epochs  # optimization epochs per update

        # Current policy and a copy of the old policy
        self.policy = ActorCritic(state_dim, action_dim, n_latent_var).to(device)
        self.policy_old = ActorCritic(state_dim, action_dim, n_latent_var).to(device)
        self.policy_old.load_state_dict(self.policy.state_dict())

        self.optimizer = torch.optim.Adam(self.policy.parameters(), lr=lr, betas=betas)  # optimizer
        self.mseloss = nn.MSELoss()  # loss for the critic

    def update(self, memory):
        """Update the policy"""

        # Monte Carlo estimate of the return of each state
        rewards = []
        discounted_reward = 0
        for reward, is_terminal in zip(reversed(memory.rewards), reversed(memory.is_terminals)):
            # Episode boundary: start a new return
            if is_terminal:
                discounted_reward = 0

            # Discounted return: current reward + gamma * return of the next step
            discounted_reward = reward + (self.gamma * discounted_reward)

            # Prepend
            rewards.insert(0, discounted_reward)

        # Normalize the returns
        rewards = torch.tensor(rewards, dtype=torch.float32).to(device)
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-5)

        # Convert the stored lists to tensors
        old_states = torch.stack(memory.states).to(device).detach()
        old_actions = torch.stack(memory.actions).to(device).detach()
        old_logprobs = torch.stack(memory.logprobs).to(device).detach()

        # Optimize for k epochs
        for _ in range(self.k_epochs):
            # Evaluate the old actions under the current policy
            logprobs, state_values, dist_entropy = self.policy.evaluate(old_states, old_actions)

            # Probability ratio pi_theta / pi_theta_old
            ratios = torch.exp(logprobs - old_logprobs.detach())

            # Clipped surrogate loss + critic loss - entropy bonus
            advantages = rewards - state_values.detach()
            surr1 = ratios * advantages
            surr2 = torch.clamp(ratios, 1 - self.eps_clip, 1 + self.eps_clip) * advantages
            loss = -torch.min(surr1, surr2) + 0.5 * self.mseloss(state_values, rewards) - 0.01 * dist_entropy

            # Zero the gradients
            self.optimizer.zero_grad()

            # Backpropagate
            loss.mean().backward()

            # Update the weights
            self.optimizer.step()

        # Copy the new weights into the old policy
        self.policy_old.load_state_dict(self.policy.state_dict())
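The return computation at the top of `update` is easy to trace by hand. The sketch below re-implements it in plain Python so it runs without PyTorch; the function names `discounted_returns` and `normalize` are hypothetical, and `statistics.stdev` (sample standard deviation) is used because `torch.Tensor.std()` also defaults to the sample estimate. gamma = 0.5 is chosen only to keep the arithmetic readable (the article uses 0.99).

```python
import statistics


def discounted_returns(rewards, is_terminals, gamma=0.99):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, reset at episode ends."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for reward, done in zip(reversed(rewards), reversed(is_terminals)):
        if done:
            running = 0.0  # the following step belongs to another episode
        running = reward + gamma * running
        returns.insert(0, running)
    return returns


def normalize(values, eps=1e-5):
    """Standardize to zero mean and unit spread, as update() does."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample std, like torch.Tensor.std()
    return [(v - mean) / (std + eps) for v in values]


# Two short episodes: the first ends after 2 steps, the second after 1.
rewards = [1.0, 1.0, 5.0]
is_terminals = [False, True, True]
g = discounted_returns(rewards, is_terminals, gamma=0.5)
print(g)  # [1.5, 1.0, 5.0]
print(normalize(g))
```

The reset on `done` is what keeps the +5 of the second episode from leaking into the returns of the first.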

main.py

import gym
import torch
from ppo import Memory, PPO

############## Hyperparameters ##############
env_name = "LunarLander-v2"  # environment name
env = gym.make(env_name)
state_dim = 8  # state dimension
action_dim = 4  # number of actions
render = False  # render the game
solved_reward = 230  # stop when the average reward reaches 230
log_interval = 20  # print avg reward in the interval
max_episodes = 50000  # maximum number of episodes
max_timesteps = 300  # maximum steps per episode
n_latent_var = 64  # hidden layer width
update_timestep = 2000  # update the policy every 2000 steps
lr = 0.002  # learning rate
betas = (0.9, 0.999)  # Adam betas
gamma = 0.99  # discount factor
k_epochs = 4  # optimization epochs per policy update
eps_clip = 0.2  # PPO clipping range


#############################################

def main():
    # Instantiate
    memory = Memory()
    ppo = PPO(state_dim, action_dim, n_latent_var, lr, betas, gamma, k_epochs, eps_clip)

    # Counters
    total_reward = 0
    total_length = 0
    timestep = 0

    # Training
    for i_episode in range(1, max_episodes + 1):

        # Reset the environment to start a new game (gym < 0.26 API)
        state = env.reset()

        # Play one episode
        for t in range(max_timesteps):
            timestep += 1

            # Choose an action with the old policy
            action = ppo.policy_old.act(state, memory)

            # Step: (new state, reward, done, extra debug info)
            state, reward, done, _ = env.step(action)

            # Record the reward and whether the episode ended
            memory.rewards.append(reward)
            memory.is_terminals.append(done)

            # Update the policy
            if timestep % update_timestep == 0:
                ppo.update(memory)

                # Clear memory
                memory.clear_memory()

                # Reset the step counter
                timestep = 0

            # Accumulate
            total_reward += reward

            # Render
            if render:
                env.render()

            # Stop when the episode ends
            if done:
                break

        # Episode length (t is 0-based, so this is one less than the steps taken)
        total_length += t

        # Stop once the current logging window averages at least 230 (sum >= 20 * 230)
        if total_reward >= (log_interval * solved_reward):
            print("########## Solved! ##########")

            # Save the model
            torch.save(ppo.policy.state_dict(), './ppo_{}.pth'.format(env_name))

            # Stop training
            break

        # Log every 20 episodes
        if i_episode % log_interval == 0:

            # Average length and reward over the last 20 episodes
            avg_length = int(total_length / log_interval)
            running_reward = int(total_reward / log_interval)

            # Debug output
            print('episode {} \t avg length: {} \t reward: {}'.format(i_episode, avg_length, running_reward))

            # Reset the window
            total_reward = 0
            total_length = 0

if __name__ == '__main__':
    main()
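The stopping test `total_reward >= (log_interval * solved_reward)` compares a sum rather than an average: `total_reward` accumulates over the current 20-episode logging window, so training stops once that window's rewards add up to 20 * 230 = 4600, an average of at least 230 per episode (a stricter bar than the 200 points mentioned earlier). The pure-Python sketch below mirrors that check; `is_solved` is a hypothetical name.

```python
def is_solved(window_rewards, log_interval=20, solved_reward=230):
    """Mirror of the article's check: the sum over the current logging
    window must reach log_interval * solved_reward (20 * 230 = 4600)."""
    return sum(window_rewards) >= log_interval * solved_reward


# 20 episodes averaging 230 reach the threshold exactly...
print(is_solved([230] * 20))  # True
# ...but 20 episodes averaging 229 do not.
print(is_solved([229] * 20))  # False
# A partly filled window needs a higher average: 10 episodes at 300 fall short.
print(is_solved([300] * 10))  # False
```

Because the check runs after every episode, a partly filled window can only trigger it with a correspondingly higher average (10 episodes would need 460 each).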

Output

episode 20 avg length: 93 reward: -243
episode 40 avg length: 92 reward: -172
episode 60 avg length: 79 reward: -192
episode 80 avg length: 85 reward: -164
episode 100 avg length: 90 reward: -179
episode 120 avg length: 100 reward: -201
episode 140 avg length: 91 reward: -175
episode 160 avg length: 101 reward: -141
episode 180 avg length: 86 reward: -153
episode 200 avg length: 93 reward: -189
episode 220 avg length: 96 reward: -221
episode 240 avg length: 105 reward: -140
episode 260 avg length: 94 reward: -121
episode 280 avg length: 91 reward: -131
episode 300 avg length: 91 reward: -122
episode 320 avg length: 90 reward: -113
episode 340 avg length: 100 reward: -110
episode 360 avg length: 110 reward: -92
episode 380 avg length: 110 reward: -75
episode 400 avg length: 119 reward: -76
episode 420 avg length: 162 reward: -77
episode 440 avg length: 194 reward: -91
episode 460 avg length: 144 reward: -28
episode 480 avg length: 192 reward: -8
episode 500 avg length: 244 reward: -25
episode 520 avg length: 239 reward: -1
episode 540 avg length: 269 reward: 21
episode 560 avg length: 289 reward: 27
episode 580 avg length: 270 reward: 65
episode 600 avg length: 264 reward: 86
episode 620 avg length: 256 reward: 66
episode 640 avg length: 278 reward: 75
episode 660 avg length: 235 reward: 11
episode 680 avg length: 244 reward: 84
episode 700 avg length: 253 reward: 73
episode 720 avg length: 292 reward: 63
episode 740 avg length: 293 reward: 104
episode 760 avg length: 279 reward: 109
episode 780 avg length: 246 reward: 86
episode 800 avg length: 260 reward: 124
episode 820 avg length: 276 reward: 131
episode 840 avg length: 269 reward: 121
episode 860 avg length: 194 reward: 67
episode 880 avg length: 241 reward: 94
episode 900 avg length: 259 reward: 98
episode 920 avg length: 211 reward: 83
episode 940 avg length: 260 reward: 105
episode 960 avg length: 194 reward: 65
episode 980 avg length: 202 reward: 68
episode 1000 avg length: 243 reward: 79
episode 1020 avg length: 260 reward: 66
episode 1040 avg length: 289 reward: 117
episode 1060 avg length: 252 reward: 94
episode 1080 avg length: 262 reward: 114
episode 1100 avg length: 272 reward: 112
episode 1120 avg length: 263 reward: 97
episode 1140 avg length: 256 reward: 93
episode 1160 avg length: 274 reward: 120
episode 1180 avg length: 256 reward: 117
episode 1200 avg length: 241 reward: 105
episode 1220 avg length: 238 reward: 103
episode 1240 avg length: 267 reward: 121
episode 1260 avg length: 283 reward: 124
episode 1280 avg length: 299 reward: 149
episode 1300 avg length: 281 reward: 126
episode 1320 avg length: 266 reward: 102
episode 1340 avg length: 282 reward: 128
episode 1360 avg length: 275 reward: 114
episode 1380 avg length: 285 reward: 105
episode 1400 avg length: 294 reward: 123
episode 1420 avg length: 293 reward: 132
episode 1440 avg length: 248 reward: 85
episode 1460 avg length: 281 reward: 115
episode 1480 avg length: 291 reward: 152
episode 1500 avg length: 279 reward: 130
episode 1520 avg length: 267 reward: 103
episode 1540 avg length: 270 reward: 137
episode 1560 avg length: 269 reward: 120
episode 1580 avg length: 260 reward: 113
episode 1600 avg length: 282 reward: 147
episode 1620 avg length: 259 reward: 125
episode 1640 avg length: 240 reward: 90
episode 1660 avg length: 284 reward: 125
episode 1680 avg length: 282 reward: 123
episode 1700 avg length: 274 reward: 123
episode 1720 avg length: 273 reward: 130
episode 1740 avg length: 260 reward: 117
episode 1760 avg length: 243 reward: 106
episode 1780 avg length: 241 reward: 90
episode 1800 avg length: 290 reward: 144
episode 1820 avg length: 258 reward: 131
episode 1840 avg length: 283 reward: 142
episode 1860 avg length: 262 reward: 100
episode 1880 avg length: 273 reward: 132
episode 1900 avg length: 255 reward: 92
episode 1920 avg length: 251 reward: 117
episode 1940 avg length: 220 reward: 103
episode 1960 avg length: 221 reward: 111
episode 1980 avg length: 205 reward: 83
episode 2000 avg length: 227 reward: 102
episode 2020 avg length: 251 reward: 123
episode 2040 avg length: 227 reward: 100
episode 2060 avg length: 255 reward: 135
episode 2080 avg length: 273 reward: 136
episode 2100 avg length: 256 reward: 126
episode 2120 avg length: 273 reward: 141
episode 2140 avg length: 280 reward: 109
episode 2160 avg length: 266 reward: 112
episode 2180 avg length: 249 reward: 88
episode 2200 avg length: 247 reward: 119
episode 2220 avg length: 270 reward: 143
episode 2240 avg length: 257 reward: 65
episode 2260 avg length: 250 reward: 30
episode 2280 avg length: 261 reward: 112
episode 2300 avg length: 270 reward: 139
episode 2320 avg length: 275 reward: 128
episode 2340 avg length: 290 reward: 149
episode 2360 avg length: 269 reward: 139
episode 2380 avg length: 272 reward: 137
episode 2400 avg length: 232 reward: 105
episode 2420 avg length: 242 reward: 127
episode 2440 avg length: 241 reward: 134
episode 2460 avg length: 249 reward: 113
episode 2480 avg length: 287 reward: 154
episode 2500 avg length: 289 reward: 149
episode 2520 avg length: 258 reward: 129
episode 2540 avg length: 250 reward: 101
episode 2560 avg length: 287 reward: 158
episode 2580 avg length: 271 reward: 145
episode 2600 avg length: 253 reward: 120
episode 2620 avg length: 255 reward: 127
episode 2640 avg length: 254 reward: 122
episode 2660 avg length: 238 reward: 123
episode 2680 avg length: 243 reward: 115
episode 2700 avg length: 241 reward: 93
episode 2720 avg length: 232 reward: 90
episode 2740 avg length: 215 reward: 83
episode 2760 avg length: 241 reward: 112
episode 2780 avg length: 273 reward: 129
episode 2800 avg length: 269 reward: 133
episode 2820 avg length: 246 reward: 91
episode 2840 avg length: 261 reward: 130
episode 2860 avg length: 261 reward: 136
episode 2880 avg length: 289 reward: 128
episode 2900 avg length: 271 reward: 131
episode 2920 avg length: 277 reward: 145
episode 2940 avg length: 251 reward: 117
episode 2960 avg length: 253 reward: 120
episode 2980 avg length: 270 reward: 133
episode 3000 avg length: 240 reward: 85
episode 3020 avg length: 284 reward: 141
episode 3040 avg length: 255 reward: 117
episode 3060 avg length: 299 reward: 134
episode 3080 avg length: 263 reward: 122
episode 3100 avg length: 259 reward: 126
episode 3120 avg length: 270 reward: 125
episode 3140 avg length: 299 reward: 150
episode 3160 avg length: 256 reward: 116
episode 3180 avg length: 264 reward: 124
episode 3200 avg length: 271 reward: 128
episode 3220 avg length: 259 reward: 122
episode 3240 avg length: 261 reward: 125
episode 3260 avg length: 271 reward: 129
episode 3280 avg length: 242 reward: 126
episode 3300 avg length: 218 reward: 93
episode 3320 avg length: 230 reward: 116
episode 3340 avg length: 223 reward: 109
episode 3360 avg length: 249 reward: 122
episode 3380 avg length: 224 reward: 104
episode 3400 avg length: 261 reward: 131
episode 3420 avg length: 280 reward: 140
episode 3440 avg length: 264 reward: 125
episode 3460 avg length: 247 reward: 105
episode 3480 avg length: 276 reward: 141
episode 3500 avg length: 282 reward: 149
episode 3520 avg length: 282 reward: 141
episode 3540 avg length: 290 reward: 152
episode 3560 avg length: 282 reward: 141
episode 3580 avg length: 291 reward: 151
episode 3600 avg length: 289 reward: 166
episode 3620 avg length: 266 reward: 142
episode 3640 avg length: 277 reward: 91
episode 3660 avg length: 272 reward: 114
episode 3680 avg length: 281 reward: 159
episode 3700 avg length: 287 reward: 160
episode 3720 avg length: 254 reward: 78
episode 3740 avg length: 296 reward: 174
episode 3760 avg length: 267 reward: 124
episode 3780 avg length: 273 reward: 148
episode 3800 avg length: 275 reward: 147
episode 3820 avg length: 276 reward: 145
episode 3840 avg length: 283 reward: 151
episode 3860 avg length: 275 reward: 142
episode 3880 avg length: 290 reward: 142
episode 3900 avg length: 290 reward: 154
episode 3920 avg length: 283 reward: 141
episode 3940 avg length: 273 reward: 145
episode 3960 avg length: 290 reward: 161
episode 3980 avg length: 268 reward: 145
episode 4000 avg length: 270 reward: 142
episode 4020 avg length: 283 reward: 156
episode 4040 avg length: 283 reward: 149
episode 4060 avg length: 299 reward: 172
episode 4080 avg length: 292 reward: 158
episode 4100 avg length: 274 reward: 143
episode 4120 avg length: 299 reward: 163
episode 4140 avg length: 290 reward: 153
episode 4160 avg length: 299 reward: 165
episode 4180 avg length: 290 reward: 160
episode 4200 avg length: 299 reward: 157
episode 4220 avg length: 299 reward: 171
episode 4240 avg length: 271 reward: 148
episode 4260 avg length: 265 reward: 139
episode 4280 avg length: 258 reward: 137
episode 4300 avg length: 280 reward: 137
episode 4320 avg length: 262 reward: 133
episode 4340 avg length: 255 reward: 110
episode 4360 avg length: 275 reward: 134
episode 4380 avg length: 282 reward: 154
episode 4400 avg length: 264 reward: 128
episode 4420 avg length: 299 reward: 150
episode 4440 avg length: 275 reward: 151
episode 4460 avg length: 257 reward: 116
episode 4480 avg length: 256 reward: 104
episode 4500 avg length: 263 reward: 134
episode 4520 avg length: 299 reward: 164
episode 4540 avg length: 265 reward: 137
episode 4560 avg length: 265 reward: 147
episode 4580 avg length: 283 reward: 138
episode 4600 avg length: 299 reward: 152
episode 4620 avg length: 281 reward: 154
episode 4640 avg length: 289 reward: 161
episode 4660 avg length: 264 reward: 143
episode 4680 avg length: 285 reward: 138
episode 4700 avg length: 291 reward: 143
episode 4720 avg length: 280 reward: 154
episode 4740 avg length: 284 reward: 125
episode 4760 avg length: 296 reward: 136
episode 4780 avg length: 254 reward: 127
episode 4800 avg length: 281 reward: 147
episode 4820 avg length: 282 reward: 143
episode 4840 avg length: 243 reward: 119
episode 4860 avg length: 280 reward: 139
episode 4880 avg length: 270 reward: 137
episode 4900 avg length: 278 reward: 150
episode 4920 avg length: 203 reward: 83
episode 4940 avg length: 272 reward: 153
episode 4960 avg length: 289 reward: 151
episode 4980 avg length: 289 reward: 157
episode 5000 avg length: 299 reward: 168
episode 5020 avg length: 292 reward: 136
episode 5040 avg length: 290 reward: 158
episode 5060 avg length: 286 reward: 157
episode 5080 avg length: 282 reward: 154
episode 5100 avg length: 278 reward: 121
episode 5120 avg length: 291 reward: 138
episode 5140 avg length: 297 reward: 143
episode 5160 avg length: 290 reward: 165
episode 5180 avg length: 290 reward: 157
episode 5200 avg length: 276 reward: 150
episode 5220 avg length: 278 reward: 149
episode 5240 avg length: 287 reward: 153
episode 5260 avg length: 274 reward: 145
episode 5280 avg length: 299 reward: 176
episode 5300 avg length: 299 reward: 173
episode 5320 avg length: 299 reward: 164
episode 5340 avg length: 271 reward: 157
episode 5360 avg length: 299 reward: 180
episode 5380 avg length: 279 reward: 156
episode 5400 avg length: 268 reward: 133
episode 5420 avg length: 279 reward: 136
episode 5440 avg length: 278 reward: 130
episode 5460 avg length: 268 reward: 137
episode 5480 avg length: 273 reward: 152
episode 5500 avg length: 299 reward: 168
episode 5520 avg length: 266 reward: 95
episode 5540 avg length: 294 reward: 146
episode 5560 avg length: 289 reward: 165
episode 5580 avg length: 288 reward: 139
episode 5600 avg length: 299 reward: 174
episode 5620 avg length: 291 reward: 168
episode 5640 avg length: 281 reward: 147
episode 5660 avg length: 270 reward: 126
episode 5680 avg length: 263 reward: 153
episode 5700 avg length: 283 reward: 161
episode 5720 avg length: 271 reward: 154
episode 5740 avg length: 281 reward: 154
episode 5760 avg length: 281 reward: 144
episode 5780 avg length: 272 reward: 145
episode 5800 avg length: 275 reward: 128
episode 5820 avg length: 290 reward: 159
episode 5840 avg length: 274 reward: 142
episode 5860 avg length: 243 reward: 122
episode 5880 avg length: 236 reward: 124
episode 5900 avg length: 255 reward: 139
episode 5920 avg length: 288 reward: 140
episode 5940 avg length: 271 reward: 140
episode 5960 avg length: 254 reward: 108
episode 5980 avg length: 299 reward: 149
episode 6000 avg length: 289 reward: 149
episode 6020 avg length: 258 reward: 109
episode 6040 avg length: 289 reward: 129
episode 6060 avg length: 238 reward: 94
episode 6080 avg length: 270 reward: 87
episode 6100 avg length: 268 reward: 96
episode 6120 avg length: 279 reward: 142
episode 6140 avg length: 233 reward: 112
episode 6160 avg length: 268 reward: 142
episode 6180 avg length: 260 reward: 133
episode 6200 avg length: 210 reward: 109
episode 6220 avg length: 248 reward: 111
episode 6240 avg length: 229 reward: 92
episode 6260 avg length: 210 reward: 98
episode 6280 avg length: 218 reward: 102
episode 6300 avg length: 225 reward: 117
episode 6320 avg length: 235 reward: 112
episode 6340 avg length: 259 reward: 124
episode 6360 avg length: 252 reward: 113
episode 6380 avg length: 239 reward: 119
episode 6400 avg length: 242 reward: 95
episode 6420 avg length: 249 reward: 111
episode 6440 avg length: 257 reward: 136
episode 6460 avg length: 259 reward: 123
episode 6480 avg length: 259 reward: 112
episode 6500 avg length: 259 reward: 129
episode 6520 avg length: 215 reward: 101
episode 6540 avg length: 249 reward: 137
episode 6560 avg length: 245 reward: 121
episode 6580 avg length: 259 reward: 127
episode 6600 avg length: 267 reward: 142
episode 6620 avg length: 257 reward: 86
episode 6640 avg length: 278 reward: 141
episode 6660 avg length: 255 reward: 92
episode 6680 avg length: 289 reward: 145
episode 6700 avg length: 259 reward: 133
episode 6720 avg length: 247 reward: 116
episode 6740 avg length: 243 reward: 56
episode 6760 avg length: 274 reward: 114
episode 6780 avg length: 279 reward: 133
episode 6800 avg length: 269 reward: 152
episode 6820 avg length: 252 reward: 105
episode 6840 avg length: 254 reward: 123
episode 6860 avg length: 253 reward: 98
episode 6880 avg length: 273 reward: 132
episode 6900 avg length: 249 reward: 108
episode 6920 avg length: 248 reward: 84
episode 6940 avg length: 250 reward: 107
episode 6960 avg length: 279 reward: 99
episode 6980 avg length: 279 reward: 140
episode 7000 avg length: 270 reward: 105
episode 7020 avg length: 250 reward: 109
episode 7040 avg length: 202 reward: 87
episode 7060 avg length: 188 reward: 56
episode 7080 avg length: 229 reward: 93
episode 7100 avg length: 248 reward: 105
episode 7120 avg length: 218 reward: 105
episode 7140 avg length: 213 reward: 77
episode 7160 avg length: 279 reward: 128
episode 7180 avg length: 247 reward: 110
episode 7200 avg length: 269 reward: 124
episode 7220 avg length: 217 reward: 64
episode 7240 avg length: 258 reward: 140
episode 7260 avg length: 279 reward: 116
episode 7280 avg length: 244 reward: 97
episode 7300 avg length: 245 reward: 104
episode 7320 avg length: 213 reward: 81
episode 7340 avg length: 268 reward: 126
episode 7360 avg length: 277 reward: 124
episode 7380 avg length: 251 reward: 122
episode 7400 avg length: 234 reward: 108
episode 7420 avg length: 267 reward: 127
episode 7440 avg length: 218 reward: 89
episode 7460 avg length: 199 reward: 80
episode 7480 avg length: 154 reward: 55
episode 7500 avg length: 228 reward: 114
episode 7520 avg length: 197 reward: 49
episode 7540 avg length: 147 reward: 59
episode 7560 avg length: 139 reward: 49
episode 7580 avg length: 181 reward: 74
episode 7600 avg length: 191 reward: 61
episode 7620 avg length: 176 reward: 78
episode 7640 avg length: 160 reward: 35
episode 7660 avg length: 159 reward: 50
episode 7680 avg length: 143 reward: 68
episode 7700 avg length: 227 reward: 103
episode 7720 avg length: 192 reward: 59
episode 7740 avg length: 248 reward: 118
episode 7760 avg length: 250 reward: 128
episode 7780 avg length: 261 reward: 110
episode 7800 avg length: 279 reward: 157
episode 7820 avg length: 249 reward: 153
episode 7840 avg length: 212 reward: 78
episode 7860 avg length: 249 reward: 144
episode 7880 avg length: 257 reward: 107
episode 7900 avg length: 271 reward: 136
episode 7920 avg length: 244 reward: 129
episode 7940 avg length: 262 reward: 145
episode 7960 avg length: 224 reward: 94
episode 7980 avg length: 247 reward: 110
episode 8000 avg length: 190 reward: 81
episode 8020 avg length: 157 reward: 67
episode 8040 avg length: 171 reward: 67
episode 8060 avg length: 203 reward: 96
episode 8080 avg length: 225 reward: 87
episode 8100 avg length: 166 reward: 84
episode 8120 avg length: 196 reward: 82
episode 8140 avg length: 249 reward: 120
episode 8160 avg length: 216 reward: 112
episode 8180 avg length: 178 reward: 97
episode 8200 avg length: 221 reward: 120
episode 8220 avg length: 265 reward: 122
episode 8240 avg length: 240 reward: 125
episode 8260 avg length: 266 reward: 146
episode 8280 avg length: 253 reward: 116
episode 8300 avg length: 233 reward: 129
episode 8320 avg length: 260 reward: 126
episode 8340 avg length: 264 reward: 138
episode 8360 avg length: 196 reward: 88
episode 8380 avg length: 189 reward: 60
episode 8400 avg length: 227 reward: 66
episode 8420 avg length: 257 reward: 114
episode 8440 avg length: 254 reward: 99
episode 8460 avg length: 268 reward: 127
episode 8480 avg length: 263 reward: 131
episode 8500 avg length: 246 reward: 107
episode 8520 avg length: 281 reward: 127
episode 8540 avg length: 273 reward: 146
episode 8560 avg length: 290 reward: 124
episode 8580 avg length: 261 reward: 103
episode 8600 avg length: 294 reward: 140
episode 8620 avg length: 236 reward: 110
episode 8640 avg length: 261 reward: 125
episode 8660 avg length: 284 reward: 108
episode 8680 avg length: 278 reward: 141
episode 8700 avg length: 256 reward: 124
episode 8720 avg length: 245 reward: 95
episode 8740 avg length: 258 reward: 136
episode 8760 avg length: 289 reward: 147
episode 8780 avg length: 229 reward: 98
episode 8800 avg length: 277 reward: 138
episode 8820 avg length: 237 reward: 129
episode 8840 avg length: 276 reward: 141
episode 8860 avg length: 224 reward: 102
episode 8880 avg length: 220 reward: 108
episode 8900 avg length: 277 reward: 137
episode 8920 avg length: 259 reward: 120
episode 8940 avg length: 242 reward: 124
episode 8960 avg length: 275 reward: 119
episode 8980 avg length: 256 reward: 140
episode 9000 avg length: 263 reward: 110
episode 9020 avg length: 247 reward: 101
episode 9040 avg length: 251 reward: 99
episode 9060 avg length: 266 reward: 128
episode 9080 avg length: 247 reward: 119
episode 9100 avg length: 227 reward: 95
episode 9120 avg length: 242 reward: 95
episode 9140 avg length: 234 reward: 120
episode 9160 avg length: 271 reward: 145
episode 9180 avg length: 234 reward: 106
episode 9200 avg length: 230 reward: 102
episode 9220 avg length: 217 reward: 111
episode 9240 avg length: 182 reward: 68
episode 9260 avg length: 225 reward: 111
episode 9280 avg length: 224 reward: 110
episode 9300 avg length: 195 reward: 97
episode 9320 avg length: 245 reward: 110
episode 9340 avg length: 249 reward: 87
episode 9360 avg length: 238 reward: 105
episode 9380 avg length: 231 reward: 83
episode 9400 avg length: 245 reward: 60
episode 9420 avg length: 251 reward: 81
episode 9440 avg length: 218 reward: 86
episode 9460 avg length: 177 reward: 62
episode 9480 avg length: 212 reward: 64
episode 9500 avg length: 213 reward: 96
episode 9520 avg length: 267 reward: 121
episode 9540 avg length: 195 reward: 89
episode 9560 avg length: 259 reward: 140
episode 9580 avg length: 246 reward: 116
episode 9600 avg length: 266 reward: 122
episode 9620 avg length: 255 reward: 104
episode 9640 avg length: 203 reward: 116
episode 9660 avg length: 239 reward: 117
episode 9680 avg length: 239 reward: 118
episode 9700 avg length: 254 reward: 137
episode 9720 avg length: 269 reward: 144
episode 9740 avg length: 274 reward: 136
episode 9760 avg length: 259 reward: 123
episode 9780 avg length: 230 reward: 102
episode 9800 avg length: 268 reward: 139
episode 9820 avg length: 258 reward: 120
episode 9840 avg length: 271 reward: 111
episode 9860 avg length: 260 reward: 130
episode 9880 avg length: 280 reward: 135
episode 9900 avg length: 269 reward: 126
episode 9920 avg length: 290 reward: 159
episode 9940 avg length: 286 reward: 129
episode 9960 avg length: 259 reward: 117
episode 9980 avg length: 299 reward: 139
episode 10000 avg length: 298 reward: 141
episode 10020 avg length: 294 reward: 115
episode 10040 avg length: 284 reward: 117
episode 10060 avg length: 299 reward: 156
episode 10080 avg length: 290 reward: 145
episode 10100 avg length: 280 reward: 151
episode 10120 avg length: 299 reward: 163
episode 10140 avg length: 290 reward: 151
episode 10160 avg length: 269 reward: 133
episode 10180 avg length: 259 reward: 134
episode 10200 avg length: 272 reward: 137
episode 10220 avg length: 260 reward: 121
episode 10240 avg length: 259 reward: 103
episode 10260 avg length: 260 reward: 126
episode 10280 avg length: 279 reward: 150
episode 10300 avg length: 268 reward: 128
episode 10320 avg length: 261 reward: 140
episode 10340 avg length: 243 reward: 111
episode 10360 avg length: 236 reward: 113
episode 10380 avg length: 219 reward: 112
episode 10400 avg length: 267 reward: 140
episode 10420 avg length: 279 reward: 146
episode 10440 avg length: 285 reward: 137
episode 10460 avg length: 255 reward: 107
episode 10480 avg length: 249 reward: 115
episode 10500 avg length: 241 reward: 106
episode 10520 avg length: 219 reward: 102
episode 10540 avg length: 200 reward: 52
episode 10560 avg length: 267 reward: 124
episode 10580 avg length: 235 reward: 111
episode 10600 avg length: 223 reward: 86
episode 10620 avg length: 220 reward: 90
episode 10640 avg length: 269 reward: 145
episode 10660 avg length: 255 reward: 133
episode 10680 avg length: 277 reward: 130
episode 10700 avg length: 280 reward: 142
episode 10720 avg length: 278 reward: 128
episode 10740 avg length: 260 reward: 90
episode 10760 avg length: 288 reward: 145
episode 10780 avg length: 238 reward: 94
episode 10800 avg length: 278 reward: 136
episode 10820 avg length: 288 reward: 150
episode 10840 avg length: 280 reward: 148
episode 10860 avg length: 240 reward: 117
episode 10880 avg length: 257 reward: 124
episode 10900 avg length: 261 reward: 130
episode 10920 avg length: 229 reward: 115
episode 10940 avg length: 259 reward: 144
episode 10960 avg length: 238 reward: 138
episode 10980 avg length: 230 reward: 112
episode 11000 avg length: 254 reward: 126
episode 11020 avg length: 281 reward: 141
episode 11040 avg length: 270 reward: 120
episode 11060 avg length: 297 reward: 174
episode 11080 avg length: 261 reward: 138
episode 11100 avg length: 259 reward: 125
episode 11120 avg length: 292 reward: 173
episode 11140 avg length: 275 reward: 146
episode 11160 avg length: 299 reward: 165
episode 11180 avg length: 299 reward: 175
episode 11200 avg length: 289 reward: 161
episode 11220 avg length: 299 reward: 166
episode 11240 avg length: 278 reward: 160
episode 11260 avg length: 290 reward: 142
episode 11280 avg length: 299 reward: 164
episode 11300 avg length: 279 reward: 155
episode 11320 avg length: 299 reward: 178
episode 11340 avg length: 299 reward: 150
episode 11360 avg length: 265 reward: 110
episode 11380 avg length: 288 reward: 156
episode 11400 avg length: 278 reward: 146
episode 11420 avg length: 268 reward: 141
episode 11440 avg length: 291 reward: 130
episode 11460 avg length: 299 reward: 161
episode 11480 avg length: 284 reward: 142
episode 11500 avg length: 262 reward: 132
episode 11520 avg length: 287 reward: 149
episode 11540 avg length: 288 reward: 150
episode 11560 avg length: 288 reward: 157
episode 11580 avg length: 288 reward: 156
episode 11600 avg length: 284 reward: 133
episode 11620 avg length: 287 reward: 152
episode 11640 avg length: 249 reward: 130
episode 11660 avg length: 240 reward: 106
episode 11680 avg length: 271 reward: 131
episode 11700 avg length: 271 reward: 117
episode 11720 avg length: 286 reward: 143
episode 11740 avg length: 293 reward: 150
episode 11760 avg length: 289 reward: 155
episode 11780 avg length: 290 reward: 137
episode 11800 avg length: 289 reward: 133
episode 11820 avg length: 273 reward: 121
episode 11840 avg length: 274 reward: 109
episode 11860 avg length: 261 reward: 147
episode 11880 avg length: 210 reward: 114
episode 11900 avg length: 245 reward: 143
episode 11920 avg length: 210 reward: 115
episode 11940 avg length: 218 reward: 102
episode 11960 avg length: 214 reward: 102
episode 11980 avg length: 269 reward: 133
episode 12000 avg length: 262 reward: 144
episode 12020 avg length: 235 reward: 131
episode 12040 avg length: 253 reward: 149
episode 12060 avg length: 227 reward: 120
episode 12080 avg length: 202 reward: 98
episode 12100 avg length: 240 reward: 117
episode 12120 avg length: 231 reward: 108
episode 12140 avg length: 230 reward: 122
episode 12160 avg length: 228 reward: 108
episode 12180 avg length: 233 reward: 96
episode 12200 avg length: 252 reward: 123
episode 12220 avg length: 272 reward: 154
episode 12240 avg length: 251 reward: 122
episode 12260 avg length: 273 reward: 147
episode 12280 avg length: 239 reward: 111
episode 12300 avg length: 287 reward: 126
episode 12320 avg length: 278 reward: 121
episode 12340 avg length: 258 reward: 120
episode 12360 avg length: 265 reward: 104
episode 12380 avg length: 279 reward: 118
episode 12400 avg length: 254 reward: 72
episode 12420 avg length: 187 reward: 74
episode 12440 avg length: 244 reward: 90
episode 12460 avg length: 228 reward: 116
episode 12480 avg length: 258 reward: 125
episode 12500 avg length: 247 reward: 118
episode 12520 avg length: 244 reward: 101
episode 12540 avg length: 267 reward: 135
episode 12560 avg length: 253 reward: 99
episode 12580 avg length: 285 reward: 135
episode 12600 avg length: 259 reward: 113
episode 12620 avg length: 256 reward: 108
episode 12640 avg length: 238 reward: 114
episode 12660 avg length: 265 reward: 128
episode 12680 avg length: 289 reward: 145
episode 12700 avg length: 287 reward: 147
episode 12720 avg length: 283 reward: 139
episode 12740 avg length: 255 reward: 108
episode 12760 avg length: 299 reward: 150
episode 12780 avg length: 277 reward: 138
episode 12800 avg length: 290 reward: 151
episode 12820 avg length: 284 reward: 159
episode 12840 avg length: 299 reward: 150
episode 12860 avg length: 289 reward: 146
episode 12880 avg length: 299 reward: 158
episode 12900 avg length: 299 reward: 144
episode 12920 avg length: 279 reward: 129
episode 12940 avg length: 282 reward: 132
episode 12960 avg length: 280 reward: 132
episode 12980 avg length: 278 reward: 108
episode 13000 avg length: 284 reward: 136
episode 13020 avg length: 289 reward: 128
episode 13040 avg length: 291 reward: 149
episode 13060 avg length: 299 reward: 140
episode 13080 avg length: 292 reward: 141
episode 13100 avg length: 290 reward: 139
episode 13120 avg length: 299 reward: 139
episode 13140 avg length: 291 reward: 151
episode 13160 avg length: 291 reward: 141
episode 13180 avg length: 299 reward: 169
episode 13200 avg length: 299 reward: 162
episode 13220 avg length: 299 reward: 170
episode 13240 avg length: 299 reward: 170
episode 13260 avg length: 299 reward: 155
episode 13280 avg length: 299 reward: 153
episode 13300 avg length: 299 reward: 163
episode 13320 avg length: 281 reward: 131
episode 13340 avg length: 289 reward: 153
episode 13360 avg length: 285 reward: 133
episode 13380 avg length: 280 reward: 134
episode 13400 avg length: 282 reward: 134
episode 13420 avg length: 268 reward: 114
episode 13440 avg length: 290 reward: 142
episode 13460 avg length: 270 reward: 145
episode 13480 avg length: 257 reward: 127
episode 13500 avg length: 272 reward: 139
episode 13520 avg length: 270 reward: 129
episode 13540 avg length: 279 reward: 149
episode 13560 avg length: 269 reward: 95
episode 13580 avg length: 270 reward: 113
episode 13600 avg length: 258 reward: 125
episode 13620 avg length: 217 reward: 88
episode 13640 avg length: 157 reward: 59
episode 13660 avg length: 132 reward: 41
episode 13680 avg length: 220 reward: 92
episode 13700 avg length: 241 reward: 109
episode 13720 avg length: 252 reward: 127
episode 13740 avg length: 253 reward: 104
episode 13760 avg length: 269 reward: 128
episode 13780 avg length: 230 reward: 96
episode 13800 avg length: 258 reward: 127
episode 13820 avg length: 290 reward: 151
episode 13840 avg length: 299 reward: 135
episode 13860 avg length: 280 reward: 111
episode 13880 avg length: 268 reward: 124
episode 13900 avg length: 255 reward: 93
episode 13920 avg length: 258 reward: 128
episode 13940 avg length: 244 reward: 127
episode 13960 avg length: 238 reward: 117
episode 13980 avg length: 237 reward: 104
episode 14000 avg length: 251 reward: 123
episode 14020 avg length: 267 reward: 114
episode 14040 avg length: 271 reward: 109
episode 14060 avg length: 247 reward: 117
episode 14080 avg length: 282 reward: 129
episode 14100 avg length: 266 reward: 144
episode 14120 avg length: 256 reward: 132
episode 14140 avg length: 267 reward: 140
episode 14160 avg length: 289 reward: 149
episode 14180 avg length: 262 reward: 95
episode 14200 avg length: 278 reward: 128
episode 14220 avg length: 279 reward: 136
episode 14240 avg length: 249 reward: 105
episode 14260 avg length: 235 reward: 112
episode 14280 avg length: 273 reward: 131
episode 14300 avg length: 278 reward: 130
episode 14320 avg length: 259 reward: 123
episode 14340 avg length: 234 reward: 78
episode 14360 avg length: 268 reward: 125
episode 14380 avg length: 294 reward: 153
episode 14400 avg length: 299 reward: 150
episode 14420 avg length: 278 reward: 129
episode 14440 avg length: 297 reward: 155
episode 14460 avg length: 247 reward: 106
episode 14480 avg length: 289 reward: 154
episode 14500 avg length: 270 reward: 133
episode 14520 avg length: 259 reward: 133
episode 14540 avg length: 280 reward: 151
episode 14560 avg length: 268 reward: 129
episode 14580 avg length: 299 reward: 159
episode 14600 avg length: 279 reward: 131
episode 14620 avg length: 242 reward: 100
episode 14640 avg length: 236 reward: 114
episode 14660 avg length: 253 reward: 132
episode 14680 avg length: 272 reward: 134
episode 14700 avg length: 297 reward: 175
episode 14720 avg length: 278 reward: 148
episode 14740 avg length: 289 reward: 154
episode 14760 avg length: 288 reward: 148
episode 14780 avg length: 278 reward: 140
episode 14800 avg length: 266 reward: 128
episode 14820 avg length: 288 reward: 161
episode 14840 avg length: 278 reward: 145
episode 14860 avg length: 290 reward: 161
episode 14880 avg length: 279 reward: 139
episode 14900 avg length: 284 reward: 155
episode 14920 avg length: 245 reward: 136
episode 14940 avg length: 269 reward: 137
episode 14960 avg length: 262 reward: 146
episode 14980 avg length: 299 reward: 154
episode 15000 avg length: 273 reward: 172
episode 15020 avg length: 278 reward: 142
episode 15040 avg length: 277 reward: 150
episode 15060 avg length: 232 reward: 119
episode 15080 avg length: 280 reward: 141
episode 15100 avg length: 260 reward: 137
episode 15120 avg length: 285 reward: 167
episode 15140 avg length: 280 reward: 149
episode 15160 avg length: 237 reward: 118
episode 15180 avg length: 223 reward: 111
episode 15200 avg length: 243 reward: 134
episode 15220 avg length: 269 reward: 138
episode 15240 avg length: 251 reward: 127
episode 15260 avg length: 289 reward: 157
episode 15280 avg length: 229 reward: 107
episode 15300 avg length: 277 reward: 143
episode 15320 avg length: 288 reward: 154
episode 15340 avg length: 289 reward: 149
episode 15360 avg length: 288 reward: 145
episode 15380 avg length: 260 reward: 134
episode 15400 avg length: 246 reward: 126
episode 15420 avg length: 244 reward: 132
episode 15440 avg length: 272 reward: 129
episode 15460 avg length: 267 reward: 134
episode 15480 avg length: 263 reward: 135
episode 15500 avg length: 280 reward: 141
episode 15520 avg length: 254 reward: 126
episode 15540 avg length: 275 reward: 133
episode 15560 avg length: 271 reward: 120
episode 15580 avg length: 270 reward: 130
episode 15600 avg length: 299 reward: 144
episode 15620 avg length: 254 reward: 88
episode 15640 avg length: 271 reward: 126
episode 15660 avg length: 289 reward: 153
episode 15680 avg length: 231 reward: 104
episode 15700 avg length: 227 reward: 127
episode 15720 avg length: 174 reward: 82
episode 15740 avg length: 214 reward: 92
episode 15760 avg length: 190 reward: 89
episode 15780 avg length: 159 reward: 49
episode 15800 avg length: 222 reward: 100
episode 15820 avg length: 269 reward: 133
episode 15840 avg length: 243 reward: 100
episode 15860 avg length: 191 reward: 68
episode 15880 avg length: 221 reward: 86
episode 15900 avg length: 206 reward: 109
episode 15920 avg length: 228 reward: 89
episode 15940 avg length: 250 reward: 108
episode 15960 avg length: 229 reward: 110
episode 15980 avg length: 263 reward: 139
episode 16000 avg length: 250 reward: 125
episode 16020 avg length: 270 reward: 140
episode 16040 avg length: 251 reward: 131
episode 16060 avg length: 258 reward: 124
episode 16080 avg length: 268 reward: 130
episode 16100 avg length: 263 reward: 125
episode 16120 avg length: 280 reward: 150
episode 16140 avg length: 267 reward: 132
episode 16160 avg length: 284 reward: 137
episode 16180 avg length: 275 reward: 128
episode 16200 avg length: 269 reward: 132
episode 16220 avg length: 280 reward: 132
episode 16240 avg length: 279 reward: 145
episode 16260 avg length: 299 reward: 152
episode 16280 avg length: 238 reward: 112
episode 16300 avg length: 284 reward: 159
episode 16320 avg length: 280 reward: 136
episode 16340 avg length: 271 reward: 120
episode 16360 avg length: 281 reward: 139
episode 16380 avg length: 267 reward: 141
episode 16400 avg length: 299 reward: 164
episode 16420 avg length: 239 reward: 113
episode 16440 avg length: 276 reward: 143
episode 16460 avg length: 268 reward: 144
episode 16480 avg length: 269 reward: 134
episode 16500 avg length: 273 reward: 148
episode 16520 avg length: 247 reward: 97
episode 16540 avg length: 266 reward: 129
episode 16560 avg length: 267 reward: 119
episode 16580 avg length: 270 reward: 124
episode 16600 avg length: 262 reward: 101
episode 16620 avg length: 257 reward: 121
episode 16640 avg length: 233 reward: 99
episode 16660 avg length: 268 reward: 114
episode 16680 avg length: 261 reward: 126
episode 16700 avg length: 278 reward: 143
episode 16720 avg length: 278 reward: 117
episode 16740 avg length: 266 reward: 135
episode 16760 avg length: 282 reward: 140
episode 16780 avg length: 299 reward: 154
episode 16800 avg length: 279 reward: 144
episode 16820 avg length: 281 reward: 124
episode 16840 avg length: 280 reward: 132
episode 16860 avg length: 278 reward: 148
episode 16880 avg length: 280 reward: 113
episode 16900 avg length: 268 reward: 133
episode 16920 avg length: 291 reward: 147
episode 16940 avg length: 274 reward: 150
episode 16960 avg length: 281 reward: 137
episode 16980 avg length: 251 reward: 126
episode 17000 avg length: 261 reward: 135
episode 17020 avg length: 267 reward: 105
episode 17040 avg length: 274 reward: 176
episode 17060 avg length: 262 reward: 131
episode 17080 avg length: 186 reward: 184
episode 17100 avg length: 225 reward: 150
episode 17120 avg length: 201 reward: 218
episode 17140 avg length: 211 reward: 220
episode 17160 avg length: 221 reward: 218
episode 17180 avg length: 232 reward: 210
episode 17200 avg length: 216 reward: 220
episode 17220 avg length: 226 reward: 203
episode 17240 avg length: 198 reward: 170
episode 17260 avg length: 196 reward: 222
episode 17280 avg length: 214 reward: 196
episode 17300 avg length: 229 reward: 205
episode 17320 avg length: 183 reward: 192
episode 17340 avg length: 212 reward: 186
episode 17360 avg length: 192 reward: 164
########## solved! ##########
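The log prints one line every 20 episodes. Each line gives the mean episode length and the mean total reward over those 20 episodes, and the run ends with the `solved!` banner somewhere between the 17360 and 17380 log points. This section does not include the training loop, so the sketch below is only an assumption about how that bookkeeping could work.

In the sketch, rewards are summed over the current 20-episode window, and training stops as soon as that running sum exceeds `log_interval * solved_reward`. The check runs after every episode, so it can fire partway through a window. That would explain why the banner appears without a final log line.

The stopping threshold is not stated in this section. Windows averaging 218 to 222 did not trigger it, so under this rule the threshold must be higher than that. The value `solved_reward=230` is therefore an assumption. The `avg length` ceiling of 299 is consistent with a per-episode cap of about 300 steps.

`train_with_logging` and `fake_episode` are hypothetical names. `fake_episode` stands in for the real gym environment and PPO agent so that the sketch runs on its own.

```python
from typing import Callable, List, Optional, Tuple


def train_with_logging(
    run_episode: Callable[[int], Tuple[float, int]],
    max_episodes: int = 50000,
    log_interval: int = 20,
    solved_reward: float = 230.0,  # assumed threshold, not taken from the article
) -> Tuple[List[str], Optional[int]]:
    """Sketch of the window-based logging and early-stop rule (an assumption).

    run_episode(i) must return (total_reward, episode_length) for episode i.
    Returns the produced log lines and the episode at which training stopped
    as solved (None if it never did).
    """
    lines: List[str] = []
    running_reward = 0.0  # reward summed over the current logging window
    total_length = 0      # episode lengths summed over the current window

    for i_episode in range(1, max_episodes + 1):
        episode_reward, episode_length = run_episode(i_episode)
        running_reward += episode_reward
        total_length += episode_length

        # Early-stop check runs after every episode, before logging, so it
        # can trigger in the middle of a window.
        if running_reward > log_interval * solved_reward:
            lines.append("########## solved! ##########")
            return lines, i_episode

        # Every log_interval episodes: report window averages, then reset.
        if i_episode % log_interval == 0:
            avg_length = int(total_length / log_interval)
            avg_reward = int(running_reward / log_interval)
            lines.append(
                f"episode {i_episode} avg length: {avg_length} reward: {avg_reward}"
            )
            running_reward = 0.0
            total_length = 0

    return lines, None


def fake_episode(i: int) -> Tuple[float, int]:
    # Hypothetical stand-in for one LunarLander episode: a weak policy for
    # the first 40 episodes, then one that lands reliably.
    return (100.0, 250) if i <= 40 else (300.0, 200)


if __name__ == "__main__":
    log_lines, solved_at = train_with_logging(fake_episode)
    print("\n".join(log_lines))
    print("solved at episode", solved_at)
```

With the stand-in, the sketch logs two windows averaging 100 and then stops at episode 56. At that point 16 episodes of 300 have pushed the window sum (4800) past the assumed 4600 threshold. In a real run, `run_episode` would reset the environment, step the PPO policy until `done`, and return the episode's total reward and step count.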

This concludes the article on implementing a lunar lander with the PPO algorithm in PyTorch.