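How do you implement a lunar lander in Python with TensorFlow 2 and the PPO algorithm? This article walks through the analysis and a working solution, in the hope that it gives readers facing the same problem a simpler way forward.
Starting today we open a new chapter and study (or get dragged into) reinforcement learning together. In reinforcement learning, an agent observes its environment, analyzes the data, and takes actions so as to maximize its future reward.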
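On-policy vs Off-policy:
On-policy: the training data comes from the current agent's own ongoing interaction with the environment.
Off-policy: the agent being trained and the agent interacting with the environment are not the same agent; in other words, someone else's interaction with the environment supplies my training data.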
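One way to picture the difference is in how the collected data is stored. The sketch below is only an illustration, not part of the lander code: `OnPolicyBuffer` and `ReplayBuffer` are hypothetical names, and the "transitions" are plain integers standing in for real (state, action, reward) tuples.

```python
from collections import deque
import random


class OnPolicyBuffer:
    """Holds transitions from the current policy; emptied after every update."""

    def __init__(self):
        self.transitions = []

    def add(self, transition):
        self.transitions.append(transition)

    def drain(self):
        """Return all stored transitions and clear the buffer."""
        batch, self.transitions = self.transitions, []
        return batch


class ReplayBuffer:
    """Keeps transitions produced by older policies so they can be reused."""

    def __init__(self, capacity):
        # Once full, the oldest transition is dropped automatically.
        self.transitions = deque(maxlen=capacity)

    def add(self, transition):
        self.transitions.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.transitions), batch_size)


on_policy = OnPolicyBuffer()
replay = ReplayBuffer(capacity=3)
for step in range(5):
    on_policy.add(step)
    replay.add(step)

batch = on_policy.drain()   # every transition, used for one update and then discarded
reused = replay.sample(2)   # two of the three most recent transitions, still stored
```

The on-policy buffer plays the same role as the `Memory` class used later in this article, which is cleared after each policy update.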
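PPO stands for Proximal Policy Optimization. PPO is an on-policy algorithm. It updates the policy in small steps on each batch of collected data, which addresses the problem that training becomes hard to learn from when the new policy drifts too far from the old one.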
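To make "small steps" concrete, here is a minimal sketch of the clipped surrogate objective that PPO is usually described with, written for a single sample in plain Python. The function name `clipped_surrogate` is an assumption made for this illustration; the default `eps_clip=0.2` matches the value used in the training script in this article.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantage, eps_clip=0.2):
    """PPO clipped objective for one sample (a quantity to maximize)."""
    # Probability ratio between the new and the old policy.
    ratio = math.exp(logp_new - logp_old)
    # Keep the ratio inside [1 - eps_clip, 1 + eps_clip].
    clipped_ratio = max(1.0 - eps_clip, min(ratio, 1.0 + eps_clip))
    # Take the more pessimistic of the two estimates.
    return min(ratio * advantage, clipped_ratio * advantage)


# Ratio 1.5 with a positive advantage: the gain is capped at 1.2 * advantage.
capped_gain = clipped_surrogate(math.log(1.5), 0.0, advantage=1.0)

# Ratio 0.5 with a negative advantage: the clipped (more negative) value is kept.
capped_loss = clipped_surrogate(math.log(0.5), 0.0, advantage=-1.0)
```

Because the objective stops improving once the ratio leaves the clipping range, a single update has little incentive to push the new policy far from the old one.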
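The Actor-Critic algorithm has two parts. The first is the policy function, the Actor, which generates actions and interacts with the environment; the second is the value function, the Critic, which evaluates how well the Actor is doing.
Gym is a package used very often in reinforcement learning. It collects environments for many games. Below we use LunarLander-v2 to build an automated "Apollo moon landing".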
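A common way for the value function to "evaluate" the policy is through the advantage: the return actually observed minus the value that was predicted. The sketch below assumes that formulation; the function name `advantages` and the numbers are made up for illustration.

```python
def advantages(returns, values):
    """Advantage of each step: observed return minus the predicted value."""
    return [g - v for g, v in zip(returns, values)]


returns = [10.0, 4.0, -2.0]  # hypothetical returns observed after each step
values = [8.0, 5.0, -2.0]    # hypothetical value predictions for the same steps
adv = advantages(returns, values)
```

A positive advantage means the action turned out better than expected and should become more likely; a negative one means the opposite.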
Installation:
pip install gym
If you run into this error:
AttributeError: module 'gym.envs.box2d' has no attribute 'LunarLander'
Fix:
pip install gym[box2d]
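LunarLander-v2 is a lunar lander environment. The landing pad is at coordinates (0, 0), and the coordinates are the first two numbers of the state vector. Moving from the top of the screen to the landing pad and coming to zero speed is worth about 100 to 140 points. If the lander crashes or comes to rest, the episode ends with an additional -100 or +100 points. Each leg touching the ground is worth +10, firing the main engine costs 0.3 points per frame, and the environment counts as solved at 200 points.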
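When reading the debug output it helps to give the eight state values names. The sketch below follows the layout given in the Gym documentation for LunarLander (position, velocity, angle, angular velocity, and two leg-contact flags). The field names and both helper functions are assumptions chosen for this illustration, not part of Gym.

```python
import math

# Assumed names for the 8 state components, in the order Gym documents them.
OBSERVATION_FIELDS = (
    "x", "y", "vx", "vy", "angle", "angular_velocity",
    "leg1_contact", "leg2_contact",
)


def describe_observation(observation):
    """Map the 8-number state vector to a dict with readable keys."""
    if len(observation) != len(OBSERVATION_FIELDS):
        raise ValueError("expected 8 values, got %d" % len(observation))
    return dict(zip(OBSERVATION_FIELDS, (float(v) for v in observation)))


def distance_to_pad(observation):
    """Distance from the lander to the landing pad at (0, 0)."""
    return math.hypot(observation[0], observation[1])


# One observation from a random run, taken at the moment a leg touches the ground.
sample = [0.57930076, -0.09440845, 1.4345247, -0.693939,
          -2.0783656, -5.4039164, 1.0, 0.0]
named = describe_observation(sample)
```

Here `named["leg1_contact"]` is 1.0, which shows one leg on the ground while the lander is still spinning quickly.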
Code:
import gym

# Create the environment
env = gym.make("LunarLander-v2")

# Reset the environment
env.reset()

# Run 180 steps with random actions
# (the loop does not reset when an episode ends, so later steps keep reporting -100)
for i in range(180):
    # Render the environment
    env.render()

    # Take a random action (pre-0.26 Gym API: step() returns four values)
    observation, reward, done, info = env.step(env.action_space.sample())

    if i % 10 == 0:
        # Debug output
        print("Observation:", observation)
        print("Reward:", reward)
Output:
Observation: [ 0.00861025 1.4061487 0.42930993 -0.11858992 -0.00789343 -0.05729095
0. 0. ]
Reward: 0.4097546298543773
Observation: [ 0.04917412 1.3876126 0.41002613 -0.13066985 -0.06578191 -0.12604967
0. 0. ]
Reward: -1.0858669952763478
Observation: [ 0.08917055 1.3429415 0.43598312 -0.2890789 -0.17471936 -0.23913136
0. 0. ]
Reward: -2.9339827504803666
Observation: [ 0.1326253 1.2450166 0.44708318 -0.5567949 -0.32039645 -0.28250334
0. 0. ]
Reward: -2.2779730990326357
Observation: [ 0.18323365 1.1110108 0.615291 -0.61922276 -0.43743232 -0.2921057
0. 0. ]
Reward: -3.107298313736037
Observation: [ 0.24544087 0.94960684 0.66677517 -0.7835077 -0.5929364 -0.2968613
0. 0. ]
Reward: -0.5472611013563438
Observation: [ 0.3148238 0.75122666 0.7238519 -0.98458177 -0.72915816 -0.26130882
0. 0. ]
Reward: -2.5665300894414416
Observation: [ 0.38628978 0.49828076 0.74157137 -1.2624744 -0.85754734 -0.37227553
0. 0. ]
Reward: -3.2562193227533087
Observation: [ 0.46820658 0.18855602 0.92624503 -1.4677961 -1.08614 -0.4508995
0. 0. ]
Reward: -4.017106927961208
Observation: [ 0.57930076 -0.09440845 1.4345247 -0.693939 -2.0783656 -5.4039164
1. 0. ]
Reward: -100
Observation: [ 0.7383894 -0.08930686 1.4662493 -0.13461255 -3.653495 -3.109081
0. 0. ]
Reward: -100
Observation: [ 0.859124 -0.08471288 0.9377837 0.21408719 -3.8998525 0.10151418
0. 0. ]
Reward: -100
Observation: [ 9.3801367e-01 -4.6761338e-02 6.5999150e-01 1.4583524e-01
-3.9281998e+00 -4.7179851e-06 0.0000000e+00 1.0000000e+00]
Reward: -100
Observation: [ 0.9879366 -0.04012476 0.33624884 0.08859511 -4.253908 -1.0233303
0. 0. ]
Reward: -100
Observation: [ 1.0056045 -0.03840658 0.0733737 0.01812508 -4.6796274 -0.6103991
0. 0. ]
Reward: -100
Observation: [ 1.0112988 -0.03921754 0.07890484 -0.00624387 -4.845023 -0.17111658
0. 0. ]
Reward: -100
Observation: [ 1.0234139 -0.04488504 0.15701209 -0.0331554 -4.829875 0.07602684
0. 0. ]
Reward: -100
Observation: [ 1.0306002e+00 -4.8987642e-02 -1.1189224e-02 8.7506004e-04
-4.8712435e+00 -1.5446089e-01 0.0000000e+00 0.0000000e+00]
Reward: -100
# PPO.py (the training script imports this module with `from PPO import Memory, PPO`)
import numpy as np
import tensorflow as tf
from tensorflow_probability.python.distributions import Categorical


class Memory:
    def __init__(self):
        """Initialize the buffers."""
        self.actions = []  # actions (4 kinds)
        self.states = []  # states, 8 numbers each
        self.logprobs = []  # log-probabilities of the chosen actions
        self.rewards = []  # rewards
        self.is_terminals = []  # whether the episode has ended

    def clear_memory(self):
        """Clear the memory."""
        del self.actions[:]
        del self.states[:]
        del self.logprobs[:]
        del self.rewards[:]
        del self.is_terminals[:]


class ActorCritic(tf.keras.Model):
    def __init__(self, state_dim, action_dim, n_latent_var):
        super(ActorCritic, self).__init__()

        # Actor
        self.action_layer = tf.keras.Sequential([
            # [b, 8] => [b, 64]
            tf.keras.layers.Dense(n_latent_var, activation="tanh"),
            # [b, 64] => [b, 64]
            tf.keras.layers.Dense(n_latent_var, activation="tanh"),
            # [b, 64] => [b, 4]
            tf.keras.layers.Dense(action_dim, activation="softmax")
        ])

        # Critic
        self.value_layer = tf.keras.Sequential([
            # [b, 8] => [b, 64]
            tf.keras.layers.Dense(n_latent_var, activation="tanh"),
            # [b, 64] => [b, 64]
            tf.keras.layers.Dense(n_latent_var, activation="tanh"),
            # [b, 64] => [b, 1]
            tf.keras.layers.Dense(1)
        ])

    def forward(self):
        """Forward pass; act() is used instead."""
        raise NotImplementedError

    def build(self, input_shape):
        # No weight to train.
        super(ActorCritic, self).build(input_shape)  # Be sure to call this at the end

    def act(self, state, memory):
        """Choose an action."""
        # Probabilities of the 4 actions
        action_probs = self.action_layer(state)

        # Sample an action from the categorical distribution.
        # probs= is passed explicitly: Categorical's first positional argument is logits.
        dist = Categorical(probs=action_probs)
        action = dist.sample()

        # Store in memory
        memory.states.append(state)
        memory.actions.append(action)
        memory.logprobs.append(dist.log_prob(action))

        # Return the action
        return action.numpy()[0]

    def evaluate(self, state, action):
        """
        Evaluate a batch.
        :param state: states, in batches of 2000, shape [2000, 1, 8]
        :param action: actions, in batches of 2000, shape [2000, 1]
        :return: log-probabilities, state values, entropies
        """
        # Action probabilities
        action_probs = self.action_layer(state)
        dist = Categorical(probs=action_probs)  # categorical distribution

        # Log-probability of each action
        action_logprobs = dist.log_prob(action)

        # Entropy
        dist_entropy = dist.entropy()
        dist_entropy = tf.squeeze(dist_entropy)

        # Critic
        state_value = self.value_layer(state)
        state_value = tf.squeeze(state_value)  # [2000, 1, 1] => [2000]

        # Return the action log-probabilities, the state values, and the entropies
        return action_logprobs, state_value, dist_entropy


class PPO:
    def __init__(self, state_dim, action_dim, n_latent_var, lr, betas, gamma, K_epochs, eps_clip):
        self.lr = lr  # learning rate
        self.betas = betas  # Adam betas
        self.gamma = gamma  # discount factor
        self.eps_clip = eps_clip  # clipping range
        self.K_epochs = K_epochs  # number of update epochs

        # Initialize the policies
        self.policy = ActorCritic(state_dim, action_dim, n_latent_var)
        self.policy_old = ActorCritic(state_dim, action_dim, n_latent_var)

        # Optimizer (learning_rate replaces the deprecated lr argument; betas are now passed through)
        self.optimizer = tf.keras.optimizers.Adam(learning_rate=lr, beta_1=betas[0], beta_2=betas[1])
        self.MseLoss = tf.keras.losses.MeanSquaredError()  # loss function

    def update(self, memory):
        """Update the policy."""
        # Monte Carlo estimate of the state returns
        rewards = []
        discounted_reward = 0
        for reward, is_terminal in zip(reversed(memory.rewards), reversed(memory.is_terminals)):
            # Episode boundary
            if is_terminal:
                discounted_reward = 0

            # Discounted return: current reward + gamma * return of the following step
            discounted_reward = reward + (self.gamma * discounted_reward)

            # Insert at the front
            rewards.insert(0, discounted_reward)

        # Normalize the returns
        rewards = tf.convert_to_tensor(rewards, dtype=tf.float32)
        rewards = (rewards - np.mean(rewards)) / (np.std(rewards) + 1e-5)

        # Convert to tensors
        old_states = tf.stack(memory.states)
        old_actions = tf.stack(memory.actions)
        old_logprobs = tf.stack(memory.logprobs)

        # Optimize for K epochs
        for _ in range(self.K_epochs):
            with tf.GradientTape() as tape:
                # Evaluate
                logprobs, state_values, dist_entropy = self.policy.evaluate(old_states, old_actions)

                # Probability ratios
                ratios = tf.exp(logprobs - old_logprobs)
                ratios = tf.squeeze(ratios)

                # Loss
                advantages = rewards - state_values
                surr1 = ratios * advantages
                surr2 = tf.clip_by_value(ratios, 1 - self.eps_clip, 1 + self.eps_clip) * advantages
                loss = -tf.minimum(surr1, surr2) + 0.5 * self.MseLoss(state_values, rewards) - 0.01 * dist_entropy

            # Apply the gradients
            grads = tape.gradient(loss, self.policy.action_layer.trainable_variables + self.policy.value_layer.trainable_variables)
            self.optimizer.apply_gradients(zip(grads, self.policy.action_layer.trainable_variables + self.policy.value_layer.trainable_variables))

        # Hand the new weights to the old policy
        # (this shares the layer objects rather than copying the weights)
        self.policy_old.action_layer = self.policy.action_layer
        self.policy_old.value_layer = self.policy.value_layer
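The Monte Carlo return computation at the start of `update` is easier to follow in isolation. The sketch below reproduces the same backward pass in plain Python. The function name `discounted_returns` and the sample numbers are assumptions for illustration, and it appends then reverses instead of inserting at the front, which gives the same result.

```python
def discounted_returns(rewards, is_terminals, gamma=0.99):
    """Walk the rewards backwards, restarting the sum at each episode end."""
    returns = []
    running = 0.0
    for reward, is_terminal in zip(reversed(rewards), reversed(is_terminals)):
        if is_terminal:
            # Last step of an episode: nothing from later episodes leaks in.
            running = 0.0
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Two episodes stored back to back: the first ends at index 2.
rewards = [1.0, 1.0, 1.0, 2.0]
is_terminals = [False, False, True, False]
result = discounted_returns(rewards, is_terminals, gamma=0.5)
# result == [1.75, 1.5, 1.0, 2.0]
```

The reward of 2.0 at index 3 belongs to the next episode, so it does not contribute to the returns of the first three steps.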
# Training script (written against the pre-0.26 Gym API:
# reset() returns the observation and step() returns four values)
import gym
import tensorflow as tf
from PPO import Memory, PPO

############## Hyperparameters ##############
env_name = "LunarLander-v2"  # environment name
env = gym.make(env_name)
state_dim = 8  # state dimension
action_dim = 4  # action dimension
render = False  # visualization
solved_reward = 230  # stop condition (reward > 230)
log_interval = 20  # print avg reward in the interval
max_episodes = 50000  # maximum number of training episodes
max_timesteps = 300  # maximum steps per episode
n_latent_var = 64  # width of the hidden dense layers
update_timestep = 2000  # update the policy every 2000 steps
lr = 0.002  # learning rate
betas = (0.9, 0.999)  # Adam betas
gamma = 0.99  # discount factor
K_epochs = 4  # policy update epochs per batch
eps_clip = 0.2  # PPO clipping range
#############################################


def main():
    # Instantiate
    memory = Memory()
    ppo = PPO(state_dim, action_dim, n_latent_var, lr, betas, gamma, K_epochs, eps_clip)

    # Running totals
    total_reward = 0
    total_length = 0
    timestep = 0

    # Training loop
    for i_episode in range(1, max_episodes + 1):
        # Reset the environment (start a new game)
        state = env.reset()

        # Convert to a tensor
        state = tf.convert_to_tensor(state)
        state = tf.reshape(state, [1, 8])

        # Step through the episode
        for t in range(max_timesteps):
            timestep += 1

            # Choose an action with the old policy
            action = ppo.policy_old.act(state, memory)

            # Step: returns (new state, reward, done flag, extra debug info)
            state, reward, done, _ = env.step(action)

            # Convert to a tensor
            state = tf.convert_to_tensor(state)
            state = tf.reshape(state, [1, 8])

            # Record the reward and the done flag
            memory.rewards.append(reward)
            memory.is_terminals.append(done)

            # Update the policy
            if timestep % update_timestep == 0:
                ppo.update(memory)

                # Clear the memory
                memory.clear_memory()

                # Reset the step counter
                timestep = 0

            # Accumulate the reward
            total_reward += reward

            # Visualization
            if render:
                env.render()

            # Leave the loop when the episode ends
            if done:
                break

        # Episode length
        total_length += t

        # Stop once the target (an average of 230 points over the interval) is reached
        if total_reward >= (log_interval * solved_reward):
            print("########## Solved! ##########")

            # Save the models
            tf.keras.models.save_model(ppo.policy.action_layer, r"\model\action")
            tf.keras.models.save_model(ppo.policy.value_layer, r"\model\value")

            # Leave the training loop
            break

        # Log every 20 episodes
        if i_episode % log_interval == 0:
            # Average length and reward over the last 20 episodes
            avg_length = int(total_length / log_interval)
            running_reward = int(total_reward / log_interval)

            # Debug output
            print('Episode {} \t avg length: {} \t average_reward: {}'.format(i_episode, avg_length, running_reward))

            # Reset the totals
            total_reward = 0
            total_length = 0


if __name__ == '__main__':
    main()
Training output:
Episode 20 avg length: 93 reward: -243
Episode 40 avg length: 92 reward: -172
Episode 60 avg length: 79 reward: -192
Episode 80 avg length: 85 reward: -164
Episode 100 avg length: 90 reward: -179
Episode 120 avg length: 100 reward: -201
Episode 140 avg length: 91 reward: -175
Episode 160 avg length: 101 reward: -141
Episode 180 avg length: 86 reward: -153
Episode 200 avg length: 93 reward: -189
Episode 220 avg length: 96 reward: -221
Episode 240 avg length: 105 reward: -140
Episode 260 avg length: 94 reward: -121
Episode 280 avg length: 91 reward: -131
Episode 300 avg length: 91 reward: -122
Episode 320 avg length: 90 reward: -113
Episode 340 avg length: 100 reward: -110
Episode 360 avg length: 110 reward: -92
Episode 380 avg length: 110 reward: -75
Episode 400 avg length: 119 reward: -76
Episode 420 avg length: 162 reward: -77
Episode 440 avg length: 194 reward: -91
Episode 460 avg length: 144 reward: -28
Episode 480 avg length: 192 reward: -8
Episode 500 avg length: 244 reward: -25
Episode 520 avg length: 239 reward: -1
Episode 540 avg length: 269 reward: 21
Episode 560 avg length: 289 reward: 27
Episode 580 avg length: 270 reward: 65
Episode 600 avg length: 264 reward: 86
Episode 620 avg length: 256 reward: 66
Episode 640 avg length: 278 reward: 75
Episode 660 avg length: 235 reward: 11
Episode 680 avg length: 244 reward: 84
Episode 700 avg length: 253 reward: 73
Episode 720 avg length: 292 reward: 63
Episode 740 avg length: 293 reward: 104
Episode 760 avg length: 279 reward: 109
Episode 780 avg length: 246 reward: 86
Episode 800 avg length: 260 reward: 124
Episode 820 avg length: 276 reward: 131
Episode 840 avg length: 269 reward: 121
Episode 860 avg length: 194 reward: 67
Episode 880 avg length: 241 reward: 94
Episode 900 avg length: 259 reward: 98
Episode 920 avg length: 211 reward: 83
Episode 940 avg length: 260 reward: 105
Episode 960 avg length: 194 reward: 65
Episode 980 avg length: 202 reward: 68
Episode 1000 avg length: 243 reward: 79
Episode 1020 avg length: 260 reward: 66
Episode 1040 avg length: 289 reward: 117
Episode 1060 avg length: 252 reward: 94
Episode 1080 avg length: 262 reward: 114
Episode 1100 avg length: 272 reward: 112
Episode 1120 avg length: 263 reward: 97
Episode 1140 avg length: 256 reward: 93
Episode 1160 avg length: 274 reward: 120
Episode 1180 avg length: 256 reward: 117
Episode 1200 avg length: 241 reward: 105
Episode 1220 avg length: 238 reward: 103
Episode 1240 avg length: 267 reward: 121
Episode 1260 avg length: 283 reward: 124
Episode 1280 avg length: 299 reward: 149
Episode 1300 avg length: 281 reward: 126
Episode 1320 avg length: 266 reward: 102
Episode 1340 avg length: 282 reward: 128
Episode 1360 avg length: 275 reward: 114
Episode 1380 avg length: 285 reward: 105
Episode 1400 avg length: 294 reward: 123
Episode 1420 avg length: 293 reward: 132
Episode 1440 avg length: 248 reward: 85
Episode 1460 avg length: 281 reward: 115
Episode 1480 avg length: 291 reward: 152
Episode 1500 avg length: 279 reward: 130
Episode 1520 avg length: 267 reward: 103
Episode 1540 avg length: 270 reward: 137
Episode 1560 avg length: 269 reward: 120
Episode 1580 avg length: 260 reward: 113
Episode 1600 avg length: 282 reward: 147
Episode 1620 avg length: 259 reward: 125
Episode 1640 avg length: 240 reward: 90
Episode 1660 avg length: 284 reward: 125
Episode 1680 avg length: 282 reward: 123
Episode 1700 avg length: 274 reward: 123
Episode 1720 avg length: 273 reward: 130
Episode 1740 avg length: 260 reward: 117
Episode 1760 avg length: 243 reward: 106
Episode 1780 avg length: 241 reward: 90
Episode 1800 avg length: 290 reward: 144
Episode 1820 avg length: 258 reward: 131
Episode 1840 avg length: 283 reward: 142
Episode 1860 avg length: 262 reward: 100
Episode 1880 avg length: 273 reward: 132
Episode 1900 avg length: 255 reward: 92
Episode 1920 avg length: 251 reward: 117
Episode 1940 avg length: 220 reward: 103
Episode 1960 avg length: 221 reward: 111
Episode 1980 avg length: 205 reward: 83
Episode 2000 avg length: 227 reward: 102
Episode 2020 avg length: 251 reward: 123
Episode 2040 avg length: 227 reward: 100
Episode 2060 avg length: 255 reward: 135
Episode 2080 avg length: 273 reward: 136
Episode 2100 avg length: 256 reward: 126
Episode 2120 avg length: 273 reward: 141
Episode 2140 avg length: 280 reward: 109
Episode 2160 avg length: 266 reward: 112
Episode 2180 avg length: 249 reward: 88
Episode 2200 avg length: 247 reward: 119
Episode 2220 avg length: 270 reward: 143
Episode 2240 avg length: 257 reward: 65
Episode 2260 avg length: 250 reward: 30
Episode 2280 avg length: 261 reward: 112
Episode 2300 avg length: 270 reward: 139
Episode 2320 avg length: 275 reward: 128
Episode 2340 avg length: 290 reward: 149
Episode 2360 avg length: 269 reward: 139
Episode 2380 avg length: 272 reward: 137
Episode 2400 avg length: 232 reward: 105
Episode 2420 avg length: 242 reward: 127
Episode 2440 avg length: 241 reward: 134
Episode 2460 avg length: 249 reward: 113
Episode 2480 avg length: 287 reward: 154
Episode 2500 avg length: 289 reward: 149
Episode 2520 avg length: 258 reward: 129
Episode 2540 avg length: 250 reward: 101
Episode 2560 avg length: 287 reward: 158
Episode 2580 avg length: 271 reward: 145
Episode 2600 avg length: 253 reward: 120
Episode 2620 avg length: 255 reward: 127
Episode 2640 avg length: 254 reward: 122
Episode 2660 avg length: 238 reward: 123
Episode 2680 avg length: 243 reward: 115
Episode 2700 avg length: 241 reward: 93
Episode 2720 avg length: 232 reward: 90
Episode 2740 avg length: 215 reward: 83
Episode 2760 avg length: 241 reward: 112
Episode 2780 avg length: 273 reward: 129
Episode 2800 avg length: 269 reward: 133
Episode 2820 avg length: 246 reward: 91
Episode 2840 avg length: 261 reward: 130
Episode 2860 avg length: 261 reward: 136
Episode 2880 avg length: 289 reward: 128
Episode 2900 avg length: 271 reward: 131
Episode 2920 avg length: 277 reward: 145
Episode 2940 avg length: 251 reward: 117
Episode 2960 avg length: 253 reward: 120
Episode 2980 avg length: 270 reward: 133
Episode 3000 avg length: 240 reward: 85
Episode 3020 avg length: 284 reward: 141
Episode 3040 avg length: 255 reward: 117
Episode 3060 avg length: 299 reward: 134
Episode 3080 avg length: 263 reward: 122
Episode 3100 avg length: 259 reward: 126
Episode 3120 avg length: 270 reward: 125
Episode 3140 avg length: 299 reward: 150
Episode 3160 avg length: 256 reward: 116
Episode 3180 avg length: 264 reward: 124
Episode 3200 avg length: 271 reward: 128
Episode 3220 avg length: 259 reward: 122
Episode 3240 avg length: 261 reward: 125
Episode 3260 avg length: 271 reward: 129
Episode 3280 avg length: 242 reward: 126
Episode 3300 avg length: 218 reward: 93
Episode 3320 avg length: 230 reward: 116
Episode 3340 avg length: 223 reward: 109
Episode 3360 avg length: 249 reward: 122
Episode 3380 avg length: 224 reward: 104
Episode 3400 avg length: 261 reward: 131
Episode 3420 avg length: 280 reward: 140
Episode 3440 avg length: 264 reward: 125
Episode 3460 avg length: 247 reward: 105
Episode 3480 avg length: 276 reward: 141
Episode 3500 avg length: 282 reward: 149
Episode 3520 avg length: 282 reward: 141
Episode 3540 avg length: 290 reward: 152
Episode 3560 avg length: 282 reward: 141
Episode 3580 avg length: 291 reward: 151
Episode 3600 avg length: 289 reward: 166
Episode 3620 avg length: 266 reward: 142
Episode 3640 avg length: 277 reward: 91
Episode 3660 avg length: 272 reward: 114
Episode 3680 avg length: 281 reward: 159
Episode 3700 avg length: 287 reward: 160
Episode 3720 avg length: 254 reward: 78
Episode 3740 avg length: 296 reward: 174
Episode 3760 avg length: 267 reward: 124
Episode 3780 avg length: 273 reward: 148
Episode 3800 avg length: 275 reward: 147
Episode 3820 avg length: 276 reward: 145
Episode 3840 avg length: 283 reward: 151
Episode 3860 avg length: 275 reward: 142
Episode 3880 avg length: 290 reward: 142
Episode 3900 avg length: 290 reward: 154
Episode 3920 avg length: 283 reward: 141
Episode 3940 avg length: 273 reward: 145
Episode 3960 avg length: 290 reward: 161
Episode 3980 avg length: 268 reward: 145
Episode 4000 avg length: 270 reward: 142
Episode 4020 avg length: 283 reward: 156
Episode 4040 avg length: 283 reward: 149
Episode 4060 avg length: 299 reward: 172
Episode 4080 avg length: 292 reward: 158
Episode 4100 avg length: 274 reward: 143
Episode 4120 avg length: 299 reward: 163
Episode 4140 avg length: 290 reward: 153
Episode 4160 avg length: 299 reward: 165
Episode 4180 avg length: 290 reward: 160
Episode 4200 avg length: 299 reward: 157
Episode 4220 avg length: 299 reward: 171
Episode 4240 avg length: 271 reward: 148
Episode 4260 avg length: 265 reward: 139
Episode 4280 avg length: 258 reward: 137
Episode 4300 avg length: 280 reward: 137
Episode 4320 avg length: 262 reward: 133
Episode 4340 avg length: 255 reward: 110
Episode 4360 avg length: 275 reward: 134
Episode 4380 avg length: 282 reward: 154
Episode 4400 avg length: 264 reward: 128
Episode 4420 avg length: 299 reward: 150
Episode 4440 avg length: 275 reward: 151
Episode 4460 avg length: 257 reward: 116
Episode 4480 avg length: 256 reward: 104
Episode 4500 avg length: 263 reward: 134
Episode 4520 avg length: 299 reward: 164
Episode 4540 avg length: 265 reward: 137
Episode 4560 avg length: 265 reward: 147
Episode 4580 avg length: 283 reward: 138
Episode 4600 avg length: 299 reward: 152
Episode 4620 avg length: 281 reward: 154
Episode 4640 avg length: 289 reward: 161
Episode 4660 avg length: 264 reward: 143
Episode 4680 avg length: 285 reward: 138
Episode 4700 avg length: 291 reward: 143
Episode 4720 avg length: 280 reward: 154
Episode 4740 avg length: 284 reward: 125
Episode 4760 avg length: 296 reward: 136
Episode 4780 avg length: 254 reward: 127
Episode 4800 avg length: 281 reward: 147
Episode 4820 avg length: 282 reward: 143
Episode 4840 avg length: 243 reward: 119
Episode 4860 avg length: 280 reward: 139
Episode 4880 avg length: 270 reward: 137
Episode 4900 avg length: 278 reward: 150
Episode 4920 avg length: 203 reward: 83
Episode 4940 avg length: 272 reward: 153
Episode 4960 avg length: 289 reward: 151
Episode 4980 avg length: 289 reward: 157
Episode 5000 avg length: 299 reward: 168
Episode 5020 avg length: 292 reward: 136
Episode 5040 avg length: 290 reward: 158
Episode 5060 avg length: 286 reward: 157
Episode 5080 avg length: 282 reward: 154
Episode 5100 avg length: 278 reward: 121
Episode 5120 avg length: 291 reward: 138
Episode 5140 avg length: 297 reward: 143
Episode 5160 avg length: 290 reward: 165
Episode 5180 avg length: 290 reward: 157
Episode 5200 avg length: 276 reward: 150
Episode 5220 avg length: 278 reward: 149
Episode 5240 avg length: 287 reward: 153
Episode 5260 avg length: 274 reward: 145
Episode 5280 avg length: 299 reward: 176
Episode 5300 avg length: 299 reward: 173
Episode 5320 avg length: 299 reward: 164
Episode 5340 avg length: 271 reward: 157
Episode 5360 avg length: 299 reward: 180
Episode 5380 avg length: 279 reward: 156
Episode 5400 avg length: 268 reward: 133
Episode 5420 avg length: 279 reward: 136
Episode 5440 avg length: 278 reward: 130
Episode 5460 avg length: 268 reward: 137
Episode 5480 avg length: 273 reward: 152
Episode 5500 avg length: 299 reward: 168
Episode 5520 avg length: 266 reward: 95
Episode 5540 avg length: 294 reward: 146
Episode 5560 avg length: 289 reward: 165
Episode 5580 avg length: 288 reward: 139
Episode 5600 avg length: 299 reward: 174
Episode 5620 avg length: 291 reward: 168
Episode 5640 avg length: 281 reward: 147
Episode 5660 avg length: 270 reward: 126
Episode 5680 avg length: 263 reward: 153
Episode 5700 avg length: 283 reward: 161
Episode 5720 avg length: 271 reward: 154
Episode 5740 avg length: 281 reward: 154
Episode 5760 avg length: 281 reward: 144
Episode 5780 avg length: 272 reward: 145
Episode 5800 avg length: 275 reward: 128
Episode 5820 avg length: 290 reward: 159
Episode 5840 avg length: 274 reward: 142
Episode 5860 avg length: 243 reward: 122
Episode 5880 avg length: 236 reward: 124
Episode 5900 avg length: 255 reward: 139
Episode 5920 avg length: 288 reward: 140
Episode 5940 avg length: 271 reward: 140
Episode 5960 avg length: 254 reward: 108
Episode 5980 avg length: 299 reward: 149
Episode 6000 avg length: 289 reward: 149
Episode 6020 avg length: 258 reward: 109
Episode 6040 avg length: 289 reward: 129
Episode 6060 avg length: 238 reward: 94
Episode 6080 avg length: 270 reward: 87
Episode 6100 avg length: 268 reward: 96
Episode 6120 avg length: 279 reward: 142
Episode 6140 avg length: 233 reward: 112
Episode 6160 avg length: 268 reward: 142
Episode 6180 avg length: 260 reward: 133
Episode 6200 avg length: 210 reward: 109
Episode 6220 avg length: 248 reward: 111
Episode 6240 avg length: 229 reward: 92
Episode 6260 avg length: 210 reward: 98
Episode 6280 avg length: 218 reward: 102
Episode 6300 avg length: 225 reward: 117
Episode 6320 avg length: 235 reward: 112
Episode 6340 avg length: 259 reward: 124
Episode 6360 avg length: 252 reward: 113
Episode 6380 avg length: 239 reward: 119
Episode 6400 avg length: 242 reward: 95
Episode 6420 avg length: 249 reward: 111
Episode 6440 avg length: 257 reward: 136
Episode 6460 avg length: 259 reward: 123
Episode 6480 avg length: 259 reward: 112
Episode 6500 avg length: 259 reward: 129
Episode 6520 avg length: 215 reward: 101
Episode 6540 avg length: 249 reward: 137
Episode 6560 avg length: 245 reward: 121
Episode 6580 avg length: 259 reward: 127
Episode 6600 avg length: 267 reward: 142
Episode 6620 avg length: 257 reward: 86
Episode 6640 avg length: 278 reward: 141
Episode 6660 avg length: 255 reward: 92
Episode 6680 avg length: 289 reward: 145
Episode 6700 avg length: 259 reward: 133
Episode 6720 avg length: 247 reward: 116
Episode 6740 avg length: 243 reward: 56
Episode 6760 avg length: 274 reward: 114
Episode 6780 avg length: 279 reward: 133
Episode 6800 avg length: 269 reward: 152
Episode 6820 avg length: 252 reward: 105
Episode 6840 avg length: 254 reward: 123
Episode 6860 avg length: 253 reward: 98
Episode 6880 avg length: 273 reward: 132
Episode 6900 avg length: 249 reward: 108
Episode 6920 avg length: 248 reward: 84
Episode 6940 avg length: 250 reward: 107
Episode 6960 avg length: 279 reward: 99
Episode 6980 avg length: 279 reward: 140
Episode 7000 avg length: 270 reward: 105
Episode 7020 avg length: 250 reward: 109
Episode 7040 avg length: 202 reward: 87
Episode 7060 avg length: 188 reward: 56
Episode 7080 avg length: 229 reward: 93
Episode 7100 avg length: 248 reward: 105
Episode 7120 avg length: 218 reward: 105
Episode 7140 avg length: 213 reward: 77
Episode 7160 avg length: 279 reward: 128
Episode 7180 avg length: 247 reward: 110
Episode 7200 avg length: 269 reward: 124
Episode 7220 avg length: 217 reward: 64
Episode 7240 avg length: 258 reward: 140
Episode 7260 avg length: 279 reward: 116
Episode 7280 avg length: 244 reward: 97
Episode 7300 avg length: 245 reward: 104
Episode 7320 avg length: 213 reward: 81
Episode 7340 avg length: 268 reward: 126
Episode 7360 avg length: 277 reward: 124
Episode 7380 avg length: 251 reward: 122
Episode 7400 avg length: 234 reward: 108
Episode 7420 avg length: 267 reward: 127
Episode 7440 avg length: 218 reward: 89
Episode 7460 avg length: 199 reward: 80
Episode 7480 avg length: 154 reward: 55
Episode 7500 avg length: 228 reward: 114
Episode 7520 avg length: 197 reward: 49
Episode 7540 avg length: 147 reward: 59
Episode 7560 avg length: 139 reward: 49
Episode 7580 avg length: 181 reward: 74
Episode 7600 avg length: 191 reward: 61
Episode 7620 avg length: 176 reward: 78
Episode 7640 avg length: 160 reward: 35
Episode 7660 avg length: 159 reward: 50
Episode 7680 avg length: 143 reward: 68
Episode 7700 avg length: 227 reward: 103
Episode 7720 avg length: 192 reward: 59
Episode 7740 avg length: 248 reward: 118
Episode 7760 avg length: 250 reward: 128
Episode 7780 avg length: 261 reward: 110
Episode 7800 avg length: 279 reward: 157
Episode 7820 avg length: 249 reward: 153
Episode 7840 avg length: 212 reward: 78
Episode 7860 avg length: 249 reward: 144
Episode 7880 avg length: 257 reward: 107
Episode 7900 avg length: 271 reward: 136
Episode 7920 avg length: 244 reward: 129
Episode 7940 avg length: 262 reward: 145
Episode 7960 avg length: 224 reward: 94
Episode 7980 avg length: 247 reward: 110
Episode 8000 avg length: 190 reward: 81
Episode 8020 avg length: 157 reward: 67
Episode 8040 avg length: 171 reward: 67
Episode 8060 avg length: 203 reward: 96
Episode 8080 avg length: 225 reward: 87
Episode 8100 avg length: 166 reward: 84
Episode 8120 avg length: 196 reward: 82
Episode 8140 avg length: 249 reward: 120
Episode 8160 avg length: 216 reward: 112
Episode 8180 avg length: 178 reward: 97
Episode 8200 avg length: 221 reward: 120
Episode 8220 avg length: 265 reward: 122
Episode 8240 avg length: 240 reward: 125
Episode 8260 avg length: 266 reward: 146
Episode 8280 avg length: 253 reward: 116
Episode 8300 avg length: 233 reward: 129
Episode 8320 avg length: 260 reward: 126
Episode 8340 avg length: 264 reward: 138
Episode 8360 avg length: 196 reward: 88
Episode 8380 avg length: 189 reward: 60
Episode 8400 avg length: 227 reward: 66
Episode 8420 avg length: 257 reward: 114
Episode 8440 avg length: 254 reward: 99
Episode 8460 avg length: 268 reward: 127
Episode 8480 avg length: 263 reward: 131
Episode 8500 avg length: 246 reward: 107
Episode 8520 avg length: 281 reward: 127
Episode 8540 avg length: 273 reward: 146
Episode 8560 avg length: 290 reward: 124
Episode 8580 avg length: 261 reward: 103
Episode 8600 avg length: 294 reward: 140
Episode 8620 avg length: 236 reward: 110
Episode 8640 avg length: 261 reward: 125
Episode 8660 avg length: 284 reward: 108
Episode 8680 avg length: 278 reward: 141
Episode 8700 avg length: 256 reward: 124
Episode 8720 avg length: 245 reward: 95
Episode 8740 avg length: 258 reward: 136
Episode 8760 avg length: 289 reward: 147
Episode 8780 avg length: 229 reward: 98
Episode 8800 avg length: 277 reward: 138
Episode 8820 avg length: 237 reward: 129
Episode 8840 avg length: 276 reward: 141
Episode 8860 avg length: 224 reward: 102
Episode 8880 avg length: 220 reward: 108
Episode 8900 avg length: 277 reward: 137
Episode 8920 avg length: 259 reward: 120
Episode 8940 avg length: 242 reward: 124
Episode 8960 avg length: 275 reward: 119
Episode 8980 avg length: 256 reward: 140
Episode 9000 avg length: 263 reward: 110
Episode 9020 avg length: 247 reward: 101
Episode 9040 avg length: 251 reward: 99
Episode 9060 avg length: 266 reward: 128
Episode 9080 avg length: 247 reward: 119
Episode 9100 avg length: 227 reward: 95
Episode 9120 avg length: 242 reward: 95
Episode 9140 avg length: 234 reward: 120
Episode 9160 avg length: 271 reward: 145
Episode 9180 avg length: 234 reward: 106
Episode 9200 avg length: 230 reward: 102
Episode 9220 avg length: 217 reward: 111
Episode 9240 avg length: 182 reward: 68
Episode 9260 avg length: 225 reward: 111
Episode 9280 avg length: 224 reward: 110
Episode 9300 avg length: 195 reward: 97
Episode 9320 avg length: 245 reward: 110
Episode 9340 avg length: 249 reward: 87
Episode 9360 avg length: 238 reward: 105
Episode 9380 avg length: 231 reward: 83
Episode 9400 avg length: 245 reward: 60
Episode 9420 avg length: 251 reward: 81
Episode 9440 avg length: 218 reward: 86
Episode 9460 avg length: 177 reward: 62
Episode 9480 avg length: 212 reward: 64
Episode 9500 avg length: 213 reward: 96
Episode 9520 avg length: 267 reward: 121
Episode 9540 avg length: 195 reward: 89
Episode 9560 avg length: 259 reward: 140
Episode 9580 avg length: 246 reward: 116
Episode 9600 avg length: 266 reward: 122
Episode 9620 avg length: 255 reward: 104
Episode 9640 avg length: 203 reward: 116
Episode 9660 avg length: 239 reward: 117
Episode 9680 avg length: 239 reward: 118
Episode 9700 avg length: 254 reward: 137
Episode 9720 avg length: 269 reward: 144
Episode 9740 avg length: 274 reward: 136
Episode 9760 avg length: 259 reward: 123
Episode 9780 avg length: 230 reward: 102
Episode 9800 avg length: 268 reward: 139
Episode 9820 avg length: 258 reward: 120
Episode 9840 avg length: 271 reward: 111
Episode 9860 avg length: 260 reward: 130
Episode 9880 avg length: 280 reward: 135
Episode 9900 avg length: 269 reward: 126
Episode 9920 avg length: 290 reward: 159
Episode 9940 avg length: 286 reward: 129
Episode 9960 avg length: 259 reward: 117
Episode 9980 avg length: 299 reward: 139
Episode 10000 avg length: 298 reward: 141
Episode 10020 avg length: 294 reward: 115
Episode 10040 avg length: 284 reward: 117
Episode 10060 avg length: 299 reward: 156
Episode 10080 avg length: 290 reward: 145
Episode 10100 avg length: 280 reward: 151
Episode 10120 avg length: 299 reward: 163
Episode 10140 avg length: 290 reward: 151
Episode 10160 avg length: 269 reward: 133
Episode 10180 avg length: 259 reward: 134
Episode 10200 avg length: 272 reward: 137
Episode 10220 avg length: 260 reward: 121
Episode 10240 avg length: 259 reward: 103
Episode 10260 avg length: 260 reward: 126
Episode 10280 avg length: 279 reward: 150
Episode 10300 avg length: 268 reward: 128
Episode 10320 avg length: 261 reward: 140
Episode 10340 avg length: 243 reward: 111
Episode 10360 avg length: 236 reward: 113
Episode 10380 avg length: 219 reward: 112
Episode 10400 avg length: 267 reward: 140
Episode 10420 avg length: 279 reward: 146
Episode 10440 avg length: 285 reward: 137
Episode 10460 avg length: 255 reward: 107
Episode 10480 avg length: 249 reward: 115
Episode 10500 avg length: 241 reward: 106
Episode 10520 avg length: 219 reward: 102
Episode 10540 avg length: 200 reward: 52
Episode 10560 avg length: 267 reward: 124
Episode 10580 avg length: 235 reward: 111
Episode 10600 avg length: 223 reward: 86
Episode 10620 avg length: 220 reward: 90
Episode 10640 avg length: 269 reward: 145
Episode 10660 avg length: 255 reward: 133
Episode 10680 avg length: 277 reward: 130
Episode 10700 avg length: 280 reward: 142
Episode 10720 avg length: 278 reward: 128
Episode 10740 avg length: 260 reward: 90
Episode 10760 avg length: 288 reward: 145
Episode 10780 avg length: 238 reward: 94
Episode 10800 avg length: 278 reward: 136
Episode 10820 avg length: 288 reward: 150
Episode 10840 avg length: 280 reward: 148
Episode 10860 avg length: 240 reward: 117
Episode 10880 avg length: 257 reward: 124
Episode 10900 avg length: 261 reward: 130
Episode 10920 avg length: 229 reward: 115
Episode 10940 avg length: 259 reward: 144
Episode 10960 avg length: 238 reward: 138
Episode 10980 avg length: 230 reward: 112
Episode 11000 avg length: 254 reward: 126
Episode 11020 avg length: 281 reward: 141
Episode 11040 avg length: 270 reward: 120
Episode 11060 avg length: 297 reward: 174
Episode 11080 avg length: 261 reward: 138
Episode 11100 avg length: 259 reward: 125
Episode 11120 avg length: 292 reward: 173
Episode 11140 avg length: 275 reward: 146
Episode 11160 avg length: 299 reward: 165
Episode 11180 avg length: 299 reward: 175
Episode 11200 avg length: 289 reward: 161
Episode 11220 avg length: 299 reward: 166
Episode 11240 avg length: 278 reward: 160
Episode 11260 avg length: 290 reward: 142
Episode 11280 avg length: 299 reward: 164
Episode 11300 avg length: 279 reward: 155
Episode 11320 avg length: 299 reward: 178
Episode 11340 avg length: 299 reward: 150
Episode 11360 avg length: 265 reward: 110
Episode 11380 avg length: 288 reward: 156
Episode 11400 avg length: 278 reward: 146
Episode 11420 avg length: 268 reward: 141
Episode 11440 avg length: 291 reward: 130
Episode 11460 avg length: 299 reward: 161
Episode 11480 avg length: 284 reward: 142
Episode 11500 avg length: 262 reward: 132
Episode 11520 avg length: 287 reward: 149
Episode 11540 avg length: 288 reward: 150
Episode 11560 avg length: 288 reward: 157
Episode 11580 avg length: 288 reward: 156
Episode 11600 avg length: 284 reward: 133
Episode 11620 avg length: 287 reward: 152
Episode 11640 avg length: 249 reward: 130
Episode 11660 avg length: 240 reward: 106
Episode 11680 avg length: 271 reward: 131
Episode 11700 avg length: 271 reward: 117
Episode 11720 avg length: 286 reward: 143
Episode 11740 avg length: 293 reward: 150
Episode 11760 avg length: 289 reward: 155
Episode 11780 avg length: 290 reward: 137
Episode 11800 avg length: 289 reward: 133
Episode 11820 avg length: 273 reward: 121
Episode 11840 avg length: 274 reward: 109
Episode 11860 avg length: 261 reward: 147
Episode 11880 avg length: 210 reward: 114
Episode 11900 avg length: 245 reward: 143
Episode 11920 avg length: 210 reward: 115
Episode 11940 avg length: 218 reward: 102
Episode 11960 avg length: 214 reward: 102
Episode 11980 avg length: 269 reward: 133
Episode 12000 avg length: 262 reward: 144
Episode 12020 avg length: 235 reward: 131
Episode 12040 avg length: 253 reward: 149
Episode 12060 avg length: 227 reward: 120
Episode 12080 avg length: 202 reward: 98
Episode 12100 avg length: 240 reward: 117
Episode 12120 avg length: 231 reward: 108
Episode 12140 avg length: 230 reward: 122
Episode 12160 avg length: 228 reward: 108
Episode 12180 avg length: 233 reward: 96
Episode 12200 avg length: 252 reward: 123
Episode 12220 avg length: 272 reward: 154
Episode 12240 avg length: 251 reward: 122
Episode 12260 avg length: 273 reward: 147
Episode 12280 avg length: 239 reward: 111
Episode 12300 avg length: 287 reward: 126
Episode 12320 avg length: 278 reward: 121
Episode 12340 avg length: 258 reward: 120
Episode 12360 avg length: 265 reward: 104
Episode 12380 avg length: 279 reward: 118
Episode 12400 avg length: 254 reward: 72
Episode 12420 avg length: 187 reward: 74
Episode 12440 avg length: 244 reward: 90
Episode 12460 avg length: 228 reward: 116
Episode 12480 avg length: 258 reward: 125
Episode 12500 avg length: 247 reward: 118
Episode 12520 avg length: 244 reward: 101
Episode 12540 avg length: 267 reward: 135
Episode 12560 avg length: 253 reward: 99
Episode 12580 avg length: 285 reward: 135
Episode 12600 avg length: 259 reward: 113
Episode 12620 avg length: 256 reward: 108
Episode 12640 avg length: 238 reward: 114
Episode 12660 avg length: 265 reward: 128
Episode 12680 avg length: 289 reward: 145
Episode 12700 avg length: 287 reward: 147
Episode 12720 avg length: 283 reward: 139
Episode 12740 avg length: 255 reward: 108
Episode 12760 avg length: 299 reward: 150
Episode 12780 avg length: 277 reward: 138
Episode 12800 avg length: 290 reward: 151
Episode 12820 avg length: 284 reward: 159
Episode 12840 avg length: 299 reward: 150
Episode 12860 avg length: 289 reward: 146
Episode 12880 avg length: 299 reward: 158
Episode 12900 avg length: 299 reward: 144
Episode 12920 avg length: 279 reward: 129
Episode 12940 avg length: 282 reward: 132
Episode 12960 avg length: 280 reward: 132
Episode 12980 avg length: 278 reward: 108
Episode 13000 avg length: 284 reward: 136
Episode 13020 avg length: 289 reward: 128
Episode 13040 avg length: 291 reward: 149
Episode 13060 avg length: 299 reward: 140
Episode 13080 avg length: 292 reward: 141
Episode 13100 avg length: 290 reward: 139
Episode 13120 avg length: 299 reward: 139
Episode 13140 avg length: 291 reward: 151
Episode 13160 avg length: 291 reward: 141
Episode 13180 avg length: 299 reward: 169
Episode 13200 avg length: 299 reward: 162
Episode 13220 avg length: 299 reward: 170
Episode 13240 avg length: 299 reward: 170
Episode 13260 avg length: 299 reward: 155
Episode 13280 avg length: 299 reward: 153
Episode 13300 avg length: 299 reward: 163
Episode 13320 avg length: 281 reward: 131
Episode 13340 avg length: 289 reward: 153
Episode 13360 avg length: 285 reward: 133
Episode 13380 avg length: 280 reward: 134
Episode 13400 avg length: 282 reward: 134
Episode 13420 avg length: 268 reward: 114
Episode 13440 avg length: 290 reward: 142
Episode 13460 avg length: 270 reward: 145
Episode 13480 avg length: 257 reward: 127
Episode 13500 avg length: 272 reward: 139
Episode 13520 avg length: 270 reward: 129
Episode 13540 avg length: 279 reward: 149
Episode 13560 avg length: 269 reward: 95
Episode 13580 avg length: 270 reward: 113
Episode 13600 avg length: 258 reward: 125
Episode 13620 avg length: 217 reward: 88
Episode 13640 avg length: 157 reward: 59
Episode 13660 avg length: 132 reward: 41
Episode 13680 avg length: 220 reward: 92
Episode 13700 avg length: 241 reward: 109
Episode 13720 avg length: 252 reward: 127
Episode 13740 avg length: 253 reward: 104
Episode 13760 avg length: 269 reward: 128
Episode 13780 avg length: 230 reward: 96
Episode 13800 avg length: 258 reward: 127
Episode 13820 avg length: 290 reward: 151
Episode 13840 avg length: 299 reward: 135
Episode 13860 avg length: 280 reward: 111
Episode 13880 avg length: 268 reward: 124
Episode 13900 avg length: 255 reward: 93
Episode 13920 avg length: 258 reward: 128
Episode 13940 avg length: 244 reward: 127
Episode 13960 avg length: 238 reward: 117
Episode 13980 avg length: 237 reward: 104
Episode 14000 avg length: 251 reward: 123
Episode 14020 avg length: 267 reward: 114
Episode 14040 avg length: 271 reward: 109
Episode 14060 avg length: 247 reward: 117
Episode 14080 avg length: 282 reward: 129
Episode 14100 avg length: 266 reward: 144
Episode 14120 avg length: 256 reward: 132
Episode 14140 avg length: 267 reward: 140
Episode 14160 avg length: 289 reward: 149
Episode 14180 avg length: 262 reward: 95
Episode 14200 avg length: 278 reward: 128
Episode 14220 avg length: 279 reward: 136
Episode 14240 avg length: 249 reward: 105
Episode 14260 avg length: 235 reward: 112
Episode 14280 avg length: 273 reward: 131
Episode 14300 avg length: 278 reward: 130
Episode 14320 avg length: 259 reward: 123
Episode 14340 avg length: 234 reward: 78
Episode 14360 avg length: 268 reward: 125
Episode 14380 avg length: 294 reward: 153
Episode 14400 avg length: 299 reward: 150
Episode 14420 avg length: 278 reward: 129
Episode 14440 avg length: 297 reward: 155
Episode 14460 avg length: 247 reward: 106
Episode 14480 avg length: 289 reward: 154
Episode 14500 avg length: 270 reward: 133
Episode 14520 avg length: 259 reward: 133
Episode 14540 avg length: 280 reward: 151
Episode 14560 avg length: 268 reward: 129
Episode 14580 avg length: 299 reward: 159
Episode 14600 avg length: 279 reward: 131
Episode 14620 avg length: 242 reward: 100
Episode 14640 avg length: 236 reward: 114
Episode 14660 avg length: 253 reward: 132
Episode 14680 avg length: 272 reward: 134
Episode 14700 avg length: 297 reward: 175
Episode 14720 avg length: 278 reward: 148
Episode 14740 avg length: 289 reward: 154
Episode 14760 avg length: 288 reward: 148
Episode 14780 avg length: 278 reward: 140
Episode 14800 avg length: 266 reward: 128
Episode 14820 avg length: 288 reward: 161
Episode 14840 avg length: 278 reward: 145
Episode 14860 avg length: 290 reward: 161
Episode 14880 avg length: 279 reward: 139
Episode 14900 avg length: 284 reward: 155
Episode 14920 avg length: 245 reward: 136
Episode 14940 avg length: 269 reward: 137
Episode 14960 avg length: 262 reward: 146
Episode 14980 avg length: 299 reward: 154
Episode 15000 avg length: 273 reward: 172
Episode 15020 avg length: 278 reward: 142
Episode 15040 avg length: 277 reward: 150
Episode 15060 avg length: 232 reward: 119
Episode 15080 avg length: 280 reward: 141
Episode 15100 avg length: 260 reward: 137
Episode 15120 avg length: 285 reward: 167
Episode 15140 avg length: 280 reward: 149
Episode 15160 avg length: 237 reward: 118
Episode 15180 avg length: 223 reward: 111
Episode 15200 avg length: 243 reward: 134
Episode 15220 avg length: 269 reward: 138
Episode 15240 avg length: 251 reward: 127
Episode 15260 avg length: 289 reward: 157
Episode 15280 avg length: 229 reward: 107
Episode 15300 avg length: 277 reward: 143
Episode 15320 avg length: 288 reward: 154
Episode 15340 avg length: 289 reward: 149
Episode 15360 avg length: 288 reward: 145
Episode 15380 avg length: 260 reward: 134
Episode 15400 avg length: 246 reward: 126
Episode 15420 avg length: 244 reward: 132
Episode 15440 avg length: 272 reward: 129
Episode 15460 avg length: 267 reward: 134
Episode 15480 avg length: 263 reward: 135
Episode 15500 avg length: 280 reward: 141
Episode 15520 avg length: 254 reward: 126
Episode 15540 avg length: 275 reward: 133
Episode 15560 avg length: 271 reward: 120
Episode 15580 avg length: 270 reward: 130
Episode 15600 avg length: 299 reward: 144
Episode 15620 avg length: 254 reward: 88
Episode 15640 avg length: 271 reward: 126
Episode 15660 avg length: 289 reward: 153
Episode 15680 avg length: 231 reward: 104
Episode 15700 avg length: 227 reward: 127
Episode 15720 avg length: 174 reward: 82
Episode 15740 avg length: 214 reward: 92
Episode 15760 avg length: 190 reward: 89
Episode 15780 avg length: 159 reward: 49
Episode 15800 avg length: 222 reward: 100
Episode 15820 avg length: 269 reward: 133
Episode 15840 avg length: 243 reward: 100
Episode 15860 avg length: 191 reward: 68
Episode 15880 avg length: 221 reward: 86
Episode 15900 avg length: 206 reward: 109
Episode 15920 avg length: 228 reward: 89
Episode 15940 avg length: 250 reward: 108
Episode 15960 avg length: 229 reward: 110
Episode 15980 avg length: 263 reward: 139
Episode 16000 avg length: 250 reward: 125
Episode 16020 avg length: 270 reward: 140
Episode 16040 avg length: 251 reward: 131
Episode 16060 avg length: 258 reward: 124
Episode 16080 avg length: 268 reward: 130
Episode 16100 avg length: 263 reward: 125
Episode 16120 avg length: 280 reward: 150
Episode 16140 avg length: 267 reward: 132
Episode 16160 avg length: 284 reward: 137
Episode 16180 avg length: 275 reward: 128
Episode 16200 avg length: 269 reward: 132
Episode 16220 avg length: 280 reward: 132
Episode 16240 avg length: 279 reward: 145
Episode 16260 avg length: 299 reward: 152
Episode 16280 avg length: 238 reward: 112
Episode 16300 avg length: 284 reward: 159
Episode 16320 avg length: 280 reward: 136
Episode 16340 avg length: 271 reward: 120
Episode 16360 avg length: 281 reward: 139
Episode 16380 avg length: 267 reward: 141
Episode 16400 avg length: 299 reward: 164
Episode 16420 avg length: 239 reward: 113
Episode 16440 avg length: 276 reward: 143
Episode 16460 avg length: 268 reward: 144
Episode 16480 avg length: 269 reward: 134
Episode 16500 avg length: 273 reward: 148
Episode 16520 avg length: 247 reward: 97
Episode 16540 avg length: 266 reward: 129
Episode 16560 avg length: 267 reward: 119
Episode 16580 avg length: 270 reward: 124
Episode 16600 avg length: 262 reward: 101
Episode 16620 avg length: 257 reward: 121
Episode 16640 avg length: 233 reward: 99
Episode 16660 avg length: 268 reward: 114
Episode 16680 avg length: 261 reward: 126
Episode 16700 avg length: 278 reward: 143
Episode 16720 avg length: 278 reward: 117
Episode 16740 avg length: 266 reward: 135
Episode 16760 avg length: 282 reward: 140
Episode 16780 avg length: 299 reward: 154
Episode 16800 avg length: 279 reward: 144
Episode 16820 avg length: 281 reward: 124
Episode 16840 avg length: 280 reward: 132
Episode 16860 avg length: 278 reward: 148
Episode 16880 avg length: 280 reward: 113
Episode 16900 avg length: 268 reward: 133
Episode 16920 avg length: 291 reward: 147
Episode 16940 avg length: 274 reward: 150
Episode 16960 avg length: 281 reward: 137
Episode 16980 avg length: 251 reward: 126
Episode 17000 avg length: 261 reward: 135
Episode 17020 avg length: 267 reward: 105
Episode 17040 avg length: 274 reward: 176
Episode 17060 avg length: 262 reward: 131
Episode 17080 avg length: 186 reward: 184
Episode 17100 avg length: 225 reward: 150
Episode 17120 avg length: 201 reward: 218
Episode 17140 avg length: 211 reward: 220
Episode 17160 avg length: 221 reward: 218
Episode 17180 avg length: 232 reward: 210
Episode 17200 avg length: 216 reward: 220
Episode 17220 avg length: 226 reward: 203
Episode 17240 avg length: 198 reward: 170
Episode 17260 avg length: 196 reward: 222
Episode 17280 avg length: 214 reward: 196
Episode 17300 avg length: 229 reward: 205
Episode 17320 avg length: 183 reward: 192
Episode 17340 avg length: 212 reward: 186
Episode 17360 avg length: 192 reward: 164
########## Solved! ##########
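A log this long is easier to read once it is summarized. The following is a minimal sketch, not part of the original training script, that parses lines in the format shown above and reports the first logged episode whose reward reaches 200, the score the environment description treats as solved. The helper names `parse_log` and `first_episode_at_or_above` are hypothetical, and the sample text is copied from four lines of the log.

```python
import re

# Matches lines such as "Episode 17120 avg length: 201 reward: 218".
LOG_LINE = re.compile(r"Episode\s+(\d+)\s+avg length:\s+(\d+)\s+reward:\s+(-?\d+)")


def parse_log(text):
    """Return a list of (episode, avg_length, reward) tuples found in text."""
    return [tuple(int(g) for g in m.groups()) for m in LOG_LINE.finditer(text)]


def first_episode_at_or_above(records, threshold):
    """Return the first logged episode whose reward reaches threshold, or None."""
    for episode, _, reward in records:
        if reward >= threshold:
            return episode
    return None


# Four lines copied from the training log above.
sample = """\
Episode 17080 avg length: 186 reward: 184
Episode 17100 avg length: 225 reward: 150
Episode 17120 avg length: 201 reward: 218
Episode 17140 avg length: 211 reward: 220
"""

records = parse_log(sample)
mean_reward = sum(reward for _, _, reward in records) / len(records)

print(records[0])                                # (17080, 186, 184)
print(mean_reward)                               # 193.0
print(first_episode_at_or_above(records, 200))   # 17120
```

Running the same functions over the full log file (read with `open(...).read()`) would show how long the reward hovered between roughly 100 and 170 before the jump above 200 near episode 17120.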
That concludes this walkthrough of how to implement a lunar lander in Python with TensorFlow 2 and the PPO algorithm. Hopefully the material above is of some help.