Create ten new levels and run the learned agent 10,000 times on each one. Then store the difficulty (i.e., 1 - win_percentage), the probability of a mistake, and the number of holes in the ice. Sort the levels by difficulty:
```python
np.random.seed(1)
levels = []
for i in range(10):
    print(i)
    level_config = Level.random(config.p_mistake_draw)
    level = LeveledFrozenLake(level_config)
    win_percentage = sum(
        play_level(level, tuned_policy.learned_action)
        for _ in range(n_attempts)
    ) / n_attempts
    n_holes = (np.array(list(''.join(level_config.board))) == 'H').sum()
    levels.append(dict(
        difficulty=1 - win_percentage,
        p_mistake=level_config.p_mistake,
        n_holes=n_holes,
        level=level,
    ))
levels = sorted(levels, key=lambda l: l['difficulty'])
```
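The hole-counting expression is dense, so here is a standalone sketch of what it computes, using a made-up 4x4 board (the board layout below is hypothetical, not one of the generated levels):

```python
import numpy as np

# Hypothetical 4x4 FrozenLake board: 'S' start, 'F' frozen, 'H' hole, 'G' goal.
board = [
    "SFFF",
    "FHFH",
    "FFFH",
    "HFFG",
]

# Join the rows into one string, split it into characters,
# and count how many of them are holes.
n_holes = (np.array(list(''.join(board))) == 'H').sum()
print(n_holes)  # 4
```

The boolean comparison produces an array of True/False values, and summing it counts the True entries.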
This will take about ten minutes to run because the agent has to call the policy network for each step.
Print the levels in sorted order of difficulty:
```python
level_df = pd.DataFrame(levels)
level_df = level_df.sort_values('difficulty')
level_df[['difficulty', 'p_mistake', 'n_holes']]
```
Finally, play the easiest and hardest levels to confirm the result from your learned agent.
```python
# The levels list was sorted by difficulty before the DataFrame was built,
# so positions 0 and 9 hold the easiest and hardest levels.
easiest = 0
hardest = 9
play_manually(level_df.level[hardest])
```
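As an aside, `level_df.level[hardest]` is label-based indexing; it only picks the hardest level here because the list was already sorted before the DataFrame was built. If the sorting happened only via `sort_values`, positional `.iloc` indexing would be the safe choice. A small sketch with a toy frame (invented values, not the real levels) shows the difference:

```python
import pandas as pd

# Toy frame: labels stay attached to their rows after sorting,
# while .iloc follows the new (sorted) row order.
df = pd.DataFrame({'difficulty': [0.9, 0.1, 0.5]})
df = df.sort_values('difficulty')

print(df.difficulty[2])        # label 2 -> 0.5, NOT the hardest level
print(df.difficulty.iloc[-1])  # last row after sorting -> 0.9, the hardest
```

Using `.iloc[0]` and `.iloc[-1]` for the easiest and hardest rows keeps the lookup correct regardless of when the sorting happens.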