Sort levels by difficulty

  1. Create ten new levels and run the learned agent 10,000 times on each one. For each level, store its difficulty (that is, 1 - win_percentage), the probability of a mistake, and the number of holes in the ice. Then sort the levels by difficulty:

    np.random.seed(1)  # make the random levels reproducible
    n_attempts = 10_000  # runs of the learned agent per level
    levels = []
    for i in range(10):
        print(i)  # progress indicator
        level_config = Level.random(config.p_mistake_draw)
        level = LeveledFrozenLake(level_config)
        # Fraction of attempts that the learned agent wins on this level
        win_percentage = sum(
            play_level(level, tuned_policy.learned_action)
            for _ in range(n_attempts)
        ) / n_attempts
        # Count the hole tiles ('H') on the board
        n_holes = ''.join(level_config.board).count('H')
        levels.append(dict(
            difficulty=1 - win_percentage,
            p_mistake=level_config.p_mistake,
            n_holes=n_holes,
            level=level,
        ))

    # Easiest level first
    levels = sorted(levels, key=lambda lvl: lvl['difficulty'])

    This takes about ten minutes to run, because the agent has to call the policy network at every step of each of the 100,000 attempts (10 levels x 10,000 attempts).

  2. Display the levels as a table, sorted by difficulty:

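    level_df = pd.DataFrame(levels)
    # Sort by difficulty and renumber the rows so that row 0 is the easiest level
    level_df = level_df.sort_values('difficulty').reset_index(drop=True)
    level_df[['difficulty', 'p_mistake', 'n_holes']]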

  3. Finally, play the easiest and hardest levels yourself to confirm the ranking produced by your learned agent:

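    easiest = 0  # first row: lowest difficulty
    hardest = 9  # last row: highest difficulty

    # .iloc selects by position; pass `easiest` instead to play the easiest level
    play_manually(level_df.level.iloc[hardest])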
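
Each difficulty value above is an estimate from a finite number of runs, so two levels with nearly equal difficulty can swap places in the ranking purely by chance. The following is a minimal sketch of how one might attach a standard error to each estimate before trusting the order. It uses only the standard library and toy levels with a fixed win probability; `estimate_difficulty`, `make_toy_level`, and the toy win probabilities are hypothetical names and values introduced for illustration, standing in for `play_level` and the real levels.

```python
import math
import random


def estimate_difficulty(play_once, n_attempts, rng):
    """Estimate difficulty = 1 - win rate from repeated plays.

    `play_once(rng)` is a hypothetical stand-in for play_level(...):
    it returns True on a win and False otherwise.
    """
    wins = sum(play_once(rng) for _ in range(n_attempts))
    difficulty = 1 - wins / n_attempts
    # Standard error of a binomial proportion estimated from n_attempts trials
    std_err = math.sqrt(difficulty * (1 - difficulty) / n_attempts)
    return difficulty, std_err


def make_toy_level(p_win):
    """Toy level that is won with a fixed probability (an assumption for this demo)."""
    return lambda rng: rng.random() < p_win


rng = random.Random(1)  # seeded for reproducibility
toy_levels = [
    dict(name=f"toy-{i}", play=make_toy_level(p_win))
    for i, p_win in enumerate([0.9, 0.5, 0.48, 0.1])
]
for lvl in toy_levels:
    lvl["difficulty"], lvl["std_err"] = estimate_difficulty(lvl["play"], 10_000, rng)

# Easiest level first, as in the steps above
ranked = sorted(toy_levels, key=lambda lvl: lvl["difficulty"])
for lvl in ranked:
    print(f"{lvl['name']}: difficulty={lvl['difficulty']:.3f} +/- {lvl['std_err']:.3f}")
```

With 10,000 attempts per level the standard error is at most 0.005, so levels whose difficulties differ by only a percentage point or two should be treated as roughly tied rather than strictly ordered.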