Define a new state array

Recall that the state of the Frozen Lake environment is an integer in the range [0, 15] that represents the location of the cursor on the 4x4 board. This works well if the board is fixed, because the only information you need to make a good decision about the next action is the current location. But if the same agent is going to play multiple different boards, it needs two pieces of information: the configuration of the board and the location of the cursor.

There’s no requirement that the state be a single integer. It can just as easily be an array or a higher-dimensional tensor. To keep things simple, the state will be a 1-D array in this tutorial.

The following diagram shows how you can store all the relevant information about the state of the game in a 1-D array:

Start with the board. The board is always 4x4. The top left corner (position 0) is always ‘S’ for start and the bottom right corner (position 15) is always ‘G’ for goal. Because these two cells are fixed, you don’t need to put them in the state array. Each of the remaining 14 cells is either ‘F’ for frozen or ‘H’ for hole. Since the state array needs to be numeric, use 0 to represent frozen and 1 to represent hole, listing the cells left to right, top to bottom (row-major order).
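As a rough illustration of the board encoding described above, here is a minimal sketch. The helper name `encode_board` and the example layout are assumptions made for this illustration; this is not the code in frozen_lake/state.py.

```python
import numpy as np

def encode_board(rows):
    """Hypothetical helper: turn a 4x4 board into 14 numbers (0 = frozen, 1 = hole)."""
    cells = "".join(rows)   # row-major order: left to right, top to bottom
    inner = cells[1:-1]     # skip the fixed 'S' (position 0) and 'G' (position 15)
    return np.array([1.0 if cell == "H" else 0.0 for cell in inner])

# Illustrative layout only; any 4x4 board with 'S' first and 'G' last works.
example_board = ["SFFF", "FHFH", "FFFH", "HFFG"]
board_bits = encode_board(example_board)
print(board_bits)
```

The result has 14 entries, one for each cell that can vary from board to board.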

Use two one-hot vectors to store the position of the cursor: one for the row and one for the column. To convert a single integer location into two one-hot vectors, get the row (m) and column (n) of the location. With zero-based indexing on the 4x4 board, m is the integer quotient of the location divided by 4 and n is the remainder. The one-hot vectors are then the mth and nth rows of a 4x4 identity matrix.
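The position encoding can be sketched in a few lines. The helper name `encode_position` is hypothetical and the code assumes zero-based, row-major locations; the real get_state() function may be organized differently.

```python
import numpy as np

def encode_position(location, size=4):
    """Hypothetical helper: encode an integer location as row and column one-hot vectors."""
    row, col = divmod(location, size)  # row = location // size, col = location % size
    identity = np.eye(size)            # each row of the identity matrix is a one-hot vector
    return np.concatenate([identity[row], identity[col]])

# Location 6 sits in row 1, column 2 of the 4x4 board.
position_bits = encode_position(6)
print(position_bits)
```

The two one-hot vectors add 8 values to the 14 board values, so a full state array built this way has 22 entries.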

There’s a function called get_state() in the frozen-lake package that generates a full state array for a game board at a given state. Take a look at frozen_lake/state.py to see how it works.

  1. The get_state() function is imported in your Jupyter notebook. Try it out on the test level by running the following code in a cell:

    test_level = get_test_level()
    test_level.render()
    get_state(test_level)
    

    The board, row and column components of the state array are highlighted separately in the screenshot.