Colour Grid World

colour_grid_world is a 9 x 9 single-agent tabular gridworld.

Shared Gridworld Conventions

This environment is a single-agent tabular gridworld with an explicit stochastic transition model. It exposes a full transition matrix via get_transition_matrix().

It uses the standard gridworld action convention:

  • 0: move left

  • 1: move right

  • 2: move down

  • 3: move up

  • 4: stay in place

When slip is enabled, the intended action is taken with high probability and the remaining probability mass is spread uniformly over the other actions.

Environment Details

  • start state 0,

  • goal state 80,

  • slip probability 0.1,

  • one special blue state, one green state, and one purple state.

The environment uses:

  • observation space Discrete(81),

  • action space Discrete(5),

  • labels {"start"}, {"goal"}, {"blue"}, {"green"}, and {"purple"},

  • cost 1.0 on "blue" and 0.0 otherwise.

Reward is sparse:

  • 1.0 when the agent reaches the goal state,

  • 0.0 otherwise.

The episode terminates when the goal state is reached.

This is a small benchmark for experiments where the reward target and the safety-relevant state are intentionally different.