Colour Grid World
colour_grid_world is a 9 x 9 single-agent tabular gridworld.
Environment Details
The grid has start state 0, goal state 80, slip probability 0.1, one special blue state, one green state, and one purple state.
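As a rough illustration of how the state indices relate to the 9 x 9 grid, the sketch below converts between a state index and a (row, column) cell. It assumes row-major numbering, which the page does not state; under that assumption state 0 is the top-left cell and state 80 the bottom-right. The helper names state_to_cell and cell_to_state are hypothetical and are not part of the environment's API.

```python
GRID_SIZE = 9  # 9 x 9 grid, so 81 states in total

START_STATE = 0
GOAL_STATE = 80


def state_to_cell(state: int) -> tuple:
    """Map a state index to (row, col). ASSUMPTION: row-major numbering."""
    if not 0 <= state < GRID_SIZE * GRID_SIZE:
        raise ValueError(f"state must be in [0, {GRID_SIZE * GRID_SIZE - 1}]")
    return divmod(state, GRID_SIZE)


def cell_to_state(row: int, col: int) -> int:
    """Inverse of state_to_cell under the same row-major assumption."""
    if not (0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE):
        raise ValueError("cell is outside the grid")
    return row * GRID_SIZE + col


print(state_to_cell(START_STATE))  # (0, 0)
print(state_to_cell(GOAL_STATE))   # (8, 8)
```

If the environment numbers its cells differently (for example column-major), only the two helper bodies would need to change.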
The environment uses:
observation space Discrete(81),
action space Discrete(5),
labels {"start"}, {"goal"}, {"blue"}, {"green"}, and {"purple"},
cost 1.0 on "blue" and 0.0 otherwise.
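The cost rule above can be sketched as a function of the label set attached to a state. This is a minimal illustration, not the environment's own code: the function name cost and the representation of labels as Python sets of strings are assumptions made for the example.

```python
def cost(labels) -> float:
    """Return 1.0 if the state carries the "blue" label, else 0.0.

    ASSUMPTION: `labels` is a set of strings such as {"blue"} or {"goal"};
    the real environment may expose labels in a different form.
    """
    return 1.0 if "blue" in labels else 0.0


print(cost({"blue"}))    # 1.0
print(cost({"green"}))   # 0.0
print(cost(set()))       # 0.0
```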
Reward is sparse: 1.0 when the agent reaches the goal state, 0.0 otherwise.
The episode terminates when the goal state is reached.
This is a small benchmark for experiments where the reward target and the safety-relevant state are intentionally different.
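To make the separation between reward and cost concrete, the sketch below tracks both signals over a short hand-written trajectory. It does not call the environment: the functions reward_and_done and summarise are hypothetical, and the trajectory (including the choice of state 10 as the blue state) is invented purely for illustration.

```python
GOAL_STATE = 80


def reward_and_done(next_state: int) -> tuple:
    """Sparse reward: 1.0 on reaching the goal, which also ends the episode."""
    reached_goal = next_state == GOAL_STATE
    return (1.0 if reached_goal else 0.0), reached_goal


def summarise(trajectory) -> tuple:
    """Accumulate reward and cost separately over (state, labels) pairs."""
    total_reward = 0.0
    total_cost = 0.0
    for state, labels in trajectory:
        reward, done = reward_and_done(state)
        total_reward += reward
        total_cost += 1.0 if "blue" in labels else 0.0
        if done:
            break  # the episode terminates at the goal state
    return total_reward, total_cost


# ASSUMPTION: illustrative trajectory; state 10 being blue is made up.
trajectory = [(1, set()), (10, {"blue"}), (80, {"goal"})]
print(summarise(trajectory))  # (1.0, 1.0)
```

Keeping the two totals separate reflects the point that an agent can reach the goal (reward 1.0) while still having passed through the blue state (cost 1.0).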