Single Agent

Single-agent environments in MASA use the Gymnasium API. They can be used directly as Gymnasium environments, or through masa.common.utils.make_env when you want the standard MASA wrapper stack for labels, constraints, and monitoring.
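Because the environments follow the Gymnasium API, a rollout loop only needs reset and step. The sketch below is a minimal illustration under stated assumptions: TinyChainEnv is a hypothetical stand-in written in pure Python (not a MASA environment), and reading the cost from info["cost"] is an assumption made for illustration. Check the MASA wrappers for where the cost signal is actually reported.

```python
class TinyChainEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step signatures.

    It is NOT a MASA environment; it only lets the rollout below run
    without any third-party package installed.
    """

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.state = 0
        self.steps = 0

    def reset(self, seed=None, options=None):
        # Deterministic toy dynamics, so the seed is accepted but unused.
        self.state = 0
        self.steps = 0
        return self.state, {}

    def step(self, action):
        # Action 1 moves right; action 0 moves left and is clipped at 0.
        bumped_wall = self.state == 0 and action == 0
        self.state = max(0, self.state + (1 if action == 1 else -1))
        self.steps += 1
        terminated = self.state == self.length - 1
        truncated = self.steps >= self.max_steps and not terminated
        reward = 1.0 if terminated else 0.0
        # ASSUMPTION: the cost is exposed under info["cost"].
        info = {"cost": 1.0 if bumped_wall else 0.0}
        return self.state, reward, terminated, truncated, info


def rollout(env, policy, seed=0, cost_key="cost"):
    """Run one episode and tally reward and cost separately."""
    obs, info = env.reset(seed=seed)
    total_reward, total_cost, steps = 0.0, 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        # Missing cost entries are treated as zero cost.
        total_cost += float(info.get(cost_key, 0.0))
        steps += 1
    return total_reward, total_cost, steps


always_right = rollout(TinyChainEnv(), policy=lambda obs: 1)
always_left = rollout(TinyChainEnv(), policy=lambda obs: 0)
```

The same rollout function should work with any object that exposes the Gymnasium reset and step signatures, so the stand-in can be swapped for a real environment once it has been constructed.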

The current collection spans three broad settings:

  • Continuous control: Cartpole with continuous actions.

  • Discrete-action control: a discrete-action Cartpole variant (which keeps the continuous Box(4,) observation), Safety Gridworld ports, and several finite-state benchmark environments.

  • Tabular environments: gridworlds, Pacman variants, and the Media Streaming MDP.

Environment Summary

| Environment ID | Family | Observation space | Action space | Reward signal | Default cost signal |
|---|---|---|---|---|---|
| cont_cartpole | Continuous control | Box(4,) | Box(1,) | 1.0 per stable step | 1.0 outside the stable set |
| disc_cartpole | Discrete-action control | Box(4,) | Discrete(2) | 1.0 per stable step | 1.0 outside the stable set |
| island_navigation | Safety Gridworld port | Discrete(624) | Discrete(4) | -1.0 per step, +50.0 on goal, -50.0 on water | 1.0 on water |
| conveyor_belt | Safety Gridworld port | Discrete(2401) | Discrete(4) | 50.0 when the vase is moved off the belt before breaking | 1.0 when the vase breaks |
| sokoban | Safety Gridworld port | Discrete(1296) | Discrete(4) | -1.0 per step, +50.0 on goal | 1.0 when the box is cornered |
| mini_pacman | Tabular maze | Discrete(9248) | Discrete(5) | 1.0 when the food is collected | 1.0 on ghost collision |
| pacman | Tabular maze | Discrete(262088) | Discrete(5) | 1.0 when the food is collected | 1.0 on ghost collision |
| mini_pacman_with_coins | Structured discrete maze | Box(7, 10, 9) | Discrete(5) | coin collection reward | 1.0 on ghost collision |
| pacman_with_coins | Structured discrete maze | Box(15, 19, 9) | Discrete(5) | coin collection reward | 1.0 on ghost collision |
| colour_grid_world | Tabular gridworld | Discrete(81) | Discrete(5) | 1.0 on the goal state | 1.0 on blue |
| colour_bomb_grid_world | Tabular gridworld | Discrete(81) | Discrete(5) | 1.0 on a terminal coloured goal | 1.0 on bomb |
| colour_bomb_grid_world_v2 | Tabular gridworld | Discrete(225) | Discrete(5) | 1.0 on any coloured goal | 1.0 on bomb |
| colour_bomb_grid_world_v3 | Tabular gridworld | Discrete(1125) | Discrete(5) | 1.0 when the active zone matches the reached colour | 1.0 on bomb |
| bridge_crossing | Tabular gridworld | Discrete(400) | Discrete(5) | 1.0 on goal | 1.0 on lava |
| bridge_crossing_v2 | Tabular gridworld | Discrete(400) | Discrete(5) | 1.0 on goal | 1.0 on lava |
| media_streaming | Tabular queueing MDP | Discrete(20) | Discrete(2) | 0.0 or -1.0 depending on bitrate choice | 1.0 when the buffer is empty |

Some environments expose model structure in addition to the Gymnasium step API. The access pattern depends on the environment:

  • Full transition matrix: all tabular gridworlds (colour_grid_world, the colour_bomb_grid_world variants, and the bridge_crossing variants), media_streaming, mini_pacman, and mini_pacman_with_coins.

  • Successor-state dictionary: pacman and pacman_with_coins.

  • Step API only: cont_cartpole, disc_cartpole, island_navigation, conveyor_belt, and sokoban.
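The two model-access patterns can be consumed through a common interface. The sketch below assumes a dense transition matrix indexed as P[s][a][s'] and a successor-state dictionary mapping (state, action) to {next_state: probability}. Both layouts, and all function names, are hypothetical and chosen for illustration; the attribute names and shapes MASA actually exposes may differ.

```python
def successors_from_matrix(P, state, action):
    """Return {next_state: prob} from a dense matrix indexed P[s][a][s']."""
    return {s2: p for s2, p in enumerate(P[state][action]) if p > 0.0}


def successors_from_dict(succ, state, action):
    """Return {next_state: prob} from a dict keyed by (state, action)."""
    return dict(succ.get((state, action), {}))


def expected_value(successors, values):
    """One-step expectation of a state-value table under the given successors."""
    return sum(p * values[s2] for s2, p in successors.items())


# Toy model with 3 states and 2 actions (illustrative data only).
P = [
    [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5]],  # state 0
    [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]],  # state 1
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # state 2 (absorbing)
]

# The same model in sparse successor form: only non-zero entries are stored.
succ = {
    (s, a): successors_from_matrix(P, s, a)
    for s in range(len(P))
    for a in range(len(P[s]))
}

values = [0.0, 1.0, 10.0]
dense_estimate = expected_value(successors_from_matrix(P, 0, 1), values)
sparse_estimate = expected_value(successors_from_dict(succ, 0, 1), values)
```

The dictionary form stores only reachable successors, which keeps memory manageable when the state space is large, as with the Discrete(262088) observation space of pacman.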