Single Agent
Single-agent environments in MASA use the Gymnasium API. They can be used directly as Gymnasium environments, or through
masa.common.utils.make_env when you want the standard MASA wrapper stack for labels,
constraints, and monitoring.
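Because the environments follow the Gymnasium API, a generic episode loop only needs `reset` and `step`. The sketch below is a minimal illustration, not MASA code: `ToyCorridorEnv` is a hypothetical stand-in that mimics the Gymnasium signatures so the example runs without any dependencies, and the `"cost"` key in `info` is an assumption about where a cost signal might be reported, not a documented MASA convention.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class ToyCorridorEnv:
    """Hypothetical stand-in environment (not part of MASA).

    It follows the Gymnasium signatures:
      reset(seed=..., options=...) -> (obs, info)
      step(action) -> (obs, reward, terminated, truncated, info)
    """

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.position = 0

    def reset(self, *, seed: Optional[int] = None,
              options: Optional[Dict[str, Any]] = None) -> Tuple[int, Dict[str, Any]]:
        self.position = 0
        return self.position, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        # Action 1 moves one cell to the right; any other action stays put.
        if action == 1:
            self.position += 1
        terminated = self.position >= self.length - 1
        reward = 1.0 if terminated else 0.0
        # Assumed convention: a per-step cost reported under info["cost"].
        info = {"cost": 1.0 if self.position == 2 else 0.0}
        return self.position, reward, terminated, False, info


def rollout(env: Any, policy: Callable[[Any], Any], max_steps: int = 100,
            seed: Optional[int] = None, cost_key: str = "cost") -> Tuple[float, float, int]:
    """Run one episode and return (total_reward, total_cost, steps)."""
    obs, info = env.reset(seed=seed)
    total_reward, total_cost, steps = 0.0, 0.0, 0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        # Missing cost entries are treated as zero cost.
        total_cost += float(info.get(cost_key, 0.0))
        steps += 1
        if terminated or truncated:
            break
    return total_reward, total_cost, steps


if __name__ == "__main__":
    print(rollout(ToyCorridorEnv(length=5), policy=lambda obs: 1))
```

The same `rollout` function should work on any object with these two methods, including an environment obtained through masa.common.utils.make_env; the arguments that function takes are not shown in this section, so they are left out here.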
The current collection spans three broad settings:
Continuous control: Cartpole with continuous actions.
Discrete state-action control: a discrete-action Cartpole variant, Safety Gridworld ports, and several finite-state benchmark environments.
Tabular environments: gridworlds, Pacman variants, and the Media Streaming MDP.
Environment Summary
| Environment ID | Family | Observation space | Action space | Reward signal | Default cost signal |
|---|---|---|---|---|---|
|  | Continuous control |  |  |  |  |
|  | Discrete-action control |  |  |  |  |
|  | Safety Gridworld port |  |  |  |  |
|  | Safety Gridworld port |  |  |  |  |
|  | Safety Gridworld port |  |  |  |  |
|  | Tabular maze |  |  |  |  |
|  | Tabular maze |  |  |  |  |
|  | Structured discrete maze |  |  | coin collection reward |  |
|  | Structured discrete maze |  |  | coin collection reward |  |
|  | Tabular gridworld |  |  |  |  |
|  | Tabular gridworld |  |  |  |  |
|  | Tabular gridworld |  |  |  |  |
|  | Tabular gridworld |  |  |  |  |
|  | Tabular gridworld |  |  |  |  |
|  | Tabular gridworld |  |  |  |  |
|  | Tabular queueing MDP |  |  |  |  |
For the environments that expose model structure in addition to the Gymnasium step API, the access pattern differs slightly:
Full transition matrix: all gridworlds, media_streaming, mini_pacman, and mini_pacman_with_coins.
Successor-state dictionary: pacman and pacman_with_coins.
Step API only: cont_cartpole, disc_cartpole, island_navigation, conveyor_belt, and sokoban.
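To make the difference between the first two access patterns concrete, the sketch below converts a successor-state dictionary into a dense transition matrix and checks that each row sums to one. The data layouts are assumptions chosen for illustration, not MASA's documented formats: the dictionary maps `(state, action)` to a list of `(next_state, probability)` pairs, and the matrix is indexed as `matrix[state][action][next_state]`.

```python
from typing import Dict, List, Tuple

# Assumed layout: (state, action) -> [(next_state, probability), ...]
Successors = Dict[Tuple[int, int], List[Tuple[int, float]]]
# Assumed layout: matrix[state][action][next_state] = probability
Dense = List[List[List[float]]]


def to_dense(successors: Successors, n_states: int, n_actions: int) -> Dense:
    """Expand a sparse successor dictionary into a dense transition matrix."""
    matrix = [[[0.0] * n_states for _ in range(n_actions)]
              for _ in range(n_states)]
    for (state, action), outcomes in successors.items():
        for next_state, prob in outcomes:
            # Accumulate in case a next state is listed more than once.
            matrix[state][action][next_state] += prob
    return matrix


def is_row_stochastic(matrix: Dense, tol: float = 1e-9) -> bool:
    """Check that every (state, action) row is a probability distribution."""
    return all(
        abs(sum(row) - 1.0) <= tol and min(row) >= 0.0
        for per_state in matrix
        for row in per_state
    )


if __name__ == "__main__":
    # Two states, one action: state 0 moves to state 1 half the time,
    # and state 1 is absorbing.
    toy: Successors = {
        (0, 0): [(0, 0.5), (1, 0.5)],
        (1, 0): [(1, 1.0)],
    }
    dense = to_dense(toy, n_states=2, n_actions=1)
    print(dense)
    print(is_row_stochastic(dense))
```

A dictionary of this kind stores only reachable successors, which tends to suit larger state spaces, while a dense matrix grows with the square of the number of states.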