Multi-Agent CMG¶
This tutorial introduces MASA’s multi-agent constrained Markov game path with a small repeated Chicken matrix game.
Runnable notebook: notebooks/tutorials/12_multi_agent_cmg.ipynb
PettingZoo Parallel Shape¶
MASA’s multi-agent environments use the PettingZoo ParallelEnv API. Instead of one observation, one reward, and one info dict, a parallel environment returns dictionaries keyed by agent id:
obs, infos = env.reset(seed=0)
obs, rewards, terminations, truncations, infos = env.step(
{"player_0": action_0, "player_1": action_1}
)
For ChickenMatrix, the agents are player_0 and player_1. Each agent chooses one of:
Swerve = 0
Straight = 1
The raw observation records the previous joint action and whether the previous round was a crash.
Chicken Safety Semantics¶
In Chicken, the unsafe outcome is:
player_0: Straight
player_1: Straight
That joint action causes a crash. The environment’s label_fn emits labels including:
crash
unsafe
straight_straight
The default cost_fn maps any label set containing unsafe to cost 1.0; otherwise the cost is 0.0.
MASA CMG Wrapper¶
The standard MASA multi-agent wrapper path is built through make_marl_env:
from masa.common.constraints.multi_agent.cmg import Budget
from masa.common.utils import make_marl_env
env = make_marl_env(
"chicken_matrix",
"cmg",
env_kwargs={"max_moves": 3},
budgets=[
Budget(amount=1.0, agents=("player_0",), name="player_0_budget"),
Budget(amount=1.0, agents=("player_1",), name="player_1_budget"),
Budget(amount=1.5, agents=("player_0", "player_1"), name="shared"),
],
)
This applies:
ChickenMatrix -> LabelledParallelEnv -> ConstrainedMarkovGameEnv
After wrapping, labels are available as infos[agent]["labels"], and CMG metrics are available from:
env.constraint_step_metrics()
env.constraint_episode_metrics()
Safe Rollout¶
A safe script such as Swerve/Swerve followed by Straight/Swerve produces rewards without crash labels.
The important metrics stay at zero:
player_0_cost = 0.0
player_1_cost = 0.0
shared_cum_cost = 0.0
shared_satisfied = 1.0
satisfied = 1.0