Constrained Markov Game (CMG)

Overview

Constraint monitors for labelled PettingZoo parallel environments.

This module provides a constrained Markov game monitor based on cumulative cost budgets. Each agent receives its own label set through infos[agent]["labels"] and incurs a step cost via cost_fn(labels). Budgets are then evaluated over subsets of agents:

\[C_t^{(i)} = \mathrm{cost}(L_t^{(i)}),\]
\[B_t^{(k)} = \sum_{\tau \le t} \sum_{i \in \mathcal{G}_k} C_\tau^{(i)},\]

where \(\mathcal{G}_k\) is the subset of agents assigned to budget \(k\). Budgets may overlap, so the same agent cost can contribute to multiple budget totals.
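As a minimal sketch of the arithmetic above (the label-based cost function and agent groupings here are hypothetical, not the module's actual implementation), the budget totals accumulate over both time and group members:

```python
# Illustrative sketch of the CMG budget arithmetic; cost_fn, the labels,
# and the groupings are made-up examples.
def cost_fn(labels):
    """Example step cost: 1.0 for each 'unsafe' label observed."""
    return float(sum(1 for lab in labels if lab == "unsafe"))

# Labels observed per agent at each time step t.
episode = [
    {"agent_0": ["unsafe"], "agent_1": []},          # t = 0
    {"agent_0": ["unsafe"], "agent_1": ["unsafe"]},  # t = 1
]

groups = {"team": ("agent_0", "agent_1"), "solo": ("agent_0",)}  # G_k

# B_t^(k): cumulative cost over time and over the agents in group k.
budget_totals = {k: 0.0 for k in groups}
for labels_by_agent in episode:
    for k, members in groups.items():
        budget_totals[k] += sum(cost_fn(labels_by_agent[i]) for i in members)

print(budget_totals)  # team accumulates 3.0, solo 2.0
```

Note how agent_0's costs count toward both budgets, matching the overlap behavior described above.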

class masa.common.constraints.multi_agent.cmg.Budget(amount: float, agents: tuple[str, ...], name: str | None = None)[source]

Bases: object

Shared cumulative-cost budget over a subset of agents.

Parameters:
  • amount – Maximum allowed cumulative cost for this budget.

  • agents – Subset of agents from env.possible_agents covered by the budget.

  • name – Optional metric prefix. If omitted, a generated name is used.

Notes

Agent memberships are deduplicated while preserving order. Budgets may overlap, so a single agent may contribute to more than one budget.
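One common way to achieve order-preserving deduplication (a sketch of the behavior described, not necessarily the module's own code) is `dict.fromkeys`:

```python
def dedup_preserve_order(agents):
    """Drop duplicate agent ids, keeping the first occurrence of each."""
    # dict keys are insertion-ordered in Python 3.7+, so this preserves
    # the order in which agents first appear.
    return tuple(dict.fromkeys(agents))

print(dedup_preserve_order(("a1", "a2", "a1", "a3")))  # ('a1', 'a2', 'a3')
```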

class masa.common.constraints.multi_agent.cmg.ConstrainedMarkovGame(possible_agents: Sequence[str], budgets: Sequence[Budget], cost_fn: Callable[[Iterable[str]], float] = dummy_cost_fn)[source]

Bases: object

Cumulative-cost monitor for a labelled parallel PettingZoo environment.

reset()[source]

Reset per-agent and per-budget cumulative costs for a new episode.

update(labels_by_agent: Mapping[str, Iterable[str]])[source]

Update the monitor from a mapping of agent ids to active labels.

satisfied() → bool[source]

Return True when every budget remains within its cap.

step_metric() → dict[str, float][source]

Return per-step metrics for agents and budgets.

episode_metric() → dict[str, float][source]

Return end-of-episode cumulative metrics for agents and budgets.
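A toy stand-in sketching how the reset/update/satisfied cycle described above could behave (the class body here is illustrative; it is not the package's source):

```python
class MiniCMG:
    """Illustrative stand-in for ConstrainedMarkovGame (not the real class)."""

    def __init__(self, budgets, cost_fn):
        # budgets: mapping of budget name -> (amount, covered agent ids)
        self.budgets = budgets
        self.cost_fn = cost_fn
        self.reset()

    def reset(self):
        """Zero all cumulative budget totals for a new episode."""
        self.totals = {name: 0.0 for name in self.budgets}

    def update(self, labels_by_agent):
        """Accumulate each covered agent's step cost into its budgets."""
        for name, (_, agents) in self.budgets.items():
            self.totals[name] += sum(
                self.cost_fn(labels_by_agent.get(a, ())) for a in agents
            )

    def satisfied(self):
        """True while every budget total stays within its cap."""
        return all(
            self.totals[name] <= amount
            for name, (amount, _) in self.budgets.items()
        )

# Hypothetical usage: one shared budget over two agents, cost = label count.
cmg = MiniCMG({"team": (1.5, ("a0", "a1"))},
              cost_fn=lambda labels: float(len(list(labels))))
cmg.update({"a0": ["collision"], "a1": []})
print(cmg.satisfied())  # True: cumulative cost 1.0 <= 1.5
cmg.update({"a0": ["collision"], "a1": ["collision"]})
print(cmg.satisfied())  # False: cumulative cost 3.0 > 1.5
```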

class masa.common.constraints.multi_agent.cmg.ConstrainedMarkovGameEnv(env: ParallelEnv, budgets: Sequence[Budget], cost_fn: Callable[[Iterable[str]], float] = dummy_cost_fn, **kw: Any)[source]

Bases: ParallelEnv

PettingZoo parallel wrapper that updates a ConstrainedMarkovGame.

reset(seed: int | None = None, options: dict[str, Any] | None = None)[source]

Reset the wrapped env and seed the constraint from initial agent labels.

step(actions)[source]

Step the wrapped env and update the constraint from per-agent labels.

state()[source]

Return the global environment state.

The state provides a global view of the environment appropriate for centralized-training decentralized-execution (CTDE) methods such as QMIX.

render()[source]

Displays a rendered frame from the environment, if supported.

Alternate render modes in the default environments are ‘rgb_array’ which returns a numpy array and is supported by all environments outside of classic, and ‘ansi’ which returns the strings printed (specific to classic environments).

close()[source]

Closes the rendering window.

observation_space(agent)[source]

Takes in agent and returns the observation space for that agent.

MUST return the same value for the same agent name

Default implementation is to return the observation_spaces dict

action_space(agent)[source]

Takes in agent and returns the action space for that agent.

MUST return the same value for the same agent name

Default implementation is to return the action_spaces dict
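To show how a wrapper of this shape threads labels from infos into the constraint, here is a self-contained sketch using a stub environment (no PettingZoo dependency; the stub, its labels, and the helper names are made up for illustration):

```python
class StubParallelEnv:
    """Tiny stand-in for a labelled parallel env (illustrative only)."""

    possible_agents = ["a0", "a1"]

    def reset(self, seed=None, options=None):
        obs = {a: 0 for a in self.possible_agents}
        infos = {a: {"labels": []} for a in self.possible_agents}
        return obs, infos

    def step(self, actions):
        obs = {a: 0 for a in actions}
        rewards = {a: 0.0 for a in actions}
        terms = {a: False for a in actions}
        truncs = {a: False for a in actions}
        # Every step, agent a0 emits an 'unsafe' label.
        infos = {a: {"labels": ["unsafe"] if a == "a0" else []}
                 for a in actions}
        return obs, rewards, terms, truncs, infos

def run_episode(env, cost_fn, cap, steps=3):
    """Mimic the wrapper: read infos[agent]['labels'] at reset and after
    each step, and accumulate cost against a single cap."""
    _, infos = env.reset()
    total = sum(cost_fn(info["labels"]) for info in infos.values())
    for _ in range(steps):
        actions = {a: None for a in env.possible_agents}
        _, _, _, _, infos = env.step(actions)
        total += sum(cost_fn(info["labels"]) for info in infos.values())
    return total, total <= cap

env = StubParallelEnv()
total, ok = run_episode(env, cost_fn=lambda labels: float(len(labels)),
                        cap=2.0)
print(total, ok)  # 3.0 False
```

The key point mirrored from the wrapper's documented behavior: labels are seeded from the initial infos at reset and updated from per-agent infos on every step.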