Constrained Markov Decision Process (CMDP)

Overview

Cumulative-cost constraints in the CMDP style.

This module provides a cumulative-cost constraint with a fixed budget, the constraint form used in constrained MDPs (CMDPs). At each step, a cost is computed from the current label set:

\[c_t \triangleq c(L(s_t)),\]

and accumulated over the episode:

\[C_T \triangleq \sum_{t=0}^{T-1} c_t.\]

The episode is considered satisfied when:

\[C_T \le B,\]

where \(B\) is the user-specified budget.
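As a worked example, the sketch below evaluates these three definitions on a four-step trajectory. The proposition name `"unsafe"`, the unit cost per unsafe step, and the budget value are illustrative assumptions, not part of the module.

```python
from typing import FrozenSet, Iterable, List


def cost_fn(labels: Iterable[str]) -> float:
    # Hypothetical cost c(.): 1.0 whenever the (assumed) "unsafe" proposition holds.
    return 1.0 if "unsafe" in labels else 0.0


# Label sets L(s_0), ..., L(s_3) for a four-step episode (T = 4).
labels_per_step: List[FrozenSet[str]] = [
    frozenset(),
    frozenset({"unsafe"}),
    frozenset({"goal"}),
    frozenset({"unsafe", "goal"}),
]

step_costs = [cost_fn(labels) for labels in labels_per_step]  # c_t for each step
total_cost = sum(step_costs)                                   # C_T
budget = 1.5                                                   # B (assumed value)
satisfied = total_cost <= budget                               # C_T <= B

print(step_costs, total_cost, satisfied)  # [0.0, 1.0, 0.0, 1.0] 2.0 False
```

With `budget = 2.0` the same episode would count as satisfied, because the constraint uses a non-strict inequality.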

The wrapper CumulativeCostEnv updates the monitor each step by reading info["labels"] from the wrapped LabelledEnv.

API Reference

class masa.common.constraints.cmdp.CumulativeCost(cost_fn: Callable[[Iterable[str]], float], budget: float)

Bases: Constraint

CMDP-style cumulative cost constraint with a fixed budget.

The monitor keeps:

step_cost: the instantaneous cost \(c_t\),
total: the accumulated cost \(C_T\).

Parameters:

cost_fn – Mapping from a label set to a scalar cost.
budget – Episode budget \(B\). The episode is satisfied if total <= budget.

Variables:

cost_fn – The cost function mapping a label set to a float.
budget – Maximum allowed cumulative cost.
total – Running cumulative cost for the current episode.
step_cost – Cost at the most recent update.

reset()

Reset the episode counters (total and step_cost).

update(labels: Iterable[str])

Compute the step cost from the current label set and add it to the running total.

Parameters:

labels – Iterable of atomic proposition strings for the current step.

satisfied() → bool

Check whether the episode remains within budget.

Returns:

True iff total <= budget.
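The reset/update/satisfied cycle can be pictured with the minimal stand-in class below. `CumulativeCostSketch` is a hypothetical name; the class mirrors the documented attributes (`cost_fn`, `budget`, `total`, `step_cost`) but does not subclass the library's `Constraint` base and is not the masa source. That `reset` zeroes both counters is an assumption based on the attribute descriptions.

```python
from typing import Callable, Iterable


class CumulativeCostSketch:
    """Hypothetical stand-in mirroring the documented CumulativeCost state."""

    def __init__(self, cost_fn: Callable[[Iterable[str]], float], budget: float) -> None:
        self.cost_fn = cost_fn
        self.budget = budget
        self.reset()

    def reset(self) -> None:
        # Start a new episode (assumed: both counters return to zero).
        self.total = 0.0
        self.step_cost = 0.0

    def update(self, labels: Iterable[str]) -> None:
        # c_t = c(L(s_t)), then accumulate it into the running total.
        self.step_cost = float(self.cost_fn(labels))
        self.total += self.step_cost

    def satisfied(self) -> bool:
        # Non-strict comparison: spending exactly the budget is still satisfied.
        return self.total <= self.budget


monitor = CumulativeCostSketch(lambda labels: 1.0 if "unsafe" in labels else 0.0, budget=1.0)
monitor.update({"unsafe"})
print(monitor.total, monitor.satisfied())  # 1.0 True
monitor.update({"unsafe"})
print(monitor.total, monitor.satisfied())  # 2.0 False
monitor.reset()
print(monitor.total, monitor.satisfied())  # 0.0 True
```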

episode_metric() → Dict[str, float]

End-of-episode metrics.

Returns:

A dict containing:

"cum_cost": cumulative cost over the episode,
"satisfied": 1.0 if within budget else 0.0.

Return type:

Dict[str, float]
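Because "satisfied" is reported as a float rather than a bool, it can be averaged directly across episodes into a satisfaction rate. The helper `episode_metric_sketch` below is hypothetical: it rebuilds the documented dictionary from `total` and `budget`, and the averaging step is an illustrative use rather than a documented feature.

```python
from typing import Dict


def episode_metric_sketch(total: float, budget: float) -> Dict[str, float]:
    # Hypothetical helper reproducing the documented end-of-episode dictionary.
    return {
        "cum_cost": total,
        "satisfied": 1.0 if total <= budget else 0.0,
    }


print(episode_metric_sketch(total=3.0, budget=20.0))   # {'cum_cost': 3.0, 'satisfied': 1.0}
print(episode_metric_sketch(total=25.0, budget=20.0))  # {'cum_cost': 25.0, 'satisfied': 0.0}

# Averaging the float flag over several (made-up) episode totals gives a satisfaction rate.
episodes = [episode_metric_sketch(t, budget=20.0) for t in (3.0, 25.0, 20.0, 12.5)]
satisfaction_rate = sum(m["satisfied"] for m in episodes) / len(episodes)
print(satisfaction_rate)  # 0.75
```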

step_metric() → Dict[str, float]

Per-step metrics.

Returns:

A dict containing:

"cost": instantaneous cost \(c_t\),
"violation": 1.0 if the instantaneous cost is at or above the local unsafe threshold (cost >= 0.5), else 0.0,
"cum_cost": running total.

Return type:

Dict[str, float]
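The violation flag depends only on the instantaneous cost, not on the budget. A step can be flagged while the episode is still within budget, and conversely many sub-threshold costs can exhaust the budget without any step being flagged. The hypothetical helper `step_metric_sketch` below rebuilds the documented per-step dictionary to make the 0.5 threshold explicit; it is not the library's implementation.

```python
from typing import Dict

UNSAFE_THRESHOLD = 0.5  # documented local convention: cost >= 0.5 counts as a violation


def step_metric_sketch(step_cost: float, total: float) -> Dict[str, float]:
    # Hypothetical helper reproducing the documented per-step dictionary.
    return {
        "cost": step_cost,
        "violation": 1.0 if step_cost >= UNSAFE_THRESHOLD else 0.0,
        "cum_cost": total,
    }


print(step_metric_sketch(0.2, 0.2))  # {'cost': 0.2, 'violation': 0.0, 'cum_cost': 0.2}
print(step_metric_sketch(0.5, 0.7))  # {'cost': 0.5, 'violation': 1.0, 'cum_cost': 0.7}
```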

property constraint_type: str

Stable identifier string: "cmdp".

Type:

str

class masa.common.constraints.cmdp.CumulativeCostEnv(env: gym.Env, cost_fn: CostFn = dummy_cost_fn, budget: float = 20.0, **kw)

Bases: BaseConstraintEnv

Gymnasium wrapper that attaches CumulativeCost to an environment.

Parameters:

env – Base environment (must be a LabelledEnv).
cost_fn – Cost function mapping label sets to float cost.
budget – Cumulative cost budget \(B\).
**kw – Extra keyword arguments forwarded to BaseConstraintEnv.

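The following initializer documentation is inherited from BaseConstraintEnv. CumulativeCostEnv does not take a constraint argument; it attaches a CumulativeCost built from cost_fn and budget.

Initialize the wrapper.

Parameters:

env – Base environment. Must already be wrapped as a LabelledEnv so that step/reset provide label sets in info["labels"].
constraint – A constraint monitor implementing Constraint.
**kw – Extra keyword arguments, unused by BaseConstraintEnv (kept for wrapper compatibility).

Raises:

TypeError – If env is not a LabelledEnv.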
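The per-step data flow (read info["labels"], update the cost, report the result) can be sketched without Gymnasium installed. `ToyLabelledEnv` and `CostWrapperSketch` are hypothetical names; the step return follows Gymnasium's five-tuple convention, and the monitor's accumulation is folded into the wrapper for brevity. The sketch charges cost only on `step`. How the real CumulativeCostEnv treats the labels returned by `reset`, and how it exposes metrics, are not covered by this page, so writing them into `info` is an assumption.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


class ToyLabelledEnv:
    """Hypothetical stand-in for a LabelledEnv: step/reset put a label set in info["labels"]."""

    def __init__(self, labels_per_step: List[set]) -> None:
        self._labels = labels_per_step
        self._t = 0

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self._t = 0
        return 0, {"labels": set()}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        labels = self._labels[self._t]
        self._t += 1
        terminated = self._t >= len(self._labels)
        # Gymnasium-style five-tuple: obs, reward, terminated, truncated, info.
        return self._t, 0.0, terminated, False, {"labels": labels}


class CostWrapperSketch:
    """Hypothetical wrapper: reads info["labels"] each step and accumulates cost."""

    def __init__(self, env: ToyLabelledEnv,
                 cost_fn: Callable[[Iterable[str]], float], budget: float) -> None:
        self.env = env
        self.cost_fn = cost_fn
        self.budget = budget
        self.total = 0.0

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.total = 0.0  # new episode, no cost yet
        return self.env.reset()

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        obs, reward, terminated, truncated, info = self.env.step(action)
        step_cost = float(self.cost_fn(info["labels"]))  # c_t from the label set
        self.total += step_cost
        # Assumption: metrics are exposed through info; the real wrapper may differ.
        info = {**info, "cost": step_cost, "cum_cost": self.total,
                "satisfied": self.total <= self.budget}
        return obs, reward, terminated, truncated, info


env = CostWrapperSketch(
    ToyLabelledEnv([set(), {"unsafe"}, {"unsafe"}]),
    cost_fn=lambda labels: 1.0 if "unsafe" in labels else 0.0,
    budget=1.0,
)
env.reset()
terminated = False
while not terminated:
    _, _, terminated, _, info = env.step(0)
print(info["cum_cost"], info["satisfied"])  # 2.0 False
```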