Constrained Markov Decision Process (CMDP)

Overview

Cumulative-cost constraints in the CMDP style.

This module provides a simple budgeted cumulative-cost constraint of the kind commonly used to model constrained MDPs (CMDPs). At each step, a cost is computed from the current label set \(L(s_t)\):

\[c_t \triangleq c(L(s_t)),\]

and accumulated over the episode:

\[C_T \triangleq \sum_{t=0}^{T-1} c_t.\]

The episode is considered satisfied when:

\[C_T \le B,\]

where \(B\) is the user-specified budget.
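As a quick check of the definitions above, the sketch below computes \(C_T\) from a recorded sequence of label sets and tests it against \(B\). The cost function hazard_cost and the label names "hazard" and "goal" are hypothetical, chosen only for illustration; they are not part of the module.

```python
from typing import Callable, Iterable, List, Set


def hazard_cost(labels: Iterable[str]) -> float:
    """Hypothetical cost function: 1.0 whenever the "hazard" label holds."""
    return 1.0 if "hazard" in set(labels) else 0.0


def cumulative_cost(
    label_trace: List[Set[str]], cost_fn: Callable[[Iterable[str]], float]
) -> float:
    """C_T = sum over t = 0..T-1 of c(L(s_t)), given the trace L(s_0), ..., L(s_{T-1})."""
    return sum(cost_fn(labels) for labels in label_trace)


def within_budget(total: float, budget: float) -> bool:
    """The episode is satisfied iff C_T <= B."""
    return total <= budget


trace = [set(), {"hazard"}, {"hazard"}, {"goal"}]  # L(s_0), ..., L(s_3), so T = 4
C_T = cumulative_cost(trace, hazard_cost)
print(C_T, within_budget(C_T, budget=2.0), within_budget(C_T, budget=1.0))
# 2.0 True False
```

The budget check is inclusive: an episode whose cumulative cost lands exactly on \(B\) still counts as satisfied.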

The wrapper CumulativeCostEnv updates the CumulativeCost monitor at each step by reading info["labels"] from the wrapped LabelledEnv.

API Reference

class masa.common.constraints.cmdp.CumulativeCost(cost_fn: Callable[[Iterable[str]], float], budget: float)

Bases: Constraint

CMDP-style cumulative cost constraint with a fixed budget.

The monitor keeps:

  • step_cost: the instantaneous cost \(c_t\),

  • total: the accumulated cost \(C_T\).

Parameters:
  • cost_fn – Mapping from a label set to a scalar cost.

  • budget – Episode budget \(B\). The episode is satisfied if total <= budget.

Variables:
  • cost_fn – The cost function labels -> float.

  • budget – Maximum allowed cumulative cost.

  • total – Running cumulative cost for the current episode.

  • step_cost – Cost at the most recent update.
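Any callable that maps a label set to a float can serve as cost_fn. A minimal sketch, assuming a hypothetical helper make_weighted_cost that charges a fixed weight for each active label it knows about; the label names and weights are illustrative only.

```python
from typing import Callable, Dict, Iterable


def make_weighted_cost(weights: Dict[str, float]) -> Callable[[Iterable[str]], float]:
    """Hypothetical factory: cost of a label set = sum of weights of its known labels."""

    def cost_fn(labels: Iterable[str]) -> float:
        # Deduplicate so a repeated label is charged only once per step;
        # labels without a weight contribute 0.0.
        return sum((weights.get(label, 0.0) for label in set(labels)), 0.0)

    return cost_fn


cost_fn = make_weighted_cost({"hazard": 1.0, "near_hazard": 0.25})
print(cost_fn({"hazard", "goal"}))  # 1.0
print(cost_fn(["near_hazard"]))     # 0.25
print(cost_fn([]))                  # 0.0
```

Because the callable only sees the label set, the cost at step \(t\) depends on the state only through \(L(s_t)\), matching \(c_t \triangleq c(L(s_t))\).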

reset()

Reset episode counters.

update(labels: Iterable[str])

Update costs from the current label set.

Parameters:

labels – Iterable of atomic proposition strings for the current step.

satisfied() → bool

Check whether the episode remains within budget.

Returns:

True iff total <= budget.
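The reset/update/satisfied lifecycle can be pictured with the stand-alone sketch below. BudgetMonitorSketch is a hypothetical re-implementation written only from the documented behaviour; it is not the library class, does not subclass Constraint, and the choice to clear step_cost on reset is an assumption.

```python
from typing import Callable, Iterable


class BudgetMonitorSketch:
    """Hypothetical stand-in mirroring CumulativeCost's documented state (not the real class)."""

    def __init__(self, cost_fn: Callable[[Iterable[str]], float], budget: float):
        self.cost_fn = cost_fn
        self.budget = budget
        self.reset()

    def reset(self) -> None:
        # Start a new episode: nothing accumulated yet (clearing step_cost is an assumption).
        self.total = 0.0
        self.step_cost = 0.0

    def update(self, labels: Iterable[str]) -> None:
        # c_t = c(L(s_t)); total accumulates every c_t seen this episode.
        self.step_cost = float(self.cost_fn(labels))
        self.total += self.step_cost

    def satisfied(self) -> bool:
        # True iff total <= budget.
        return self.total <= self.budget


monitor = BudgetMonitorSketch(lambda labels: 1.0 if "hazard" in labels else 0.0, budget=1.0)
for labels in [set(), {"hazard"}]:
    monitor.update(labels)
print(monitor.total, monitor.satisfied())  # 1.0 True
monitor.update({"hazard"})
print(monitor.total, monitor.satisfied())  # 2.0 False
monitor.reset()
print(monitor.total, monitor.satisfied())  # 0.0 True
```

Since costs only accumulate within an episode, forgetting to call reset() between episodes would carry the previous total forward and make later episodes look over budget.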

episode_metric() → Dict[str, float]

End-of-episode metrics.

Returns:

A dict containing:

  • "cum_cost": cumulative cost over the episode,

  • "satisfied": 1.0 if within budget else 0.0.

Return type:

Dict[str, float]
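A sketch of how the two end-of-episode entries follow from total and budget. The free function episode_metric_sketch is hypothetical and only mirrors the documented keys; it is not the library's method.

```python
from typing import Dict


def episode_metric_sketch(total: float, budget: float) -> Dict[str, float]:
    """Hypothetical helper reproducing the documented episode_metric() keys."""
    return {
        "cum_cost": float(total),
        # A float flag (rather than a bool) averages directly into a satisfaction rate.
        "satisfied": 1.0 if total <= budget else 0.0,
    }


print(episode_metric_sketch(total=3.0, budget=20.0))
# {'cum_cost': 3.0, 'satisfied': 1.0}
print(episode_metric_sketch(total=25.5, budget=20.0))
# {'cum_cost': 25.5, 'satisfied': 0.0}
```

Averaging "satisfied" over a batch of episodes gives the fraction of episodes that stayed within budget.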

step_metric() → Dict[str, float]

Per-step metrics.

Returns:

A dict containing:

  • "cost": instantaneous cost,

  • "violation": 1.0 if the instantaneous cost counts as unsafe under the local convention cost >= 0.5, else 0.0,

  • "cum_cost": running total.

Return type:

Dict[str, float]
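The per-step entries can be sketched the same way. step_metric_sketch is hypothetical; the 0.5 threshold is the local convention stated above, and treating a non-violating step as 0.0 is assumed by analogy with "satisfied".

```python
from typing import Dict

# Local convention from the documentation: a step is a violation if cost >= 0.5.
VIOLATION_THRESHOLD = 0.5


def step_metric_sketch(step_cost: float, total: float) -> Dict[str, float]:
    """Hypothetical helper reproducing the documented step_metric() keys."""
    return {
        "cost": float(step_cost),
        "violation": 1.0 if step_cost >= VIOLATION_THRESHOLD else 0.0,
        "cum_cost": float(total),
    }


print(step_metric_sketch(step_cost=0.25, total=0.25))
# {'cost': 0.25, 'violation': 0.0, 'cum_cost': 0.25}
print(step_metric_sketch(step_cost=0.5, total=0.75))
# {'cost': 0.5, 'violation': 1.0, 'cum_cost': 0.75}
```

Note that the per-step "violation" flag is independent of the budget: with budget=20.0, an episode can contain many flagged steps and still be satisfied.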

property constraint_type: str

Stable identifier string: "cmdp".

class masa.common.constraints.cmdp.CumulativeCostEnv(env: gym.Env, cost_fn: CostFn = dummy_cost_fn, budget: float = 20.0, **kw)

Bases: BaseConstraintEnv

Gymnasium wrapper that attaches CumulativeCost to an environment.

Parameters:
  • env – Base environment (must be a LabelledEnv).

  • cost_fn – Cost function mapping label sets to float cost.

  • budget – Cumulative cost budget \(B\).

  • **kw – Extra keyword arguments forwarded to BaseConstraintEnv.

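Initialize the wrapper. The parameters below are those of the BaseConstraintEnv initializer, where constraint is the monitor being attached (a CumulativeCost for this wrapper).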

Parameters:
  • env – Base environment. Must already be wrapped as a LabelledEnv so that step/reset provide label sets in info["labels"].

  • constraint – A constraint monitor implementing Constraint.

  • **kw – Unused extra keyword arguments (kept for wrapper compatibility).

Raises:

TypeError – If env is not a LabelledEnv.
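The wrapper's data flow (labels arrive in info["labels"], and the monitor is updated on each step) can be sketched without Gymnasium installed. Everything below is hypothetical: ToyLabelledEnv stands in for a LabelledEnv and follows the Gymnasium five-tuple step convention, and CostWrapperSketch is not the library's CumulativeCostEnv. Whether the real wrapper also updates the monitor on reset is not stated here, so the sketch only clears the running total in reset().

```python
from typing import Any, Callable, Dict, Iterable, List, Set, Tuple


class ToyLabelledEnv:
    """Hypothetical environment that reports a label set in info["labels"]."""

    def __init__(self, label_trace: List[Set[str]]):
        self.label_trace = label_trace
        self.t = 0

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.t = 0
        return 0, {"labels": set()}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        labels = self.label_trace[self.t]
        self.t += 1
        terminated = self.t >= len(self.label_trace)
        # Gymnasium-style tuple: (obs, reward, terminated, truncated, info)
        return self.t, 0.0, terminated, False, {"labels": labels}


class CostWrapperSketch:
    """Hypothetical wrapper: feeds info["labels"] into a running cost total."""

    def __init__(self, env: ToyLabelledEnv,
                 cost_fn: Callable[[Iterable[str]], float], budget: float = 20.0):
        self.env = env
        self.cost_fn = cost_fn
        self.budget = budget
        self.total = 0.0

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.total = 0.0  # new episode: clear the accumulated cost
        return self.env.reset()

    def step(self, action: int):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Read the label set the wrapped env placed in info and accumulate its cost.
        self.total += float(self.cost_fn(info["labels"]))
        return obs, reward, terminated, truncated, info

    def satisfied(self) -> bool:
        return self.total <= self.budget


env = CostWrapperSketch(
    ToyLabelledEnv([set(), {"hazard"}, {"hazard"}, {"goal"}]),
    cost_fn=lambda labels: 1.0 if "hazard" in labels else 0.0,
    budget=1.0,
)
env.reset()
done = False
while not done:
    _, _, terminated, truncated, _ = env.step(action=0)
    done = terminated or truncated
print(env.total, env.satisfied())  # 2.0 False
```

Keeping the cost bookkeeping in a wrapper leaves the environment's reward untouched, so the constraint can be reported alongside the return rather than folded into it.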