Constrained Markov Decision Process (CMDP)
Overview
Cumulative-cost constraints in the CMDP style.
This module provides a simple budgeted cumulative-cost constraint, commonly used to model constrained MDPs (CMDPs). At each step \(t\), a cost is computed from the current label set \(L_t\):
\[ c_t = \mathrm{cost}(L_t), \]
and accumulated over the episode:
\[ C_T = \sum_{t \le T} c_t. \]
The episode is considered satisfied when:
\[ C_T \le B, \]
where \(B\) is the user-specified budget.
The wrapper CumulativeCostEnv updates the monitor each step by reading
info["labels"] from the wrapped LabelledEnv.
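The accumulation described above can be sketched in plain Python. This is a minimal illustration rather than the module's implementation: the helper `episode_cost`, the example cost function `hazard_cost`, and the label names `"hazard"` and `"goal"` are assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple


def episode_cost(
    trace: List[Iterable[str]],
    cost_fn: Callable[[Iterable[str]], float],
    budget: float,
) -> Tuple[float, bool]:
    """Accumulate per-step costs over a trace of label sets.

    Returns the cumulative cost and whether it stays within the budget.
    """
    total = 0.0
    for labels in trace:
        total += cost_fn(labels)  # instantaneous cost c_t
    return total, total <= budget


def hazard_cost(labels: Iterable[str]) -> float:
    # Hypothetical cost function: unit cost whenever "hazard" is labelled.
    return 1.0 if "hazard" in set(labels) else 0.0


# Four steps, two of which carry the hypothetical "hazard" label.
trace = [{"goal"}, {"hazard"}, set(), {"hazard", "goal"}]
total, ok = episode_cost(trace, hazard_cost, budget=2.0)
print(total, ok)
```

Because the check is `total <= budget`, an episode whose cumulative cost exactly equals the budget still counts as satisfied.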
API Reference
- class masa.common.constraints.cmdp.CumulativeCost(cost_fn: Callable[[Iterable[str]], float], budget: float)
Bases: Constraint
CMDP-style cumulative cost constraint with a fixed budget.
The monitor keeps:
step_cost: the instantaneous cost \(c_t\),
total: the accumulated cost \(C_T\).
- Parameters:
cost_fn – Mapping from a label set to a scalar cost.
budget – Episode budget \(B\). The episode is satisfied if
total <= budget.
- Variables:
cost_fn – The cost function labels -> float.
budget – Maximum allowed cumulative cost.
total – Running cumulative cost for the current episode.
step_cost – Cost at the most recent update.
- update(labels: Iterable[str])
Update costs from the current label set.
- Parameters:
labels – Iterable of atomic proposition strings for the current step.
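A cost function of the documented shape (label set in, float out) might be built as below. This is a sketch only: the factory `make_weighted_cost`, the label names, and the weights are hypothetical and are not part of the module.

```python
from typing import Callable, Dict, Iterable


def make_weighted_cost(weights: Dict[str, float]) -> Callable[[Iterable[str]], float]:
    """Build a cost function that sums a weight for each label present."""

    def cost_fn(labels: Iterable[str]) -> float:
        # Duplicates are collapsed; labels without a weight contribute zero.
        return float(sum(weights.get(label, 0.0) for label in set(labels)))

    return cost_fn


# Hypothetical label names and weights.
cost_fn = make_weighted_cost({"lava": 1.0, "puddle": 0.25})
step_costs = [cost_fn(labels) for labels in ({"lava"}, {"puddle", "goal"}, set())]
```

A function like `cost_fn` here is what would be passed as the `cost_fn` argument; an empty label set yields zero cost.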
- satisfied() -> bool
Check whether the episode remains within budget.
- Returns:
True iff total <= budget.
- episode_metric() -> Dict[str, float]
End-of-episode metrics.
- Returns:
A dict containing
"cum_cost": cumulative cost over the episode,
"satisfied": 1.0 if within budget else 0.0.
- Return type:
Dict[str, float]
- step_metric() -> Dict[str, float]
Per-step metrics.
- Returns:
A dict containing
"cost": instantaneous cost,
"violation": 1.0 if the instantaneous cost is considered unsafe under the local convention cost >= 0.5,
"cum_cost": running total.
- Return type:
Dict[str, float]
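To make the metric keys concrete, here is a stand-in monitor that mirrors the documented behaviour. It is not the masa class: `CostMonitorSketch` and the `"unsafe"` label are hypothetical, and returning 0.0 for `"violation"` below the 0.5 threshold is an assumption, since the text above only states when the value is 1.0.

```python
from typing import Callable, Dict, Iterable


class CostMonitorSketch:
    """Stand-in mirroring the documented metric keys; not the masa class."""

    def __init__(self, cost_fn: Callable[[Iterable[str]], float], budget: float):
        self.cost_fn = cost_fn
        self.budget = budget
        self.total = 0.0
        self.step_cost = 0.0

    def update(self, labels: Iterable[str]) -> None:
        self.step_cost = float(self.cost_fn(labels))
        self.total += self.step_cost

    def step_metric(self) -> Dict[str, float]:
        return {
            "cost": self.step_cost,
            # Assumed: 0.0 when the step cost is below the 0.5 threshold.
            "violation": 1.0 if self.step_cost >= 0.5 else 0.0,
            "cum_cost": self.total,
        }

    def episode_metric(self) -> Dict[str, float]:
        return {
            "cum_cost": self.total,
            "satisfied": 1.0 if self.total <= self.budget else 0.0,
        }


monitor = CostMonitorSketch(lambda labels: 0.75 if "unsafe" in labels else 0.0, budget=1.0)
monitor.update({"unsafe"})
first = monitor.step_metric()
monitor.update(set())
second = monitor.step_metric()
monitor.update({"unsafe"})
summary = monitor.episode_metric()
```

Note that `"violation"` depends only on the instantaneous cost, while `"satisfied"` depends on the cumulative total, so a step can register a violation while the episode is still within budget.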
- property constraint_type: str
Stable identifier string "cmdp".
- Type:
str
- class masa.common.constraints.cmdp.CumulativeCostEnv(env: gym.Env, cost_fn: CostFn = dummy_cost_fn, budget: float = 20.0, **kw)
Bases: BaseConstraintEnv
Gymnasium wrapper that attaches CumulativeCost to an environment.
- Parameters:
env – Base environment (must be a LabelledEnv).
cost_fn – Cost function mapping label sets to float cost.
budget – Cumulative cost budget \(B\).
**kw – Extra keyword arguments forwarded to BaseConstraintEnv.
Initialize the wrapper.
- Parameters:
env – Base environment. Must already be wrapped as a LabelledEnv so that step/reset provide label sets in info["labels"].
constraint – A constraint monitor implementing Constraint.
**kw – Unused extra keyword arguments (kept for wrapper compatibility).
- Raises:
TypeError – If env is not a LabelledEnv.
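The wrapper's data flow (step the inner environment, read info["labels"], update the running cost) can be illustrated without masa or Gymnasium installed. Everything in this sketch is a hypothetical stand-in: `ToyLabelledEnv`, `CostWrapperSketch`, the `"unsafe"` label, and the extra `"cum_cost"` info key are not part of the library. The sketch also assumes that the total restarts on reset and that only labels returned by step are costed.

```python
from typing import Any, Dict, Tuple


class ToyLabelledEnv:
    """Hypothetical stand-in for a LabelledEnv: emits labels in info["labels"]."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.t = 0
        return self.t, {"labels": set()}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.t += 1
        labels = {"unsafe"} if action == 1 else set()
        terminated = self.t >= 4  # fixed-length toy episode
        return self.t, 0.0, terminated, False, {"labels": labels}


class CostWrapperSketch:
    """Reads info["labels"] after each step and accumulates cost."""

    def __init__(self, env, cost_fn, budget=20.0):
        self.env, self.cost_fn, self.budget = env, cost_fn, budget
        self.total = 0.0

    def reset(self):
        self.total = 0.0  # assumed: the running total restarts each episode
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.total += self.cost_fn(info["labels"])
        info["cum_cost"] = self.total  # hypothetical key, for illustration
        return obs, reward, terminated, truncated, info


env = CostWrapperSketch(
    ToyLabelledEnv(), lambda labels: 1.0 if "unsafe" in labels else 0.0, budget=1.0
)
obs, info = env.reset()
for action in [1, 0, 1, 0]:
    obs, reward, terminated, truncated, info = env.step(action)
within_budget = env.total <= env.budget
```

Keeping the cost logic in a wrapper leaves the base environment's reward untouched, so the cost signal stays separate from the task objective.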