Constraints
Overview
Base constraint interfaces and Gymnasium wrappers.
This module defines:
CostFn: a callable that maps a set/iterable of atomic proposition labels (strings) to a scalar cost.
Constraint: a protocol for stateful constraint monitors that can be reset and updated from labels at each environment step.
BaseConstraintEnv: a Gymnasium wrapper that enforces the convention that the wrapped environment is a LabelledEnv and that the constraint monitor is updated using info["labels"].
The overall convention used throughout MASA is:
The base environment (or a wrapper) provides a labelling function that maps an observation/state to a set of atomic propositions, labels.
Each call to gymnasium.Env.step() returns these labels in the info dict under the key "labels".
Constraint monitors are updated as constraint.update(labels).
Constraint wrappers expose metrics for logging/training.
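The convention above can be sketched as a plain interaction loop. This is a minimal illustration, not MASA code: ToyLabelledEnv and CountingMonitor are hypothetical stand-ins that only mimic the Gymnasium reset/step return shapes, so the snippet runs without any dependencies.

```python
from typing import Any, Dict, Iterable, Set, Tuple


class ToyLabelledEnv:
    """Hypothetical stand-in for a labelled environment (not part of MASA)."""

    def __init__(self) -> None:
        self.state = 0

    def _labels(self) -> Set[str]:
        # Labelling function: state 3 is marked "unsafe", all other states are unlabelled.
        return {"unsafe"} if self.state == 3 else set()

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.state = 0
        return self.state, {"labels": self._labels()}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.state += action
        terminated = self.state >= 4
        return self.state, 0.0, terminated, False, {"labels": self._labels()}


class CountingMonitor:
    """Hypothetical monitor that counts steps labelled "unsafe"."""

    def __init__(self) -> None:
        self.violations = 0

    def reset(self) -> None:
        self.violations = 0

    def update(self, labels: Iterable[str]) -> None:
        if "unsafe" in labels:
            self.violations += 1


env = ToyLabelledEnv()
monitor = CountingMonitor()

obs, info = env.reset()
monitor.reset()
monitor.update(info["labels"])  # the initial state is labelled too

terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(1)
    monitor.update(info["labels"])  # labels always arrive via info["labels"]
```

In this toy run the agent passes through the single "unsafe" state once, so the monitor ends the episode with one recorded violation.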
Mathematically, a (labelled) MDP is typically written as
\[\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, r, L)\]
where:
\(\mathcal{S}\) is the state space,
\(\mathcal{A}\) is the action space,
\(P(s'\mid s,a)\) is the transition kernel,
\(r(s,a,s')\) is a reward signal,
\(L : \mathcal{S} \to 2^{\mathsf{AP}}\) is a labelling function mapping states to sets of atomic propositions from a finite alphabet \(\mathsf{AP}\).
A cost function then maps labels to a scalar:
\[\mathrm{cost} : 2^{\mathsf{AP}} \to \mathbb{R}\]
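As a rough illustration of how a labelling function and a cost function compose, the sketch below labels states of a small grid and charges unit cost for an "unsafe" label. The grid positions, the label names, and the functions label_fn and cost_fn are all hypothetical choices made for this example.

```python
from typing import Iterable, Set, Tuple

State = Tuple[int, int]  # hypothetical grid position (row, col)


def label_fn(state: State) -> Set[str]:
    """Hypothetical labelling function L: map a state to its atomic propositions."""
    labels: Set[str] = set()
    if state == (0, 0):
        labels.add("start")
    if state == (2, 1):
        labels.add("unsafe")
    if state == (2, 2):
        labels.add("goal")
    return labels


def cost_fn(labels: Iterable[str]) -> float:
    """Hypothetical cost function: unit cost whenever "unsafe" holds."""
    return 1.0 if "unsafe" in labels else 0.0


# Cost of each state along a short trajectory, obtained by composing the two maps.
trajectory = [(0, 0), (1, 1), (2, 1), (2, 2)]
costs = [cost_fn(label_fn(s)) for s in trajectory]
total_cost = sum(costs)
```

Because the cost depends on the state only through its labels, the same cost function can be reused across environments that share an alphabet of atomic propositions.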
API Reference
- class masa.common.constraints.base.Constraint(*args, **kwargs)
Bases: Protocol
Protocol for stateful constraint monitors.
A Constraint is a monitor that consumes atomic proposition labels at each step and maintains internal state (e.g., cumulative cost, or whether an LTL automaton is in an accepting/unsafe state).
Implementations are intended to be lightweight and compatible with Gymnasium wrappers: call reset() at episode start and update() after each environment transition using the label set from info["labels"].
Required interface
Implementations should provide:
reset(): clear any episode state.
update(): incorporate the current label set.
constraint_type: a stable identifier string for logging/dispatch.
Metrics interface
The protocol declares:
- update(labels: Iterable[str])
Update internal state given the current set of labels.
- Parameters:
labels – Iterable of atomic proposition strings active at the current step (typically taken from info["labels"]).
- property constraint_type: str
A stable identifier for the constraint (e.g., "cmdp", "ltl_safety").
- step_metric() → Dict[str, float]
Return per-step logging metrics.
Metrics returned here should be:
cheap to compute,
non-destructive (do not mutate state),
meaningful at any time step.
Examples include a running cumulative cost, a per-step violation flag, or a current probability estimate.
- Returns:
Dictionary of scalar metrics (values should be JSON/log friendly).
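A minimal sketch of a monitor that follows the interface described above is shown below. CumulativeCostMonitor, its budget parameter, the "unsafe" label, and the identifier string are hypothetical and chosen only for illustration; MASA's own constraint classes may be structured differently.

```python
from typing import Dict, Iterable


class CumulativeCostMonitor:
    """Hypothetical monitor that accumulates unit cost for an "unsafe" label."""

    def __init__(self, budget: float) -> None:
        self.budget = budget  # hypothetical parameter: maximum tolerated episode cost
        self.total_cost = 0.0
        self.step_cost = 0.0

    @property
    def constraint_type(self) -> str:
        return "toy_cumulative_cost"  # illustrative identifier, not a MASA one

    def reset(self) -> None:
        # Clear all per-episode state.
        self.total_cost = 0.0
        self.step_cost = 0.0

    def update(self, labels: Iterable[str]) -> None:
        # Incorporate the current label set into the running cost.
        self.step_cost = 1.0 if "unsafe" in labels else 0.0
        self.total_cost += self.step_cost

    def step_metric(self) -> Dict[str, float]:
        # Read-only: builds a fresh dict and leaves the monitor state untouched.
        return {
            "cost": self.step_cost,
            "cum_cost": self.total_cost,
            "over_budget": float(self.total_cost > self.budget),
        }


monitor = CumulativeCostMonitor(budget=1.0)
monitor.reset()
for labels in [set(), {"unsafe"}, {"unsafe", "goal"}]:
    monitor.update(labels)
metrics = monitor.step_metric()
```

Keeping step_metric() free of side effects means it can be called any number of times per step, for example once by a logger and once by a training loop, without changing the reported values.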
- class masa.common.constraints.base.BaseConstraintEnv(env: Env, constraint: Constraint, **kw)
Bases: Wrapper, Constraint
Common base wrapper for constraint-aware environments.
This wrapper enforces the MASA convention that the wrapped environment is a LabelledEnv and provides info["labels"] as a set (or frozenset) of atomic propositions at each step.
The wrapper:
Delegates reset/step to the underlying environment.
Extracts labels = info.get("labels", set()).
Validates that labels is a set-like container of strings.
Calls self._constraint.update(labels).
- Variables:
env – The wrapped Gymnasium environment (must be a LabelledEnv).
_constraint – The underlying constraint monitor.
- Raises:
TypeError – If env is not an instance of LabelledEnv.
ValueError – If info["labels"] exists but is not a set/frozenset.
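The extract-and-validate steps described above might look roughly like the helper below. extract_labels is a hypothetical name, and raising ValueError for non-string elements is an assumption: the documentation only specifies ValueError for a container that is not a set/frozenset.

```python
from typing import Any, Dict, FrozenSet, Set, Union

LabelSet = Union[Set[str], FrozenSet[str]]


def extract_labels(info: Dict[str, Any]) -> LabelSet:
    """Hypothetical helper mirroring the extract-and-validate steps."""
    labels = info.get("labels", set())  # a missing key is treated as "no labels"
    if not isinstance(labels, (set, frozenset)):
        raise ValueError(
            f'info["labels"] must be a set or frozenset, got {type(labels).__name__}'
        )
    # Assumption: non-string elements are rejected with the same exception type.
    if not all(isinstance(label, str) for label in labels):
        raise ValueError('info["labels"] must contain only strings')
    return labels


empty = extract_labels({})
frozen = extract_labels({"labels": frozenset({"goal"})})
```

Passing a list such as ["unsafe"] would raise ValueError in this sketch, which catches environments that return labels in the wrong container type early.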
Notes
The properties label_fn and cost_fn are convenience accessors for downstream algorithms. Depending on how wrappers are composed, these may be None.
Initialize the wrapper.
- Parameters:
env – Base environment. Must already be wrapped as a LabelledEnv so that step/reset provide label sets in info["labels"].
constraint – A constraint monitor implementing Constraint.
**kw – Unused extra keyword arguments (kept for wrapper compatibility).
- Raises:
TypeError – If env is not a LabelledEnv.
- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)
Reset environment and constraint state.
This calls env.reset(...) and then resets and updates the constraint using the initial label set in info["labels"].
- Parameters:
seed – Optional RNG seed forwarded to the base environment.
options – Optional reset options forwarded to the base environment.
- Returns:
A tuple (obs, info) following the Gymnasium API.
- Raises:
ValueError – If info["labels"] is present but not a set/frozenset.
- step(action: Any)
Step environment and update constraint from labels.
- Parameters:
action – Action to pass to the underlying environment.
- Returns:
A 5-tuple (obs, reward, terminated, truncated, info) following the Gymnasium API.
- Raises:
ValueError – If info["labels"] is present but not a set/frozenset.
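The order of calls made by reset() and step() can be sketched with dependency-free stand-ins. ToyEnv, RecordingMonitor, and ToyConstraintWrapper are hypothetical classes written for this example; the sketch is not the MASA implementation and omits label validation.

```python
from typing import Iterable, List


class RecordingMonitor:
    """Hypothetical monitor that records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def reset(self) -> None:
        self.calls.append("reset")

    def update(self, labels: Iterable[str]) -> None:
        self.calls.append(f"update:{sorted(labels)}")


class ToyEnv:
    """Hypothetical environment whose labels depend on a step counter."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self, *, seed=None, options=None):
        self.t = 0
        return self.t, {"labels": {"start"}}

    def step(self, action):
        self.t += 1
        labels = {"unsafe"} if self.t == 2 else set()
        return self.t, 0.0, False, self.t >= 2, {"labels": labels}


class ToyConstraintWrapper:
    """Stand-in for the wrapper logic; not the MASA implementation."""

    def __init__(self, env, constraint) -> None:
        self.env = env
        self._constraint = constraint

    def reset(self, *, seed=None, options=None):
        obs, info = self.env.reset(seed=seed, options=options)
        self._constraint.reset()  # clear episode state first
        self._constraint.update(info.get("labels", set()))  # then consume the initial labels
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._constraint.update(info.get("labels", set()))
        return obs, reward, terminated, truncated, info


monitor = RecordingMonitor()
env = ToyConstraintWrapper(ToyEnv(), monitor)
env.reset(seed=0)
env.step(0)
obs, reward, terminated, truncated, info = env.step(0)
```

After this run the monitor has seen one reset followed by three updates: one for the initial state and one per step. Updating on the initial labels matters for constraints that can already be violated in the start state.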
- property cost_fn
Expose the cost function if available.
- Returns:
The underlying cost function if present on the wrapped stack, else None.
- property label_fn
Expose the labelling function if available.
- Returns:
The environment labelling function if present, else None.
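Since both accessors may return None, downstream code has to handle the missing case explicitly. One possible pattern is sketched below; step_cost, its default argument, and the local CostFn alias are hypothetical names for this example rather than MASA API.

```python
from typing import Callable, Iterable, Optional

# Local alias for illustration: a callable from labels to a scalar cost.
CostFn = Callable[[Iterable[str]], float]


def step_cost(cost_fn: Optional[CostFn], labels: Iterable[str], default: float = 0.0) -> float:
    """Hypothetical helper: fall back to a default when no cost function is exposed."""
    if cost_fn is None:
        return default
    return float(cost_fn(labels))


# With no cost function available, the default is returned.
missing = step_cost(None, {"unsafe"})

# With a cost function, its result is converted to float.
present = step_cost(lambda labels: 2 if "unsafe" in labels else 0, {"unsafe"})
```

Whether a silent default or an explicit error is the better response to a missing cost function depends on the algorithm; a method that needs costs to be meaningful may prefer to raise instead.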
- property constraint_type: str
Constraint identifier forwarded from the underlying monitor.
Next Steps
CMDP - Budgeted Constrained MDP.
LTL Safety - Safety fragment of LTL as a monitor and constraint.
PCTL - A simple Probabilistic Computation Tree Logic constraint.
Step-wise Probabilistic - Undiscounted probabilistic step-wise safety constraint.
Reach Avoid - A simple reach-avoid constraint.
ATL (Multi Agent) - Alternating-time Temporal Logic for Multi-Agent Systems.