Constraints

Overview

Base constraint interfaces and Gymnasium wrappers.

This module defines:

  • CostFn: a callable that maps a set/iterable of atomic proposition labels (strings) to a scalar cost.

  • Constraint: a protocol for stateful constraint monitors that can be reset and updated from labels at each environment step.

  • BaseConstraintEnv: a Gymnasium wrapper that enforces the convention that the wrapped environment is a LabelledEnv and that the constraint monitor is updated using info["labels"].

The overall convention used throughout MASA is:

  1. The base environment (or a wrapper) provides a labelling function that maps an observation/state to a set of atomic proposition labels.

  2. Each call to gymnasium.Env.step() returns these labels in the info dict under the key "labels".

  3. Constraint monitors are updated as constraint.update(labels).

  4. Constraint wrappers expose metrics for logging/training.
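The four steps above can be illustrated with a small, dependency-free sketch. Everything in it is hypothetical: `ToyLabelledEnv` and `CumulativeCost` are illustrative names rather than MASA classes, the environment only imitates the Gymnasium `reset`/`step` return shapes (it does not subclass `gymnasium.Env`), and the `"hazard"` label is made up.

```python
from typing import Any, Dict, FrozenSet, Iterable, Tuple


class ToyLabelledEnv:
    """Hypothetical 1-D corridor; imitates Gymnasium's reset/step return shapes."""

    def __init__(self, length: int = 5, hazard: int = 2) -> None:
        self.length = length
        self.hazard = hazard
        self.pos = 0

    def label_fn(self, pos: int) -> FrozenSet[str]:
        # Step 1: labelling function from state to a set of atomic propositions.
        return frozenset({"hazard"}) if pos == self.hazard else frozenset()

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.pos = 0
        return self.pos, {"labels": self.label_fn(self.pos)}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        # action is +1 (right) or -1 (left); the position is clipped to the corridor.
        self.pos = max(0, min(self.length - 1, self.pos + action))
        terminated = self.pos == self.length - 1
        # Step 2: labels are returned in info["labels"].
        return self.pos, 0.0, terminated, False, {"labels": self.label_fn(self.pos)}


class CumulativeCost:
    """Hypothetical monitor: counts the steps on which "hazard" holds."""

    def __init__(self) -> None:
        self.total = 0.0

    def reset(self) -> None:
        self.total = 0.0

    def update(self, labels: Iterable[str]) -> None:
        self.total += 1.0 if "hazard" in labels else 0.0

    def step_metric(self) -> Dict[str, float]:
        # Step 4: expose a cheap, read-only metric for logging.
        return {"cum_cost": self.total}


env, monitor = ToyLabelledEnv(), CumulativeCost()
obs, info = env.reset()
monitor.reset()
monitor.update(info["labels"])
terminated = False
while not terminated:
    obs, reward, terminated, truncated, info = env.step(+1)
    monitor.update(info["labels"])  # Step 3: constraint.update(labels)
print(monitor.step_metric())  # {'cum_cost': 1.0}
```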

Mathematically, a (labelled) MDP is typically written as

\[\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, r, L),\]

where:

  • \(\mathcal{S}\) is the state space,

  • \(\mathcal{A}\) is the action space,

  • \(P(s'\mid s,a)\) is the transition kernel,

  • \(r(s,a,s')\) is a reward signal,

  • \(L : \mathcal{S} \to 2^{\mathsf{AP}}\) is a labelling function mapping states to sets of atomic propositions from a finite alphabet \(\mathsf{AP}\).

A cost function then maps labels to a scalar:

\[c(s) \triangleq \mathrm{cost}(L(s)) \in \mathbb{R}.\]
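As a worked instance of \(c(s) = \mathrm{cost}(L(s))\), consider a hypothetical 3x3 grid whose states are (row, column) pairs, with \(\mathsf{AP} = \{\mathsf{lava}, \mathsf{goal}\}\) and an indicator cost on lava. The grid layout, label names, and helper names are assumptions made for illustration; `cost` simply has the shape described for CostFn (labels in, scalar out).

```python
from typing import Callable, FrozenSet, Iterable, Tuple

State = Tuple[int, int]

# Hypothetical layout of a 3x3 grid.
LAVA = {(1, 1)}
GOAL = {(2, 2)}


def label_fn(s: State) -> FrozenSet[str]:
    """L : S -> 2^AP for the toy grid."""
    labels = set()
    if s in LAVA:
        labels.add("lava")
    if s in GOAL:
        labels.add("goal")
    return frozenset(labels)


def cost(labels: Iterable[str]) -> float:
    """A CostFn-shaped callable: indicator cost on the "lava" proposition."""
    return 1.0 if "lava" in labels else 0.0


def state_cost(s: State, L: Callable[[State], FrozenSet[str]] = label_fn) -> float:
    # c(s) = cost(L(s))
    return cost(L(s))


costs = {s: state_cost(s) for s in [(0, 0), (1, 1), (2, 2)]}
print(costs)  # {(0, 0): 0.0, (1, 1): 1.0, (2, 2): 0.0}
```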

API Reference

class masa.common.constraints.base.Constraint(*args, **kwargs)

Bases: Protocol

Protocol for stateful constraint monitors.

A Constraint is a monitor that consumes atomic proposition labels at each step and maintains internal state (e.g., cumulative cost, whether an LTL automaton is in an accepting/unsafe state, etc.).

Implementations are intended to be lightweight and compatible with Gymnasium wrappers: call reset() at episode start and update() after each environment transition using the label set from info["labels"].
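Because Constraint is a typing.Protocol, a monitor can satisfy it structurally, in the typing sense, by providing the right members. The sketch below uses a hypothetical local mirror, `ConstraintLike`, whose member list is inferred from the API reference on this page rather than imported from MASA, plus a trivial hypothetical `AlwaysSafe` monitor. Note that `runtime_checkable` only checks that the members exist, not their signatures.

```python
from typing import Dict, Iterable, Protocol, runtime_checkable


@runtime_checkable
class ConstraintLike(Protocol):
    """Hypothetical local mirror of the Constraint protocol's shape."""

    @property
    def constraint_type(self) -> str: ...

    def reset(self) -> None: ...

    def update(self, labels: Iterable[str]) -> None: ...

    def step_metric(self) -> Dict[str, float]: ...

    def episode_metric(self) -> Dict[str, float]: ...


class AlwaysSafe:
    """Trivial monitor that never records a violation (no inheritance needed)."""

    @property
    def constraint_type(self) -> str:
        return "always_safe"

    def reset(self) -> None:
        pass

    def update(self, labels: Iterable[str]) -> None:
        pass

    def step_metric(self) -> Dict[str, float]:
        return {"violation": 0.0}

    def episode_metric(self) -> Dict[str, float]:
        return {"violated": 0.0}


print(isinstance(AlwaysSafe(), ConstraintLike))  # True
print(isinstance(object(), ConstraintLike))  # False
```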

Required interface

Implementations should provide:

  • reset()

  • update(labels)

Metrics interface

The protocol declares:

  • constraint_type

  • step_metric()

  • episode_metric()

reset()

Reset any episode-dependent internal state.

update(labels: Iterable[str])

Update internal state given the current set of labels.

Parameters:

labels – Iterable of atomic proposition strings active at the current step (typically taken from info["labels"]).

property constraint_type: str

A stable identifier for the constraint (e.g., "cmdp", "ltl_safety").

step_metric() → Dict[str, float]

Return per-step logging metrics.

Metrics returned here should be:

  • cheap to compute,

  • non-destructive (do not mutate state),

  • meaningful at any time step.

Examples include the running cumulative cost, a per-step violation flag, and a current probability estimate.

Returns:

Dictionary of scalar metrics (values should be JSON/log friendly).

episode_metric() → Dict[str, float]

Return end-of-episode logging metrics.

This is intended to summarize what matters for evaluation/logging at episode termination (terminated or truncated).

Returns:

Dictionary of scalar metrics (values should be JSON/log friendly).
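A concrete monitor makes the metric contract easier to see. The hypothetical `BudgetedCostMonitor` below accumulates an indicator cost for a chosen label and reports a budget violation at episode end. Its name, the `"unsafe"` label, the `"budgeted_cost"` identifier, and the metric keys are all illustrative assumptions, not part of MASA. `step_metric()` only reads state, so calling it repeatedly returns the same values.

```python
from typing import Dict, Iterable


class BudgetedCostMonitor:
    """Hypothetical monitor: indicator cost on one label, compared to a budget."""

    def __init__(self, unsafe_label: str = "unsafe", budget: float = 1.0) -> None:
        self.unsafe_label = unsafe_label
        self.budget = budget
        self.reset()

    @property
    def constraint_type(self) -> str:
        return "budgeted_cost"  # illustrative identifier, not a MASA one

    def reset(self) -> None:
        # Clear all episode-dependent state.
        self.total_cost = 0.0
        self.last_cost = 0.0

    def update(self, labels: Iterable[str]) -> None:
        self.last_cost = 1.0 if self.unsafe_label in labels else 0.0
        self.total_cost += self.last_cost

    def step_metric(self) -> Dict[str, float]:
        # Cheap and read-only: safe to call any number of times per step.
        return {"cost": self.last_cost, "cum_cost": self.total_cost}

    def episode_metric(self) -> Dict[str, float]:
        # Summary for the end of an episode (terminated or truncated).
        return {
            "cum_cost": self.total_cost,
            "violated": float(self.total_cost > self.budget),
        }


m = BudgetedCostMonitor(budget=1.0)
for labels in [frozenset(), frozenset({"unsafe"}), frozenset({"unsafe", "goal"})]:
    m.update(labels)
assert m.step_metric() == m.step_metric()  # non-destructive
print(m.episode_metric())  # {'cum_cost': 2.0, 'violated': 1.0}
```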

class masa.common.constraints.base.BaseConstraintEnv(env: Env, constraint: Constraint, **kw)

Bases: Wrapper, Constraint

Common base wrapper for constraint-aware environments.

This wrapper enforces the MASA convention that the wrapped environment is a LabelledEnv and provides info["labels"] as a set (or frozenset) of atomic propositions at each step.

The wrapper:

  1. Delegates reset/step to the underlying environment.

  2. Extracts labels = info.get("labels", set()).

  3. Validates that labels is a set-like container of strings.

  4. Calls self._constraint.update(labels).

Variables:
  • env – The wrapped Gymnasium environment (must be a LabelledEnv).

  • _constraint – The underlying constraint monitor.

Raises:
  • TypeError – If env is not an instance of LabelledEnv.

  • ValueError – If info["labels"] exists but is not a set/frozenset.

Notes

The properties label_fn and cost_fn are convenience accessors for downstream algorithms. Depending on how wrappers are composed, these may be None.
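Steps 2 and 3 of the wrapper logic (label extraction and validation) can be sketched as a standalone function. The name `extract_labels` is hypothetical; the sketch raises ValueError only for a container that is not a set or frozenset, as documented under Raises, and omits any per-element string check.

```python
from typing import AbstractSet, Any, Dict


def extract_labels(info: Dict[str, Any]) -> AbstractSet[str]:
    """Hypothetical helper: read info["labels"], defaulting to an empty set."""
    labels = info.get("labels", set())
    if not isinstance(labels, (set, frozenset)):
        raise ValueError(
            f'info["labels"] must be a set or frozenset, got {type(labels).__name__}'
        )
    return labels


print(extract_labels({"labels": frozenset({"goal"})}))  # frozenset({'goal'})
print(extract_labels({}))  # set()
try:
    extract_labels({"labels": ["goal"]})
except ValueError as err:
    print(err)  # info["labels"] must be a set or frozenset, got list
```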

Initialize the wrapper.

Parameters:
  • env – Base environment. Must already be wrapped as a LabelledEnv so that step/reset provide label sets in info["labels"].

  • constraint – A constraint monitor implementing Constraint.

  • **kw – Unused extra keyword arguments (kept for wrapper compatibility).

Raises:

TypeError – If env is not a LabelledEnv.

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)

Reset environment and constraint state.

This calls env.reset(...), then resets the constraint and updates it with the initial label set from info["labels"].

Parameters:
  • seed – Optional RNG seed forwarded to the base environment.

  • options – Optional reset options forwarded to the base environment.

Returns:

A tuple (obs, info) following the Gymnasium API.

Raises:

ValueError – If info["labels"] is present but not a set/frozenset.

step(action: Any)

Step environment and update constraint from labels.

Parameters:

action – Action to pass to the underlying environment.

Returns:

A 5-tuple (obs, reward, terminated, truncated, info) following the Gymnasium API.

Raises:

ValueError – If info["labels"] is present but not a set/frozenset.
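The reset and step flow can be sketched without depending on Gymnasium or MASA. `MiniConstraintWrapper`, `ScriptedEnv`, and `Counter` are hypothetical stand-ins: the sketch does not subclass `gymnasium.Wrapper`, skips the LabelledEnv type check, and omits the metric and accessor properties. It does show one detail worth noticing: the labels returned by reset() are fed to the constraint, so a label active in the initial state is counted.

```python
from typing import Any, Dict, Iterable, List, Optional


def _checked_labels(info: Dict[str, Any]):
    labels = info.get("labels", set())
    if not isinstance(labels, (set, frozenset)):
        raise ValueError('info["labels"] must be a set or frozenset')
    return labels


class MiniConstraintWrapper:
    """Hypothetical, dependency-free sketch of the reset/step flow."""

    def __init__(self, env: Any, constraint: Any) -> None:
        self.env = env
        self._constraint = constraint

    def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None):
        obs, info = self.env.reset(seed=seed, options=options)
        self._constraint.reset()  # clear episode-dependent state
        self._constraint.update(_checked_labels(info))  # account for initial labels
        return obs, info

    def step(self, action: Any):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._constraint.update(_checked_labels(info))
        return obs, reward, terminated, truncated, info


class ScriptedEnv:
    """Hypothetical env that replays a fixed list of label sets, one per step."""

    def __init__(self, script: List[frozenset]) -> None:
        self.script = script
        self.t = 0

    def reset(self, *, seed=None, options=None):
        self.t = 0
        return 0, {"labels": self.script[0]}

    def step(self, action):
        self.t += 1
        done = self.t == len(self.script) - 1
        return self.t, 0.0, done, False, {"labels": self.script[self.t]}


class Counter:
    """Hypothetical monitor counting steps where "unsafe" holds."""

    def __init__(self) -> None:
        self.n = 0

    def reset(self) -> None:
        self.n = 0

    def update(self, labels: Iterable[str]) -> None:
        self.n += int("unsafe" in labels)


script = [frozenset({"unsafe"}), frozenset(), frozenset({"unsafe"})]
wrapped = MiniConstraintWrapper(ScriptedEnv(script), Counter())
wrapped.reset(seed=0)
done = False
while not done:
    _, _, done, _, _ = wrapped.step(0)
print(wrapped._constraint.n)  # 2 (initial state plus the last step)
```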

property cost_fn

Expose the cost function if available.

Returns:

The underlying cost function if present on the wrapped stack, else None.

property label_fn

Expose the labelling function if available.

Returns:

The environment labelling function if present, else None.
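One plausible way to implement "if present on the wrapped stack, else None" is to walk the chain of `.env` attributes and return the first non-None match. This is an assumption about the lookup strategy, not MASA's actual implementation; `find_attr_on_stack`, `Base`, and `Layer` are hypothetical, and a layer that stores None is treated as not providing the attribute.

```python
from typing import Any, Optional


def find_attr_on_stack(env: Any, name: str) -> Optional[Any]:
    """Return the first non-None `name` found on env, env.env, ...; else None."""
    current = env
    seen = set()
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against accidental cycles
        value = getattr(current, name, None)
        if value is not None:
            return value
        current = getattr(current, "env", None)
    return None


class Base:
    """Hypothetical innermost env that carries a labelling function."""

    def __init__(self) -> None:
        self.label_fn = lambda obs: frozenset({"goal"}) if obs == 3 else frozenset()


class Layer:
    """Hypothetical wrapper layer, optionally carrying a cost function."""

    def __init__(self, env: Any, cost_fn: Any = None) -> None:
        self.env = env
        if cost_fn is not None:
            self.cost_fn = cost_fn


stack = Layer(Layer(Base(), cost_fn=lambda labels: float("unsafe" in labels)))
print(find_attr_on_stack(stack, "label_fn")(3))  # frozenset({'goal'})
print(find_attr_on_stack(stack, "cost_fn")({"unsafe"}))  # 1.0
print(find_attr_on_stack(stack, "reward_fn"))  # None
```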

property constraint_type: str

Constraint identifier forwarded from the underlying monitor.

constraint_step_metrics() → Dict[str, float]

Return per-step metrics from the underlying constraint.

Returns:

Dictionary of scalar metrics.

constraint_episode_metrics() → Dict[str, float]

Return end-of-episode metrics from the underlying constraint.

Returns:

Dictionary of scalar metrics.
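A training loop might merge these two dictionaries into its logs, adding episode metrics only once the episode ends. The helper `log_constraint_metrics`, the `step/` and `episode/` key prefixes, and `FakeConstraintEnv` (a stand-in exposing only constraint_step_metrics() and constraint_episode_metrics()) are hypothetical.

```python
from typing import Any, Dict


def log_constraint_metrics(env: Any, terminated: bool, truncated: bool) -> Dict[str, float]:
    """Hypothetical helper: per-step metrics always, episode metrics at episode end."""
    record = {f"step/{k}": float(v) for k, v in env.constraint_step_metrics().items()}
    if terminated or truncated:
        record.update(
            {f"episode/{k}": float(v) for k, v in env.constraint_episode_metrics().items()}
        )
    return record


class FakeConstraintEnv:
    """Stand-in that returns fixed metric dictionaries."""

    def constraint_step_metrics(self) -> Dict[str, float]:
        return {"cum_cost": 2.0}

    def constraint_episode_metrics(self) -> Dict[str, float]:
        return {"cum_cost": 2.0, "violated": 1.0}


env = FakeConstraintEnv()
print(log_constraint_metrics(env, terminated=False, truncated=False))
# {'step/cum_cost': 2.0}
print(log_constraint_metrics(env, terminated=False, truncated=True))
# {'step/cum_cost': 2.0, 'episode/cum_cost': 2.0, 'episode/violated': 1.0}
```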

Next Steps

  • CMDP - Budgeted Constrained MDP.

  • LTL Safety - Safety fragment of LTL as a monitor and constraint.

  • PCTL - A simple Probabilistic Computation Tree Logic constraint.

  • Step-wise Probabilistic - Undiscounted probabilistic step-wise safety constraint.

  • Reach Avoid - A simple reach-avoid constraint.

  • ATL (Multi Agent) - Alternating-time Temporal Logic for multi-agent systems.