Labelling Function¶
This page describes the labelling function API, the underlying labelled Markov Decision Process (MDP) formalism, and the MASA convention for mapping environment observations to sets of atomic predicates. These components are foundational for safety objectives, probabilistic shielding, and temporal-logic specifications (e.g. safety-LTL).
Atomic Predicates and Labels¶
We assume a fixed (finite) set of atomic predicates:
An atomic predicate represents a Boolean property of the environment state, such as:
"unsafe""goal""collision""near_obstacle"
At runtime, a state may satisfy zero or more atomic predicates simultaneously; therefore the label of a state/observation is a set of predicates.
Labelling Function API¶
Type Signature¶
The labelling function type is:
LabelFn = Callable[[Any], Iterable[str]]
Formally, MASA treats labelling as a map from observations to predicate sets:
where \(\mathcal{O}\) is the observation space and \(2^{\mathcal{AP}}\) is the power set of atomic predicates.
Semantics¶
Given an observation obs, the labelling function returns the set of atomic predicates that hold in
that observation.
Requirements:
The output must be iterable (e.g.
list,tuple,set).Elements must be strings, each naming an atomic predicate in \(\mathcal{AP}\).
The returned collection is interpreted as a set (duplicates are ignored).
Example¶
def label_fn(obs):
labels = set()
if obs["x"] < 0:
labels.add("unsafe")
if obs["goal_reached"]:
labels.add("goal")
return labels
Observation-to-Labels Convention¶
MASA uses the following convention:
Important
Labels are computed from observations, not from internal environment state.
This improves:
Compatibility with partially observable environments,
Consistency under wrappers/abstractions,
Compositionality with automata- and logic-based monitors.
Concept |
Convention |
|---|---|
Input to label function |
Raw observation returned by |
Output |
A set of atomic predicate strings (\(L(obs) \subseteq \mathcal{AP}\)). |
Empty output |
Valid (no predicates satisfied). |
Determinism |
Strongly recommended. |
Labelled Environment Wrapper¶
To standardise access to labels, MASA provides masa.common.labelled_env.LabelledEnv, a lightweight
Gymnasium wrapper that computes labels on every reset() and step()
and injects them into the info dictionary under the key "labels":
info["labels"] = set(label_fn(obs))
Usage¶
import gymnasium as gym
from masa.common.labelled_env import LabelledEnv
env = gym.make("CartPole-v1")
env = LabelledEnv(env, label_fn)
obs, info = env.reset()
labels = info["labels"]
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
labels = info["labels"]
API Reference¶
- class masa.common.labelled_env.LabelledEnv(env: Env, label_fn: Callable[[Any], Iterable[str]])[source]¶
Bases:
WrapperGymnasium wrapper that attaches a labelling function to an environment.
At every call to
reset()andstep(), the wrapped environment’s observation is passed through a user-providedLabelFn. The resulting set of atomic propositions is stored under the"labels"key in theinfodictionary.This induces a labelled MDP without modifying the observation or reward spaces, enabling downstream components to reason symbolically over environment behaviour.
- Variables:
label_fn (
LabelFn) – Function mapping observations to an iterable of atomic proposition names.
Initialize the labelled environment wrapper.
- Parameters:
env (gym.Env) – The base Gymnasium environment to wrap.
label_fn (
LabelFn) – Labelling function mapping observations to atomic propositions.
Notes
The wrapper does not alter the observation, reward, termination, or truncation signals. All labelling information is communicated exclusively via the
infodictionary.- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)[source]¶
Reset the environment and compute initial labels.
The observation returned by the underlying environment is passed through
label_fn, and the resulting set of labels is stored underinfo["labels"].- Parameters:
seed (int | None, optional) – Random seed passed to the underlying environment reset.
options (Dict[str, Any] | None, optional) – Optional reset options forwarded to the environment.
- Returns:
- A tuple
(obs, info)where: obsis the initial observation.infocontains all original entries from the wrapped environment, plus a"labels"entry of typeset[str].
- A tuple
- Return type:
Tuple[Any, Dict[str, Any]]
Notes
The
labelsentry is always a set, even if the labelling function returns a different iterable type.
- step(action)[source]¶
Step the environment and compute labels for the next observation.
After stepping the wrapped environment, the resulting observation is labelled using
label_fn. The labels are attached to theinfodictionary under the"labels"key.- Parameters:
action (Any) – Action to apply to the environment.
- Returns:
- A tuple
(obs, reward, terminated, truncated, info)where: obsis the next observation,rewardis the scalar reward,terminatedindicates episode termination,truncatedindicates episode truncation,infoincludes a"labels"entry of typeset[str].
- A tuple
- Return type:
Tuple[Any, float, bool, bool, Dict[str, Any]]
Notes
The labelling function is applied after the environment transition, meaning labels correspond to the post-transition state.
Common Pitfalls¶
Pitfall |
Recommendation |
|---|---|
Returning non-strings |
Always return strings naming atomic predicates. |
Using environment internals |
Derive labels from observations (not hidden state). |
Stateful label functions |
Prefer pure, stateless functions (state belongs in constraints/monitors). |
Inconsistent predicate vocabulary |
Define and document \(\mathcal{AP}\) clearly (including spelling/casing). |