Labelling Function¶

This page describes the labelling function API, the underlying labelled Markov Decision Process (MDP) formalism, and the MASA convention for mapping environment observations to sets of atomic predicates. These components are foundational for safety objectives, probabilistic shielding, and temporal-logic specifications (e.g. safety-LTL).

Atomic Predicates and Labels¶

We assume a fixed (finite) set of atomic predicates:

\[\mathcal{AP} = \{p_1, p_2, \dots, p_k\}.\]

An atomic predicate represents a Boolean property of the environment state, such as:

"unsafe"
"goal"
"collision"
"near_obstacle"

At runtime, a state may satisfy zero or more atomic predicates simultaneously; therefore the label of a state/observation is a set of predicates.

Labelling Function API¶

Type Signature¶

The labelling function type is:

LabelFn = Callable[[Any], Iterable[str]]

Formally, MASA treats labelling as a map from observations to predicate sets:

\[L : \mathcal{O} \rightarrow 2^{\mathcal{AP}},\]

where \(\mathcal{O}\) is the observation space and \(2^{\mathcal{AP}}\) is the power set of atomic predicates.

Semantics¶

Given an observation obs, the labelling function returns the set of atomic predicates that hold in that observation.

Requirements:

The output must be iterable (e.g. list, tuple, set).
Elements must be strings, each naming an atomic predicate in \(\mathcal{AP}\).
The returned collection is interpreted as a set (duplicates are ignored).

Example¶

def label_fn(obs):
    labels = set()

    if obs["x"] < 0:
        labels.add("unsafe")

    if obs["goal_reached"]:
        labels.add("goal")

    return labels

Observation-to-Labels Convention¶

MASA uses the following convention:

Important

Labels are computed from observations, not from internal environment state.

This improves:

Compatibility with partially observable environments,
Consistency under wrappers/abstractions,
Compositionality with automata- and logic-based monitors.

Convention summary¶
Concept	Convention
Input to label function	Raw observation returned by `gymnasium.Env.reset()` / `gymnasium.Env.step()`.
Output	A set of atomic predicate strings (\(L(obs) \subseteq \mathcal{AP}\)).
Empty output	Valid (no predicates satisfied).
Determinism	Strongly recommended.

Labelled Environment Wrapper¶

To standardise access to labels, MASA provides masa.common.labelled_env.LabelledEnv, a lightweight Gymnasium wrapper that computes labels on every reset() and step() and injects them into the info dictionary under the key "labels":

info["labels"] = set(label_fn(obs))

Usage¶

import gymnasium as gym
from masa.common.labelled_env import LabelledEnv

env = gym.make("CartPole-v1")
env = LabelledEnv(env, label_fn)

obs, info = env.reset()
labels = info["labels"]

obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
labels = info["labels"]

API Reference¶

class masa.common.labelled_env.LabelledEnv(env: Env, label_fn: Callable[[Any], Iterable[str]])[source]¶

Bases: Wrapper

Gymnasium wrapper that attaches a labelling function to an environment.

At every call to reset() and step(), the wrapped environment’s observation is passed through a user-provided LabelFn. The resulting set of atomic propositions is stored under the "labels" key in the info dictionary.

This induces a labelled MDP without modifying the observation or reward spaces, enabling downstream components to reason symbolically over environment behaviour.

Variables:: label_fn (LabelFn) – Function mapping observations to an iterable of atomic proposition names.

Initialize the labelled environment wrapper.

Parameters:

env (gym.Env) – The base Gymnasium environment to wrap.
label_fn (LabelFn) – Labelling function mapping observations to atomic propositions.

Notes

The wrapper does not alter the observation, reward, termination, or truncation signals. All labelling information is communicated exclusively via the info dictionary.

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)[source]¶

Reset the environment and compute initial labels.

The observation returned by the underlying environment is passed through label_fn, and the resulting set of labels is stored under info["labels"].

Parameters:

seed (int | None, optional) – Random seed passed to the underlying environment reset.
options (Dict[str, Any] | None, optional) – Optional reset options forwarded to the environment.

Returns:

A tuple (obs, info) where:

obs is the initial observation.
info contains all original entries from the wrapped environment, plus a "labels" entry of type set[str].

Return type:

Tuple[Any, Dict[str, Any]]

Notes

The labels entry is always a set, even if the labelling function returns a different iterable type.

step(action)[source]¶

Step the environment and compute labels for the next observation.

After stepping the wrapped environment, the resulting observation is labelled using label_fn. The labels are attached to the info dictionary under the "labels" key.

Parameters:

action (Any) – Action to apply to the environment.

Returns:

A tuple (obs, reward, terminated, truncated, info) where:

obs is the next observation,
reward is the scalar reward,
terminated indicates episode termination,
truncated indicates episode truncation,
info includes a "labels" entry of type set[str].

Return type:

Tuple[Any, float, bool, bool, Dict[str, Any]]

Notes

The labelling function is applied after the environment transition, meaning labels correspond to the post-transition state.

Common Pitfalls¶

Pitfall	Recommendation
Returning non-strings	Always return strings naming atomic predicates.
Using environment internals	Derive labels from observations (not hidden state).
Stateful label functions	Prefer pure, stateless functions (state belongs in constraints/monitors).
Inconsistent predicate vocabulary	Define and document \(\mathcal{AP}\) clearly (including spelling/casing).