Labelling Function

This page describes the labelling function API, the underlying labelled Markov Decision Process (MDP) formalism, and the MASA convention for mapping environment observations to sets of atomic predicates. These components are foundational for safety objectives, probabilistic shielding, and temporal-logic specifications (e.g. safety-LTL).

Atomic Predicates and Labels

We assume a fixed (finite) set of atomic predicates:

\[\mathcal{AP} = \{p_1, p_2, \dots, p_k\}.\]

An atomic predicate represents a Boolean property of the environment state, such as:

  • "unsafe"

  • "goal"

  • "collision"

  • "near_obstacle"

At runtime, a state may satisfy zero or more atomic predicates simultaneously; therefore the label of a state/observation is a set of predicates.

Labelling Function API

Type Signature

The labelling function type is:

LabelFn = Callable[[Any], Iterable[str]]

Formally, MASA treats labelling as a map from observations to predicate sets:

\[L : \mathcal{O} \rightarrow 2^{\mathcal{AP}},\]

where \(\mathcal{O}\) is the observation space and \(2^{\mathcal{AP}}\) is the power set of atomic predicates.

Semantics

Given an observation obs, the labelling function returns the set of atomic predicates that hold in that observation.

Requirements:

  • The output must be iterable (e.g. list, tuple, set).

  • Elements must be strings, each naming an atomic predicate in \(\mathcal{AP}\).

  • The returned collection is interpreted as a set (duplicates are ignored).

Example

def label_fn(obs):
    labels = set()

    if obs["x"] < 0:
        labels.add("unsafe")

    if obs["goal_reached"]:
        labels.add("goal")

    return labels

Observation-to-Labels Convention

MASA uses the following convention:

Important

Labels are computed from observations, not from internal environment state.

This improves:

  • Compatibility with partially observable environments,

  • Consistency under wrappers/abstractions,

  • Compositionality with automata- and logic-based monitors.

Convention summary

Concept

Convention

Input to label function

Raw observation returned by gymnasium.Env.reset() / gymnasium.Env.step().

Output

A set of atomic predicate strings (\(L(obs) \subseteq \mathcal{AP}\)).

Empty output

Valid (no predicates satisfied).

Determinism

Strongly recommended.

Labelled Environment Wrapper

To standardise access to labels, MASA provides masa.common.labelled_env.LabelledEnv, a lightweight Gymnasium wrapper that computes labels on every reset() and step() and injects them into the info dictionary under the key "labels":

info["labels"] = set(label_fn(obs))

Usage

import gymnasium as gym
from masa.common.labelled_env import LabelledEnv

env = gym.make("CartPole-v1")
env = LabelledEnv(env, label_fn)

obs, info = env.reset()
labels = info["labels"]

obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
labels = info["labels"]

API Reference

class masa.common.labelled_env.LabelledEnv(env: Env, label_fn: Callable[[Any], Iterable[str]])[source]

Bases: Wrapper

Gymnasium wrapper that attaches a labelling function to an environment.

At every call to reset() and step(), the wrapped environment’s observation is passed through a user-provided LabelFn. The resulting set of atomic propositions is stored under the "labels" key in the info dictionary.

This induces a labelled MDP without modifying the observation or reward spaces, enabling downstream components to reason symbolically over environment behaviour.

Variables:

label_fn (LabelFn) – Function mapping observations to an iterable of atomic proposition names.

Initialize the labelled environment wrapper.

Parameters:
  • env (gym.Env) – The base Gymnasium environment to wrap.

  • label_fn (LabelFn) – Labelling function mapping observations to atomic propositions.

Notes

The wrapper does not alter the observation, reward, termination, or truncation signals. All labelling information is communicated exclusively via the info dictionary.

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)[source]

Reset the environment and compute initial labels.

The observation returned by the underlying environment is passed through label_fn, and the resulting set of labels is stored under info["labels"].

Parameters:
  • seed (int | None, optional) – Random seed passed to the underlying environment reset.

  • options (Dict[str, Any] | None, optional) – Optional reset options forwarded to the environment.

Returns:

A tuple (obs, info) where:
  • obs is the initial observation.

  • info contains all original entries from the wrapped environment, plus a "labels" entry of type set[str].

Return type:

Tuple[Any, Dict[str, Any]]

Notes

The labels entry is always a set, even if the labelling function returns a different iterable type.

step(action)[source]

Step the environment and compute labels for the next observation.

After stepping the wrapped environment, the resulting observation is labelled using label_fn. The labels are attached to the info dictionary under the "labels" key.

Parameters:

action (Any) – Action to apply to the environment.

Returns:

A tuple (obs, reward, terminated, truncated, info) where:
  • obs is the next observation,

  • reward is the scalar reward,

  • terminated indicates episode termination,

  • truncated indicates episode truncation,

  • info includes a "labels" entry of type set[str].

Return type:

Tuple[Any, float, bool, bool, Dict[str, Any]]

Notes

The labelling function is applied after the environment transition, meaning labels correspond to the post-transition state.

Common Pitfalls

Pitfall

Recommendation

Returning non-strings

Always return strings naming atomic predicates.

Using environment internals

Derive labels from observations (not hidden state).

Stateful label functions

Prefer pure, stateless functions (state belongs in constraints/monitors).

Inconsistent predicate vocabulary

Define and document \(\mathcal{AP}\) clearly (including spelling/casing).