LTL Safety Colour Bomb¶

This tutorial focuses on MASA’s ltl_safety constraint. Colour Bomb Grid World gives us labelled traces such as bomb and medic; the LTL-safety wrapper advances a DFA over those labels and reports violations when the DFA reaches an accepting unsafe state.

Runnable notebook: notebooks/tutorials/05_ltl_safety_colour_bomb.ipynb

Setup¶

Use CPU-first setup before importing MASA/JAX modules:

import os

os.environ.setdefault("JAX_PLATFORMS", "cpu")
os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "2")

Import the environment labels, DFA helpers, and the MASA factory:

from masa.common.ltl import Atom, DFA
from masa.common.utils import make_env
from masa.envs.tabular.colour_bomb_grid_world import label_fn
from masa.examples.colour_bomb_grid_world.property_2 import make_dfa as make_diffusion_dfa
from masa.examples.colour_bomb_grid_world.property_3 import make_dfa as make_medic_recovery_dfa

Define and Load DFA Properties¶

A safety DFA uses accepting states as unsafe states. The inline property below means “never observe bomb”:

def make_never_bomb_dfa():
    dfa = DFA([0, 1], 0, [1])
    dfa.add_edge(0, 1, Atom("bomb"))
    return dfa

Inline never-bomb DFA. — Observing `bomb` moves from safe state `q0` to accepting unsafe state `q1`.¶

The loaded property_2 DFA is more nuanced: after entering a bomb cell, the agent must stay on bomb for one extra step before leaving.

Loaded property_2 bomb diffusion DFA. — Leaving bomb immediately reaches accepting unsafe state `q3`; staying on bomb once reaches `q2` and can return to safe state `q0`.¶

Build an LTL-Safety Environment¶

Use obs_type="dict" when teaching or debugging, because it exposes both parts of the product observation:

env = make_env(
    "colour_bomb_grid_world",
    "ltl_safety",
    30,
    label_fn=label_fn,
    dfa=make_never_bomb_dfa(),
    obs_type="dict",
)

The returned observation has obs["orig"] for the base grid state and obs["automaton"] for the DFA state index. The same automaton state is also reported in info["automaton_state"].

Inline DFA: Never Hit a Bomb¶

With seed 1, actions [1, 1, 1, 1] move right from the start and reach bomb state 78.

Colour Bomb trace reaching a bomb in four right moves. — The final visited state is labelled `bomb`.¶

Never-bomb LTL-safety rollout timeline. — On the bomb label, `automaton_state` becomes `1`, `violation` becomes `1.0`, `cum_unsafe` becomes `1.0`, and `satisfied` becomes `0.0`.¶

Loaded DFA: Bomb Diffusion¶

The same environment trace can mean something different under a different DFA. Under property_2, touching bomb is allowed, but leaving immediately is unsafe.

property_2 rollout timeline showing immediate leave violation. — Actions `[1, 1, 1, 1, 0]` hit bomb and then leave, so the DFA reaches accepting unsafe state `q3`.¶

If the agent stays on bomb for one extra step, the same property remains satisfied:

property_2 rollout timeline showing safe bomb diffusion. — Actions `[1, 1, 1, 1, 4, 0]` hit bomb, stay once, then leave with `cum_unsafe=0.0` and `satisfied=1.0`.¶

V2 Extension: Medic Recovery¶

colour_bomb_grid_world_v2 adds medic labels and the larger loaded property_3 DFA. After observing bomb, the monitor expects recovery by reaching and remaining on medic; if recovery does not happen in time, the DFA reaches accepting violation state 21.

With seed 14, actions [2, 2, 2, 2] + [4] * 11 start at state 16, reach a bomb, and wait without reaching a medic.

Colour Bomb Grid World V2 trace reaching a bomb and waiting. — The V2 trace reaches a bomb and then repeatedly stays there.¶

property_3 medic recovery timeline reaching accepting state 21. — The automaton counts through the recovery obligation and reaches accepting state `21`, so later steps report violation.¶

What to Look For¶

ltl_safety treats labels as a sequence, not just isolated step costs.
obs_type="dict" makes the product state visible as obs["orig"] plus obs["automaton"].
info["automaton_state"] mirrors the monitor state.
info["constraint"]["step"] reports immediate cost and violation.
info["constraint"]["episode"] reports cum_unsafe and satisfied.