Probabilistic Shielding

This module provides a Gymnasium-compatible implementation of Probabilistic Shielding for Safe Reinforcement Learning, based on the state-augmentation framework introduced in:

Probabilistic Shielding for Safe Reinforcement Learning
Edwin Hamel-De le Court, Francesco Belardinelli, Alexander W. Goodall
arXiv: https://arxiv.org/abs/2503.07671

The approach guarantees probabilistic safety during both training and evaluation, while preserving optimality among safe policies.

Overview

Probabilistic Shielding addresses reinforcement learning problems of the form:

Maximise discounted reward subject to an undiscounted probabilistic safety constraint.

Safety is expressed as an avoidance property:

\[\mathbb{P}(\text{reach unsafe}) \le p\]
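
Spelled out, with \(\pi\) ranging over policies and \(U\) the set of unsafe states (notation ours, matching the informal statement above):

\[\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t \ge 0} \gamma^{t} r_t\Big] \quad \text{subject to} \quad \mathbb{P}_{\pi}\big(\exists t.\ s_t \in U\big) \le p\]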

Rather than constraining the policy directly, the method constructs a safety-aware augmented MDP (the shield) in which every policy is provably safe.

Any standard RL algorithm (e.g. PPO) can then be trained on the shielded environment.

Generic Procedure

Given an environment with known safety dynamics:

  1. Compute safety bounds. Use sound value iteration (interval iteration) to compute an inductive upper bound \(\beta(s)\) on the minimal probability of reaching an unsafe state from each state \(s\) (sketched in code after this list).

  2. Augment the MDP. Each state \(s\) is augmented with a remaining safety budget \(q \in [\beta(s), 1]\), which tracks how much failure probability may still be incurred.

  3. Restrict actions via a shield. At each augmented state \((s, q)\), only actions that provably preserve the safety bound are allowed. This is enforced by projecting agent-selected actions onto a safe probability simplex.

  4. Train normally. Train any RL algorithm on the shielded MDP. Safety is guaranteed by construction, not by penalties or Lagrangians.
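
The following is a minimal sketch of steps 1 and 3 on a dense tabular kernel; it is illustrative only. The function names, the (n_states, n_actions, n_states) kernel layout, and the per-action admissibility test are our simplifications: the wrapper works with the compact kernel described below and projects action distributions onto a safe simplex rather than filtering discrete actions.

import numpy as np

def safety_upper_bounds(P, unsafe, theta=1e-15, max_vi_steps=10_000):
    # Inductive upper bound beta(s) on the minimal probability of
    # reaching an unsafe state, computed by value iteration from above.
    # P: (n_states, n_actions, n_states); unsafe: bool mask (n_states,).
    beta = np.ones(P.shape[0])           # beta_0 = 1 starts above the fixed point
    for _ in range(max_vi_steps):
        new = np.min(P @ beta, axis=1)   # Bellman backup for MINIMAL reachability
        new[unsafe] = 1.0                # unsafe states count as already failed
        if np.max(beta - new) < theta:   # the sequence is monotone decreasing
            break
        beta = new
    return beta                          # every iterate is a sound upper bound

def admissible_actions(P, beta, s, q):
    # Simplified shield check: keep action a iff its expected next-step
    # bound sum_s' P(s'|s,a) * beta(s') stays within the remaining budget q.
    return np.flatnonzero(P[s] @ beta <= q)

Because every iterate \(v\) satisfies \(T(v) \le v\) for the minimal-reachability Bellman operator \(T\), stopping after any finite number of steps still yields a sound, if conservative, bound \(\beta\).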

The ProbShieldWrapperDisc

The main entry point is the Gymnasium wrapper:

ProbShieldWrapperDisc(env, ...)

Expected inputs and types:

env: TabularEnv | DiscreteEnv
    Must expose safety dynamics either as a full transition kernel or as a compact kernel:
      successor_states_matrix: np.ndarray[int]  # (K, n_states)
      probabilities: np.ndarray[float]          # (K, n_states, n_actions)

label_fn: Callable[[int], Labels]
    Called on discrete state ids (or abstract ids if using safety_abstraction).

cost_fn: Callable[[Labels], float]
    MASA CostFn. Maps labels (set of atomic predicates) -> float cost.

safety_abstraction: Optional[Callable[[Any], int]]
    Maps raw env observations/states -> discrete abstract state id.
    Required if observation_space is not Discrete.
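
For concreteness, here is a hypothetical 3-state, 2-action chain in the compact kernel format above, together with toy label_fn and cost_fn implementations (all names and values are illustrative, not from the library):

import numpy as np

n_states, n_actions, K = 3, 2, 2

# k-th candidate successor of each state (one column per state id)
successor_states_matrix = np.array([
    [1, 2, 2],   # "advance" successor of states 0, 1, 2 (state 2 absorbs)
    [0, 0, 2],   # "reset" successor
], dtype=int)

# probabilities[k, s, a] = P(successor_states_matrix[k, s] | s, a)
probabilities = np.zeros((K, n_states, n_actions))
probabilities[0, :, 0], probabilities[1, :, 0] = 0.9, 0.1  # cautious action
probabilities[0, :, 1], probabilities[1, :, 1] = 0.5, 0.5  # risky action
assert np.allclose(probabilities.sum(axis=0), 1.0)         # valid kernel

def label_fn(state_id):
    # Labels are sets of atomic predicates; state 2 is the unsafe trap.
    return {"unsafe"} if state_id == 2 else set()

def cost_fn(labels):
    return 1.0 if "unsafe" in labels else 0.0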

The wrapper:

  • Computes safety bounds automatically using interval value iteration

  • Augments observations with the current safety budget

  • Projects actions to satisfy the probabilistic safety constraint

  • Is compatible with any on-policy or off-policy RL algorithm

Implementation details can be found in prob_shield_wrapper_disc.py.

Usage Examples

Basic Probabilistic Shielding (Discrete MDP, PCTL)

For environments with discrete state spaces and PCTL safety constraints:

env = make_env(
    "pacman",
    "pctl",
    1000,
    label_fn=label_fn,
    cost_fn=cost_fn,
    alpha=0.01,
)

env = ProbShieldWrapperDisc(
    env,
    init_safety_bound=0.01,
    theta=1e-15,
    max_vi_steps=10_000,
    granularity=20,
)
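
The shielded env behaves like any other Gymnasium environment, so a standard agent can train on it directly. For example, with Stable-Baselines3 (assuming it is installed; not part of this module):

from stable_baselines3 import PPO

model = PPO("MlpPolicy", env, verbose=1)  # any on/off-policy algorithm works
model.learn(total_timesteps=100_000)      # safety holds throughout training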

See the full example in prob_shield_example.py.

Probabilistic Shielding with a Safety Abstraction

For large or combinatorial environments, you can provide a discrete safety abstraction that preserves only safety-relevant dynamics.
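
For example, a grid-world abstraction might keep only the agent's cell and drop everything irrelevant to safety (the names and observation layout below are hypothetical):

GRID_WIDTH = 10  # hypothetical grid width

def safety_abstraction(obs):
    # Map a raw observation to a discrete abstract state id,
    # keeping only the safety-relevant structure: the agent's cell.
    x, y = obs["agent_pos"]   # assumed observation layout
    return y * GRID_WIDTH + x

The abstraction, together with label and cost functions defined on abstract ids, is then passed to the wrapper: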

env = ProbShieldWrapperDisc(
    env,
    label_fn=abstr_label_fn,
    cost_fn=cost_fn,
    safety_abstraction=safety_abstraction,
    init_safety_bound=0.01,
)

This enables scalable safety verification even when the full state space is very large.

See prob_shield_safety_abstraction_example.py.

Probabilistic Shielding for Safety-LTL (DFA–MDP Product)

Safety properties expressed in Safety-LTL are handled by constructing the DFA–MDP product internally.
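
For instance, a hypothetical property for a grid world with coloured bombs might require that a red bomb is only ever touched after being defused:

\[\varphi = \mathbf{G}\big(\text{bomb}_{\text{red}} \rightarrow \text{defused}_{\text{red}}\big)\]

Such a formula is translated to a DFA (supplied via make_dfa() in the example below), and the shield enforces \(\mathbb{P}(\neg\varphi) \le p\) on the product.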

env = make_env(
    "colour_bomb_grid_world_v2",
    "ltl_dfa",
    250,
    label_fn=label_fn,
    dfa=make_dfa(),
)

env = ProbShieldWrapperDisc(env, init_safety_bound=0.01)

The shield is built over the DFA–MDP product, ensuring probabilistic satisfaction of the LTL safety property.

See prob_shield_ltl_example.py.

When to Use

Use Probabilistic Shielding when:

  • Safety is non-negotiable

  • Constraints are probabilistic, not expected-cost based (as in CMDPs)

  • You want formal guarantees, not penalties or Lagrangians

  • The safety dynamics (or a conservative abstraction) are known

Citation

If you use this implementation, please cite:

@article{hamel2025probabilistic,
  title={Probabilistic Shielding for Safe Reinforcement Learning},
  author={Hamel-De le Court, Edwin and Belardinelli, Francesco and Goodall, Alexander W.},
  journal={arXiv preprint arXiv:2503.07671},
  year={2025}
}