Algorithms Overview

This section documents the algorithm classes currently present in the MASA codebase. The pages here are intentionally lightweight for now and focus on the core implementation ideas verified against the code.

Implemented Algorithms

MASA currently contains three main groups of learning algorithms:

  • tabular algorithms for discrete state and action spaces, including safety-aware variants

  • neural on-policy actor-critic algorithms

  • shield-aware PPO variants used with probabilistic shielding wrappers (a minimal shielding sketch follows this list)
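
To make the third group concrete, here is a minimal sketch of probabilistic action shielding: a wrapper that overrides the agent's proposed action whenever its estimated violation probability exceeds a threshold. The function and parameter names below are hypothetical illustrations, not MASA's actual API.

```python
from typing import Callable, Sequence

def shielded_action(
    proposed: int,
    actions: Sequence[int],
    violation_prob: Callable[[int], float],  # hypothetical safety estimator
    threshold: float = 0.05,
) -> int:
    """Return the proposed action if it is deemed safe enough,
    otherwise override it with the safest available action."""
    if violation_prob(proposed) <= threshold:
        return proposed
    # Override: pick the action minimizing estimated violation probability.
    return min(actions, key=violation_prob)

# Example: action 3 is too risky under this toy estimator, so the shield
# falls back to action 0.
print(shielded_action(3, range(4), lambda a: 0.03 * a))
```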

The algorithms currently registered in the main plugin registry are:

| Algorithm | Family | Core idea | Safety mechanism |
|-----------|--------|-----------|------------------|
| QL | Tabular | Standard one-step Q-learning baseline | None built into the update |
| QL_Lambda | Tabular | Q-learning with cost-penalized reward | Linear cost penalty |
| SEM | Tabular | Learns task and auxiliary safety-related tables | Safety-weighted action selection |
| LCRL | Tabular | Q-learning with absorbing-style violation value | Fixed violation return via r_min |
| RECREG | Tabular | Learns task and backup policies with overrides | Risk threshold and backup-action override |
| PPO | On-policy | Clipped actor-critic policy optimization | None built into the base algorithm |
| A2C | On-policy | Advantage actor-critic | None built into the base algorithm |
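
As a concrete illustration of the tabular rows above, here is a minimal sketch of a one-step Q-learning update with a linear cost penalty (QL_Lambda-style) and a fixed violation return (LCRL-style). Variable names, shapes, and hyperparameter values are assumptions for illustration; the actual update rules live in the respective algorithm classes.

```python
import numpy as np

# Illustrative only: names and hyperparameters are assumptions,
# not MASA's actual implementation.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99   # step size, discount factor
lam = 1.0                  # linear cost-penalty weight (QL_Lambda-style)
r_min = -10.0              # fixed violation return (LCRL-style)

def q_update(s, a, r, c, s_next, violated):
    """One-step Q-learning backup on a cost-penalized reward."""
    shaped = r - lam * c  # penalize the task reward by the safety cost
    if violated:
        # Treat a safety violation as absorbing with a fixed return.
        target = r_min
    else:
        target = shaped + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```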

All of the algorithms in the table above are registered in masa/plugins/supported.py.
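
For the on-policy entries, PPO's defining idea is the clipped surrogate objective, which bounds how far a single update can move the policy. A minimal NumPy sketch of the loss (illustrative only; MASA's PPO implementation will differ in details):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, clip_eps=0.2):
    """PPO's clipped surrogate: ratio = pi_new(a|s) / pi_old(a|s).
    Clipping keeps the policy from moving too far in a single update."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Maximize the surrogate => minimize its negation.
    return -np.minimum(unclipped, clipped).mean()

# Example call with toy probability ratios and advantages.
loss = ppo_clip_loss(np.array([1.1, 0.7]), np.array([0.5, -0.2]))
```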

Supporting Infrastructure

Several components are not standalone learning algorithms, but they are important for understanding how MASA algorithms work:

  • masa/common/on_policy_algorithm.py: shared rollout, return, and generalized advantage estimation (GAE) logic for A2C and PPO (see the GAE sketch after this list)

  • masa/common/policies.py: actor-critic networks and action distributions

  • masa/prob_shield/eventual_discounted_vi.py: value iteration used by shielding utilities and by RECREG in exact mode (a plain value-iteration sketch also follows this list)

  • masa/prob_shield/interval_bound_vi.py: interval-bound value iteration for safety analysis
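
Because A2C and PPO draw their return and advantage computation from the shared on-policy base, the central piece of masa/common/on_policy_algorithm.py is GAE. Here is a minimal sketch of GAE as it is conventionally computed (parameter names are assumptions, not MASA's exact signatures):

```python
import numpy as np

def gae_advantages(rewards, values, dones, gamma=0.99, gae_lambda=0.95):
    """Generalized advantage estimation over one rollout.
    `values` has one extra entry: the bootstrap value of the final state."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last = delta + gamma * gae_lambda * nonterminal * last
        adv[t] = last
    return adv  # critic targets (returns-to-go) are adv + values[:-1]
```

The value-iteration modules build on the standard fixed-point iteration. Shown below is plain discounted value iteration for reference only; the eventual-discounted and interval-bound variants in MASA differ in their objectives and bounds.

```python
def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Plain discounted value iteration on a tabular MDP.
    P[s][a] is a list of (prob, next_state) pairs; R[s][a] is the reward."""
    n = len(P)
    V = [0.0] * n
    while True:
        delta = 0.0
        for s in range(n):
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(len(P[s]))
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```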