Algorithms Overview¶
This section documents the algorithm classes currently present in the MASA codebase. The pages here are intentionally lightweight for now and focus on the core implementation ideas verified against the code.
Implemented Algorithms¶
MASA currently contains three main groups of learning algorithms:
tabular algorithms for discrete state and action spaces, including safety-aware variants
neural on-policy actor-critic algorithms
shield-aware PPO variants used with probabilistic shielding wrappers
The algorithms currently registered in the main plugin registry are:
Algorithm |
Family |
Core idea |
Safety mechanism |
|---|---|---|---|
|
Tabular |
Standard one-step Q-learning baseline |
None built into the update |
|
Tabular |
Q-learning with cost-penalized reward |
Linear cost penalty |
|
Tabular |
Learns task and auxiliary safety-related tables |
Safety-weighted action selection |
|
Tabular |
Q-learning with absorbing-style violation value |
Fixed violation return via |
|
Tabular |
Learns task and backup policies with overrides |
Risk threshold and backup-action override |
|
On-policy |
Clipped actor-critic policy optimization |
None built into the base algorithm |
|
On-policy |
Advantage actor-critic |
None built into the base algorithm |
These are registered in masa/plugins/supported.py.
Sections¶
Supporting Infrastructure¶
Several components are not standalone learning algorithms, but they are important for understanding how MASA algorithms work:
masa/common/on_policy_algorithm.py: shared rollout, return, and GAE logic forA2CandPPOmasa/common/policies.py: actor-critic networks and action distributionsmasa/prob_shield/eventual_discounted_vi.py: value iteration used by shielding utilities and byRECREGin exact modemasa/prob_shield/interval_bound_vi.py: interval-bound value iteration for safety analysis