On-Policy Algorithms

This section covers MASA’s on-policy actor-critic methods. The currently implemented and registered algorithms in this part of the codebase are A2C and PPO.

A2C
PPO
PPO Lagrangian
Constrained Policy Optimization
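
As a rough illustration of how the two registered algorithms differ, the sketch below computes the policy losses commonly associated with A2C and PPO. It is not taken from MASA's code: the function names `a2c_policy_loss` and `ppo_clip_policy_loss`, the `clip_eps` parameter, and the use of plain Python lists are all assumptions made for this example.

```python
import math


def a2c_policy_loss(log_probs, advantages):
    """Advantage actor-critic policy loss: -mean(log pi(a|s) * A).

    Hypothetical helper for illustration; not part of MASA's API.
    """
    weighted = [lp * adv for lp, adv in zip(log_probs, advantages)]
    return -sum(weighted) / len(weighted)


def ppo_clip_policy_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss: -mean(min(r * A, clip(r, 1-eps, 1+eps) * A)).

    Hypothetical helper for illustration; not part of MASA's API.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - eps, 1 + eps] before weighting the advantage.
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


if __name__ == "__main__":
    old_lp = [math.log(0.5), math.log(0.25)]
    new_lp = [math.log(0.6), math.log(0.2)]
    adv = [1.0, -0.5]
    print("A2C loss:", a2c_policy_loss(new_lp, adv))
    print("PPO loss:", ppo_clip_policy_loss(new_lp, old_lp, adv))
```

When the new and old log-probabilities are identical, the ratio is 1 and the clipping has no effect, so the PPO loss reduces to the negative mean advantage. Clipping only matters once the updated policy has moved away from the policy that collected the data.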

Copyright © 2025, Alexander Goodall, Omar Adalat, Edwin Hamel De Le Court, Francesco Belardinelli
