A2C

Source: masa/algorithms/a2c/a2c.py

A2C (Advantage Actor-Critic) is MASA’s advantage actor-critic implementation. It is built on the shared OnPolicyAlgorithm scaffold and uses the same actor-critic policy family as PPO.

Key Details

  • collects on-policy rollouts using the shared rollout buffer

  • computes returns and generalized advantage estimates

  • performs one gradient update on the actor and critic per rollout batch

  • can optionally normalize advantages
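The return/advantage step above can be sketched in a few lines. This is a generic GAE(λ) computation with optional advantage normalization, not MASA's actual code; the function and variable names here are hypothetical illustrations.

```python
import numpy as np

def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.

    rewards, values, dones: per-step arrays of length T.
    last_value: critic's V(s_T) used to bootstrap past the rollout end.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    next_adv = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        nonterminal = 1.0 - dones[t]          # zero out bootstrap at true terminals
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        next_adv = delta + gamma * lam * nonterminal * next_adv
        advantages[t] = next_adv
    returns = advantages + values             # targets for the critic
    return advantages, returns

# Toy 3-step rollout ending in a true terminal state.
rewards = np.array([1.0, 1.0, 1.0])
values = np.array([0.5, 0.5, 0.5])
dones = np.array([0.0, 0.0, 1.0])
adv, ret = gae(rewards, values, last_value=0.5, dones=dones)

# Optional advantage normalization (zero mean, unit variance).
norm_adv = (adv - adv.mean()) / (adv.std() + 1e-8)
```

At the terminal step the bootstrap term drops out, so the last advantage reduces to `r_T - V(s_T)`.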

Implementation Notes

The class delegates most rollout handling to masa/common/on_policy_algorithm.py. That shared scaffold handles vectorized environments, action formatting, bootstrapping on truncation, and return and advantage computation. A2C itself mainly defines the actor-critic loss and the single-update optimization step.
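The bootstrapping-on-truncation detail is worth illustrating: when an episode ends on a time limit rather than a true terminal, the critic's value of the next state is folded back into the reward so the return computation treats the cut as a continuation. This is a minimal sketch of that idea, not the scaffold's actual code; the names are hypothetical.

```python
import numpy as np

def bootstrap_truncated(rewards, truncated, next_values, gamma=0.99):
    """Fold gamma * V(s') into the reward at steps cut off by a time limit.

    truncated marks steps where the episode was truncated (not terminated);
    next_values holds the critic's estimate of the state after each step.
    """
    rewards = rewards.copy()
    rewards[truncated] += gamma * next_values[truncated]
    return rewards

# Second step hits a time limit with V(s') = 2.0.
rewards = np.array([1.0, 1.0])
truncated = np.array([False, True])
next_values = np.array([0.0, 2.0])
boot = bootstrap_truncated(rewards, truncated, next_values)
```

Without this correction, a time-limit cutoff would be scored like a true terminal and bias the value targets downward.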

The default policy class is PPOPolicy, so A2C and PPO share the same network family even though they use different optimization objectives.
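The shared-policy, different-objective split can be made concrete by comparing the two policy losses. Below is a generic side-by-side of the unclipped A2C policy-gradient loss and PPO's clipped surrogate, written as an illustration rather than MASA's implementation; the function names are hypothetical.

```python
import numpy as np

def a2c_policy_loss(log_probs, advantages):
    # Unclipped policy gradient: maximize log pi(a|s) * A, so minimize the negative.
    return -(log_probs * advantages).mean()

def ppo_policy_loss(log_probs, old_log_probs, advantages, clip=0.2):
    # Clipped surrogate: cap the probability ratio's incentive to move.
    ratio = np.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * advantages
    return -np.minimum(unclipped, clipped).mean()

adv = np.array([1.0, -1.0])
logp = np.log(np.array([0.6, 0.4]))

a2c = a2c_policy_loss(logp, adv)
# With new == old, the ratio is 1 everywhere and the clip is inactive.
ppo_same = ppo_policy_loss(logp, logp, adv)
```

A2C applies its gradient once per rollout using fresh log-probs, so it never needs the ratio-based objective; PPO reuses each rollout for several epochs, which is exactly when the clipping matters.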

When To Use It

Use A2C when:

  • you want a simple on-policy actor-critic baseline

  • you want a lighter update scheme than PPO (a single gradient step per rollout rather than multiple epochs of minibatch updates)

  • you want to compare unclipped policy-gradient training against PPO’s clipped surrogate objective