SEM
Source: masa/algorithms/tabular/sem.py
SEM is a tabular algorithm that learns more than a single task-value table. In MASA it maintains three tables:
Q for task return
D as an auxiliary table updated using positive violation penalties
C as an auxiliary table updated using negative violation penalties
Key Details
has separate learning rates and discount factors for the task and auxiliary tables
updates the task table and the safety-related tables in parallel
changes action selection using a safety weighting derived from D and C
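The parallel updates described above can be sketched roughly as follows. This is a minimal illustration and is not taken from masa/algorithms/tabular/sem.py: the helper names `td_update` and `make_table`, the `cost` signal, the step sizes, the discount factors, and the max-based bootstrap for the auxiliary tables are all assumptions made for the example.

```python
def make_table(n_states, n_actions):
    """Hypothetical helper: a zero-initialised state-action table."""
    return [[0.0] * n_actions for _ in range(n_states)]


def td_update(table, s, a, signal, s_next, done, alpha, gamma):
    """One Q-learning-style update, applied in place to a single table.

    Assumption: every table bootstraps from the max over next actions.
    The real implementation may use a different target.
    """
    bootstrap = 0.0 if done else gamma * max(table[s_next])
    table[s][a] += alpha * (signal + bootstrap - table[s][a])


Q = make_table(4, 2)  # task return
D = make_table(4, 2)  # auxiliary table, positive violation penalties
C = make_table(4, 2)  # auxiliary table, negative violation penalties

# Hypothetical transition in which a violation occurred (cost = 1.0).
s, a, r, cost, s_next, done = 0, 1, 1.0, 1.0, 2, False

# Separate learning rates and discount factors for task vs. auxiliary tables.
td_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99)
td_update(D, s, a, +cost, s_next, done, alpha=0.05, gamma=0.9)
td_update(C, s, a, -cost, s_next, done, alpha=0.05, gamma=0.9)
```

All three tables are updated from the same transition, but each has its own signal and its own hyperparameters, so the task estimate and the safety-related estimates can evolve at different rates.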
Action Selection Intuition
Unlike the simpler penalty methods, SEM does not rely only on changing the reward target. Instead, it adjusts action preferences directly using learned auxiliary tables. In the implementation this happens through a multiplicative weighting term built from clipped versions of D and C.
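To make the idea of a multiplicative weighting concrete, here is one possible form it could take. The clipping ranges, the weight formula, and the function names `safety_weight` and `select_action` are assumptions chosen for illustration. The exact expression used in MASA may differ, so treat this as a sketch rather than a description of the implementation.

```python
import math


def safety_weight(d, c):
    """Hypothetical weight in [0, 1] built from clipped D and C values.

    Assumption: D is clipped to [0, 1] and C to [-1, 0], so larger learned
    penalties of either sign shrink the weight towards zero.
    """
    d_clipped = min(max(d, 0.0), 1.0)
    c_clipped = min(max(c, -1.0), 0.0)
    return (1.0 - d_clipped) * (1.0 + c_clipped)


def select_action(q_row, d_row, c_row):
    """Greedy choice over task preferences scaled by the safety weight."""
    q_max = max(q_row)
    # Positive preferences, so that multiplying by the weight can only
    # lower an action's score and never flips its sign.
    prefs = [math.exp(q - q_max) for q in q_row]
    scores = [p * safety_weight(d, c) for p, d, c in zip(prefs, d_row, c_row)]
    return scores.index(max(scores))


# Action 1 has the higher task value but large learned penalties.
q_row = [1.0, 1.2]
d_row = [0.0, 0.8]
c_row = [0.0, -0.5]

action = select_action(q_row, d_row, c_row)
greedy_on_q = q_row.index(max(q_row))
```

In this example the task table alone prefers action 1, while the weighted scores prefer action 0, which shows how the auxiliary tables can change the chosen action without altering the reward target.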
When To Use It
Use SEM when:
you want a more structured safe tabular method than plain penalization
you want action selection to depend on explicit learned safety signals
you want to separate task learning from auxiliary safety-related estimates