Q Learning
Source: masa/algorithms/tabular/q_learning.py
QL is the base tabular Q-learning implementation in MASA. It assumes discrete observation and action spaces and learns a single Q-table with one-step temporal-difference updates and a max backup.
Key Details
supports Boltzmann and epsilon-greedy exploration
stores a small transition buffer and updates the Q-table from collected transitions
uses the standard target reward + gamma * max_a' Q(s', a'), dropping the bootstrap term on terminal transitions so that the target reduces to the reward alone
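The points above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in masa/algorithms/tabular/q_learning.py: the function names `select_action` and `q_update`, the parameters `alpha`, `epsilon`, and `temperature`, and the list-of-lists Q-table layout are all assumptions made for the example.

```python
import math
import random


def select_action(q_row, mode="epsilon_greedy", epsilon=0.1,
                  temperature=1.0, rng=random):
    """Pick an action index from one row of the Q-table (hypothetical helper)."""
    n_actions = len(q_row)
    if mode == "epsilon_greedy":
        # Explore uniformly with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        return max(range(n_actions), key=q_row.__getitem__)
    if mode == "boltzmann":
        # Softmax over Q-values; subtracting the max keeps exp() stable.
        q_max = max(q_row)
        weights = [math.exp((q - q_max) / temperature) for q in q_row]
        return rng.choices(range(n_actions), weights=weights, k=1)[0]
    raise ValueError(f"unknown exploration mode: {mode}")


def q_update(q_table, state, action, reward, next_state, terminal,
             alpha=0.1, gamma=0.99):
    """Apply one one-step TD update with a max backup; returns the target."""
    # No bootstrap on terminal transitions: the target is just the reward.
    bootstrap = 0.0 if terminal else gamma * max(q_table[next_state])
    target = reward + bootstrap
    q_table[state][action] += alpha * (target - q_table[state][action])
    return target


def update_from_buffer(q_table, buffer, alpha=0.1, gamma=0.99):
    """Replay every stored (s, a, r, s', terminal) tuple, then clear the buffer."""
    for state, action, reward, next_state, terminal in buffer:
        q_update(q_table, state, action, reward, next_state, terminal,
                 alpha=alpha, gamma=gamma)
    buffer.clear()
```

In this sketch the buffer is an ordinary list of tuples that is drained after each update pass. How large the real buffer is and when it is flushed are not specified here.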
Safety-Relevant Behaviour
If the environment uses a DFA-based cost function, QL generates counterfactual transitions for every automaton state in the product MDP. This is important for LTL-safety settings because it lets the learner update from all automaton-state interpretations of the observed transition, not only the one actually visited.
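To make the counterfactual idea concrete, here is a rough sketch under stated assumptions. The names `Transition`, `counterfactual_transitions`, `dfa_step`, and `unsafe_states` are hypothetical and do not come from MASA. The sketch also assumes that the label is the set of atomic propositions observed on the step and that cost is 1 whenever the next automaton state is unsafe.

```python
from typing import Callable, Hashable, Iterable, List, NamedTuple, Set, Tuple


class Transition(NamedTuple):
    state: Tuple[Hashable, Hashable]       # (environment state, automaton state)
    action: int
    reward: float
    cost: float
    next_state: Tuple[Hashable, Hashable]
    terminal: bool


def counterfactual_transitions(
    obs: Hashable,
    action: int,
    reward: float,
    next_obs: Hashable,
    terminal: bool,
    label: Set[str],
    dfa_states: Iterable[Hashable],
    dfa_step: Callable[[Hashable, Set[str]], Hashable],
    unsafe_states: Set[Hashable],
) -> List[Transition]:
    """Expand one observed environment step into one transition per DFA state."""
    transitions = []
    for q in dfa_states:
        # Ask the automaton where it would move if it had been in state q.
        q_next = dfa_step(q, label)
        # Assumed cost convention: 1 when the automaton lands in an unsafe state.
        cost = 1.0 if q_next in unsafe_states else 0.0
        transitions.append(
            Transition((obs, q), action, reward, cost, (next_obs, q_next), terminal)
        )
    return transitions


def never_bomb_step(q: int, label: Set[str]) -> int:
    """Toy two-state DFA for 'never see bomb': 0 is safe, 1 is an absorbing violation."""
    return 1 if q == 1 or "bomb" in label else 0


# One environment step yields an update for both automaton states.
batch = counterfactual_transitions(
    obs=3, action=1, reward=0.5, next_obs=4, terminal=False,
    label=set(), dfa_states=[0, 1], dfa_step=never_bomb_step, unsafe_states={1},
)
```

Because the environment dynamics do not depend on the automaton state, the same observed step is a valid sample for every product state `(obs, q)`. Only the automaton component and the resulting cost differ between the generated transitions.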
Outside that case, the algorithm still records step cost and violation information during rollout, but the optimization target itself remains the standard task-reward Q-learning target.
When To Use It
Use QL as the baseline tabular method when you want:
a simple discrete-state baseline
a reference point for comparing the safe tabular variants
the shared rollout and exploration behaviour used by the other tabular algorithms