Q Learning Lambda
Source: masa/algorithms/tabular/q_learning_lambda.py
QL_Lambda extends QL with a linear cost penalty. Instead of learning only from task reward, it subtracts cost_lambda * cost from the reward target.
Key Details
keeps the same tabular structure and exploration options as QL
introduces a cost-weighting parameter cost_lambda
suppresses future bootstrapping after a violating transition
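The three details above can be sketched as a single update function. This is a minimal illustration, not the actual masa API: the function name, signature, and the `violated` flag are assumptions made here for clarity.

```python
import numpy as np

def ql_lambda_update(Q, s, a, r, c, s_next, done, violated,
                     alpha=0.1, gamma=0.99, cost_lambda=1.0):
    """One penalized Q-learning update (illustrative sketch).

    `c` is the cost incurred on this transition; `violated` flags a
    safety-violating transition, after which bootstrapping is suppressed.
    """
    target = r - cost_lambda * c              # penalized immediate reward
    if not (done or violated):                # no bootstrap after a violation
        target += gamma * np.max(Q[s_next])   # standard Q-learning bootstrap
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

Because only the target changes, any exploration scheme used with plain tabular Q-learning (e.g. epsilon-greedy over `Q[s]`) carries over unchanged.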
Update Intuition
The algorithm is still Q-learning, but the immediate target becomes a penalized reward. In practice this means unsafe behaviour is discouraged by making it less valuable, rather than by explicitly shielding or overriding actions.
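As a toy numeric illustration of this point (the numbers below are invented, not taken from the codebase), the same reward looks much less valuable once an incurred cost is penalized:

```python
reward = 1.0
cost_lambda = 2.0                            # hypothetical penalty weight

safe_target = reward - cost_lambda * 0.0     # no cost incurred
unsafe_target = reward - cost_lambda * 0.5   # cost 0.5 incurred
```

Nothing blocks the unsafe action; it is simply learned to be worth less.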
This makes QL_Lambda the most direct penalty-based safe tabular baseline in the codebase.
When To Use It
Use QL_Lambda when:
you want a simple reward-penalty approach
an expected-cost penalty is an acceptable safety signal (no hard constraint enforcement is needed)
you want a minimal change from standard Q-learning