Q Learning Lambda

Source: masa/algorithms/tabular/q_learning_lambda.py

QL_Lambda extends QL with a linear cost penalty. Instead of learning only from task reward, it subtracts cost_lambda * cost from the reward target.

Key Details

  • keeps the same tabular structure and exploration options as QL

  • introduces a cost-weighting parameter cost_lambda

  • suppresses future bootstrapping after a violating transition

Update Intuition

The algorithm is still Q-learning, but the immediate target becomes a penalized reward. In practice this means unsafe behaviour is discouraged by making it less valuable, rather than by explicitly shielding or overriding actions.

This makes QL_Lambda the most direct penalty-based safe tabular baseline in the codebase.

When To Use It

Use QL_Lambda when:

  • you want a simple reward-penalty approach

  • expected cost penalties are an acceptable safety signal

  • you want a minimal change from standard Q-learning