LCRL¶
Source: masa/algorithms/tabular/lcrl.py
LCRL extends QL, but handles violations more aggressively than QL_Lambda. Rather than subtracting a tunable linear penalty, it gives violating transitions a fixed absorbing-style return based on r_min / (1 - gamma).
Key Details¶
keeps the same tabular structure and exploration options as
QLmaps violations to a fixed long-run value controlled by
r_minstops future bootstrapping through violating transitions
Update Intuition¶
The main idea is that unsafe behaviour should not just be slightly worse than safe behaviour. Instead, it is mapped to an explicit absorbing-style value. That makes LCRL less about soft penalties and more about assigning a fixed long-run outcome to violations.
When To Use It¶
Use LCRL when:
you want violations to be strongly dominated in value
a worst-case interpretation is more appropriate than a soft penalty
you want a tabular baseline that is stricter than
QL_Lambda