Core Wrappers
API Reference
- class masa.common.wrappers.TimeLimit(env: Env, max_episode_steps: int)
Bases: ConstraintPersistentWrapper
Episode time-limit wrapper compatible with constraint persistence.
This is a minimal time-limit wrapper similar in spirit to Gymnasium’s time-limit handling. It sets the truncated flag to True once the number of elapsed steps reaches _max_episode_steps.
- Parameters:
env – Base environment to wrap.
max_episode_steps – Maximum number of steps per episode.
- Variables:
_max_episode_steps – Configured time limit in steps.
_elapsed_steps – Counter of steps elapsed in the current episode.
Wraps an environment to allow a modular transformation of the step() and reset() methods.
- Parameters:
env – The environment to wrap
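The truncation rule described above can be illustrated with a minimal sketch. It does not import MASA or Gymnasium: CountingEnv and TimeLimitSketch are hypothetical stand-ins written for this example, and resetting the counter inside reset() is an assumption based on _elapsed_steps being described as a per-episode counter.

```python
class CountingEnv:
    """Hypothetical stand-in environment that never ends an episode by itself."""

    def reset(self, *, seed=None, options=None):
        return 0, {}

    def step(self, action):
        # (observation, reward, terminated, truncated, info)
        return 0, 1.0, False, False, {}


class TimeLimitSketch:
    """Illustrative re-implementation of the documented behaviour, not MASA's code."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self, *, seed=None, options=None):
        # Assumption: the step counter restarts with every episode.
        self._elapsed_steps = 0
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._elapsed_steps += 1
        # Flag truncation once the configured limit is reached.
        if self._elapsed_steps >= self._max_episode_steps:
            truncated = True
        return obs, reward, terminated, truncated, info


def run_episode(env):
    """Step until the episode ends and return the number of steps taken."""
    env.reset(seed=0)
    steps = 0
    while True:
        _, _, terminated, truncated, _ = env.step(0)
        steps += 1
        if terminated or truncated:
            return steps
```

With max_episode_steps=5, run_episode ends each episode on the fifth step because the wrapped environment never terminates on its own.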
- class masa.common.wrappers.ConstraintMonitor(env: Env)
Bases: ConstraintPersistentWrapper
Monitor that injects constraint metadata and metrics into info.
This wrapper requires the wrapped environment to be a masa.common.constraints.base.BaseConstraintEnv, so it can query:
- masa.common.constraints.base.BaseConstraintEnv.constraint_type
- masa.common.constraints.base.BaseConstraintEnv.constraint_step_metrics()
- masa.common.constraints.base.BaseConstraintEnv.constraint_episode_metrics()
On each step, the wrapper writes:
- info["constraint"]["type"]: the constraint type string
- info["constraint"]["step"]: step-level metrics (cheap, safe)
- info["constraint"]["episode"]: episode-level metrics (when available)
- Parameters:
env – Constraint environment to wrap.
- Raises:
TypeError – If env is not a BaseConstraintEnv.
Wraps an environment to allow a modular transformation of the step() and reset() methods.
- Parameters:
env – The environment to wrap
- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)
Reset and populate initial constraint metadata in info.
- Parameters:
seed – Random seed forwarded to the underlying environment.
options – Reset options forwarded to the underlying environment.
- Returns:
A tuple (obs, info). The returned info includes info["constraint"]["type"] and info["constraint"]["step"].
- step(action)
Step and populate constraint metrics in info.
- Parameters:
action – Action forwarded to the underlying environment.
- Returns:
A 5-tuple (observation, reward, terminated, truncated, info). The returned info includes the constraint fields described in the class docstring.
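As a rough illustration of the info["constraint"] layout produced by reset() and step(), the sketch below uses a hypothetical ConstraintEnvStub in place of BaseConstraintEnv. The metric names, the "stub" type string, and the choice to write the "episode" entry only when the episode ends are assumptions made for this example; the reference above only says episode metrics appear "when available".

```python
class ConstraintEnvStub:
    """Hypothetical stand-in exposing the three members the monitor queries."""

    constraint_type = "stub"  # placeholder, not a documented MASA value

    def __init__(self):
        self._steps = 0
        self._violations = 0

    def reset(self, *, seed=None, options=None):
        self._steps = 0
        self._violations = 0
        return 0, {}

    def step(self, action):
        self._steps += 1
        self._violations += int(action == 1)  # pretend action 1 is unsafe
        terminated = self._steps >= 3  # stub episodes last three steps
        return 0, 0.0, terminated, False, {}

    def constraint_step_metrics(self):
        return {"violations": self._violations}

    def constraint_episode_metrics(self):
        return {"total_violations": self._violations}


class ConstraintMonitorSketch:
    """Illustrative monitor following the documented info layout, not MASA's code."""

    def __init__(self, env):
        if not isinstance(env, ConstraintEnvStub):
            raise TypeError("env must be a constraint environment")
        self.env = env

    def _constraint_info(self):
        return {
            "type": self.env.constraint_type,
            "step": self.env.constraint_step_metrics(),
        }

    def reset(self, *, seed=None, options=None):
        obs, info = self.env.reset(seed=seed, options=options)
        info["constraint"] = self._constraint_info()
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        info["constraint"] = self._constraint_info()
        # Assumption: episode metrics are attached only at episode end.
        if terminated or truncated:
            info["constraint"]["episode"] = self.env.constraint_episode_metrics()
        return obs, reward, terminated, truncated, info
```

Downstream code can then read info["constraint"]["step"] after every call and check for the "episode" key before using episode-level values.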
- class masa.common.wrappers.RewardMonitor(env: Env)
Bases: ConstraintPersistentWrapper
Monitor that injects reward/length metrics into info.
This wrapper tracks:
- per-step immediate reward in info["metrics"]["step"]["reward"]
- episode return/length at episode end in info["metrics"]["episode"]
- Parameters:
env – Base environment to wrap.
- Variables:
total_reward – Accumulated episode reward since last reset.
total_steps – Number of steps taken since last reset.
Wraps an environment to allow a modular transformation of the step() and reset() methods.
- Parameters:
env – The environment to wrap
- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)
Reset reward counters and forward reset to the underlying env.
- Parameters:
seed – Random seed forwarded to the underlying environment.
options – Reset options forwarded to the underlying environment.
- Returns:
A tuple (obs, info) from the underlying environment.
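The counter bookkeeping described for RewardMonitor might look roughly like the sketch below. ConstantRewardEnv and RewardMonitorSketch are hypothetical stand-ins, and the "return" and "length" keys inside info["metrics"]["episode"] are assumed names chosen for illustration; the reference above does not list the exact keys.

```python
class ConstantRewardEnv:
    """Hypothetical stand-in: reward 1.0 per step, terminates after three steps."""

    def __init__(self):
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._t = 0
        return 0, {}

    def step(self, action):
        self._t += 1
        return 0, 1.0, self._t >= 3, False, {}


class RewardMonitorSketch:
    """Illustrative monitor following the documented behaviour, not MASA's code."""

    def __init__(self, env):
        self.env = env
        self.total_reward = 0.0
        self.total_steps = 0

    def reset(self, *, seed=None, options=None):
        # Clear the per-episode counters, then forward the reset call.
        self.total_reward = 0.0
        self.total_steps = 0
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.total_reward += reward
        self.total_steps += 1
        metrics = info.setdefault("metrics", {})
        metrics["step"] = {"reward": reward}
        if terminated or truncated:
            # Assumed key names for the episode summary.
            metrics["episode"] = {
                "return": self.total_reward,
                "length": self.total_steps,
            }
        return obs, reward, terminated, truncated, info
```

Because the counters are cleared in reset(), each episode summary reflects only the steps taken since the most recent reset.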