Core Wrappers

API Reference

class masa.common.wrappers.TimeLimit(env: Env, max_episode_steps: int)

Bases: ConstraintPersistentWrapper

Episode time-limit wrapper compatible with constraint persistence.

This is a minimal time-limit wrapper similar in spirit to Gymnasium’s time-limit handling. It sets the truncated flag to True once the number of elapsed steps reaches _max_episode_steps.

Parameters:
  • env – Base environment to wrap.

  • max_episode_steps – Maximum number of steps per episode.

Variables:
  • _max_episode_steps – Configured time limit in steps.

  • _elapsed_steps – Counter of steps elapsed in the current episode.

Wraps an environment to allow a modular transformation of the step() and reset() methods.

Parameters:

env – The environment to wrap

step(action)

Step the environment and apply time-limit truncation.

Parameters:

action – Action forwarded to the underlying environment.

Returns:

A 5-tuple (observation, reward, terminated, truncated, info). If the time limit is reached, truncated is forced to True.

reset(**kwargs)

Reset the environment and the elapsed step counter.

Parameters:

**kwargs – Forwarded to the underlying environment’s reset.

Returns:

The underlying environment’s reset return value.
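The counter-and-truncate behaviour described above can be sketched without any dependencies. This is an illustrative stand-in, not masa's implementation: CountdownEnv and TimeLimitSketch are hypothetical names, and the sketch assumes truncation triggers when the elapsed count reaches the limit, as the class description states.

```python
class CountdownEnv:
    """Hypothetical stand-in environment that never ends on its own."""

    def reset(self, **kwargs):
        return 0, {}

    def step(self, action):
        # (observation, reward, terminated, truncated, info)
        return 0, 1.0, False, False, {}


class TimeLimitSketch:
    """Illustrative version of the documented behaviour (not masa's code)."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self, **kwargs):
        # A new episode starts with a fresh step counter.
        self._elapsed_steps = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._elapsed_steps += 1
        # Force truncation once the limit is reached.
        if self._elapsed_steps >= self._max_episode_steps:
            truncated = True
        return obs, reward, terminated, truncated, info


env = TimeLimitSketch(CountdownEnv(), max_episode_steps=3)
env.reset()
truncated_flags = [env.step(action=0)[3] for _ in range(3)]

# After a reset the counter starts over, so the next step is not truncated.
env.reset()
first_flag_after_reset = env.step(action=0)[3]
```

In this sketch only the third step reports truncated as True; terminated is passed through unchanged from the wrapped environment.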

class masa.common.wrappers.ConstraintMonitor(env: Env)

Bases: ConstraintPersistentWrapper

Monitor that injects constraint metadata and metrics into info.

This wrapper requires the wrapped environment to be a masa.common.constraints.base.BaseConstraintEnv, so that it can query the constraint's type and its step-level and episode-level metrics.

On each step, the wrapper writes:

  • info["constraint"]["type"]: the constraint type string

  • info["constraint"]["step"]: step-level metrics (cheap to compute and safe to read on every step)

  • info["constraint"]["episode"]: episode-level metrics (when available)

Parameters:

env – Constraint environment to wrap.

Raises:

TypeError – If env is not a BaseConstraintEnv.

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)

Reset and populate initial constraint metadata in info.

Parameters:
  • seed – Random seed forwarded to the underlying environment.

  • options – Reset options forwarded to the underlying environment.

Returns:

A tuple (obs, info). The returned info includes info["constraint"]["type"] and info["constraint"]["step"].

step(action)

Step and populate constraint metrics in info.

Parameters:

action – Action forwarded to the underlying environment.

Returns:

A 5-tuple (observation, reward, terminated, truncated, info). The returned info includes constraint fields described in the class docstring.
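On the consumer side, the constraint fields can be read defensively from info. The helper below is a hypothetical sketch: read_constraint_info, the type string "example", and the metric name "cost" are made up for illustration. Only the info["constraint"] layout comes from the class description.

```python
from typing import Any, Dict


def read_constraint_info(info: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the constraint fields, tolerating missing keys."""
    constraint = info.get("constraint", {})
    return {
        "type": constraint.get("type"),
        "step": dict(constraint.get("step", {})),
        # Episode-level metrics are only present when available.
        "episode": dict(constraint.get("episode", {})),
    }


# Hand-built info dict with illustrative values, shaped like the documented layout.
example_info = {"constraint": {"type": "example", "step": {"cost": 1.0}}}
summary = read_constraint_info(example_info)
```

Treating the "episode" entry as optional keeps logging code from failing on steps where episode-level metrics have not been written.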

_step_metrics() -> Dict[str, float]

Read step-level constraint metrics.

Returns:

A dictionary of step-level metrics. If the underlying constraint raises an exception, returns an empty dictionary.

_episode_metrics() -> Dict[str, float]

Read episode-level constraint metrics.

Returns:

A dictionary of episode-level metrics. If the underlying constraint raises an exception, returns an empty dictionary.
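The "return an empty dictionary on failure" behaviour of both metric readers can be sketched as a small guard. This is an assumption about the pattern, not masa's code: safe_metrics and the two reader functions are hypothetical names.

```python
from typing import Callable, Dict


def safe_metrics(read: Callable[[], Dict[str, float]]) -> Dict[str, float]:
    """Call a metrics reader, falling back to {} if it raises."""
    try:
        return dict(read())
    except Exception:
        # A failing constraint must not break the environment step.
        return {}


def working_reader() -> Dict[str, float]:
    # Hypothetical metric name, for illustration only.
    return {"cost": 0.5}


def broken_reader() -> Dict[str, float]:
    raise RuntimeError("constraint not initialised")


ok = safe_metrics(working_reader)
fallback = safe_metrics(broken_reader)
```

The trade-off of this design is that errors in the constraint are silent, so an unexpectedly empty metrics dictionary is worth checking when debugging.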

class masa.common.wrappers.RewardMonitor(env: Env)

Bases: ConstraintPersistentWrapper

Monitor that injects reward/length metrics into info.

This wrapper tracks:

  • per-step immediate reward in info["metrics"]["step"]["reward"]

  • episode return/length at episode end in info["metrics"]["episode"]

Parameters:

env – Base environment to wrap.

Variables:
  • total_reward – Accumulated episode reward since last reset.

  • total_steps – Number of steps taken since last reset.

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)

Reset reward counters and forward reset to the underlying env.

Parameters:
  • seed – Random seed forwarded to the underlying environment.

  • options – Reset options forwarded to the underlying environment.

Returns:

A tuple (obs, info) from the underlying environment.

step(action)

Step the environment and update reward metrics.

Parameters:

action – Action forwarded to the underlying environment.

Returns:

A 5-tuple (observation, reward, terminated, truncated, info). On episode end, info["metrics"]["episode"] is populated with episode return and length.

_episode_metrics()

Compute episode-level reward metrics.

Returns:

A dictionary with keys:

  • "ep_reward": total episode reward.

  • "ep_length": episode length in steps.
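The reward bookkeeping described for RewardMonitor can be sketched as follows. This is a dependency-free illustration, not masa's implementation: ThreeStepEnv and RewardMonitorSketch are hypothetical names, and the sketch assumes "episode end" means either terminated or truncated is True.

```python
class ThreeStepEnv:
    """Hypothetical stand-in env: fixed rewards, terminates on the third step."""

    REWARDS = [1.0, 2.0, 0.5]

    def reset(self, *, seed=None, options=None):
        self._t = 0
        return 0, {}

    def step(self, action):
        reward = self.REWARDS[self._t]
        self._t += 1
        terminated = self._t == len(self.REWARDS)
        return self._t, reward, terminated, False, {}


class RewardMonitorSketch:
    """Illustrative version of the documented behaviour (not masa's code)."""

    def __init__(self, env):
        self.env = env
        self.total_reward = 0.0
        self.total_steps = 0

    def reset(self, *, seed=None, options=None):
        # Counters cover a single episode, so clear them on reset.
        self.total_reward = 0.0
        self.total_steps = 0
        return self.env.reset(seed=seed, options=options)

    def _episode_metrics(self):
        return {"ep_reward": self.total_reward, "ep_length": self.total_steps}

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.total_reward += float(reward)
        self.total_steps += 1
        metrics = info.setdefault("metrics", {})
        metrics.setdefault("step", {})["reward"] = float(reward)
        # Assumption: the episode ends on termination or truncation.
        if terminated or truncated:
            metrics["episode"] = self._episode_metrics()
        return obs, reward, terminated, truncated, info


env = RewardMonitorSketch(ThreeStepEnv())
env.reset(seed=0)
infos = [env.step(action=0)[4] for _ in range(3)]
```

In this sketch, info["metrics"]["step"]["reward"] is written on every step, while info["metrics"]["episode"] appears only in the info returned by the final step.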