Core Wrappers

API Reference

class masa.common.wrappers.TimeLimit(env: Env, max_episode_steps: int)

Bases: ConstraintPersistentWrapper

Episode time-limit wrapper compatible with constraint persistence.

This is a minimal time-limit wrapper similar in spirit to Gymnasium’s time-limit handling. It sets the truncated flag to True once the number of elapsed steps reaches _max_episode_steps.

Parameters:

env – Base environment to wrap.
max_episode_steps – Maximum number of steps per episode.

Variables:

_max_episode_steps – Configured time limit in steps.
_elapsed_steps – Counter of steps elapsed in the current episode.

Wraps an environment to allow a modular transformation of the step() and reset() methods.

Parameters:: env – The environment to wrap

step(action)

Step the environment and apply time-limit truncation.

Parameters:: action – Action forwarded to the underlying environment.
Returns:: A 5-tuple (observation, reward, terminated, truncated, info). If the time limit is reached, truncated is forced to True.

reset(**kwargs)

Reset the environment and the elapsed step counter.

Parameters:: **kwargs – Forwarded to the underlying environment’s reset.
Returns:: The underlying environment’s reset return value.
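The truncation behaviour described above can be illustrated with a dependency-free sketch. It is not the MASA implementation: TimeLimitSketch and ConstantEnv are hypothetical names, and the sketch assumes the step counter is incremented after the wrapped environment's step and compared with >= against the limit.

```python
class ConstantEnv:
    """Hypothetical stand-in environment that never terminates on its own."""

    def reset(self, **kwargs):
        return 0, {}

    def step(self, action):
        # (observation, reward, terminated, truncated, info)
        return 0, 1.0, False, False, {}


class TimeLimitSketch:
    """Illustrative time-limit logic; an assumption, not MASA's actual code."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self, **kwargs):
        # A new episode starts counting from zero again.
        self._elapsed_steps = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._elapsed_steps += 1
        # Force truncation once the configured limit is reached.
        if self._elapsed_steps >= self._max_episode_steps:
            truncated = True
        return obs, reward, terminated, truncated, info


env = TimeLimitSketch(ConstantEnv(), max_episode_steps=3)
env.reset()
# Collect the truncated flag from three consecutive steps.
flags = [env.step(0)[3] for _ in range(3)]
```

In this sketch only the third step reports truncated as True, and calling reset() sets the counter back to zero for the next episode.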

class masa.common.wrappers.ConstraintMonitor(env: Env)

Bases: ConstraintPersistentWrapper

Monitor that injects constraint metadata and metrics into info.

This wrapper requires the wrapped environment to be a masa.common.constraints.base.BaseConstraintEnv, so that it can query the constraint type and the constraint metrics.

On each step, the wrapper writes:

info["constraint"]["type"]: the constraint type string
info["constraint"]["step"]: step-level metrics (cheap, safe)
info["constraint"]["episode"]: episode-level metrics (when available)

Parameters:: env – Constraint environment to wrap.
Raises:: TypeError – If env is not a BaseConstraintEnv.
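As a rough consumer-side sketch, the info layout listed above could be read as follows. The helper read_constraint_info is hypothetical, and the example values "cmdp" and "cost" are placeholders rather than names taken from MASA. Only the three keys "type", "step" and "episode" come from the description above.

```python
from typing import Any, Dict, Optional, Tuple


def read_constraint_info(
    info: Dict[str, Any],
) -> Tuple[Optional[str], Dict[str, float], Dict[str, float]]:
    """Return (type, step metrics, episode metrics) with safe defaults."""
    constraint = info.get("constraint", {})
    ctype = constraint.get("type")
    step_metrics = dict(constraint.get("step", {}))
    # Episode metrics are only written "when available", so default to empty.
    episode_metrics = dict(constraint.get("episode", {}))
    return ctype, step_metrics, episode_metrics


# Hypothetical payload; "cmdp" and "cost" are illustrative values only.
example_info = {"constraint": {"type": "cmdp", "step": {"cost": 0.0}}}
ctype, step_metrics, episode_metrics = read_constraint_info(example_info)
```

Defaulting the episode entry to an empty dictionary keeps logging code uniform between mid-episode steps and the final step.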

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)

Reset and populate initial constraint metadata in info.

Parameters:

seed – Random seed forwarded to the underlying environment.
options – Reset options forwarded to the underlying environment.

Returns:

A tuple (obs, info). The returned info includes info["constraint"]["type"] and info["constraint"]["step"].

step(action)

Step and populate constraint metrics in info.

Parameters:: action – Action forwarded to the underlying environment.
Returns:: A 5-tuple (observation, reward, terminated, truncated, info). The returned info includes constraint fields described in the class docstring.

_step_metrics() → Dict[str, float]

Read step-level constraint metrics.

Returns:: A dictionary of step-level metrics. If the underlying constraint raises an exception, returns an empty dictionary.

_episode_metrics() → Dict[str, float]

Read episode-level constraint metrics.

Returns:: A dictionary of episode-level metrics. If the underlying constraint raises an exception, returns an empty dictionary.
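The fallback behaviour of both metric readers (an empty dictionary when the constraint raises) can be sketched with a small helper. The names safe_metrics, healthy and broken are hypothetical. How MASA actually reads metrics from the constraint is not shown in this reference, so the callable argument here is an assumption.

```python
from typing import Callable, Dict


def safe_metrics(read: Callable[[], Dict[str, float]]) -> Dict[str, float]:
    """Call a metrics reader and fall back to {} if it raises."""
    try:
        return dict(read())
    except Exception:
        # Monitoring should not break the rollout, so swallow the error.
        return {}


def healthy() -> Dict[str, float]:
    # Hypothetical reader that succeeds; "cost" is an illustrative key.
    return {"cost": 1.0}


def broken() -> Dict[str, float]:
    # Hypothetical reader that fails.
    raise RuntimeError("constraint not initialised")


ok = safe_metrics(healthy)
fallback = safe_metrics(broken)
```

The trade-off of this pattern is that a failing constraint is silent, so an empty metrics dictionary may indicate an error rather than an absence of data.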

class masa.common.wrappers.RewardMonitor(env: Env)

Bases: ConstraintPersistentWrapper

Monitor that injects reward/length metrics into info.

This wrapper tracks:

per-step immediate reward in info["metrics"]["step"]["reward"]
episode return/length at episode end in info["metrics"]["episode"]

Parameters:

env – Base environment to wrap.

Variables:

total_reward – Accumulated episode reward since last reset.
total_steps – Number of steps taken since last reset.

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)

Reset reward counters and forward reset to the underlying env.

Parameters:

seed – Random seed forwarded to the underlying environment.
options – Reset options forwarded to the underlying environment.

Returns:

A tuple (obs, info) from the underlying environment.

step(action)

Step the environment and update reward metrics.

Parameters:: action – Action forwarded to the underlying environment.
Returns:: A 5-tuple (observation, reward, terminated, truncated, info). On episode end, info["metrics"]["episode"] is populated with episode return and length.

_episode_metrics()

Compute episode-level reward metrics.

Returns:

A dictionary with keys:

"ep_reward": total episode reward.
"ep_length": episode length in steps.
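The reward bookkeeping described for RewardMonitor can be approximated with the sketch below. RewardMonitorSketch and ThreeStepEnv are hypothetical names, and the code is an assumption about the logic rather than the MASA source. Only the attribute names and the info keys are taken from the reference above.

```python
class ThreeStepEnv:
    """Hypothetical environment that terminates after three steps."""

    def __init__(self):
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._t = 0
        return 0, {}

    def step(self, action):
        self._t += 1
        # Reward of 1.0 per step; terminated becomes True on the third step.
        return self._t, 1.0, self._t >= 3, False, {}


class RewardMonitorSketch:
    """Illustrative reward/length tracking; not MASA's actual code."""

    def __init__(self, env):
        self.env = env
        self.total_reward = 0.0
        self.total_steps = 0

    def reset(self, *, seed=None, options=None):
        self.total_reward = 0.0
        self.total_steps = 0
        return self.env.reset(seed=seed, options=options)

    def _episode_metrics(self):
        return {"ep_reward": self.total_reward, "ep_length": self.total_steps}

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.total_reward += float(reward)
        self.total_steps += 1
        metrics = info.setdefault("metrics", {})
        metrics["step"] = {"reward": float(reward)}
        # Episode-level metrics are only attached when the episode ends.
        if terminated or truncated:
            metrics["episode"] = self._episode_metrics()
        return obs, reward, terminated, truncated, info


env = RewardMonitorSketch(ThreeStepEnv())
env.reset()
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(0)
    done = terminated or truncated
final_metrics = info["metrics"]
```

After the loop, final_metrics holds both the last step reward and the episode summary, since the final step is the only one where both entries are written.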