Misc Wrappers

API Reference

class masa.common.wrappers.RewardShapingWrapper(env: Env, gamma: float = 0.99, impl: str = 'none')[source]

Bases: ConstraintPersistentWrapper

Potential-based reward shaping wrapper for DFA-based safety constraints.

If the wrapped environment’s constraint exposes a DFACostFn, this wrapper constructs a shaped cost function ShapedCostFn and updates the step cost entry inside info["constraint"]["step"] using:

\[c'_t \;=\; c_t \;+\; \gamma \Phi(q_{t+1}) \;-\; \Phi(q_t).\]

The potential \(\Phi\) depends on impl:

  • "none": \(\Phi(q)=0\) (no shaping)

  • "vi": approximate value iteration over the DFA graph to derive potentials

  • "cycle": graph-distance based shaping using a reverse-reachability BFS
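As a rough illustration of the shaping rule above, the sketch below derives potentials for a hypothetical three-state DFA from reverse-BFS distances and applies them to a toy trajectory. The helper names (reverse_bfs_distances, shaped_cost), the edge list, and the mapping from distance to potential are assumptions made for this example; the actual "cycle" implementation in masa may compute potentials differently.

```python
from collections import deque


def reverse_bfs_distances(edges, targets):
    """Shortest distance (in edges) from each state to any target state."""
    # Reverse the adjacency list so BFS can walk edges backwards from the targets.
    reverse = {}
    for src, dst in edges:
        reverse.setdefault(dst, []).append(src)
    dist = {q: 0 for q in targets}
    queue = deque(targets)
    while queue:
        q = queue.popleft()
        for pred in reverse.get(q, []):
            if pred not in dist:  # states that cannot reach a target never get an entry
                dist[pred] = dist[q] + 1
                queue.append(pred)
    return dist


def shaped_cost(cost, phi_prev, phi_next, gamma=0.99):
    """Return c' = c + gamma * Phi(q_next) - Phi(q_prev)."""
    return cost + gamma * phi_next - phi_prev


# Hypothetical DFA: state 2 is treated as the violating state.
edges = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]
dist = reverse_bfs_distances(edges, targets=[2])

# Assumed mapping (illustrative only): states closer to a violation get a higher potential.
max_dist = max(dist.values())
phi = {q: float(max_dist - d) for q, d in dist.items()}

# Toy trajectory of DFA states and the raw step costs between them.
states = [0, 1, 1, 2]
costs = [0.0, 0.0, 1.0]
gamma = 1.0
shaped = [
    shaped_cost(c, phi[q], phi[q_next], gamma)
    for c, q, q_next in zip(costs, states, states[1:])
]
total_raw = sum(costs)
total_shaped = sum(shaped)
```

With gamma = 1 the shaping terms telescope, so total_shaped equals total_raw plus the potential of the last state minus the potential of the first. This is why potential-based shaping changes per-step signals without changing which trajectories are preferred.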

Notes

This wrapper assumes the wrapped environment already produces info["automaton_state"] and a constraint-monitor-like structure containing info["constraint"]["step"]["cost"]. If these keys are absent, the wrapper falls back to default values (DFA state 0 and cost 0.0).

Parameters:
  • env – Base environment to wrap.

  • gamma – Discount used in the shaping term \(\gamma \Phi(q_{t+1})\).

  • impl – Shaping implementation. One of {"none", "vi", "cycle"}.

Variables:
  • shaped_cost_fn – The cost function exposed by cost_fn after shaping.

  • potential_fn – Callable \(\Phi(q)\) mapping DFA states to potentials.

  • _last_potential – Potential at the previous step’s DFA state.

  • _gamma – Shaping discount factor.

  • _impl – Shaping implementation identifier.

Wraps an environment to allow a modular transformation of the step() and reset() methods.

Parameters:

env – The environment to wrap

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)[source]

Reset the environment and initialize shaping state.

Parameters:
  • seed – Random seed forwarded to the underlying environment.

  • options – Reset options forwarded to the underlying environment.

Returns:

A tuple (obs, info) from the underlying environment.

Notes

This wrapper reads info["automaton_state"] to initialize the previous potential _last_potential. If the key is missing, it assumes DFA state 0.

step(action: Any)[source]

Step the environment and apply potential-based shaping to the step cost.

Parameters:

action – Action forwarded to the underlying environment.

Returns:

A 5-tuple (observation, reward, terminated, truncated, info).

Side effects:

Updates info["constraint"]["step"]["cost"] in-place with the shaped cost and updates _last_potential.

Notes

If the underlying info does not contain constraint metrics, this method assumes an unshaped step cost of 0.0 and will still attempt to write back into info["constraint"]["step"].
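The bookkeeping described for reset() and step() can be sketched without any environment at all. The class ShapingState below is hypothetical and only mirrors the documented fallbacks (DFA state 0, cost 0.0). Creating the missing nested dictionaries with setdefault is an assumption of this sketch; the real wrapper may handle a missing info["constraint"]["step"] differently.

```python
class ShapingState:
    """Hypothetical helper: tracks the previous potential and rewrites the step cost."""

    def __init__(self, potential_fn, gamma=0.99):
        self.potential_fn = potential_fn
        self.gamma = gamma
        self.last_potential = 0.0

    def on_reset(self, info):
        # A missing "automaton_state" falls back to DFA state 0.
        self.last_potential = self.potential_fn(info.get("automaton_state", 0))

    def on_step(self, info):
        q_next = info.get("automaton_state", 0)
        # Assumption: create the nested entries if absent so the write-back succeeds.
        step_metrics = info.setdefault("constraint", {}).setdefault("step", {})
        cost = step_metrics.get("cost", 0.0)  # a missing cost falls back to 0.0
        phi_next = self.potential_fn(q_next)
        step_metrics["cost"] = cost + self.gamma * phi_next - self.last_potential
        self.last_potential = phi_next
        return info


# Illustrative potentials for two DFA states.
potentials = {0: 0.0, 1: 1.0}
state = ShapingState(lambda q: potentials[q], gamma=0.5)
state.on_reset({})  # no automaton_state in info: state 0 is assumed
info = state.on_step({"automaton_state": 1})
```

After the call, info["constraint"]["step"]["cost"] holds the shaped cost 0.0 + 0.5 * 1.0 - 0.0, and last_potential has advanced to the potential of state 1.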

property cost_fn

Expose the shaped cost function.

Returns:

The shaped cost function constructed in _setup_cost_fn().

class masa.common.wrappers.NormWrapper(env: Env, norm_obs: bool = True, norm_rew: bool = True, training: bool = True, clip_obs: float = 10.0, clip_rew: float = 10.0, gamma: float = 0.99, eps: float = 1e-8)[source]

Bases: ConstraintPersistentWrapper

Normalize observations and/or rewards for a single (non-vectorized) environment.

This wrapper maintains running mean/variance estimates and applies:

  • Observation normalization (elementwise): \((x - \mu) / \sqrt{\sigma^2 + \varepsilon}\)

  • Reward normalization using a running variance estimate over discounted returns.

This wrapper is intended for non-vectorized environments. For vectorized environments, use VecNormWrapper.
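The observation formula above can be sketched in plain Python. The RunningStats class and normalize_obs function are illustrative stand-ins written for this example, not the masa RunningMeanStd API; in particular, how the real class initializes its counts and variance is not specified here and may differ.

```python
import math


class RunningStats:
    """Elementwise running mean and population variance (Welford updates)."""

    def __init__(self, size):
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations per element

    def update(self, x):
        self.count += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (xi - self.mean[i])

    @property
    def var(self):
        # Before any update, report unit variance so normalization is a no-op scale.
        return [m2 / self.count if self.count else 1.0 for m2 in self.m2]


def normalize_obs(obs, stats, clip_obs=10.0, eps=1e-8):
    """Apply (x - mu) / sqrt(sigma^2 + eps), then clip to [-clip_obs, clip_obs]."""
    out = []
    for xi, mu, var in zip(obs, stats.mean, stats.var):
        z = (xi - mu) / math.sqrt(var + eps)
        out.append(max(-clip_obs, min(clip_obs, z)))
    return out


stats = RunningStats(size=2)
for sample in [[0.0, 10.0], [2.0, 10.0]]:
    stats.update(sample)  # only done when training is True

normalized = normalize_obs([2.0, 10.0], stats)
clipped = normalize_obs([1000.0, 10.0], stats, clip_obs=10.0)
```

The second element never varies, so its variance is zero and eps alone keeps the division well defined.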

Parameters:
  • env – Base (non-vectorized) environment.

  • norm_obs – Whether to normalize observations.

  • norm_rew – Whether to normalize rewards.

  • training – If True, update running statistics; otherwise, statistics are frozen.

  • clip_obs – Clip normalized observations to [-clip_obs, clip_obs].

  • clip_rew – Clip normalized rewards to [-clip_rew, clip_rew].

  • gamma – Discount factor for the running return used in reward normalization.

  • eps – Small constant \(\varepsilon\) for numerical stability.

Variables:
  • norm_obs – See Parameters.

  • norm_rew – See Parameters.

  • training – See Parameters.

  • clip_obs – See Parameters.

  • clip_rew – See Parameters.

  • gamma – See Parameters.

  • eps – See Parameters.

  • obs_rms – masa.common.running_mean_std.RunningMeanStd for observations.

  • rew_rms – masa.common.running_mean_std.RunningMeanStd for returns.

  • returns – Discounted return accumulator used for reward normalization.
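One common way to combine a discounted return accumulator with a running variance is sketched below. This follows the scheme used by several RL libraries and is an assumption about how returns and rew_rms interact; the class names, the order of the update and the scaling, and the reset of the accumulator at episode end are illustrative, not confirmed details of NormWrapper.

```python
import math


class ScalarRunningVar:
    """Running population variance of a scalar stream (Welford updates)."""

    def __init__(self):
        self.count, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        return self.m2 / self.count if self.count else 1.0


class RewardNormalizer:
    """Hypothetical helper: scales rewards by the std of discounted returns."""

    def __init__(self, gamma=0.99, clip_rew=10.0, eps=1e-8):
        self.gamma, self.clip_rew, self.eps = gamma, clip_rew, eps
        self.returns = 0.0
        self.rms = ScalarRunningVar()

    def __call__(self, reward, done, training=True):
        if training:
            # Accumulate the discounted return and fold it into the statistics.
            self.returns = self.returns * self.gamma + reward
            self.rms.update(self.returns)
        scaled = reward / math.sqrt(self.rms.var + self.eps)
        clipped = max(-self.clip_rew, min(self.clip_rew, scaled))
        if done:
            self.returns = 0.0  # assumed: start a fresh accumulator each episode
        return clipped


norm = RewardNormalizer(gamma=0.5, clip_rew=10.0)
r1 = norm(1.0, done=False)  # variance is still zero, so the result is clipped
r2 = norm(1.0, done=False)
r3 = norm(1.0, done=True)
```

Note that the reward is divided by the return scale but not mean-centered, which preserves the sign of each reward.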

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)[source]

Reset the environment and (optionally) update normalization statistics.

Parameters:
  • seed – Random seed forwarded to the underlying environment.

  • options – Reset options forwarded to the underlying environment.

Returns:

A tuple (obs, info) where obs may be normalized.

step(action)[source]

Step the environment and apply observation/reward normalization.

Parameters:

action – Action forwarded to the underlying environment.

Returns:

A 5-tuple (obs, rew, terminated, truncated, info) where obs and/or rew may be normalized.

class masa.common.wrappers.OneHotObsWrapper(env: Env)[source]

Bases: ConstraintPersistentObsWrapper

One-hot encode gymnasium.spaces.Discrete observations.

Supported input observation spaces:

  • gymnasium.spaces.Discrete: returns a 1D one-hot vector of length n.

  • gymnasium.spaces.Dict: one-hot encodes any Discrete subspaces and passes through non-Discrete subspaces.

  • Otherwise: passes observations through unchanged.

The wrapper updates gymnasium.Env.observation_space accordingly.
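The two encoding modes can be illustrated without gymnasium. The functions one_hot and encode_dict, and the discrete_sizes mapping that stands in for the Discrete subspaces of a Dict space, are hypothetical names for this sketch. It also assumes Discrete values start at 0.

```python
def one_hot(index, n):
    """Return a length-n list with 1.0 at position `index` and 0.0 elsewhere."""
    if not 0 <= index < n:
        raise ValueError(f"index {index} is outside [0, {n})")
    vec = [0.0] * n
    vec[index] = 1.0
    return vec


def encode_dict(obs, discrete_sizes):
    """One-hot encode the keys listed in `discrete_sizes`; pass other entries through."""
    return {
        key: one_hot(value, discrete_sizes[key]) if key in discrete_sizes else value
        for key, value in obs.items()
    }


# "discrete" mode: a single Discrete(4) observation.
encoded = one_hot(2, 4)

# "dict" mode: only the Discrete entry ("room", assumed n = 3) is encoded.
encoded_dict = encode_dict({"room": 1, "pos": [0.3, 0.7]}, discrete_sizes={"room": 3})
```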

Parameters:

env – Base environment to wrap.

Variables:
  • _orig_obs_space – The original observation space of the wrapped env.

  • _mode – One of {"discrete", "dict", "pass"} describing the encoding mode.

class masa.common.wrappers.FlattenDictObsWrapper(env: Env)[source]

Bases: ConstraintPersistentObsWrapper

Flatten a gymnasium.spaces.Dict observation into a 1D Box.

The wrapper creates a deterministic key ordering (alphabetical) and concatenates each sub-observation in that order.

Supported Dict subspaces:

  • gymnasium.spaces.Box: flattened via reshape(-1).

  • gymnasium.spaces.Discrete: represented as a length-n one-hot segment when computing the bounds of the flattened space. Note that the current implementation of _get_obs() expects Box values.
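The alphabetical ordering and the key-to-slice mapping can be sketched as follows. The functions build_key_slices and flatten_obs and the sizes dictionary are assumptions for illustration; each sub-observation is taken to be an already flat sequence, whereas multi-dimensional Box values would first need reshape(-1).

```python
def build_key_slices(sizes):
    """Map each key to its slice of the flat vector, in alphabetical key order."""
    slices, offset = {}, 0
    for key in sorted(sizes):
        slices[key] = slice(offset, offset + sizes[key])
        offset += sizes[key]
    return slices, offset


def flatten_obs(obs, key_slices, total):
    """Concatenate sub-observations into one flat list using the precomputed slices."""
    flat = [0.0] * total
    for key, sl in key_slices.items():
        values = [float(v) for v in obs[key]]
        if len(values) != sl.stop - sl.start:
            raise ValueError(f"unexpected length for key {key!r}")
        flat[sl] = values
    return flat


# Hypothetical Dict observation with two flat sub-observations.
sizes = {"vel": 2, "pos": 3}
key_slices, total = build_key_slices(sizes)
flat = flatten_obs({"vel": [4, 5], "pos": [1, 2, 3]}, key_slices, total)
```

Because keys are sorted, "pos" occupies the first three entries regardless of the insertion order of the dictionary, which keeps the layout stable across runs.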

Parameters:

env – Base environment with Dict observation space.

Variables:
  • _orig_obs_space – Original Dict observation space.

  • _key_slices – Mapping from key to slice in the flattened vector.

Raises:

TypeError – If the underlying observation space is not a Dict, or contains unsupported subspaces.
