Misc Wrappers

API Reference

class masa.common.wrappers.RewardShapingWrapper(env: Env, gamma: float = 0.99, impl: str = 'none')

Bases: ConstraintPersistentWrapper

Potential-based reward shaping wrapper for DFA-based safety constraints.

If the wrapped environment’s constraint exposes a DFACostFn, this wrapper constructs a shaped cost function ShapedCostFn and updates the step cost entry inside info["constraint"]["step"] using:

\[c'_t \;=\; c_t \;+\; \gamma \Phi(q_{t+1}) \;-\; \Phi(q_t).\]
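The shaping rule above can be illustrated with a minimal sketch. The helper name shaped_cost and the potential values in phi are hypothetical and chosen only for illustration; they are not part of the wrapper's API.

```python
def shaped_cost(cost, phi_curr, phi_next, gamma=0.99):
    """Return c' = c + gamma * Phi(q_next) - Phi(q_curr)."""
    return cost + gamma * phi_next - phi_curr


# Hypothetical potentials for a 3-state DFA (illustrative values only).
phi = {0: 0.0, 1: 0.5, 2: 1.0}

# Transition from DFA state 0 to DFA state 1 with a raw step cost of 0.0.
c_shaped = shaped_cost(0.0, phi[0], phi[1], gamma=0.99)

# With a zero potential everywhere (impl="none"), the cost is unchanged.
c_unshaped = shaped_cost(1.0, 0.0, 0.0, gamma=0.99)
```

Because the shaping term depends only on the potentials at the two endpoint states, the raw cost signal is preserved whenever the potential is identically zero.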

The potential \(\Phi\) depends on impl:

"none": \(\Phi(q)=0\) (no shaping)
"vi": approximate value iteration over the DFA graph to derive potentials
"cycle": graph-distance-based shaping using a reverse-reachability BFS

Notes

This wrapper assumes that the wrapped environment already provides info["automaton_state"] and a constraint-monitor-style entry info["constraint"]["step"]["cost"]. If either key is absent, the wrapper falls back to a default value (DFA state 0 and cost 0.0, respectively).
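The "cycle" option is described only as a reverse-reachability BFS, so the following is a generic sketch of that idea rather than the wrapper's own code. It assumes the DFA is given as a plain successor dictionary; the function name reverse_bfs_distances and the example automaton are hypothetical. How the wrapper converts such distances into potentials is not stated here and is left out.

```python
from collections import deque


def reverse_bfs_distances(transitions, targets):
    """Fewest DFA edges needed to reach any target state from each state.

    transitions: dict mapping state -> iterable of successor states.
    targets: iterable of target states (for example, accepting states).
    States that cannot reach a target are absent from the result.
    """
    # Build the reversed graph: successor -> set of predecessors.
    reverse = {}
    for src, succs in transitions.items():
        for dst in succs:
            reverse.setdefault(dst, set()).add(src)

    # Breadth-first search backwards from the targets.
    dist = {q: 0 for q in targets}
    queue = deque(dist)
    while queue:
        q = queue.popleft()
        for pred in reverse.get(q, ()):
            if pred not in dist:
                dist[pred] = dist[q] + 1
                queue.append(pred)
    return dist


# Hypothetical DFA: 0 -> 1 -> 2 (target); state 3 is a sink that never reaches 2.
transitions = {0: [0, 1], 1: [0, 2], 2: [2], 3: [3]}
distances = reverse_bfs_distances(transitions, targets=[2])
```

In this example, state 3 does not appear in distances because no path leads from it to the target.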

Parameters:

env – Base environment to wrap.
gamma – Discount used in the shaping term \(\gamma \Phi(q_{t+1})\).
impl – Shaping implementation. One of {"none", "vi", "cycle"}.

Variables:

shaped_cost_fn – The cost function exposed by cost_fn after shaping.
potential_fn – Callable \(\Phi(q)\) mapping DFA states to potentials.
_last_potential – Potential at the previous step’s DFA state.
_gamma – Shaping discount factor.
_impl – Shaping implementation identifier.

Wraps an environment to allow a modular transformation of the step() and reset() methods.

Parameters: env – The environment to wrap.

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)

Reset the environment and initialize shaping state.

Parameters:

seed – Random seed forwarded to the underlying environment.
options – Reset options forwarded to the underlying environment.

Returns:

A tuple (obs, info) from the underlying environment.

Notes

This wrapper reads info["automaton_state"] to initialize the previous potential _last_potential. If the key is missing, it assumes DFA state 0.

step(action: Any)

Step the environment and apply potential-based shaping to the step cost.

Parameters: action – Action forwarded to the underlying environment.
Returns: A 5-tuple (observation, reward, terminated, truncated, info).

Side effects: Updates info["constraint"]["step"]["cost"] in-place with the shaped cost and updates _last_potential.

Notes

If the underlying info does not contain constraint metrics, this method assumes an unshaped step cost of 0.0 and will still attempt to write back into info["constraint"]["step"].
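The fallback and write-back behaviour described above might look roughly like the sketch below. The function apply_shaping is hypothetical and is not the wrapper's method; in particular, creating missing nested dictionaries with setdefault is an assumption about how the write-back could be made safe.

```python
def apply_shaping(info, potential_fn, last_potential, gamma=0.99):
    """Write the shaped step cost into info and return the new potential."""
    q_next = info.get("automaton_state", 0)  # fallback: DFA state 0
    # Assumption: create the nested dicts if the environment did not provide them.
    step = info.setdefault("constraint", {}).setdefault("step", {})
    cost = step.get("cost", 0.0)  # fallback: unshaped cost 0.0
    phi_next = potential_fn(q_next)
    step["cost"] = cost + gamma * phi_next - last_potential
    return phi_next


# An empty info dict exercises both fallbacks.
info = {}
new_phi = apply_shaping(info, potential_fn=lambda q: float(q), last_potential=0.0)
```

The returned potential would play the role of _last_potential on the following step.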

property cost_fn

Expose the shaped cost function.

Returns: The shaped cost function constructed in _setup_cost_fn().

class masa.common.wrappers.NormWrapper(env: Env, norm_obs: bool = True, norm_rew: bool = True, training: bool = True, clip_obs: float = 10.0, clip_rew: float = 10.0, gamma: float = 0.99, eps: float = 1e-8)

Bases: ConstraintPersistentWrapper

Normalize observations and/or rewards for a single (non-vectorized) environment.

This wrapper maintains running mean/variance estimates and applies:

Observation normalization (elementwise): \((x - \mu) / \sqrt{\sigma^2 + \varepsilon}\)
Reward normalization using a running variance estimate over discounted returns.

This wrapper is intended for non-vectorized environments. For vectorized environments, use VecNormWrapper.
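As a minimal sketch of the elementwise observation formula, assuming the running mean and variance are already available as arrays: the function normalize_obs is hypothetical, and the clipping step follows the clip_obs parameter documented for this class.

```python
import numpy as np


def normalize_obs(x, mean, var, clip_obs=10.0, eps=1e-8):
    """Apply (x - mean) / sqrt(var + eps) elementwise, then clip."""
    x = np.asarray(x, dtype=np.float64)
    z = (x - np.asarray(mean)) / np.sqrt(np.asarray(var) + eps)
    return np.clip(z, -clip_obs, clip_obs)


# Illustrative statistics: zero mean and unit variance per dimension.
obs = normalize_obs([1.0, 100.0], mean=[0.0, 0.0], var=[1.0, 1.0])
```

The second component exceeds the clip range after scaling, so it is clipped to clip_obs.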

Parameters:

env – Base (non-vectorized) environment.
norm_obs – Whether to normalize observations.
norm_rew – Whether to normalize rewards.
training – If True, update running statistics; otherwise, statistics are frozen.
clip_obs – Clip normalized observations to [-clip_obs, clip_obs].
clip_rew – Clip normalized rewards to [-clip_rew, clip_rew].
gamma – Discount factor for the running return used in reward normalization.
eps – Small constant \(\varepsilon\) for numerical stability.

Variables:

norm_obs – See Parameters.
norm_rew – See Parameters.
training – See Parameters.
clip_obs – See Parameters.
clip_rew – See Parameters.
gamma – See Parameters.
eps – See Parameters.
obs_rms – masa.common.running_mean_std.RunningMeanStd for observations.
rew_rms – masa.common.running_mean_std.RunningMeanStd for returns.
returns – Discounted return accumulator used for reward normalization.

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)

Reset the environment and (optionally) update normalization statistics.

Parameters:

seed – Random seed forwarded to the underlying environment.
options – Reset options forwarded to the underlying environment.

Returns:

A tuple (obs, info) where obs may be normalized.

step(action)

Step the environment and apply observation/reward normalization.

Parameters: action – Action forwarded to the underlying environment.
Returns: A 5-tuple (obs, rew, terminated, truncated, info) where obs and/or rew may be normalized.
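Reward normalization over discounted returns could be sketched as follows. This follows a common convention (scale the reward by the standard deviation of the running return, without subtracting a mean) and is an assumption about the details, not a description of NormWrapper's code. The class ScalarRunningVar and the function normalize_reward are hypothetical stand-ins, and resetting the return accumulator at episode boundaries is omitted.

```python
import math


class ScalarRunningVar:
    """Hypothetical scalar running variance tracker (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Population variance; defaults to 1.0 until two samples are seen.
        return self.m2 / self.count if self.count > 1 else 1.0


def normalize_reward(rew, ret, stats, gamma=0.99, clip_rew=10.0, eps=1e-8, training=True):
    """Return (normalized reward, updated discounted return accumulator)."""
    ret = gamma * ret + rew
    if training:
        stats.update(ret)  # statistics stay frozen when training is False
    scaled = rew / math.sqrt(stats.var + eps)
    return max(-clip_rew, min(clip_rew, scaled)), ret


stats = ScalarRunningVar()
ret = 0.0
normalized = []
for rew in [1.0, 1.0, 1.0]:
    r_norm, ret = normalize_reward(rew, ret, stats)
    normalized.append(r_norm)
```

Tracking the variance of the return rather than of the raw reward keeps the scale of the normalized reward tied to the magnitude of the quantity a value function would estimate.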

class masa.common.wrappers.OneHotObsWrapper(env: Env)

Bases: ConstraintPersistentObsWrapper

One-hot encode gymnasium.spaces.Discrete observations.

Supported input observation spaces:

gymnasium.spaces.Discrete: returns a 1D one-hot vector of length n.
gymnasium.spaces.Dict: one-hot encodes any Discrete subspaces and passes through non-Discrete subspaces.
Otherwise: passes observations through unchanged.

The wrapper updates gymnasium.Env.observation_space accordingly.
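The encoding modes listed above can be sketched with plain NumPy. The helpers one_hot and encode_dict_obs are hypothetical, and the float32 output dtype is an assumption; the wrapper's actual dtype is not stated here.

```python
import numpy as np


def one_hot(index, n, dtype=np.float32):
    """Return a length-n vector with a single 1 at position index."""
    if not 0 <= index < n:
        raise ValueError(f"index {index} is outside [0, {n})")
    vec = np.zeros(n, dtype=dtype)
    vec[index] = 1
    return vec


def encode_dict_obs(obs, discrete_sizes):
    """One-hot encode the keys listed in discrete_sizes; pass other keys through."""
    return {
        key: one_hot(int(value), discrete_sizes[key]) if key in discrete_sizes else value
        for key, value in obs.items()
    }


# "Discrete" mode: a single integer observation from a space with n = 4.
vec = one_hot(2, 4)

# "Dict" mode: only the hypothetical Discrete(3) entry "room" is encoded.
encoded = encode_dict_obs({"room": 1, "pos": np.array([0.5, 0.25])}, {"room": 3})
```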

Parameters:

env – Base environment to wrap.

Variables:

_orig_obs_space – The original observation space of the wrapped env.
_mode – One of {"discrete", "dict", "pass"} describing the encoding mode.

class masa.common.wrappers.FlattenDictObsWrapper(env: Env)

Bases: ConstraintPersistentObsWrapper

Flatten a gymnasium.spaces.Dict observation into a 1D Box.

The wrapper creates a deterministic key ordering (alphabetical) and concatenates each sub-observation in that order.

Supported Dict subspaces:

gymnasium.spaces.Box: flattened via reshape(-1).
gymnasium.spaces.Discrete: represented as a length-n one-hot segment when computing the bounds of the flattened Box. Note that the current implementation of _get_obs() expects Box values.
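The alphabetical ordering and per-key slices can be illustrated with a small sketch for Box-like sub-observations. The helpers build_key_slices and flatten_obs are hypothetical, the float32 dtype is an assumption, and Discrete entries are left out because the note above says _get_obs() expects Box values.

```python
import numpy as np


def build_key_slices(shapes):
    """Map each key to its slice in the flat vector, using alphabetical key order."""
    slices, offset = {}, 0
    for key in sorted(shapes):
        size = int(np.prod(shapes[key]))
        slices[key] = slice(offset, offset + size)
        offset += size
    return slices, offset


def flatten_obs(obs, key_slices, total):
    """Concatenate each sub-observation, flattened with reshape(-1)."""
    flat = np.zeros(total, dtype=np.float32)
    for key, sl in key_slices.items():
        flat[sl] = np.asarray(obs[key], dtype=np.float32).reshape(-1)
    return flat


# Hypothetical Dict observation with two Box-like entries.
shapes = {"pos": (2,), "image": (2, 2)}
key_slices, total = build_key_slices(shapes)
obs = {"pos": np.array([9.0, 8.0]), "image": np.arange(4).reshape(2, 2)}
flat = flatten_obs(obs, key_slices, total)
```

Sorting the keys makes the layout independent of dictionary insertion order, so "image" occupies the first four entries here even though "pos" was declared first.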

Parameters:

env – Base environment with Dict observation space.

Variables:

_orig_obs_space – Original Dict observation space.
_key_slices – Mapping from key to slice in the flattened vector.

Raises:

TypeError – If the underlying observation space is not a Dict, or contains unsupported subspaces.
