Misc Wrappers
API Reference
- class masa.common.wrappers.RewardShapingWrapper(env: Env, gamma: float = 0.99, impl: str = 'none')
Bases: ConstraintPersistentWrapper
Potential-based reward shaping wrapper for DFA-based safety constraints.
If the wrapped environment’s constraint exposes a DFACostFn, this wrapper constructs a shaped cost function ShapedCostFn and updates the step cost entry inside info["constraint"]["step"] using:
\[c'_t \;=\; c_t \;+\; \gamma \Phi(q_{t+1}) \;-\; \Phi(q_t).\]
The potential \(\Phi\) depends on impl:
"none": \(\Phi(q)=0\) (no shaping)
"vi": approximate value iteration over the DFA graph to derive potentials
"cycle": graph-distance-based shaping using a reverse-reachability BFS
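The shaping rule above can be illustrated with a minimal sketch. This is not the wrapper's implementation: the helper names (shaped_cost, discounted_sum, phi) and the potential table PHI are hypothetical and chosen only for illustration. The sketch also checks the standard telescoping property of potential-based shaping, namely that the discounted sum of shaped costs differs from the unshaped sum only by \(\gamma^T \Phi(q_T) - \Phi(q_0)\).

```python
from typing import Callable, Dict, List, Sequence


def shaped_cost(cost: float, q: int, q_next: int,
                potential: Callable[[int], float], gamma: float = 0.99) -> float:
    """Return c + gamma * Phi(q_next) - Phi(q)."""
    return cost + gamma * potential(q_next) - potential(q)


def discounted_sum(values: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * values[t] over the trajectory."""
    return sum((gamma ** t) * v for t, v in enumerate(values))


# Hypothetical potentials over three DFA states (illustration only).
PHI: Dict[int, float] = {0: 0.0, 1: 0.5, 2: 1.0}


def phi(q: int) -> float:
    return PHI[q]


gamma = 0.9
states: List[int] = [0, 1, 1, 2]      # q_0 .. q_3
costs: List[float] = [0.0, 0.0, 1.0]  # c_0 .. c_2

# Shape each step cost using the DFA state before and after the step.
shaped = [shaped_cost(c, q, q_next, phi, gamma)
          for c, q, q_next in zip(costs, states, states[1:])]

# The difference between shaped and unshaped discounted sums telescopes.
difference = discounted_sum(shaped, gamma) - discounted_sum(costs, gamma)
boundary_term = gamma ** len(costs) * phi(states[-1]) - phi(states[0])
```

With a zero potential, as in impl="none", shaped_cost returns the original cost unchanged.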
Notes
This wrapper assumes the wrapped environment already produces info["automaton_state"] and a constraint-monitor-like structure info["constraint"]["step"]["cost"]. If these keys are absent, the wrapper falls back to default values (state 0 and cost 0.0).
- Parameters:
env – Base environment to wrap.
gamma – Discount used in the shaping term \(\gamma \Phi(q_{t+1})\).
impl – Shaping implementation. One of {"none", "vi", "cycle"}.
- Variables:
shaped_cost_fn – The cost function exposed by cost_fn after shaping.
potential_fn – Callable \(\Phi(q)\) mapping DFA states to potentials.
_last_potential – Potential at the previous step’s DFA state.
_gamma – Shaping discount factor.
_impl – Shaping implementation identifier.
Wraps an environment to allow a modular transformation of the step() and reset() methods.
- Parameters:
env – The environment to wrap
- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)
Reset the environment and initialize shaping state.
- Parameters:
seed – Random seed forwarded to the underlying environment.
options – Reset options forwarded to the underlying environment.
- Returns:
A tuple (obs, info) from the underlying environment.
Notes
This wrapper reads info["automaton_state"] to initialize the previous potential _last_potential. If the key is missing, it assumes DFA state 0.
- step(action: Any)
Step the environment and apply potential-based shaping to the step cost.
- Parameters:
action – Action forwarded to the underlying environment.
- Returns:
A 5-tuple (observation, reward, terminated, truncated, info).
- Side effects:
Updates info["constraint"]["step"]["cost"] in place with the shaped cost and updates _last_potential.
Notes
If the underlying info does not contain constraint metrics, this method assumes an unshaped step cost of 0.0 and will still attempt to write back into info["constraint"]["step"].
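The fallback behaviour described in these notes can be sketched as follows. The function apply_shaping is hypothetical and is not part of masa. In particular, creating the missing nested dictionaries with setdefault is an assumption of this sketch; the notes above only state that the wrapper attempts the write-back, not how it handles a missing info["constraint"] entry.

```python
from typing import Any, Callable, Dict


def apply_shaping(info: Dict[str, Any], last_potential: float,
                  potential: Callable[[int], float], gamma: float) -> float:
    """Shape info["constraint"]["step"]["cost"] in place.

    Returns the potential of the new DFA state, which the caller would
    store as the "previous potential" for the next step.
    """
    # Documented defaults: DFA state 0 and step cost 0.0 when keys are absent.
    q_next = info.get("automaton_state", 0)
    # Assumption of this sketch: create missing containers before writing back.
    step = info.setdefault("constraint", {}).setdefault("step", {})
    cost = step.get("cost", 0.0)

    phi_next = potential(q_next)
    step["cost"] = cost + gamma * phi_next - last_potential
    return phi_next


# Usage with a hypothetical potential Phi(q) = q.
info = {"automaton_state": 2, "constraint": {"step": {"cost": 1.0}}}
new_potential = apply_shaping(info, last_potential=1.0,
                              potential=lambda q: float(q), gamma=0.5)

# An info dict with no constraint metrics still ends up with a cost entry.
empty_info: Dict[str, Any] = {}
apply_shaping(empty_info, last_potential=1.0,
              potential=lambda q: float(q), gamma=0.5)
```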
- property cost_fn
Expose the shaped cost function.
- Returns:
The shaped cost function constructed in _setup_cost_fn().
- class masa.common.wrappers.NormWrapper(env: Env, norm_obs: bool = True, norm_rew: bool = True, training: bool = True, clip_obs: float = 10.0, clip_rew: float = 10.0, gamma: float = 0.99, eps: float = 1e-8)
Bases: ConstraintPersistentWrapper
Normalize observations and/or rewards for a single (non-vectorized) environment.
This wrapper maintains running mean/variance estimates and applies:
Observation normalization (elementwise): \((x - \mu) / \sqrt{\sigma^2 + \varepsilon}\)
Reward normalization using a running variance estimate over discounted returns.
This wrapper is intended for non-vectorized environments. For vectorized environments, use VecNormWrapper.
- Parameters:
env – Base (non-vectorized) environment.
norm_obs – Whether to normalize observations.
norm_rew – Whether to normalize rewards.
training – If True, update running statistics; otherwise, statistics are frozen.
clip_obs – Clip normalized observations to [-clip_obs, clip_obs].
clip_rew – Clip normalized rewards to [-clip_rew, clip_rew].
gamma – Discount factor for the running return used in reward normalization.
eps – Small constant \(\varepsilon\) for numerical stability.
- Variables:
norm_obs – See Parameters.
norm_rew – See Parameters.
training – See Parameters.
clip_obs – See Parameters.
clip_rew – See Parameters.
gamma – See Parameters.
eps – See Parameters.
obs_rms – masa.common.running_mean_std.RunningMeanStd for observations.
rew_rms – masa.common.running_mean_std.RunningMeanStd for returns.
returns – Discounted return accumulator used for reward normalization.
Wraps an environment to allow a modular transformation of the step() and reset() methods.
- Parameters:
env – The environment to wrap
- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)
Reset the environment and (optionally) update normalization statistics.
- Parameters:
seed – Random seed forwarded to the underlying environment.
options – Reset options forwarded to the underlying environment.
- Returns:
A tuple (obs, info) where obs may be normalized.
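The normalization scheme that NormWrapper describes can be sketched in scalar form. RunningStat below is a stand-in written for this example (Welford's algorithm); it is not masa.common.running_mean_std.RunningMeanStd, whose interface is not shown here. The reward path assumes the common convention of dividing the raw reward by the running standard deviation of the discounted return without subtracting a mean, which may differ from what the wrapper actually does.

```python
import math
from typing import Tuple


class RunningStat:
    """Scalar running mean and population variance (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def var(self) -> float:
        # Before any data arrives, fall back to unit variance.
        return self._m2 / self.count if self.count else 1.0


def normalize_obs(x: float, stat: RunningStat,
                  clip: float = 10.0, eps: float = 1e-8) -> float:
    """Apply (x - mean) / sqrt(var + eps), then clip to [-clip, clip]."""
    z = (x - stat.mean) / math.sqrt(stat.var + eps)
    return max(-clip, min(clip, z))


def normalize_reward(reward: float, ret: float, stat: RunningStat,
                     gamma: float = 0.99, clip: float = 10.0,
                     eps: float = 1e-8, training: bool = True) -> Tuple[float, float]:
    """Update the discounted return, then scale the reward by its running std."""
    ret = gamma * ret + reward
    if training:  # statistics stay frozen when training is False
        stat.update(ret)
    scaled = reward / math.sqrt(stat.var + eps)
    return max(-clip, min(clip, scaled)), ret


obs_stat = RunningStat()
for value in [1.0, 2.0, 3.0, 4.0]:
    obs_stat.update(value)
```

A real observation is usually a vector, in which case the same update and normalization are applied elementwise.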
- class masa.common.wrappers.OneHotObsWrapper(env: Env)
Bases: ConstraintPersistentObsWrapper
One-hot encode gymnasium.spaces.Discrete observations.
Supported input observation spaces:
gymnasium.spaces.Discrete: returns a 1D one-hot vector of length n.
gymnasium.spaces.Dict: one-hot encodes any Discrete subspaces and passes through non-Discrete subspaces.
Otherwise: passes observations through unchanged.
The wrapper updates gymnasium.Env.observation_space accordingly.
- Parameters:
env – Base environment to wrap.
- Variables:
_orig_obs_space – The original observation space of the wrapped env.
_mode – One of {"discrete", "dict", "pass"} describing the encoding mode.
Wraps an environment to allow a modular transformation of the step() and reset() methods.
- Parameters:
env – The environment to wrap
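The encoding that OneHotObsWrapper describes can be sketched without gymnasium. The helpers one_hot and encode_dict are hypothetical, and the discrete_sizes mapping (key to number of discrete values) stands in for the information the wrapper would read from the observation space.

```python
from typing import Any, Dict, List


def one_hot(index: int, n: int) -> List[float]:
    """Return a length-n vector with a 1.0 at position index."""
    if not 0 <= index < n:
        raise ValueError(f"index {index} is outside [0, {n})")
    vec = [0.0] * n
    vec[index] = 1.0
    return vec


def encode_dict(obs: Dict[str, Any], discrete_sizes: Dict[str, int]) -> Dict[str, Any]:
    """One-hot encode the Discrete entries and pass every other entry through."""
    return {
        key: one_hot(value, discrete_sizes[key]) if key in discrete_sizes else value
        for key, value in obs.items()
    }


# "room" plays the role of a Discrete(4) subspace; "pos" is passed through.
encoded = encode_dict({"room": 2, "pos": [0.5, -1.0]}, {"room": 4})
```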
- class masa.common.wrappers.FlattenDictObsWrapper(env: Env)
Bases: ConstraintPersistentObsWrapper
Flatten a gymnasium.spaces.Dict observation into a 1D Box.
The wrapper creates a deterministic key ordering (alphabetical) and concatenates each sub-observation in that order.
Supported Dict subspaces:
gymnasium.spaces.Box: flattened via reshape(-1).
gymnasium.spaces.Discrete: represented as a length-n one-hot segment for the purposes of bounds (note: the current implementation of _get_obs() expects Box values; see Notes).
- Parameters:
env – Base environment with Dict observation space.
- Variables:
_orig_obs_space – Original Dict observation space.
_key_slices – Mapping from key to slice in the flattened vector.
- Raises:
TypeError – If the underlying observation space is not a Dict, or contains unsupported subspaces.
Wraps an environment to allow a modular transformation of the step() and reset() methods.
- Parameters:
env – The environment to wrap
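The alphabetical key ordering and the key-to-slice mapping described for FlattenDictObsWrapper can be sketched in plain Python. The helpers build_slices and flatten are hypothetical and only mirror the idea behind _key_slices. The sketch handles already-flat, Box-like values only and does not cover Discrete entries.

```python
from typing import Dict, List, Sequence


def build_slices(sizes: Dict[str, int]) -> Dict[str, slice]:
    """Assign each key a contiguous slice, visiting keys in alphabetical order."""
    slices: Dict[str, slice] = {}
    start = 0
    for key in sorted(sizes):
        slices[key] = slice(start, start + sizes[key])
        start += sizes[key]
    return slices


def flatten(obs: Dict[str, Sequence[float]], slices: Dict[str, slice]) -> List[float]:
    """Write each sub-observation into its slice of a single flat vector."""
    total = max((s.stop for s in slices.values()), default=0)
    flat = [0.0] * total
    for key, sl in slices.items():
        values = [float(v) for v in obs[key]]
        if len(values) != sl.stop - sl.start:
            raise ValueError(f"unexpected length for key {key!r}")
        flat[sl] = values
    return flat


# "goal" sorts before "pos", so it occupies the first slot of the flat vector.
key_slices = build_slices({"pos": 2, "goal": 1})
flat_obs = flatten({"pos": [1.0, 2.0], "goal": [9.0]}, key_slices)
```

Sorting the keys makes the layout independent of the insertion order of the observation dictionary, so the same key always maps to the same positions.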