Vectorized Envs

API Reference

class masa.common.wrappers.VecEnvWrapperBase(env: Env)[source]

Bases: ConstraintPersistentWrapper

Base class for simple Python-list vector environment wrappers.

Vector environments in this module expose:

  • n_envs: number of parallel environments

  • reset(): returns (obs_list, info_list)

  • step(): returns (obs_list, rew_list, term_list, trunc_list, info_list)

  • reset_done(): reset only environments indicated by a dones mask

Parameters:

env – For DummyVecWrapper, this is the single underlying env. For VecWrapper, this is set to envs[0] to preserve a Gymnasium-like API surface.

Variables:

n_envs (int) – Number of environments.

reset_done(dones: List[bool] | np.ndarray, *, seed: int | None = None, options: Dict[str, Any] | None = None)[source]

Reset only the environments indicated by dones.

Parameters:
  • dones – Boolean mask/list of length n_envs. Entries set to True are reset.

  • seed – Optional base seed. Implementations may offset by environment index.

  • options – Reset options forwarded to underlying environments.

Returns:

A tuple (reset_obs, reset_infos) where:

  • reset_obs is a list of length n_envs containing reset observations at indices that were reset, and None elsewhere.

  • reset_infos is a list of length n_envs containing reset info dicts at indices that were reset, and empty dicts elsewhere.

Raises:

NotImplementedError – If not implemented by a subclass.
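For illustration, a minimal rollout-loop sketch of this interface, using the VecWrapper subclass documented below and Gymnasium's CartPole-v1 with random actions (all illustrative choices, not prescribed by the API):

    import gymnasium as gym
    from masa.common.wrappers import VecWrapper

    # Build a small vector environment from plain Gymnasium environments.
    venv = VecWrapper([gym.make("CartPole-v1") for _ in range(4)])
    obs_list, info_list = venv.reset(seed=0)

    for _ in range(100):
        # One action per environment; a random policy is used here only for illustration.
        actions = [env.action_space.sample() for env in venv.envs]
        obs_list, rew_list, term_list, trunc_list, info_list = venv.step(actions)

        # Reset only the environments whose episodes ended.
        dones = [t or tr for t, tr in zip(term_list, trunc_list)]
        if any(dones):
            reset_obs, reset_infos = venv.reset_done(dones)
            # Non-reset indices contain None; splice the reset observations back in.
            obs_list = [r if d else o for o, r, d in zip(obs_list, reset_obs, dones)]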

class masa.common.wrappers.DummyVecWrapper(env: Env)[source]

Bases: VecEnvWrapperBase

Wrap a single environment with a vector-environment API (n_envs=1).

This wrapper is useful for code paths that expect list-based vector outputs, while still running a single environment instance.

Parameters:

env – Base environment.

Variables:
  • n_envs – Always 1.

  • envs – List containing the single wrapped environment.

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)[source]

Reset and return vectorized lists of length 1.

Parameters:
  • seed – Random seed forwarded to the underlying environment.

  • options – Reset options forwarded to the underlying environment.

Returns:

([obs], [info]).

reset_done(dones: List[bool] | np.ndarray, *, seed: int | None = None, options: Dict[str, Any] | None = None)[source]

Conditionally reset the single environment.

Parameters:
  • dones – A length-1 mask. If dones[0] is True, reset.

  • seed – Random seed forwarded to the underlying environment.

  • options – Reset options forwarded to the underlying environment.

Returns:

A pair (reset_obs, reset_infos) as described by VecEnvWrapperBase.reset_done().

step(action)[source]

Step and return vectorized lists of length 1.

Parameters:

action – Action for the single environment.

Returns:

([obs], [rew], [terminated], [truncated], [info]).
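A minimal usage sketch, assuming a Gymnasium environment such as CartPole-v1 (the environment choice is illustrative):

    import gymnasium as gym
    from masa.common.wrappers import DummyVecWrapper

    venv = DummyVecWrapper(gym.make("CartPole-v1"))
    assert venv.n_envs == 1

    obs_list, info_list = venv.reset(seed=0)              # ([obs], [info])
    action = venv.envs[0].action_space.sample()           # a single action, not a list
    obs_list, rew_list, term_list, trunc_list, info_list = venv.step(action)

    # Conditionally reset the single environment after an episode ends.
    reset_obs, reset_infos = venv.reset_done([term_list[0] or trunc_list[0]])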

class masa.common.wrappers.VecWrapper(envs: List[gym.Env])[source]

Bases: VecEnvWrapperBase

Wrap a list of environments with a simple vector-environment API.

Each underlying environment is reset/stepped sequentially in Python, and results are returned as Python lists.

Parameters:

envs – Non-empty list of environments.

Variables:
  • envs – The list of wrapped environments.

  • n_envs – Number of wrapped environments.

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)[source]

Reset all environments and return lists.

Parameters:
  • seed – Optional base seed. If provided, environment i receives seed + i.

  • options – Reset options forwarded to each environment.

Returns:

A pair (obs_list, info_list) of length n_envs.
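As a sketch of the seeding behavior, two resets with the same base seed should yield identical initial observations, since environment i receives seed + i each time (CartPole-v1 is used here only as an example):

    import gymnasium as gym
    import numpy as np
    from masa.common.wrappers import VecWrapper

    venv = VecWrapper([gym.make("CartPole-v1") for _ in range(3)])

    obs_a, _ = venv.reset(seed=123)   # environment i is seeded with 123 + i
    obs_b, _ = venv.reset(seed=123)
    assert all(np.allclose(a, b) for a, b in zip(obs_a, obs_b))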

reset_done(dones: List[bool] | np.ndarray, *, seed: int | None = None, options: Dict[str, Any] | None = None)[source]

Reset only environments whose done flag is True.

Parameters:
  • dones – Boolean mask/list of length n_envs.

  • seed – Optional base seed. If provided, environment i receives seed + i.

  • options – Reset options forwarded to environments being reset.

Returns:

A tuple (reset_obs, reset_infos) where non-reset indices contain None and {} respectively.

step(action)[source]

Step all environments.

Parameters:

action – Iterable of actions of length n_envs, one per environment.

Returns:

A 5-tuple of lists (obs_list, rew_list, term_list, trunc_list, info_list).

Notes

The loop steps each environment with its corresponding action; if the number of provided actions does not match n_envs, an error is raised.

class masa.common.wrappers.VecNormWrapper(env: gym.Env | List[gym.Env], norm_obs: bool = True, norm_rew: bool = True, training: bool = True, clip_obs: float = 10.0, clip_rew: float = 10.0, gamma: float = 0.99, eps: float = 1e-8)[source]

Bases: VecEnvWrapperBase

Normalize observations and/or rewards for a vectorized environment.

This wrapper expects an environment implementing VecEnvWrapperBase (e.g., DummyVecWrapper or VecWrapper) and applies the same normalization logic as NormWrapper, but over batches.

Observation normalization uses running statistics of the stacked observation array (shape (n_envs, *obs_shape)). Reward normalization uses running statistics of discounted returns per environment.
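A rough numpy sketch of the per-step logic described above, not the library's implementation; the RunningMeanStd attribute names (mean, var, update) and the exact reward-scaling convention (dividing by the standard deviation of returns without centering) are assumptions based on common practice:

    import numpy as np

    def normalize_step(obs_batch, rew_batch, obs_rms, rew_rms, returns,
                       gamma=0.99, clip_obs=10.0, clip_rew=10.0, eps=1e-8,
                       training=True):
        # obs_batch: stacked observations, shape (n_envs, *obs_shape)
        # rew_batch: rewards, shape (n_envs,)
        obs_batch = np.asarray(obs_batch)
        rew_batch = np.asarray(rew_batch)

        # Per-environment discounted return drives reward normalization.
        returns = returns * gamma + rew_batch

        if training:
            obs_rms.update(obs_batch)   # assumed RunningMeanStd API
            rew_rms.update(returns)

        norm_obs = np.clip((obs_batch - obs_rms.mean) / np.sqrt(obs_rms.var + eps),
                           -clip_obs, clip_obs)
        norm_rew = np.clip(rew_batch / np.sqrt(rew_rms.var + eps),
                           -clip_rew, clip_rew)
        return norm_obs, norm_rew, returns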

Parameters:
  • env – A vectorized environment implementing VecEnvWrapperBase.

  • norm_obs – Whether to normalize observations.

  • norm_rew – Whether to normalize rewards.

  • training – If True, update running statistics; otherwise, statistics are frozen.

  • clip_obs – Clip normalized observations to [-clip_obs, clip_obs].

  • clip_rew – Clip normalized rewards to [-clip_rew, clip_rew].

  • gamma – Discount factor for the running return used in reward normalization.

  • eps – Small constant ε for numerical stability.

Variables:
  • n_envs – Copied from the wrapped vector environment.

  • obs_rms – masa.common.running_mean_std.RunningMeanStd for observations.

  • rew_rms – masa.common.running_mean_std.RunningMeanStd for returns.

  • returns – Vector of length n_envs storing per-env discounted returns.

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)[source]

Reset all environments and normalize observations.

Parameters:
  • seed – Optional base seed forwarded to the underlying vector env.

  • options – Reset options forwarded to the underlying vector env.

Returns:

A pair (obs_list, info_list). Observations may be normalized.

reset_done(dones: List[bool] | np.ndarray, *, seed: int | None = None, options: Dict[str, Any] | None = None)[source]

Reset only environments indicated by dones and normalize those observations.

Parameters:
  • dones – Boolean mask/list of length n_envs.

  • seed – Optional base seed forwarded to the underlying vector env.

  • options – Reset options forwarded to the underlying vector env.

Returns:

A tuple (reset_obs, reset_infos) as described by VecEnvWrapperBase.reset_done(), with reset observations optionally normalized.

step(actions)[source]

Step all environments and apply observation/reward normalization.

Parameters:

actions – Iterable of actions of length n_envs.

Returns:

A 5-tuple (obs_list, rew_list, term_list, trunc_list, infos), where observations and/or rewards may be normalized.
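A minimal usage sketch, wrapping a VecWrapper over Gymnasium's CartPole-v1 (illustrative choices); constructing the wrapper with training=False instead keeps the running statistics frozen, e.g. for evaluation:

    import gymnasium as gym
    from masa.common.wrappers import VecWrapper, VecNormWrapper

    base = VecWrapper([gym.make("CartPole-v1") for _ in range(4)])
    venv = VecNormWrapper(base, norm_obs=True, norm_rew=True, training=True)

    obs_list, info_list = venv.reset(seed=0)
    for _ in range(10):
        # Random actions for illustration; observations and rewards come back normalized.
        actions = [env.action_space.sample() for env in base.envs]
        obs_list, rew_list, term_list, trunc_list, infos = venv.step(actions)
        dones = [t or tr for t, tr in zip(term_list, trunc_list)]
        if any(dones):
            venv.reset_done(dones)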