Vectorized Envs
API Reference
- class masa.common.wrappers.VecEnvWrapperBase(env: Env)
Bases: ConstraintPersistentWrapper
Base class for simple Python-list vector environment wrappers.
Vector environments in this file expose:
- n_envs: number of parallel environments
- reset(): returns (obs_list, info_list)
- step(): returns (obs_list, rew_list, term_list, trunc_list, info_list)
- reset_done(): resets only the environments indicated by a dones mask
- Parameters:
env – For DummyVecWrapper, this is the single underlying env. For VecWrapper, this is set to envs[0] to preserve a Gymnasium-like API surface.
- Variables:
n_envs (int) – Number of environments.
Wraps an environment to allow a modular transformation of the step() and reset() methods.
- Parameters:
env – The environment to wrap
- reset_done(dones: List[bool] | np.ndarray, *, seed: int | None = None, options: Dict[str, Any] | None = None)
Reset only the environments indicated by dones.
- Parameters:
dones – Boolean mask/list of length n_envs. Entries set to True are reset.
seed – Optional base seed. Implementations may offset by environment index.
options – Reset options forwarded to underlying environments.
- Returns:
A tuple (reset_obs, reset_infos), where reset_obs is a list of length n_envs containing reset observations at indices that were reset and None elsewhere, and reset_infos is a list of length n_envs containing reset info dicts at indices that were reset and empty dicts elsewhere.
- Raises:
NotImplementedError – If not implemented by a subclass.
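The masked-reset contract described above can be sketched as follows. This is only an illustration: `CounterEnv` and `reset_done_sketch` are hypothetical names written for this example and are not part of `masa`, and the `seed + i` offset is one seeding scheme the base class permits, not one it mandates.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


class CounterEnv:
    """Hypothetical toy env: the observation is a step counter."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self, *, seed: Optional[int] = None,
              options: Optional[Dict[str, Any]] = None) -> Tuple[int, Dict[str, Any]]:
        self.t = 0
        return self.t, {"seed": seed}


def reset_done_sketch(
    envs: Sequence[CounterEnv],
    dones: Sequence[bool],
    *,
    seed: Optional[int] = None,
    options: Optional[Dict[str, Any]] = None,
) -> Tuple[List[Optional[int]], List[Dict[str, Any]]]:
    """Reset only the envs flagged in `dones`; pad other slots with None / {}."""
    if len(dones) != len(envs):
        raise ValueError("dones must have one entry per environment")
    reset_obs: List[Optional[int]] = [None] * len(envs)
    reset_infos: List[Dict[str, Any]] = [{} for _ in envs]
    for i, (env, done) in enumerate(zip(envs, dones)):
        if done:
            # Assumed seeding scheme: offset the base seed by the env index.
            env_seed = None if seed is None else seed + i
            reset_obs[i], reset_infos[i] = env.reset(seed=env_seed, options=options)
    return reset_obs, reset_infos


envs = [CounterEnv() for _ in range(3)]
obs, infos = reset_done_sketch(envs, [False, True, True], seed=100)
print(obs)    # [None, 0, 0]
print(infos)  # [{}, {'seed': 101}, {'seed': 102}]
```

Returning full-length lists keeps indices aligned with the environment order, so a caller can overwrite only the slots that are not None.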
- class masa.common.wrappers.DummyVecWrapper(env: Env)
Bases: VecEnvWrapperBase
Wrap a single environment with a vector-environment API (n_envs=1).
This wrapper is useful for code paths that expect list-based vector outputs, while still running a single environment instance.
- Parameters:
env – Base environment.
- Variables:
n_envs – Always 1.
envs – List containing the single wrapped environment.
- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)
Reset and return vectorized lists of length 1.
- Parameters:
seed – Random seed forwarded to the underlying environment.
options – Reset options forwarded to the underlying environment.
- Returns:
([obs], [info]).
- reset_done(dones: List[bool] | np.ndarray, *, seed: int | None = None, options: Dict[str, Any] | None = None)
Conditionally reset the single environment.
- Parameters:
dones – A length-1 mask. If dones[0] is True, reset.
seed – Random seed forwarded to the underlying environment.
options – Reset options forwarded to the underlying environment.
- Returns:
A pair (reset_obs, reset_infos) as described by VecEnvWrapperBase.reset_done().
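To make the length-1 list convention concrete, here is a minimal sketch of a single environment presented through a vector-style interface. `CoinEnv` and `SingleEnvAsVec` are hypothetical stand-ins rather than the `masa` classes, and forwarding `actions[0]` in `step()` is an assumption about how a single-env wrapper would consume a batched action.

```python
from typing import Any, Dict, Tuple


class CoinEnv:
    """Hypothetical single env with a Gymnasium-style reset/step signature."""

    def reset(self, *, seed=None, options=None) -> Tuple[float, Dict[str, Any]]:
        return 0.0, {}

    def step(self, action: int) -> Tuple[float, float, bool, bool, Dict[str, Any]]:
        # Returns (obs, reward, terminated, truncated, info).
        return float(action), 1.0, action == 1, False, {}


class SingleEnvAsVec:
    """Sketch of a length-1 vector view over one env (not the masa class)."""

    def __init__(self, env: CoinEnv) -> None:
        self.envs = [env]
        self.n_envs = 1

    def reset(self, *, seed=None, options=None):
        obs, info = self.envs[0].reset(seed=seed, options=options)
        return [obs], [info]

    def step(self, actions):
        # Assumption: the batched action has exactly one entry.
        obs, rew, term, trunc, info = self.envs[0].step(actions[0])
        return [obs], [rew], [term], [trunc], [info]


vec = SingleEnvAsVec(CoinEnv())
obs_list, info_list = vec.reset(seed=0)
step_out = vec.step([1])
print(obs_list, info_list)  # [0.0] [{}]
print(step_out)             # ([1.0], [1.0], [True], [False], [{}])
```

Code written against this shape can iterate over `range(n_envs)` without special-casing the single-environment setting.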
- class masa.common.wrappers.VecWrapper(envs: List[gym.Env])
Bases: VecEnvWrapperBase
Wrap a list of environments with a simple vector-environment API.
Each underlying environment is reset/stepped sequentially in Python, and results are returned as Python lists.
- Parameters:
envs – Non-empty list of environments.
- Variables:
envs – The list of wrapped environments.
n_envs – Number of wrapped environments.
- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)
Reset all environments and return lists.
- Parameters:
seed – Optional base seed. If provided, environment i receives seed + i.
options – Reset options forwarded to each environment.
- Returns:
A pair (obs_list, info_list) of lists, each of length n_envs.
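The `seed + i` rule can be illustrated with a short sketch. `per_env_seeds` and `first_draw` are hypothetical helpers written for this example; `first_draw` stands in for an environment whose first observation depends only on its seed.

```python
import random
from typing import List, Optional


def per_env_seeds(seed: Optional[int], n_envs: int) -> List[Optional[int]]:
    """Environment i gets seed + i; no base seed means no per-env seeds."""
    return [None if seed is None else seed + i for i in range(n_envs)]


def first_draw(env_seed: Optional[int]) -> float:
    """Stand-in for an env whose first observation depends only on its seed."""
    return random.Random(env_seed).random()


seeds = per_env_seeds(42, 3)
run_a = [first_draw(s) for s in seeds]
run_b = [first_draw(s) for s in seeds]
print(seeds)           # [42, 43, 44]
print(run_a == run_b)  # True
```

Offsetting by the index gives each environment its own random stream while keeping the whole batch reproducible from a single base seed.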
- reset_done(dones: List[bool] | np.ndarray, *, seed: int | None = None, options: Dict[str, Any] | None = None)
Reset only environments whose done flag is True.
- Parameters:
dones – Boolean mask/list of length n_envs.
seed – Optional base seed. If provided, environment i receives seed + i.
options – Reset options forwarded to environments being reset.
- Returns:
A tuple (reset_obs, reset_infos) where non-reset indices contain None and {} respectively.
- step(action)
Step all environments.
- Parameters:
action – Iterable of actions of length n_envs, one per environment.
- Returns:
A 5-tuple of lists (obs_list, rew_list, term_list, trunc_list, info_list).
Notes
The loop expects one action per environment. If the length of action does not match n_envs, Python will raise an exception.
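Sequential stepping with list outputs can be sketched as below. `CountdownEnv` and `step_all` are hypothetical names for this example, and the explicit `ValueError` on a length mismatch is a choice made for the sketch; the actual wrapper may fail with a different exception.

```python
from typing import Any, Dict, Sequence, Tuple


class CountdownEnv:
    """Hypothetical env that terminates after `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self, *, seed=None, options=None) -> Tuple[int, Dict[str, Any]]:
        self.t = 0
        return self.t, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.t += 1
        terminated = self.t >= self.horizon
        return self.t, float(action), terminated, False, {}


def step_all(envs: Sequence[CountdownEnv], actions: Sequence[int]):
    """Step each env in order and collect the results column-wise into lists."""
    if len(actions) != len(envs):
        # Explicit check chosen for this sketch; the real wrapper may fail differently.
        raise ValueError("expected one action per environment")
    results = [env.step(a) for env, a in zip(envs, actions)]
    obs, rews, terms, truncs, infos = (list(col) for col in zip(*results))
    return obs, rews, terms, truncs, infos


envs = [CountdownEnv(horizon=1), CountdownEnv(horizon=2)]
for env in envs:
    env.reset()

obs, rews, terms, truncs, infos = step_all(envs, [3, 5])
# An env is finished when it has either terminated or been truncated.
dones = [t or tr for t, tr in zip(terms, truncs)]
print(obs, rews, dones)  # [1, 1] [3.0, 5.0] [True, False]
```

A mask built this way from the terminated and truncated lists has the shape that `reset_done()` expects for its `dones` argument.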
- class masa.common.wrappers.VecNormWrapper(env: gym.Env | List[gym.Env], norm_obs: bool = True, norm_rew: bool = True, training: bool = True, clip_obs: float = 10.0, clip_rew: float = 10.0, gamma: float = 0.99, eps: float = 1e-8)
Bases: VecEnvWrapperBase
Normalize observations and/or rewards for a vectorized environment.
This wrapper expects an environment implementing VecEnvWrapperBase (e.g., DummyVecWrapper or VecWrapper) and applies the same normalization logic as NormWrapper, but over batches.
Observation normalization uses running statistics of the stacked observation array (shape (n_envs, *obs_shape)). Reward normalization uses running statistics of discounted returns per environment.
- Parameters:
env – A vectorized environment implementing VecEnvWrapperBase.
norm_obs – Whether to normalize observations.
norm_rew – Whether to normalize rewards.
training – If True, update running statistics; otherwise, statistics are frozen.
clip_obs – Clip normalized observations to [-clip_obs, clip_obs].
clip_rew – Clip normalized rewards to [-clip_rew, clip_rew].
gamma – Discount factor for the running return used in reward normalization.
eps – Small constant \(\varepsilon\) for numerical stability.
- Variables:
n_envs – Copied from the wrapped vector environment.
obs_rms – masa.common.running_mean_std.RunningMeanStd for observations.
rew_rms – masa.common.running_mean_std.RunningMeanStd for returns.
returns – Vector of length n_envs storing per-env discounted returns.
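The following sketch shows one common formulation of running-statistics normalization that matches the parameters described above. It is an assumption, not the `masa` implementation: `RunningStat` is a hypothetical scalar stand-in for `RunningMeanStd`, observations are simplified to one scalar per environment, and rewards are assumed to be divided by the standard deviation of the discounted returns without mean subtraction.

```python
import math
from typing import List, Sequence


class RunningStat:
    """Hypothetical scalar running mean/variance, updated one batch at a time."""

    def __init__(self) -> None:
        self.mean, self.var, self.count = 0.0, 1.0, 0

    def update(self, batch: Sequence[float]) -> None:
        n = len(batch)
        b_mean = sum(batch) / n
        b_var = sum((x - b_mean) ** 2 for x in batch) / n
        total = self.count + n
        delta = b_mean - self.mean
        # Merge the stored moments with the batch moments (parallel variance formula).
        m2 = self.var * self.count + b_var * n + delta ** 2 * self.count * n / total
        self.mean += delta * n / total
        self.var, self.count = m2 / total, total


def clip(x: float, limit: float) -> float:
    return max(-limit, min(limit, x))


def normalize_obs(obs: Sequence[float], rms: RunningStat,
                  clip_obs: float = 10.0, eps: float = 1e-8) -> List[float]:
    """Center and scale each observation, then clip to [-clip_obs, clip_obs]."""
    std = math.sqrt(rms.var + eps)
    return [clip((o - rms.mean) / std, clip_obs) for o in obs]


def normalize_rew(rews: Sequence[float], returns: List[float], rms: RunningStat,
                  gamma: float = 0.99, clip_rew: float = 10.0, eps: float = 1e-8,
                  training: bool = True) -> List[float]:
    """Scale rewards by the std of the per-env discounted returns."""
    for i, r in enumerate(rews):
        returns[i] = gamma * returns[i] + r  # running discounted return per env
    if training:
        rms.update(returns)  # statistics stay frozen when training is False
    std = math.sqrt(rms.var + eps)
    return [clip(r / std, clip_rew) for r in rews]


obs_rms, rew_rms = RunningStat(), RunningStat()
returns = [0.0, 0.0]  # one discounted return per environment

obs_rms.update([1.0, 3.0])                              # batch mean 2.0, variance 1.0
norm_obs = normalize_obs([1.0, 3.0], obs_rms)           # approximately [-1.0, 1.0]
clipped = normalize_obs([1000.0], obs_rms)              # far outlier is clipped to 10.0
norm_rew = normalize_rew([1.0, 3.0], returns, rew_rms)  # approximately [1.0, 3.0]
```

Scaling by the spread of the returns rather than of the raw rewards keeps the reward sign intact while bringing value targets to a comparable magnitude across tasks.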
- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None)
Reset all environments and normalize observations.
- Parameters:
seed – Optional base seed forwarded to the underlying vector env.
options – Reset options forwarded to the underlying vector env.
- Returns:
A pair (obs_list, info_list). Observations may be normalized.
- reset_done(dones: List[bool] | np.ndarray, *, seed: int | None = None, options: Dict[str, Any] | None = None)
Reset only environments indicated by dones and normalize those observations.
- Parameters:
dones – Boolean mask/list of length n_envs.
seed – Optional base seed forwarded to the underlying vector env.
options – Reset options forwarded to the underlying vector env.
- Returns:
A tuple (reset_obs, reset_infos) as described by VecEnvWrapperBase.reset_done(), with reset observations optionally normalized.
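Because non-reset slots hold None, any normalization has to skip them. A minimal sketch of that pass-through follows; `normalize_reset_obs` is a hypothetical helper, and the statistics in the usage lines are made-up frozen values chosen only for illustration.

```python
from typing import Callable, List, Optional, Sequence


def normalize_reset_obs(
    reset_obs: Sequence[Optional[float]],
    normalize: Callable[[float], float],
) -> List[Optional[float]]:
    """Apply `normalize` to reset observations; keep None for untouched envs."""
    return [None if o is None else normalize(o) for o in reset_obs]


# Hypothetical frozen statistics: mean 2.0, std 4.0.
out = normalize_reset_obs([None, 6.0, None], lambda o: (o - 2.0) / 4.0)
print(out)  # [None, 1.0, None]
```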