Vectorization and Normalization

This tutorial demonstrates MASA’s observation, vectorization, and normalization wrappers without training a policy.

Runnable notebook: notebooks/tutorials/11_vectorization_and_normalization.ipynb

Wrapper Map

Wrapper

Input

Output

Use it when

OneHotObsWrapper

Discrete observations, or Dict observations containing Discrete leaves

Box one-hot vectors

an algorithm expects vector observations

FlattenDictObsWrapper

Dict observations whose values are already Box spaces

one flat Box

a wrapped environment emits structured observation pieces

DummyVecWrapper

one environment

vector API with length-1 lists

code expects vectorized reset and step outputs

VecWrapper

a list of environments

synchronous batched list outputs

you want to step several environments together

NormWrapper

one Box-observation environment

normalized observations and/or rewards

you want running statistics for a single environment

VecNormWrapper

a DummyVecWrapper or VecWrapper with Box observations

vectorized normalized observations and/or rewards

you want running statistics across parallel environments

The notebook focuses on stable reset and step behavior. It does not cover reset_done.

One-Hot Discrete Observations

colour_grid_world emits a discrete state id:

env = make_env(
    "colour_grid_world",
    "cmdp",
    5,
    label_fn=colour_grid_label_fn,
    cost_fn=colour_grid_cost_fn,
    budget=0.0,
)

The raw observation space is Discrete(81). After OneHotObsWrapper, the observation is a Box(81,) vector with exactly one active entry.

This is the usual first step before feeding a tabular environment into wrappers or algorithms that expect vector observations.

Vector APIs

DummyVecWrapper gives one environment the vectorized interface:

vec_env = DummyVecWrapper(OneHotObsWrapper(make_colour_grid_env()))
obs, info = vec_env.reset(seed=0)

The reset and step results are lists of length 1.

VecWrapper steps multiple environments synchronously:

vec_env = VecWrapper(
    [OneHotObsWrapper(make_colour_grid_env()) for _ in range(2)]
)
obs, infos = vec_env.reset(seed=10)
obs, rewards, terminated, truncated, infos = vec_env.step([0, 1])

The result lists have one entry per environment.

Normalization

NormWrapper is for a single Box-observation environment. The tutorial uses cont_cartpole with pctl:

env = NormWrapper(
    make_cartpole_env(),
    norm_obs=True,
    norm_rew=False,
    training=True,
)

With norm_obs=True, the wrapper updates running observation statistics and returns normalized observations. With norm_rew=False, rewards stay in their original scale.

VecNormWrapper applies the same idea to a vectorized environment:

env = VecNormWrapper(
    VecWrapper([make_cartpole_env(), make_cartpole_env()]),
    norm_obs=True,
    norm_rew=False,
    training=True,
)

Reset returns a batched normalized observation array with shape (2, 4), while rewards and infos still have one entry per environment.

Flattening Dict Observations

ltl_safety environments can expose structured observations. For example, colour_bomb_grid_world with obs_type="dict" returns:

orig:       the original grid state
automaton: the DFA state

Those values are discrete, so the tutorial first applies OneHotObsWrapper. That turns the dict leaves into Box vectors:

orig       -> Box(81,)
automaton  -> Box(2,)

Then FlattenDictObsWrapper concatenates them into one flat Box(83,) observation.

The ordering matters: FlattenDictObsWrapper expects Box values at runtime, so discrete dict leaves should be one-hot encoded first.