Pacman¶
MASA provides four Pacman-style single-agent environments:
mini_pacmanpacmanmini_pacman_with_coinspacman_with_coins
All four environments model a maze with one controllable agent and one stochastic ghost. They share the same discrete action space and the same default safety signal, but they differ in observation representation and reward structure.
They also differ in how they expose dynamics for exact methods:
mini_pacmanandmini_pacman_with_coinsprovide a full transition matrix.pacmanandpacman_with_coinsprovide successor states and per-action transition probabilities.
Tabular Food-Collection Variants¶
mini_pacman and pacman expose the environment state as a single discrete state index.
Observation and State¶
The discrete state encodes:
agent position,
agent direction,
ghost position,
ghost direction,
a binary food flag.
The food flag indicates whether the unique food item is still available.
Rewards and Labels¶
The default label_fn returns:
{"food"}when the agent is on the food cell, the food is still present, and the ghost is not on the same cell,{"ghost"}when the agent and ghost occupy the same cell,the empty set otherwise.
Reward is sparse:
1.0when the agent collects the food,0.0otherwise.
Termination¶
Episodes terminate only when the agent reaches the environment’s fixed terminal location. They do not terminate on food collection or ghost collision.
Variant Sizes¶
mini_pacmanuses a compact7 x 10maze and a discrete state space of size9248.pacmanuses a larger15 x 19maze and a discrete state space of size262088.
The larger pacman environment is intended as a harder version of the same basic problem, with a substantially larger tabular state
space.
Coin-Collection Variants¶
mini_pacman_with_coins and pacman_with_coins keep the same maze-and-ghost dynamics but change both the observation structure and
the reward.
Observation Tensor¶
The observation is a tensor of shape (H, W, 9):
channel
0: remaining coins,channels
1:5: one-hot agent direction at the agent position,channels
5:9: one-hot ghost direction at the ghost position.
This gives a spatial observation while keeping the action space discrete.
Concretely, the observation shapes are:
mini_pacman_with_coins:(7, 10, 9)pacman_with_coins:(15, 19, 9)
Rewards¶
At reset, the coin array is initialized to 1.0 everywhere. In practice, only traversable cells can ever pay out reward. On each
step:
the agent receives reward equal to the current coin value at its cell,
the coin at that cell is then removed.
So each cell can contribute reward at most once per episode.
Labels and Costs¶
The default labelling function for the coin variants only marks ghost collisions:
{"ghost"}if the agent and ghost share a cell,the empty set otherwise.
There is no default label for coin collection.
Termination¶
As in the food-based variants, the episode terminates when the agent reaches a fixed terminal location, not when coins are exhausted.
Safety Abstraction Support¶
The coin variants also expose helper functions for probabilistic shielding:
safety_abstraction(obs)abstr_label_fn(abs_state)
safety_abstraction(obs) maps the structured observation back to a discrete abstract state. abstr_label_fn(abs_state) then labels
that abstract state using the same ghost-collision safety semantics while discarding reward-specific coin information.
This is the setup used in MASA’s shielding example for pacman_with_coins.