Cartpole¶
MASA provides two single-agent Cartpole environments:
disc_cartpole: discrete actions.cont_cartpole: continuous actions.
Both variants implement the same dynamics and state representation. They differ only in how the control input is specified.
State, Labels, and Costs¶
The observation is a 4-dimensional vector
[x, x_dot, theta, theta_dot]
where x is the cart position and theta is the pole angle. Both environments use the same default labelling and cost functions:
label_fn(obs)returns{"stable"}exactly when|theta| <= 0.2095and|x| <= 2.4.cost_fn(labels)returns0.0when"stable"is present and1.0otherwise.
This makes Cartpole a useful benchmark for state-based safety specifications: the reward encourages long survival, while the cost marks transitions that leave the stable region.
Dynamics¶
The implementation follows the standard Cartpole equations of motion with:
gravity
9.8pole length
0.5force magnitude
10.0time step
0.02Euler integration
At reset, the state is sampled uniformly from [-0.05, 0.05]^4.
Each step returns reward 1.0. The environment terminates as soon as the state leaves the stable set described above. It does not
internally truncate episodes, so practical runs usually apply TimeLimit on top.
Variant Differences¶
disc_cartpole¶
Action space:
Discrete(2).Action
0applies force-10.0.Action
1applies force+10.0.
This is the most direct fit for discrete-action safe RL methods.
cont_cartpole¶
Action space:
Box(low=-1.0, high=1.0, shape=(1,)).The scalar action is scaled by the force magnitude, so the applied force is
10.0 * action[0].
This variant is useful for continuous-control algorithms such as PPO or A2C with continuous policies.
Practical Use¶
Both environments are registered in MASA and can be created through make_env:
from masa.common.utils import make_env
from masa.envs.discrete.cartpole import label_fn, cost_fn
env = make_env(
"disc_cartpole",
"cmdp",
500,
label_fn=label_fn,
cost_fn=cost_fn,
budget=0.0,
)
For the continuous variant, replace the environment id and import path with cont_cartpole and
masa.envs.continuous.cartpole.