DDPG, that's why its extension TD3 tries to tackle that issue. TRPO or PPO make use of a trust region to minimize that problem by avoiding updates that are too large.

Box: An N-dimensional box that contains every point in the action space.
Discrete: A list of possible actions, where at each timestep only one of the actions can be used.
MultiDiscrete: A list of possible actions, where at each timestep only one action of each discrete set can be used.
MultiBinary: A list of possible actions, where at each timestep any of the actions can be used in any combination.

Discrete, MultiDiscrete, Binary, MultiBinary
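The four action-space types described above can be made concrete with a small plain-Python sketch. It is illustrative only: the helper names (`sample_box`, `sample_discrete`, `sample_multi_discrete`, `sample_multi_binary`) are hypothetical and are not part of any library API; they only show what one action drawn from each kind of space looks like.

```python
import random

# Hypothetical helpers (assumptions, not a library API): each draws one
# random action from a plain-Python stand-in for the given space type.

def sample_box(low, high):
    """Box: one real value per dimension, each within [low[i], high[i]]."""
    return [random.uniform(lo, hi) for lo, hi in zip(low, high)]

def sample_discrete(n):
    """Discrete: exactly one of n possible actions per timestep."""
    return random.randrange(n)

def sample_multi_discrete(nvec):
    """MultiDiscrete: one action from each discrete set per timestep."""
    return [random.randrange(n) for n in nvec]

def sample_multi_binary(n):
    """MultiBinary: any combination of n on/off actions per timestep."""
    return [random.randint(0, 1) for _ in range(n)]

if __name__ == "__main__":
    print(sample_box([-1.0, 0.0], [1.0, 2.0]))  # two continuous values
    print(sample_discrete(4))                   # a single index in 0..3
    print(sample_multi_discrete([3, 2]))        # one index per discrete set
    print(sample_multi_binary(5))               # five independent 0/1 flags
```

The return types carry the distinction: a Box action is a vector of floats, a Discrete action is a single integer, and the two "Multi" variants are vectors of integers.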
DQN: usually slower to train, but the most sample efficient.
PPO
A2C
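To illustrate the earlier remark that PPO avoids updates that are too large, here is a minimal sketch of a clipped surrogate objective for a single sample. It assumes the commonly described PPO clipping scheme, which is a simpler stand-in for an explicit trust region; the function name `clipped_surrogate` and the default `clip_range=0.2` are illustrative assumptions, not values taken from this text.

```python
def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """Clipped objective for one sample (a sketch, not a full PPO loss).

    ratio:     pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage: estimated advantage of the action taken
    """
    # Keep the ratio inside [1 - clip_range, 1 + clip_range].
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # Taking the minimum removes any incentive to push the ratio
    # beyond the clip range in the direction that raises the objective.
    return min(ratio * advantage, clipped_ratio * advantage)

if __name__ == "__main__":
    print(clipped_surrogate(1.5, 1.0))   # gain is capped by the clip
    print(clipped_surrogate(1.0, 2.0))   # unchanged policy: plain advantage
```

With a positive advantage, raising the ratio past `1 + clip_range` yields no further gain, which is what keeps each policy update small.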