
DDPG, that’s why its extension TD3 tries to tackle that issue. TRPO or PPO make use of a trust region to minimize that problem by avoiding too large an update.
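As a rough illustration of how PPO avoids too large an update, here is a minimal sketch of its clipped surrogate objective for a single sample. The function name `clipped_surrogate` and the default `clip_eps=0.2` are assumptions made for illustration and do not come from the text above. TRPO enforces its trust region differently (through a KL-divergence constraint) and is not shown here.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO-style clipped objective (hypothetical helper).

    ratio     -- new_policy_prob / old_policy_prob for the taken action
    advantage -- estimated advantage of that action
    clip_eps  -- how far the ratio may move away from 1 (assumed default)
    """
    # Keep the probability ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to push the ratio
    # beyond the clipping range in the direction that helps.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # A large ratio with a positive advantage is capped by the clip.
    print(clipped_surrogate(1.5, 1.0))
    # A ratio of 1 (no policy change) leaves the advantage untouched.
    print(clipped_surrogate(1.0, 2.0))
```

With a positive advantage, the objective stops growing once the ratio exceeds `1 + clip_eps`, so a gradient step gains nothing from moving the policy further in that direction.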
Box: An N-dimensional box that contains every point in the action space.
Discrete: A list of possible actions, where at each timestep only one of the actions can be used.
MultiDiscrete: A list of possible actions, where at each timestep only one action of each discrete set can be used.
MultiBinary: A list of possible actions, where at each timestep any of the actions can be used in any combination.
Discrete, MultiDiscrete, Binary, MultiBinary
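To make the difference between these space types concrete, here is a small library-free sketch of what one sampled action could look like for each of them. The helper names (`sample_box`, `sample_discrete`, and so on) are hypothetical and are not part of any RL library's API; they only mirror the descriptions above.

```python
import random


def sample_box(low, high):
    """Box: one real value per dimension, each within [low[i], high[i]]."""
    return [random.uniform(lo, hi) for lo, hi in zip(low, high)]


def sample_discrete(n):
    """Discrete: exactly one action index out of n possible actions."""
    return random.randrange(n)


def sample_multi_discrete(nvec):
    """MultiDiscrete: one action index from each discrete set in nvec."""
    return [random.randrange(n) for n in nvec]


def sample_multi_binary(n):
    """MultiBinary: each of the n actions is independently on (1) or off (0)."""
    return [random.randint(0, 1) for _ in range(n)]


if __name__ == "__main__":
    print("Box:          ", sample_box([-1.0, 0.0], [1.0, 2.0]))
    print("Discrete:     ", sample_discrete(4))
    print("MultiDiscrete:", sample_multi_discrete([3, 2]))
    print("MultiBinary:  ", sample_multi_binary(5))
```

The return types summarize the distinction: a Box action is a vector of floats, a Discrete action is a single integer, a MultiDiscrete action is one integer per set, and a MultiBinary action is a vector of 0/1 flags.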
DQN: usually slower to train, but is the most sample efficient.
PPO
A2C