Abstract: This research addresses the problem of action fluctuation in articulated vehicle trajectory tracking control, aiming to improve both tracking accuracy and control smoothness. It introduces a smooth tracking control method grounded in reinforcement learning (RL). First, to improve control accuracy, we feed trajectory preview information into both the policy and value networks and establish a predictive policy iteration framework. Second, to ensure control smoothness, we approximate the policy function with LipsNet, which adaptively restricts the Lipschitz constant of the policy network. Finally, drawing on distributional RL theory, we formulate an articulated vehicle trajectory tracking control method, named smooth distributional soft actor-critic (SDSAC), that jointly optimizes control precision and action smoothness. Simulation results show that the proposed method maintains smooth actions under six different noise levels while exhibiting strong noise robustness and high tracking accuracy. Compared with the conventional distributional RL baseline, distributional soft actor-critic (DSAC), SDSAC improves action smoothness by more than 5.8 times under high-noise conditions. In addition, compared with model predictive control, SDSAC's average single-step solution is about 60 times faster, giving it higher online computational efficiency.
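To make the Lipschitz-restriction idea concrete, the following is a minimal sketch, in the spirit of LipsNet, of a policy head whose output is rescaled by a learnable Lipschitz bound divided by the local Jacobian norm, so the policy's sensitivity to observation noise stays under adaptive control. The class name, architecture, and hyperparameters are illustrative assumptions, not the paper's exact design.

```python
# Illustrative sketch only: a policy trunk whose raw output is rescaled by a
# learnable Lipschitz bound K over the per-sample Jacobian norm, loosely
# following the LipsNet idea of adaptively restricting the policy's
# Lipschitz constant. Not the paper's exact architecture.
import torch
import torch.nn as nn


class LipschitzPolicySketch(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64, eps: float = 1e-4):
        super().__init__()
        # Raw policy trunk f(x); its output is rescaled in forward().
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        # Learnable global Lipschitz bound, kept positive via softplus.
        self.log_k = nn.Parameter(torch.zeros(1))
        self.eps = eps

    def _jacobian_norm(self, raw: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
        # Squared gradients summed over output dims give the squared Frobenius
        # norm of each sample's Jacobian (MLP acts row-wise on the batch).
        sq = torch.zeros(obs.shape[0], 1, device=obs.device)
        for i in range(raw.shape[-1]):
            g = torch.autograd.grad(
                raw[..., i].sum(), obs, create_graph=True, retain_graph=True
            )[0]
            sq = sq + g.pow(2).sum(dim=-1, keepdim=True)
        return sq.sqrt() + self.eps

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.enable_grad():
            obs = obs.detach().requires_grad_(True)
            raw = self.trunk(obs)
            jac_norm = self._jacobian_norm(raw, obs)
        k = nn.functional.softplus(self.log_k)
        # Rescale so the local input-output sensitivity is bounded by roughly k.
        return k * raw / jac_norm


if __name__ == "__main__":
    policy = LipschitzPolicySketch(obs_dim=6, act_dim=2)
    action = policy(torch.randn(8, 6))
    print(action.shape)  # torch.Size([8, 2])
```

In this sketch the bound k is a single learnable scalar; a state-dependent bound (a small network producing k from the observation) would follow the same rescaling pattern.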