Abstract: This research addresses the problem of action fluctuation in articulated vehicle trajectory tracking control, aiming to improve both tracking accuracy and control smoothness. It introduces a smooth tracking control method grounded in reinforcement learning (RL). First, to improve control accuracy, we feed trajectory preview information into both the policy and value networks and establish a predictive policy iteration framework. Second, to ensure control smoothness, we approximate the policy function with LipsNet, which adaptively restricts the Lipschitz constant of the policy network. Finally, drawing on distributional RL theory, we formulate an articulated vehicle trajectory tracking control method, named smooth distributional soft actor-critic (SDSAC), that jointly optimizes control precision and action smoothness. Simulation results show that the proposed method maintains smooth actions under six different noise levels while exhibiting strong noise robustness and high tracking accuracy. Compared with the conventional distributional RL baseline, distributional soft actor-critic (DSAC), SDSAC improves action smoothness by more than 5.8 times under high-noise conditions. In addition, compared with model predictive control, SDSAC's average single-step solution is about 60 times faster, giving it higher online computational efficiency.
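To make the Lipschitz-restriction idea concrete, the following is a minimal sketch, in the spirit of LipsNet, of a policy head whose output is rescaled by a learnable Lipschitz bound divided by the local Jacobian norm, so the policy's sensitivity to observation noise stays under adaptive control. The class name, architecture, and hyperparameters are illustrative assumptions, not the paper's exact design.

```python
# Illustrative sketch only: a policy trunk whose raw output is rescaled by a
# learnable Lipschitz bound K over the per-sample Jacobian norm, loosely
# following the LipsNet idea of adaptively restricting the policy's
# Lipschitz constant. Not the paper's exact architecture.
import torch
import torch.nn as nn


class LipschitzPolicySketch(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64, eps: float = 1e-4):
        super().__init__()
        # Raw policy trunk f(x); its output is rescaled in forward().
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        # Learnable global Lipschitz bound, kept positive via softplus.
        self.log_k = nn.Parameter(torch.zeros(1))
        self.eps = eps

    def _jacobian_norm(self, raw: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
        # Squared gradients summed over output dims give the squared Frobenius
        # norm of each sample's Jacobian (MLP acts row-wise on the batch).
        sq = torch.zeros(obs.shape[0], 1, device=obs.device)
        for i in range(raw.shape[-1]):
            g = torch.autograd.grad(
                raw[..., i].sum(), obs, create_graph=True, retain_graph=True
            )[0]
            sq = sq + g.pow(2).sum(dim=-1, keepdim=True)
        return sq.sqrt() + self.eps

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        with torch.enable_grad():
            obs = obs.detach().requires_grad_(True)
            raw = self.trunk(obs)
            jac_norm = self._jacobian_norm(raw, obs)
        k = nn.functional.softplus(self.log_k)
        # Rescale so the local input-output sensitivity is bounded by roughly k.
        return k * raw / jac_norm


if __name__ == "__main__":
    policy = LipschitzPolicySketch(obs_dim=6, act_dim=2)
    action = policy(torch.randn(8, 6))
    print(action.shape)  # torch.Size([8, 2])
```

In this sketch the bound k is a single learnable scalar; a state-dependent bound (a small network producing k from the observation) would follow the same rescaling pattern.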