Cite this article: CHEN Liangfa, SONG Xujie, XIAO Liming, GAO Lulu, ZHANG Fawang, LI Shengbo, MA Fei, DUAN Jingliang. Smooth reinforcement learning-based trajectory tracking for articulated vehicles[J]. Journal of Harbin Institute of Technology, 2024, 56(12): 116. DOI: 10.11918/202310026
|
DOI: 10.11918/202310026
CLC number: TP273+.1
Fund program: National Natural Science Foundation of China (52202487); Open Fund of the State Key Laboratory of Automotive Safety and Energy (KF2212)
|
Smooth reinforcement learning-based trajectory tracking for articulated vehicles |
CHEN Liangfa1, SONG Xujie2, XIAO Liming1, GAO Lulu1, ZHANG Fawang3, LI Shengbo2, MA Fei1, DUAN Jingliang1
|
(1.School of Mechanical Engineering, University of Science and Technology Beijing, Beijing 100083, China; 2.School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China; 3.School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China)
|
Abstract: |
To address action fluctuation in trajectory tracking control of articulated vehicles and to improve both tracking accuracy and control smoothness, this research proposes a smooth tracking control method grounded in reinforcement learning (RL) with trajectory preview. First, to improve control accuracy, reference trajectory information is fed as preview input to both the policy and value networks, establishing a preview-based policy iteration framework. Second, to ensure control smoothness, the LipsNet network structure is employed to approximate the policy function, adaptively constraining the Lipschitz constant of the policy network. Finally, coupled with distributional RL theory, the resulting trajectory tracking controller, named smooth distributional soft actor-critic (SDSAC), jointly optimizes control precision and action smoothness. Simulation results demonstrate that SDSAC maintains good action smoothness under six different noise levels while retaining strong noise robustness and high tracking accuracy. Compared with the conventional distributional soft actor-critic (DSAC), SDSAC improves action smoothness by more than 5.8 times under high-noise conditions. In addition, compared with model predictive control, SDSAC's average single-step solution time is roughly 60 times shorter, giving it high online computational efficiency.
Key words: autonomous driving; articulated vehicle; trajectory tracking; reinforcement learning; action smoothing
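The abstract credits action smoothness to bounding the Lipschitz constant of the policy network, which LipsNet learns adaptively. As a rough illustration of why such a bound helps (not the paper's LipsNet mechanism), here is a much simpler fixed-bound sketch using spectral normalization in NumPy; all dimensions, the bound `L_MAX`, and the state-plus-preview observation layout are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observation: 6 vehicle states + 5 preview points x 2 coordinates.
OBS_DIM, HID_DIM, ACT_DIM = 16, 32, 2
L_MAX = 2.0      # fixed global Lipschitz bound for the whole policy
N_LAYERS = 3     # number of weight matrices in the MLP

def make_layer(n_in, n_out):
    return rng.standard_normal((n_out, n_in)), np.zeros(n_out)

layers = [make_layer(OBS_DIM, HID_DIM),
          make_layer(HID_DIM, HID_DIM),
          make_layer(HID_DIM, ACT_DIM)]

# Rescale each weight matrix so its spectral norm equals L_MAX ** (1/N_LAYERS).
# With 1-Lipschitz activations (tanh), the composed network is then
# L_MAX-Lipschitz, since Lipschitz constants multiply under composition.
per_layer = L_MAX ** (1.0 / N_LAYERS)
layers = [(W * (per_layer / np.linalg.norm(W, 2)), b) for W, b in layers]

def policy(obs):
    """Map an observation (state + preview) to a bounded-variation action."""
    h = obs
    for W, b in layers[:-1]:
        h = np.tanh(W @ h + b)
    W, b = layers[-1]
    return W @ h + b

# Empirical check: output changes are bounded by L_MAX times input changes,
# so small observation noise can only cause small action fluctuation.
for _ in range(100):
    x, y = rng.standard_normal(OBS_DIM), rng.standard_normal(OBS_DIM)
    gap = np.linalg.norm(policy(x) - policy(y))
    assert gap <= L_MAX * np.linalg.norm(x - y) + 1e-9
```

The bound ‖π(x) − π(y)‖ ≤ L‖x − y‖ is what suppresses action chattering under observation noise; LipsNet's contribution, per the abstract, is making the bound L state-dependent and learned rather than fixed in advance as it is here.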
|
|
|
|