Smooth reinforcement learning-based trajectory tracking for articulated vehicles

Authors:

CHEN Liangfa, SONG Xujie, XIAO Liming, GAO Lulu, ZHANG Fawang, LI Shengbo, MA Fei, DUAN Jingliang

Affiliations:

(1. School of Mechanical Engineering, University of Science and Technology Beijing, Beijing 100083, China; 2. School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China; 3. School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China)

Author profiles:

CHEN Liangfa (1999―), male, master's degree candidate; LI Shengbo (1982―), male, tenured professor and doctoral supervisor; MA Fei (1968―), male, professor and doctoral supervisor

Corresponding author:

DUAN Jingliang, duanjl@ustb.edu.cn

CLC number:

TP273+.1

Funding:

National Natural Science Foundation of China (52202487); Open Fund of the State Key Laboratory of Automotive Safety and Energy (KF2212)




    Abstract:

To address the action fluctuation problem in existing trajectory tracking controllers for articulated vehicles, and to improve both tracking accuracy and control smoothness, this paper proposes a smooth reinforcement learning (RL) tracking control method with trajectory preview. Firstly, to ensure control accuracy, reference trajectory information is fed into both the policy and value networks as preview input, yielding a preview-based RL policy iteration framework. Then, to ensure control smoothness, the LipsNet network structure is adopted to approximate the policy function, which adaptively restricts the Lipschitz constant of the policy network. Finally, distributional RL theory is incorporated to form the complete smooth trajectory tracking controller, named smooth distributional soft actor-critic (SDSAC), which jointly optimizes control accuracy and action smoothness. Simulation results show that SDSAC maintains smooth actions under six different noise levels while retaining high tracking accuracy; compared with the conventional distributional soft actor-critic (DSAC), it improves action smoothness by more than 5.8 times under high-noise conditions. In addition, compared with model predictive control, the average single-step solution of SDSAC is about 60 times faster, giving it high online computational efficiency.
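As a concrete illustration of the two ideas summarized above, the sketch below shows a preview-augmented observation and a LipsNet-style policy head in PyTorch. This is a minimal sketch under stated assumptions, not the authors' implementation: the names (preview_observation, LipsNetPolicy), network sizes, and state/action dimensions are all hypothetical; only the rescaling rule K(x)·f(x)/(‖J_f(x)‖+ε) follows the published LipsNet formulation.

```python
import torch
import torch.nn as nn
from torch.func import jacrev, vmap  # requires PyTorch >= 2.0

def preview_observation(state, ref_traj, horizon=5):
    # Concatenate the vehicle state with the next `horizon` reference points
    # so the policy and value networks can "see" the upcoming trajectory.
    return torch.cat([state, ref_traj[:, :horizon].flatten(1)], dim=1)

class LipsNetPolicy(nn.Module):
    # LipsNet-style head: the raw MLP output f(x) is rescaled by
    # K(x) / (||J_f(x)||_F + eps), so the policy's local Lipschitz
    # constant is capped by the learned, input-dependent bound K(x).
    def __init__(self, obs_dim, act_dim, hidden=256, eps=1e-4):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        # K(x) > 0 via softplus, trained jointly with the policy loss
        self.k = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1), nn.Softplus(),
        )
        self.eps = eps

    def forward(self, x):
        raw = self.f(x)                    # (batch, act_dim)
        jac = vmap(jacrev(self.f))(x)      # per-sample Jacobian, (batch, act_dim, obs_dim)
        jac_norm = jac.flatten(1).norm(dim=1, keepdim=True)
        return self.k(x) * raw / (jac_norm + self.eps)

# Hypothetical shapes: 6-D articulated-vehicle state, 5 preview points (x, y), 2-D action
state = torch.randn(32, 6)
ref_traj = torch.randn(32, 10, 2)
obs = preview_observation(state, ref_traj)    # (32, 6 + 5*2) = (32, 16)
policy = LipsNetPolicy(obs_dim=16, act_dim=2)
print(policy(obs).shape)                      # torch.Size([32, 2])
```

During training, a small penalty on K(x) is typically added to the actor loss so the Lipschitz bound stays as tight as the tracking task allows; this is the mechanism by which a LipsNet-style policy trades off accuracy against action smoothness.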

Cite this article:

CHEN Liangfa, SONG Xujie, XIAO Liming, GAO Lulu, ZHANG Fawang, LI Shengbo, MA Fei, DUAN Jingliang. Smooth reinforcement learning-based trajectory tracking for articulated vehicles[J]. Journal of Harbin Institute of Technology, 2024, 56(12): 116. DOI: 10.11918/202310026

History:
  • Received: 2023-10-14
  • Published online: 2024-12-24