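Crowd evacuation guidance based on combined action-space deep reinforcement learning
Authors: Xue Yiran, Wu Rui, Liu Jiafeng
Affiliation:

(Pattern Recognition and Intelligent System Research Center (Harbin Institute of Technology), Harbin 150001, China)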

About the author:

Xue Yiran (born 1992), male, PhD candidate

Corresponding author:

Wu Rui, simple@hit.edu.cn

CLC number:

TP183

Funding:

National Natural Science Foundation of China (61672190)


    Abstract:

    Crowd evacuation guidance systems protect lives and reduce casualties and property losses when a disaster occurs inside a building. Existing systems require models and input parameters to be designed by hand, which is labor-intensive and error-prone. This paper proposes an end-to-end intelligent evacuation guidance method based on deep reinforcement learning, together with an interactive simulation environment for the agent built on the social force model. Taking only scene images as input, the agent learns a model of the scene through trial-and-error interaction with the simulation environment, explores path-planning strategies, and directly outputs dynamic guidance sign information that directs the crowd to evacuate. In the crowd evacuation problem, the action space of the deep Q-network (DQN) algorithm is high-dimensional, so the complexity of the neural network grows exponentially (the curse of dimensionality). To address this, the paper proposes a combined action-space DQN algorithm that groups the output-layer nodes of the Q-network by action dimension, which significantly reduces the complexity of the network structure and makes the system more practical in complex scenes with multiple guidance signs. Simulation experiments in different scenes show that the proposed method achieves shorter evacuation times than the static guidance method and is on par with the method based on manually constructed models. The proposed method can therefore guide crowds effectively and improve evacuation efficiency, while reducing the workload of manual model construction and the human error it introduces.
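    The output-layer grouping mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The sign count, the number of states per sign, the function names, and the per-group greedy selection rule are all assumptions made for illustration. The sketch compares how many output nodes a Q-network needs when every joint sign configuration is a separate action against the number needed when the nodes are grouped per sign.

```python
def flat_output_size(num_signs: int, states_per_sign: int) -> int:
    """Output nodes when each joint sign configuration is one action."""
    return states_per_sign ** num_signs


def grouped_output_size(num_signs: int, states_per_sign: int) -> int:
    """Output nodes when the layer holds one group of Q-values per sign."""
    return num_signs * states_per_sign


def select_grouped_action(q_values, num_signs, states_per_sign):
    """Pick the highest-valued state inside each group (assumed greedy rule).

    q_values is a flat list of length num_signs * states_per_sign, laid out
    group by group. Returns one state index per sign.
    """
    if len(q_values) != num_signs * states_per_sign:
        raise ValueError("q_values length does not match the group layout")
    actions = []
    for sign in range(num_signs):
        start = sign * states_per_sign
        group = q_values[start:start + states_per_sign]
        # Index of the largest Q-value in this sign's group.
        actions.append(max(range(states_per_sign), key=group.__getitem__))
    return actions


if __name__ == "__main__":
    # Hypothetical scene: 6 guidance signs, each with 4 possible states.
    print(flat_output_size(6, 4))     # joint-action output size
    print(grouped_output_size(6, 4))  # grouped output size
    # Three signs with two states each, Q-values listed group by group.
    print(select_grouped_action([0.1, 0.9, 0.7, 0.2, 0.3, 0.4], 3, 2))
```

    Under these assumptions the joint-action layer grows as a power of the number of signs, while the grouped layer grows linearly, which is the kind of reduction in network complexity the abstract describes.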

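Cite this article

Xue Yiran, Wu Rui, Liu Jiafeng. Crowd evacuation guidance based on combined action-space deep reinforcement learning[J]. Journal of Harbin Institute of Technology, 2021, 53(8): 29. DOI:10.11918/202101029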

History
  • Received: 2021-01-10
  • Published online: 2021-08-10