Cite this article: | XUE Yiran, WU Rui, LIU Jiafeng. Crowd evacuation guidance based on combined action-space deep reinforcement learning[J]. Journal of Harbin Institute of Technology, 2021, 53(8): 29. DOI:10.11918/202101029 |
|
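Abstract: |
Crowd evacuation guidance systems can effectively protect lives and reduce casualties and property losses when a disaster occurs inside a building. Existing crowd evacuation guidance systems require manually designed models and input parameters, which entails a heavy workload and is prone to error. To address this problem, this paper proposes an end-to-end intelligent evacuation guidance method based on deep reinforcement learning and designs a simulation environment, built on the social force model, with which the reinforcement learning agent interacts. Taking only scene images as input, the agent learns the scene model autonomously through interaction with the simulation environment and trial and error, explores path planning strategies, and directly outputs dynamic guidance sign information to guide the crowd to evacuate effectively. When the deep Q network (DQN) algorithm is applied to crowd evacuation, the high dimensionality of the action space causes the complexity of the neural network to grow exponentially, the so-called “curse of dimensionality”. To address this, the paper proposes a combined action-space DQN algorithm that groups the output layer of the Q network by action dimension, which significantly reduces the complexity of the network structure and improves the practicality of the system in complex scenes with multiple guidance signs. Simulation experiments in different scenes show that the proposed method outperforms the static guidance method in evacuation time and reaches the same level as the method based on manually constructed models. This indicates that the proposed method can guide the crowd effectively and improve evacuation efficiency while reducing the workload of manual model construction and the associated human error. |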
Keywords: neural network; reinforcement learning; evacuation guidance; crowd simulation; deep Q network |
DOI:10.11918/202101029 |
CLC number: TP183 |
Document code: A |
Funding: National Natural Science Foundation of China (61672190) |
|
Crowd evacuation guidance based on combined action-space deep reinforcement learning |
XUE Yiran, WU Rui, LIU Jiafeng
|
(Pattern Recognition and Intelligent System Research Center (Harbin Institute of Technology), Harbin 150001, China)
|
Abstract: |
Crowd evacuation guidance systems are of great significance for protecting lives and reducing casualties and property losses when disasters occur in buildings. Existing crowd evacuation guidance systems require manually designed models and input parameters, which entails a heavy workload and is prone to error. To address this, an end-to-end intelligent evacuation guidance method based on deep reinforcement learning was proposed, and an interactive simulation environment for the reinforcement learning agent was designed on the basis of the social force model. Taking only scene images as input, the agent learns a scene model autonomously through interaction with the simulation environment and trial and error, explores path planning strategies, and directly outputs dynamic guidance sign information to guide the crowd to evacuate efficiently. When the deep Q network (DQN) algorithm is applied to crowd evacuation, the high-dimensional action space makes the complexity of the neural network grow exponentially, a phenomenon known as the “curse of dimensionality”. To solve this problem, a combined action-space DQN algorithm was proposed, which groups the output layer nodes of the Q network by action dimension, significantly reducing the network complexity and improving the practicality of the system in complex scenes with multiple guidance signs. Simulation experiments in different scenes show that the proposed method outperforms the static guidance method in evacuation time and is on par with the method based on manually constructed models. These results indicate that the proposed method can guide the crowd effectively and improve evacuation efficiency while reducing the workload and human error involved in manually constructing models. |
Key words: neural network; reinforcement learning; evacuation guidance; crowd simulation; deep Q network (DQN) |
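
The abstract's central idea, grouping the Q-network output layer by action dimension, can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes, purely for illustration, a scene with several guidance signs that each display one of a few directions, and it assumes that the greedy action is chosen independently within each group. The function names and the scene size are hypothetical. The sketch compares the number of output nodes needed when every joint action gets its own node (a product over signs) with the number needed when the nodes are grouped per sign (a sum over signs).

```python
import numpy as np

def joint_output_size(actions_per_sign):
    """Output nodes when each joint action has its own Q value (product)."""
    return int(np.prod(actions_per_sign))

def grouped_output_size(actions_per_sign):
    """Output nodes when the layer is grouped by action dimension (sum)."""
    return int(np.sum(actions_per_sign))

def select_grouped_action(q_values, actions_per_sign):
    """Split a flat Q vector into one group per sign; take the argmax in each.

    Assumption: each sign's choice is made independently within its group.
    """
    boundaries = np.cumsum(actions_per_sign)[:-1]
    groups = np.split(np.asarray(q_values), boundaries)
    return [int(np.argmax(group)) for group in groups]

# Hypothetical scene: 6 guidance signs, each able to show 4 directions.
actions_per_sign = [4] * 6
print(joint_output_size(actions_per_sign))    # 4**6 = 4096 output nodes
print(grouped_output_size(actions_per_sign))  # 4*6  = 24 output nodes

# Stand-in for a network's output: random Q values, one per grouped node.
rng = np.random.default_rng(0)
q_values = rng.normal(size=grouped_output_size(actions_per_sign))
print(select_grouped_action(q_values, actions_per_sign))  # one index per sign
```

Under these assumptions the output layer grows linearly with the number of signs instead of exponentially, which is the reduction in network complexity the abstract describes.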