Multi-agent path planning based on improved double DQN


Abstract

Objective: With the development of virtual reality technology, multi-agent escape path planning in virtual scenes has become a key technology. Compared with traditional fire drills, fire escape drills based on virtual reality offer lower cost, lower risk, and higher reliability, but they still have limitations. To address these, a path planning algorithm based on an improved double-layer deep Q network (DQN) architecture is proposed.

Method: Built on two double Q networks with identical structure, the algorithm optimizes both the generation of the experience pool and the exploration strategy, and adds the influence of environmental factors such as fire on the agents to the reward. In addition, to improve evacuation safety and efficiency, a multi-agent grouping strategy based on an improved K-medoids algorithm is proposed.

Results: Experiments show that the improved double-layer DQN architecture converges faster, learns more stably, and effectively improves model performance. To account for both the evacuation efficiency and the evacuation safety of agents in fire scenes, the average health evacuation value (AHEP) is used to evaluate the evacuation effect. AHEP is 84% and 104% higher than with the traditional path planning methods A-STAR (A* search algorithm) and DIJKSTRA (Dijkstra's algorithm), respectively; 30% and 21% higher than with the fire-scene-adapted extended A-STAR and the Dijkstra-ACO (Dijkstra and ant colony optimization) hybrid algorithm, respectively; and 20% higher than with a DQN algorithm that accounts for the influence of fire. Both evacuation efficiency and safety improve, and the planned paths give a better evacuation effect. A comparison of the evacuation effect under different grouping modes verifies that appropriate grouping of the agents improves evacuation efficiency.

Conclusion: The proposed algorithm outperforms most commonly used methods and significantly improves evacuation efficiency and safety.

Extended abstract

Objective: Rescue-oriented evacuation drills such as fire escape drills are designed to improve training effect and fire safety awareness. Gaining sufficient evacuation experience requires repeated drills, which are costly for organizers, because each drill depends on a drill venue, on the physical condition of the participants, and on real-time position information. Emerging virtual reality technology can guide virtual fire escape at lower cost and risk and with higher reliability. Multi-agent path planning has therefore been developed to simulate emergency drills in virtual scenarios.

Method: We develop an improved double deep Q network (DQN) framework. The virtual scenario is built by collecting campus information, including the agents, obstacles, exits, fire-affected areas, and other relevant factors. Because all agents are assumed to lie on the same plane, the scene can be converted into a two-dimensional grid through gridding and coordinate transformation. Cells of the two-dimensional grid plane m are colored differently to represent obstacles, fire-affected areas, exits, and agent locations. According to the location of the agent in the virtual scene, the grid plane m is layered into grid planes m1 and m2, of sizes 64×100 and 48×100, respectively. The double deep Q network uses two double Q networks with the same structure, Q1 and Q2, each consisting of convolutional layers and fully connected layers. Their input sizes match the grid planes m1 and m2 obtained by layering the environment. For grid planes of the same size as m1 and m2, trainable grid planes m'1t and m'2t are obtained by randomly placing the same number of 1×1 black blocks to represent obstacle locations, and by generating one plane for every starting position so that all states of the agent in the scene are covered. These planes are used to initialize experience pools D1 and D2 and to train networks Q1 and Q2.

In real evacuation drills, the evacuation of a crowd is not fully independent and discrete. Because people are social, social relationships exist among the evacuees, and "gathering and following" behavior is common in crowd evacuation. In addition, organizers of real drills often station guides at different locations to help participants evacuate. Our framework therefore adds guides to the virtual scenario and implements a multi-agent grouping strategy based on an improved k-medoids algorithm. Agents are grouped according to their locations and relationships, a guiding agent is selected for each group, and the guide leads the other agents in the group along paths planned by the improved double deep Q network algorithm described above. This further improves the reliability and efficiency of evacuation.

Results: Extensive experiments were carried out to validate the proposed methods. During training, the network Q3 of the traditional DQN method converges after 24000 batches, whereas the networks Q1 and Q2 converge after about 3000 batches. The proposed method therefore converges significantly faster than the traditional DQN method and is more stable. To account for both the evacuation efficiency and the evacuation safety of agents in fire scenarios, the average health evacuation value (AHEP) is used to evaluate the evacuation effect. By the AHEP criterion, the proposed method scores about 84% and 104% higher than the traditional path planning methods A-STAR and DIJKSTRA, respectively. Compared with the extended A-STAR and the Dijkstra-ACO hybrid algorithm, both adapted to changing fire scenes, it scores 30% and 21% higher, respectively; compared with the DQN algorithm, it scores 20% higher. Evacuation efficiency and safety both improve, and the planned paths give a better evacuation effect. To verify the evacuation effect under different groupings, we compared the AHEP values for four grouping settings: 4, 5, 6, and 7. The value is highest with a grouping of 6, which is 17%, 13%, and 6% higher than with groupings of 4, 5, and 7, respectively. These results show that appropriate grouping of the agents improves their evacuation efficiency.

Conclusion: The proposed method can improve evacuation efficiency and safety to a certain extent.
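The abstract does not give the update rule or the reward values, so the following is only a minimal sketch of how a double Q network can form its training targets. It assumes that "double Q network" refers to the standard Double DQN update, where an online network selects the next action and a target network evaluates it. The cell codes, the reward table, and the names `step_reward` and `double_dqn_targets` are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical cell codes and reward values; the source specifies neither.
FREE, OBSTACLE, FIRE, EXIT = 0, 1, 2, 3
REWARDS = {FREE: -1.0, OBSTACLE: -10.0, FIRE: -20.0, EXIT: 100.0}


def step_reward(cell):
    """Reward for entering a grid cell; fire cells are penalized (assumed values)."""
    return REWARDS[cell]


def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Compute Double DQN regression targets for a batch of transitions.

    q_online_next: (batch, n_actions) next-state Q-values from the online network
    q_target_next: (batch, n_actions) next-state Q-values from the target network
    rewards, dones: (batch,) reward and terminal flag of each transition
    """
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    # The online network selects the greedy next action ...
    best_actions = np.argmax(q_online_next, axis=1)
    # ... and the target network evaluates that action.
    next_values = q_target_next[np.arange(len(rewards)), best_actions]
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * np.where(dones, 0.0, next_values)
```

Under this reading, each of Q1 and Q2 would be trained by regressing its predicted Q-values onto such targets, using mini-batches sampled from its own experience pool (D1 or D2).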
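The grouping step can be illustrated with the textbook k-medoids procedure. The abstract does not describe the paper's improvements, nor how social relationships enter the distance, so this sketch uses plain Euclidean distance between agent positions only. Treating each group's medoid as its guiding agent is also an assumption made for illustration, and the function name `k_medoids` and the deterministic initialization are choices made here, not the authors'.

```python
import numpy as np


def k_medoids(points, k, n_iter=100):
    """Plain alternating k-medoids on 2-D agent positions.

    Assumes k is at most the number of distinct positions.
    Returns (medoids, labels): the index of each group's medoid agent
    and the group label of every agent.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all agents.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Deterministic start: the most central agent, then repeatedly the
    # agent farthest from the medoids chosen so far.
    chosen = [int(np.argmin(dist.sum(axis=1)))]
    while len(chosen) < k:
        chosen.append(int(np.argmax(dist[:, chosen].min(axis=1))))
    medoids = np.array(chosen)
    for _ in range(n_iter):
        # Assignment step: each agent joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for g in range(k):
            members = np.flatnonzero(labels == g)
            if members.size == 0:
                continue  # keep the old medoid for an empty group
            # Update step: the member with the smallest total distance
            # to the other members becomes the new medoid.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[g] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels
```

Because a medoid is always one of the agents, unlike a k-means centroid, it can serve directly as the group's guide in such a setup.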

Authors: Zhang Chen; Jiang Wenying; Chen Siyuan; Zhou Wen; Yan Fengting (School of Computer and Information, Anhui Normal University, Wuhu 241000, China; School of Electronic and Electrical Engineering, Shanghai University of Engineering and Technology, Shanghai 201620, China)

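Affiliations: School of Computer and Information, Anhui Normal University; School of Electronic and Electrical Engineering, Shanghai University of Engineering and Technology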

Source: Journal of Image and Graphics (中国图象图形学报), indexed in CSCD and the Peking University Core list; 2023, No. 7, pp. 2167-2181 (15 pages)

Funding: National Natural Science Foundation of China (61902003).

Keywords: virtual reality; fire drill; multi-agent; deep reinforcement learning; grouping strategy

Classification: TP391.9 [Automation and Computer Technology: Computer Application Technology]


