Deep Reinforcement Learning Algorithm of Multi-agent Based on SAC (cited by: 10)
Abstract: In a multi-agent environment the dynamics change continually, and each agent's decisions affect the other agents, so single-agent deep reinforcement learning algorithms struggle to remain stable. To adapt to multi-agent environments, this paper uses the Centralized Training with Decentralized Execution (CTDE) framework to extend the single-agent deep reinforcement learning algorithm Soft Actor-Critic (SAC), introduces an agent communication mechanism, and constructs the Multi-Agent Soft Actor-Critic (MASAC) algorithm. In MASAC, agents share observation information and historical experience, which effectively reduces the impact of environmental instability on the algorithm. Finally, the performance of MASAC is analyzed experimentally on cooperative tasks and on mixed cooperative-competitive tasks. The results show that MASAC is more stable than SAC in multi-agent environments.
Authors: XIAO Shuo; HUANG Zhen-zhen; ZHANG Guo-peng; YANG Shu-song; JIANG Hai-feng; LI Tian-xu (Engineering Research Center of Mine Digitalization, Ministry of Education, Xuzhou, Jiangsu 221000, China; School of Computer Sciences and Technology, China University of Mining & Technology, Xuzhou, Jiangsu 221000, China; Operating Branch, Ningbo Rail Transit Group Co., LTD., Ningbo, Zhejiang 315000, China)
Source: Acta Electronica Sinica (《电子学报》), 2021, No. 9, pp. 1675-1681 (7 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: National Natural Science Foundation of China (No. 62071470, No. U1934219, No. 61971421); Xuzhou Science and Technology Plan Project (No. KC19011, No. KC20167).
Keywords: multi-agent environments; centralized training; decentralized execution; multi-agent deep reinforcement learning
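
The CTDE idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' MASAC implementation: the linear actor and critic, the sizes, and the names `Actor`, `CentralCritic`, and `soft_target` are all hypothetical, chosen only to show decentralized actors that see their own observation, a centralized critic that sees every agent's observation and action during training, and the entropy-regularized target that SAC-style critics regress toward.

```python
import math
import random

random.seed(0)
N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2  # hypothetical sizes, not from the paper


class Actor:
    """Decentralized policy: acts from its own local observation only."""

    def __init__(self, obs_dim, act_dim):
        self.w = [[random.gauss(0, 0.1) for _ in range(obs_dim)]
                  for _ in range(act_dim)]
        self.log_std = -0.5  # fixed here; a real SAC actor would learn it

    def act(self, obs):
        action, log_prob = [], 0.0
        for row in self.w:
            noise = random.gauss(0, 1)
            mean = sum(w * o for w, o in zip(row, obs))
            a = math.tanh(mean + math.exp(self.log_std) * noise)
            # Gaussian log-density plus the tanh change-of-variables term.
            log_prob += (-0.5 * noise ** 2 - self.log_std
                         - 0.5 * math.log(2 * math.pi)
                         - math.log(1.0 - a * a + 1e-6))
            action.append(a)
        return action, log_prob


class CentralCritic:
    """Centralized Q-function: sees every agent's observation and action."""

    def __init__(self, n_agents, obs_dim, act_dim):
        self.w = [random.gauss(0, 0.1)
                  for _ in range(n_agents * (obs_dim + act_dim))]

    def q(self, all_obs, all_acts):
        joint = [x for o in all_obs for x in o] + [x for a in all_acts for x in a]
        return sum(w * x for w, x in zip(self.w, joint))


def soft_target(reward, next_q, next_log_prob, alpha=0.2, gamma=0.99, done=0.0):
    """Soft Bellman target: r + gamma * (Q(s', a') - alpha * log pi(a'|o'))."""
    return reward + gamma * (1.0 - done) * (next_q - alpha * next_log_prob)


actors = [Actor(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]
critic = CentralCritic(N_AGENTS, OBS_DIM, ACT_DIM)

# Execution: each agent uses only its own observation.
next_obs = [[random.gauss(0, 1) for _ in range(OBS_DIM)] for _ in range(N_AGENTS)]
steps = [actor.act(o) for actor, o in zip(actors, next_obs)]
next_acts = [a for a, _ in steps]

# Training: the critic scores the joint observation-action pair (agent 0 shown).
target = soft_target(reward=1.0, next_q=critic.q(next_obs, next_acts),
                     next_log_prob=steps[0][1])
```

In this sketch the critic is needed only to compute training targets, so at execution time each agent can act without access to the other agents' observations, which is the property the CTDE framework is meant to provide.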