Abstract
Multiagent deep reinforcement learning (MA-DRL) has received increasingly wide attention. Most existing MA-DRL algorithms, however, are still inefficient when faced with the non-stationarity caused by agents constantly changing their behaviors in stochastic environments. This paper extends the weighted double estimator to multiagent domains and proposes an MA-DRL framework named Weighted Double Deep Q-Network (WDDQN). By leveraging the weighted double estimator and a deep neural network, WDDQN can not only reduce estimation bias effectively but also handle scenarios with raw visual inputs. To achieve efficient cooperation in multiagent domains, we introduce a lenient reward network and a scheduled replay strategy. Empirical results show that WDDQN outperforms an existing DRL algorithm (double DQN) and an MA-DRL algorithm (lenient Q-learning) in terms of average reward and convergence speed, and is more likely to converge to the Pareto-optimal Nash equilibrium in stochastic cooperative environments.
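To give a sense of the weighted double estimator mentioned above, the following is a minimal sketch of how a weighted double Q-learning target could be computed in a DQN-style setting with an online and a target network. It illustrates the general idea only and is not necessarily the paper's exact formulation; the weighting constant `c` and the function name are hypothetical.

```python
import numpy as np

def weighted_double_target(q_online, q_target, reward, gamma, c=1.0):
    """Sketch of a weighted double estimator target for one transition.

    q_online, q_target: next-state action values from the online and target
    networks (1-D arrays over actions). The weight beta interpolates between
    the single (max) estimator and the double estimator; `c` is an assumed
    tuning constant, not taken from the paper.
    """
    a_star = np.argmax(q_online)   # greedy action under the online network
    a_low = np.argmin(q_online)    # lowest-valued action, used to scale the weight
    gap = abs(q_target[a_star] - q_target[a_low])
    beta = gap / (c + gap)         # beta -> 1 when the estimates are well separated
    # Weighted combination of the single estimator (online network's own max)
    # and the double estimator (target network evaluated at the online
    # network's greedy action).
    value = beta * q_online[a_star] + (1.0 - beta) * q_target[a_star]
    return reward + gamma * value
```

Under this formulation, the target leans toward the optimistic single estimator when the target network clearly separates the greedy action from the worst one, and toward the more conservative double estimator otherwise, which is how the weighted double estimator trades off overestimation and underestimation bias.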
Funding
The work was supported by the National Natural Science Foundation of China under Grant Nos. 61702362, U1836214, and 61876119; the Special Program of Artificial Intelligence of the Tianjin Research Program of Application Foundation and Advanced Technology under Grant No. 16JCQNJC00100; the Special Program of Artificial Intelligence of the Tianjin Municipal Science and Technology Commission of China under Grant No. 56917ZXRGGX00150; the Science and Technology Program of Tianjin, China, under Grant Nos. 15PTCYSY00030 and 16ZXHLGX00170; and the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20181432.
Acknowledgments
We thank our industrial research partner Netease, Inc., especially the Fuxi AI Laboratory of the Leihuo Business Group, for their discussion and support with the experiments.