DQN Algorithm Based on Averaged Neural Network Parameters

Cited by: 2
Abstract: In deep reinforcement learning, exploring the environment efficiently is a hard problem. The Deep Q-Network (DQN) algorithm explores with an ε-greedy policy, where the value of ε and its decay schedule must be tuned by hand; poor tuning degrades performance. This exploration strategy is inefficient and cannot solve deep exploration problems. To address the low exploration efficiency of DQN's ε-greedy policy, this paper proposes a DQN algorithm based on averaged neural network parameters (Averaged Parameters DQN, AP-DQN). At the beginning of each episode, the algorithm averages several sets of online value-network parameters that the agent has previously learned, obtaining a perturbed set of network parameters; actions are then selected through the perturbed network, which improves the agent's exploration efficiency. Experimental results show that AP-DQN explores more efficiently than DQN on deep exploration problems and achieves a higher average reward per episode than DQN in five Atari game environments; its normalized score improves over DQN by at most 112.50% and at least 19.07%.
Authors: HUANG Zhi-yong; WU Hao-lin; WANG Zhuang; LI Hui (College of Computer Science, Sichuan University, Chengdu 610065, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2021, No. 4, pp. 223-228 (6 pages)
Fund: Joint Fund of the Ministry of Education (6141A02011607).
Keywords: deep reinforcement learning; deep Q-network; neural network parameters; deep exploration
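The abstract outlines the core mechanism of AP-DQN: at the start of each episode, several previously learned sets of online value-network parameters are averaged into a perturbed network, which then drives action selection in place of an ε-greedy rule. The sketch below illustrates that idea in PyTorch. It is a minimal reconstruction from the abstract alone, not the authors' implementation: names such as QNet and ParamBuffer, the snapshot window k=5, and greedy selection through the averaged network are all assumptions.

```python
# Minimal sketch of parameter-averaged action selection (AP-DQN idea),
# reconstructed from the abstract. QNet, ParamBuffer, and k=5 are
# illustrative choices, not details taken from the paper.
import copy
from collections import deque

import torch
import torch.nn as nn


class QNet(nn.Module):
    """Toy Q-network: state -> Q-values for each discrete action."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)


class ParamBuffer:
    """Keeps the k most recent online-network snapshots (assumption:
    a fixed window of past parameter sets is averaged)."""
    def __init__(self, k: int = 5):
        self.snapshots = deque(maxlen=k)

    def push(self, net: nn.Module):
        self.snapshots.append(copy.deepcopy(net.state_dict()))

    def averaged_state_dict(self):
        # Element-wise mean over the stored parameter sets.
        keys = self.snapshots[0].keys()
        return {
            key: torch.stack([s[key].float() for s in self.snapshots]).mean(0)
            for key in keys
        }


def perturbed_network(online: QNet, buf: ParamBuffer) -> QNet:
    """Build the behaviour network used for action selection this episode."""
    behaviour = copy.deepcopy(online)
    if buf.snapshots:
        behaviour.load_state_dict(buf.averaged_state_dict())
    return behaviour


# Per-episode usage: snapshot the online net, rebuild the averaged
# behaviour net, then act greedily through it (no epsilon schedule).
online = QNet(obs_dim=4, n_actions=2)
buf = ParamBuffer(k=5)
buf.push(online)
behaviour = perturbed_network(online, buf)
state = torch.zeros(1, 4)
action = behaviour(state).argmax(dim=1).item()
```

Under this reading, the exploration noise comes from disagreement among past parameter sets rather than from a hand-tuned ε schedule: early in training the averaged behaviour network perturbs action choices, and it converges toward the online network as successive snapshots agree.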