
Asynchronous Advantage Actor-Critic with Double Attention Mechanisms

Cited by: 4
Abstract: Deep reinforcement learning is currently one of the fastest-developing techniques in machine learning. When traditional deep reinforcement learning methods handle high-dimensional, large-state-space tasks, their enormous computational cost leads to excessively long training times. Although asynchronous deep reinforcement learning greatly shortens training time through asynchronous methods, it tends to ignore image regions and image features that carry more value. To address this problem, this paper proposes an asynchronous advantage actor-critic algorithm with double attention mechanisms. The new algorithm uses a feature attention mechanism and a visual attention mechanism to improve the traditional asynchronous deep reinforcement learning model. The feature attention mechanism assigns different weights to all the feature maps produced by the convolutional layers of a convolutional neural network, so that the agent focuses on important image features; meanwhile, the visual attention mechanism assigns weight parameters to different regions of the image, where a high-weight region indicates that its information is of great value to the agent's subsequent policy learning, helping the agent learn the optimal policy more efficiently. By introducing the double attention mechanism, the new algorithm encodes and represents the image from both a shallow and a deep perspective, helping the agent concentrate on important image regions and image features. Finally, experiments on several classic Atari 2600 games verify the effectiveness of the asynchronous advantage actor-critic algorithm with double attention mechanisms.

In recent years, deep reinforcement learning (DRL), which combines deep learning and reinforcement learning, has become a new research hotspot in artificial intelligence. Because DRL takes advantage of deep learning, it can take raw images as input, which extends the range of applications of reinforcement learning. Meanwhile, DRL retains the advantages of reinforcement learning in applications such as intelligent decision making and robotic control. However, traditional DRL methods such as the deep Q-network (DQN) or the double deep Q-network (DDQN) can hardly deal with complex, high-dimensional-state tasks in a short time. Researchers have proposed many methods to solve this problem, and asynchronous advantage actor-critic (A3C) is one of the most widely used. Traditional asynchronous deep reinforcement learning uses multi-threading to cut training time substantially. However, in high-dimensional, large-state-space tasks such as Atari 2600 games, valuable and important image areas and features are often ignored. The reason is that the agent's attention is spread over the entire input image and all of its features, without any emphasis on the more important ones. To handle this problem, we employ attention mechanisms to improve the performance of traditional asynchronous deep reinforcement learning models. In recent years, inspired by human vision, the attention mechanism has been used extensively in machine translation, image recognition and speech recognition, becoming one of the most noteworthy and intensively studied techniques in deep learning. Based on this, we put forward an asynchronous advantage actor-critic with double attention mechanisms (DAM-A3C). DAM-A3C has two main components: a visual attention mechanism (VAM) and a feature attention mechanism (FAM). First, the visual attention mechanism enables the agent to adaptively attend to image regions, especially those that increase the cumulative reward at each moment, which reduces the computational cost of training the network and ultimately accelerates learning of an approximately optimal policy. Second, through FAM, the asynchronous advantage actor-critic pays more attention to the features with greater value. In a convolutional neural network, different convolution kernels generate different feature maps when convolved with the image, and together these feature maps describe the image in terms of different features. Traditional training of a convolutional neural network treats every extracted feature equally, giving all features the same weight instead of different levels of focus according to their value. However, some image features, such as color, shape and spatial-relationship features, play a crucial role in describing an image. To alleviate this problem, FAM helps the agent concentrate on feature maps with rich value, which in turn helps the agent make correct decisions. To sum up, we introduce FAM into the VAM-A3C model and propose the DAM-A3C model. DAM-A3C utilizes the visual attention mechanism and the feature attention mechanism to let the agent concentrate on the important areas and important features of the image, which helps the network model recognize important information and key features of the image in a short time. We select several classic Atari 2600 games as experimental tasks to evaluate the performance of the new model. The experimental results show that the new model performs better than the traditional asynchronous advantage actor-critic algorithm on these tasks.
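The abstract describes two attention modules: a feature attention mechanism (FAM) that weights the feature maps produced by the convolutional layers, and a visual attention mechanism (VAM) that weights spatial regions of the image. The sketch below is a minimal PyTorch illustration of how such channel-wise and spatial gating could be wired into an Atari-style actor-critic network; the class name DAMA3CNet, the layer sizes, and the sigmoid/softmax gating functions are illustrative assumptions, not the architecture published in the paper.

```python
# Minimal sketch of an A3C-style network with feature (channel) attention and
# visual (spatial) attention, as described in the abstract. Layer sizes and
# gating functions are assumptions for illustration, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DAMA3CNet(nn.Module):
    def __init__(self, in_channels: int = 4, num_actions: int = 6):
        super().__init__()
        # Standard Atari-style convolutional encoder (assumed sizes).
        self.conv1 = nn.Conv2d(in_channels, 32, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)

        # FAM: one scalar weight per feature map, from pooled channel statistics.
        self.fam_fc = nn.Linear(64, 64)
        # VAM: one weight per spatial location, from a 1x1 convolution.
        self.vam_conv = nn.Conv2d(64, 1, kernel_size=1)

        self.fc = nn.Linear(64 * 7 * 7, 512)             # 7x7 assumes 84x84 input
        self.policy_head = nn.Linear(512, num_actions)   # actor (action logits)
        self.value_head = nn.Linear(512, 1)              # critic (state value)

    def forward(self, x: torch.Tensor):
        h = F.relu(self.conv1(x))
        h = F.relu(self.conv2(h))
        h = F.relu(self.conv3(h))                        # (B, 64, 7, 7)

        # FAM: re-weight each feature map by its learned importance.
        chan = h.mean(dim=(2, 3))                        # (B, 64) global average pool
        chan_w = torch.sigmoid(self.fam_fc(chan)).unsqueeze(-1).unsqueeze(-1)
        h = h * chan_w

        # VAM: re-weight each spatial region by its learned importance.
        spat = self.vam_conv(h)                          # (B, 1, 7, 7)
        spat_w = torch.softmax(spat.flatten(2), dim=-1).view_as(spat)
        h = h * spat_w

        h = F.relu(self.fc(h.flatten(1)))
        return self.policy_head(h), self.value_head(h)


# Example forward pass on a batch of two stacked 84x84 Atari frames.
net = DAMA3CNet(in_channels=4, num_actions=6)
obs = torch.zeros(2, 4, 84, 84)
logits, value = net(obs)          # logits: (2, 6), value: (2, 1)
```

In an A3C-style setup, each worker thread would hold a copy of such a network, compute policy and value losses from its rollouts, and asynchronously apply gradients to a shared global model; the attention weights are learned end to end together with the rest of the parameters.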
Authors: 凌兴宏 (LING Xing-Hong), 李杰 (LI Jie), 朱斐 (ZHU Fei), 刘全 (LIU Quan), 伏玉琛 (FU Yu-Chen)
Affiliations: School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006; Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, Jiangsu 215006; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000; School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI and CSCD, Peking University Core Journal list, 2020, Issue 1, pp. 93-106 (14 pages)
Funding: National Natural Science Foundation of China (61772355, 61303108, 61373094); Major Program of Natural Science Research of Jiangsu Higher Education Institutions (17KJA520004); Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04); Suzhou Applied Basic Research Program, Industrial Part (SYG201422); Suzhou Science and Technology Program for People's Livelihood (SS201736); Priority Academic Program Development of Jiangsu Higher Education Institutions
Keywords: attention mechanism; double attention mechanisms; actor-critic; asynchronous advantage actor-critic; asynchronous deep reinforcement learning
