Abstract
【Purposes】 Policy transfer in reinforcement learning is an effective way to reduce the training cost of deep reinforcement learning. Local policy transfer performs the transfer at a finer granularity, which matters both for improving the performance of the global policy and for flexibly combining local policies into a new global policy. A local policy transfer method for deep reinforcement learning is therefore proposed. 【Methods】 The method borrows the “high cohesion, low coupling” principle from software engineering. The neural network that carries the policy is partitioned so that different sub-networks carry different local policies, and local policies are then transferred by transferring the corresponding sub-networks. The method supports flexible replacement and combination of local policies, yielding a new global policy that performs better and adapts to new environments. The classical deep reinforcement learning algorithm DQN is chosen as the experimental algorithm, and its transfer ability and performance are compared before and after applying the proposed method. 【Findings】 The results show that, with the proposed method, DQN achieves local policy transfer while its performance also improves by about 27.5%.
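The abstract does not say how the network is partitioned or how sub-networks are wired together, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each sub-network reads its own slice of the state vector (high cohesion) and that sub-networks interact only through a shared output head (low coupling). The class names, the slice layout, and the local-policy names `avoid_obstacle` and `reach_goal` are all hypothetical, and no DQN training loop is included.

```python
import copy
import random


class SubNetwork:
    """One dense layer; stands in for a sub-network carrying a local policy."""

    def __init__(self, n_in, n_out, rng, relu=True):
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                        for _ in range(n_out)]
        self.biases = [0.0] * n_out
        self.relu = relu

    def forward(self, x):
        out = [sum(w * xi for w, xi in zip(row, x)) + b
               for row, b in zip(self.weights, self.biases)]
        return [max(0.0, v) for v in out] if self.relu else out


class ModularQNetwork:
    """Q-network split into named sub-networks joined by a linear head (assumed layout)."""

    def __init__(self, input_slices, hidden, n_actions, seed):
        rng = random.Random(seed)
        # name -> (start, stop): the part of the state each sub-network sees
        self.input_slices = input_slices
        self.subnets = {name: SubNetwork(stop - start, hidden, rng)
                        for name, (start, stop) in input_slices.items()}
        self.head = SubNetwork(hidden * len(input_slices), n_actions, rng,
                               relu=False)

    def q_values(self, state):
        features = []
        for name, (start, stop) in self.input_slices.items():
            features.extend(self.subnets[name].forward(state[start:stop]))
        return self.head.forward(features)


def transfer_local_policy(source, target, name):
    """Copy one sub-network from source to target; everything else is untouched."""
    target.subnets[name] = copy.deepcopy(source.subnets[name])


# Hypothetical layout: a 6-dimensional state shared by two local policies.
slices = {"avoid_obstacle": (0, 3), "reach_goal": (3, 6)}
source = ModularQNetwork(slices, hidden=4, n_actions=2, seed=0)
target = ModularQNetwork(slices, hidden=4, n_actions=2, seed=1)

untouched_before = copy.deepcopy(target.subnets["reach_goal"].weights)
transfer_local_policy(source, target, "avoid_obstacle")
q = target.q_values([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
```

Because each sub-network is addressed by name and has a fixed interface to the head, swapping one of them leaves the other local policies intact. In practice the head would likely need some fine-tuning after a swap, which this sketch leaves out.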
Authors
史腾飞
王莉
臧嵘
SHI Tengfei; WANG Li; ZANG Rong (North Automatic Control Technology Institute, Taiyuan 030006, China; College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Jinzhong 030600, China; Shanxi Securities Company Limited, Taiyuan 030032, China)
Source
《太原理工大学学报》
CAS
PKU Core Journal
2024, Issue 4, pp. 705-711 (7 pages)
Journal of Taiyuan University of Technology
Keywords
deep reinforcement learning
local policy transfer
DQN