Abstract
In a multi-node wireless sensor network under a dynamic jamming environment, traditional reinforcement learning struggles to converge as the state-action space grows. To overcome this problem, this paper proposes a fast anti-jamming algorithm based on transfer reinforcement learning, which combines multi-agent Q-learning with value function transfer. First, the multi-node communication anti-jamming problem is modeled as a Markov game. Then, a bisimulation relation is introduced to measure the similarity between different state-action pairs. Finally, a multi-agent Q-learning algorithm is used to learn the anti-jamming strategy, and after each step of the Q-value update, the value function is transferred according to the similarity between state-action pairs. Simulation results show that, in the online anti-jamming problem with slotted transmission, the anti-jamming performance of the proposed algorithm is significantly better than that of orthogonal frequency hopping and random frequency hopping, and that it reaches the same anti-jamming performance with far fewer iterations than the conventional Q-learning algorithm.
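The core update described in the abstract, ordinary Q-learning followed by a similarity-weighted transfer of the freshly updated value to related state-action pairs, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy state/action sizes, learning rate, discount factor, transfer threshold `tau`, and the random `sim` matrix (which in the paper would come from a bisimulation metric) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 8, 4      # toy sizes; the paper's spaces are far larger
ALPHA, GAMMA = 0.1, 0.9         # learning rate and discount factor (assumed values)

Q = np.zeros((N_STATES, N_ACTIONS))

# Hypothetical similarity matrix over (state, action) pairs. In the paper this
# would be derived from a bisimulation metric; here it is random for illustration.
n_pairs = N_STATES * N_ACTIONS
sim = rng.uniform(0.0, 1.0, size=(n_pairs, n_pairs))
np.fill_diagonal(sim, 0.0)      # no self-transfer

def update_with_transfer(s, a, r, s_next, tau=0.8):
    """One Q-learning step followed by similarity-weighted value transfer."""
    # Standard Q-learning update for the visited pair (s, a).
    td_target = r + GAMMA * Q[s_next].max()
    Q[s, a] += ALPHA * (td_target - Q[s, a])
    # Transfer the updated value to pairs whose similarity exceeds tau,
    # blending proportionally to the similarity weight.
    src = s * N_ACTIONS + a
    for dst in np.flatnonzero(sim[src] > tau):
        s2, a2 = divmod(dst, N_ACTIONS)
        w = sim[src, dst]
        Q[s2, a2] = (1.0 - w) * Q[s2, a2] + w * Q[s, a]

update_with_transfer(s=0, a=1, r=1.0, s_next=3)
```

Because each update also refreshes all sufficiently similar entries, information spreads through the Q-table faster than under plain Q-learning, which is the mechanism behind the reduced iteration count reported in the abstract.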
Authors
ZHOU Quan (周权); NIU Yingtao (牛英滔) (School of Communication Engineering, Army Engineering University of PLA, Nanjing 210000, China; The 63rd Research Institute, National University of Defense Technology, Nanjing 210000, China)
Source
Chinese Journal of Radio Science (《电波科学学报》), 2023, No. 5, pp. 816-824 (9 pages); indexed in CSCD and the Peking University Core Journals list.
Funding
National Natural Science Foundation of China (U19B2014); Technical Field Fund of the Foundation Strengthening Program (2019-JCJQ-JJ-212).
Keywords
wireless communication
anti-jamming communication
transfer learning
wireless sensor network (WSN)
multi-agent reinforcement learning