Abstract
To improve the anti-jamming performance of frequency hopping communication systems in electromagnetic environments with changing interference, an intelligent anti-jamming decision algorithm based on improved SARSA (state-action-reward-state-action) learning is proposed. Trial and error is the most important feature of reinforcement learning and affects the long-term total return of the algorithm, and the quality of trial and error is determined by how well the algorithm explores and exploits. An action selection strategy based on the upper confidence bound (UCB) and the idea of priority traversal are therefore applied to SARSA learning to balance the agent's exploration and exploitation of the state-action space. In addition, for electromagnetic environments in which multiple types of interference coexist, and for the adjustable parameters of the frequency hopping communication system, such as hopping rate, channel division interval and frequency hopping sequence, the corresponding system model, decision objective, state-action space and reward function are designed. The proposed algorithm outperforms the three comparison algorithms in all of the different interference environments considered, which shows that adding the UCB-based action selection strategy and the idea of priority traversal reconciles exploration and exploitation well, improves convergence speed and steady-state performance, and strengthens the adaptability of SARSA learning to the interference environment.
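The abstract names the two ingredients (UCB-based action selection and priority traversal inside SARSA) without giving formulas, so the following is only a minimal sketch of how they might be combined, not the authors' implementation. It assumes the standard UCB bonus c*sqrt(ln N(s)/N(s,a)) and reads "priority traversal" as "try every not-yet-visited action first"; both are assumptions. The names ucb_action, sarsa_ucb and toy_step, the four-channel toy environment and all parameter values are hypothetical.

```python
import math
from collections import defaultdict


def ucb_action(q, counts, state, actions, c=2.0):
    """Select an action by upper confidence bound (assumed standard form)."""
    # Assumed reading of "priority traversal": untried actions go first.
    for a in actions:
        if counts[(state, a)] == 0:
            return a
    # All actions tried at least once, so n_state >= 1 and the log is defined.
    n_state = sum(counts[(state, a)] for a in actions)
    return max(
        actions,
        key=lambda a: q[(state, a)]
        + c * math.sqrt(math.log(n_state) / counts[(state, a)]),
    )


def sarsa_ucb(step, actions, start_state, episodes=50, horizon=20,
              alpha=0.1, gamma=0.9, c=2.0):
    """On-policy SARSA whose behaviour policy is ucb_action."""
    q = defaultdict(float)     # Q(s, a) estimates, initialised to 0
    counts = defaultdict(int)  # N(s, a) visit counts
    for _ in range(episodes):
        state = start_state
        action = ucb_action(q, counts, state, actions, c)
        for _ in range(horizon):
            counts[(state, action)] += 1
            next_state, reward = step(state, action)
            next_action = ucb_action(q, counts, next_state, actions, c)
            # SARSA update: bootstrap on the action that will actually be taken.
            td_target = reward + gamma * q[(next_state, next_action)]
            q[(state, action)] += alpha * (td_target - q[(state, action)])
            state, action = next_state, next_action
    return q, counts


def toy_step(state, action):
    """Hypothetical one-state environment: only channel 2 is free of jamming."""
    reward = 1.0 if action == 2 else 0.0
    return state, reward


if __name__ == "__main__":
    channels = [0, 1, 2, 3]
    q, counts = sarsa_ucb(toy_step, channels, start_state=0)
    for ch in channels:
        print(ch, round(q[(0, ch)], 3), counts[(0, ch)])
```

In this sketch the exploration bonus shrinks as a state-action pair is visited more often, so the agent shifts from exploring channels to exploiting the one with the best estimated return. A real system would replace toy_step with the hopping-rate, channel-interval and hopping-sequence state-action space described in the abstract.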
Authors
陈一波
赵知劲
CHEN Yibo; ZHAO Zhijin (School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China)
Source
《现代电子技术》
2023, No. 1, pp. 31-35 (5 pages)
Modern Electronics Technique
Funding
National Natural Science Foundation of China (U19B2016).
Keywords
complex electromagnetic environment
frequency hopping system
anti-jamming
SARSA learning
UCB
priority traversal
state-action space
exploration and exploitation