Frequency-Hopping System Intelligent Anti-Jamming Decision Algorithm Based on SARSA Learning
Abstract: To improve the anti-jamming performance of frequency-hopping (FH) communication systems in electromagnetic environments where the jamming changes over time, an intelligent anti-jamming decision algorithm based on improved SARSA (state-action-reward-state-action) learning is proposed. Trial and error is the most important feature of reinforcement learning and affects the algorithm's long-term cumulative reward. How well trial and error works is determined by how the algorithm handles exploration and exploitation. This paper therefore applies an action selection strategy based on the upper confidence bound (UCB), together with the idea of priority traversal, to SARSA learning, so that the agent balances exploration and exploitation of the state-action space. In addition, for an electromagnetic environment in which multiple types of jamming coexist, and for the adjustable parameters of the FH system (hopping rate, channel spacing, and hopping sequence), the corresponding system model, decision objective, state-action space, and reward function are designed. In every jamming environment tested, the proposed algorithm outperforms all three comparison algorithms. This shows that introducing the UCB-based action selection strategy and priority traversal reconciles the conflict between exploration and exploitation, improves convergence speed and steady-state performance, and strengthens the adaptability of SARSA learning to the jamming environment.
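The abstract does not give the paper's exact UCB formula, state definition, or reward function. The following is therefore only a minimal sketch, under stated assumptions, of how a UCB action-selection rule and a "priority traversal" rule might be combined with a standard tabular SARSA update. Only the SARSA update itself is standard. Everything else is hypothetical and chosen for illustration: the three toy jamming states, the discrete levels of the three FH parameters, the reward function `toy_reward`, the random jammer model `toy_next_state`, and the reading of priority traversal as "try every untried action in a state before applying UCB".

```python
import math
import random
from collections import defaultdict
from itertools import product

# Hypothetical action space: (hop-rate level, channel-spacing level, hopping-sequence index).
# The paper tunes these three FH parameters; the concrete levels here are invented.
ACTIONS = list(product([0, 1, 2], [0, 1], [0, 1]))

# Hypothetical jamming states, e.g. 0 = tone, 1 = partial-band, 2 = follower jamming.
N_STATES = 3

# Hypothetical "best" parameter set for each jammer type in this toy model.
BEST_ACTION = {0: (0, 1, 0), 1: (1, 0, 1), 2: (2, 1, 1)}


def toy_reward(state, action):
    """Hypothetical reward: 1.0 if the action defeats the current jammer, else 0.1."""
    return 1.0 if action == BEST_ACTION[state] else 0.1


def toy_next_state(rng):
    """Hypothetical environment: the jammer type changes at random each step."""
    return rng.randrange(N_STATES)


class UcbSarsaAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, c=1.0):
        self.actions = actions
        self.alpha, self.gamma, self.c = alpha, gamma, c
        self.q = defaultdict(float)   # Q(s, a), initialised to 0
        self.n_sa = defaultdict(int)  # visit count N(s, a)
        self.n_s = defaultdict(int)   # visit count N(s)

    def select_action(self, state):
        # Assumed reading of "priority traversal": untried actions in this state go first.
        untried = [a for a in self.actions if self.n_sa[(state, a)] == 0]
        if untried:
            return untried[0]
        # UCB1-style score: Q value plus an exploration bonus that shrinks with visits.
        log_n = math.log(self.n_s[state])
        return max(
            self.actions,
            key=lambda a: self.q[(state, a)]
            + self.c * math.sqrt(log_n / self.n_sa[(state, a)]),
        )

    def record_visit(self, state, action):
        self.n_s[state] += 1
        self.n_sa[(state, action)] += 1

    def update(self, s, a, r, s_next, a_next):
        # On-policy SARSA target: bootstraps from the action actually chosen in s_next.
        td_target = r + self.gamma * self.q[(s_next, a_next)]
        self.q[(s, a)] += self.alpha * (td_target - self.q[(s, a)])


def train(steps=5000, seed=0):
    rng = random.Random(seed)
    agent = UcbSarsaAgent(ACTIONS)
    s = toy_next_state(rng)
    a = agent.select_action(s)
    agent.record_visit(s, a)
    for _ in range(steps):
        r = toy_reward(s, a)
        s_next = toy_next_state(rng)
        a_next = agent.select_action(s_next)
        agent.record_visit(s_next, a_next)
        agent.update(s, a, r, s_next, a_next)
        s, a = s_next, a_next
    return agent


def greedy_policy(agent):
    """Greedy FH parameter set per jamming state after training."""
    return {s: max(ACTIONS, key=lambda a: agent.q[(s, a)]) for s in range(N_STATES)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the greedy policy learned by `train()` should recover `BEST_ACTION` for each jamming state. Because the toy next state does not depend on the chosen action, the example is close to a contextual bandit; the paper's environment and reward design are presumably richer. The untried-first rule guarantees that every state-action pair gets at least one sample before the UCB bonus `sqrt(ln N(s) / N(s, a))` is evaluated, which also avoids a division by zero.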
Authors: CHEN Yibo (陈一波), ZHAO Zhijin (赵知劲) (School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China)
Source: Modern Electronics Technique (《现代电子技术》), 2023, No. 1, pp. 31-35 (5 pages)
Funding: National Natural Science Foundation of China (U19B2016)
Keywords: complex electromagnetic environment; frequency hopping system; anti-jamming; SARSA learning; UCB; priority traversal; state-action space; exploration and exploitation