Intelligent Anti-Jamming Decision Algorithm for Frequency Hopping Systems Based on SARSA Learning


Abstract: To improve the anti-jamming performance of frequency hopping communication systems in electromagnetic environments with changing interference, an intelligent anti-jamming decision algorithm based on improved SARSA (state-action-reward-state-action) learning is proposed. Trial and error is the most important feature of reinforcement learning and affects the algorithm's long-term total return, and the quality of trial and error is determined by how well the algorithm explores and exploits. The paper therefore applies an action selection strategy based on the upper confidence bound (UCB) and the idea of priority traversal to SARSA learning, in order to balance the agent's exploration and exploitation of the state-action space. In addition, for electromagnetic environments in which multiple types of interference coexist, and for the adjustable parameters of the frequency hopping communication system such as hopping rate, channel division interval, and frequency hopping sequence, the corresponding system model, decision objective, state-action space, and reward function are designed. In different interference environments, the proposed algorithm outperforms all three comparison algorithms, which indicates that adding the UCB-based action selection strategy and priority traversal reconciles exploration and exploitation well, improves convergence speed and steady-state performance, and strengthens the adaptability of SARSA learning to the interference environment.

Authors: CHEN Yibo; ZHAO Zhijin (School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China)

Affiliation: School of Communication Engineering, Hangzhou Dianzi University

Source: Modern Electronics Technique (《现代电子技术》), 2023, No. 1, pp. 31-35 (5 pages)

Funding: National Natural Science Foundation of China (U19B2016).

Keywords: complex electromagnetic environment; frequency hopping system; anti-jamming; SARSA learning; upper confidence bound (UCB); priority traversal; state-action space; exploration and exploitation

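Classification: TN914.41-34 [Electronics and Telecommunications: Communication and Information Systems]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]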
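
The abstract describes combining SARSA with UCB-based action selection and priority traversal, but this record does not give the paper's equations, state-action space, or reward function. The following is therefore only a minimal tabular sketch under stated assumptions, not the authors' algorithm: the bonus term `c * sqrt(ln t / N)` is the standard UCB1 form, "priority traversal" is read here as "try every untried action in a state first", and `toy_step` (a three-channel link with a sweeping jammer) together with all parameter values is hypothetical.

```python
import math
import random
from collections import defaultdict


def ucb_action(q, counts, state, actions, t, c=1.0):
    """Choose an action by UCB; untried actions are taken first.

    Taking untried actions first is an assumed reading of "priority traversal".
    """
    untried = [a for a in actions if counts[(state, a)] == 0]
    if untried:
        return untried[0]
    log_t = math.log(t)  # t >= 1, so log_t >= 0
    return max(
        actions,
        key=lambda a: q[(state, a)] + c * math.sqrt(log_t / counts[(state, a)]),
    )


def sarsa_ucb(step, states, actions, episodes=200, horizon=50,
              alpha=0.1, gamma=0.9, c=1.0, seed=0):
    """Tabular SARSA whose behaviour policy is UCB instead of epsilon-greedy."""
    rng = random.Random(seed)
    q = defaultdict(float)     # Q(s, a), initialised to 0
    counts = defaultdict(int)  # N(s, a), number of times a was executed in s
    t = 1                      # global step counter used in the UCB bonus
    for _ in range(episodes):
        s = rng.choice(states)
        a = ucb_action(q, counts, s, actions, t, c)
        for _ in range(horizon):
            counts[(s, a)] += 1
            t += 1
            r, s2 = step(s, a)
            a2 = ucb_action(q, counts, s2, actions, t, c)
            # On-policy SARSA update: bootstrap on the action actually chosen next.
            q[(s, a)] += alpha * (r + gamma * q[(s2, a2)] - q[(s, a)])
            s, a = s2, a2
    return q


def toy_step(state, action):
    """Hypothetical environment: the state is the channel the jammer occupies now;
    in the next slot the jammer moves to (state + 1) % 3. The action is the
    channel used for the next slot: reward +1 if it is clear, -1 if it is jammed."""
    jammed_next = (state + 1) % 3
    reward = -1.0 if action == jammed_next else 1.0
    return reward, jammed_next


if __name__ == "__main__":
    channels = [0, 1, 2]
    q = sarsa_ucb(toy_step, states=channels, actions=channels)
    for s in channels:
        best = max(channels, key=lambda a: q[(s, a)])
        print(f"jammer on channel {s}: transmit next on channel {best}")
```

In this sketch the exploration bonus shrinks as a state-action pair is visited more often, so exploration fades on its own without a hand-tuned epsilon schedule. A real system would replace `toy_step` with a model of the actual interference environment and the adjustable hopping parameters named in the abstract.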


