Abstract
A proximal policy optimization (PPO) method based on reinforcement learning is proposed to solve the spectrum handoff problem in cognitive radio networks (CRNs). First, the spectrum handoff problem is modeled as a Markov decision process, and a return function based on the user's quality of experience (QoE) is designed. Second, the model is trained to maximize the long-term return, which yields the optimal spectrum handoff policy. Finally, the proposed handoff method is verified by simulation and compared with other methods. The results indicate that the PPO-based spectrum handoff method achieves more efficient and more stable handoff: it increases the available data rate and the data delivery success rate of secondary (cognitive) users and shortens the data delivery time.
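For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate objective that PPO methods commonly maximize. The abstract does not say which PPO variant the authors use, so this is an illustration under stated assumptions, not the paper's implementation: the function name `ppo_clip_objective`, the clipping range `eps=0.2`, and the input values are all hypothetical.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    eps: clipping range (0.2 is an assumed, commonly used default).
    """
    total = 0.0
    for nlp, olp, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nlp - olp)                      # probability ratio
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))  # keep ratio near 1
        total += min(ratio * adv, clipped * adv)         # pessimistic bound
    return total / len(advantages)

# Hypothetical usage: a handoff action that became twice as likely and had
# a positive advantage contributes only up to the clipped ratio 1 + eps.
value = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])
```

In a spectrum handoff setting, the advantage would be estimated from the QoE-based return, but how the authors compute it is not given in the abstract.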
Authors
李淑丰
邵尉
谢然
于玉江
LI Shufeng; SHAO Wei; XIE Ran; YU Yujiang (Army Engineering University of PLA, Nanjing, Jiangsu 210000, China; Unit 31107 of PLA, Nanjing, Jiangsu 210000, China; Jiangsu Ecological Environment Monitoring Center, Nanjing, Jiangsu 210000, China)
Source
《通信技术》
2021, No. 8, pp. 1917-1924 (8 pages)
Communications Technology
Funding
Natural Science Foundation of Jiangsu Province (No. BK20160080).
Keywords
cognitive radio
spectrum handoff
reinforcement learning
proximal policy optimization (PPO)