Funding: National Natural Science Foundation of China (61973037); National 173 Program Project (2019-JCJQ-ZD-324).
Abstract: To address the low interference success rate of air defense missile radio fuzes caused by the uniform interference form of traditional fuze interference systems, an interference decision method based on the Q-learning algorithm is proposed. First, the missile-target distance is divided into multiple states to enlarge the state space. Second, a multidimensional action space whose search range varies with the missile-target distance is used to select parameters and to reduce the number of ineffective interference parameters; the interference effect is judged by detecting whether the fuze signal disappears. Finally, a weighted reward function computes the reward from the range state, the output power, and the parameter count of the interference form. The effectiveness of the proposed method in choosing the action-space parameter ranges and in designing the discrimination of the reward function is verified through offline experiments covering full-trajectory missile-target encounters, and the optimal interference form for each distance state is obtained. Compared with a single-form interference decision method, the proposed method effectively improves the interference success rate.
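As a rough illustration of how such a distance-state Q-learning loop can be organized, the sketch below uses a toy engagement model: the number of distance bins, the interference-form grid, the fuze-suppression rule, and the reward weights are all invented placeholders rather than values from the paper.

```python
# Minimal tabular Q-learning sketch for a distance-state interference decision.
# The environment model, reward weights, and parameter grid are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 10                      # distance bins between missile and target
ACTIONS = [(style, power) for style in range(4) for power in range(3)]  # toy interference forms

Q = np.zeros((N_STATES, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.9, 0.2

def fuze_suppressed(state, action):
    """Stand-in for detecting whether the fuze signal disappears."""
    style, power = ACTIONS[action]
    # Toy rule: closer ranges need higher power and a matching interference style.
    return power >= (state < 4) + 1 and style == state % 4 or rng.random() < 0.05

def reward(state, action):
    style, power = ACTIONS[action]
    success = fuze_suppressed(state, action)
    # Weighted reward: success dominates, minus penalties for power and parameter count.
    return 10.0 * success - 0.5 * power - 0.2 * (style + 1)

for episode in range(5000):
    s = N_STATES - 1                                  # engagement starts at the farthest bin
    while s >= 0:
        a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(np.argmax(Q[s]))
        r = reward(s, a)
        s_next = s - 1                                # missile closes on the target
        target = r if s_next < 0 else r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print("Greedy interference form per distance state:", np.argmax(Q, axis=1))
```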
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61971057).
Abstract: In this paper, we propose a Two-way Deep Reinforcement Learning (DRL)-based resource allocation algorithm that solves the resource allocation problem in an underlay cognitive downlink network. Secondary users (SUs) in the cognitive network are multiplexed by a new Power Domain Sparse Code Multiple Access (PD-SCMA) scheme, and the physical resources of the cognitive base station are virtualized into two types of slices: an enhanced mobile broadband (eMBB) slice and an ultra-reliable low-latency communication (URLLC) slice. We design a Double Deep Q Network (DDQN) to output the optimal codebook assignment and simultaneously use a Deep Deterministic Policy Gradient (DDPG) network to output the optimal power allocation. The objective is to jointly optimize the spectral efficiency of the system and the Quality of Service (QoS) of the SUs. Simulation results show that the proposed algorithm outperforms the CNDDQN algorithm and a modified JEERA algorithm in terms of spectral efficiency and QoS satisfaction. Additionally, compared with Power Domain Non-orthogonal Multiple Access (PD-NOMA) slices and Sparse Code Multiple Access (SCMA) slices, the PD-SCMA slices dramatically enhance spectral efficiency and increase the number of accessible users.
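A structural sketch of the hybrid decision step is given below: a DDQN-style value network ranks discrete codebook assignments while a DDPG-style actor produces continuous power levels. The network sizes, the state encoding, and the numbers of SUs and codebooks are assumptions for illustration, not the paper's architecture.

```python
# Discrete branch (codebook assignment) plus continuous branch (power allocation).
# Dimensions, architectures, and the state encoding are illustrative placeholders.
import torch
import torch.nn as nn

N_SU, N_CODEBOOKS, STATE_DIM = 4, 8, 16

class CodebookDQN(nn.Module):            # DDQN-style value network
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_SU * N_CODEBOOKS))
    def forward(self, s):
        return self.net(s).view(-1, N_SU, N_CODEBOOKS)   # Q-value per (SU, codebook)

class PowerActor(nn.Module):             # DDPG-style actor
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_SU), nn.Sigmoid())
    def forward(self, s):
        return self.net(s)               # normalized transmit power in [0, 1] per SU

dqn, actor = CodebookDQN(), PowerActor()
state = torch.randn(1, STATE_DIM)        # placeholder channel/QoS observation

with torch.no_grad():
    codebooks = dqn(state).argmax(dim=-1)    # greedy codebook index for each SU
    powers = actor(state)                    # power fraction for each SU
print("codebook assignment:", codebooks.tolist(), "powers:", powers.squeeze(0).tolist())
```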
Funding: Supported in part by the National Science Fund for Distinguished Young Scholars Project under Grant No. 60725105, the National Basic Research Program of China (973 Program) under Grant No. 2009CB320404, the National Natural Science Foundation of China under Grant No. 61072068, and the Fundamental Research Funds for the Central Universities under Grant No. JY10000901031.
Abstract: A novel centralized approach for Dynamic Spectrum Allocation (DSA) in the Cognitive Radio (CR) network is presented in this paper. Instead of giving the solution through formulas that model the network environment, such as linear programming or convex optimization, the new approach learns the environment performance iteratively online with a Reinforcement Learning (RL) algorithm, after observing the variability and uncertainty of heterogeneous wireless networks. Appropriate spectrum-access decisions are then obtained by employing a Fuzzy Inference System (FIS), which ensures that the strategy can explore the possible states and exploit the accumulated experience sufficiently. The new approach effectively considers multiple objectives, such as spectrum efficiency and fairness between CR Access Points (APs). By interacting with the environment and accumulating comprehensive advantages, it can achieve the largest expected long-term reward on the desired objectives and implement the best action. Moreover, the algorithm is relatively simple and does not require complex calculations. Simulation results show that the proposed approach outperforms a fixed frequency planning scheme and a general dynamic spectrum allocation policy.
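To make the RL-plus-FIS idea concrete, the toy sketch below blends a learned value estimate with a two-rule fuzzy desirability score when ranking candidate access actions; the membership functions, rule base, and blending weight are invented for illustration and are not taken from the paper.

```python
# Combining a learned per-action value with a small fuzzy inference step when ranking
# candidate channel-assignment actions. All shapes and rules are synthetic.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

def fuzzy_desirability(spectrum_eff, fairness):
    # Rule 1: IF efficiency is high AND fairness is high THEN desirability is high.
    high = min(tri(spectrum_eff, 0.5, 1.0, 1.5), tri(fairness, 0.5, 1.0, 1.5))
    # Rule 2: IF efficiency is low OR fairness is low THEN desirability is low.
    low = max(tri(spectrum_eff, -0.5, 0.0, 0.5), tri(fairness, -0.5, 0.0, 0.5))
    # Weighted-average defuzzification over the two rule outputs.
    return (high * 1.0 + low * 0.1) / (high + low + 1e-9)

rng = np.random.default_rng(1)
q_values = rng.random(5)                 # learned long-term values per candidate action
eff = rng.random(5)                      # predicted spectrum efficiency per action
fair = rng.random(5)                     # predicted fairness between CR APs per action

scores = [0.7 * q + 0.3 * fuzzy_desirability(e, f) for q, e, f in zip(q_values, eff, fair)]
print("chosen access action:", int(np.argmax(scores)))
```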
Abstract: Although modulation classification based on deep neural networks can achieve high Modulation Classification (MC) accuracy, catastrophic forgetting occurs when the neural network model continues to learn new tasks. In this paper, we simulate a dynamic wireless communication environment and focus on breaking the learning paradigm of isolated automatic MC by developing an algorithm for continual automatic MC. Firstly, a memory storing representative modulation signals from old tasks is built and used to constrain the gradient update direction of new tasks during continual learning, so that the loss on old tasks also keeps decreasing. Secondly, to better simulate the dynamic wireless communication environment, we employ a mini-batch gradient algorithm, which is better suited to continual learning. Finally, the signals in the memory can be replayed to further reinforce the characteristics of the old-task signals in the model. Simulation results verify the effectiveness of the method.
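The gradient-constraint step can be sketched generically as a projection, in the spirit of memory-based methods such as GEM: if the new-task gradient conflicts with the gradient computed on replayed old-task signals, the conflicting component is removed. The vectors below are placeholders; the paper's exact constraint may differ.

```python
# Generic gradient-projection sketch for memory-constrained continual learning.
import numpy as np

def constrain_update(g_new, g_old):
    """Project g_new onto the half-space where it does not conflict with g_old."""
    dot = float(g_new @ g_old)
    if dot >= 0.0:                       # no conflict: use the new-task gradient as is
        return g_new
    return g_new - (dot / (g_old @ g_old)) * g_old   # remove the conflicting component

rng = np.random.default_rng(0)
g_new = rng.normal(size=8)               # gradient from the new modulation-class task
g_old = rng.normal(size=8)               # gradient from signals replayed from memory

g = constrain_update(g_new, g_old)
print("conflict before:", float(g_new @ g_old) < 0, "after:", float(g @ g_old) < 0)
```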
Funding: National Natural Science Foundation of China (Grant Nos. 61871241 and 61771263); Science and Technology Program of Nantong (Grant No. JC2019117).
Abstract: Cognitive emergency communication networks can meet the requirements of large capacity, high density, and low delay in emergency communications. This paper analyzes the properties of emergency users in cognitive emergency communication networks, formulates a multi-objective optimization problem, and proposes a novel multi-objective bacterial foraging optimization algorithm based on effective area (MOBFO-EA) to maximize the transmission rate while maximizing the lifecycle of the network. In the algorithm, the effective area is introduced to prevent the algorithm from falling into a local optimum, and the diversity and uniformity of the Pareto-optimal solutions distributed in the effective area are used to evaluate the optimization algorithm. Dynamic preservation is then used to enhance the competitiveness of excellent individuals and the uniformity and diversity of the Pareto-optimal solutions in the effective area. Finally, an adaptive step size, adaptive moving direction, and inertial weight are used to shorten the search time of the bacteria and accelerate convergence. The simulation results show that the proposed MOBFO-EA algorithm improves the efficiency of the Pareto-optimal solutions by approximately 55% compared with the MOPSO algorithm and by approximately 60% compared with the MOBFO algorithm, and it has the fastest and smoothest convergence.
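One building block of any such multi-objective search is identifying the non-dominated individuals; the helper below filters a synthetic population against the two maximization objectives (transmission rate and network lifetime). It is a generic Pareto filter, not the MOBFO-EA update itself.

```python
# Generic Pareto-front filter for a two-objective maximization problem.
import numpy as np

def pareto_front(points):
    """Return indices of non-dominated points when all objectives are maximized."""
    keep = []
    for i, p in enumerate(points):
        dominated = any(np.all(q >= p) and np.any(q > p)
                        for j, q in enumerate(points) if j != i)
        if not dominated:
            keep.append(i)
    return keep

rng = np.random.default_rng(3)
population = rng.random((20, 2))          # column 0: transmission rate, column 1: lifetime
print("non-dominated individuals:", pareto_front(population))
```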
Funding: National Natural Science Foundation of China under Grant No. 62061039; Postgraduate Innovation Project of Ningxia University (No. JIP20210076); Key Project of the Ningxia Natural Science Foundation (No. 2020AAC02006).
Abstract: With the development of 5G, future wireless communication networks are becoming more and more intelligent. Facing new communication service demands such as super-heterogeneous networks, multiple communication scenarios, large numbers of antenna elements, and large bandwidths, new theories and technologies of intelligent communication have been widely studied, among which Deep Learning (DL) is a powerful artificial intelligence (AI) technique that can be trained to continuously learn and update the optimal parameters. This paper reviews the latest research progress of DL in intelligent communication and emphatically introduces five scenarios: Cognitive Radio (CR), Edge Computing (EC), Channel Measurement (CM), End-to-End Encoder/Decoder (EED), and Visible Light Communication (VLC). The prospects and challenges of future research and development are also discussed.
Funding: Supported by the Fundamental Research Funds for the Central Universities under Grant 3102018QD096, in part by the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2019JQ-075 and Grant 2019JQ-253, and in part by the National Natural Science Foundation of China under Grant 61901379, Grant 61901327, Grant 61825104, and Grant 61631015.
Abstract: In order to improve the energy efficiency (EE) of underlay cognitive radio (CR) networks, a power allocation strategy based on actor-critic reinforcement learning is proposed, in which a cluster of cognitive users (CUs) can simultaneously access the same primary spectrum band under the interference constraints of the primary user (PU) by employing the non-orthogonal multiple access (NOMA) technique. In the proposed scheme, the power allocation is formulated as a non-convex optimization problem. The power allocation for the different CUs is determined by the actor-critic reinforcement learning model, in which the weighted data rate is set as the reward function and the generated action strategy (i.e., the power allocation) is iteratively criticized and updated. Both the CUs' spectral efficiency and the PU's interference constraints are considered in the training of the actor-critic model. Furthermore, a first-order Taylor approximation, together with other manipulations, is adopted to solve the power allocation optimization problem under conventional channel conditions. According to the simulation results, our scheme achieves a higher spectral efficiency for the CUs than a benchmark scheme without a learning process and the existing Q-learning based method, while the resultant interference to the PU transmission is kept within a given tolerated limit.
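A schematic single update of an actor-critic power allocator with a weighted-rate reward and an interference penalty is sketched below. Because the synthetic reward here is differentiable in the power vector, the actor ascends it directly while the critic simply learns to predict the achieved reward; channel gains, weights, and the penalty coefficient are invented placeholders, not the paper's model.

```python
# One schematic actor-critic update for CU power allocation under a PU interference cap.
import torch
import torch.nn as nn

N_CU, STATE_DIM = 3, 8
actor = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, N_CU), nn.Sigmoid())
critic = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

def reward(p, g_cu, g_pu, i_max, w):
    rate = (w * torch.log2(1.0 + g_cu * p)).sum()           # weighted data rate of the CUs
    interference = (g_pu * p).sum()                         # aggregate interference at the PU
    return rate - 10.0 * torch.relu(interference - i_max)   # penalize constraint violation

state = torch.randn(STATE_DIM)                              # placeholder channel observation
g_cu, g_pu = torch.rand(N_CU), 0.1 * torch.rand(N_CU)       # placeholder channel gains
w, i_max = torch.ones(N_CU), torch.tensor(0.05)             # rate weights, interference cap

p = actor(state)                       # power fractions in [0, 1] for the CUs
r = reward(p, g_cu, g_pu, i_max, w)
v = critic(state).squeeze()
actor_loss = -r                        # toy reward is differentiable, so ascend it directly
critic_loss = (r.detach() - v) ** 2    # critic learns to predict the achieved reward
opt.zero_grad()
(actor_loss + critic_loss).backward()
opt.step()
print("power allocation after one update:", actor(state).tolist())
```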
Funding: Supported by the National Natural Science Foundation of China (No. 61372109).
Abstract: In this paper, we consider a cognitive radio (CR) system with a single secondary user (SU) and multiple licensed channels. The SU requests a fixed number of licensed channels and must sense them one by one before transmission. By leveraging prediction based on the correlation between the licensed channels, we propose a novel spectrum sensing strategy that decides which channel is the best choice to sense, in order to reduce the sensing time overhead and further improve the SU's achievable throughput. Since the correlation coefficients between the licensed channels cannot be known exactly in advance, the spectrum sensing strategy is designed based on model-free reinforcement learning (RL). The experimental results show that the proposed spectrum sensing strategy converges and outperforms a random sensing strategy in terms of long-term statistics.
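A minimal tabular version of learning a sensing order might look like the sketch below, where the state is the last sensed channel and its occupancy, and a hidden common factor correlates the channels so that the learned policy can exploit it. The occupancy model, rewards, and channel count are assumptions for illustration.

```python
# Toy Q-learning sketch for choosing which licensed channel to sense next.
import numpy as np

rng = np.random.default_rng(0)
N_CH = 4
Q = np.zeros((N_CH, 2, N_CH))             # state: (last channel, busy/idle) -> next channel
alpha, gamma, eps = 0.1, 0.8, 0.1

def draw_occupancy():
    common = rng.random() < 0.5            # hidden factor correlating the channels
    return np.array([rng.random() < (0.8 if common else 0.2) for _ in range(N_CH)])

for episode in range(20000):
    busy = draw_occupancy()
    ch = rng.integers(N_CH)                # first channel is sensed blindly
    s = (ch, int(busy[ch]))
    for _ in range(N_CH - 1):              # sense the remaining channels one by one
        a = rng.integers(N_CH) if rng.random() < eps else int(np.argmax(Q[s]))
        r = 1.0 if not busy[a] else -0.1   # reward finding an idle channel, charge a sensing cost
        s_next = (a, int(busy[a]))
        Q[s[0], s[1], a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s[0], s[1], a])
        s = s_next

print("preferred next channel when channel 0 is busy:", int(np.argmax(Q[0, 1])))
```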
Abstract: The adoption of the Fifth Generation (5G) and beyond-5G networks is driving the demand for learning approaches that enable users to co-exist harmoniously in a multi-user distributed environment. Although resource-constrained, the Cognitive Radio (CR) has been identified as a key enabler of distributed 5G and beyond networks due to its cognitive abilities and its ability to access idle spectrum opportunistically. Reinforcement learning is well suited to the learning demands of 5G and beyond networks because it does not require the learning agent to have prior information about the environment in which it operates. Intuitively, CRs should therefore implement reinforcement learning to gain opportunistic spectrum access efficiently and to co-exist with each other. However, while the application of reinforcement learning is straightforward in a single-agent environment, it becomes complex and resource-intensive in a multi-agent, multi-objective learning environment. In this paper, (1) we present a brief history and overview of reinforcement learning and its limitations; (2) we review recent multi-agent learning methods and multi-agent learning algorithms applied in Cognitive Radio (CR) networks; and (3) we present a novel framework for multi-CR reinforcement learning and conclude with a synopsis of future research directions and recommendations.
Abstract: To guarantee the communication quality of service of secondary users in a cognitive radio network while reducing the power loss caused by unreasonable transmit power, a non-cooperative multi-user dynamic power control method based on SumTree sampling combined with the Double Deep Q Network (Double DQN) is proposed. With this method, secondary users continuously interact with an auxiliary base station and, through continual learning in a dynamically changing environment, learn to complete the power control task at a lower transmit power. In addition, the method decouples the selection of the target Q-value action from the computation of the target Q value, which effectively reduces overestimation and the algorithm's loss. Furthermore, to account for the differing importance of experience samples, a SumTree sampling method combining prioritization with random sampling is adopted, which guarantees both priority-driven transitions and a non-zero sampling probability for the lowest-priority samples. Simulation results show that, after convergence, the average loss of the algorithm stabilizes below 0.04, convergence is faster by at least 10 training episodes, the upper bound on the total throughput of the secondary users and the success rate of their power control are improved, and the average power consumption of the secondary users is reduced by more than 0.5 mW.
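The SumTree structure mentioned in the abstract above can be sketched as a binary tree whose internal nodes store priority sums, so that sampling a uniform value in [0, total priority] selects high-priority transitions more often while leaving every stored transition a non-zero chance. The capacity and priorities below are illustrative.

```python
# Minimal SumTree for prioritized experience sampling.
import numpy as np

class SumTree:
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)     # internal nodes store priority sums
        self.data = [None] * capacity
        self.write = 0

    def add(self, priority, transition):
        idx = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                            # propagate the change up to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def sample(self, value):
        idx = 0
        while 2 * idx + 1 < len(self.tree):        # descend until a leaf is reached
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return self.data[idx - self.capacity + 1]

tree = SumTree(capacity=8)
for i in range(8):
    tree.add(priority=i + 1.0, transition=f"transition-{i}")

rng = np.random.default_rng(0)
print([tree.sample(rng.uniform(0, tree.tree[0])) for _ in range(5)])
```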
Abstract: To address the energy efficiency of energy-harvesting cognitive Machine-to-Machine (M2M) communication, this paper proposes an energy-efficiency optimization algorithm that guarantees Quality of Service (QoS). With the goal of maximizing the energy efficiency of the users in the network, and with transmit power control, time-slot allocation, transmission mode selection, relay selection, and the energy state of each device jointly considered as constraints, the optimization problem is modeled as a mixed-integer nonlinear program. The energy-efficiency optimization problem is then transformed into a Discrete-time and Finite-state Markov Decision Process (DFMDP), and an algorithm based on deep reinforcement learning is proposed to find the optimal policy. Simulation results show that the proposed algorithm outperforms the other schemes in terms of average energy efficiency, with a convergence speed within an acceptable range.
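A schematic deep-Q decision step for such a discrete-time, finite-state MDP is sketched below: the state gathers quantities like battery level, channel quality, and queue backlog, and each action index encodes one joint choice of power level, slot, mode, and relay. All dimensions and the placeholder reward are assumptions, not the paper's formulation.

```python
# Schematic DQN step for a discrete-time, finite-state MDP over joint M2M decisions.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 6, 24           # e.g. 3 power levels x 2 slots x 2 modes x 2 relays
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.95

def select_action(state, eps=0.1):
    if torch.rand(1).item() < eps:
        return torch.randint(N_ACTIONS, (1,)).item()
    with torch.no_grad():
        return int(q_net(state).argmax())

# One TD(0) update on a fabricated transition (s, a, r, s'); in the real formulation r
# would be the achieved energy efficiency subject to QoS and battery constraints.
s, s_next = torch.randn(STATE_DIM), torch.randn(STATE_DIM)
a = select_action(s)
r = torch.tensor(1.3)                   # placeholder energy-efficiency reward

target = r + gamma * q_net(s_next).detach().max()
loss = (q_net(s)[a] - target) ** 2
opt.zero_grad()
loss.backward()
opt.step()
print("updated Q-values:", q_net(s).tolist())
```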