Fund: the Liaoning Province Applied Basic Research Program (2023JH2/101600038).
Abstract: In the face of the increasingly severe Botnet problem on the Internet, how to effectively detect Botnet traffic in real time has become a critical problem. Although the existing deep Q-network (DQN) algorithm in deep reinforcement learning can solve the problem of real-time updating, its predicted Q-values are consistently higher than the actual values. In Botnet traffic detection, DQN performs well on the training set, where its prediction accuracy is high; on the test set, however, its accuracy declines, and it cannot adjust its prediction strategy in time as new data samples arrive. On previously unseen datasets, its accuracy declines even more significantly. Therefore, this paper proposes a Botnet traffic detection system based on a double-layer DQN (DDQN). Two Q-values are designed to adjust the model in policy and action, respectively, achieving real-time model updates and improving the generality and robustness of the model across different datasets. Experiments show that, compared with the DQN model, the DDQN does not overestimate the Q-value, and the detection model improves the accuracy and precision of Botnet traffic detection. Moreover, on Botnet datasets other than the test set, the accuracy and precision of the DDQN model remain higher than those of DQN.
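As a hedged illustration of the decoupling this abstract describes (one Q-value selecting the action, another evaluating it), the following minimal PyTorch sketch computes a double-DQN-style target; the network and tensor names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a double-DQN target: the online network selects the
# greedy next action, the target network evaluates it, which damps the
# Q-value overestimation that plain DQN suffers from. Illustrative only.
import torch
import torch.nn as nn

def ddqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q

# Toy usage with hypothetical 10-feature traffic states and 2 actions
# (benign / botnet); the dimensions are assumptions.
online_net = nn.Linear(10, 2)
target_net = nn.Linear(10, 2)
y = ddqn_target(online_net, target_net,
                rewards=torch.zeros(4), next_states=torch.randn(4, 10),
                dones=torch.zeros(4))
```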
Abstract: In recent years, deep reinforcement learning, as an efficient and reliable machine learning method, has been widely applied to traffic signal control. However, existing signal timing methods usually ignore the priority passage of special vehicles (e.g., ambulances and fire engines); in addition, signal timing methods based on conventional deep reinforcement learning optimize a single objective, which leads to poor performance in complex traffic scenarios. To address these problems, a dual-mode multi-objective signal timing method based on Double DQN (DMDD) that incorporates priority passage for special vehicles is proposed to improve intersection throughput in different traffic scenarios. The method first selects the signal control mode according to the saturation state of the intersection; in the emergency control mode, special vehicles are given a higher passage weight, which helps them pass through the intersection faster. Neural networks are then designed to compute rewards for three indicators: waiting time, queue length, and CO2 emissions. Finally, Double DQN is used to select the optimal signal phase, and throughput is improved by flexibly switching signal phases. Experimental results based on SUMO show that, compared with the baseline methods, DMDD effectively reduces the waiting time, queue length, and CO2 emissions of special vehicles at intersections; special vehicles pass through intersections faster, effectively improving traffic efficiency.
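A loose sketch of the dual-mode, multi-objective reward idea described above: the paper computes the three rewards with separate neural networks, whereas the weighted sum and emergency-mode rule below are simplifying assumptions for illustration only.

```python
# Hedged sketch: combine waiting time, queue length and CO2 emissions into
# one reward, and raise the waiting-time weight in the emergency control
# mode so special vehicles clear the intersection faster. The weights are
# illustrative assumptions, not the paper's learned reward networks.
def dmdd_style_reward(wait_s, queue_len, co2_g, emergency_mode):
    w_wait = 5.0 if emergency_mode else 1.0   # priority for special vehicles
    return -(w_wait * wait_s + 0.5 * queue_len + 0.1 * co2_g)

print(dmdd_style_reward(wait_s=12.0, queue_len=8, co2_g=40.0,
                        emergency_mode=True))   # -> -68.0
```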
Fund: supported by the Research and Development of Key Technologies of the Regional Energy Internet based on Multi-Energy Complementary and Collaborative Optimization (BE2020081).
Abstract: Multi-energy microgrids (MEMG) play an important role in promoting carbon neutrality and achieving sustainable development. This study investigates an effective energy management strategy (EMS) for MEMG. First, an energy management system model that allows for intra-microgrid energy conversion is developed, and the corresponding Markov decision process (MDP) problem is formulated. Subsequently, an improved double deep Q-network (iDDQN) algorithm is proposed to enhance the exploration ability by modifying the calculation of the Q-value, and prioritized experience replay (PER) is introduced into the iDDQN to improve the training speed and effectiveness. Finally, taking advantage of the federated learning (FL) and iDDQN algorithms, a federated iDDQN is proposed to design an MEMG energy management strategy that enables each microgrid to share its experience, in the form of local neural network (NN) parameters, with the federation layer, thus ensuring the privacy and security of data. The simulation results validate the superior performance of the proposed energy management strategy in minimizing the economic costs of the MEMG while reducing CO2 emissions and protecting data privacy.
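As a hedged sketch of the federation step described above (microgrids share local NN parameters rather than raw data), a FedAvg-style average of state_dicts is one plausible aggregation; the plain unweighted mean is an assumption, not the paper's exact rule.

```python
# Sketch of the federation layer: average matching parameter tensors from
# each microgrid's local Q-network, so only weights, never raw load/price
# data, leave the microgrid. Unweighted averaging is an assumption.
import torch
import torch.nn as nn

def federated_average(local_state_dicts):
    return {k: torch.stack([sd[k].float() for sd in local_state_dicts]).mean(0)
            for k in local_state_dicts[0]}

# Toy usage with three hypothetical local Q-networks of identical shape.
locals_ = [nn.Linear(8, 4).state_dict() for _ in range(3)]
agent = nn.Linear(8, 4)
agent.load_state_dict(federated_average(locals_))
```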
Abstract: To address the difficulty of predicting enemy maneuver strategies and the low win rate in unmanned aerial vehicle (UAV) air combat, which stem from the complex environmental information and strong adversarial nature of air combat, a guided Minimax-DDQN (Minimax-Double Deep Q-Network) algorithm is designed. First, a guided policy exploration mechanism is proposed on the basis of the Minimax decision method; then, combined with the guided Minimax policy, a DDQN (Double Deep Q-Network) algorithm is designed with the aim of improving the update efficiency of the Q-network; finally, a progressive three-stage network training method is proposed, in which adversarial training between different decision models yields a more optimized decision model. Experimental results show that, compared with algorithms such as Minimax-DQN and Minimax-DDQN, the proposed algorithm improves the success rate of pursuing a straight-line target by 14% to 60%, and its win rate against the DDQN algorithm is no lower than 60%. Thus, compared with DDQN, Minimax-DDQN, and similar algorithms, the proposed algorithm has stronger decision-making ability and better adaptability in highly adversarial combat environments.
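A minimal numeric sketch of the Minimax decision rule underlying this family of algorithms: assume a Q-network scores every (own maneuver, enemy maneuver) pair, take the worst case over enemy actions, then maximize over our own. The matrix form is an illustrative assumption.

```python
# Minimax action selection over a joint Q table: rows are our maneuvers,
# columns are enemy maneuvers. We assume the enemy minimises our value,
# then pick the maneuver with the best worst case. Illustrative only.
import numpy as np

def minimax_action(q_matrix):
    worst_case = q_matrix.min(axis=1)        # enemy's best reply per row
    return int(worst_case.argmax()), float(worst_case.max())

q = np.array([[0.4, -0.2, 0.1],
              [0.9, -0.8, 0.3],
              [0.2,  0.1, 0.0]])
print(minimax_action(q))                     # -> (2, 0.0)
```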
Fund: supported by the National Natural Science Foundation of China (No. 52175236), the Research and Development and Demonstration Application of Heavy-Duty Diesel Vehicle Exhaust Emission Testing Technology, China (No. 24-8-cspz-18-nsh), and the Qingdao Civi Science and Technology Plan, China (No. 19-6-1-88-nsh).
Abstract: In this paper, a dual deep Q-network (DDQN) energy management model based on long short-term memory (LSTM) neural network speed prediction is proposed under the model predictive control (MPC) framework. The initial learning rate and neuron dropout probability of the LSTM speed prediction model are optimized by a genetic algorithm (GA). The prediction results show that the root-mean-square error of the GA-LSTM speed prediction method is smaller than that of the SVR method over different speed prediction horizons. The predicted demand power, the state of charge (SOC), and the demand power at the current moment are used as the state input of the agent, and real-time control is realized by the MPC method. The simulation results show that the proposed control strategy reduces the equivalent fuel consumption by 0.0354 kg compared with DDQN, 0.8439 kg compared with ECMS, and 0.742 kg compared with the power-following control strategy. The gap between the proposed control strategy and the dynamic programming control strategy is only 0.0048 kg (0.193%), while the SOC of the power battery remains stable. Finally, hardware-in-the-loop simulation verifies that the proposed control strategy has good real-time performance.
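A hedged PyTorch sketch of the speed predictor described above: an LSTM whose initial learning rate and dropout probability would be chosen by the GA; the layer sizes and one-step-ahead output head are illustrative assumptions, not the paper's exact model.

```python
# Sketch of the LSTM speed predictor: the GA would search over the initial
# learning rate and the dropout probability passed in below.
import torch
import torch.nn as nn

class SpeedLSTM(nn.Module):
    def __init__(self, hidden=64, dropout=0.2):    # dropout: GA-tuned
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.head = nn.Linear(hidden, 1)           # next-step speed

    def forward(self, x):                          # x: (batch, horizon, 1)
        out, _ = self.lstm(x)
        return self.head(self.drop(out[:, -1, :]))

model = SpeedLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr: GA-tuned
pred = model(torch.randn(16, 10, 1))                 # 10-step history -> (16, 1)
```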
Abstract: For years, methods combining deep reinforcement learning with intelligent transportation systems have achieved notable results in traffic signal control. However, deep reinforcement learning alone cannot compensate for the deficiencies of convolutional neural networks in feature extraction, which affects the agent's overall policy output. To address this feature extraction problem, a deep reinforcement learning model based on an attention mechanism is proposed for traffic signal control on top of the double deep Q-network (double DQN) model. The squeeze-and-excitation networks (SENet) attention mechanism is added to a three-dimensional convolutional neural network, modeling the interdependencies between feature map channels to enhance the representational quality of the convolutional network and output the optimal traffic signal control action. Experimental results show that the algorithm achieves good traffic signal control performance with notable stability.
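A minimal sketch of the squeeze-and-excitation block added to the 3-D convolutional network described above; the reduction ratio and tensor layout follow the standard SENet design and are assumptions rather than the paper's exact configuration.

```python
# Squeeze-and-excitation over 3-D feature maps: global-average "squeeze"
# across D, H, W, a two-layer "excitation" MLP, then channel-wise rescaling
# to model inter-channel dependencies. Reduction ratio 16 is an assumption.
import torch
import torch.nn as nn

class SEBlock3D(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                        # x: (N, C, D, H, W)
        w = self.fc(x.mean(dim=(2, 3, 4)))       # per-channel statistics
        return x * w.view(x.size(0), -1, 1, 1, 1)

se = SEBlock3D(channels=32)
out = se(torch.randn(2, 32, 4, 20, 20))          # same shape, reweighted
```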
Fund: supported by the Aeronautical Science Foundation (2017ZC53033).
Abstract: Unmanned aerial vehicle (UAV) swarm technology is one of the research hotspots of recent years. With the continuous improvement of the autonomous intelligence of UAVs, swarm technology will become one of the main trends in UAV development. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q-network (DDQN) algorithm. We design a guided reward function to effectively solve the convergence problem caused by sparse returns in deep reinforcement learning (DRL) for long-horizon tasks. We also propose the concept of a temporary storage area, which optimizes the experience replay unit of the traditional DDQN algorithm, improves the convergence speed of the algorithm, and speeds up training. Unlike traditional task environments, this paper establishes a continuous state-space task environment model to improve the verification process of the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm in different task scenarios are trained. The experimental results validate that the DDQN algorithm is efficient in training the UAV swarm to complete the given collaborative tasks while meeting the swarm's requirements for centralization and autonomy and improving the intelligence of collaborative task execution. The simulation results show that, after training, the UAV swarm can carry out the rendezvous task well, with a mission success rate of 90%.
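A hedged sketch of the "temporary storage area" idea: stage an episode's transitions separately and commit them to the main replay unit at episode end. Back-filling the sparse terminal reward over the trajectory is an assumption about the mechanism, not the paper's exact design.

```python
# Sketch: a per-episode staging list in front of the main replay buffer.
# Transitions are committed only at episode end, when the sparse terminal
# reward is known and can be spread over the trajectory.
from collections import deque
import random

class StagedReplay:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # main experience replay unit
        self.staging = []                      # temporary storage area

    def stage(self, s, a, r, s2, done):
        self.staging.append((s, a, r, s2, done))

    def flush(self, terminal_bonus=0.0):
        # Commit the staged trajectory with the guided terminal bonus applied.
        for (s, a, r, s2, done) in self.staging:
            self.buffer.append((s, a, r + terminal_bonus, s2, done))
        self.staging.clear()

    def sample(self, k):
        return random.sample(self.buffer, k)
```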