Abstract: To optimize regional traffic signal timing plans and improve regional traffic efficiency, this paper proposes a regional traffic signal coordination control method based on improved multi-agent Nash Q-learning. First, a discretized encoding method is adopted, partitioning the road into cells to convert continuous state information into a discrete form. Second, a Long Short-Term Memory (LSTM) module is integrated into the algorithm to mine more hidden information from the state data and enrich the state data in the Q-value table. Finally, simulation tests based on the microscopic traffic simulator SUMO (Simulation of Urban Mobility) show that, compared with the original Nash Q-learning traffic signal control method, the proposed method reduces the average vehicle waiting time by 11.5%, 16.2%, and 10.0% under low, medium, and high traffic flows, respectively, the average queue length by 9.1%, 8.2%, and 7.6%, and the average number of stops by 18.3%, 16.1%, and 10.0%. The results demonstrate that the algorithm achieves better control performance.
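For orientation, a minimal Python sketch of the tabular core such a method builds on: lane occupancy is discretized into cells and each intersection agent updates a Q table. The cell length, phase count, and reward interface are illustrative assumptions, not the paper's implementation; a full Nash Q-learning agent would further replace the max over its own actions with the value of a Nash equilibrium over the joint actions of neighboring intersections.

CELL_LEN = 7.5   # assumed cell length in meters for discretizing lane occupancy
N_PHASES = 4     # assumed number of signal phases per intersection

def discretize_state(vehicle_positions, lane_len):
    """Map continuous vehicle positions on a lane to a binary cell-occupancy tuple."""
    n_cells = int(lane_len // CELL_LEN)
    cells = [0] * n_cells
    for pos in vehicle_positions:
        cells[min(int(pos // CELL_LEN), n_cells - 1)] = 1
    return tuple(cells)

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on a dict-backed Q table."""
    best_next = max(Q.get((s_next, b), 0.0) for b in range(N_PHASES))
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)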
Funding: supported by the National Natural Science Foundation of China (60874040).
Abstract: The problem of passive detection discussed in this paper involves searching for and locating an aerial emitter by dual aircraft using passive radars. To improve the detection probability and accuracy, a fuzzy Q-learning algorithm for dual-aircraft flight path planning is proposed. The passive detection task model of the dual aircraft is set up based on the partition of the target active radar's radiation area. The problem is formulated as a Markov decision process (MDP) by using fuzzy theory to generalize the state space and by properly defining the transition functions, action space, and reward function. Details of the path planning algorithm are presented. Simulation results indicate that the algorithm can provide adaptive strategies for the dual aircraft to control their flight paths to detect a non-maneuvering or maneuvering target.
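As a hedged illustration of the fuzzy generalization the abstract describes: continuous observations are mapped to membership degrees over a small number of fuzzy sets, and the temporal-difference update is shared across the activated sets in proportion to their memberships. The triangular membership shape and variable layout below are assumptions for illustration, not taken from the paper.

import numpy as np

def tri_membership(x, centers, width):
    """Normalized triangular membership degrees of scalar x over fuzzy-set centers."""
    m = np.maximum(0.0, 1.0 - np.abs(x - np.asarray(centers)) / width)
    return m / (m.sum() + 1e-12)

def fuzzy_q_update(Q, mu_s, a, r, mu_s_next, alpha=0.1, gamma=0.95):
    """Q has shape (n_fuzzy_states, n_actions); mu_* are membership vectors.
    The TD error uses membership-weighted Q values and is credited back
    to each fuzzy state in proportion to its activation."""
    q_sa = mu_s @ Q[:, a]              # interpolated Q(s, a)
    v_next = (mu_s_next @ Q).max()     # greedy value of the next (fuzzy) state
    td = r + gamma * v_next - q_sa
    Q[:, a] += alpha * td * mu_s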
Abstract: For many years, researchers have explored model-driven power allocation (PA) algorithms in wireless networks with multi-user interference. Nowadays, data-driven machine learning methods have become quite popular for analyzing wireless communication systems, among which deep reinforcement learning (DRL) plays a significant role in solving optimization problems under constraints. To this end, this paper investigates the PA problem in a k-user multiple access channel (MAC), where k transmitters (e.g., mobile users) each aim to send an independent message to a common receiver (e.g., a base station) over wireless channels. We first train a deep Q network (DQN) with a deep Q-learning (DQL) algorithm in a simulation environment, using offline learning. The DQN is then used with real data in online training for the PA problem, maximizing the sum rate subject to source power constraints. Simulation results indicate that the proposed DQN method achieves a better sum rate than conventional model-driven approaches such as fractional programming (FP) and weighted minimum mean squared error (WMMSE). Additionally, experiments with different user densities show that the proposed DQN outperforms the benchmark algorithms, verifying its good generalization ability over wireless multi-user communication systems.
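The objective such a DQN is trained to maximize is, in one common formulation (assuming interference is treated as noise at the common receiver; the paper's exact model may differ):

$$\max_{p_1,\dots,p_k}\ \sum_{i=1}^{k} \log_2\!\left(1+\frac{p_i |h_i|^2}{\sigma^2+\sum_{j\neq i} p_j |h_j|^2}\right)\quad \text{s.t.}\quad 0 \le p_i \le P_{\max},$$

where $p_i$ and $h_i$ are the transmit power and channel gain of user $i$, $\sigma^2$ is the receiver noise power, and $P_{\max}$ is the power budget; the DQN's reward is then the achieved sum rate.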
Abstract: Pesticides have become indispensable in modern agricultural production. However, they have unforeseeable long-term impacts on human wellbeing as well as on the ecosystem. Owing to a lack of basic awareness of pesticide exposure, farmers often apply pesticides very close to harvest. Pesticide residues in food, particularly in fruits and vegetables, are a significant concern for farmers, merchants, and especially consumers. Although measured residual concentrations are usually far below the maximum allowable limits, a few exceed the restrictions for such pesticides in food, so warnings about this level of pesticide use in farming are required. Previous technologies failed to forecast the large number of pesticides dangerous to people, necessitating improved detection and early warning systems. This paper presents a novel methodology for verifying the status and evaluating the level of pesticides in regularly consumed vegetables and fruits, named the Hybrid Chronic Multi-Residual Framework (HCMF), in which the harmful level of applied pesticide residues in agro-products is predicted using a Q-learning-based recurrent neural network, and the predicted contamination levels are analyzed using Complex Event Processing (CEP) over the given spatial and sequential data. The analysis results are used to minimize pesticide use in the agricultural field while ensuring the safety of farmers and consumers. The technique is implemented in a Python environment, and the results show that the proposed model achieves 98.57% accuracy with a training loss of 0.30.
Abstract: Cloud computing (CC) networks are distributed and dynamic, as signals appear, disappear, or lose significance. Machine learning techniques (MLTs) are trained on datasets that are sometimes inadequate, in terms of samples, for inferring information. DevMLOps (Development Machine Learning Operations), a dynamic strategy for automatic selection and tuning of MLTs, yields significant performance differences, but the scheme has many disadvantages, including the need for continuous training, more samples, long training times for feature selection, and increased classification execution times. Recursive feature elimination (RFE) is computationally very expensive, as it traverses each feature without considering the correlations between them. This problem can be overcome by wrappers, which select better features by accounting for both the test and train datasets. The aim of this paper is to use DevQLMLOps for automated tuning and selection based on orchestration and messaging between containers. The proposed Adaptive Kernel Firefly Algorithm (AKFA) selects features for cloud network monitoring (CNM) operations. The AKFA methodology is demonstrated on the Cloud Network Security Dataset (CNSD), with satisfactory results on the performance metrics used: precision, recall, F-measure, and accuracy.
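The kernel adaptation in AKFA is specific to the paper, but the standard firefly move it presumably builds on is well documented: a candidate solution $x_i$ is attracted toward a brighter (better-scoring) one $x_j$ by

$$x_i \leftarrow x_i + \beta_0 e^{-\gamma r_{ij}^2}(x_j - x_i) + \alpha\,\epsilon_i,$$

where $r_{ij}=\lVert x_i-x_j\rVert$, $\beta_0$ is the base attractiveness, $\gamma$ the light-absorption coefficient, and $\epsilon_i$ a random perturbation. For feature selection, each component of $x_i$ is typically thresholded to decide whether the corresponding feature is kept; this reading is an assumption, not the paper's stated design.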
Abstract: This paper studies robot soccer action selection based on Q-learning. The robots learn to activate particular behaviors given their current situation and reward signal. We adopt neural networks to implement Q-learning, for their generalization properties and limited computer memory requirements.
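A minimal sketch of neural-network Q-learning of the kind the abstract describes, with an assumed one-hidden-layer network and a semi-gradient TD update; the layer sizes and input features are illustrative, not the paper's architecture.

import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID, N_ACT = 8, 16, 5              # assumed sensor features and behavior count
W1 = rng.normal(0.0, 0.1, (N_HID, N_IN))
W2 = rng.normal(0.0, 0.1, (N_ACT, N_HID))

def q_values(s):
    """Forward pass: Q estimates for all behaviors, plus the hidden activation."""
    h = np.tanh(W1 @ s)
    return W2 @ h, h

def td_step(s, a, r, s_next, alpha=0.01, gamma=0.9):
    """Semi-gradient Q-learning: propagate the TD error through action a only."""
    global W1, W2
    q, h = q_values(s)
    q_next, _ = q_values(s_next)
    td = r + gamma * q_next.max() - q[a]
    grad_W1 = np.outer(W2[a] * (1.0 - h**2), s)    # chain rule through tanh
    W2[a] += alpha * td * h
    W1 += alpha * td * grad_W1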
Abstract: Increasingly frequent bird activity poses a great threat to the safe operation of transmission lines, and existing sound-imitating bird-repelling devices, lacking intelligence, cannot repel birds effectively over the long term. To solve this problem, this paper proposes a sound-imitating bird-repelling strategy based on an improved Q-learning algorithm. First, to evaluate the repelling effect of each audio clip, fuzzy theory is applied to quantify the behavior of birds after hearing an audio clip into different bird reaction types. Then, single-audio repelling experiments are designed to collect statistics on the repelling effect of each audio clip and obtain its initial weight, providing an experimental basis for the device's audio selection. To make the computed audio weights better match the actual experimental conditions, the weight calculation formula of the CRITIC (Criteria Importance Through Intercriteria Correlation) method is optimized. Finally, the experimentally obtained audio weights are used to improve the Q-learning algorithm, and comparison experiments against other sound-imitating bird-repelling strategies are designed. The experimental data show that the improved Q-learning strategy outperforms the other three strategies, converges quickly, repels birds stably, and reduces bird habituation.
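For reference, the unmodified CRITIC weighting that the paper optimizes can be sketched as follows; the data layout (rows as audio clips, columns as quantified reaction indicators) is an assumption for illustration.

import numpy as np

def critic_weights(X):
    """Standard CRITIC objective weights: contrast intensity (std) times
    the summed conflict (1 - correlation) with the other criteria."""
    Xn = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)   # min-max normalize
    sigma = Xn.std(0, ddof=1)                             # contrast intensity
    R = np.corrcoef(Xn, rowvar=False)                     # criteria correlations
    C = sigma * (1.0 - R).sum(0)                          # information content
    return C / C.sum()

The resulting weights could then seed the initial Q values over audio clips, which is one natural reading of how the experimental weights improve the Q-learning algorithm.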
Funding: supported by the National Natural Science Foundation of China (61070143, 61173088).
Abstract: For multi-agent reinforcement learning in Markov games, knowledge extraction and sharing are key research problems. State list extracting means calculating the optimal shared state path from state trajectories that contain cycles. A state list extracting algorithm checks the cyclic state lists of the current state in the state trajectory, condensing the optimal action set of the current state. By reinforcing the selected optimal action, the action policy for cyclic states is optimized gradually. The extracted state lists are learned repeatedly and used as experience knowledge shared within teams, and agents speed up their rate of convergence through experience sharing. Predator-prey competition games are used for the experiments. The experimental results show that the proposed algorithms overcome the lack of experience in the initial stage, speed up learning, and improve performance.
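One way to read the cyclic-state check is loop erasure over a recorded trajectory: whenever a state reappears, the loop between its two occurrences is cut, condensing the trajectory toward a shareable shortest path. The sketch below is a hedged reconstruction of that idea, not the paper's full algorithm.

def erase_cycles(trajectory):
    """Condense a state trajectory by erasing cycles as they are detected."""
    path, seen = [], {}
    for s in trajectory:
        if s in seen:                                  # state revisited: cut the loop
            path = path[: seen[s] + 1]
            seen = {t: i for i, t in enumerate(path)}
        else:
            seen[s] = len(path)
            path.append(s)
    return path

# Example: erase_cycles(["A", "B", "C", "B", "D"]) returns ["A", "B", "D"]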
Funding: supported by the National Natural Science Foundation of China under Grant No. 61100005.
Abstract: Conducting hydrodynamic and physical motion simulation tests with a large-scale self-propelled model under actual wave conditions is an important means of researching the environmental adaptability of ships. During the navigation test of the self-propelled model, the complex environment, including the various port facilities, navigation facilities, and nearby ships, must be considered carefully, because in such a dense environment the impact of sea waves and winds on the model is particularly significant. To improve the safety of the self-propelled model, this paper introduces reinforcement-learning-based Q-learning combined with chaotic ideas for the model's collision avoidance, improving the reliability of local path planning. Simulation and sea-test results show that the algorithm is a good solution for collision avoidance of the self-navigating model under the interference of sea winds and waves, with good adaptability.
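"Combined with chaotic ideas" most commonly means driving exploration with a chaotic sequence such as the logistic map rather than uniform random numbers; the sketch below assumes that reading and is not the paper's stated design.

class ChaoticEpsilonGreedy:
    """Epsilon-greedy action selection whose exploration is gated and indexed
    by a logistic-map chaotic sequence (an assumed design, for illustration)."""
    def __init__(self, n_actions, epsilon=0.2, x0=0.7):
        self.n_actions, self.epsilon, self.x = n_actions, epsilon, x0

    def select(self, q_row):
        self.x = 4.0 * self.x * (1.0 - self.x)   # logistic map, chaotic for most x0 in (0, 1)
        if self.x < self.epsilon:                # chaotic variable triggers exploration
            return int(self.x / self.epsilon * self.n_actions) % self.n_actions
        return max(range(self.n_actions), key=lambda a: q_row[a])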
Abstract: Aim: To find a more efficient learning method based on temporal difference learning for delayed reinforcement learning tasks. Methods: A Q-learning algorithm based on truncated TD(λ), with adaptive schemes for selecting the λ value, addressed to absorbing Markov decision processes, was presented and implemented on computers. Results and Conclusion: Simulations on shortest-path search problems show that using adaptive λ in Q-learning based on TTD(λ) can speed up its convergence.
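Since the abstract does not reproduce the update, one common form of the truncated TD(λ) return for Q-learning (following Cichosz's TTD procedure) is given here as a hedged reconstruction. The m-step truncated λ-return is

$$z_t^{\lambda,m} = (1-\lambda)\sum_{n=1}^{m-1}\lambda^{n-1} z_t^{(n)} + \lambda^{m-1} z_t^{(m)},\qquad z_t^{(n)} = \sum_{k=0}^{n-1}\gamma^{k} r_{t+k} + \gamma^{n}\max_{a} Q(s_{t+n},a),$$

and the table is updated by $Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\,[\,z_t^{\lambda,m} - Q(s_t,a_t)\,]$. The adaptive scheme the abstract mentions then varies $\lambda$ during learning.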