Abstract: The major environmental hazard in this pandemic is the unhygienic disposal of medical waste. If medical waste is not properly managed, it becomes a hazard to the environment and to humans. Managing medical waste is a major issue for cities and municipalities in terms of both the environment and logistics. An efficient supply chain with edge computing technology is used to manage medical waste. The supply chain operations include waste collection, transportation, and disposal. Many research works have been devoted to improving waste management. The main issues with existing techniques are that they are ineffective, expensive, and rely on centralized edge computing, which fails to provide security, trustworthiness, and transparency. To overcome these issues, this paper implements an efficient Naive Bayes classifier and Q-Learning algorithm in decentralized edge computing together with a binary bat optimization algorithm (NBQ-BBOA). The proposed work is used to track, detect, and manage medical waste. The Q-Learning algorithm is used to minimize the cost of transferring medical waste between the various nodes. The accuracy obtained is 88% for the Naive Bayes algorithm, 82% for the Q-Learning algorithm, and 98% for NBQ-BBOA. The Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) of the proposed NBQ-BBOA are 0.012 and 0.045, respectively.
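The abstract does not spell out how the Q-Learning step is formulated, so the following is only a minimal tabular Q-learning sketch of the idea of learning a minimum-cost transfer route between waste nodes; the node graph, costs, and hyper-parameters are illustrative assumptions and not the paper's NBQ-BBOA implementation.

```python
import random

# Hypothetical transfer costs between waste nodes (0 = depot, 3 = disposal site);
# the numbers are illustrative only, not the paper's cost model.
COST = {
    (0, 1): 4.0, (0, 2): 7.0,
    (1, 2): 2.0, (1, 3): 9.0,
    (2, 3): 3.0,
}
NODES = [0, 1, 2, 3]
GOAL = 3                            # disposal site
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2   # learning rate, discount, exploration rate

def neighbors(node):
    return [b for (a, b) in COST if a == node]

# Q[state][action]: learned value of moving from `state` to neighbor `action`.
Q = {n: {m: 0.0 for m in neighbors(n)} for n in NODES if neighbors(n)}

for episode in range(2000):
    state = 0
    while state != GOAL:
        actions = list(Q[state])
        if random.random() < EPS:
            action = random.choice(actions)
        else:
            action = max(actions, key=Q[state].get)
        reward = -COST[(state, action)]            # reward = negative transfer cost
        future = 0.0 if action == GOAL else max(Q[action].values())
        Q[state][action] += ALPHA * (reward + GAMMA * future - Q[state][action])
        state = action

# Greedy route after training: from the depot, always follow the highest-Q action.
state, route = 0, [0]
while state != GOAL:
    state = max(Q[state], key=Q[state].get)
    route.append(state)
print("learned collection-to-disposal route:", route)
```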
Funding: Supported in part by the National Natural Science Foundation of China under Grant 61720106003, Grant 61401165, Grant 61379006, Grant 61671144, and Grant 61701538; in part by the Natural Science Foundation of Fujian Province under Grant 2015J01262; in part by the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University under Grant ZQN-PY407; in part by the Science and Technology Innovation Teams of Henan Province for Colleges and Universities (17IRTSTHN014); in part by the Scientific and Technological Key Project of Henan Province under Grant 172102210080 and Grant 182102210449; and in part by the Collaborative Innovation Center for Aviation Economy Development of Henan Province.
Abstract: In this paper, a novel opportunistic scheduling (OS) scheme with antenna selection (AS) is proposed for an energy harvesting (EH) cooperative communication system in which the relay can harvest energy from the source transmission. In this scheme, we consider both traditional mathematical analysis and reinforcement learning (RL) scenarios under the power splitting (PS) factor constraint. For the traditional mathematical analysis with a fixed PS factor, we derive exact closed-form expressions for the ergodic capacity and outage probability in the general signal-to-noise ratio (SNR) regime. We then combine the optimal PS factor with these performance metrics to achieve the optimal transmission performance. Subsequently, based on the optimized PS factor, an RL technique, the Q-learning (QL) algorithm, is used to derive the optimal antenna selection strategy. To highlight the performance advantage of the proposed QL scheme trained on the received SNR at the destination, we also examine a QL scheme trained on the channel between the relay and the destination. The results illustrate that the optimized scheme is always superior to the fixed-PS-factor scheme. In addition, with a suitable system parameter setting, the QL scheme significantly outperforms the traditional mathematical analysis scheme.
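For context, a power-splitting EH relay of the kind described is conventionally modeled as below; these are the standard PS-SWIPT relations written under assumed notation ($\rho$ the PS factor, $\eta$ the energy-conversion efficiency, $P_s$ the source power, $h_{sr}$ the source-relay channel, $T$ the block time), not necessarily the exact expressions derived in the paper.

```latex
% Conventional power-splitting relaying relations (assumed model, not the paper's exact equations).
% A fraction rho of the received power is harvested; the remainder carries information.
\begin{align}
E_h      &= \eta\,\rho\,P_s\,|h_{sr}|^2\,\frac{T}{2}, \\
P_r      &= \frac{E_h}{T/2} = \eta\,\rho\,P_s\,|h_{sr}|^2, \\
\gamma_r &= \frac{(1-\rho)\,P_s\,|h_{sr}|^2}{\sigma^2}
\end{align}
```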
Abstract: To address the dimension disaster, the poor model generalization ability, and the deadlock problem in special obstacle environments caused by the growth of state information during local path planning for mobile robots, this paper proposes a Double BP Q-learning algorithm that fuses the Double Q-learning algorithm with a BP neural network. To solve the dimension disaster problem, two BP neural networks with the same structure are used to fit the value functions and replace the two Q-value tables in the Double Q-learning algorithm, overcoming the inability of a Q-value table to store excessive state information. Adding prioritized experience replay and using parameter transfer to initialize the model in different environments accelerates convergence and improves learning efficiency and model generalization. By designing a specific action selection strategy for special environments, the deadlock state can be avoided and the mobile robot can reach the target point. Finally, the designed Double BP Q-learning algorithm was simulated and verified, and the probability of the mobile robot reaching the target point during parameter updating was compared with that of the Double Q-learning algorithm for the same planned path length. The results show that the model trained by the improved Double BP Q-learning algorithm has a higher success rate in finding the optimal or sub-optimal path in dense discrete environments, stronger model generalization, and fewer redundant path segments, and that it reaches the target point without entering the deadlock zone in special obstacle environments.
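As background, the tabular Double Q-learning update that the two BP networks approximate is the standard rule below (with the roles of $Q^{A}$ and $Q^{B}$ swapped with probability 1/2 at each step); the paper replaces the two tables with two identically structured BP networks.

```latex
% Standard (tabular) Double Q-learning update for estimator A; B is updated symmetrically.
Q^{A}(s,a) \leftarrow Q^{A}(s,a)
  + \alpha \Big[ r + \gamma\, Q^{B}\!\big(s',\, \arg\max_{a'} Q^{A}(s',a')\big) - Q^{A}(s,a) \Big]
```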
Abstract: Reinforcement learning is an excellent approach used in artificial intelligence, automatic control, and other fields. However, ordinary reinforcement learning algorithms, such as Q-learning with a lookup table, cannot cope with extremely complex and dynamic environments because of the huge state space. To reduce the state space, a modular neural network Q-learning algorithm is proposed, which combines the Q-learning algorithm with neural networks and a modular method. Feedforward, Elman, and radial basis function neural networks are each employed to construct such an algorithm. The results reveal that the Elman neural network Q-learning algorithm performs best when the same training method, i.e., the gradient-descent error back-propagation algorithm, is applied.
Abstract: To address security problems in wireless sensor networks, a Q-learning based trust management mechanism for clustered wireless sensor networks (QLTMM-CWSN) is proposed. The mechanism mainly considers three aspects of trust: communication trust, data trust, and energy trust. During network operation, node trust values are updated with the Q-Learning algorithm according to each node's communication behavior, data distribution, and energy consumption, and the node with the highest trust value in a cluster is selected as the trusted cluster head. When the trust value of the primary cluster head falls below a threshold, the trusted cluster head replaces it in managing the member nodes of the cluster and maintains normal data transmission. The results show that QLTMM-CWSN can effectively resist communication attacks, forged-local-data attacks, energy attacks, and hybrid attacks.
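The abstract gives the mechanism only at a high level, so the snippet below merely sketches one plausible reading: a composite trust observation from communication, data, and energy behavior, a Q-learning-style exponential update of each node's trust, and replacement of a low-trust primary cluster head. The weights, threshold, and observation format are assumptions, not QLTMM-CWSN's actual parameters.

```python
from dataclasses import dataclass

# Illustrative weights for combining the three trust dimensions (assumed, not from the paper).
W_COMM, W_DATA, W_ENERGY = 0.4, 0.3, 0.3
ALPHA = 0.2               # learning rate of the Q-style update
TRUST_THRESHOLD = 0.5     # assumed replacement threshold for the primary cluster head

@dataclass
class Node:
    node_id: int
    trust: float = 0.5            # initial trust value
    is_primary_head: bool = False

def observed_trust(comm_ok_ratio, data_consistency, energy_plausibility):
    """Combine one round of communication, data, and energy observations."""
    return (W_COMM * comm_ok_ratio
            + W_DATA * data_consistency
            + W_ENERGY * energy_plausibility)

def update_trust(node, observation):
    """Q-learning-style exponential update: move trust toward the latest observation."""
    node.trust += ALPHA * (observation - node.trust)

def elect_trusted_head(members):
    """Pick the member with the highest trust value as the trusted cluster head."""
    return max(members, key=lambda n: n.trust)

def acting_head(cluster):
    """If the primary head's trust drops below the threshold, the trusted head takes over."""
    primary = next(n for n in cluster if n.is_primary_head)
    if primary.trust < TRUST_THRESHOLD:
        return elect_trusted_head([n for n in cluster if not n.is_primary_head])
    return primary

# Example round: the primary head (node 1) misbehaves, nodes 2 and 3 behave normally.
cluster = [Node(1, is_primary_head=True), Node(2), Node(3)]
update_trust(cluster[0], observed_trust(0.2, 0.1, 0.3))
update_trust(cluster[1], observed_trust(0.9, 0.95, 0.9))
update_trust(cluster[2], observed_trust(0.8, 0.9, 0.85))
print("acting cluster head:", acting_head(cluster).node_id)
```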
Funding: Partially supported by the Guangdong Basic and Applied Basic Research Foundation (2023A1515011531); the National Natural Science Foundation of China under Grant 62173356; the Science and Technology Development Fund (FDCT), Macao SAR, under Grant 0019/2021/A; the Zhuhai Industry-University-Research Project with Hongkong and Macao under Grant ZH22017002210014PWC; and the Key Technologies for Scheduling and Optimization of Complex Distributed Manufacturing Systems (22JR10KA007).
Abstract: The flow shop scheduling problem is important for the manufacturing industry, and effective flow shop scheduling can bring great benefits. However, there is little research on the Distributed Hybrid Flow Shop Problem (DHFSP) using learning-assisted meta-heuristics. This work addresses a DHFSP with the objective of minimizing the maximum completion time (makespan). First, a mathematical model is developed for the concerned DHFSP. Second, four Q-learning-assisted meta-heuristics are proposed, based on the genetic algorithm (GA), the artificial bee colony algorithm (ABC), particle swarm optimization (PSO), and differential evolution (DE). According to the nature of the DHFSP, six local search operations are designed for finding high-quality solutions in the local space. Instead of random selection, Q-learning assists the meta-heuristics in choosing appropriate local search operations during the iterations. Finally, comprehensive numerical experiments on 60 cases are conducted to assess the effectiveness of the proposed algorithms. The experimental results and discussion show that using Q-learning to select appropriate local search operations is more effective than the random strategy. To verify the competitiveness of the Q-learning-assisted meta-heuristics, they are compared with the improved iterated greedy algorithm (IIG), which also solves the DHFSP, and the Friedman test is applied to the results of the five algorithms. It is concluded that the four Q-learning-assisted meta-heuristics perform better than IIG, with the Q-learning-assisted PSO showing the best competitiveness.
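The state, action, and reward used by the Q-learning operator selector are not given in the abstract, so the following is only a schematic sketch in which the state records whether the last operation improved the makespan and the reward is the relative improvement; these design choices and the stub operators are assumptions rather than the paper's formulation.

```python
import random

N_OPS = 6                        # six local search operations (stubs here)
ALPHA, GAMMA, EPS = 0.1, 0.8, 0.2
# State 0: last operation did not improve the makespan; state 1: it did.
Q = [[0.0] * N_OPS for _ in range(2)]

def local_search(op_index, makespan):
    """Stub for the problem-specific operators: each one improves the makespan
    with a different (made-up) probability, purely for illustration."""
    if random.random() < 0.1 + 0.1 * op_index:
        return makespan * random.uniform(0.95, 0.999)
    return makespan

def choose_op(state):
    if random.random() < EPS:
        return random.randrange(N_OPS)
    return max(range(N_OPS), key=lambda a: Q[state][a])

makespan, state = 1000.0, 0
for it in range(500):
    op = choose_op(state)
    new_makespan = local_search(op, makespan)
    reward = (makespan - new_makespan) / makespan       # relative improvement
    next_state = 1 if new_makespan < makespan else 0
    Q[state][op] += ALPHA * (reward + GAMMA * max(Q[next_state]) - Q[state][op])
    makespan, state = new_makespan, next_state

print("preferred operator per state:",
      [max(range(N_OPS), key=lambda a: Q[s][a]) for s in range(2)])
```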
Abstract: Intelligent traffic control requires accurate estimation of the road state and adaptive, dynamically adjusted intelligent algorithms for decision making. In this article, these issues are handled by proposing a novel framework for traffic control that uses vehicular communications and Internet of Things data. The framework integrates Kalman filtering and Q-learning. Unlike a smoothing Kalman filter, our data-fusion Kalman filter incorporates a process-aware model, which makes it superior in terms of prediction error. Unlike traditional Q-learning, our Q-learning algorithm enables adaptive state quantization by changing the threshold that separates low traffic from high traffic on a road according to the maximum number of vehicles on the junction roads. For evaluation, the model has been simulated on a single intersection consisting of four roads: east, west, north, and south. A comparison of the developed adaptive quantized Q-learning (AQQL) framework with state-of-the-art and greedy approaches shows the superiority of AQQL: in terms of the number of released vehicles, AQQL improves on the greedy approach by 5% and on the state-of-the-art approach by 340%. Hence, AQQL provides effective traffic control that can be applied in today's intelligent traffic systems.
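As a rough illustration of the adaptive quantization idea (a "low"/"high" traffic threshold that tracks the largest queue observed on the junction roads) combined with a standard Q-learning update, the snippet below uses a placeholder environment; the signal-phase actions, reward, and release model are assumptions, not the AQQL framework itself.

```python
import random

ROADS = ["east", "west", "north", "south"]
ACTIONS = [0, 1]                 # 0: green for east-west, 1: green for north-south
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = {}                           # Q[state][action], states created lazily
max_seen = 1.0                   # running maximum queue length over the junction roads

def quantize(queues):
    """Adaptive quantization: a road is 'high' when its queue exceeds half of the
    largest queue observed so far (the 0.5 fraction is an assumed choice)."""
    global max_seen
    max_seen = max(max_seen, *queues)
    threshold = 0.5 * max_seen
    return tuple(int(q > threshold) for q in queues)

def step(queues, action):
    """Placeholder environment: each green road releases up to 3 vehicles and every
    road receives random arrivals. Reward = number of released vehicles."""
    released, new_queues = 0, []
    for i, q in enumerate(queues):
        green = (action == 0 and i < 2) or (action == 1 and i >= 2)
        out = min(q, 3) if green else 0
        released += out
        new_queues.append(q - out + random.randint(0, 2))
    return new_queues, released

queues = [random.randint(0, 10) for _ in ROADS]
state = quantize(queues)
for t in range(5000):
    Q.setdefault(state, {})
    if random.random() < EPS or not Q[state]:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[state].get(a, 0.0))
    queues, reward = step(queues, action)
    next_state = quantize(queues)
    Q.setdefault(next_state, {})
    best_next = max(Q[next_state].values(), default=0.0)
    old = Q[state].get(action, 0.0)
    Q[state][action] = old + ALPHA * (reward + GAMMA * best_next - old)
    state = next_state

print("learned", len(Q), "quantized traffic states")
```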
Funding: Supported by the Technology Project of China Southern Power Grid Digital Grid Research Institute Corporation, Ltd. (670000KK52220003) and the National Key R&D Program of China (2020YFB0906000).
Abstract: The stability problem of power grids has become increasingly serious in recent years as the size of novel power systems increases. To improve and ensure the stable operation of such systems, this study proposes an artificial emotional lazy Q-learning method, which combines artificial emotion, lazy learning, and reinforcement learning for static security and stability analysis of power systems. The study compares the analysis results of the proposed method with those of the small-disturbance method for a stand-alone power system and verifies that the proposed lazy Q-learning method can effectively screen useful data for learning and improves the static security and stability of the novel power system more effectively than traditional proportional-integral-derivative (PID) control and Q-learning methods.
Funding: Supported by the Natural Science Foundation of Anhui Province (Grant Number 2208085MG181); the Science Research Project of Higher Education Institutions in Anhui Province, Philosophy and Social Sciences (Grant Number 2023AH051063); and the Open Fund of the Key Laboratory of Anhui Higher Education Institutes (Grant Number CS2021-ZD01).
Abstract: The distributed flexible job shop scheduling problem (DFJSP) has attracted great attention with the growth of the global manufacturing industry. General DFJSP research considers only machine constraints and ignores worker constraints. As one critical factor of production, effective utilization of worker resources can increase productivity. Meanwhile, energy consumption is a growing concern due to increasingly serious environmental issues. Therefore, this paper studies the distributed flexible job shop scheduling problem with dual resource constraints (DFJSP-DRC), minimizing makespan and total energy consumption. To solve the problem, we present a multi-objective mathematical model for DFJSP-DRC and propose a Q-learning-based multi-objective grey wolf optimizer (Q-MOGWO). In Q-MOGWO, high-quality initial solutions are generated by a hybrid initialization strategy, and an improved active decoding strategy is designed to obtain the scheduling schemes. To further enhance the local search capability and expand the solution space, two wolf predation strategies and three critical-factory neighborhood structures based on Q-learning are proposed. These strategies and structures enable Q-MOGWO to explore the solution space more efficiently and thus find better Pareto solutions. The effectiveness of Q-MOGWO in addressing DFJSP-DRC is verified through comparison with four algorithms on 45 instances. The results reveal that Q-MOGWO outperforms the comparison algorithms in terms of solution quality.
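The abstract does not detail how Q-learning scores the three neighborhood structures, so the fragment below sketches one plausible reading: a single-state Q-value per structure, rewarded when the generated neighbor Pareto-dominates the current solution on (makespan, total energy consumption). Everything here, including the dominance-based reward and the stub moves, is an illustrative assumption.

```python
import random

N_NEIGHBORHOODS = 3              # three critical-factory neighborhood structures (stubs)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.15
Q = [0.0] * N_NEIGHBORHOODS      # single-state Q-values, one per structure

def dominates(obj_a, obj_b):
    """True if obj_a Pareto-dominates obj_b (minimizing makespan and energy)."""
    return (all(x <= y for x, y in zip(obj_a, obj_b))
            and any(x < y for x, y in zip(obj_a, obj_b)))

def apply_neighborhood(k, objectives):
    """Stub neighborhood move: randomly perturb (makespan, energy); structure k has a
    slightly different chance of improving, purely for illustration."""
    makespan, energy = objectives
    if random.random() < 0.2 + 0.1 * k:
        return makespan * random.uniform(0.97, 1.0), energy * random.uniform(0.97, 1.0)
    return makespan * random.uniform(1.0, 1.02), energy * random.uniform(1.0, 1.02)

current = (500.0, 800.0)          # (makespan, total energy), made-up starting point
for it in range(300):
    if random.random() < EPS:
        k = random.randrange(N_NEIGHBORHOODS)
    else:
        k = max(range(N_NEIGHBORHOODS), key=lambda i: Q[i])
    candidate = apply_neighborhood(k, current)
    reward = 1.0 if dominates(candidate, current) else 0.0
    Q[k] += ALPHA * (reward + GAMMA * max(Q) - Q[k])
    if reward:
        current = candidate

print("preferred neighborhood structure:", Q.index(max(Q)))
```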