Funding: Supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2022-00155885, Artificial Intelligence Convergence Innovation Human Resources Development (Hanyang University ERICA)); by the National Natural Science Foundation of China under Grant No. 61971264; and by the National Natural Science Foundation of China/Research Grants Council Collaborative Research Scheme under Grant No. 62261160390.
Abstract: Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to deal with congestion in ad-hoc networks with effective algorithms remains open and challenging. In this paper, we focus on enabling congestion control to minimize network transmission delays through flexible power control. To effectively solve the congestion problem, we propose a distributed cross-layer scheduling algorithm empowered by graph-based multi-agent deep reinforcement learning. The transmit power is adaptively adjusted in real time by our algorithm based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low because cooperation is regional, built on a graph attention network. In the evaluation, we show that our algorithm can reduce the transmission delay of data flows under severe signal interference and drastically changing channel states, and we demonstrate its adaptability and stability across different topologies. The method is general and can be extended to various types of topologies.
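As the abstract above leans on a graph attention network for regional cooperation, a minimal numpy sketch of attention-weighted neighbor aggregation is given below; the two-feature observation (CSI and queue length), the dimensions, and the random weights are illustrative assumptions, not the paper's model.

```python
# Minimal sketch of graph-attention-style neighbor weighting (illustrative,
# not the paper's exact model). Each agent scores its neighbors' local
# observations and aggregates them with softmax attention.
import numpy as np

def gat_aggregate(h_self, h_neighbors, W, a):
    """h_self: (d,), h_neighbors: (k, d), W: (d, d'), a: (2*d',)."""
    z_self = h_self @ W                      # project own features
    z_nbrs = h_neighbors @ W                 # project neighbor features
    # attention logits e_ij = LeakyReLU(a^T [z_i || z_j])
    logits = np.array([np.concatenate([z_self, z]) @ a for z in z_nbrs])
    logits = np.where(logits > 0, logits, 0.2 * logits)   # LeakyReLU
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()                     # softmax over neighbors
    return alpha @ z_nbrs                    # attention-weighted aggregation

rng = np.random.default_rng(0)
h = rng.normal(size=2)                       # e.g., [CSI, queue length]
H = rng.normal(size=(3, 2))                  # three neighbors' observations
print(gat_aggregate(h, H, rng.normal(size=(2, 4)), rng.normal(size=8)))
```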
Funding: Supported by the Shaanxi Provincial Key Research and Development Project (2023YBGY095) and the Shaanxi Provincial Qin Chuangyuan "Scientist + Engineer" Project (2023KXJ247).
Abstract: To address the sparse reward problem that arises when solving job-shop scheduling with deep reinforcement learning, a deep reinforcement learning framework that accounts for sparse rewards is proposed. The job-shop scheduling problem is transformed into a Markov decision process, and six state features are designed to improve the state representation using a two-way scheduling method, including four state features that distinguish the optimal action and two state features related to the learning goal. An extended variant of the graph isomorphism network, GIN++, is used to encode disjunctive graphs to improve the performance and generalization ability of the model. Through an iterative greedy algorithm, a random strategy is generated as the initial strategy, and the action with the maximum information gain is selected to expand it, strengthening the exploration ability of the Actor-Critic algorithm. Validating the trained policy model on multiple public test datasets and comparing it with other advanced DRL methods and scheduling rules, the proposed method reduces the minimum average gap by 3.49%, 5.31%, and 4.16%, respectively, compared with the priority-rule-based methods, and by 5.34%, 11.97%, and 5.02% compared with the learning-based methods, effectively improving the accuracy with which DRL approximates the minimum completion time of the JSSP.
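A GIN layer of the kind the abstract extends (the GIN++ details are not given here) updates each operation node as MLP((1 + ε)·h_v + Σ_{u∈N(v)} h_u); below is a minimal numpy sketch under that standard formulation, with a one-layer ReLU MLP for brevity.

```python
# Illustrative numpy sketch of one GIN layer on a disjunctive graph:
# h_v' = MLP((1 + eps) * h_v + sum of neighbor features).
import numpy as np

def gin_layer(H, A, W, eps=0.1):
    """H: (n, d) node features, A: (n, n) adjacency of the disjunctive graph."""
    agg = (1.0 + eps) * H + A @ H        # self term plus neighbor sum
    return np.maximum(agg @ W, 0.0)      # one-layer MLP with ReLU

A = np.array([[0, 1, 0],                 # precedence / machine arcs (toy)
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3)                            # one-hot initial operation features
print(gin_layer(H, A, np.random.default_rng(1).normal(size=(3, 4))))
```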
Abstract: Industry 4.0 production environments and smart manufacturing systems integrate both the physical and decision-making aspects of manufacturing operations into autonomous and decentralized systems. One of the key aspects of these systems is production planning, specifically scheduling operations on the machines. To cope with this problem, this paper proposes Deep Reinforcement Learning with an Actor-Critic algorithm (DRLAC). We model the Job-Shop Scheduling Problem (JSSP) as a Markov Decision Process (MDP), represent the state of a JSSP with simple Graph Isomorphism Networks (GIN) to extract node features during scheduling, and derive the optimal scheduling policy that maps the extracted node features to the best next scheduling action. In addition, we adopt Actor-Critic (AC) reinforcement learning to train the network and achieve the optimal scheduling policy. To prove the proposed model's effectiveness, we first present a case study illustrating a conflict between the schedules of two jobs, and then apply the proposed model to a known benchmark dataset and compare the results with traditional scheduling methods and trending approaches. The numerical results indicate that the proposed model can adapt to real-time production scheduling, with the average percentage deviation (APD) of our model achieving values between 0.009 and 0.21 compared with heuristic methods and between 0.014 and 0.18 compared with other trending approaches.
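The Actor-Critic training named above is, in its generic form, a TD-error-scaled policy-gradient step; here is a hedged sketch with a linear critic, where all rates, features, and gradients are illustrative rather than the paper's network.

```python
# Generic advantage Actor-Critic step: the critic's TD error serves as the
# advantage that scales the actor's log-probability gradient.
import numpy as np

def ac_update(theta, w, phi_s, phi_s2, grad_logpi, r,
              gamma=0.99, lr_actor=1e-3, lr_critic=1e-2):
    """Linear critic v(s) = w @ phi(s); grad_logpi is the actor score."""
    td = r + gamma * (w @ phi_s2) - (w @ phi_s)   # TD error, advantage proxy
    w = w + lr_critic * td * phi_s                # critic moves toward target
    theta = theta + lr_actor * td * grad_logpi    # actor scaled by advantage
    return theta, w

theta, w = np.zeros(4), np.zeros(2)
phi_s, phi_s2 = np.array([1.0, 0.5]), np.array([0.8, 0.2])
theta, w = ac_update(theta, w, phi_s, phi_s2,
                     grad_logpi=np.array([0.1, -0.2, 0.3, 0.0]), r=1.0)
print(theta, w)
```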
Funding: This work was supported by the National Key R&D Program of China (2018AAA0101400), the National Natural Science Foundation of China (62173251, 61921004, U1713209), and the Natural Science Foundation of Jiangsu Province of China (BK20202006).
Abstract: Driven by the improvement of the smart grid, the active distribution network (ADN) has attracted much attention due to its characteristic of active management. By making full use of electricity price signals for optimal scheduling, the total cost of the ADN can be reduced. However, the optimal day-ahead scheduling problem is challenging since the future electricity price is unknown. Moreover, in the ADN some schedulable variables are continuous while others are discrete, which increases the difficulty of determining the optimal scheduling scheme. In this paper, the day-ahead scheduling problem of the ADN is formulated as a Markov decision process (MDP) with a continuous-discrete hybrid action space. Then, an algorithm based on multi-agent hybrid reinforcement learning (HRL) is proposed to obtain the optimal scheduling scheme. The proposed algorithm adopts the structure of centralized training and decentralized execution, and different methods are applied to determine the selection policies for continuous and discrete scheduling variables. The simulation experiment results demonstrate the effectiveness of the algorithm.
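One common way to realize a continuous-discrete hybrid action space, sketched here as an assumption rather than the paper's exact policy, is to pair a categorical head for the discrete variables with a Gaussian head for the continuous ones.

```python
# Hybrid action sampling: discrete scheduling variables (e.g., switch states)
# come from a categorical policy, continuous ones (e.g., power setpoints)
# from a Gaussian policy. Names and dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sample_hybrid_action(disc_logits, mu, log_std):
    p = np.exp(disc_logits - disc_logits.max())
    p /= p.sum()
    discrete = rng.choice(len(p), p=p)               # e.g., tap position
    continuous = mu + np.exp(log_std) * rng.normal(size=mu.shape)
    return discrete, continuous                      # joint hybrid action

print(sample_hybrid_action(np.array([0.1, 1.2, -0.3]),
                           np.array([0.5, -0.2]), np.array([-1.0, -1.0])))
```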
Abstract: The dynamicity of available resources and network conditions, such as channel capacity and traffic characteristics, has posed major challenges to scheduling in wireless networks. Reinforcement learning (RL) enables wireless nodes to observe their respective operating environments, learn, and make optimal or near-optimal scheduling decisions. Learning, which is the main intrinsic characteristic of RL, enables wireless nodes to adapt over time to most forms of dynamicity in the operating environment. This paper presents an extensive review of the application of traditional and enhanced RL approaches to various types of scheduling schemes, namely packet, sleep-wake, and task schedulers, in wireless networks, as well as the advantages and performance enhancements brought about by RL. Additionally, it presents how various challenges associated with scheduling schemes have been approached using RL. Finally, we discuss various open issues related to RL-based scheduling schemes in wireless networks in order to explore new research directions in this area. Discussions in this paper are presented in a tutorial manner to establish a foundation for further research in this field.
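The traditional RL approach such surveys start from is tabular Q-learning; its one-line update, applied here to a toy transmit-or-sleep scheduler of my own construction, looks like this.

```python
# Tabular Q-learning update: move the (state, action) estimate toward
# reward plus the discounted best next-state value.
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q

Q = np.zeros((4, 2))          # 4 queue levels x {sleep, transmit}
Q = q_update(Q, s=2, a=1, r=1.0, s_next=1)
print(Q)
```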
Funding: Supported in part by the National Key R&D Program of China under Grant No. 2018YFB1800800 and the Natural Science Foundation of China under Grant Nos. 61871254, 91638204, and 61861136003.
Abstract: Due to the increasing need for massive data analysis and machine learning model training at the network edge, as well as the rising concerns about data privacy, a new distributed training framework called federated learning (FL) has emerged and attracted much attention from both academia and industry. In FL, participating devices iteratively update their local models based on their own data and contribute to the global training by uploading model updates until the training converges. Therefore, the computation capabilities of mobile devices can be utilized and data privacy can be preserved. However, deploying FL in resource-constrained wireless networks encounters several challenges, including the limited energy of mobile devices, weak onboard computing capability, and scarce wireless bandwidth. To address these challenges, recent solutions have been proposed to maximize the convergence rate or minimize the energy consumption under heterogeneous constraints. In this overview, we first introduce the background and fundamentals of FL. Then, the key challenges in deploying FL in wireless networks are discussed, and several existing solutions are reviewed. Finally, we highlight the open issues and future research directions in FL scheduling.
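The iterative local-update-then-upload loop described above is canonically implemented as federated averaging (FedAvg); here is a minimal sketch on a toy least-squares task, where the device data, rates, and round count are illustrative.

```python
# FedAvg sketch: each device runs local SGD on its own data, then the
# server aggregates models weighted by local sample counts.
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    for _ in range(epochs):                     # local SGD on-device
        grad = 2 * X.T @ (X @ w - y) / len(y)   # least-squares gradient
        w = w - lr * grad
    return w

def fedavg(w_global, devices):
    n = sum(len(y) for _, y in devices)
    return sum(len(y) / n * local_update(w_global.copy(), X, y)
               for X, y in devices)

rng = np.random.default_rng(0)
devices = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
w = np.zeros(3)
for rnd in range(10):                           # communication rounds
    w = fedavg(w, devices)
print(w)
```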
Funding: Supported by the National Key R&D Program of China (No. 2021ZD0112700) and the Key Science and Technology Project of China Southern Power Grid Corporation (No. 090000k52210134).
Abstract: With the booming of electric vehicles (EVs) across the world, their increasing charging demands pose challenges to urban distribution networks. Particularly, due to the further implementation of time-of-use prices, the charging behaviors of household EVs are concentrated in low-cost periods, generating new load peaks and affecting the secure operation of the medium- and low-voltage grids. This problem is particularly acute in many old communities with relatively poor electricity infrastructure. In this paper, a novel two-stage charging scheduling scheme based on deep reinforcement learning is proposed to improve the power quality and simultaneously achieve optimal charging scheduling of household EVs in an active distribution network (ADN) during the valley period. In the first stage, the optimal charging profiles of charging stations are determined by solving the optimal power flow with the objective of eliminating peak-valley load differences. In the second stage, an intelligent agent based on the proximal policy optimization algorithm is developed to dispatch the household EVs sequentially within the low-cost period, considering the discrete nature of their arrivals. Through the powerful approximation capability of neural networks, the challenge of imperfect knowledge is tackled effectively during the charging scheduling process. Finally, numerical results demonstrate that the proposed scheme achieves great improvement in relieving peak-valley differences as well as in improving voltage quality in the ADN.
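The second stage's proximal policy optimization relies on a clipped surrogate objective; below is a minimal numpy rendering of that standard loss, with toy values rather than the paper's agent.

```python
# PPO clipped surrogate: limit how far the new policy's probability ratio
# can move from the old one in a single update.
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    ratio = np.exp(logp_new - logp_old)            # pi_new / pi_old
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    # the objective is maximized, so return the negative as a loss
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))

adv = np.array([1.0, -0.5, 2.0])
print(ppo_clip_loss(np.log([0.3, 0.2, 0.5]), np.log([0.25, 0.25, 0.5]), adv))
```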
Funding: Supported by the Deanship of Scientific Research at King Khalid University under research grant number RGP.2/241/43.
Abstract: Machine learning concepts have seen growing adoption across all knowledge domains, including the Internet of Things (IoT) and several business domains. Quality of Service (QoS) has become an important problem in IoT settings, given the vast explosion of connected sensors, information, and usage. Sensor data gathering is an efficient solution for collecting information from spatially disseminated IoT nodes. A Reinforcement Learning Mechanism to improve the QoS (RLMQ), which uses a Mobile Sink (MS) to minimize delay in the wireless IoT, is proposed in this paper. Here, we use machine learning concepts such as Reinforcement Learning (RL) to improve the QoS and energy efficiency in the Wireless Sensor Network (WSN). The MS collects the data from the Cluster Head (CH), and the CH is selected by RL incentive values. The incentive value is computed from QoS parameters, namely minimum energy utilization, minimum bandwidth utilization, minimum hop count, and minimum time delay. The MS is used to collect the data from the CHs, thus minimizing the network delay. Sleep-awake scheduling is used to minimize CH deaths in the WSN. This work is simulated, and the results show that the RLMQ scheme performs better than the baseline protocol. The results prove that RLMQ increases the residual energy and throughput and minimizes the network delay in the WSN.
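A hedged sketch of how the listed QoS parameters could be folded into a single incentive value for CH selection follows; the equal weights, the inversion of lower-is-better metrics, and the candidate names are assumptions, not the paper's formula.

```python
# Toy incentive: combine QoS costs (lower is better) into one reward-like
# score, then pick the candidate with the highest score as cluster head.
import numpy as np

def incentive(energy_use, bandwidth_use, hop_count, delay,
              weights=(0.25, 0.25, 0.25, 0.25)):
    metrics = np.array([energy_use, bandwidth_use, hop_count, delay])
    return float(np.dot(weights, 1.0 / (1.0 + metrics)))  # lower cost -> higher score

candidates = {"node3": (0.2, 0.1, 2, 0.05), "node7": (0.6, 0.4, 4, 0.20)}
ch = max(candidates, key=lambda k: incentive(*candidates[k]))
print(ch)  # node3: the lower-cost candidate wins the CH role
```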
Funding: Project supported by the National Key R&D Program of China (No. 2020YFB1710900), the National Natural Science Foundation of China (Nos. 62173322, 61803368, and U1908212), the China Postdoctoral Science Foundation (No. 2019M661156), and the Youth Innovation Promotion Association, Chinese Academy of Sciences (No. 2019202).
Abstract: Edge artificial intelligence will empower otherwise simple industrial wireless networks (IWNs) to support complex and dynamic tasks by collaboratively exploiting the computation and communication resources of both machine-type devices (MTDs) and edge servers. In this paper, we propose a multi-agent deep reinforcement learning based resource allocation (MADRL-RA) algorithm for end-edge orchestrated IWNs to support computation-intensive and delay-sensitive applications. First, we present the system model of IWNs, wherein each MTD is regarded as a self-learning agent. Then, we apply the Markov decision process to formulate a minimum system overhead problem with joint optimization of delay and energy consumption. Next, we employ MADRL to cope with the explosive state space and learn an effective resource allocation policy with respect to the computing decision, computation capacity, and transmission power. To break the time correlation of training data while accelerating the learning process of MADRL-RA, we design a weighted experience replay to store and sample experiences categorically. Furthermore, we propose a step-by-step ε-greedy method to balance exploitation and exploration. Finally, we verify the effectiveness of MADRL-RA by comparing it with several benchmark algorithms in many experiments, showing that MADRL-RA converges quickly and learns an effective resource allocation policy that achieves the minimum system overhead.
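A hedged sketch of the two training aids named above, weighted experience replay and a step-by-step ε schedule, is given below; the buffer layout and decay constants are illustrative assumptions rather than the paper's settings.

```python
# Replay buffer sampled by per-experience weight, plus an epsilon that is
# lowered step by step to shift from exploration to exploitation.
import numpy as np

rng = np.random.default_rng(0)

class WeightedReplay:
    def __init__(self):
        self.data, self.w = [], []
    def store(self, exp, weight):
        self.data.append(exp)
        self.w.append(weight)
    def sample(self, k):
        p = np.array(self.w) / sum(self.w)       # weight-proportional sampling
        idx = rng.choice(len(self.data), size=k, p=p)
        return [self.data[i] for i in idx]

def epsilon(step, eps0=1.0, eps_min=0.05, decay=0.995):
    return max(eps_min, eps0 * decay ** step)    # step-by-step reduction

buf = WeightedReplay()
for i in range(5):
    buf.store(("s", i, 0.1 * i, "s'"), weight=1.0 + i)
print(buf.sample(2), epsilon(500))
```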
Funding: Supported by the Scientific and Technological Innovation Project of Chongqing (No. cstc2021jxjl20010) and the Graduate Student Innovation Program of Chongqing University of Technology (Nos. clgycx-20203166, gzlcx20222061, and gzlcx20223229).
Abstract: Deploying service nodes hierarchically at the edge of the network can effectively improve the service quality of offloaded task requests and increase the utilization of resources. In this paper, we study the task scheduling problem in a hierarchically deployed edge cloud. We first formulate the minimization of the service time of scheduled tasks in the edge cloud as a combinatorial optimization problem, and then prove the NP-hardness of the problem. Different from existing work that mostly designs heuristic approximation-based algorithms or policies to make scheduling decisions, we propose a newly designed scheduling policy, named Joint Neural Network and Heuristic Scheduling (JNNHSP), which combines a neural network-based method with a heuristic-based solution. JNNHSP takes the Sequence-to-Sequence (Seq2Seq) model trained by Reinforcement Learning (RL) as the primary policy and adopts the heuristic algorithm as the auxiliary policy to obtain the scheduling solution, thereby achieving a good balance between the quality and the efficiency of the scheduling solution. In-depth experiments show that compared with a variety of related policies and optimization solvers, JNNHSP achieves better performance in terms of scheduling error ratio, the degree to which the policy is affected by resource limitations, average service latency, and execution efficiency in a typical hierarchical edge cloud.
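One plausible reading of the primary/auxiliary split is sketched below, under the assumption that the learned policy's output is kept only when it is a valid, confident schedule and the heuristic serves as backup; the confidence test, threshold, and stand-in policies are all hypothetical.

```python
# Primary learned policy with a heuristic fallback (illustrative pattern,
# not JNNHSP's exact decision rule).
def schedule(tasks, seq2seq_policy, heuristic_policy, conf_threshold=0.8):
    order, confidence = seq2seq_policy(tasks)     # primary: learned Seq2Seq
    if confidence >= conf_threshold and sorted(order) == list(range(len(tasks))):
        return order                              # valid permutation, keep it
    return heuristic_policy(tasks)                # auxiliary: heuristic backup

# usage with stand-in policies
print(schedule([3, 1, 2],
               seq2seq_policy=lambda t: ([2, 0, 1], 0.9),
               heuristic_policy=lambda t: sorted(range(len(t)), key=t.__getitem__)))
```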
Funding: Co-supported by the Key Programs of the Chinese Academy of Sciences (No. ZDRW-KT-2016-2) and the National High-tech Research and Development Program of China (No. 2015AA7013040).
Abstract: In the "Internet Plus" era, space-based information services require effective and fast image satellite scheduling. Most existing studies treat image satellite scheduling as an optimization problem to be solved with search algorithms in a batch-wise manner; no real-time-speed method for satellite scheduling exists. In this paper, with the idea of building a real-time-speed method, satellite scheduling is remodeled based on a Dynamic and Stochastic Knapsack Problem (DSKP), and the objective is to maximize the total expected profit. No existing algorithm can solve this novel scheduling problem properly. With inspiration from the recent achievements of Deep Reinforcement Learning (DRL) in video games, AlphaGo, and dynamic control, a novel DRL-based method is applied to train a neural network to schedule tasks. The numerical results show that the method proposed in this paper can achieve relatively good performance with real-time speed and an immediate-response style.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61971057.
Abstract: Heterogeneous base station deployment makes it possible to provide high capacity and wide-area coverage. Network slicing makes it possible to allocate wireless resources to heterogeneous services on demand. These two promising technologies contribute to the unprecedented service in 5G. We establish a multiservice heterogeneous network model, which aims to raise the transmission rate under the delay constraints for active control terminals and to optimize the energy efficiency for passive network terminals. A policy-gradient-based deep reinforcement learning algorithm is proposed to make decisions on user association and power control in the continuous action space. Simulation results indicate the good convergence of the algorithm, and higher reward is obtained compared with other baselines.
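For continuous power control, a policy-gradient method in its simplest form uses a Gaussian policy and the REINFORCE gradient; the toy sketch below assumes a linear mean and a quadratic stand-in reward, not the paper's network objective.

```python
# REINFORCE step for a continuous action (e.g., transmit power): Gaussian
# policy with linear mean, gradient of log-probability scaled by the return.
import numpy as np

rng = np.random.default_rng(0)

def pg_step(theta, state, reward_fn, sigma=0.1, lr=0.01):
    mu = theta @ state                      # mean power for this state
    action = mu + sigma * rng.normal()      # sample from N(mu, sigma^2)
    ret = reward_fn(state, action)          # observed return
    grad_logp = (action - mu) / sigma**2 * state   # d log pi / d theta
    return theta + lr * ret * grad_logp

theta = np.zeros(2)
for _ in range(200):
    s = rng.normal(size=2)
    theta = pg_step(theta, s, reward_fn=lambda s, a: -(a - s.sum())**2)
print(theta)   # drifts toward matching the (toy) optimal power s.sum()
```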
Funding: This work was supported in part by the National Natural Science Foundation of China (Nos. 62073300, U1911205, and 62076225).
Abstract: For sudden drinking water pollution events, reasonably opening or closing valves and hydrants in a water distribution network (WDN), which ensures the isolation and discharge of the contaminant as soon as possible, is considered an effective emergency measure. In this paper, we propose an emergency scheduling algorithm based on evolutionary reinforcement learning (ERL), which can train a good scheduling policy through the combination of evolutionary computation (EC) and reinforcement learning (RL). The optimal scheduling policy can then guide the operation of valves and hydrants in real time based on sensor information and protect people from the risk of contaminated water. Experiments verify that our algorithm achieves good results and effectively reduces the impact of pollution events.
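A minimal sketch of the EC-plus-RL pattern follows: hold a population of policy parameters, perturb them, score each by episode reward, and keep the elite. The fitness function below is a stand-in, not a WDN simulator.

```python
# Evolutionary policy search: elite selection plus Gaussian mutation.
import numpy as np

rng = np.random.default_rng(0)

def evolve(pop, fitness, n_elite=2, noise=0.1):
    scores = np.array([fitness(p) for p in pop])
    elite = pop[np.argsort(scores)[-n_elite:]]          # best policies survive
    children = elite[rng.integers(n_elite, size=len(pop) - n_elite)]
    children = children + noise * rng.normal(size=children.shape)
    return np.vstack([elite, children])

target = np.array([0.3, -0.7])                           # toy optimal valve policy
fitness = lambda p: -np.sum((p - target) ** 2)
pop = rng.normal(size=(16, 2))
for _ in range(50):
    pop = evolve(pop, fitness)
print(pop[np.argmax([fitness(p) for p in pop])])         # close to target
```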
Abstract: The Internet of Things (IoT) is a wireless network designed to perform specific tasks and plays a crucial role in various fields such as environmental monitoring, surveillance, and healthcare. To address the limitations imposed by inadequate resources, energy, and network scalability, this type of network relies heavily on data aggregation and clustering algorithms. Although various conventional studies have aimed to enhance the lifespan of a network through robust systems, they do not always provide optimal efficiency for real-time applications. This paper presents an approach based on state-of-the-art machine-learning methods. In this study, we employed a novel approach that combines an extended version of principal component analysis (PCA) and a reinforcement learning algorithm to achieve efficient clustering and data reduction. The primary objectives of this study are to enhance the service life of a network, reduce energy usage, and improve data aggregation efficiency. We evaluated the proposed methodology using data collected from sensors deployed in agricultural fields for crop monitoring. Our proposed approach (PQL) was compared with previous studies that utilized adaptive Q-learning (AQL) and regional energy-aware clustering (REAC). Our approach outperformed both in terms of network longevity and energy consumption, and established a fault-tolerant network.
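The PCA half of the approach compresses raw sensor readings before transmission; a minimal numpy sketch via SVD is given below, where the two-component choice and the data shape are assumptions.

```python
# Standard PCA via SVD: project sensor readings onto the top principal
# components so each cluster transmits a compressed summary.
import numpy as np

def pca_reduce(X, k=2):
    Xc = X - X.mean(axis=0)                 # center the readings
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]            # scores and components

rng = np.random.default_rng(0)
readings = rng.normal(size=(100, 6))        # 100 samples x 6 sensor channels
Z, components = pca_reduce(readings)
print(Z.shape)                              # (100, 2): far fewer values to send
```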
Abstract: Link asymmetry in the wireless mesh access networks (WMANs) of mobile ad-hoc networks (MANETs) is due to mesh routers' transmission ranges. It poses significant research challenges in the design of network protocols for wireless networks. Based on an extensive review, it is noted that a substantial percentage of links is asymmetric, i.e., many links are unidirectional. It is identified that the reliability of synchronous acknowledgements is higher than that of asynchronous messages. Therefore, the process of establishing bidirectional link quality through asynchronous beacons underrates the reliability of asymmetric links. This paves the way for investigating asymmetric links to enhance network functions through link estimation. Here, a novel Learning-based Dynamic Tree Routing (LDTR) model is proposed to improve network performance and delay. For the evaluation of delay measures, asymmetric links, interference, and the probability of transmission failure are evaluated. The proportion of energy consumed is used for monitoring energy conditions based on the total energy capacity. This learning model is a productive way of resolving routing issues in the network model under uncertainty. The asymmetric path is chosen to achieve exploitation and exploration iteratively. The LDTR model is utilized to resolve the multi-objective routing problem. Here, the simulation is done in the MATLAB 2020a simulation environment, and the path with energy efficiency and lower end-to-end (E2E) delay is evaluated and compared with existing approaches such as the Dyna-Q-network model (DQN), the asymmetric MAC model (AMAC), and the cooperative asymmetric MAC model (CAMAC). The simulation outcomes demonstrate that the proposed LDTR model attains superior network performance compared with the others: the average energy consumption is 250 J, the packet energy consumption is 6.5 J, the PRR is 50 bits/s, the PDR is 95%, and the average delay percentage is 20%.
Abstract: In this paper, we investigate the resource slicing and scheduling problem in space-terrestrial integrated vehicular networks to support both delay-sensitive services (DSSs) and delay-tolerant services (DTSs). Resource slicing and scheduling allocate spectrum resources to different slices and determine the user association and bandwidth allocation for individual vehicles. To accommodate dynamic network conditions, we first formulate a joint resource slicing and scheduling (JRSS) problem to minimize the long-term system cost, including the DSS requirement violation cost, the DTS delay cost, and the slice reconfiguration cost. Since resource slicing and scheduling decisions are interdependent on different timescales, we decompose the JRSS problem into a large-timescale resource slicing subproblem and a small-timescale resource scheduling subproblem. We propose a two-layered reinforcement learning (RL)-based JRSS scheme to solve the subproblems. In the resource slicing layer, spectrum resources are pre-allocated to different slices via a proximal policy optimization-based RL algorithm. In the resource scheduling layer, spectrum resources in each slice are scheduled to individual vehicles based on dynamic network conditions and service requirements via matching-based algorithms. We conduct extensive trace-driven experiments to demonstrate that the proposed scheme can effectively reduce the system cost while satisfying service quality requirements.
Funding: Supported by the National Key Research and Development Program of China (No. 2021YFE0116900), the National Natural Science Foundation of China (Nos. 42275157, 62002276, and 41975142), and the Major Program of the National Social Science Fund of China (No. 17ZDA092).
Abstract: Edge computing nodes undertake an increasing number of tasks with the rise of business density. Therefore, how to efficiently allocate large-scale and dynamic workloads to edge computing resources has become a critical challenge. This study proposes an edge task scheduling approach based on an improved Double Deep Q-Network (double DQN), in which the calculation of target Q-values and the selection of actions are separated into two networks. A new reward function is designed, and a control unit is added to the experience replay unit of the agent. The management of experience data is also modified to fully utilize its value and improve learning efficiency. Reinforcement learning agents usually learn from an uninformed initial state, which is inefficient. As such, this study proposes a novel particle swarm optimization algorithm with an improved fitness function, which can generate optimal solutions for task scheduling. These optimized solutions are provided to the agent to pre-train the network parameters and reach a better level of cognition. The proposed algorithm is compared with six other methods in simulation experiments. Results show that the proposed algorithm outperforms the other benchmark methods regarding makespan.
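The double-DQN separation described above decouples action selection (online network) from action evaluation (target network) in the bootstrap target, which curbs the overestimation of single-network Q-learning; a minimal sketch with toy values follows.

```python
# Double DQN target: the online network picks the next action, the separate
# target network evaluates it.
import numpy as np

def double_dqn_target(r, q_online_next, q_target_next, gamma=0.99, done=False):
    a_star = int(np.argmax(q_online_next))              # selection: online net
    bootstrap = 0.0 if done else q_target_next[a_star]  # evaluation: target net
    return r + gamma * bootstrap

print(double_dqn_target(r=1.0,
                        q_online_next=np.array([0.2, 0.9, 0.4]),
                        q_target_next=np.array([0.3, 0.6, 0.5])))
```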
Funding: Supported by the National Natural Science Foundation of China (NSFC) (61831002, 62001076) and the General Program of the Natural Science Foundation of Chongqing (Nos. CSTB2023NSCQ-MSX0726 and cstc2020jcyjmsxmX0878).
Abstract: The Wireless Sensor Network (WSN) is widely utilized in large-scale distributed unmanned detection scenarios due to its low cost and flexible installation. However, WSN data collection encounters challenges in scenarios lacking communication infrastructure. Unmanned aerial vehicles (UAVs) offer a novel solution for WSN data collection, leveraging their high mobility. In this paper, we present an efficient UAV-assisted data collection algorithm aimed at minimizing the overall power consumption of the WSN. First, a two-layer UAV-assisted data collection model is introduced, comprising a ground layer and an aerial layer. In the ground layer, environmental data are sensed by the cluster members (CMs), and the CMs transmit the data to the cluster heads (CHs), which forward the collected data to the UAVs. The aerial network layer consists of multiple UAVs that collect, store, and forward data from the CHs to the data center for analysis. Second, an improved clustering algorithm based on K-Means++ is proposed to optimize the number and locations of CHs. Moreover, an Actor-Critic-based algorithm is introduced to optimize the UAV deployment and the association with CHs. Finally, simulation results verify the effectiveness of the proposed algorithms.
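K-Means++ improves on plain K-Means mainly through its seeding rule, drawing each new center with probability proportional to the squared distance from the nearest center already chosen; here is a minimal sketch on toy node coordinates (the field size and cluster count are assumptions).

```python
# K-Means++ seeding: spread initial centers out according to D^2 weighting.
import numpy as np

rng = np.random.default_rng(0)

def kmeanspp_init(points, k):
    centers = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[rng.choice(len(points), p=d2 / d2.sum())])
    return np.array(centers)

nodes = rng.uniform(0, 100, size=(60, 2))    # sensor node coordinates
print(kmeanspp_init(nodes, k=4))             # candidate cluster-head positions
```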
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61972230 and the Natural Science Foundation of Shandong Province of China under Grant No. ZR2021LZH006.
Abstract: With the developing demands of massive-data services, applications that rely on big geographic data play crucial roles in academic and industrial communities. Unmanned aerial vehicles (UAVs), combined with terrestrial wireless sensor networks (WSNs), can provide sustainable solutions for data harvesting. Rising demands for efficient data collection over larger open areas have been posed in the literature, requiring efficient UAV trajectory planning with lower energy consumption. Currently, there are many intricate solutions for UAV planning over a larger open area, and one of the most practical techniques in previous studies is deep reinforcement learning (DRL). However, the overestimation problem in limited-experience DRL quickly traps the UAV path planning process in a locally optimized condition. Moreover, using the central nodes of the sub-WSNs as the sink nodes or navigation points for UAVs to visit may lead to extra collection costs. This paper develops a data-driven DRL-based game framework with two partners to fulfill the above demands. A cluster head processor (CHP) is employed to determine the sink nodes, and a navigation order processor (NOP) is established to plan the path. The CHP and NOP receive information from each other and provide optimized solutions after reaching a Nash equilibrium. The numerical results show that the proposed game framework can offer UAVs low-cost data collection trajectories, saving at least 17.58% of energy consumption compared with the baseline methods.
Funding: Supported by the Beijing Natural Science Foundation (Grant L182039) and the National Natural Science Foundation of China (Grant 61971061).
Abstract: Although content caching and recommendation are two complementary approaches to improve the user experience, it is still challenging to provide an integrated paradigm that fully explores their potential, due to the high complexity and the complicated tradeoff relationship. To provide an efficient management framework, the joint design of content delivery and recommendation in wireless content caching networks is studied in this paper. First, a joint transmission scheme for content objects and recommendation lists is designed with edge caching, and an optimization problem is formulated to balance the utility and cost of content caching and recommendation, which is a mixed-integer nonlinear programming problem. Second, a reinforcement learning based algorithm is proposed to implement real-time management of content caching, recommendation, and delivery, which can approach the optimal solution without iterations during each decision epoch. Finally, simulation results are provided to evaluate the performance of the proposed scheme, showing that it can achieve lower cost than existing content caching and recommendation schemes.