Journal Articles
414 articles found
1. Autonomous Vehicle Platoons in Urban Road Networks: A Joint Distributed Reinforcement Learning and Model Predictive Control Approach
Authors: Luigi D'Alfonso, Francesco Giannini, Giuseppe Franzè, Giuseppe Fedele, Francesco Pupo, Giancarlo Fortino. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 1, pp. 141-156 (16 pages)
In this paper, platoons of autonomous vehicles operating in urban road networks are considered. From a methodological point of view, the problem of interest consists of formally characterizing vehicle state trajectory tubes by means of routing decisions complying with traffic congestion criteria. To this end, a novel distributed control architecture is conceived by taking advantage of two methodologies: deep reinforcement learning and model predictive control. On one hand, the routing decisions are obtained by using a distributed reinforcement learning algorithm that exploits available traffic data at each road junction. On the other hand, a bank of model predictive controllers is in charge of computing the most adequate control action for each involved vehicle. These tasks are combined into a single framework: the deep reinforcement learning output (action) is translated into a set-point to be tracked by the model predictive controller; conversely, the current vehicle position, resulting from the application of the control move, is exploited by the deep reinforcement learning unit for improving its reliability. The main novelty of the proposed solution lies in its hybrid nature: on one hand it fully exploits deep reinforcement learning capabilities for decision-making purposes; on the other hand, time-varying hard constraints are always satisfied during the dynamical platoon evolution imposed by the computed routing decisions. To efficiently evaluate the performance of the proposed control architecture, a co-design procedure, involving the SUMO and MATLAB platforms, is implemented so that complex operating environments can be used, and the information coming from road maps (links, junctions, obstacles, semaphores, etc.) and vehicle state trajectories can be shared and exchanged. Finally, by considering as operating scenario a real entire city block and a platoon of eleven vehicles described by double-integrator models, several simulations have been performed with the aim of highlighting the main features of the proposed approach. Moreover, it is important to underline that in different operating scenarios the proposed reinforcement learning scheme is capable of significantly reducing traffic congestion phenomena when compared with well-reputed competitors.
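The closed loop described in the abstract (routing decision, set-point, constrained tracking, position feedback) can be illustrated with a minimal Python sketch. Everything below — the toy untrained Q-table, the gains, and the saturated PD controller standing in for the MPC bank — is an assumption for illustration, not the authors' code.

```python
# Minimal sketch of the RL-routing / set-point-tracking loop on a double integrator.
import numpy as np

DT = 0.1  # integration step [s]

def routing_action(q_table, junction, eps=0.1, rng=np.random.default_rng(0)):
    """Epsilon-greedy choice of the next junction from a (toy, untrained) Q-table."""
    if rng.random() < eps:
        return int(rng.integers(len(q_table[junction])))
    return int(np.argmax(q_table[junction]))

def tracking_control(state, setpoint, kp=1.0, kd=1.8, u_max=2.0):
    """Simplified set-point tracking for a double integrator (stand-in for MPC);
    the saturation mimics the hard input constraint enforced by the real controller."""
    pos, vel = state
    return float(np.clip(kp * (setpoint - pos) - kd * vel, -u_max, u_max))

def step(state, u):
    """Double-integrator dynamics: p' = v, v' = u."""
    pos, vel = state
    return np.array([pos + DT * vel, vel + DT * u])

# Toy example: one vehicle, junctions located at x = 0, 10, 20, ...
junction_pos = np.arange(0.0, 50.0, 10.0)
q_table = {j: np.zeros(len(junction_pos)) for j in range(len(junction_pos))}
state, current = np.array([0.0, 0.0]), 0
for _ in range(5):
    nxt = routing_action(q_table, current)      # RL decision (routing)
    setpoint = junction_pos[nxt]                # translated into a set-point
    for _ in range(200):                        # low-level tracking loop
        state = step(state, tracking_control(state, setpoint))
    current = nxt                               # position fed back to the RL unit
```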
Keywords: distributed model predictive control, distributed reinforcement learning, routing decisions, urban road networks
2. Constrained Multi-Objective Optimization With Deep Reinforcement Learning Assisted Operator Selection
Authors: Fei Ming, Wenyin Gong, Ling Wang, Yaochu Jin. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 4, pp. 919-931 (13 pages)
Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention. Various constrained multi-objective optimization evolutionary algorithms (CMOEAs) have been developed with the use of different algorithmic strategies, evolutionary operators, and constraint-handling techniques. The performance of CMOEAs may be heavily dependent on the operators used; however, it is usually difficult to select suitable operators for the problem at hand. Hence, improving operator selection is promising and necessary for CMOEAs. This work proposes an online operator selection framework assisted by deep reinforcement learning. The dynamics of the population, including convergence, diversity, and feasibility, are regarded as the state; the candidate operators are considered as actions; and the improvement of the population state is treated as the reward. By using a Q-network to learn a policy that estimates the Q-values of all actions, the proposed approach can adaptively select the operator that maximizes the improvement of the population according to the current state and thereby improve algorithmic performance. The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems. The experimental results reveal that the proposed deep reinforcement learning-assisted operator selection significantly improves the performance of these CMOEAs, and the resulting algorithms obtain better versatility compared with nine state-of-the-art CMOEAs.
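The state/action/reward loop described above can be sketched as follows. The names, the linear Q-function used in place of the paper's Q-network, and the hyper-parameters are assumptions for illustration only.

```python
# Minimal sketch of DRL-assisted operator selection: state = (convergence,
# diversity, feasibility), action = candidate operator, reward = improvement.
import numpy as np

class OperatorSelector:
    def __init__(self, n_operators, n_features=3, lr=0.05, gamma=0.9, eps=0.2):
        self.W = np.zeros((n_operators, n_features))  # linear Q(s, a) = W[a] @ s
        self.lr, self.gamma, self.eps = lr, gamma, eps
        self.rng = np.random.default_rng(0)

    def select(self, state):
        if self.rng.random() < self.eps:               # exploration
            return int(self.rng.integers(self.W.shape[0]))
        return int(np.argmax(self.W @ state))          # exploitation

    def update(self, s, a, r, s_next):
        td_target = r + self.gamma * np.max(self.W @ s_next)
        td_error = td_target - self.W[a] @ s
        self.W[a] += self.lr * td_error * s            # TD(0) update

# Usage inside a (hypothetical) CMOEA generation loop:
# s = np.array([convergence, diversity, feasibility])
# a = selector.select(s); offspring = OPERATORS[a](population)
# r = improvement(population, offspring); selector.update(s, a, r, next_state)
```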
Keywords: constrained multi-objective optimization, deep Q-learning, deep reinforcement learning (DRL), evolutionary algorithms, evolutionary operator selection
3. An Optimal Control-Based Distributed Reinforcement Learning Framework for a Class of Non-Convex Objective Functionals of the Multi-Agent Network (Cited: 2)
Authors: Zhe Chen, Ning Li. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2023, No. 11, pp. 2081-2093 (13 pages)
This paper studies a novel distributed optimization problem that aims to minimize the sum of the non-convex objective functionals of the multi-agent network under privacy protection, which means that the local objective of each agent is unknown to the others. The above problem involves complexity simultaneously in the time and space aspects. Existing works on distributed optimization mainly consider privacy protection in the space aspect, where the decision variable is a vector with finite dimensions. In contrast, when the time aspect is considered in this paper, the decision variable is a continuous function of time. Hence, the minimization of the overall functional belongs to the calculus of variations. Traditional works usually aim to seek the optimal decision function. Due to privacy protection and non-convexity, the Euler-Lagrange equation of the proposed problem is a complicated partial differential equation. Hence, we seek the optimal decision derivative function rather than the decision function. This manner can be regarded as seeking the control input for an optimal control problem, for which we propose a centralized reinforcement learning (RL) framework. In the space aspect, we further present a distributed reinforcement learning framework to deal with the impact of privacy protection. Finally, rigorous theoretical analysis and simulation validate the effectiveness of our framework.
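The reformulation sketched above — trading the search for the decision function for a search over its derivative — can be written compactly as follows. This is a schematic restatement under assumed notation, not the paper's exact formulation.

```latex
% Variational form: each agent i holds a private integrand L_i, and the network
% minimizes the sum of functionals over the decision function x(t).
\min_{x(\cdot)} \; \sum_{i=1}^{N} J_i\bigl(x(\cdot)\bigr)
  = \sum_{i=1}^{N} \int_{0}^{T} L_i\bigl(x(t),\dot{x}(t),t\bigr)\,dt
% Optimal-control form: introduce the derivative as the control input u(t),
% so the problem becomes an optimal control problem that RL can address.
\quad\Longleftrightarrow\quad
\min_{u(\cdot)} \; \sum_{i=1}^{N} \int_{0}^{T} L_i\bigl(x(t),u(t),t\bigr)\,dt
\quad \text{s.t.} \quad \dot{x}(t) = u(t),\; x(0)=x_0 .
```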
Keywords: distributed optimization, multi-agent, optimal control, reinforcement learning (RL)
4. Airport gate assignment problem with deep reinforcement learning (Cited: 3)
Authors: Zhao Jiaming, Wu Wenjun, Liu Zhiming, Han Changhao, Zhang Xuanyi, Zhang Yanhua. High Technology Letters (EI, CAS), 2020, No. 1, pp. 102-107 (6 pages)
With the rapid development of air transportation in recent years, airport operations have attracted a lot of attention. Among them, the airport gate assignment problem (AGAP) has become a research hotspot. However, real-time AGAP algorithms remain an open issue. In this study, a deep reinforcement learning based AGAP (DRL-AGAP) approach is proposed. The optimization objective is to maximize the rate of flights assigned to fixed gates. The real-time AGAP is modeled as a Markov decision process (MDP), for which the state space, action space, values and rewards are defined. The DRL-AGAP algorithm is evaluated via simulation and compared with the flight pre-assignment results of the optimization software Gurobi and the Greedy method. Simulation results show that the performance of the proposed DRL-AGAP algorithm is close to that of the pre-assignment obtained by the Gurobi optimization solver. Meanwhile, the real-time assignment ability is ensured by the proposed DRL-AGAP algorithm thanks to its dynamic modeling and lower complexity.
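The MDP formulation named above can be illustrated with a toy environment skeleton. The state/reward encoding, class name, and flight model below are assumptions for illustration, not the paper's exact MDP.

```python
# Toy sketch of gate assignment as an MDP: flights arrive one by one, the action
# is a gate index, and the reward is 1 when a flight is placed at a free fixed gate.
import numpy as np

class GateAssignmentEnv:
    def __init__(self, n_gates=5, horizon=20, seed=0):
        self.n_gates, self.horizon = n_gates, horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.free_until = np.zeros(self.n_gates)       # time each gate becomes free
        self.flight = self._next_flight()
        return self._state()

    def _next_flight(self):
        arrival = self.t + self.rng.uniform(0.0, 1.0)
        return np.array([arrival, arrival + self.rng.uniform(0.5, 2.0)])  # [arrive, depart]

    def _state(self):
        return np.concatenate([self.free_until, self.flight])

    def step(self, gate):
        arrive, depart = self.flight
        reward = 0.0
        if self.free_until[gate] <= arrive:            # gate is free: assign flight
            self.free_until[gate] = depart
            reward = 1.0                               # otherwise: apron stand, no reward
        self.t += 1
        self.flight = self._next_flight()
        return self._state(), reward, self.t >= self.horizon

# env = GateAssignmentEnv(); s = env.reset()
# s, r, done = env.step(int(np.argmin(env.free_until)))  # a greedy baseline action
```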
Keywords: airport gate assignment problem (AGAP), deep reinforcement learning (DRL), Markov decision process (MDP)
5. Active control of flow past an elliptic cylinder using an artificial neural network trained by deep reinforcement learning (Cited: 1)
Authors: Bofu WANG, Qiang WANG, Quan ZHOU, Yulu LIU. Applied Mathematics and Mechanics (English Edition) (SCIE, EI, CSCD), 2022, No. 12, pp. 1921-1934 (14 pages)
The active control of flow past an elliptical cylinder using the deep reinforcement learning (DRL) method is conducted. The axis ratio of the elliptical cylinder Γ varies from 1.2 to 2.0, and four angles of attack α = 0°, 15°, 30°, and 45° are taken into consideration for a fixed Reynolds number Re = 100. The mass flow rates of two synthetic jets imposed at different positions of the cylinder, θ1 and θ2, are trained to control the flow. The optimal jet placement that achieves the highest drag reduction is determined for each case. For a low axis ratio ellipse, i.e., Γ = 1.2, the controlled results at α = 0° are similar to those for a circular cylinder with control jets applied at θ1 = 90° and θ2 = 270°. It is found that either applying the jets asymmetrically or increasing the angle of attack can achieve a higher drag reduction rate, which, however, is accompanied by increased fluctuation. The control jets elongate the vortex shedding and reduce the pressure drop. Meanwhile, the flow topology is modified at a high angle of attack. For an ellipse with a relatively higher axis ratio, i.e., Γ ≥ 1.6, drag reduction is achieved for all the angles of attack studied. The larger the angle of attack is, the higher the drag reduction ratio is. Increased fluctuation in the drag coefficient under control is encountered, regardless of the position of the control jets. The control jets modify the flow topology by inducing an external vortex near the wall, causing the drag reduction. The results suggest that DRL can learn an active control strategy for the present configuration.
Keywords: drag reduction, deep reinforcement learning (DRL), elliptical cylinder, active control
6. A new accelerating algorithm for multi-agent reinforcement learning (Cited: 1)
Authors: 张汝波, 仲宇, 顾国昌. Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2005, No. 1, pp. 48-51 (4 pages)
In multi-agent systems, joint actions must be employed to achieve cooperation because the evaluation of the behavior of an agent often depends on the other agents' behaviors. However, joint-action reinforcement learning algorithms suffer from a slow convergence rate because of the enormous learning space produced by joint actions. In this article, a prediction-based reinforcement learning algorithm is presented for multi-agent cooperation tasks, which demands that all agents learn to predict the probabilities of the actions that other agents may execute. A multi-robot cooperation experiment is run to test the efficacy of the new algorithm, and the results show that the new algorithm can achieve the cooperation policy much faster than the primitive reinforcement learning algorithm.
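The prediction idea above can be sketched in tabular form. The update rule, array sizes, and Laplace-smoothed counting are assumptions made for illustration, not the authors' exact algorithm.

```python
# Minimal sketch: each agent keeps empirical estimates of the other agent's
# action probabilities and backs up the expected Q-value over those predictions
# instead of searching the full joint-action space blindly.
import numpy as np

n_states, n_my_actions, n_other_actions = 4, 3, 3
Q = np.zeros((n_states, n_my_actions, n_other_actions))   # joint-action values
counts = np.ones((n_states, n_other_actions))             # Laplace-smoothed counts

def predicted_probs(s):
    """Empirical probability of the other agent's actions in state s."""
    return counts[s] / counts[s].sum()

def expected_q(s):
    """Q-values of my actions, averaged over the predicted behavior of the other agent."""
    return Q[s] @ predicted_probs(s)

def update(s, a_mine, a_other, r, s_next, alpha=0.1, gamma=0.95):
    counts[s, a_other] += 1                                # refine the prediction
    target = r + gamma * np.max(expected_q(s_next))
    Q[s, a_mine, a_other] += alpha * (target - Q[s, a_mine, a_other])

# a_mine = int(np.argmax(expected_q(s)))  # greedy action against the prediction
```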
Keywords: distributed reinforcement learning, accelerating algorithm, machine learning, multi-agent system
7. Distributed Asynchronous Learning for Multipath Data Transmission Based on P-DDQN (Cited: 1)
Authors: Kang Liu, Wei Quan, Deyun Gao, Chengxiao Yu, Mingyuan Liu, Yuming Zhang. China Communications (SCIE, CSCD), 2021, No. 8, pp. 62-74 (13 pages)
Adaptive packet scheduling can efficiently enhance the performance of multipath data transmission. However, realizing precise packet scheduling is challenging due to the highly dynamic and unpredictable nature of network link states. To this end, this paper proposes a distributed asynchronous deep reinforcement learning framework to intensify the dynamics and prediction of adaptive packet scheduling. Our framework contains two parts: local asynchronous packet scheduling and a distributed cooperative control center. In local asynchronous packet scheduling, an asynchronous prioritized replay double deep Q-learning packet scheduling algorithm is proposed for dynamic adaptive packet scheduling learning, which combines a prioritized replay double deep Q-learning network (P-DDQN) to perform the fitting analysis. In the distributed cooperative control center, a distributed scheduling learning and neural fitting acceleration algorithm is used to adaptively update the neural network parameters of P-DDQN for more precise packet scheduling. Experimental results show that our solution achieves better throughput and loss ratio than the Random-weight and Round-Robin algorithms. Furthermore, our solution is 1.32 times and 1.54 times better than the Random-weight and Round-Robin algorithms, respectively, in terms of the stability of multipath data transmission.
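The two standard ingredients behind P-DDQN — the double-DQN target and proportional prioritization — can be sketched as follows. The shapes and names are assumptions; random numbers stand in for network outputs.

```python
# Double-DQN target: the online network selects the next action, the target
# network evaluates it. Prioritized replay samples transitions in proportion
# to their absolute TD error.
import numpy as np

def double_dqn_target(r, q_online_next, q_target_next, done, gamma=0.99):
    """r, done: (batch,); q_*_next: (batch, n_actions) Q-values at s'."""
    a_star = np.argmax(q_online_next, axis=1)            # chosen by the online net
    q_eval = q_target_next[np.arange(len(r)), a_star]    # evaluated by the target net
    return r + gamma * (1.0 - done) * q_eval

def priorities(td_errors, alpha=0.6, eps=1e-5):
    """Proportional prioritization: larger TD error -> more likely to be replayed."""
    return (np.abs(td_errors) + eps) ** alpha

# Example with placeholder numbers standing in for network outputs:
rng = np.random.default_rng(0)
target = double_dqn_target(rng.random(4), rng.random((4, 3)), rng.random((4, 3)),
                           np.zeros(4))
p = priorities(target - rng.random(4))
sample_prob = p / p.sum()                                 # replay sampling distribution
```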
Keywords: distributed asynchronous learning, multipath data transmission, deep reinforcement learning
8. Navigation Method Based on Improved Rapid Exploration Random Tree Star-Smart (RRT*-Smart) and Deep Reinforcement Learning (Cited: 1)
Authors: ZHANG Jue, LI Xiangjian, LIU Xiaoyan, LI Nan, YANG Kaiqiang, ZHU Heng. Journal of Donghua University (English Edition) (CAS), 2022, No. 5, pp. 490-495 (6 pages)
A large number of logistics operations are needed to transport fabric rolls and dye barrels to different positions in printing and dyeing plants, and increasing labor costs are making it difficult for plants to recruit workers for these manual operations. Artificial intelligence and robotics, which are rapidly evolving, offer potential solutions to this problem. In this paper, a navigation method is presented that addresses the inability to pass smoothly around corners in practice and the problem of local obstacle avoidance. In the system, a Gaussian fitting smoothing rapid exploration random tree star-smart (GFS RRT*-Smart) algorithm is proposed for global path planning, which enhances performance when the robot makes sharp turns around corners. For local obstacle avoidance, a deep reinforcement learning determiner mixed actor critic (MAC) algorithm is used for obstacle avoidance decisions. The navigation system is implemented in a scaled-down simulation factory.
Keywords: rapid exploration random tree star-smart (RRT*-Smart), Gaussian fitting, deep reinforcement learning (DRL), mixed actor critic (MAC)
9. Cooperative Multi-Agent Reinforcement Learning with Constraint-Reduced DCOP
Authors: Yi Xie, Zhongyi Liu, Zhao Liu, Yijun Gu. Journal of Beijing Institute of Technology (EI, CAS), 2017, No. 4, pp. 525-533 (9 pages)
Cooperative multi-agent reinforcement learning (MARL) is an important topic in the field of artificial intelligence, in which distributed constraint optimization (DCOP) algorithms have been widely used to coordinate the actions of multiple agents. However, dense communication among agents affects the practicability of DCOP algorithms. In this paper, we propose a novel DCOP algorithm that addresses the communication problem of previous DCOP algorithms by reducing constraints. The contributions of this paper are primarily threefold: (1) it is proved that removing constraints can effectively reduce the communication burden of DCOP algorithms; (2) a criterion is provided to identify insignificant constraints whose elimination does not have a great impact on the performance of the whole system; (3) a constraint-reduced DCOP algorithm is proposed that adopts a variant of the spectral clustering algorithm to detect and eliminate the insignificant constraints. Our algorithm reduces the communication burden of the benchmark DCOP algorithm while keeping its overall performance unaffected. The performance of the constraint-reduced DCOP algorithm is evaluated on four configurations of cooperative sensor networks. The effectiveness of communication reduction is also verified by comparisons between the constraint-reduced DCOP and the benchmark DCOP.
Keywords: reinforcement learning, cooperative multi-agent system, distributed constraint optimization (DCOP), constraint-reduced DCOP
10. Hierarchical reinforcement learning guidance with threat avoidance
Authors: LI Bohao, WU Yunjie, LI Guofei. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2022, No. 5, pp. 1173-1185 (13 pages)
The guidance strategy is an extremely critical factor in determining the striking effect of a missile operation. A novel guidance law is presented by exploiting deep reinforcement learning (DRL) with a hierarchical deep deterministic policy gradient (DDPG) algorithm. The reward functions are constructed to minimize the line-of-sight (LOS) angle rate and avoid the threat caused by the opposed obstacles. To attenuate the chattering of the acceleration, a hierarchical reinforcement learning structure and an improved reward function with an action penalty are put forward. The simulation results validate that the missile under the proposed method can hit the target successfully and keep away from the threatened areas effectively.
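A schematic reward of the kind described above might combine three terms: a LOS-rate penalty, a threat-area penalty, and an action penalty against chattering. The weights, the threat model, and the function shape below are assumptions, not the paper's exact construction.

```python
# Schematic reward: penalize LOS angle rate, entering threat areas, and large
# acceleration commands (action penalty to attenuate chattering).
import numpy as np

def guidance_reward(los_rate, position, acceleration,
                    threats=((np.array([5e3, 2e3]), 1e3),),   # (center, radius) pairs
                    w_los=1.0, w_threat=10.0, w_action=0.01):
    r = -w_los * abs(los_rate)                    # keep the line-of-sight rate small
    for center, radius in threats:                # stay outside every threatened area
        if np.linalg.norm(position - center) < radius:
            r -= w_threat
    r -= w_action * float(np.dot(acceleration, acceleration))  # action penalty
    return r

# guidance_reward(0.02, np.array([4.8e3, 1.9e3]), np.array([3.0, -1.0]))
```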
Keywords: guidance law, deep reinforcement learning (DRL), threat avoidance, hierarchical reinforcement learning
11. Day-ahead scheduling based on reinforcement learning with hybrid action space
Authors: CAO Jingyu, DONG Lu, SUN Changyin. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2022, No. 3, pp. 693-705 (13 pages)
Driven by the improvement of the smart grid, the active distribution network (ADN) has attracted much attention due to its characteristic of active management. By making full use of electricity price signals for optimal scheduling, the total cost of the ADN can be reduced. However, the optimal day-ahead scheduling problem is challenging since the future electricity price is unknown. Moreover, in an ADN, some schedulable variables are continuous while others are discrete, which increases the difficulty of determining the optimal scheduling scheme. In this paper, the day-ahead scheduling problem of the ADN is formulated as a Markov decision process (MDP) with a continuous-discrete hybrid action space. Then, an algorithm based on multi-agent hybrid reinforcement learning (HRL) is proposed to obtain the optimal scheduling scheme. The proposed algorithm adopts the structure of centralized training and decentralized execution, and different methods are applied to determine the selection policy of continuous scheduling variables and discrete scheduling variables. The simulation experiment results demonstrate the effectiveness of the algorithm.
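One common way to act in a continuous-discrete hybrid action space is to let an actor produce the continuous variables and a Q-head choose the discrete ones; a minimal sketch of that pattern follows. The toy linear networks with random weights and all names are assumptions, not the paper's architecture.

```python
# Minimal sketch of hybrid action selection: continuous part from a deterministic
# actor (with exploration noise), discrete part from argmax over Q-values.
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_cont, n_disc = 8, 2, 3
W_actor = rng.normal(scale=0.1, size=(n_cont, state_dim))   # toy linear "actor"
W_critic = rng.normal(scale=0.1, size=(n_disc, state_dim))  # toy linear Q-head

def hybrid_action(state, exploration_std=0.05):
    cont = np.tanh(W_actor @ state)                  # continuous part in [-1, 1]
    cont = np.clip(cont + rng.normal(scale=exploration_std, size=n_cont), -1, 1)
    disc = int(np.argmax(W_critic @ state))          # discrete part (e.g., switch on/off)
    return cont, disc

state = rng.normal(size=state_dim)
continuous_setpoints, discrete_choice = hybrid_action(state)
```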
Keywords: day-ahead scheduling, active distribution network (ADN), reinforcement learning, hybrid action space
12. Optimizing MDS-coded cache-enabled wireless network: a blockchain-based cooperative deep reinforcement learning approach
Authors: Zhang Zheng, Yang Ruizhe, Yu Fei Richard, Zhang Yanhua, Li Meng. High Technology Letters (EI, CAS), 2021, No. 2, pp. 129-138 (10 pages)
Mobile distributed caching (MDC) as an emerging technology has drawn attention for its ability to shorten the distance between users and data in the wireless network. However, the MDC network state in existing work is always assumed to be either static or updated in real time. To be more realistic, a periodically updated wireless network using maximum distance separable (MDS)-coded distributed caching is studied, in each period of which devices may arrive and leave. For the efficient optimization of such a large-scale system, this work proposes a blockchain-based cooperative deep reinforcement learning (DRL) approach, which enhances the efficiency of learning through cooperation and guarantees the security of that cooperation by a practical Byzantine fault tolerance (PBFT)-based blockchain mechanism. Numerical results are presented, illustrating that the proposed scheme can dramatically reduce the total file download delay in the caching network under the guarantee of security and efficiency.
Keywords: caching technology, blockchain, deep reinforcement learning (DRL)
13. Deep reinforcement learning for UAV swarm rendezvous behavior
Authors: ZHANG Yaozhong, LI Yike, WU Zhuoran, XU Jialin. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2023, No. 2, pp. 360-373 (14 pages)
Unmanned aerial vehicle (UAV) swarm technology is one of the research hotspots of recent years. With the continuous improvement of the autonomous intelligence of UAVs, swarm technology will become one of the main trends of UAV development in the future. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q network (DDQN) algorithm. We design a guided reward function to effectively solve the convergence problem caused by sparse returns in deep reinforcement learning (DRL) for long-duration tasks. We also propose the concept of a temporary storage area, optimizing the memory replay unit of the traditional DDQN algorithm, improving the convergence speed of the algorithm, and speeding up the training process. Different from the traditional task environment, this paper establishes a continuous state-space task environment model to improve the authentication process of the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm in different task scenarios are trained. The experimental results validate that the DDQN algorithm is efficient in training the UAV swarm to complete the given collaborative tasks while meeting the requirements of the swarm for centralization and autonomy, and it improves the intelligence of collaborative task execution. The simulation results show that, after training, the proposed UAV swarm can carry out the rendezvous task well, and the success rate of the mission reaches 90%.
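The replay-memory modification described above can be sketched as a buffer that holds an episode in a temporary area, applies a guided (shaped) bonus, and then flushes it into the main memory. The buffer sizes, bonus rule, and class name are assumptions for illustration only.

```python
# Minimal sketch of a "temporary storage area" on top of a standard replay memory.
from collections import deque
import numpy as np

class TemporaryStorageReplay:
    def __init__(self, capacity=50_000):
        self.memory = deque(maxlen=capacity)   # main replay memory used for training
        self.temp = []                         # temporary storage area for one episode

    def store(self, transition):
        self.temp.append(transition)           # (s, a, r, s_next, done)

    def end_episode(self, episode_return, bonus_scale=0.1):
        """Distribute a guided bonus over the episode, then flush to main memory."""
        bonus = bonus_scale * episode_return / max(len(self.temp), 1)
        for (s, a, r, s_next, done) in self.temp:
            self.memory.append((s, a, r + bonus, s_next, done))
        self.temp.clear()

    def sample(self, batch_size, rng=np.random.default_rng(0)):
        idx = rng.integers(len(self.memory), size=batch_size)
        return [self.memory[i] for i in idx]
```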
Keywords: double deep Q network (DDQN) algorithm, unmanned aerial vehicle (UAV) swarm, task decision, deep reinforcement learning (DRL), sparse returns
14. Study and application of reinforcement learning based on DAI in cooperative strategy of robot soccer
Authors: 郭琦, 张达志, 杨永田. Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2009, No. 4, pp. 513-519 (7 pages)
A dynamic cooperation model of multi-agent systems is established by combining reinforcement learning with distributed artificial intelligence (DAI), in which the concept of individual optimization loses its meaning because each agent's payoff depends both on the agent itself and on the choices of the other agents. Utilizing the idea of DAI, the intellectual unit of each robot, and the changes of task and environment, each agent can make decisions independently and finish various complicated tasks through communication and reciprocation with the others. The method is superior to other reinforcement learning methods commonly used in multi-agent systems. It can improve the convergence speed of reinforcement learning, decrease the memory requirements, and enhance the computing and logical reasoning capability of the agents. The result of a simulated robot soccer match proves that the proposed cooperative strategy is valid.
Keywords: robot soccer, reinforcement learning, cooperative strategy, distributed artificial intelligence
15. A dynamic fusion path planning algorithm for mobile robots incorporating improved IB-RRT* and deep reinforcement learning
Authors: 刘安东, ZHANG Baixin, CUI Qi, ZHANG Dan, NI Hongjie. High Technology Letters (EI, CAS), 2023, No. 4, pp. 365-376 (12 pages)
Dynamic path planning is crucial for mobile robots to navigate successfully in unstructured environments. To achieve a globally optimal path and real-time dynamic obstacle avoidance during movement, a dynamic path planning algorithm incorporating improved IB-RRT* and deep reinforcement learning (DRL) is proposed. Firstly, an improved IB-RRT* algorithm is proposed for global path planning by combining double elliptic subset sampling and probabilistic central circle target bias. Then, to tackle the slow response to dynamic obstacles and inadequate obstacle avoidance of traditional local path planning algorithms, deep reinforcement learning is utilized to predict the movement trend of dynamic obstacles, leading to a dynamic fusion path planning. Finally, the simulation and experiment results demonstrate that the proposed improved IB-RRT* algorithm has higher convergence speed and search efficiency compared with the traditional Bi-RRT*, Informed-RRT*, and IB-RRT* algorithms. Furthermore, the proposed fusion algorithm can effectively perform real-time obstacle avoidance and navigation tasks for mobile robots in unstructured environments.
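For context, the standard informed elliptical subset sampling that IB-RRT*-style planners build on is sketched below in 2-D; the paper's "double elliptic subset sampling" is a variant of this idea, and the variable names are assumptions. Once a path of cost c_best is known, new samples are drawn only from the ellipse whose foci are the start and goal and whose major axis has length c_best.

```python
# Single-ellipse informed sampling (basic version underlying informed RRT* planners).
import numpy as np

def informed_sample(start, goal, c_best, rng=np.random.default_rng(0)):
    c_min = np.linalg.norm(goal - start)                 # focal distance
    center = (start + goal) / 2.0
    # Rotation aligning the x-axis with the start-goal direction.
    a1 = (goal - start) / c_min
    rot = np.array([[a1[0], -a1[1]], [a1[1], a1[0]]])
    # Ellipse radii: r1 along the major axis, r2 along the minor axis.
    r1 = c_best / 2.0
    r2 = np.sqrt(max(c_best**2 - c_min**2, 0.0)) / 2.0
    # Uniform sample in the unit disc, then stretch, rotate and translate.
    theta = rng.uniform(0.0, 2.0 * np.pi)
    rad = np.sqrt(rng.uniform())
    unit = rad * np.array([np.cos(theta), np.sin(theta)])
    return center + rot @ (np.array([r1, r2]) * unit)

# informed_sample(np.array([0.0, 0.0]), np.array([10.0, 0.0]), c_best=12.0)
```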
Keywords: mobile robot, improved IB-RRT* algorithm, deep reinforcement learning (DRL), real-time dynamic obstacle avoidance
16. Distributional Reinforcement Learning with Quantum Neural Networks
Authors: Wei Hu, James Hu. Intelligent Control and Automation, 2019, No. 2, pp. 63-78 (16 pages)
Traditional reinforcement learning (RL) uses the return, also known as the expected value of cumulative random rewards, for training an agent to learn an optimal policy. However, recent research indicates that learning the distribution over returns has distinct advantages over learning their expected value, as seen in different RL tasks. The shift from using the expectation of returns in traditional RL to the distribution over returns in distributional RL has provided new insights into the dynamics of RL. This paper builds on our recent work investigating the quantum approach towards RL. Our work implements quantile regression (QR) distributional Q learning with a quantum neural network. This quantum network is evaluated in a grid world environment with different numbers of quantiles, illustrating their detailed influence on the learning of the algorithm. It is also compared with standard quantum Q learning in a Markov decision process (MDP) chain, which demonstrates that quantum QR distributional Q learning can explore the environment more efficiently than standard quantum Q learning. Efficient exploration and balancing of exploitation and exploration are major challenges in RL. Previous work has shown that more informative actions can be taken with a distributional perspective. Our findings suggest another cause for its success: the enhanced performance of distributional RL can be partially attributed to its superior ability to efficiently explore the environment.
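A classical, tabular sketch of quantile-regression (QR) distributional Q learning is shown below; the paper represents the quantiles with a quantum neural network, whereas here plain numpy arrays are used and all sizes are assumptions. Each state-action pair stores N quantile estimates of the return, and Q(s, a) is the mean of the quantiles.

```python
# Tabular QR distributional Q learning sketch (classical stand-in for the
# quantum network in the paper).
import numpy as np

n_states, n_actions, n_quantiles = 5, 2, 8
theta = np.zeros((n_states, n_actions, n_quantiles))       # quantile locations
taus = (np.arange(n_quantiles) + 0.5) / n_quantiles        # quantile midpoints

def q_values(s):
    return theta[s].mean(axis=1)                            # expectation over quantiles

def qr_update(s, a, r, s_next, done, alpha=0.1, gamma=0.99,
              rng=np.random.default_rng(0)):
    a_next = int(np.argmax(q_values(s_next)))
    # Sample one target quantile (a common simplification of the full pairwise loss).
    j = rng.integers(n_quantiles)
    target = r + (0.0 if done else gamma * theta[s_next, a_next, j])
    # Quantile-regression step: each quantile moves toward/away from the target
    # with asymmetric step sizes tau and (tau - 1).
    indicator = (target < theta[s, a]).astype(float)
    theta[s, a] += alpha * (taus - indicator)

# qr_update(s=0, a=1, r=1.0, s_next=2, done=False)
```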
Keywords: continuous-variable quantum computers, quantum reinforcement learning, distributional reinforcement learning, quantile regression distributional Q learning, grid world environment, MDP chain environment
17. Federated Reinforcement Learning with Adaptive Training Times for Edge Caching
Authors: Shaoshuai Fan, Liyun Hu, Hui Tian. China Communications (SCIE, CSCD), 2022, No. 8, pp. 57-72 (16 pages)
To relieve the backhaul link stress and reduce the content acquisition delay, mobile edge caching has become one of the promising approaches. In this paper, a novel federated reinforcement learning (FRL) method with adaptive training times is proposed for edge caching. Through a new federated learning process with an asynchronous model training process and a synchronous global aggregation process, the proposed FRL-based edge caching algorithm mitigates the performance degradation brought by the non-identically and independently distributed (non-i.i.d.) characteristics of content popularity among edge nodes. The theoretical bound of the loss function difference is analyzed in the paper, based on which a training-times adaptation mechanism is proposed to deal with the trade-off between local training and global aggregation for each edge node in the federation. Numerical simulations have verified that the proposed FRL-based edge caching method outperforms other baseline methods in terms of caching benefit, cache hit ratio, and convergence speed.
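The asynchronous-local / synchronous-global structure described above can be sketched as follows. The weighted-averaging rule, the toy adaptation heuristic, and all names are assumptions; the paper derives its adaptation from a theoretical bound rather than this heuristic.

```python
# Minimal sketch: each edge node trains locally for its own number of steps,
# then a synchronous aggregation averages the local parameters by data size.
import numpy as np

def local_train(weights, grad_fn, n_local_steps, lr=0.01):
    """Asynchronous local phase: n_local_steps gradient steps on local data."""
    w = weights.copy()
    for _ in range(n_local_steps):
        w -= lr * grad_fn(w)
    return w

def global_aggregate(local_weights, data_sizes):
    """Synchronous phase: federated averaging weighted by local data size."""
    sizes = np.asarray(data_sizes, dtype=float)
    return np.average(np.stack(local_weights), axis=0, weights=sizes / sizes.sum())

def adapt_training_times(prev_steps, local_loss_drop, min_steps=1, max_steps=20):
    """Toy adaptation rule: train longer locally while the local loss keeps improving."""
    return int(np.clip(prev_steps + (1 if local_loss_drop > 0 else -1),
                       min_steps, max_steps))

# Example round with a quadratic toy objective per node (non-i.i.d. local optima):
rng = np.random.default_rng(0)
global_w = np.zeros(4)
targets = [rng.normal(size=4) for _ in range(3)]
locals_ = [local_train(global_w, lambda w, t=t: w - t, n)
           for t, n in zip(targets, [5, 8, 3])]
global_w = global_aggregate(locals_, data_sizes=[100, 250, 80])
```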
Keywords: edge caching, federated reinforcement learning (FRL), non-identically and independently distributed (non-i.i.d.)
18. Boundary Data Augmentation for Offline Reinforcement Learning
Authors: SHEN Jiahao, JIANG Ke, TAN Xiaoyang. ZTE Communications, 2023, No. 3, pp. 29-36 (8 pages)
Offline reinforcement learning (ORL) aims to learn a rational agent purely from behavior data without any online interaction. One of the major challenges encountered in ORL is the problem of distribution shift, i.e., the mismatch between the knowledge of the learned policy and the reality of the underlying environment. Recent works usually handle this in an overly pessimistic manner to avoid out-of-distribution (OOD) queries as much as possible, but this can harm the robustness of the agents at unseen states. In this paper, we propose a simple but effective method to address this issue. The key idea of our method is to enhance the robustness of the new policy learned offline by weakening its confidence in highly uncertain regions. We propose to find those regions by simulating them with modified generative adversarial nets (GANs), such that the generated data not only follow the same distribution as the old experience but are also very difficult to deal with by themselves, with regard to the behavior policy or some other reference policy. We then use this information to regularize the ORL algorithm to penalize overconfident behavior in these regions. Extensive experiments on several publicly available offline RL benchmarks demonstrate the feasibility and effectiveness of the proposed method.
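A schematic form of such a regularized objective is sketched below: the usual TD loss on logged data plus a penalty on the Q-values at generator-produced "boundary" states, pushing confidence down in uncertain regions. The loss form, weighting, and names are assumptions, not the paper's exact objective.

```python
# Schematic regularized ORL loss: TD loss + overconfidence penalty on generated states.
import numpy as np

def td_loss(q_pred, q_target):
    return float(np.mean((q_pred - q_target) ** 2))

def overconfidence_penalty(q_on_generated):
    """Mean Q-value on generated boundary states; minimizing it weakens confidence there."""
    return float(np.mean(q_on_generated))

def regularized_orl_loss(q_pred, q_target, q_on_generated, lam=1.0):
    return td_loss(q_pred, q_target) + lam * overconfidence_penalty(q_on_generated)

# Example with placeholder numbers standing in for network outputs:
rng = np.random.default_rng(0)
loss = regularized_orl_loss(rng.random(32), rng.random(32), rng.random(32), lam=0.5)
```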
Keywords: offline reinforcement learning, out-of-distribution state, robustness, uncertainty
19. Distributed Deep Reinforcement Learning: A Survey and a Multi-player Multi-agent Learning Toolbox
Authors: Qiyue Yin, Tongtong Yu, Shengqi Shen, Jun Yang, Meijing Zhao, Wancheng Ni, Kaiqi Huang, Bin Liang, Liang Wang. Machine Intelligence Research (EI, CSCD), 2024, No. 3, pp. 411-430 (20 pages)
With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems. Despite its reputation, the data inefficiency caused by its trial-and-error learning mechanism makes deep reinforcement learning difficult to apply in a wide range of areas. Many methods have been developed for sample-efficient deep reinforcement learning, such as environment modelling, experience transfer, and distributed modifications, among which distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming and intelligent transportation. In this paper, we review the state of this exciting field, comparing the classical distributed deep reinforcement learning methods and studying the important components needed to achieve efficient distributed learning, covering single-player single-agent distributed deep reinforcement learning up to the most complex multiple-player multiple-agent settings. Furthermore, we review recently released toolboxes that help to realize distributed deep reinforcement learning without many modifications of their non-distributed versions. By analysing their strengths and weaknesses, a multi-player multi-agent distributed deep reinforcement learning toolbox is developed and released, which is further validated on Wargame, a complex environment, showing the usability of the proposed toolbox for multi-player multi-agent distributed deep reinforcement learning in complex games. Finally, we point out challenges and future trends, hoping that this brief review can provide a guide or a spark for researchers who are interested in distributed deep reinforcement learning.
Keywords: deep reinforcement learning, distributed machine learning, self-play, population-play, toolbox
20. Automatic Generation Control in a Distributed Power Grid Based on Multi-step Reinforcement Learning
Authors: Wenmeng Zhao, Tuo Zeng, Zhihong Liu, Lihui Xie, Lei Xi, Hui Ma. Protection and Control of Modern Power Systems (SCIE, EI), 2024, No. 4, pp. 39-50 (12 pages)
The increasing use of renewable energy in the power system results in strong stochastic disturbances and degrades the control performance of distributed power grids. In this paper, a novel multi-agent collaborative reinforcement learning algorithm with automatic optimization, namely Dyna-DQL, is proposed to quickly achieve an optimal coordination solution for multi-area distributed power grids. The proposed Dyna framework is combined with double Q-learning to collect and store the environmental samples. This allows the agents to be iteratively updated through buffer replay and real-time data, so that the environmental data can be fully used to enhance the learning speed of the agents. This mitigates the negative impact of the heavy stochastic disturbances caused by the integration of renewable energy on the control performance. Simulations are conducted on two different models to validate the effectiveness of the proposed algorithm. The results demonstrate that the proposed Dyna-DQL algorithm exhibits superior stability and robustness compared with other reinforcement learning algorithms.
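The combination of the Dyna framework with double Q-learning can be sketched in tabular form: real transitions update two Q-tables and are stored in a learned model buffer, and extra planning updates replay simulated transitions from that buffer. The model-learning and planning details below are assumptions for illustration, not the paper's Dyna-DQL.

```python
# Tabular sketch of Dyna-style planning on top of double Q-learning.
import numpy as np

n_states, n_actions = 10, 4
QA = np.zeros((n_states, n_actions))
QB = np.zeros((n_states, n_actions))
model = {}                                   # (s, a) -> (r, s_next) learned model
rng = np.random.default_rng(0)

def double_q_update(s, a, r, s_next, alpha=0.1, gamma=0.95):
    # Randomly choose which table to update; the other evaluates the bootstrap action.
    if rng.random() < 0.5:
        a_star = int(np.argmax(QA[s_next]))
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:
        a_star = int(np.argmax(QB[s_next]))
        QB[s, a] += alpha * (r + gamma * QA[s_next, a_star] - QB[s, a])

def dyna_step(s, a, r, s_next, n_planning=10):
    double_q_update(s, a, r, s_next)         # learning from real experience
    model[(s, a)] = (r, s_next)              # store/refresh the model
    for _ in range(n_planning):              # planning: replay simulated experience
        (ps, pa), (pr, ps_next) = list(model.items())[rng.integers(len(model))]
        double_q_update(ps, pa, pr, ps_next)

# dyna_step(s=0, a=1, r=0.5, s_next=3)
```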
Keywords: automatic generation control, Dyna framework, distributed power grid, multi-agent, model-based reinforcement learning