Journal Articles
406 articles found
1. Constrained Multi-Objective Optimization With Deep Reinforcement Learning Assisted Operator Selection
Authors: Fei Ming, Wenyin Gong, Ling Wang, Yaochu Jin. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 4, pp. 919-931 (13 pages).
Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention. Various constrained multi-objective optimization evolutionary algorithms (CMOEAs) have been developed with the use of different algorithmic strategies, evolutionary operators, and constraint-handling techniques. The performance of CMOEAs may be heavily dependent on the operators used; however, it is usually difficult to select suitable operators for the problem at hand. Hence, improving operator selection is promising and necessary for CMOEAs. This work proposes an online operator selection framework assisted by deep reinforcement learning. The dynamics of the population, including convergence, diversity, and feasibility, are regarded as the state; the candidate operators are considered as actions; and the improvement of the population state is treated as the reward. By using a Q-network to learn a policy that estimates the Q-values of all actions, the proposed approach can adaptively select the operator that maximizes the improvement of the population according to the current state, thereby improving algorithmic performance. The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems. The experimental results reveal that the proposed deep reinforcement learning-assisted operator selection significantly improves the performance of these CMOEAs, and the resulting algorithms obtain better versatility compared with nine state-of-the-art CMOEAs.
Keywords: constrained multi-objective optimization; deep Q-learning; deep reinforcement learning (DRL); evolutionary algorithms; evolutionary operator selection
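The state-action-reward mapping described in this abstract can be sketched with a tabular stand-in for the Q-network. Everything below is an illustrative assumption, not the paper's implementation: the operator names, the coarse state descriptor, and the learning constants are invented, and a dictionary of Q-values replaces the deep network.

```python
OPERATORS = ["DE/rand/1", "DE/best/1", "GA/SBX"]  # hypothetical candidate operators

class OperatorSelector:
    """Tabular stand-in for the paper's Q-network: (state, operator) -> Q-value."""
    def __init__(self, n_actions, alpha=0.5, gamma=0.9):
        self.q = {}                       # maps (state, action) -> estimated Q-value
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma

    def select(self, state):
        # greedy choice over the estimated Q-values of all candidate operators
        return max(range(self.n_actions), key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # standard one-step Q-learning backup
        best_next = max(self.q.get((next_state, a), 0.0) for a in range(self.n_actions))
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (reward + self.gamma * best_next - old)

sel = OperatorSelector(len(OPERATORS))
state = "low-diversity"                   # coarse population descriptor (illustrative)
for _ in range(50):                       # pretend operator 1 always improves the population most
    for a in range(len(OPERATORS)):
        sel.update(state, a, 1.0 if a == 1 else 0.0, state)
print(OPERATORS[sel.select(state)])       # the selector converges on operator 1
```

In the paper's setting, the reward would come from measuring the population's convergence, diversity, and feasibility before and after applying the operator, and the tabular lookup would be replaced by a trained Q-network.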
2. A dynamic fusion path planning algorithm for mobile robots incorporating improved IB-RRT∗ and deep reinforcement learning
Authors: 刘安东 (LIU Andong), ZHANG Baixin, CUI Qi, ZHANG Dan, NI Hongjie. High Technology Letters (EI, CAS), 2023, No. 4, pp. 365-376 (12 pages).
Dynamic path planning is crucial for mobile robots to navigate successfully in unstructured environments. To achieve a globally optimal path and real-time dynamic obstacle avoidance during movement, a dynamic path planning algorithm incorporating improved IB-RRT∗ and deep reinforcement learning (DRL) is proposed. First, an improved IB-RRT∗ algorithm is proposed for global path planning by combining double elliptic subset sampling and probabilistic central circle target bias. Then, to tackle the slow response to dynamic obstacles and inadequate obstacle avoidance of traditional local path planning algorithms, deep reinforcement learning is utilized to predict the movement trend of dynamic obstacles, leading to a dynamic fusion path planning. Finally, the simulation and experiment results demonstrate that the proposed improved IB-RRT∗ algorithm has higher convergence speed and search efficiency compared with the traditional Bi-RRT∗, Informed-RRT∗, and IB-RRT∗ algorithms. Furthermore, the proposed fusion algorithm can effectively perform real-time obstacle avoidance and navigation tasks for mobile robots in unstructured environments.
Keywords: mobile robot; improved IB-RRT∗ algorithm; deep reinforcement learning (DRL); real-time dynamic obstacle avoidance
3. Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations (cited 15 times)
Authors: Dimitri P. Bertsekas. IEEE/CAA Journal of Automatica Sinica (EI, CSCD), 2019, No. 1, pp. 1-31 (31 pages).
In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
Keywords: reinforcement learning; dynamic programming; Markovian decision problems; aggregation; feature-based architectures; policy iteration; deep neural networks; rollout algorithms
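The aggregation idea in this abstract can be made concrete with a toy sketch. All data below are assumptions for illustration (a made-up 4-state, 2-action discounted MDP, hard aggregation into 2 aggregate states via a feature map, and uniform disaggregation weights); this is not Bertsekas's implementation, only the generic hard-aggregation construction.

```python
import numpy as np

gamma = 0.9
n, m = 4, 2                              # 4 original states, 2 actions (toy data)
P = [np.array([[0.9, 0.1, 0.0, 0.0],     # P[a][s, s']: transition probabilities
               [0.8, 0.2, 0.0, 0.0],
               [0.0, 0.0, 0.7, 0.3],
               [0.0, 0.0, 0.6, 0.4]]),
     np.array([[0.1, 0.9, 0.0, 0.0],
               [0.0, 0.1, 0.9, 0.0],
               [0.0, 0.0, 0.1, 0.9],
               [0.9, 0.0, 0.0, 0.1]])]
R = [np.array([1.0, 1.0, 0.0, 0.0]),     # R[a][s]: rewards
     np.array([0.0, 0.0, 2.0, 2.0])]

phi = np.array([0, 0, 1, 1])             # feature map: states {0,1} -> aggregate 0, {2,3} -> 1
D = np.array([[0.5, 0.5, 0.0, 0.0],      # disaggregation: uniform over each aggregate class
              [0.0, 0.0, 0.5, 0.5]])
Agg = np.eye(2)[phi]                     # aggregation matrix (n x 2), one-hot rows

# Build the smaller aggregate MDP and solve it by value iteration.
P_agg = [D @ P[a] @ Agg for a in range(m)]
R_agg = [D @ R[a] for a in range(m)]
V = np.zeros(2)
for _ in range(500):
    V = np.max([R_agg[a] + gamma * P_agg[a] @ V for a in range(m)], axis=0)

# Lift the aggregate values back: states sharing a feature share an approximate cost.
V_lifted = V[phi]
print(V_lifted)
```

The lifted value function is piecewise constant over the feature classes, which is exactly the nonlinear-in-features approximation the survey contrasts with linear architectures.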
4. Airport gate assignment problem with deep reinforcement learning (cited 3 times)
Authors: Zhao Jiaming, Wu Wenjun, Liu Zhiming, Han Changhao, Zhang Xuanyi, Zhang Yanhua. High Technology Letters (EI, CAS), 2020, No. 1, pp. 102-107 (6 pages).
With the rapid development of air transportation in recent years, airport operations have attracted a lot of attention. Among them, the airport gate assignment problem (AGAP) has become a research hotspot. However, real-time AGAP algorithms remain an open issue. In this study, a deep reinforcement learning based AGAP (DRL-AGAP) is proposed. The optimization objective is to maximize the rate of flights assigned to fixed gates. The real-time AGAP is modeled as a Markov decision process (MDP), and the state space, action space, values, and rewards are defined. The DRL-AGAP algorithm is evaluated via simulation and compared with the flight pre-assignment results of the Gurobi optimization software and a greedy algorithm. Simulation results show that the performance of the proposed DRL-AGAP algorithm is close to that of the pre-assignment obtained by the Gurobi optimization solver. Meanwhile, real-time assignment ability is ensured by the proposed DRL-AGAP algorithm due to its dynamic modeling and lower complexity.
Keywords: airport gate assignment problem (AGAP); deep reinforcement learning (DRL); Markov decision process (MDP)
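The optimization objective named in this abstract (maximize the fraction of flights assigned to fixed gates, with the rest going to remote stands) can be illustrated with a simple sequential greedy baseline on toy data. This is an assumed illustration of the objective, not the paper's DRL agent or its simulator; the "best fit" tie-break rule and the flight list are inventions.

```python
def assign_gates(flights, n_gates):
    """flights: list of (arrival, departure) times.
    Returns (assignment, fixed-gate rate); None means a remote stand."""
    gate_free_at = [0.0] * n_gates           # time at which each fixed gate frees up
    assignment = []
    for arrival, departure in sorted(flights):
        # greedy action: among free gates, pick the one that freed up latest ("best fit")
        candidates = [g for g in range(n_gates) if gate_free_at[g] <= arrival]
        if candidates:
            g = max(candidates, key=lambda g: gate_free_at[g])
            gate_free_at[g] = departure
            assignment.append(g)
        else:
            assignment.append(None)          # no fixed gate free: remote stand
    rate = sum(a is not None for a in assignment) / len(assignment)
    return assignment, rate

flights = [(0, 2), (1, 3), (2, 4), (5, 6)]   # toy (arrival, departure) pairs
assignment, rate = assign_gates(flights, n_gates=2)
print(assignment, rate)
```

In the DRL-AGAP formulation, the per-flight decision above becomes the MDP action, the gate occupancy becomes the state, and the fixed-gate rate drives the reward.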
5. Active control of flow past an elliptic cylinder using an artificial neural network trained by deep reinforcement learning (cited 1 time)
Authors: Bofu WANG, Qiang WANG, Quan ZHOU, Yulu LIU. Applied Mathematics and Mechanics (English Edition) (SCIE, EI, CSCD), 2022, No. 12, pp. 1921-1934 (14 pages).
The active control of flow past an elliptical cylinder using the deep reinforcement learning (DRL) method is conducted. The axis ratio of the elliptical cylinder Γ varies from 1.2 to 2.0, and four angles of attack α = 0°, 15°, 30°, and 45° are taken into consideration for a fixed Reynolds number Re = 100. The mass flow rates of two synthetic jets imposed at different positions on the cylinder, θ1 and θ2, are trained to control the flow. The optimal jet placement that achieves the highest drag reduction is determined for each case. For a low axis ratio ellipse, i.e., Γ = 1.2, the controlled results at α = 0° are similar to those for a circular cylinder with control jets applied at θ1 = 90° and θ2 = 270°. It is found that either applying the jets asymmetrically or increasing the angle of attack can achieve a higher drag reduction rate, which, however, is accompanied by increased fluctuation. The control jets elongate the vortex shedding and reduce the pressure drop. Meanwhile, the flow topology is modified at a high angle of attack. For an ellipse with a relatively higher axis ratio, i.e., Γ ≥ 1.6, drag reduction is achieved for all the angles of attack studied: the larger the angle of attack, the higher the drag reduction ratio. Increased fluctuation in the drag coefficient under control is encountered regardless of the position of the control jets. The control jets modify the flow topology by inducing an external vortex near the wall, causing the drag reduction. The results suggest that DRL can learn an active control strategy for the present configuration.
Keywords: drag reduction; deep reinforcement learning (DRL); elliptical cylinder; active control
6. Navigation Method Based on Improved Rapid Exploration Random Tree Star-Smart (RRT^(*)-Smart) and Deep Reinforcement Learning (cited 1 time)
Authors: ZHANG Jue, LI Xiangjian, LIU Xiaoyan, LI Nan, YANG Kaiqiang, ZHU Heng. Journal of Donghua University (English Edition) (CAS), 2022, No. 5, pp. 490-495 (6 pages).
A large number of logistics operations are needed to transport fabric rolls and dye barrels to different positions in printing and dyeing plants, and increasing labor cost is making it difficult for plants to recruit workers to complete manual operations. Artificial intelligence and robotics, which are rapidly evolving, offer potential solutions to this problem. In this paper, a navigation method dedicated to solving the issues of the inability to pass smoothly at corners in practice and local obstacle avoidance is presented. In the system, a Gaussian fitting smoothing rapid exploration random tree star-smart (GFS RRT^(*)-Smart) algorithm is proposed for global path planning and enhances the performance when the robot makes a sharp turn around corners. In local obstacle avoidance, a deep reinforcement learning determiner mixed actor critic (MAC) algorithm is used for obstacle avoidance decisions. The navigation system is implemented in a scaled-down simulation factory.
Keywords: rapid exploration random tree star-smart (RRT*-Smart); Gaussian fitting; deep reinforcement learning (DRL); mixed actor critic (MAC)
7. Deep reinforcement learning for UAV swarm rendezvous behavior
Authors: ZHANG Yaozhong, LI Yike, WU Zhuoran, XU Jialin. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2023, No. 2, pp. 360-373 (14 pages).
The unmanned aerial vehicle (UAV) swarm technology is one of the research hotspots of recent years. With the continuous improvement of the autonomous intelligence of UAVs, swarm technology will become one of the main trends of UAV development in the future. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q-network (DDQN) algorithm. We design a guided reward function to effectively solve the convergence problem caused by sparse returns in deep reinforcement learning (DRL) for long-period tasks. We also propose the concept of a temporary storage area, optimizing the memory replay unit of the traditional DDQN algorithm, improving the convergence speed of the algorithm, and speeding up the training process. Different from traditional task environments, this paper establishes a continuous state-space task environment model to improve the verification process of the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm in different task scenarios are trained. The experimental results validate that the DDQN algorithm is efficient in training the UAV swarm to complete the given collaborative tasks while meeting the requirements of the UAV swarm for centralization and autonomy, and improving the intelligence of collaborative task execution. The simulation results show that, after training, the proposed UAV swarm can carry out the rendezvous task well, with a mission success rate of 90%.
Keywords: double deep Q-network (DDQN) algorithm; unmanned aerial vehicle (UAV) swarm; task decision; deep reinforcement learning (DRL); sparse returns
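The DDQN algorithm named in this abstract differs from plain DQN in how the bootstrap target is formed: the online network selects the next action and the target network evaluates it, which reduces Q-value overestimation. The sketch below is the generic double-Q target computation (standard DDQN, not the paper's code); the batch values are made up for illustration.

```python
import numpy as np

def ddqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """q_*_next: (batch, n_actions) Q-values at the next state from the two networks."""
    a_star = np.argmax(q_online_next, axis=1)               # online net picks the action
    q_eval = q_target_next[np.arange(len(a_star)), a_star]  # target net evaluates it
    return rewards + gamma * q_eval * (1.0 - dones)         # no bootstrap past terminal states

# toy batch of two transitions (illustrative numbers)
q_online = np.array([[1.0, 3.0], [2.0, 0.5]])
q_target = np.array([[0.5, 1.0], [1.5, 2.0]])
targets = ddqn_targets(q_online, q_target,
                       rewards=np.array([1.0, 0.0]),
                       dones=np.array([0.0, 1.0]))
print(targets)
```

The paper's guided reward and temporary storage area change what goes into `rewards` and which transitions are replayed, but the target computation itself is this standard DDQN backup.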
8. Optimizing MDS-coded cache-enabled wireless network: a blockchain-based cooperative deep reinforcement learning approach
Authors: Zhang Zheng, Yang Ruizhe, Yu Fei Richard, Zhang Yanhua, Li Meng. High Technology Letters (EI, CAS), 2021, No. 2, pp. 129-138 (10 pages).
Mobile distributed caching (MDC) as an emerging technology has drawn attention for its ability to shorten the distance between users and data in the wireless network. However, the distributed caching (DC) network state in existing work is always assumed to be either static or updated in real time. To be more realistic, a periodically updated wireless network using maximum distance separable (MDS)-coded DC is studied, in each period of which devices may arrive and leave. For the efficient optimization of such a large-scale system, this work proposes a blockchain-based cooperative deep reinforcement learning (DRL) approach, which enhances the efficiency of learning through cooperation and guarantees security in cooperation by a practical Byzantine fault tolerance (PBFT)-based blockchain mechanism. Numerical results are presented, illustrating that the proposed scheme can dramatically reduce the total file download delay in the DC network under the guarantee of security and efficiency.
Keywords: caching technology; blockchain; deep reinforcement learning (DRL)
9. Reliable Scheduling Method for Sensitive Power Business Based on Deep Reinforcement Learning
Authors: Shen Guo, Jiaying Lin, Shuaitao Bai, Jichuan Zhang, Peng Wang. Intelligent Automation & Soft Computing (SCIE), 2023, No. 7, pp. 1053-1066 (14 pages).
The main function of the power communication business is to monitor, control, and manage the power communication network to ensure its normal and stable operation. Communication services related to dispatching data networks and the transmission of fault information or feeder automation have strict delay requirements; if processing time is prolonged, a power business cascade reaction may be triggered. To solve these problems, this paper establishes an edge object-linked agent business deployment model for the power communication network to unify the management of data collection, resource allocation, and task scheduling within the system. It realizes the virtualization of object-linked agent computing resources through Docker container technology, designs a target model of network latency and energy consumption, and introduces the A3C algorithm from deep reinforcement learning, improving it according to scene characteristics and setting corresponding optimization strategies to minimize network delay and energy consumption. At the same time, to ensure that sensitive power business is handled in time, this paper designs a business dispatch model and a task migration model, solving the problem of server failure. Finally, a simulation program is designed to verify the feasibility and validity of this method and to compare it with other existing mechanisms.
Keywords: power communication network; dispatching data networks; resource allocation; A3C algorithm; deep reinforcement learning
10. Autonomous air combat decision-making of UAV based on parallel self-play reinforcement learning (cited 4 times)
Authors: Bo Li, Jingyi Huang, Shuangxia Bai, Zhigang Gan, Shiyang Liang, Neretin Evgeny, Shouwen Yao. CAAI Transactions on Intelligence Technology (SCIE, EI), 2023, No. 1, pp. 64-81 (18 pages).
Aiming at addressing the problem of manoeuvring decision-making in UAV air combat, this study establishes a one-to-one air combat model, defines missile attack areas, and uses the non-deterministic policy Soft Actor-Critic (SAC) algorithm in deep reinforcement learning to construct a decision model that realizes the manoeuvring process. At the same time, the complexity of the proposed algorithm is calculated, and the stability of the closed-loop air combat decision-making system controlled by the neural network is analysed via a Lyapunov function. This study defines the UAV air combat process as a gaming process and proposes a Parallel Self-Play training SAC algorithm (PSP-SAC) to improve the generalisation performance of UAV control decisions. Simulation results show that the proposed algorithm can realize sample sharing and policy sharing in multiple combat environments and can significantly improve the generalisation ability of the model compared with independent training.
Keywords: air combat decision; deep reinforcement learning; parallel self-play; SAC algorithm; UAV
11. Evolutionary-assisted reinforcement learning for reservoir real-time production optimization under uncertainty (cited 1 time)
Authors: Zhong-Zheng Wang, Kai Zhang, Guo-Dong Chen, Jin-Ding Zhang, Wen-Dong Wang, Hao-Chen Wang, Li-Ming Zhang, Xia Yan, Jun Yao. Petroleum Science (SCIE, EI, CAS, CSCD), 2023, No. 1, pp. 261-276 (16 pages).
Production optimization has gained increasing attention from the smart oilfield community because it can increase economic benefits and oil recovery substantially. While existing methods can produce highly optimal results, they cannot be applied to real-time optimization for large-scale reservoirs due to high computational demands. In addition, most methods generally assume that the reservoir model is deterministic and ignore the uncertainty of the subsurface environment, making the obtained scheme unreliable for practical deployment. In this work, an efficient and robust method, namely evolutionary-assisted reinforcement learning (EARL), is proposed to achieve real-time production optimization under uncertainty. Specifically, the production optimization problem is modeled as a Markov decision process in which a reinforcement learning agent interacts with the reservoir simulator to train a control policy that maximizes the specified goals. To deal with the brittle convergence properties and lack of efficient exploration strategies of reinforcement learning approaches, a population-based evolutionary algorithm is introduced to assist the training of agents, providing diverse exploration experiences and promoting stability and robustness due to its inherent redundancy. Compared with prior methods that only optimize a solution for a particular scenario, the proposed approach trains a policy that can adapt to uncertain environments and make real-time decisions to cope with unknown changes. The trained policy, represented by a deep convolutional neural network, can adaptively adjust the well controls based on different reservoir states. Simulation results on two reservoir models show that the proposed approach not only outperforms the RL and EA methods in terms of optimization efficiency but also has strong robustness and real-time decision capacity.
Keywords: production optimization; deep reinforcement learning; evolutionary algorithm; real-time optimization; optimization under uncertainty
12. Hierarchical reinforcement learning guidance with threat avoidance
Authors: LI Bohao, WU Yunjie, LI Guofei. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2022, No. 5, pp. 1173-1185 (13 pages).
The guidance strategy is an extremely critical factor in determining the striking effect of a missile operation. A novel guidance law is presented by exploiting deep reinforcement learning (DRL) with the hierarchical deep deterministic policy gradient (DDPG) algorithm. The reward functions are constructed to minimize the line-of-sight (LOS) angle rate and avoid the threat caused by opposing obstacles. To attenuate the chattering of the acceleration, a hierarchical reinforcement learning structure and an improved reward function with an action penalty are put forward. The simulation results validate that the missile under the proposed method can hit the target successfully and keep away from the threatened areas effectively.
Keywords: guidance law; deep reinforcement learning (DRL); threat avoidance; hierarchical reinforcement learning
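The "improved reward function with an action penalty" described in this abstract can be illustrated with a small sketch. The functional form and every coefficient below are assumptions for illustration only, not the paper's exact reward: a term drives the LOS angle rate toward zero, a penalty on the change in acceleration discourages chattering, and a fixed penalty applies inside a threatened area.

```python
def guidance_reward(los_rate, accel, prev_accel, in_threat_zone,
                    w_los=1.0, w_act=0.1, threat_penalty=5.0):
    """Illustrative guidance reward (assumed form, not the paper's)."""
    r = -w_los * abs(los_rate)             # drive the line-of-sight rate to zero
    r -= w_act * abs(accel - prev_accel)   # action penalty: punish acceleration chattering
    if in_threat_zone:
        r -= threat_penalty                # keep away from threatened areas
    return r

smooth = guidance_reward(0.2, 1.0, 1.1, False)    # small change in acceleration
chatter = guidance_reward(0.2, 1.0, -1.0, False)  # large swing in acceleration
print(smooth, chatter)
```

With identical LOS rates, the smooth command earns a strictly higher reward than the chattering one, which is the mechanism the hierarchical structure exploits to attenuate chattering.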
13. Practical Meta-Reinforcement Learning of Evolutionary Strategy with Quantum Neural Networks for Stock Trading
Authors: Erik Sorensen, Wei Hu. Journal of Quantum Information Science, 2020, No. 3, pp. 43-71 (29 pages).
We show the practicality of two existing meta-learning algorithms, Model-Agnostic Meta-Learning and Fast Context Adaptation Via Meta-learning, using an evolutionary strategy for parameter optimization, as well as propose two novel quantum adaptations of those algorithms using continuous quantum neural networks, for learning to trade portfolios of stocks on the stock market. The goal of meta-learning is to train a model on a variety of tasks, such that it can solve new learning tasks using only a small number of training samples. In our classical approach, we trained our meta-learning models on a variety of portfolios that contained 5 randomly sampled Consumer Cyclical stocks from a pool of 60. In our quantum approach, we trained our quantum meta-learning models on a simulated quantum computer with portfolios containing 2 randomly sampled Consumer Cyclical stocks. Our findings suggest that both classical models could learn a new portfolio with 0.01% of the number of training samples needed to learn the original portfolios and can achieve a comparable performance within 0.1% Return on Investment of the Buy and Hold strategy. We also show that our much smaller quantum meta-learned models with only 60 model parameters and 25 training epochs have a similar learning pattern to our much larger classical meta-learned models that have over 250,000 model parameters and 2500 training epochs. Given these findings, we also discuss the benefits of scaling up our experiments from a simulated quantum computer to a real quantum computer. To the best of our knowledge, we are the first to apply the ideas of both classical meta-learning as well as quantum meta-learning to enhance stock trading.
Keywords: reinforcement learning; deep learning; meta-learning; evolutionary strategy; quantum computing; quantum machine learning; stock market; algorithmic trading
14. MADDPG-D2: An Intelligent Dynamic Task Allocation Algorithm Based on Multi-Agent Architecture Driven by Prior Knowledge
Authors: Tengda Li, Gang Wang, Qiang Fu. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 9, pp. 2559-2586 (28 pages).
Aiming at the problems of low solution accuracy and high decision pressure when a single agent faces large-scale dynamic task allocation (DTA) and a high-dimensional decision space, this paper combines deep reinforcement learning (DRL) theory with a multi-agent architecture and proposes an improved Multi-Agent Deep Deterministic Policy Gradient algorithm (MADDPG-D2) with a dual experience replay pool and dual noise to improve the efficiency of DTA. The algorithm builds on the traditional MADDPG algorithm: a double noise mechanism is introduced to enlarge the action exploration space in the early stage of training, and a double experience pool is introduced to improve the data utilization rate. At the same time, to accelerate training and solve the cold-start problem, prior knowledge is applied to the training of the algorithm. Finally, MADDPG-D2 is compared and analyzed on a digital battlefield of ground and air confrontation. The experimental results show that agents trained by MADDPG-D2 achieve higher win rates and average rewards and can utilize resources more reasonably, better solving the difficulty that traditional single-agent algorithms face in high-dimensional decision spaces. The MADDPG-D2 algorithm based on a multi-agent architecture proposed in this paper has certain superiority and rationality in DTA.
Keywords: deep reinforcement learning; dynamic task allocation; intelligent decision-making; multi-agent system; MADDPG-D2 algorithm
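The dual experience replay pool described for MADDPG-D2 can be sketched as two buffers whose contents are mixed at sampling time. The structure below is inferred from the abstract and is an assumption, not the authors' code: in particular the split rule (a reward threshold) and the mixing fraction are invented for illustration.

```python
import random
from collections import deque

class DualReplay:
    """Two pools: ordinary transitions and high-reward ones; minibatches mix both."""
    def __init__(self, capacity, reward_threshold, good_frac=0.5):
        self.ordinary = deque(maxlen=capacity)
        self.good = deque(maxlen=capacity)
        self.thr = reward_threshold
        self.good_frac = good_frac

    def add(self, transition):
        # route each transition to a pool based on its reward (assumed split rule)
        pool = self.good if transition["reward"] >= self.thr else self.ordinary
        pool.append(transition)

    def sample(self, batch_size):
        # draw a fixed fraction from the high-reward pool, the rest from the ordinary pool
        n_good = min(int(batch_size * self.good_frac), len(self.good))
        batch = random.sample(list(self.good), n_good)
        batch += random.sample(list(self.ordinary),
                               min(batch_size - n_good, len(self.ordinary)))
        return batch

random.seed(1)
buf = DualReplay(capacity=100, reward_threshold=1.0)
for i in range(20):
    buf.add({"state": i, "reward": 2.0 if i % 4 == 0 else 0.0})
batch = buf.sample(8)
print(len(batch))
```

Oversampling high-reward transitions in this way is one plausible mechanism for the improved data utilization the abstract reports.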
15. Enhanced UAV Pursuit-Evasion Using Boids Modelling: A Synergistic Integration of Bird Swarm Intelligence and DRL
Authors: Weiqiang Jin, Xingwu Tian, Bohang Shi, Biao Zhao, Haibin Duan, Hao Wu. Computers, Materials & Continua (SCIE, EI), 2024, No. 9, pp. 3523-3553 (31 pages).
The UAV pursuit-evasion problem focuses on the efficient tracking and capture of evading targets using unmanned aerial vehicles (UAVs), which is pivotal in public safety applications, particularly in scenarios involving intrusion monitoring and interception. To address the challenges of data acquisition, real-world deployment, and the limited intelligence of existing algorithms in UAV pursuit-evasion tasks, we propose an innovative swarm intelligence-based UAV pursuit-evasion control framework, namely "Boids Model-based DRL Approach for Pursuit and Escape" (Boids-PE), which synergizes the strengths of swarm intelligence from bio-inspired algorithms and deep reinforcement learning (DRL). The Boids model, which simulates collective behavior through three fundamental rules, separation, alignment, and cohesion, is adopted in our work. By integrating the Boids model with the Apollonian Circles algorithm, significant improvements are achieved in capturing UAVs against simple evasion strategies. To further enhance decision-making precision, we incorporate a DRL algorithm to facilitate more accurate strategic planning. We also leverage self-play training to continuously optimize the performance of pursuit UAVs. During experimental evaluation, we meticulously designed both one-on-one and multi-to-one pursuit-evasion scenarios, customizing the state space, action space, and reward function models for each scenario. Extensive simulations, supported by the PyBullet physics engine, validate the effectiveness of the proposed method. The overall results demonstrate that Boids-PE significantly enhances the efficiency and reliability of UAV pursuit-evasion tasks, providing a practical and robust solution for real-world UAV pursuit-evasion missions.
Keywords: UAV pursuit-evasion; swarm intelligence algorithm; Boids model; deep reinforcement learning; self-play training
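The three Boids rules named in this abstract (separation, alignment, cohesion) can be written down directly. The update below is the classic Boids step in 2D; the weights, neighborhood radius, and initial flock are illustrative assumptions, not values from the paper.

```python
import numpy as np

def boids_step(pos, vel, r_sep=1.0, w_sep=0.05, w_ali=0.05, w_coh=0.01, dt=1.0):
    """pos, vel: (n, 2) arrays of agent positions and velocities; returns updated copies."""
    n = len(pos)
    new_vel = vel.copy()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # cohesion: steer toward the centroid of the other agents
        centroid = pos[others].mean(axis=0)
        new_vel[i] += w_coh * (centroid - pos[i])
        # alignment: match the average velocity of the other agents
        new_vel[i] += w_ali * (vel[others].mean(axis=0) - vel[i])
        # separation: push away from agents closer than r_sep
        for j in others:
            d = pos[i] - pos[j]
            dist = np.linalg.norm(d)
            if 0 < dist < r_sep:
                new_vel[i] += w_sep * d / dist
    return pos + dt * new_vel, new_vel

pos = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])
vel = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
new_pos, new_vel = boids_step(pos, vel)
print(new_pos)
```

In Boids-PE these rule-based velocity updates supply the swarm prior, while the DRL policy and the Apollonian Circles construction refine the pursuit decisions on top of them.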
16. Multi-Robot Task Allocation Using Multimodal Multi-Objective Evolutionary Algorithm Based on Deep Reinforcement Learning
Authors: 苗镇华 (MIAO Zhenhua), 黄文焘 (HUANG Wentao), 张依恋 (ZHANG Yilian), 范勤勤 (FAN Qinqin). Journal of Shanghai Jiaotong University (Science) (EI), 2024, No. 3, pp. 377-387 (11 pages).
The overall performance of multi-robot collaborative systems is significantly affected by multi-robot task allocation. To improve the effectiveness, robustness, and safety of multi-robot collaborative systems, a multimodal multi-objective evolutionary algorithm based on deep reinforcement learning is proposed in this paper. The improved multimodal multi-objective evolutionary algorithm is used to solve multi-robot task allocation problems. Moreover, a deep reinforcement learning strategy is used in the last generation to provide a high-quality path for each assigned robot in an end-to-end manner. Comparisons with three popular multimodal multi-objective evolutionary algorithms on three different scenarios of multi-robot task allocation problems are carried out to verify the performance of the proposed algorithm. The experimental results show that the proposed algorithm can generate sufficient equivalent schemes to improve the availability and robustness of multi-robot collaborative systems in uncertain environments, and also produces the best scheme to improve the overall task execution efficiency of multi-robot collaborative systems.
Keywords: multi-robot task allocation; multi-robot cooperation; path planning; multimodal multi-objective evolutionary algorithm; deep reinforcement learning
17. Probabilistic Automata-Based Method for Enhancing Performance of Deep Reinforcement Learning Systems
Authors: Min Yang, Guanjun Liu, Ziyuan Zhou, Jiacun Wang. IEEE/CAA Journal of Automatica Sinica (SCIE, EI), 2024, No. 11, pp. 2327-2339 (13 pages).
Deep reinforcement learning (DRL) has demonstrated significant potential in industrial manufacturing domains such as workshop scheduling and energy system management. However, due to the model's inherent uncertainty, rigorous validation is requisite for its application in real-world tasks. Specific tests may reveal inadequacies in the performance of pre-trained DRL models, while the "black-box" nature of DRL poses a challenge for testing model behavior. We propose a novel performance improvement framework based on probabilistic automata, which aims to proactively identify and correct critical vulnerabilities of DRL systems so that the performance of DRL models in real tasks can be improved with minimal model modifications. First, a probabilistic automaton is constructed from the historical trajectories of the DRL system by abstracting the state to generate probabilistic decision-making units (PDMUs), and a reverse breadth-first search (BFS) method is used to identify the key PDMU-action pairs that have the greatest impact on adverse outcomes. This process relies only on the state-action sequence and final result of each trajectory. Then, under the key PDMU, we search for the new action that has the greatest impact on favorable results. Finally, the key PDMU, undesirable action, and new action are encapsulated as monitors to guide the DRL system toward more favorable results through real-time monitoring and correction mechanisms. Evaluations in two standard reinforcement learning environments and three actual job scheduling scenarios confirmed the effectiveness of the method, providing certain guarantees for the deployment of DRL models in real-world applications.
Keywords: deep reinforcement learning (DRL), performance improvement framework, probabilistic automata, real-time monitoring, key probabilistic decision-making unit (PDMU)-action pair
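The abstract's first step, ranking PDMU-action pairs by their impact on adverse outcomes using only each trajectory's state-action sequence and final result, can be sketched with simple frequency counting. This is an illustrative stand-in, not the paper's reverse-BFS procedure; the `abstract` state-abstraction function and all data shapes are assumptions.

```python
from collections import defaultdict

def key_pdmu_action(trajectories, abstract):
    """Rank abstract-state/action pairs by the empirical probability that
    visiting them ends in an adverse outcome, and return the worst pair.
    `trajectories` is a list of ([(state, action), ...], outcome) tuples
    with outcome 'adverse' or 'favorable'; `abstract` maps a concrete
    state to its probabilistic decision-making unit (PDMU)."""
    total = defaultdict(int)    # visits per (PDMU, action) pair
    adverse = defaultdict(int)  # visits that ended in an adverse outcome
    for steps, outcome in trajectories:
        for state, action in steps:
            pair = (abstract(state), action)
            total[pair] += 1
            if outcome == 'adverse':
                adverse[pair] += 1
    # The key pair: the one whose visits most often end badly.
    return max(total, key=lambda p: adverse[p] / total[p])
```

A monitor as described in the abstract would then watch for this pair at run time and substitute a more favorable action.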
Sum rate optimizing for multi-IRS-assisted UAV downlink transmission system using deep reinforcement learning
18
Authors: Long Yuan, He Xiaoli, Ye Yang, Zhang Bo 《The Journal of China Universities of Posts and Telecommunications》 EI, 2024, Issue 5, pp. 23-33 (11 pages)
By leveraging the high maneuverability of the unmanned aerial vehicle (UAV) and the expansive coverage of the intelligent reflecting surface (IRS), a multi-IRS-assisted UAV communication system aimed at maximizing the sum data rate of all users was investigated in this paper. This is achieved through the joint optimization of the trajectory and transmit beamforming of the UAV, as well as the passive phase shifts of the IRSs. Nevertheless, the initial problem exhibits a high degree of non-convexity, posing challenges for conventional mathematical optimization techniques in delivering solutions that are both quick and efficient while maintaining low complexity. To address this issue, a novel scheme called the deep reinforcement learning (DRL)-based enhanced cooperative reflection network (DCRN) was proposed. This scheme effectively identifies optimal strategies, with a long short-term memory (LSTM) network improving algorithm convergence by extracting hidden state information. Simulation results demonstrate that the proposed scheme outperforms the baseline schemes, achieving substantial gains in sum rate and superior overall performance.
Keywords: intelligent reflecting surface (IRS), unmanned aerial vehicle (UAV) communication, deep reinforcement learning (DRL), trajectory optimization
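To make the objective concrete, the sketch below computes the achievable rate of a single IRS-reflected link and the classical closed-form phase alignment that maximizes it. This is a toy single-user stand-in, not the paper's DCRN: the paper jointly optimizes trajectory, beamforming, and phases with DRL, while here the channel gains, the SNR value, and the single-user simplification are all assumptions for illustration.

```python
import cmath
import math

def rate_bps_hz(h_direct, paths, thetas, snr=100.0):
    """Spectral efficiency log2(1 + SNR * |h_d + sum_k g_k e^{j theta_k} h_k|^2)
    for one user; `paths` holds (h_k, g_k) transmitter->IRS and IRS->user
    complex gains for each IRS element, `thetas` the passive phase shifts."""
    combined = h_direct + sum(
        g * cmath.exp(1j * t) * h for (h, g), t in zip(paths, thetas))
    return math.log2(1.0 + snr * abs(combined) ** 2)

def aligned_phases(h_direct, paths):
    """Closed-form optimum for a single user: rotate every reflected
    path so it co-phases with the direct link."""
    return [cmath.phase(h_direct) - cmath.phase(h * g) for h, g in paths]
```

With multiple users and a moving UAV this closed form no longer applies, which is why the paper resorts to a learned policy.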
DRL-based lateral vehicle speed estimation for four-wheel independent-drive electric vehicles (cited by 1)
19
Authors: Zheng Yangjun, He Shuai, Shuai Zhibin, Li Jianqiu, Gai Jiangtao, Li Yong, Zhang Ying, Li Guohui 《汽车安全与节能学报》(Journal of Automotive Safety and Energy) CAS CSCD 北大核心, 2022, Issue 2, pp. 309-316 (8 pages)
To accurately estimate the vehicle's driving state, a lateral speed estimation method for four-wheel independent-drive electric vehicles is proposed. The architecture of the estimator is designed under the deep reinforcement learning (DRL) paradigm; the DRL agent is built on the deep deterministic policy gradient (DDPG) algorithm; and recurrent neural networks are used for the Actor and Critic networks in DDPG. With the designed reward function and training scenarios, the algorithm was implemented and trained in Matlab/Simulink and validated through simulations of realistic driving maneuvers such as double lane changes. The results show that after 630 training episodes, the estimation accuracy of the proposed method is 40% higher than that of the extended Kalman filter. The method can therefore estimate lateral vehicle speed under common driving conditions.
Keywords: vehicle dynamics control, four-wheel independent-drive electric vehicle, lateral vehicle speed estimation, deep reinforcement learning (DRL), deep deterministic policy gradient (DDPG)
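Casting state estimation as a DRL problem hinges on the reward design: the agent is rewarded for keeping its lateral-speed estimate close to ground truth during training maneuvers. The abstract does not give the exact reward function, so the negative-absolute-error form below, along with the `estimator` and `signals` interfaces, is purely an illustrative guess.

```python
def estimation_reward(v_est, v_true):
    """Illustrative reward: larger (closer to zero) when the estimated
    lateral speed is nearer the ground-truth value."""
    return -abs(v_est - v_true)

def evaluate(estimator, signals, v_true_seq):
    """Run an estimator over a recorded maneuver (e.g. a double lane
    change) and return the cumulative reward a DRL agent would collect.
    `signals` are per-step measurement vectors (wheel speeds, yaw rate,
    lateral acceleration, ...); `v_true_seq` the true lateral speeds."""
    return sum(estimation_reward(estimator(s), v)
               for s, v in zip(signals, v_true_seq))
```

In the paper's setup a recurrent DDPG actor plays the role of `estimator`, so that the estimate can depend on the measurement history rather than a single sample.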
DRL-based adaptive multi-mode routing algorithm for flying ad hoc networks (cited by 1)
20
Authors: Huang Kai, Qiu Xiulin, Yin Jun, Yang Yuwang 《计算机工程与应用》(Computer Engineering and Applications) CSCD 北大核心, 2023, Issue 14, pp. 268-274 (7 pages)
To address the weak adaptability of traditional flying ad hoc network protocols and their poor performance in large-scale networks, a multi-mode routing algorithm based on deep reinforcement learning is proposed. The algorithm builds a value function from system throughput, packet delivery ratio, and average end-to-end delay, and lets the agent automatically adjust the routing mode of each UAV, decomposing a large network into a backbone network connected to several small heterogeneous networks. This reduces system complexity, achieves locally optimal performance, and improves the performance of the network as a whole. The algorithm and the traditional AODV and DSDV protocols were evaluated on the NS3 simulation platform. Simulation results show that the algorithm significantly outperforms the traditional protocols, with the advantage growing as network size and load increase: average throughput improved by 55.46%, packet delivery ratio by 39.85%, and average end-to-end delay dropped by 60.94%.
Keywords: flying ad hoc network, deep reinforcement learning, adaptive routing algorithm, hybrid routing
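The abstract describes a value signal combining throughput, packet delivery ratio, and end-to-end delay, with an agent switching each UAV's routing mode. The sketch below uses a tabular ε-greedy bandit-style learner as a simplified stand-in for the deep agent; the weights, mode names, and the stateless update rule are all assumptions, not the paper's design.

```python
import random
from collections import defaultdict

# Illustrative weights; the paper does not specify its exact combination.
W_THROUGHPUT, W_PDR, W_DELAY = 1.0, 1.0, 1.0

def routing_reward(throughput, pdr, delay):
    """Value signal built, as in the abstract, from throughput, packet
    delivery ratio, and average end-to-end delay (weights assumed)."""
    return W_THROUGHPUT * throughput + W_PDR * pdr - W_DELAY * delay

def select_mode(q_table, state, modes, eps=0.1, rng=random.random):
    """Epsilon-greedy choice among routing modes for one UAV."""
    if rng() < eps:
        return random.choice(modes)
    return max(modes, key=lambda m: q_table[(state, m)])

def update(q_table, state, mode, reward, alpha=0.5):
    """Stateless (bandit-style) value update toward the observed reward."""
    q_table[(state, mode)] += alpha * (reward - q_table[(state, mode)])
```

A full implementation would replace the table with a neural network and feed it per-UAV network statistics as the state, as the paper's agent does.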