Journal Articles
15 articles found
Proximal policy optimization with an integral compensator for quadrotor control (Cited by: 6)
1
Authors: Huan HU, Qing-ling WANG. 《Frontiers of Information Technology & Electronic Engineering》, SCIE EI CSCD, 2020, No. 5, pp. 777-795 (19 pages)
We use the advanced proximal policy optimization (PPO) reinforcement learning algorithm to optimize the stochastic control strategy and achieve speed control of a "model-free" quadrotor. The model is controlled by four learned neural networks, which directly map the system states to control commands in an end-to-end style. By introducing an integral compensator into the actor-critic framework, the speed tracking accuracy and robustness are greatly enhanced. In addition, a two-phase learning scheme, which includes both offline and online learning, is developed for practical use. A model with strong generalization ability is learned in the offline phase. Then, the flight policy of the model is continuously optimized in the online learning phase. Finally, the performance of the proposed algorithm is compared with that of the traditional PID algorithm.
Keywords: Reinforcement learning; Proximal policy optimization; Quadrotor control; Neural network
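The two mechanisms this abstract names, PPO's clipped surrogate objective and a state augmented with the integral of the speed-tracking error, can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function names, clip range and sampling interval are assumptions.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Single-sample PPO-Clip objective: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)

def augment_with_integral(speed_setpoint, speed_history, dt=0.02):
    """Return (current_error, integral_of_error) to append to the state,
    so the policy can cancel steady-state tracking error."""
    errors = [speed_setpoint - v for v in speed_history]
    integral = sum(e * dt for e in errors)
    return errors[-1], integral

# Positive advantage: the objective is capped once the ratio exceeds 1+eps.
print(clipped_surrogate(1.5, 2.0))   # -> 2.4  (clipped at 1.2 * 2.0)
# Negative advantage: the pessimistic min keeps the larger penalty.
print(clipped_surrogate(0.5, -1.0))  # -> -0.8
```

The clipping keeps each policy update close to the data-collecting policy, which is what makes the online fine-tuning phase described in the abstract stable.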
UAV path planning based on multi-agent deep reinforcement learning (Cited by: 4)
2
Authors: 司鹏搏, 吴兵, 杨睿哲, 李萌, 孙艳华. 《北京工业大学学报》 (Journal of Beijing University of Technology), CAS CSCD PKU Core, 2023, No. 4, pp. 449-458 (10 pages)
To solve the path planning problem of multiple unmanned aerial vehicles (UAVs) in complex environments, a multi-agent deep reinforcement learning framework for UAV path planning is proposed. The framework first models the path planning problem as a partially observable Markov decision process and extends the proximal policy optimization algorithm to multiple agents; by designing the UAVs' state observation space, action space, and reward function, obstacle-free path planning for multiple UAVs is achieved. Second, to suit the limited computing resources carried on board UAVs, a network pruning-based multi-agent proximal policy optimization (NP-MAPPO) algorithm is further proposed, which improves training efficiency. Simulation results verify the effectiveness of the proposed multi-UAV path planning framework under various parameter configurations and the superiority of the NP-MAPPO algorithm in training time.
Keywords: Unmanned aerial vehicle (UAV); Complex environment; Path planning; Markov decision process; Multi-agent proximal policy optimization (MAPPO); Network pruning (NP)
Task assignment in ground-to-air confrontation based on multiagent deep reinforcement learning (Cited by: 3)
3
Authors: Jia-yi Liu, Gang Wang, Qiang Fu, Shao-hua Yue, Si-yuan Wang. 《Defence Technology(防务技术)》, SCIE EI CAS CSCD, 2023, No. 1, pp. 210-219 (10 pages)
The scale of ground-to-air confrontation task assignment is large, and many concurrent task assignments and random events must be handled. When existing task assignment methods are applied to ground-to-air confrontation, they are inefficient on complex tasks, and interaction conflicts arise in multiagent systems. This study proposes a multiagent architecture based on one general agent with multiple narrow agents (OGMN) to reduce task assignment conflicts. Considering the slow speed of traditional dynamic task assignment algorithms, this paper proposes the proximal policy optimization for task assignment of general and narrow agents (PPO-TAGNA) algorithm. Based on the idea of the optimal assignment strategy algorithm and combined with the training framework of deep reinforcement learning (DRL), the algorithm adds a multihead attention mechanism and a stage reward mechanism to the bilateral band-clipping PPO algorithm to address low training efficiency. Finally, simulation experiments are carried out on a digital battlefield. The OGMN-based multiagent architecture combined with the PPO-TAGNA algorithm obtains higher rewards faster and achieves a higher win ratio. By analyzing agent behavior, the efficiency, superiority and rationality of resource utilization of this method are verified.
Keywords: Ground-to-air confrontation; Task assignment; General and narrow agents; Deep reinforcement learning; Proximal policy optimization (PPO)
Multi-agent reinforcement learning for edge information sharing in vehicular networks (Cited by: 3)
4
Authors: Ruyan Wang, Xue Jiang, Yujie Zhou, Zhidu Li, Dapeng Wu, Tong Tang, Alexander Fedotov, Vladimir Badenko. 《Digital Communications and Networks》, SCIE CSCD, 2022, No. 3, pp. 267-277 (11 pages)
To guarantee the heterogeneous delay requirements of diverse vehicular services, it is necessary to design a fully cooperative policy for both Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) links. This paper investigates the reduction of the delay in edge information sharing for V2V links while satisfying the delay requirements of the V2I links. Specifically, a mean delay minimization problem and a maximum individual delay minimization problem are formulated to improve the global network performance and ensure the fairness of a single user, respectively. A multi-agent reinforcement learning framework is designed to solve these two problems, where a new reward function is proposed to evaluate the utilities of the two optimization objectives in a unified framework. Thereafter, a proximal policy optimization approach is proposed to enable each V2V user to learn its policy using the shared global network reward. The effectiveness of the proposed approach is finally validated by comparing the obtained results with those of other baseline approaches through extensive simulation experiments.
Keywords: Vehicular networks; Edge information sharing; Delay guarantee; Multi-agent reinforcement learning; Proximal policy optimization
Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning
5
Authors: Jiawei Xia, Yasong Luo, Zhikun Liu, Yalun Zhang, Haoran Shi, Zhong Liu. 《Defence Technology(防务技术)》, SCIE EI CAS CSCD, 2023, No. 11, pp. 80-94 (15 pages)
To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. First, the hunting environment and a kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting proximal policy optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function and an action space applicable to multi-target hunting tasks are designed. To deal with the dynamically changing dimension of the observational features input by partially observable systems, a feature embedding block is proposed: by combining two feature compression methods, column-wise max pooling (CMP) and column-wise average pooling (CAP), an observational feature encoding is established. Finally, the centralized training and decentralized execution framework is adopted to complete the training of the hunting strategy. Each USV in the fleet shares the same policy and performs actions independently. Simulation experiments verify the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are analyzed in terms of algorithm performance, transfer across task scenarios and self-organization capability after damage, and the potential deployment and application of DPOMH-PPO in real environments is verified.
Keywords: Unmanned surface vehicles; Multi-agent deep reinforcement learning; Cooperative hunting; Feature embedding; Proximal policy optimization
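The feature embedding block described in this abstract compresses a variable number of per-target observation rows into a fixed-length vector via column-wise max pooling (CMP) and column-wise average pooling (CAP). A rough sketch of that idea (the paper's actual feature layout is not reproduced here):

```python
def cmp_cap_encode(rows):
    """Encode a variable number of per-target feature rows into a
    fixed-length vector: column-wise max (CMP) + column-wise mean (CAP)."""
    cols = list(zip(*rows))                    # transpose: one tuple per feature column
    cmp_vec = [max(c) for c in cols]           # CMP: column-wise max pooling
    cap_vec = [sum(c) / len(c) for c in cols]  # CAP: column-wise average pooling
    return cmp_vec + cap_vec

# The output length depends only on the per-target feature count,
# not on how many targets are currently observed.
print(cmp_cap_encode([[1.0, 2.0], [3.0, 0.0]]))  # -> [3.0, 2.0, 2.0, 1.0]
print(len(cmp_cap_encode([[0.5, 0.5]])))         # -> 4
```

This permutation-invariant pooling is what lets every USV feed a fixed-size input to the shared policy network even as targets appear or disappear.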
Two-Stage Client Selection Scheme for Blockchain-Enabled Federated Learning in IoT
6
Authors: Xiaojun Jin, Chao Ma, Song Luo, Pengyi Zeng, Yifei Wei. 《Computers, Materials & Continua》, SCIE EI, 2024, No. 11, pp. 2317-2336 (20 pages)
Federated learning enables data owners in the Internet of Things (IoT) to collaborate in training models without sharing private data, creating new business opportunities for building a data market. However, some problems remain in practical federated learning applications. Blockchain has the characteristics of decentralization, distribution, and security; blockchain-enabled federated learning further improves the security and performance of model training while expanding the application scope of federated learning. Blockchain also has natural financial attributes that help establish a federated learning data market. However, the data of federated learning tasks may be distributed across a large number of resource-constrained IoT devices, which have different computing, communication, and storage resources, and the data quality of each device may also vary. Therefore, how to effectively select the clients with the data required for a federated learning task is a research hotspot. In this paper, a two-stage client selection scheme for blockchain-enabled federated learning is proposed, which first selects clients that satisfy the federated learning task through attribute-based encryption, protecting the attribute privacy of clients. Then blockchain nodes select some clients for local model aggregation via a proximal policy optimization algorithm. Experiments show that the model performance of the two-stage client selection scheme is higher than that of other client selection algorithms when some clients are offline and data quality is poor.
Keywords: Blockchain; Federated learning; Attribute-based encryption; Client selection; Proximal policy optimization
A hybrid policy gradient and rule-based control framework for electric vehicle charging
7
Authors: Brida V. Mbuwir, Lennert Vanmunster, Klaas Thoelen, Geert Deconinck. 《Energy and AI》, 2021, No. 2, pp. 1-15 (15 pages)
Recent years have seen a significant increase in the adoption of electric vehicles, and in investments in electric vehicle charging infrastructure and rooftop photo-voltaic installations. The ability to delay electric vehicle charging provides inherent flexibility that can be used to compensate for the intermittency of photo-voltaic generation and to optimize against fluctuating electricity prices. Exploiting this flexibility, however, requires smart control algorithms capable of handling uncertainties from photo-voltaic generation, electric vehicle energy demand and user behaviour. This paper proposes a control framework combining the advantages of reinforcement learning and rule-based control to coordinate the charging of a fleet of electric vehicles in an office building. The control objective is to maximize self-consumption of locally generated electricity and, consequently, minimize the electricity cost of electric vehicle charging. The performance of the proposed framework is evaluated on a real-world data set from EnergyVille, a Belgian research institute. Simulation results show that the proposed control framework achieves a 62.5% electricity cost reduction compared to a business-as-usual, passive charging strategy. In addition, only a 5% performance gap remains with respect to a theoretical near-optimal strategy that assumes perfect knowledge of the required energy and user behaviour of each electric vehicle.
Keywords: Electric vehicles; Smart charging; Proximal policy optimization; Reinforcement learning
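One common way to combine reinforcement learning with rule-based control, in the spirit of the hybrid framework this abstract describes, is a rule-based safety layer that overrides or clamps the RL action. The thresholds, the state-of-charge logic and the 11 kW charger limit below are illustrative assumptions, not values from the paper:

```python
def safe_charging_power(rl_power_kw, soc, departure_hours, soc_min=0.2, max_kw=11.0):
    """Apply simple rules around an RL-proposed charging power (kW)."""
    if soc < soc_min or departure_hours < 1.0:
        return max_kw                              # rule: charge at full power when urgent
    return min(max(rl_power_kw, 0.0), max_kw)      # otherwise clamp the RL action

print(safe_charging_power(3.7, soc=0.1, departure_hours=8.0))   # -> 11.0 (low battery)
print(safe_charging_power(15.0, soc=0.6, departure_hours=8.0))  # -> 11.0 (clamped)
print(safe_charging_power(-2.0, soc=0.6, departure_hours=8.0))  # -> 0.0  (no discharge)
```

The rules guarantee that every vehicle leaves with a usable charge regardless of what the learned policy proposes, while the RL component is free to shift the remaining flexible energy toward photo-voltaic surplus or cheap price periods.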
Loyal wingman task execution for future aerial combat: A hierarchical prior-based reinforcement learning approach (Cited by: 1)
8
Authors: Jiandong ZHANG, Dinghan WANG, Qiming YANG, Zhuoyong SHI, Longmeng JI, Guoqing SHI, Yong WU. 《Chinese Journal of Aeronautics》, SCIE EI CAS CSCD, 2024, No. 5, pp. 462-481 (20 pages)
In modern Beyond-Visual-Range (BVR) aerial combat, unmanned loyal wingmen are pivotal, yet their autonomous capabilities are limited. Our study introduces an advanced control algorithm based on hierarchical reinforcement learning to enhance these capabilities for critical missions such as target search, positioning, and relay guidance. Structured on a dual-layer model, the algorithm's lower layer manages basic aircraft maneuvers for optimal flight, while the upper layer processes battlefield dynamics and issues precise navigational commands. This approach enables accurate navigation and effective reconnaissance for the lead aircraft. Notably, our Hierarchical Prior-augmented Proximal Policy Optimization (HPE-PPO) algorithm employs a prior-based training, prior-free execution method, accelerating target positioning training and ensuring robust target reacquisition. This paper also improves missile relay guidance. By integrating this system with a human-piloted lead aircraft, the paper proposes a potent solution for cooperative aerial warfare. Rigorous experiments demonstrate enhanced survivability and efficiency of loyal wingmen, marking a significant contribution to Unmanned Aerial Vehicle (UAV) formation control research. This advancement is poised to drive substantial interest and progress in related technological fields.
Keywords: Beyond-visual-range; Loyal wingmen; Hierarchical prior-augmented proximal policy optimization; Unmanned aerial vehicles; Warfare
Efficient and fair PPO-based integrated scheduling method for multiple tasks of SATech-01 satellite
9
Authors: Qi SHI, Lu LI, Ziruo FANG, Xingzi BI, Huaqiu LIU, Xiaofeng ZHANG, Wen CHEN, Jinpei YU. 《Chinese Journal of Aeronautics》, SCIE EI CAS CSCD, 2024, No. 2, pp. 417-430 (14 pages)
SATech-01 is an experimental satellite for space science exploration and on-orbit demonstration of advanced technologies. The satellite is equipped with 16 experimental payloads and supports multiple working modes to meet the observation requirements of the various payloads. Because of the limitations of the platform's power supply and data storage systems, proposing reasonable mission planning schemes to improve the scientific revenue of the payloads is a critical issue. In this article, we formulate the integrated task scheduling of SATech-01 as a multi-objective optimization problem and propose a novel Fair Integrated Scheduling with Proximal Policy Optimization (FIS-PPO) algorithm to solve it. We use multiple decision heads to generate decisions for each task and design an action mask to ensure the schedule meets the platform constraints. Experimental results show that FIS-PPO pushes the capability of the platform to its limit and improves overall observation efficiency by 31.5% compared to the rule-based plans currently used. Moreover, fairness is considered in the reward design, and our method achieves much better performance in terms of equal task opportunities. Because of its low computational complexity, the task scheduling algorithm has the potential to be deployed directly on board for real-time task scheduling in future space projects.
Keywords: Satellite observatories; SATech-01; Multi-mode platform; Scheduling algorithms; Reinforcement learning; Proximal policy optimization (PPO)
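Action masking of the kind this abstract describes — forcing the policy to assign zero probability to schedules that violate platform constraints — is commonly implemented by setting infeasible logits to negative infinity before the softmax. A generic sketch, not the authors' implementation:

```python
import math

def masked_softmax(logits, feasible):
    """Softmax over logits with infeasible actions (feasible[i] == 0) masked out."""
    masked = [l if ok else float("-inf") for l, ok in zip(logits, feasible)]
    m = max(masked)                            # at least one action must be feasible
    exps = [math.exp(l - m) for l in masked]   # exp(-inf) == 0.0
    z = sum(exps)
    return [e / z for e in exps]

probs = masked_softmax([1.0, 2.0, 3.0], [1, 0, 1])
print(probs[1])              # -> 0.0: the masked task can never be scheduled
print(round(sum(probs), 6))  # -> 1.0
```

Because the mask removes infeasible actions before sampling, the agent never wastes exploration on schedules that would violate the power or storage limits, which typically speeds up training considerably.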
A Data-driven Method for Fast AC Optimal Power Flow Solutions via Deep Reinforcement Learning (Cited by: 8)
10
Authors: Yuhao Zhou, Bei Zhang, Chunlei Xu, Tu Lan, Ruisheng Diao, Di Shi, Zhiwei Wang, Wei-Jen Lee. 《Journal of Modern Power Systems and Clean Energy》, SCIE EI CSCD, 2020, No. 6, pp. 1128-1139 (12 pages)
With the increasing penetration of renewable energy, power grid operators are observing both fast and large fluctuations in power and voltage profiles on a daily basis. Fast and accurate control actions derived in real time are vital to ensure system security and economics. To this end, solving alternating current (AC) optimal power flow (OPF) with operational constraints remains an important yet challenging optimization problem for the secure and economic operation of the power grid. This paper adopts a novel method to derive fast OPF solutions using a state-of-the-art deep reinforcement learning (DRL) algorithm, which can greatly assist power grid operators in making rapid and effective decisions. The presented method adopts imitation learning to generate initial weights for the neural network (NN), and a proximal policy optimization algorithm to train and test stable and robust artificial intelligence (AI) agents. Training and testing procedures are conducted on the IEEE 14-bus and Illinois 200-bus systems. The results show the effectiveness of the method, with significant potential for assisting power grid operators in real-time operations.
Keywords: Alternating current (AC) optimal power flow (OPF); Deep reinforcement learning (DRL); Imitation learning; Proximal policy optimization
Deep Reinforcement Learning Based Real-time AC Optimal Power Flow Considering Uncertainties (Cited by: 4)
11
Authors: Yuhao Zhou, Wei-Jen Lee, Ruisheng Diao, Di Shi. 《Journal of Modern Power Systems and Clean Energy》, SCIE EI CSCD, 2022, No. 5, pp. 1098-1109 (12 pages)
Modern power systems are experiencing larger fluctuations and more uncertainties caused by the increased penetration of renewable energy sources (RESs) and power electronics equipment. Therefore, fast and accurate corrective control actions in real time are needed to ensure system security and economics. This paper presents a novel method to derive real-time alternating current (AC) optimal power flow (OPF) solutions considering uncertainties, including varying renewable energy and topology changes, by using a state-of-the-art deep reinforcement learning (DRL) algorithm, which can effectively assist grid operators in making rapid and effective real-time decisions. The presented DRL-based approach first adopts a supervised learning method from deep learning to generate good initial weights for the neural networks, and then the proximal policy optimization (PPO) algorithm is applied to train and test the artificial intelligence (AI) agents for stable and robust performance. An ancillary classifier is designed to identify the feasibility of the AC OPF problem. Case studies conducted on the Illinois 200-bus system with wind generation variation and N-1 topology changes validate the effectiveness of the proposed method and demonstrate its great potential in promoting sustainable energy integration into the power system.
Keywords: Alternating current (AC) optimal power flow (OPF); Deep learning; Deep reinforcement learning (DRL); Renewable integration; Proximal policy optimization
Deep Reinforcement Learning Based Charging Scheduling for Household Electric Vehicles in Active Distribution Network (Cited by: 2)
12
Authors: Taoyi Qi, Chengjin Ye, Yuming Zhao, Lingyang Li, Yi Ding. 《Journal of Modern Power Systems and Clean Energy》, SCIE EI CSCD, 2023, No. 6, pp. 1890-1901 (12 pages)
With the booming of electric vehicles (EVs) across the world, their increasing charging demands pose challenges to urban distribution networks. In particular, due to the further implementation of time-of-use prices, the charging behaviors of household EVs are concentrated in low-cost periods, generating new load peaks and affecting the secure operation of the medium- and low-voltage grids. This problem is particularly acute in many old communities with relatively poor electricity infrastructure. In this paper, a novel two-stage charging scheduling scheme based on deep reinforcement learning is proposed to improve power quality and simultaneously achieve optimal charging scheduling of household EVs in an active distribution network (ADN) during the valley period. In the first stage, the optimal charging profiles of charging stations are determined by solving the optimal power flow with the objective of eliminating peak-valley load differences. In the second stage, an intelligent agent based on the proximal policy optimization algorithm is developed to dispatch the household EVs sequentially within the low-cost period, considering the discrete nature of their arrivals. Through the powerful approximation of neural networks, the challenge of imperfect knowledge is tackled effectively during the charging scheduling process. Finally, numerical results demonstrate that the proposed scheme greatly relieves peak-valley differences and improves voltage quality in the ADN.
Keywords: Household electric vehicles; Deep reinforcement learning; Proximal policy optimization; Charging scheduling; Active distribution network; Time-of-use price
Low-carbon Economic Dispatch of Electricity-Heat-Gas Integrated Energy Systems Based on Deep Reinforcement Learning (Cited by: 1)
13
Authors: Yuxian Zhang, Yi Han, Deyang Liu, Xiao Dong. 《Journal of Modern Power Systems and Clean Energy》, SCIE EI CSCD, 2023, No. 6, pp. 1827-1841 (15 pages)
The optimal dispatch methods of integrated energy systems (IESs) currently struggle to address the uncertainties resulting from renewable energy generation and energy demand. Moreover, the increasing intensity of the greenhouse effect renders the reduction of IES carbon emissions a priority. To address these issues, a deep reinforcement learning (DRL)-based method is proposed to optimize the low-carbon economic dispatch model of an electricity-heat-gas IES. In the DRL framework, the optimal dispatch model of the IES is formulated as a Markov decision process (MDP). A reward function based on the reward-penalty ladder-type carbon trading mechanism (RPLT-CTM) is introduced to enable the DRL agents to learn more effective dispatch strategies. Moreover, a distributed proximal policy optimization (DPPO) algorithm, a novel policy-based DRL algorithm, is employed to train the DRL agents. Its multithreaded architecture enhances the exploration ability of the DRL agents in complex environments. Experimental results illustrate that the proposed DPPO-based IES dispatch method can mitigate carbon emissions and reduce the total economic cost. The RPLT-CTM-based reward function outperforms the CTM-based methods, providing a 4.42% and 6.41% decrease in operating cost and carbon emissions, respectively. Furthermore, the superiority and computational efficiency of DPPO compared with other DRL-based methods are demonstrated by a decrease of more than 1.53% and 3.23% in the operating cost and carbon emissions of the IES, respectively.
Keywords: Integrated energy system (IES); Carbon trading; Optimal dispatch; Deep reinforcement learning (DRL); Distributed proximal policy optimization
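A ladder-type carbon trading mechanism of the kind named in this abstract prices emissions in tiers: emissions above the free quota incur increasingly expensive penalty tiers, while emissions below it earn tiered rewards. A hedged sketch under assumed prices and tier widths (the paper's actual RPLT-CTM parameters are not reproduced):

```python
def ladder_carbon_cost(emission_t, quota_t, base_price=100.0, tier_width=1000.0, growth=0.25):
    """Tiered carbon cost: positive above quota (penalty), negative below (reward)."""
    excess = emission_t - quota_t
    remaining, tier, cost = abs(excess), 0, 0.0
    while remaining > 0:
        amount = min(remaining, tier_width)
        cost += amount * base_price * (1.0 + growth * tier)  # rate grows per tier
        remaining -= amount
        tier += 1
    return cost if excess > 0 else -cost

print(ladder_carbon_cost(11500.0, 10000.0))  # -> 162500.0 (1000 t @ 100 + 500 t @ 125)
print(ladder_carbon_cost(9500.0, 10000.0))   # -> -50000.0 (500 t reward @ 100)
```

Feeding the negated cost into the reward gives the DRL agent a progressively steeper penalty gradient the further the dispatch exceeds its quota, which is what makes the ladder variant more effective than a flat carbon price.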
A Reinforcement Learning approach for the continuous electricity market of Germany: Trading from the perspective of a wind park operator (Cited by: 1)
14
Authors: Malte Lehna, Björn Hoppmann, Christoph Scholz, René Heinrich. 《Energy and AI》, 2022, No. 2, pp. 67-78 (12 pages)
With the rising extension of renewable energies, the intraday electricity markets have recorded a growing popularity amongst traders as well as electric utilities to cope with the induced volatility of the energy supply. Through their short trading horizon and continuous nature, the intraday markets offer the ability to adjust trading decisions from the day-ahead market or reduce trading risk at short notice. Producers of renewable energies use the intraday market to lower their forecast risk by modifying their offered capacities based on current forecasts. However, the market dynamics are complex, since the power grids have to remain stable and electricity is only partly storable. Consequently, robust and intelligent trading strategies are required that are capable of operating in the intraday market. In this work, we propose a novel autonomous trading approach based on Deep Reinforcement Learning (DRL) algorithms as a possible solution. For this purpose, we model intraday trading as a Markov Decision Process (MDP) and employ the Proximal Policy Optimization (PPO) algorithm as our DRL approach. A simulation framework is introduced that enables trading on the continuous intraday price in a resolution of one-minute steps. We test our framework in a case study from the perspective of a wind park operator, including, next to general trade information, both price and wind forecasts. On a test scenario of German intraday trading results from 2018, we outperform multiple baselines by at least 45.24%, showing the advantage of the DRL algorithm. However, we also discuss limitations and enhancements of the DRL agent in order to increase performance in future work.
Keywords: Deep Reinforcement Learning; German intraday electricity trading; Deep neural networks; Markov Decision Process; Proximal policy optimization; Electricity price forecast
Real-time security margin control using deep reinforcement learning
15
Authors: Hannes Hagmar, Robert Eriksson, Le Anh Tuan. 《Energy and AI》, 2023, No. 3, pp. 52-63 (12 pages)
This paper develops a real-time control method based on deep reinforcement learning aimed at determining the optimal control actions to maintain a sufficient secure operating limit. The secure operating limit refers to the limit on the most stressed pre-contingency operating point of an electric power system that can withstand a set of credible contingencies without violating stability criteria. The developed deep reinforcement learning method uses a hybrid control scheme that is capable of simultaneously adjusting both discrete and continuous action variables. The performance is evaluated on a modified version of the Nordic32 test system. The results show that the developed method quickly learns an effective control policy that ensures a sufficient secure operating limit for a range of different system scenarios. The performance is also compared with a control based on a rule-based look-up table and with a deep reinforcement learning control adapted for discrete action spaces. The hybrid deep reinforcement learning control performed significantly better on all of the defined test sets, indicating that the ability to adjust both discrete and continuous action variables results in a more flexible and efficient control policy.
Keywords: Deep reinforcement learning; Preventive control; Proximal policy optimization; Secure operating limit
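The hybrid action scheme described above combines a discrete head (e.g., switching a shunt) with a continuous head (e.g., a generation setpoint). Assuming the two heads are conditionally independent given the state (a common factorization, not necessarily the authors' exact design), the joint log-probability PPO needs for its ratio is just the sum of the two heads' log-probabilities:

```python
import math

def hybrid_logprob(disc_probs, disc_idx, mean, std, value):
    """log pi(a|s) for a hybrid action: categorical choice + Gaussian setpoint."""
    lp_discrete = math.log(disc_probs[disc_idx])
    lp_continuous = (-0.5 * math.log(2.0 * math.pi * std ** 2)
                     - (value - mean) ** 2 / (2.0 * std ** 2))
    return lp_discrete + lp_continuous

# A PPO ratio is then exp(new_logprob - old_logprob) for the same sampled action.
lp = hybrid_logprob([0.5, 0.5], 0, mean=0.0, std=1.0, value=0.0)
print(round(lp, 4))  # -> -1.6121  (log 0.5 - 0.5 * log(2*pi))
```

Because the joint log-probability is a plain sum, the clipped-surrogate machinery of standard PPO carries over unchanged to the hybrid discrete-plus-continuous case.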