779 articles found
Unleashing the Power of Multi-Agent Reinforcement Learning for Algorithmic Trading in the Digital Financial Frontier and Enterprise Information Systems
1
Authors: Saket Sarin, Sunil K. Singh, Sudhakar Kumar, Shivam Goyal, Brij Bhooshan Gupta, Wadee Alhalabi, Varsha Arya. Journal: Computers, Materials & Continua (SCIE, EI), 2024, No. 8, pp. 3123-3138, 16 pages
In the rapidly evolving landscape of today's digital economy, Financial Technology (Fintech) emerges as a transformative force, propelled by the dynamic synergy between Artificial Intelligence (AI) and Algorithmic Trading. Our in-depth investigation delves into the intricacies of merging Multi-Agent Reinforcement Learning (MARL) and Explainable AI (XAI) within Fintech, aiming to refine Algorithmic Trading strategies. Through meticulous examination, we uncover the nuanced interactions of AI-driven agents as they collaborate and compete within the financial realm, employing sophisticated deep learning techniques to enhance the clarity and adaptability of trading decisions. These AI-infused Fintech platforms harness collective intelligence to unearth trends, mitigate risks, and provide tailored financial guidance, fostering benefits for individuals and enterprises navigating the digital landscape. Our research holds the potential to revolutionize finance, opening doors to fresh avenues for investment and asset management in the digital age. Additionally, our statistical evaluation yields encouraging results, with metrics such as Accuracy = 0.85, Precision = 0.88, and F1 Score = 0.86, reaffirming the efficacy of our approach within Fintech and emphasizing its reliability and innovative prowess.
Keywords: Neurodynamic Fintech; multi-agent reinforcement learning; algorithmic trading; digital financial frontier
Service Function Chain Deployment Algorithm Based on Multi-Agent Deep Reinforcement Learning
2
Authors: Wanwei Huang, Qiancheng Zhang, Tao Liu, Yaoli Xu, Dalei Zhang. Journal: Computers, Materials & Continua (SCIE, EI), 2024, No. 9, pp. 4875-4893, 19 pages
Aiming at the rapid growth of network services, which leads to the problems of long service request processing time and high deployment cost in the deployment of network function virtualization service function chain (SFC) under 5G networks, this paper proposes a multi-agent deep deterministic policy gradient optimization algorithm for SFC deployment (MADDPG-SD). Initially, an optimization model that enhances the request acceptance rate while minimizing the latency and deployment cost of SFCs is constructed for the network resource-constrained case. Subsequently, we model the dynamic problem as a Markov decision process (MDP), facilitating adaptation to the evolving states of network resources. Finally, by allocating SFCs to different agents and adopting a collaborative deployment strategy, each agent aims to maximize the request acceptance rate or minimize latency and costs. These agents learn strategies from historical data of virtual network functions in SFCs to guide server node selection, and achieve approximately optimal SFC deployment strategies through a cooperative framework of centralized training and distributed execution. Experimental simulation results indicate that the proposed method, while simultaneously meeting performance requirements and resource capacity constraints, has effectively increased the acceptance rate of requests compared to the comparative algorithms, reducing the end-to-end latency by 4.942% and the deployment cost by 8.045%.
Keywords: Network function virtualization; service function chain; Markov decision process; multi-agent reinforcement learning
Performance Evaluation of Multi-Agent Reinforcement Learning Algorithms
3
Authors: Abdulghani M. Abdulghani, Mokhles M. Abdulghani, Wilbur L. Walters, Khalid H. Abed. Journal: Intelligent Automation & Soft Computing, 2024, No. 2, pp. 337-352, 16 pages
Multi-Agent Reinforcement Learning (MARL) has proven to be successful in cooperative assignments. MARL is used to investigate how autonomous agents with the same interests can connect and act in one team. MARL cooperation scenarios are explored in recreational cooperative augmented reality environments, as well as real-world scenarios in robotics. In this paper, we explore the realm of MARL and its potential applications in cooperative assignments. Our focus is on developing a multi-agent system that can collaborate to attack or defend against enemies and achieve victory with minimal damage. To accomplish this, we utilize the StarCraft Multi-Agent Challenge (SMAC) environment and train four MARL algorithms: Q-learning with Mixtures of Experts (QMIX), Value-Decomposition Network (VDN), Multi-agent Proximal Policy Optimizer (MAPPO), and Multi-Agent Actor Attention Critic (MAA2C). These algorithms allow multiple agents to cooperate in a specific scenario to achieve the targeted mission. Our results show that the QMIX algorithm outperforms the other three algorithms in the attacking scenario, while the VDN algorithm achieves the best results in the defending scenario. Specifically, the VDN algorithm reaches the highest value of battle won mean and the lowest value of dead allies mean. Our research demonstrates the potential for MARL algorithms to be used in real-world applications, such as controlling multiple robots to provide helpful services or coordinating teams of agents to accomplish tasks that would be impossible for a human to do. The SMAC environment provides a unique opportunity to test and evaluate MARL algorithms in a challenging and dynamic environment, and our results show that these algorithms can be used to achieve victory with minimal damage.
Keywords: reinforcement learning; RL; multi-agent; MARL; SMAC; VDN; QMIX; MAPPO
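The value-decomposition idea behind VDN, one of the algorithms compared in this entry, can be sketched briefly. This is a minimal illustration rather than the paper's implementation: it assumes the standard VDN form in which the team value is the sum of per-agent values, and the Q-tables and the function names `joint_greedy` and `decentralized_greedy` are hypothetical.

```python
from itertools import product

def joint_greedy(q_tables):
    """Search every joint action for the largest summed value (VDN-style Q_tot)."""
    action_ranges = [range(len(q)) for q in q_tables]
    return max(
        product(*action_ranges),
        key=lambda joint: sum(q[a] for q, a in zip(q_tables, joint)),
    )

def decentralized_greedy(q_tables):
    """Each agent picks its own best action without looking at the others."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)

# Hypothetical per-agent action values for one observation (made-up numbers).
q_tables = [
    [0.1, 0.7, 0.3],  # agent 0
    [0.5, 0.2],       # agent 1
    [0.0, 0.4, 0.9],  # agent 2
]

joint = joint_greedy(q_tables)
local = decentralized_greedy(q_tables)
```

Because the team value is a plain sum, the exhaustive joint search and the per-agent choices agree here, which is what lets each agent act on its own value function at execution time.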
A new accelerating algorithm for multi-agent reinforcement learning (Cited by: 1)
4
Authors: 张汝波, 仲宇, 顾国昌. Journal: Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2005, No. 1, pp. 48-51, 4 pages
In multi-agent systems, joint-action must be employed to achieve cooperation because the evaluation of the behavior of an agent often depends on the other agents' behaviors. However, joint-action reinforcement learning algorithms suffer from a slow convergence rate because of the enormous learning space produced by joint-action. In this article, a prediction-based reinforcement learning algorithm is presented for multi-agent cooperation tasks, which requires all agents to learn to predict the probabilities of actions that other agents may execute. A multi-robot cooperation experiment is run to test the efficacy of the new algorithm, and the experiment results show that the new algorithm can achieve the cooperation policy much faster than the primitive reinforcement learning algorithm.
Keywords: distributed reinforcement learning; accelerating algorithm; machine learning; multi-agent system
Multi-Agent Reinforcement Learning Algorithm Based on Action Prediction
5
Authors: 童亮, 陆际联. Journal: Journal of Beijing Institute of Technology (EI, CAS), 2006, No. 2, pp. 133-137, 5 pages
Multi-agent reinforcement learning algorithms are studied. A prediction-based multi-agent reinforcement learning algorithm is presented for the multi-robot cooperation task. A multi-robot cooperation experiment based on a multi-agent inverted pendulum is carried out to test the efficiency of the new algorithm, and the experiment results show that the new algorithm can achieve the cooperation strategy much faster than the primitive multi-agent reinforcement learning algorithm.
Keywords: multi-agent system; reinforcement learning; action prediction; robot
A Multi-Agent Reinforcement Learning-Based Collaborative Jamming System: Algorithm Design and Software-Defined Radio Implementation (Cited by: 1)
6
Authors: Luguang Wang, Fei Song, Gui Fang, Zhibin Feng, Wen Li, Yifan Xu, Chen Pan, Xiaojing Chu. Journal: China Communications (SCIE, CSCD), 2022, No. 10, pp. 38-54, 17 pages
In multi-agent confrontation scenarios, a single jammer is constrained by its limited performance and by inefficiency in practical application. To cope with these issues, this paper investigates the multi-agent jamming problem in a multi-user scenario, where the coordination between the jammers is considered. Firstly, a multi-agent Markov decision process (MDP) framework is used to model and analyze the multi-agent jamming problem. Secondly, a collaborative multi-agent jamming algorithm (CMJA) based on reinforcement learning is proposed. Finally, an actual intelligent jamming system is designed and built on a software-defined radio (SDR) platform for simulation and platform verification. The simulation and platform verification results show that the proposed CMJA algorithm outperforms the independent Q-learning method and provides a better jamming effect.
Keywords: multi-agent reinforcement learning; intelligent jamming; collaborative jamming; software-defined radio platform
A Multi-Agent System for Environmental Monitoring Using Boolean Networks and Reinforcement Learning (Cited by: 7)
7
Authors: Hanzhong Zheng, Dejie Shi. Journal: Journal of Cyber Security, 2020, No. 2, pp. 85-96, 12 pages
Distributed wireless sensor networks have been shown to be effective for environmental monitoring tasks, in which multiple sensors are deployed in a wide range of environments to collect information or monitor a particular event. Wireless sensor networks, consisting of a large number of interacting sensors, have been successful in a variety of applications where they are able to share information using different transmission protocols through the communication network. However, the irregular and dynamic environment requires traditional wireless sensor networks to communicate frequently to exchange the most recent information, which can easily generate high communication cost through collaborative data collection and data transmission. High-frequency communication also has a high probability of failure because of long-distance data transmission. In this paper, we develop a novel approach to a multi-sensor environment monitoring network using the idea of a distributed system. Its communication network can overcome the difficulties of high communication cost and Single Point of Failure (SPOF) through a decentralized approach, which performs in-network computation. Our approach makes use of Boolean networks, which allow for a non-complex method of corroboration and retain meaningful information regarding the dynamics of the communication network. Our approach also reduces the complexity of the data aggregation process and employs a reinforcement learning algorithm to predict future events inside the environment through pattern recognition.
Keywords: multi-agent system; reinforcement learning; environment monitoring
Risk-sensitive reinforcement learning algorithms with generalized average criterion
8
Authors: 殷苌茗, 王汉兴, 赵飞. Journal: Applied Mathematics and Mechanics (English Edition) (SCIE, EI), 2007, No. 3, pp. 405-416, 12 pages
A new algorithm is proposed, which potentially sacrifices the optimality of control policies to obtain robustness of solutions. Robustness of solutions may become a very important property for a learning system when there is a mismatch between the theoretical model and the practical physical system, when the practical system is not static, or when the availability of a control action changes over time. The main contribution is that a set of approximation algorithms and their convergence results are given. A generalized average operator, instead of the general optimal operator max (or min), is applied to study a class of important learning algorithms, dynamic programming algorithms, and to discuss their convergence from a theoretical point of view. The purpose of this research is to improve the robustness of reinforcement learning algorithms theoretically.
Keywords: reinforcement learning; risk-sensitive; generalized average; algorithm; convergence
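The idea of replacing the max operator with a generalized average can be illustrated with a short sketch. The listing does not give the paper's operator, so the log-mean-exp form below is an assumed stand-in chosen only because it interpolates between the mean and the max; the names `generalized_average` and `soft_backup` and the parameter `beta` are hypothetical.

```python
import math

def generalized_average(values, beta):
    """Log-mean-exp average for beta > 0 (assumed stand-in operator).

    Tends to mean(values) as beta -> 0 and to max(values) as beta grows.
    """
    m = max(values)  # shift by the maximum for numerical stability
    mean_exp = sum(math.exp(beta * (v - m)) for v in values) / len(values)
    return m + math.log(mean_exp) / beta

def soft_backup(reward, next_q_values, gamma, beta):
    """One-step target that uses the generalized average in place of max."""
    return reward + gamma * generalized_average(next_q_values, beta)

next_q = [1.0, 2.0, 3.0]
near_mean = generalized_average(next_q, beta=1e-3)  # close to the plain mean
near_max = generalized_average(next_q, beta=50.0)   # close to the maximum
```

A small `beta` gives a cautious target that weighs all actions, while a large `beta` recovers the usual greedy backup.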
A dynamic fusion path planning algorithm for mobile robots incorporating improved IB-RRT* and deep reinforcement learning
9
Authors: 刘安东, ZHANG Baixin, CUI Qi, ZHANG Dan, NI Hongjie. Journal: High Technology Letters (EI, CAS), 2023, No. 4, pp. 365-376, 12 pages
Dynamic path planning is crucial for mobile robots to navigate successfully in unstructured environments. To achieve globally optimal path and real-time dynamic obstacle avoidance during the movement, a dynamic path planning algorithm incorporating improved IB-RRT* and deep reinforcement learning (DRL) is proposed. Firstly, an improved IB-RRT* algorithm is proposed for global path planning by combining double elliptic subset sampling and probabilistic central circle target bias. Then, to tackle the slow response to dynamic obstacles and inadequate obstacle avoidance of traditional local path planning algorithms, deep reinforcement learning is utilized to predict the movement trend of dynamic obstacles, leading to a dynamic fusion path planning. Finally, the simulation and experiment results demonstrate that the proposed improved IB-RRT* algorithm has higher convergence speed and search efficiency compared with traditional Bi-RRT*, Informed-RRT*, and IB-RRT* algorithms. Furthermore, the proposed fusion algorithm can effectively perform real-time obstacle avoidance and navigation tasks for mobile robots in unstructured environments.
Keywords: mobile robot; improved IB-RRT* algorithm; deep reinforcement learning (DRL); real-time dynamic obstacle avoidance
Discovering Latent Variables for the Tasks With Confounders in Multi-Agent Reinforcement Learning
10
Authors: Kun Jiang, Wenzhang Liu, Yuanda Wang, Lu Dong, Changyin Sun. Journal: IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 7, pp. 1591-1604, 14 pages
Efficient exploration in complex coordination tasks has been considered a challenging problem in multi-agent reinforcement learning (MARL). It is significantly more difficult for those tasks with latent variables that agents cannot directly observe. However, most of the existing latent variable discovery methods lack a clear representation of latent variables and an effective evaluation of the influence of latent variables on the agent. In this paper, we propose a new MARL algorithm based on the soft actor-critic method for complex continuous control tasks with confounders. It is called the multi-agent soft actor-critic with latent variable (MASAC-LV) algorithm, which uses variational inference theory to infer the compact latent variables representation space from a large amount of offline experience. Besides, we derive the counterfactual policy whose input has no latent variables and quantify the difference between the actual policy and the counterfactual policy via a distance function. This quantified difference is considered an intrinsic motivation that gives additional rewards based on how much the latent variable affects each agent. The proposed algorithm is evaluated on two collaboration tasks with confounders, and the experimental results demonstrate the effectiveness of MASAC-LV compared to other baseline algorithms.
Keywords: latent variable model; maximum entropy; multi-agent reinforcement learning (MARL); multi-agent system
A survey on multi-agent reinforcement learning and its application
11
Authors: Zepeng Ning, Lihua Xie. Journal: Journal of Automation and Intelligence, 2024, No. 2, pp. 73-91, 19 pages
Multi-agent reinforcement learning (MARL) has been a rapidly evolving field. This paper presents a comprehensive survey of MARL and its applications. We trace the historical evolution of MARL, highlight its progress, and discuss related survey works. Then, we review the existing works addressing inherent challenges and those focusing on diverse applications. Some representative stochastic games, MARL means, spatial forms of MARL, and task classification are revisited. We then conduct an in-depth exploration of a variety of challenges encountered in MARL applications. We also address critical operational aspects, such as hyperparameter tuning and computational complexity, which are pivotal in practical implementations of MARL. Afterward, we make a thorough overview of the applications of MARL to intelligent machines and devices, chemical engineering, biotechnology, healthcare, and societal issues, which highlights the extensive potential and relevance of MARL within both current and future technological contexts. Our survey also encompasses a detailed examination of benchmark environments used in MARL research, which are instrumental in evaluating MARL algorithms and demonstrate the adaptability of MARL to diverse application scenarios. In the end, we give our prospect for MARL and discuss its related techniques and potential future applications.
Keywords: benchmark environments; multi-agent reinforcement learning; multi-agent systems; stochastic games
UAV-Assisted Dynamic Avatar Task Migration for Vehicular Metaverse Services: A Multi-Agent Deep Reinforcement Learning Approach (Cited by: 1)
12
Authors: Jiawen Kang, Junlong Chen, Minrui Xu, Zehui Xiong, Yutao Jiao, Luchao Han, Dusit Niyato, Yongju Tong, Shengli Xie. Journal: IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 2, pp. 430-445, 16 pages
Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consumes intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units (RSU) or unmanned aerial vehicles (UAV) for execution is a promising solution to decrease computation overhead and reduce task processing latency, while the high mobility of vehicles brings challenges for vehicles to independently perform avatar migration decisions depending on current and future vehicle status. To address these challenges, in this paper, we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome slow convergence resulting from the curse of dimensionality and non-stationary issues caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach via sequential decision-making models for the efficient representation of relationships among agents. Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and effectively reduces approximately 20% of the latency of avatar task execution in UAV-assisted vehicular Metaverses.
Keywords: avatar; blockchain; metaverses; multi-agent deep reinforcement learning; transformer; UAVs
Regional Multi-Agent Cooperative Reinforcement Learning for City-Level Traffic Grid Signal Control
13
Authors: Yisha Li, Ya Zhang, Xinde Li, Changyin Sun. Journal: IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 9, pp. 1987-1998, 12 pages
This article studies the effective traffic signal control problem of multiple intersections in a city-level traffic system. A novel regional multi-agent cooperative reinforcement learning algorithm called RegionSTLight is proposed to improve the traffic efficiency. Firstly, a regional multi-agent Q-learning framework is proposed, which can equivalently decompose the global Q value of the traffic system into the local values of several regions. Based on the framework and the idea of human-machine cooperation, a dynamic zoning method is designed to divide the traffic network into several strong-coupled regions according to real-time traffic flow densities. In order to achieve better cooperation inside each region, a lightweight spatio-temporal fusion feature extraction network is designed. The experiments in synthetic, real-world, and city-level scenarios show that the proposed RegionSTLight converges more quickly, is more stable, and obtains better asymptotic performance compared to state-of-the-art models.
Keywords: human-machine cooperation; mixed domain attention mechanism; multi-agent reinforcement learning; spatio-temporal feature; traffic signal control
Collision-free parking recommendation based on multi-agent reinforcement learning in vehicular crowdsensing
14
Authors: Xin Li, Xinghua Lei, Xiuwen Liu, Hang Xiao. Journal: Digital Communications and Networks (SCIE, CSCD), 2024, No. 3, pp. 609-619, 11 pages
The recent proliferation of Fifth-Generation (5G) networks and Sixth-Generation (6G) networks has given rise to Vehicular Crowd Sensing (VCS) systems which solve parking collisions by effectively incentivizing vehicle participation. However, instead of being an isolated module, the incentive mechanism usually interacts with other modules. Based on this, we capture this synergy and propose a Collision-free Parking Recommendation (CPR), a novel VCS system framework that integrates an incentive mechanism, a non-cooperative VCS game, and a multi-agent reinforcement learning algorithm, to derive an optimal parking strategy in real time. Specifically, we utilize an LSTM method to predict parking areas roughly for recommendations accurately. Its incentive mechanism is designed to motivate vehicle participation by considering dynamically priced parking tasks and social network effects. In order to cope with stochastic parking collisions, its non-cooperative VCS game further analyzes the uncertain interactions between vehicles in parking decision-making. Then its multi-agent reinforcement learning algorithm models the VCS campaign as a multi-agent Markov decision process that not only derives the optimal collision-free parking strategy for each vehicle independently, but also proves that the optimal parking strategy for each vehicle is Pareto-optimal. Finally, numerical results demonstrate that CPR can accomplish parking tasks at a 99.7% accuracy compared with other baselines, efficiently recommending parking spaces.
Keywords: incentive mechanism; non-cooperative VCS game; multi-agent reinforcement learning; collision-free parking strategy; vehicular crowdsensing
Safety-Constrained Multi-Agent Reinforcement Learning for Power Quality Control in Distributed Renewable Energy Networks
15
Authors: Yongjiang Zhao, Haoyi Zhong, Chang Cyoon Lim. Journal: Computers, Materials & Continua (SCIE, EI), 2024, No. 4, pp. 449-471, 23 pages
This paper examines the difficulties of managing distributed power systems, notably due to the increasing use of renewable energy sources, and focuses on voltage control challenges exacerbated by their variable nature in modern power grids. To tackle the unique challenges of voltage control in distributed renewable energy networks, researchers are increasingly turning towards multi-agent reinforcement learning (MARL). However, MARL raises safety concerns due to the unpredictability in agent actions during their exploration phase. This unpredictability can lead to unsafe control measures. To mitigate these safety concerns in MARL-based voltage control, our study introduces a novel approach: Safety-Constrained Multi-Agent Reinforcement Learning (SC-MARL). This approach incorporates a specialized safety constraint module specifically designed for voltage control within the MARL framework. This module ensures that the MARL agents carry out voltage control actions safely. The experiments demonstrate that, in the 33-bus, 141-bus, and 322-bus power systems, employing SC-MARL for voltage control resulted in a reduction of the Voltage Out of Control Rate (%V.out) from 0.43, 0.24, and 2.95 to 0, 0.01, and 0.03, respectively. Additionally, the Reactive Power Loss (Q loss) decreased from 0.095, 0.547, and 0.017 to 0.062, 0.452, and 0.016 in the corresponding systems.
Keywords: power quality control; multi-agent reinforcement learning; safety-constrained MARL
Trading in Fast-Changing Markets with Meta-Reinforcement Learning
16
Authors: Yutong Tian, Minghan Gao, Qiang Gao, Xiao-Hong Peng. Journal: Intelligent Automation & Soft Computing, 2024, No. 2, pp. 175-188, 14 pages
How to find an effective trading policy is still an open question, mainly due to the nonlinear and non-stationary dynamics in a financial market. Deep reinforcement learning, which has recently been used to develop trading strategies by automatically extracting complex features from a large amount of data, is struggling to deal with fast-changing markets due to sample inefficiency. This paper applies, for the first time, the meta-reinforcement learning method to tackle the trading challenges faced by conventional reinforcement learning (RL) approaches in non-stationary markets. In our work, the historical trading data is divided into multiple task data sets, and within each of them the market condition is relatively stationary. Then a model-agnostic meta-learning (MAML)-based trading method involving a meta-learner and a normal learner is proposed. A trading policy is learned by the meta-learner across multiple task data, which is then fine-tuned by the normal learner through a small amount of data from a new market task before trading in it. To improve the adaptability of the MAML-based method, an ordered multiple-step updating mechanism is also proposed to explore the changing dynamic within a task market. The simulation results demonstrate that the proposed MAML-based trading methods can increase the annualized return rate by approximately 180%, 200%, and 160%, increase the Sharpe ratio by 180%, 90%, and 170%, and decrease the maximum drawdown by 30%, 20%, and 40%, compared to the traditional RL approach in three stock index future markets, respectively.
Keywords: algorithmic trading; reinforcement learning; fast-changing market; meta-reinforcement learning
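Two of the evaluation metrics named in this entry, the Sharpe ratio and the maximum drawdown, have standard textbook definitions that can be sketched as follows. This is not the paper's evaluation code: the sample data, the zero risk-free rate, and the 252-period annualization factor are assumptions for illustration, and the function names are hypothetical.

```python
import statistics

def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns (sample standard deviation).

    periods_per_year=252 assumes daily data; adjust for other sampling rates.
    """
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * periods_per_year ** 0.5

# Made-up account values: the fall from 120 to 90 is the deepest decline.
equity = [100.0, 120.0, 90.0, 130.0]
drawdown = max_drawdown(equity)
```

The equity values must be positive for the drawdown fraction to be meaningful, and the Sharpe ratio needs at least two returns with non-zero spread.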
Efficient Exploration for Multi-Agent Reinforcement Learning via Transferable Successor Features (Cited by: 2)
17
Authors: Wenzhang Liu, Lu Dong, Dan Niu, Changyin Sun. Journal: IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2022, No. 9, pp. 1673-1686, 14 pages
In multi-agent reinforcement learning (MARL), the behaviors of each agent can influence the learning of others, and the agents have to search in an exponentially enlarged joint-action space. Hence, it is challenging for the multi-agent teams to explore in the environment. Agents may achieve suboptimal policies and fail to solve some complex tasks. To improve the exploring efficiency as well as the performance of MARL tasks, in this paper, we propose a new approach by transferring the knowledge across tasks. Differently from the traditional MARL algorithms, we first assume that the reward functions can be computed by linear combinations of a shared feature function and a set of task-specific weights. Then, we define a set of basic MARL tasks in the source domain and pre-train them as the basic knowledge for further use. Finally, once the weights for target tasks are available, it will be easier to get a well-performed policy to explore in the target domain. Hence, the learning process of agents for target tasks is speeded up by taking full use of the basic knowledge that was learned previously. We evaluate the proposed algorithm on two challenging MARL tasks: cooperative box-pushing and non-monotonic predator-prey. The experiment results have demonstrated the improved performance compared with state-of-the-art MARL algorithms.
Keywords: knowledge transfer; multi-agent systems; reinforcement learning; successor features
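The linear-reward assumption in this entry is what makes successor features transferable: if the reward is a dot product of shared features and task weights, the action value is the dot product of the successor features and the same weights. The sketch below shows only that standard identity with made-up numbers; it is not the paper's multi-agent algorithm, and all names and values are hypothetical.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def q_values(successor_features, task_weights):
    """Q(s, a) = psi(s, a) . w for every action in one state."""
    return [dot(psi_a, task_weights) for psi_a in successor_features]

# Hypothetical successor features psi(s, a) for three actions in one state.
psi = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 2.0],
]
w_source = [1.0, 0.0]  # source task rewards only the first feature
w_target = [0.0, 1.0]  # target task rewards only the second feature

q_source = q_values(psi, w_source)
q_target = q_values(psi, w_target)
best_source = max(range(len(psi)), key=q_source.__getitem__)
best_target = max(range(len(psi)), key=q_target.__getitem__)
```

Swapping the weight vector re-ranks the actions without relearning `psi`, which is the sense in which knowledge learned on source tasks can be reused once target weights are available.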
Cooperative decision-making algorithm with efficient convergence for UCAV formation in beyond-visual-range air combat based on multi-agent reinforcement learning
18
作者 Yaoming ZHOU Fan YANG +2 位作者 Chaoyue ZHANG Shida LI Yongchao WANG 《Chinese Journal of Aeronautics》 SCIE EI CAS CSCD 2024年第8期311-328,共18页
Highly intelligent Unmanned Combat Aerial Vehicle(UCAV)formation is expected to bring out strengths in Beyond-Visual-Range(BVR)air combat.Although Multi-Agent Reinforcement Learning(MARL)shows outstanding performance ... Highly intelligent Unmanned Combat Aerial Vehicle(UCAV)formation is expected to bring out strengths in Beyond-Visual-Range(BVR)air combat.Although Multi-Agent Reinforcement Learning(MARL)shows outstanding performance in cooperative decision-making,it is challenging for existing MARL algorithms to quickly converge to an optimal strategy for UCAV formation in BVR air combat where confrontation is complicated and reward is extremely sparse and delayed.Aiming to solve this problem,this paper proposes an Advantage Highlight Multi-Agent Proximal Policy Optimization(AHMAPPO)algorithm.First,at every step,the AHMAPPO records the degree to which the best formation exceeds the average of formations in parallel environments and carries out additional advantage sampling according to it.Then,the sampling result is introduced into the updating process of the actor network to improve its optimization efficiency.Finally,the simulation results reveal that compared with some state-of-the-art MARL algorithms,the AHMAPPO can obtain a more excellent strategy utilizing fewer sample episodes in the UCAV formation BVR air combat simulation environment built in this paper,which can reflect the critical features of BVR air combat.The AHMAPPO can significantly increase the convergence efficiency of the strategy for UCAV formation in BVR air combat,with a maximum increase of 81.5%relative to other algorithms. 展开更多
Keywords: unmanned combat aerial vehicle (UCAV) formation; decision-making; beyond-visual-range (BVR) air combat; advantage highlight; multi-agent reinforcement learning (MARL)
Cooperative Multi-Agent Reinforcement Learning with Constraint-Reduced DCOP
19
Authors: Yi Xie, Zhongyi Liu, Zhao Liu, Yijun Gu. Journal of Beijing Institute of Technology (EI, CAS), 2017, No. 4, pp. 525-533 (9 pages)
Cooperative multi-agent reinforcement learning (MARL) is an important topic in the field of artificial intelligence, in which distributed constraint optimization (DCOP) algorithms have been widely used to coordinate the actions of multiple agents. However, dense communication among agents affects the practicability of DCOP algorithms. In this paper, we propose a novel DCOP algorithm that addresses the communication problem of previous DCOP algorithms by reducing constraints. The contributions of this paper are primarily threefold: (1) It is proved that removing constraints can effectively reduce the communication burden of DCOP algorithms. (2) A criterion is provided to identify insignificant constraints whose elimination does not have a great impact on the performance of the whole system. (3) A constraint-reduced DCOP algorithm is proposed by adopting a variant of the spectral clustering algorithm to detect and eliminate the insignificant constraints. Our algorithm reduces the communication burden of the benchmark DCOP algorithm while keeping its overall performance unaffected. The performance of the constraint-reduced DCOP algorithm is evaluated on four configurations of cooperative sensor networks. The effectiveness of communication reduction is also verified by comparisons between the constraint-reduced DCOP and the benchmark DCOP.
Keywords: reinforcement learning; cooperative multi-agent system; distributed constraint optimization (DCOP); constraint-reduced DCOP
Event-triggered H∞ consensus control for input-constrained multi-agent systems via reinforcement learning
20
Authors: Jinxuan Zhang, Chang-E Ren. Control Theory and Technology (EI, CSCD), 2024, No. 1, pp. 25-38 (14 pages)
This article presents an event-triggered H∞ consensus control scheme using reinforcement learning (RL) for nonlinear second-order multi-agent systems (MASs) with control constraints. First, considering control constraints, the constrained H∞ consensus problem is transformed into a multi-player zero-sum game with non-quadratic performance functions. Then, an event-triggered control method is presented to conserve communication resources, and a new triggering condition is developed for each agent to make the triggering threshold independent of the disturbance attenuation level. To derive the optimal controller that minimizes the cost function under the worst-case disturbance, a constrained Hamilton–Jacobi–Bellman (HJB) equation is defined. Since this equation is difficult to solve analytically due to its strong nonlinearity, RL is implemented to obtain the optimal controller. Specifically, the optimal performance function and the worst-case disturbance are approximated by a time-triggered critic network; meanwhile, the optimal controller is approximated by an event-triggered actor network. After that, Lyapunov analysis is utilized to prove the uniformly ultimately bounded (UUB) stability of the system and that the network weight errors are UUB. Finally, a simulation example is utilized to demonstrate the effectiveness of the control strategy provided.
Keywords: H∞ optimal control; input constraints; multi-agent systems (MASs); reinforcement learning (RL)
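The event-triggered idea in the abstract above, where the control input is recomputed only when a triggering condition fires and is held constant otherwise, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar plant, the gains, the relative-threshold trigger `sigma * |x| + eps`, and the `tanh` saturation standing in for the input constraint. It has no disturbance, no consensus among agents, and no actor or critic network.

```python
import math


def simulate_event_triggered(x0=1.0, a=0.5, b=1.0, k=2.0, u_max=1.5,
                             sigma=0.1, eps=1e-3, dt=0.01, steps=1000):
    """Euler simulation of x' = a*x + b*u with an event-triggered, saturated input.

    Hypothetical toy model: the trigger rule and all parameters are
    illustrative choices, not the triggering condition of the cited paper.
    """
    x = x0
    x_hat = x0  # state sampled at the most recent trigger instant
    u = u_max * math.tanh(-k * x_hat / u_max)  # bounded input, |u| < u_max
    events = 1  # the initial sample counts as the first event
    max_abs_u = abs(u)
    for _ in range(steps):
        # Trigger only when the gap to the last sample is large enough.
        if abs(x - x_hat) > sigma * abs(x) + eps:
            x_hat = x
            u = u_max * math.tanh(-k * x_hat / u_max)
            events += 1
            max_abs_u = max(max_abs_u, abs(u))
        # Between events the input is held constant (zero-order hold).
        x += dt * (a * x + b * u)
    return x, events, max_abs_u


if __name__ == "__main__":
    final_x, n_events, peak_u = simulate_event_triggered()
    print(f"final state {final_x:.5f}, updates {n_events} of 1000 steps, peak |u| {peak_u:.3f}")
```

In this sketch the controller is updated on only a fraction of the simulation steps, which is the communication saving that event triggering aims for, and the state settles into a small neighborhood of the origin rather than converging exactly, loosely mirroring the uniformly ultimately bounded behavior mentioned in the abstract.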