Journal Articles
1,298 articles found
Task assignment in ground-to-air confrontation based on multiagent deep reinforcement learning (Cited: 3)
1
Authors: Jia-yi Liu, Gang Wang, Qiang Fu, Shao-hua Yue, Si-yuan Wang 《Defence Technology (防务技术)》 SCIE EI CAS CSCD, 2023, No. 1, pp. 210-219 (10 pages)
The scale of ground-to-air confrontation task assignment is large, and many concurrent task assignments and random events must be handled. When existing task assignment methods are applied to ground-to-air confrontation, efficiency in dealing with complex tasks is low and interactive conflicts arise in multiagent systems. This study proposes a multiagent architecture based on one general agent with multiple narrow agents (OGMN) to reduce task assignment conflicts. Considering the slow speed of traditional dynamic task assignment algorithms, this paper proposes the proximal policy optimization for task assignment of general and narrow agents (PPO-TAGNA) algorithm. The algorithm, based on the idea of the optimal assignment strategy and combined with the training framework of deep reinforcement learning (DRL), adds a multihead attention mechanism and a stage reward mechanism to the bilateral band clipping PPO algorithm to address low training efficiency. Finally, simulation experiments are carried out in the digital battlefield. The multiagent architecture based on OGMN combined with the PPO-TAGNA algorithm obtains higher rewards faster and has a higher win ratio. By analyzing agent behavior, the efficiency, superiority and rationality of resource utilization of this method are verified.
Keywords: Ground-to-air confrontation; Task assignment; General and narrow agents; Deep reinforcement learning; Proximal policy optimization (PPO)
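The PPO-TAGNA algorithm described above builds on the clipped PPO surrogate objective. As a point of reference, the following minimal Python sketch shows only the standard clipped objective; the paper's bilateral band clipping, multihead attention and stage reward components are not reproduced here, and the function name ppo_clip_loss is illustrative.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, epsilon=0.2):
    """Standard PPO clipped surrogate loss (to be minimized).

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage: estimated advantage A(s, a) for each sample
    epsilon:   clipping range (0.2 is a common default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Take the pessimistic (smaller) objective per sample, then negate for a loss.
    return -np.mean(np.minimum(unclipped, clipped))

# Example: two samples with positive and negative advantages
print(ppo_clip_loss(np.array([1.3, 0.7]), np.array([2.0, -1.0])))
```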
Multi-Agent Deep Reinforcement Learning for Cross-Layer Scheduling in Mobile Ad-Hoc Networks
2
Authors: Xinxing Zheng, Yu Zhao, Joohyun Lee, Wei Chen 《China Communications》 SCIE CSCD, 2023, No. 8, pp. 78-88 (11 pages)
Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to deal with congestion in Ad-hoc networks with effective algorithms is still open and challenging. In this paper, we focus on enabling congestion control to minimize network transmission delays through flexible power control. To effectively solve the congestion problem, we propose a distributed cross-layer scheduling algorithm, which is empowered by graph-based multi-agent deep reinforcement learning. The transmit power is adaptively adjusted in real time by our algorithm based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low due to the regional cooperation based on the graph attention network. In the evaluation, we show that our algorithm can reduce the transmission delay of data flows under severe signal interference and drastically changing channel states, and we demonstrate its adaptability and stability in different topologies. The method is general and can be extended to various types of topologies.
Keywords: Ad-hoc network; cross-layer scheduling; multi-agent deep reinforcement learning; interference elimination; power control; queue scheduling; actor-critic methods; Markov decision process
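The regional cooperation mentioned in this abstract relies on a graph attention network. The sketch below computes the attention coefficients of a single generic GAT head over each node's neighbors; it is an assumption-laden illustration of the general mechanism, not the paper's actual network, feature design or hyperparameters.

```python
import numpy as np

def gat_attention(h, W, a_vec, neighbors):
    """Attention coefficients of one generic graph-attention head.

    h:         [n_nodes, d] node features (e.g., local channel/queue state)
    W:         [d, d'] shared linear transform
    a_vec:     [2*d'] attention vector
    neighbors: dict mapping node i -> list of neighbor indices
    """
    z = h @ W
    alpha = {}
    for i, nbrs in neighbors.items():
        scores = []
        for j in nbrs:
            e = a_vec @ np.concatenate([z[i], z[j]])
            scores.append(np.maximum(0.2 * e, e))   # LeakyReLU, slope 0.2
        scores = np.array(scores)
        exp = np.exp(scores - scores.max())
        alpha[i] = exp / exp.sum()                  # softmax over neighbors
    return alpha

# Example: 3 nodes, 4-dim features, one attention head
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 8))
a_vec = rng.normal(size=16)
print(gat_attention(h, W, a_vec, {0: [1, 2], 1: [0], 2: [0, 1]}))
```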
Knowledge Reasoning Method Based on Deep Transfer Reinforcement Learning: DTRLpath
3
Authors: Shiming Lin, Ling Ye, Yijie Zhuang, Lingyun Lu, Shaoqiu Zheng, Chenxi Huang, Ng Yin Kwee 《Computers, Materials & Continua》 SCIE EI, 2024, No. 7, pp. 299-317 (19 pages)
In recent years, with the continuous development of deep learning and knowledge graph reasoning methods, more and more researchers have shown great interest in improving knowledge graph reasoning methods by inferring missing facts through reasoning. By searching paths on the knowledge graph and making fact and link predictions based on these paths, deep learning-based Reinforcement Learning (RL) agents can demonstrate good performance and interpretability. Therefore, deep reinforcement learning-based knowledge reasoning methods have rapidly emerged in recent years and have become a hot research topic. However, even in a small and fixed knowledge graph reasoning action space, there are still a large number of invalid actions. Selecting invalid actions often interrupts the RL agent's exploration, resulting in a significant decrease in the success rate of path mining. In order to improve the success rate of RL agents in the early stages of path search, this article proposes a knowledge reasoning method based on Deep Transfer Reinforcement Learning path (DTRLpath). Before supervised pre-training and retraining, a pre-task of searching for effective actions in a single step is added. The RL agent is first trained in the pre-task to improve its ability to search for effective actions. Then, the trained agent is transferred to the target reasoning task for path search training, which improves its success rate in searching for target task paths. Finally, based on the comparative experimental results on the FB15K-237 and NELL-995 datasets, it can be concluded that the proposed method significantly improves the success rate of path search and outperforms similar methods in most reasoning tasks.
Keywords: Intelligent agent; knowledge graph reasoning; reinforcement transfer learning
Knowledge transfer in multi-agent reinforcement learning with incremental number of agents (Cited: 4)
4
Authors: LIU Wenzhang, DONG Lu, LIU Jian, SUN Changyin 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD, 2022, No. 2, pp. 447-460 (14 pages)
In this paper, the reinforcement learning method for cooperative multi-agent systems (MAS) with an incremental number of agents is studied. The existing multi-agent reinforcement learning approaches deal with MAS with a specific number of agents and can learn well-performed policies. However, if the number of agents increases, the previously learned policies may not perform well in the current scenario. The new agents need to learn from scratch to find optimal policies with the others, which may slow down the learning speed of the whole team. To solve that problem, in this paper, we propose a new algorithm to take full advantage of the historical knowledge which was learned before, and transfer it from the previous agents to the new agents. Since the previous agents have been trained well in the source environment, they are treated as teacher agents in the target environment. Correspondingly, the new agents are called student agents. To enable the student agents to learn from the teacher agents, we first modify the input nodes of the networks for teacher agents to adapt to the current environment. Then, the teacher agents take the observations of the student agents as input, and output the advised actions and values as supervising information. Finally, the student agents combine the reward from the environment and the supervising information from the teacher agents, and learn the optimal policies with modified loss functions. By taking full advantage of the knowledge of teacher agents, the search space for the student agents is reduced significantly, which can accelerate the learning speed of the whole system. The proposed algorithm is verified in several multi-agent simulation environments, and its efficiency has been demonstrated by the experimental results.
Keywords: knowledge transfer; multi-agent reinforcement learning (MARL); new agents
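The student agents described above combine the environment reward with the teacher's supervising information through a modified loss. The numpy sketch below shows one plausible form of such a combined objective; the function student_loss, its inputs and the weighting term beta are hypothetical and may differ from the loss actually used in the paper.

```python
import numpy as np

def student_loss(q_student, q_teacher, td_target, actions, beta=0.5):
    """Hypothetical combined objective for a student agent (sketch only).

    q_student: [batch, n_actions] Q-values predicted by the student network
    q_teacher: [batch, n_actions] advised values from the frozen teacher
    td_target: [batch] targets built from environment rewards (r + gamma * max Q')
    actions:   [batch] integer actions actually taken
    beta:      weight of the teacher-supervision term
    """
    batch = np.arange(len(actions))
    q_taken = q_student[batch, actions]
    rl_loss = np.mean((q_taken - td_target) ** 2)    # learn from the environment reward
    distill = np.mean((q_student - q_teacher) ** 2)  # follow the teacher's advice
    return rl_loss + beta * distill
```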
Collaborative multi-agent reinforcement learning based on experience propagation (Cited: 5)
5
Authors: Min Fang, Frans C.A. Groen 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD, 2013, No. 4, pp. 683-689 (7 pages)
For multi-agent reinforcement learning in Markov games, knowledge extraction and sharing are key research problems. State list extracting means calculating the optimal shared state path from state trajectories with cycles. A state list extracting algorithm checks cyclic state lists of a current state in the state trajectory, condensing the optimal action set of the current state. By reinforcing the optimal action selected, the action policy of cyclic states is optimized gradually. The state list extracting is repeatedly learned and used as the experience knowledge which is shared by teams. Agents speed up the rate of convergence by experience sharing. Competition games of prey and predators are used for the experiments. The experimental results prove that the proposed algorithms overcome the lack of experience in the initial stage, speed up learning and improve performance.
Keywords: multi-agent; Q-learning; state list extracting; experience sharing
Exploring Local Chemical Space in De Novo Molecular Generation Using Multi-Agent Deep Reinforcement Learning (Cited: 2)
6
Authors: Wei Hu 《Natural Science》 2021, No. 9, pp. 412-424 (13 pages)
Single-agent reinforcement learning (RL) is commonly used to learn how to play computer games, in which the agent makes one move before making the next in a sequential decision process. Recently, single agents have also been employed in the design of molecules and drugs. While a single agent is a good fit for computer games, it has limitations when used in molecule design: its sequential learning makes it impossible to modify or improve the previous steps while working on the current step. In this paper, we propose to apply the multi-agent RL approach to the study of molecules, which can optimize all sites of a molecule simultaneously. To demonstrate the validity of our approach, we chose one chemical compound, Favipiravir, to explore its local chemical space. Favipiravir is a broad-spectrum inhibitor of viral RNA polymerase, and is one of the compounds currently being used in SARS-CoV-2 (COVID-19) clinical trials. Our experiments reveal the collaborative learning of a team of deep RL agents, as well as the learning of its individual agents, in the exploration of Favipiravir. In particular, our multi-agents not only discovered molecules near Favipiravir in chemical space, but also the learnability of each site in the string representation of Favipiravir, critical information for understanding the underlying mechanism that supports machine learning of molecules.
Keywords: multi-agent reinforcement learning; Actor-Critic; molecule design; SARS-CoV-2; COVID-19
Multi-Agent Reinforcement Learning Algorithm Based on Action Prediction
7
Authors: 童亮, 陆际联 《Journal of Beijing Institute of Technology》 EI CAS, 2006, No. 2, pp. 133-137 (5 pages)
Multi-agent reinforcement learning algorithms are studied. A prediction-based multi-agent reinforcement learning algorithm is presented for a multi-robot cooperation task. A multi-robot cooperation experiment based on a multi-agent inverted pendulum is conducted to test the efficiency of the new algorithm, and the experimental results show that the new algorithm can achieve the cooperation strategy much faster than the primitive multi-agent reinforcement learning algorithm.
Keywords: multi-agent system; reinforcement learning; action prediction; robot
Cooperative Multi-Agent Reinforcement Learning with Constraint-Reduced DCOP
8
Authors: Yi Xie, Zhongyi Liu, Zhao Liu, Yijun Gu 《Journal of Beijing Institute of Technology》 EI CAS, 2017, No. 4, pp. 525-533 (9 pages)
Cooperative multi-agent reinforcement learning (MARL) is an important topic in the field of artificial intelligence, in which distributed constraint optimization (DCOP) algorithms have been widely used to coordinate the actions of multiple agents. However, dense communication among agents affects the practicability of DCOP algorithms. In this paper, we propose a novel DCOP algorithm that addresses the communication problem of previous DCOP algorithms by reducing constraints. The contributions of this paper are primarily threefold: (1) It is proved that removing constraints can effectively reduce the communication burden of DCOP algorithms. (2) A criterion is provided to identify insignificant constraints whose elimination does not have a great impact on the performance of the whole system. (3) A constraint-reduced DCOP algorithm is proposed by adopting a variant of the spectral clustering algorithm to detect and eliminate the insignificant constraints. Our algorithm reduces the communication burden of the benchmark DCOP algorithm while keeping its overall performance unaffected. The performance of the constraint-reduced DCOP algorithm is evaluated on four configurations of cooperative sensor networks. The effectiveness of communication reduction is also verified by comparisons between the constraint-reduced DCOP and the benchmark DCOP.
Keywords: reinforcement learning; cooperative multi-agent system; distributed constraint optimization (DCOP); constraint-reduced DCOP
Exploring Deep Reinforcement Learning with Multi Q-Learning (Cited: 26)
9
Authors: Ethan Duryea, Michael Ganger, Wei Hu 《Intelligent Control and Automation》 2016, No. 4, pp. 129-144 (16 pages)
Q-learning is a popular temporal-difference reinforcement learning algorithm which often explicitly stores state values using lookup tables. This implementation has been proven to converge to the optimal solution, but it is often beneficial to use a function-approximation system, such as deep neural networks, to estimate state values. It has previously been observed that Q-learning can be unstable when using value function approximation or when operating in a stochastic environment. This instability can adversely affect the algorithm's ability to maximize its returns. In this paper, we present a new algorithm called Multi Q-learning to attempt to overcome the instability seen in Q-learning. We test our algorithm on a 4 × 4 grid-world with different stochastic reward functions using various deep neural networks and convolutional networks. Our results show that in most cases Multi Q-learning outperforms Q-learning, achieving average returns up to 2.5 times higher than Q-learning and having a standard deviation of state values as low as 0.58.
Keywords: reinforcement learning; deep learning; Multi Q-learning
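Multi Q-learning maintains several value estimates instead of a single Q-table. The tabular sketch below updates one randomly selected table toward a target bootstrapped from the average of all tables; this is an illustration in that spirit, and the exact update rule in the cited paper may differ.

```python
import numpy as np

def multi_q_update(Q_tables, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=np.random):
    """One tabular update in the spirit of Multi Q-learning (sketch only).

    Q_tables: list of K arrays of shape [n_states, n_actions]
    A randomly chosen table is updated toward a target bootstrapped from the
    average of all tables, which smooths the value estimates.
    """
    k = rng.randint(len(Q_tables))                     # table to update this step
    q_avg = np.mean([Q[s_next] for Q in Q_tables], axis=0)
    target = r + gamma * np.max(q_avg)                 # bootstrapped target
    Q_tables[k][s, a] += alpha * (target - Q_tables[k][s, a])

# Example: 3 tables for a 16-state (4 x 4 grid) world with 4 actions
Qs = [np.zeros((16, 4)) for _ in range(3)]
multi_q_update(Qs, s=0, a=1, r=1.0, s_next=5)
```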
Multi-task Coalition Parallel Formation Strategy Based on Reinforcement Learning (Cited: 6)
10
Authors: JIANG Jian-Guo, SU Zhao-Pin, QI Mei-Bin, ZHANG Guo-Fu 《自动化学报》 (Acta Automatica Sinica) EI CSCD, PKU Core, 2008, No. 3, pp. 349-352 (4 pages)
An agent coalition is an important form of agent coordination and cooperation. By forming a coalition, agents can improve their problem-solving ability and obtain more utility. In this paper, a novel multi-task coalition parallel formation strategy is introduced, and it is theoretically proved that the process of multi-task coalition formation is a Markov decision process. Furthermore, reinforcement learning is used to solve the agents' behavior strategy in parallel multi-task coalition formation, and this formation process is described. In multi-task-oriented domains, the strategy can form multi-task coalitions effectively and in parallel.
Keywords: reinforcement learning; multi-task coalition; parallel formation; Markov decision process
Incorporation of Perception-based Information in Robot Learning Using Fuzzy Reinforcement Learning Agents
11
Authors: ZHOU Changjiu, MENG Qingchun, GUO Zhongwen, QU Weifen, YIN Bo 《Journal of Ocean University of Qingdao》 2002, No. 1, pp. 93-100 (8 pages)
Robot learning in unstructured environments has been proved to be an extremely challenging problem, mainly because of the many uncertainties always present in the real world. Human beings, on the other hand, seem to cope very well with uncertain and unpredictable environments, often relying on perception-based information. Furthermore, human beings can also utilize perceptions to guide their learning on those parts of the perception-action space that are actually relevant to the task. Therefore, we conduct research aimed at improving robot learning through the incorporation of both perception-based and measurement-based information. For this reason, a fuzzy reinforcement learning (FRL) agent is proposed in this paper. Based on a neural-fuzzy architecture, different kinds of information can be incorporated into the FRL agent to initialise its action network, critic network and evaluation feedback module so as to accelerate its learning. By making use of the global optimisation capability of GAs (genetic algorithms), a GA-based FRL (GAFRL) agent is presented to solve the local minima problem in traditional actor-critic reinforcement learning. On the other hand, with the prediction capability of the critic network, GAs can perform a more effective global search. Different GAFRL agents are constructed and verified by using the simulation model of a physical biped robot. The simulation analysis shows that the biped learning rate for dynamic balance can be improved by incorporating perception-based information on biped balancing and walking evaluation. The biped robot can find applications in ocean exploration, detection and sea rescue activities, as well as military maritime activities.
Keywords: robot learning; reinforcement learning agents; neural-fuzzy systems; genetic algorithms; biped robot
Cooperative Iterative Learning Control of Linear Multi-agent Systems with a Dynamic Leader under Directed Topologies (Cited: 1)
12
Authors: PENG Zhou-Hua, WANG Dan, WANG Hao, WANG Wei 《自动化学报》 (Acta Automatica Sinica) EI CSCD, PKU Core, 2014, No. 11, pp. 2595-2601 (7 pages)
Keywords: iterative learning controller; Lyapunov-Krasovskii functional; multi-agent systems; leader; linear; output information; unknown input
Pass-ball training based on genetic reinforcement learning
13
Authors: 褚海涛, 洪炳熔 《Journal of Harbin Institute of Technology (New Series)》 EI CAS, 2001, No. 3, pp. 279-282 (4 pages)
This paper introduces a hybrid genetic algorithm and reinforcement learning computation model for independent agent learning in continuous, distributed, open environments. The model takes full advantage of the reactivity and robustness of the reinforcement learning algorithm and of the suitability of genetic algorithms for problems with high dimensionality, large populations and complex environments. Through proper training, the results verify that this method is effective in complex multi-agent environments.
Keywords: reinforcement; genetic; multi-agent; genetic reinforcement learning
AGVs Dispatching Using Multiple Cooperative Learning Agents
14
Authors: 李晓萌, Yang Yupu 《High Technology Letters》 EI CAS, 2002, No. 3, pp. 83-87 (5 pages)
AGVs dispatching, one of the hot problems in FMS, has attracted widespread interest in recent years. It is hard to dynamically schedule AGVs with pre-designed rules because of the uncertainty and dynamic nature of the AGVs dispatching process, so the AGVs system in this paper is treated as a cooperative learning multi-agent system, in which each agent adopts a multilevel decision method with two decision levels: the option level and the action level. On the option level, an agent learns a policy to execute a subtask with the best response to the other AGVs' current options. On the action level, an agent learns an optimal policy of actions for achieving its planned option. The method is applied to an AGVs dispatching simulation, and the performance of the AGVs system based on this method is verified.
Keywords: multi-agent; reinforcement learning; multilevel decision; AGVs dispatching
Research on active defense decision-making method for cloud boundary networks based on reinforcement learning of intelligent agent
15
Authors: Huan Wang, Yunlong Tang, Yan Wang, Ning Wei, Junyi Deng, Zhiyan Bin, Weilong Li 《High-Confidence Computing》 EI, 2024, No. 2, pp. 50-61 (12 pages)
The cloud boundary network environment is characterized by a passive defense strategy, discrete defense actions, and delayed defense feedback in the face of network attacks, ignoring the influence of the external environment on defense decisions and thus resulting in poor defense effectiveness. Therefore, this paper proposes a cloud boundary network active defense model and decision method based on intelligent-agent reinforcement learning. It designs the network structure of the intelligent-agent attack-defense game and depicts the attack-defense game process of the cloud boundary network; constructs the observation space and action space of intelligent-agent reinforcement learning in the incomplete-information environment and portrays the interaction process between the intelligent agent and the environment; and establishes a reward mechanism based on attack and defense gains to encourage intelligent agents to learn more effective defense strategies. The designed active defense decision intelligent agent based on deep reinforcement learning can solve the problems of boundary dynamics, interaction lag, and control dispersion in the defense decision process of cloud boundary networks, and improve the autonomy and continuity of defense decisions.
Keywords: Active defense decision-making; Cloud boundary network security; Intelligent agent reinforcement learning; Offensive and defensive game
Research on autonomous pursuit behavior of agents based on affective computing and Q-learning (Cited: 3)
16
Authors: 李木军, 刘箴, 林君焕, 于力鹏 《计算机应用研究》 (Application Research of Computers) CSCD, PKU Core, 2014, No. 6, pp. 1710-1713, 1718 (5 pages)
To address the insufficient consideration of agents' emotional factors in current agent-versus-agent pursuit, a new solution is proposed. First, personality and emotion are incorporated through affective modeling into pursuit behavior built on two agents as primitives, making their motion more diverse; second, game theory guides the selection of decisions; finally, the trajectory points of the opponent's motion are collected and generalized with Q-learning to find the optimal pursuit path. In the Visual Studio 2012 build environment, a believable motion animation is obtained, together with plots of how the agents' emotion, stamina and other factors change over time. The demonstration results show that this solution effectively promotes efficient pursuit between agents.
Keywords: affective computing; Q-learning; game theory; multi-agent; autonomous pursuit
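For reference, the Q-learning rule that this entry and several others in the list rely on is the standard temporal-difference update,

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right],$$

where $\alpha$ is the learning rate and $\gamma$ the discount factor; how the state, action and reward are defined (e.g., from emotion, stamina and trajectory points here) is specific to each paper.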
A cooperative UAV planning method based on multi-agent deep reinforcement learning
17
Authors: 王娜, 马利民, 姜云春, 宗成国 《计算机应用与软件》 (Computer Applications and Software) PKU Core, 2024, No. 9, pp. 83-89, 96 (8 pages)
Human-machine cooperative control is an important mode of multi-UAV mission planning. Considering the requirements for cooperative interpretation of the multi-UAV task environment and for consistency of policy control, a cooperative UAV planning method based on multi-agent deep reinforcement learning is proposed. Based on task knowledge and behavior states, a task planner built on a task-allocation agent is constructed to generate the interdependence relationships of human-machine interaction; a deep reinforcement learning method is designed to obtain the optimal policy for group behavior and the cooperative control method, and a mixed-initiative behavior selection mechanism is used to evaluate the learned policies. The experimental results show that, as a human-machine interaction case, the proposed method achieves good global joint-action performance of the group through deep reinforcement learning, with learning speed and stability both better than the deterministic policy gradient method. Moreover, in a comparison of the following, autonomous and mixed-initiative modes, UAV flight paths and tasks can be controlled well, providing an intelligent decision basis for UAV swarm task execution.
关键词 agent规划 深度强化学习 无人机协同规划 混合主动行为
Research on optimal policy planning for BDI agents in uncertain environments based on Q-learning (Cited: 7)
18
Authors: 万谦, 刘玮, 徐龙龙, 郭竞知 《计算机工程与科学》 (Computer Engineering & Science) CSCD, PKU Core, 2019, No. 1, pp. 166-172 (7 pages)
The BDI model can solve the reasoning and decision-making problems of agents in specific environments well, but it lacks decision-making and learning capabilities in dynamic and uncertain environments. Reinforcement learning solves the decision-making problem of agents in unknown environments, but it lacks the rule descriptions and logical reasoning of the BDI model. To address the policy planning problem of BDI in unknown and dynamic environments, a method is proposed that uses the Q-learning reinforcement learning algorithm to realize BDI agent learning and planning, and the decision mechanism of ASL, an implementation model of BDI, is improved. Finally, a maze simulation is built on Jason, the simulation platform for ASL. The simulation experiments show that, in the new ASL system with the Q-learning mechanism added, agents can still complete tasks in uncertain environments.
Keywords: BDI agent; reinforcement learning; Q-learning; ASL; Jason; planning
A multi-agent system architecture model based on Q-learning (Cited: 2)
19
Authors: 许培, 薛伟 《计算机与数字工程》 (Computer & Digital Engineering) 2011, No. 8, pp. 8-11 (4 pages)
Multi-agent systems have been a popular research field in recent years, and Q-learning is one of the best-known and most widely applied reinforcement learning algorithms. Based on the single-agent Q-learning algorithm, a new cooperative learning algorithm is proposed, and on top of this algorithm a new multi-agent system architecture model is put forward. The most notable features of this architecture are a knowledge-sharing mechanism, a team-structure concept, and the introduction of a service-provider concept. Finally, simulation experiments demonstrate the superiority of this architecture.
关键词 agent系统 强化学习 Q学习 体系结构 知识共享
Research progress in multi-agent reinforcement learning from the perspectives of competition and cooperation
20
Authors: 田小禾, 李伟, 许铮, 刘天星, 戚骁亚, 甘中学 《计算机应用与软件》 (Computer Applications and Software) PKU Core, 2024, No. 4, pp. 1-15 (15 pages)
With substantial progress in deep learning and reinforcement learning research, multi-agent reinforcement learning has become a general method for solving large-scale complex sequential decision-making problems. To promote the development of this field, recent related research results are collected and summarized from the perspectives of competition and cooperation. This paper introduces single-agent reinforcement learning; introduces the basic theoretical frameworks of multi-agent reinforcement learning, namely Markov games and extensive-form games, and focuses on classical algorithms and recent progress in the competitive, cooperative and mixed settings; and discusses the core challenge faced by multi-agent reinforcement learning, the non-stationarity of the environment, summarizing solution ideas and outlooks through an example.
Keywords: deep learning; reinforcement learning; multi-agent reinforcement learning; environment non-stationarity
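For reference, the Markov (stochastic) game framework that this survey takes as its basic theoretical model is usually written as the tuple

$$\mathcal{G} = \langle \mathcal{N}, \mathcal{S}, \{\mathcal{A}_i\}_{i \in \mathcal{N}}, P, \{r_i\}_{i \in \mathcal{N}}, \gamma \rangle,$$

where $P(s' \mid s, a_1, \dots, a_N)$ is the joint transition probability, $r_i$ is the reward function of agent $i$, and $\gamma \in [0,1)$ is the discount factor. Each agent $i$ seeks a policy $\pi_i$ maximizing its expected discounted return; the non-stationarity discussed in the abstract arises because, from any single agent's viewpoint, the environment includes the other agents' changing policies.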