Journal Articles
1,144 articles found
Task assignment in ground-to-air confrontation based on multiagent deep reinforcement learning (Cited by 1)
1
Authors: Jia-yi Liu, Gang Wang, Qiang Fu, Shao-hua Yue, Si-yuan Wang 《Defence Technology(防务技术)》 SCIE EI CAS CSCD 2023, No. 1, pp. 210-219 (10 pages)
The scale of ground-to-air confrontation task assignment is large, and many concurrent task assignments and random events must be handled. When existing task assignment methods are applied to ground-to-air confrontation, they handle complex tasks inefficiently and suffer from interaction conflicts in multiagent systems. This study proposes a multiagent architecture based on one general agent with multiple narrow agents (OGMN) to reduce task assignment conflicts. Considering the slow speed of traditional dynamic task assignment algorithms, this paper proposes the proximal policy optimization for task assignment of general and narrow agents (PPO-TAGNA) algorithm. Based on the idea of the optimal assignment strategy algorithm and combined with the training framework of deep reinforcement learning (DRL), the algorithm adds a multi-head attention mechanism and a stage reward mechanism to the bilateral band-clipping PPO algorithm to solve the problem of low training efficiency. Finally, simulation experiments are carried out on a digital battlefield. The OGMN-based multiagent architecture combined with the PPO-TAGNA algorithm obtains higher rewards faster and has a higher win ratio. By analyzing agent behavior, the efficiency, superiority and rationality of resource utilization of this method are verified.
Keywords: ground-to-air confrontation, task assignment, general and narrow agents, deep reinforcement learning, proximal policy optimization (PPO)
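The clipped surrogate at the heart of PPO, which the PPO-TAGNA variant above extends with bilateral band clipping, can be sketched minimally; the ratios and advantages below are illustrative values, not from the paper:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)

# A ratio outside the trust band is clipped, capping the update incentive.
surrogate = ppo_clip_objective(np.array([1.5, 0.5]), np.array([1.0, -1.0]))
```

Taking the minimum makes the objective a pessimistic bound, so large policy steps away from the behavior policy yield no extra gradient signal.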
Investigation of nano-talc as a filling material and a reinforcing agent in high density polyethylene (HDPE) (Cited by 1)
2
Authors: CHEN Nanchun, MA Lei, ZHANG Tao 《Rare Metals》 SCIE EI CAS CSCD 2006, Suppl. 1, pp. 422-425 (4 pages)
An experiment on producing a high density polyethylene (HDPE) nano-composite filled with 4 wt.% talc is presented. Acting as a filler and reinforcing agent in the HDPE, talc powder, sized at around 5 μm, was surface-treated with an aluminum diethylene glycol dinitrate coupling agent before being added to the HDPE. Analyses of the reinforced HDPE nano-composite show significant improvement in its mechanical properties, including tensile strength (>26 MPa), break elongation (<1.1%), flexural strength (>22 MPa), and friction coefficients (<0.11). The results demonstrate that, after surface treatment, talc can be used as a promising filling material and reinforcing agent in making HDPE nano-composites.
Keywords: HDPE, talc, filling material, reinforcing agent, nano-composite, mechanical properties
Multi-Agent Deep Reinforcement Learning for Cross-Layer Scheduling in Mobile Ad-Hoc Networks
3
Authors: Xinxing Zheng, Yu Zhao, Joohyun Lee, Wei Chen 《China Communications》 SCIE CSCD 2023, No. 8, pp. 78-88 (11 pages)
Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to deal with congestion in ad-hoc networks with effective algorithms remains open and challenging. In this paper, we focus on enabling congestion control to minimize network transmission delays through flexible power control. To effectively solve the congestion problem, we propose a distributed cross-layer scheduling algorithm, which is empowered by graph-based multi-agent deep reinforcement learning. The transmit power is adaptively adjusted in real time by our algorithm based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low thanks to regional cooperation based on a graph attention network. In the evaluation, we show that our algorithm can reduce the transmission delay of data flows under severe signal interference and drastically changing channel states, and we demonstrate its adaptability and stability in different topologies. The method is general and can be extended to various types of topologies.
Keywords: ad-hoc networks, cross-layer scheduling, multi-agent deep reinforcement learning, interference elimination, power control, queue scheduling, actor-critic methods, Markov decision process
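The regional cooperation above relies on attention over neighbor information; a toy numpy sketch of one such aggregation step follows (the bilinear scoring and two-dimensional features are assumptions for illustration, not the paper's network):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_neighbors(own, neighbors, W):
    """Toy graph-attention step: score each neighbor's feature against our
    own through W, then mix neighbor features by the softmax weights."""
    scores = np.array([own @ W @ n for n in neighbors])
    alpha = softmax(scores)
    mixed = sum(a * n for a, n in zip(alpha, neighbors))
    return alpha, mixed

own = np.array([1.0, 0.0])
neighbors = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
alpha, mixed = aggregate_neighbors(own, neighbors, np.eye(2))
```

Because each agent only attends over its own neighborhood, the per-agent computation stays local, which is what keeps the training complexity low.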
Effect of Silane Coupling Agent Concentration on Interfacial Properties of Basalt Fiber Reinforced Composites
4
Authors: Takao Ota 《材料科学与工程(中英文A版)》 2023, No. 2, pp. 36-42 (7 pages)
The purpose of this study is to investigate the effect of the concentration of silane coupling solution on the tensile strength of basalt fiber and the interfacial properties of basalt fiber reinforced polymer composites. The surface treatment of basalt fibers was carried out using an aqueous alcohol solution method. Basalt fibers were surface-treated with 3-methacryloxypropyl trimethoxysilane at 0.5 wt.%, 1 wt.%, 2 wt.%, 4 wt.% and 10 wt.%. Basalt monofilament tensile tests were carried out to investigate the variation in strength with the concentration of the silane coupling agent. The microdroplet test was performed to examine the effect of the concentration of the silane coupling agent on the interfacial strength of basalt fiber reinforced polymer composites. A film formed on the surface of the basalt fibers treated with the silane coupling agent solution. The tensile strength of the basalt fiber increased because the damaged fiber surface was repaired by the film of silane coupling agent. The film was effective not only for the surface protection of the basalt fiber but also for improving the interfacial strength of the fiber-matrix interface. However, surface treatment with a high-concentration silane coupling agent solution has an adverse effect on the mechanical properties of the composite materials, because it degrades their interfacial strength.
Keywords: natural mineral fiber reinforced composites, basalt fiber, silane coupling agent, interface, fiber/matrix bond
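The microdroplet test mentioned above reports an apparent interfacial shear strength from the debond force and the embedded geometry; a minimal sketch using the standard relation follows (the numeric inputs are illustrative assumptions, not the paper's measurements):

```python
import math

def interfacial_shear_strength(f_max, fiber_diameter, embedded_length):
    """Apparent IFSS from a microdroplet pull-off test:
    tau = F_max / (pi * d_fiber * l_embedded), all quantities in SI units."""
    return f_max / (math.pi * fiber_diameter * embedded_length)

# Illustrative numbers (not from the paper): 0.15 N debond force,
# 13 um fiber diameter, 150 um embedded droplet length.
tau_mpa = interfacial_shear_strength(0.15, 13e-6, 150e-6) / 1e6  # in MPa
```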
Research Progress on Multi-Agent Reinforcement Learning from the Perspectives of Competition and Cooperation
5
Authors: 田小禾, 李伟, 许铮, 刘天星, 戚骁亚, 甘中学 《计算机应用与软件》 北大核心 2024, No. 4, pp. 1-15 (15 pages)
With the great progress of deep learning and reinforcement learning research, multi-agent reinforcement learning has become a general method for solving large-scale, complex sequential decision problems. To promote the development of this field, this survey collects and summarizes recent related research results from the perspectives of competition and cooperation. It introduces single-agent reinforcement learning; presents the basic theoretical frameworks of multi-agent reinforcement learning, namely Markov games and extensive-form games; and focuses on classical algorithms and their recent progress in the competitive, cooperative and mixed settings. It then discusses the core challenge facing multi-agent reinforcement learning, the non-stationarity of the environment, and summarizes and looks ahead to possible solutions through an example.
Keywords: deep learning, reinforcement learning, multi-agent reinforcement learning, non-stationarity of the environment
Interfacial reinforcement of core-shell HMX@energetic polymer composites featuring enhanced thermal and safety performance
6
Authors: Binghui Duan, Hongchang Mo, Bojun Tan, Xianming Lu, Bozhou Wang, Ning Liu 《Defence Technology(防务技术)》 SCIE EI CAS CSCD 2024, No. 1, pp. 387-399 (13 pages)
The weak interface interaction and solid-solid phase transition have long been a conundrum for 1,3,5,7-tetranitro-1,3,5,7-tetraazacyclooctane (HMX)-based polymer-bonded explosives (PBX). A two-step strategy was proposed to address the problem: pretreating HMX via polyalcohol bonding agent modification to endow the surface with -OH groups, followed by in situ coating with a nitrate ester-containing polymer. Two types of energetic polyether, glycidyl azide polymer (GAP) and nitrate-modified GAP (GNP), were grafted onto HMX crystals through an isocyanate addition reaction bridged by a neutral polymeric bonding agent (NPBA) layer. The morphology and structure of the HMX-based composites were characterized in detail and the core-shell structure was validated. The grafted polymers obviously enhanced the adhesion force between the HMX crystals and the fluoropolymer (F2314) binder. Due to the interfacial reinforcement among the components, the two HMX-based composites exhibited remarkable increases in phase transition peak temperature of 10.2°C and 19.6°C, respectively, with no more than 1.5% shell content. Furthermore, the impact and friction sensitivity of the composites decreased significantly as a result of the barrier produced by the grafted polymers. These findings will enhance the future prospects for the interface design of energetic composites aimed at solving weak-interface and safety concerns.
Keywords: HMX crystals, polyalcohol bonding agent, energetic polymer, core-shell structure, interfacial reinforcement
Knowledge transfer in multi-agent reinforcement learning with incremental number of agents (Cited by 1)
7
Authors: LIU Wenzhang, DONG Lu, LIU Jian, SUN Changyin 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD 2022, No. 2, pp. 447-460 (14 pages)
In this paper, a reinforcement learning method for cooperative multi-agent systems (MAS) with an incremental number of agents is studied. Existing multi-agent reinforcement learning approaches deal with a MAS with a specific number of agents and can learn well-performing policies. However, if the number of agents increases, the previously learned policies may not perform well in the new scenario. The new agents need to learn from scratch to find optimal policies with the others, which may slow down the learning speed of the whole team. To solve this problem, this paper proposes a new algorithm that takes full advantage of the historical knowledge learned before and transfers it from the previous agents to the new agents. Since the previous agents have been trained well in the source environment, they are treated as teacher agents in the target environment; correspondingly, the new agents are called student agents. To enable the student agents to learn from the teacher agents, the input nodes of the teacher agents' networks are first modified to adapt to the current environment. Then, the teacher agents take the observations of the student agents as input and output advised actions and values as supervising information. Finally, the student agents combine the reward from the environment with the supervising information from the teacher agents, and learn optimal policies with modified loss functions. By taking full advantage of the teacher agents' knowledge, the search space for the student agents is reduced significantly, which accelerates the learning speed of the holistic system. The proposed algorithm is verified in several multi-agent simulation environments, and its efficiency is demonstrated by the experimental results.
Keywords: knowledge transfer, multi-agent reinforcement learning (MARL), new agents
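A modified loss of the kind described, an environment-driven term plus a supervised term pulling the student toward the teacher's advice, can be sketched as follows; the quadratic form and the weight `beta` are assumptions for illustration, not the paper's exact loss:

```python
import numpy as np

def student_loss(q_student, q_teacher, td_error, beta=0.5):
    """Sketch of a teacher-supervised loss: the usual squared TD error from
    the environment plus a distillation term toward the teacher's values."""
    supervise = np.mean((q_student - q_teacher) ** 2)
    return td_error ** 2 + beta * supervise

# When the student already matches the teacher, only the TD term remains.
loss = student_loss(np.array([1.0, 2.0]), np.array([1.0, 2.0]), 0.5)
```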
A new accelerating algorithm for multi-agent reinforcement learning (Cited by 1)
8
Authors: 张汝波, 仲宇, 顾国昌 《Journal of Harbin Institute of Technology(New Series)》 EI CAS 2005, No. 1, pp. 48-51 (4 pages)
In multi-agent systems, joint actions must be employed to achieve cooperation, because the evaluation of one agent's behavior often depends on the other agents' behaviors. However, joint-action reinforcement learning algorithms suffer from slow convergence because of the enormous learning space produced by joint actions. In this article, a prediction-based reinforcement learning algorithm is presented for multi-agent cooperation tasks, which requires all agents to learn to predict the probabilities of the actions that other agents may execute. A multi-robot cooperation experiment is run to test the efficacy of the new algorithm, and the results show that it can reach the cooperation policy much faster than the primitive reinforcement learning algorithm.
Keywords: algorithms, machine learning, artificial intelligence systems, mathematical simulation, robots
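The prediction each agent must learn, the probability of the actions another agent may execute, can be approximated in its simplest form by frequency counts; a minimal stand-in for the paper's learned model:

```python
from collections import Counter

class ActionPredictor:
    """Frequency-based estimate of another agent's action probabilities,
    a simple stand-in for the prediction model described above."""
    def __init__(self):
        self.counts = Counter()

    def observe(self, action):
        self.counts[action] += 1

    def prob(self, action):
        total = sum(self.counts.values())
        return self.counts[action] / total if total else 0.0

pred = ActionPredictor()
for a in ["left", "left", "right"]:
    pred.observe(a)
```

Conditioning each agent's value estimates on such predictions shrinks the effective joint-action space, which is the source of the reported speed-up.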
Exploring Local Chemical Space in De Novo Molecular Generation Using Multi-Agent Deep Reinforcement Learning (Cited by 2)
9
Authors: Wei Hu 《Natural Science》 2021, No. 9, pp. 412-424 (13 pages)
Single-agent reinforcement learning (RL) is commonly used to learn how to play computer games, in which the agent makes one move before making the next in a sequential decision process. Recently, single agents have also been employed in the design of molecules and drugs. While a single agent is a good fit for computer games, it has limitations when used in molecule design: its sequential learning makes it impossible to modify or improve previous steps while working on the current step. In this paper, we propose applying a multi-agent RL approach to the study of molecules, which can optimize all sites of a molecule simultaneously. To demonstrate the validity of our approach, we chose the chemical compound Favipiravir and explored its local chemical space. Favipiravir is a broad-spectrum inhibitor of viral RNA polymerase and is one of the compounds currently used in SARS-CoV-2 (COVID-19) clinical trials. Our experiments revealed the collaborative learning of a team of deep RL agents, as well as the learning of each individual agent, in the exploration of Favipiravir. In particular, our multi-agent setup discovered not only the molecules near Favipiravir in chemical space, but also the learnability of each site in the string representation of Favipiravir, critical information for understanding the underlying mechanisms that support machine learning of molecules.
Keywords: multi-agent reinforcement learning, actor-critic, molecule design, SARS-CoV-2, COVID-19
Multi-agent reinforcement learning with cooperation based on eligibility traces
10
Authors: 杨玉君, 程君实, 陈佳品 《Journal of Harbin Institute of Technology(New Series)》 EI CAS 2004, No. 5, pp. 564-568 (5 pages)
The application of reinforcement learning in multi-agent systems has become widespread in recent years. An agent in a multi-agent system cooperates with other agents to accomplish a given task, and one agent's behavior usually affects the others' behaviors. In traditional reinforcement learning, one agent treats the others as part of the environment, so it is difficult to take their behavior into account, which decreases learning efficiency. This paper proposes multi-agent reinforcement learning with cooperation based on eligibility traces, i.e., one agent estimates another agent's behavior using that agent's eligibility traces. Simulation results prove the validity of the proposed learning method.
Keywords: artificial intelligence, machine learning, multi-agent reinforcement learning systems, learning methods
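Eligibility traces, the mechanism the method above builds on, credit recently visited states in proportion to a decaying trace; a minimal tabular TD(λ) step (single-agent, with illustrative hyperparameters):

```python
def td_lambda_update(values, traces, state, reward, next_state,
                     alpha=0.1, gamma=0.9, lam=0.8):
    """One tabular TD(lambda) step: bump the visited state's trace, then
    nudge every state's value in proportion to its eligibility."""
    traces[state] = traces.get(state, 0.0) + 1.0
    delta = reward + gamma * values.get(next_state, 0.0) - values.get(state, 0.0)
    for s in traces:
        values[s] = values.get(s, 0.0) + alpha * delta * traces[s]
        traces[s] *= gamma * lam  # decay eligibility for the next step
    return values, traces

values, traces = td_lambda_update({}, {}, "s0", 1.0, "s1")
```

In the paper's setting, one agent reads another agent's traces as a summary of its recent behavior, rather than using them only for its own credit assignment.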
Multi-Agent Reinforcement Learning Algorithm Based on Action Prediction
11
Authors: 童亮, 陆际联 《Journal of Beijing Institute of Technology》 EI CAS 2006, No. 2, pp. 133-137 (5 pages)
Multi-agent reinforcement learning algorithms are studied. A prediction-based multi-agent reinforcement learning algorithm is presented for a multi-robot cooperation task. A multi-robot cooperation experiment based on a multi-agent inverted pendulum is performed to test the efficiency of the new algorithm, and the results show that it can reach the cooperation strategy much faster than the primitive multi-agent reinforcement learning algorithm.
Keywords: multi-agent system, reinforcement learning, action prediction, robot
Cooperative Multi-Agent Reinforcement Learning with Constraint-Reduced DCOP
12
Authors: Yi Xie, Zhongyi Liu, Zhao Liu, Yijun Gu 《Journal of Beijing Institute of Technology》 EI CAS 2017, No. 4, pp. 525-533 (9 pages)
Cooperative multi-agent reinforcement learning (MARL) is an important topic in the field of artificial intelligence, in which distributed constraint optimization (DCOP) algorithms have been widely used to coordinate the actions of multiple agents. However, dense communication among agents limits the practicality of DCOP algorithms. In this paper, we propose a novel DCOP algorithm that addresses the communication problem of previous DCOP algorithms by reducing constraints. The contributions of this paper are primarily threefold: (1) it is proved that removing constraints can effectively reduce the communication burden of DCOP algorithms; (2) a criterion is provided to identify insignificant constraints whose elimination does not have a great impact on the performance of the whole system; (3) a constraint-reduced DCOP algorithm is proposed that adopts a variant of the spectral clustering algorithm to detect and eliminate insignificant constraints. Our algorithm reduces the communication burden of the benchmark DCOP algorithm while keeping its overall performance unaffected. The performance of the constraint-reduced DCOP algorithm is evaluated on four configurations of cooperative sensor networks, and the effectiveness of the communication reduction is also verified by comparisons between the constraint-reduced DCOP and the benchmark DCOP.
Keywords: reinforcement learning, cooperative multi-agent system, distributed constraint optimization (DCOP), constraint-reduced DCOP
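The constraint-reduction idea can be pictured as pruning a weighted constraint graph; the sketch below uses a plain weight cutoff as a stand-in for the paper's spectral-clustering criterion, with illustrative data:

```python
def reduce_constraints(constraints, threshold):
    """Keep only constraints whose significance weight clears the threshold.
    (The paper detects insignificant constraints with a spectral clustering
    variant; a plain weight cutoff stands in for that criterion here.)"""
    return {edge: w for edge, w in constraints.items() if w >= threshold}

# Constraints as edges between agents, weighted by significance.
kept = reduce_constraints({("a", "b"): 0.9, ("b", "c"): 0.1}, 0.5)
```

Fewer constraint edges mean fewer messages exchanged per DCOP iteration, which is exactly the communication saving the paper targets.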
Incorporation of Perception-based Information in Robot Learning Using Fuzzy Reinforcement Learning Agents
13
Authors: ZHOU Changjiu, MENG Qingchun, GUO Zhongwen, QU Weifen, YIN Bo 《Journal of Ocean University of Qingdao》 2002, No. 1, pp. 93-100 (8 pages)
Robot learning in unstructured environments has proved to be an extremely challenging problem, mainly because of the many uncertainties always present in the real world. Human beings, on the other hand, seem to cope very well with uncertain and unpredictable environments, often relying on perception-based information. Furthermore, human beings can utilize perception to guide their learning toward those parts of the perception-action space that are actually relevant to the task. We therefore conducted research aimed at improving robot learning through the incorporation of both perception-based and measurement-based information. To this end, a fuzzy reinforcement learning (FRL) agent is proposed in this paper. Based on a neural-fuzzy architecture, different kinds of information can be incorporated into the FRL agent to initialise its action network, critic network and evaluation feedback module so as to accelerate its learning. By making use of the global optimisation capability of genetic algorithms (GAs), a GA-based FRL (GAFRL) agent is presented to solve the local minima problem in traditional actor-critic reinforcement learning. On the other hand, with the prediction capability of the critic network, GAs can perform a more effective global search. Different GAFRL agents are constructed and verified using the simulation model of a physical biped robot. The simulation analysis shows that the biped learning rate for dynamic balance can be improved by incorporating perception-based information on biped balancing and walking evaluation. The biped robot can find application in ocean exploration, detection or sea rescue activities, as well as military maritime activities.
Keywords: robots, sensing devices, genetic algorithms, fuzzy neural networks
Research Progress and Development Trends of Learning Algorithms for Combat Agents
14
Authors: 王步云, 刘聚 《兵工自动化》 2023, No. 9, pp. 74-78, 96 (6 pages)
Addressing the adaptability problem of combat agents, this paper reviews the achievements of genetic algorithms, reinforcement learning, neural networks and other methods in realizing combat agent adaptability, and summarizes the characteristics of each method. It then introduces the application of deep reinforcement learning to combat agent adaptability, and discusses the development trends and research priorities of deep reinforcement learning in this area. The study can serve as a reference for subsequent related research.
Keywords: combat agent, adaptability, reinforcement learning, deep learning, neural networks
Research on Path Planning Methods for Roadheader Robots in Underground Coal Mines
15
Authors: 张旭辉, 郑西利, 杨文娟, 李语阳, 麻兵, 董征, 陈鑫 《煤田地质与勘探》 EI CAS CSCD 北大核心 2024, No. 4, pp. 152-163 (12 pages)
To address the difficulty and low efficiency of repositioning roadheader robots in non-full-section coal mine roadways, the characteristics of the unstructured underground environment and the motion characteristics of roadheader robots are analyzed, and a body path planning method for roadheader robots based on deep reinforcement learning is proposed. A depth camera is used to reconstruct the roadway environment in real time, a collision detection model between the roadheader robot and the roadway environment is built in the virtual environment, and hierarchical bounding boxes are used for collision detection in that environment, forming an obstacle avoidance strategy under roadway boundary constraints. Considering the robot's physical size and the single-goal nature of the path planning process, hindsight experience replay is introduced on top of the conventional SAC algorithm, yielding the HER-SAC algorithm, which expands the goal subset from trajectories obtained under the environment's initial goal, thereby increasing training samples and improving training speed. On this basis, an agent is built on a reward-and-penalty mechanism, with its state space and action space defined according to the robot's motion characteristics. The agent is trained with the three algorithms in the same scenario, and the results are compared on four performance indicators: average reward, maximum reward, number of steps to reach the maximum reward, and robustness. To further verify the reliability of the proposed method, two experimental scenarios with different goal positions are set up in a combined virtual-real manner for path planning, and the paths produced by the conventional SAC algorithm and the HER-SAC algorithm are compared. The results show that the HER-SAC algorithm converges faster than the PPO and SAC algorithms and achieves the best overall performance; in both experimental scenarios, the HER-SAC algorithm plans smoother and shorter paths than the conventional SAC algorithm, with the error between the path endpoint and the goal position within 3.53 cm, effectively completing the repositioning path planning task. This method lays a theoretical foundation for autonomous repositioning control of coal mine roadheader robots and provides a new approach for the automation of coal mine excavation equipment.
Keywords: roadheader robot, path planning, deep reinforcement learning, agent, virtual-real combination, improved SAC algorithm, coal mine
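The hindsight experience replay step that turns SAC into HER-SAC relabels a trajectory with a goal it actually achieved; a minimal sketch, where the trajectory format and sparse reward are assumptions for illustration rather than the paper's implementation:

```python
def her_relabel(trajectory, reward_fn):
    """Hindsight experience replay: relabel a trajectory as if its final
    achieved state had been the goal, turning a failure into a success."""
    goal = trajectory[-1]["achieved"]
    return [{**step, "goal": goal, "reward": reward_fn(step["achieved"], goal)}
            for step in trajectory]

# Sparse reward: 0 on reaching the goal, -1 otherwise.
sparse = lambda achieved, goal: 0.0 if achieved == goal else -1.0
relabeled = her_relabel([{"achieved": (0, 0)}, {"achieved": (1, 1)}], sparse)
```

Relabeled transitions carry non-trivial reward even when the original goal was missed, which is what increases the usable training samples.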
A Mobility-Aware Task Migration Scheme Based on Load Balancing
16
Authors: 鲜永菊, 韩瑞寅, 左维昊, 汪帅鸽 《电讯技术》 北大核心 2024, No. 3, pp. 333-342 (10 pages)
To address the uneven load distribution among servers and the degraded user quality of service (QoS) caused by user mobility in mobile edge computing, a mobility-aware distributed task migration scheme is proposed. First, aiming to optimize the QoS of the worst-performing user in the network, a long-term max-min fairness (MMF) problem is formulated, and Lyapunov optimization is used to transform and decouple the original problem. The problem is then modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed task migration algorithm based on multi-agent soft actor-critic (SAC) is proposed, in which the reward function is decoupled into node rewards and individual user rewards, granted according to node load balance and user QoS respectively. Simulation results show that, compared with existing task migration schemes, the proposed algorithm reduces the task migration rate while guaranteeing user QoS and maintains system load balance.
Keywords: mobile edge computing (MEC), mobility awareness, task migration, multi-agent reinforcement learning (MARL)
A Resource Allocation Algorithm for Urban Rail Train-to-Train Communication Using A2C-ac
17
Authors: 王瑞峰, 张明, 黄子恒, 何涛 《电子与信息学报》 EI CAS CSCD 北大核心 2024, No. 4, pp. 1306-1313 (8 pages)
In urban rail transit train control systems, train-to-train (T2T) communication, as a new generation of train communication mode, uses direct communication between trains to reduce communication delay and improve train operation efficiency. In the scenario where T2T communication coexists with train-to-ground (T2G) communication, to handle the interference caused by reusing T2G links while guaranteeing users' communication quality, this paper proposes an improved advantage actor-critic (A2C-ac) resource allocation algorithm based on multi-agent deep reinforcement learning (MADRL). Taking system throughput as the optimization objective and T2T transmitters as agents, the policy network uses a hierarchical output structure to guide each agent in selecting the spectrum resources to reuse and the power level; the agents then take actions and interact with the T2T communication environment to obtain the throughput of T2G and T2T users in the time slot. The value network evaluates the two separately, and a weight factor β is used to customize a weighted temporal-difference (TD) error for each agent, so as to flexibly optimize the neural network parameters. Finally, the agents jointly select the best spectrum resources and power levels according to the trained model. Simulation results show that, compared with the A2C algorithm and the deep Q-network (DQN) algorithm, the proposed algorithm achieves notable improvements in convergence speed, T2T access success rate, and throughput.
Keywords: urban rail transit, resource allocation, T2T communication, multi-agent deep reinforcement learning, A2C-ac algorithm
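The per-agent weighted TD error described above blends the two critics' evaluations of T2G and T2T throughput; a one-line sketch, where the convex-combination form is an assumption since the paper's exact weighting is not given here:

```python
def weighted_td_error(td_t2g, td_t2t, beta):
    """Customised per-agent TD error: a weight factor beta blends the TD
    signals from the T2G critic and the T2T critic (convex combination
    assumed; beta in [0, 1])."""
    return beta * td_t2g + (1.0 - beta) * td_t2t

err = weighted_td_error(1.0, 3.0, 0.5)
```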
A Sequential Optimization Method for Road Network Signal Coordination Based on Heterogeneous Multi-Agent Self-Attention Networks
18
Authors: 陈喜群, 朱奕璋, 谢宁珂, 耿茂思, 吕朝锋 《交通运输系统工程与信息》 EI CSCD 北大核心 2024, No. 3, pp. 114-126 (13 pages)
To tackle the complexity of road network traffic signal control, this paper proposes a sequential optimization method for network signal coordination based on heterogeneous multi-agent self-attention networks, improving the performance of signal control policies across multiple intersections. First, considering the spatial correlation of traffic flows at multiple intersections, the model uses a self-attention-based value encoder to learn representations of traffic observations, enabling network-level communication. Second, facing the non-stationary environment created by multi-agent policy updates, the model sequentially decides each agent's optimal response action on top of the preceding agents' joint actions, using a policy decoder based on multi-agent advantage decomposition. Finally, an action mask mechanism based on effectively moving vehicles is designed to adaptively adjust the decision frequency within a time-effective window, and a spatio-temporal pressure reward function that accounts for waiting fairness is proposed to further improve policy performance and practicality. The model is validated on Hangzhou road network datasets. The results show that the proposed model outperforms the baseline models on two datasets and five performance indicators; compared with the best baseline, it reduces average travel time by 10.89%, average queue length by 18.84%, and average waiting time by 22.21%. In addition, the proposed model generalizes better and significantly reduces cases of excessively long vehicle waiting times.
Keywords: intelligent transportation, deep reinforcement learning, road network signal control, heterogeneous multi-agent, spatio-temporal pressure reward
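The action mask mechanism above suppresses infeasible signal phases before sampling; the standard way to do this is to set masked logits to negative infinity before the softmax (the logits below are illustrative):

```python
import numpy as np

def masked_policy(logits, mask):
    """Action-mask mechanism: invalid actions get -inf logits, so the
    softmax assigns them exactly zero probability."""
    masked = np.where(mask, logits, -np.inf)
    e = np.exp(masked - masked[mask].max())  # stabilised over valid actions
    return e / e.sum()

p = masked_policy(np.array([1.0, 2.0, 3.0]), np.array([True, False, True]))
```

Masking in logit space (rather than zeroing probabilities after the softmax) keeps the distribution properly normalized over the valid actions only.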
A Survey of Decision-Making Algorithms for UAV Swarm Confrontation
19
Authors: 李潍, 黄诗怡, 刘宏明, 孙张俊 《航空科学技术》 2024, No. 4, pp. 9-17 (9 pages)
UAV swarm game confrontation has become a development trend of future warfare, and the choice of decision-making algorithm is crucial to improving the combat capability of UAV swarms. This paper examines three major categories of decision-making algorithms for UAV swarm confrontation, based on rules, on game theory, and on neural networks, and comprehensively analyzes and summarizes their advantages and limitations. On this basis, it proposes applying credit assignment models based on multi-agent reinforcement learning and role-based multi-agent reinforcement learning models to UAV swarm confrontation. Finally, it stresses the importance of selecting an appropriate decision-making algorithm for improving UAV swarm combat effectiveness, offers suggestions for the future development of UAV confrontation decision-making, and provides insights for research and applications in related fields.
Keywords: UAV swarm, game confrontation, expert systems, game theory, multi-agent reinforcement learning
Market Trading Strategies of Thermal Power Enterprises in New-Type Power Systems Based on Agent-Based Modeling (Cited by 1)
20
Authors: 李超英, 檀勤良 《中国电力》 CSCD 北大核心 2024, No. 2, pp. 212-225 (14 pages)
Research on the bidding strategies of thermal power enterprises under high renewable penetration is important for safeguarding their operations and advancing the construction of new-type power systems. Based on an agent-based modeling framework, an electricity spot market simulation model and a unit self-learning decision model are established. The environment module builds a spot market clearing model with wind, solar, thermal and storage participants that considers uncertainty on both the source and load sides; the agent module models the bidding decision process of thermal units as a partially observable Markov decision process and solves it with the deep deterministic policy gradient algorithm. A simulation analysis on the HRP-38 node system clarifies the market trading strategies of thermal power enterprises under high renewable penetration. The results show that, without considering ancillary services provided by thermal units, some uniquely located and cost-competitive thermal units remain competitive as renewable penetration rises; larger forecast errors make the bidding strategies of large-capacity units more conservative, while small-capacity units behave in the opposite way; and thermal units show a tendency toward tacit collusion in all scenarios, i.e., they raise their bids simultaneously even while hiding information from each other.
Keywords: electricity market, agent-based modeling, reinforcement learning, bidding strategy, decision support