Journal Articles
892 articles found
1. Multi-Agent Deep Reinforcement Learning for Cross-Layer Scheduling in Mobile Ad-Hoc Networks
Authors: Xinxing Zheng, Yu Zhao, Joohyun Lee, Wei Chen. China Communications, SCIE CSCD, 2023, No. 8, pp. 78-88 (11 pages)
Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to deal with congestion in Ad-hoc networks with effective algorithms is still open and challenging. In this paper, we focus on enabling congestion control to minimize network transmission delays through flexible power control. To effectively solve the congestion problem, we propose a distributed cross-layer scheduling algorithm, which is empowered by graph-based multi-agent deep reinforcement learning. The transmit power is adaptively adjusted in real-time by our algorithm based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low due to the regional cooperation based on the graph attention network. In the evaluation, we show that our algorithm can reduce the transmission delay of data flow under severe signal interference and drastically changing channel states, and demonstrate the adaptability and stability in different topologies. The method is general and can be extended to various types of topologies.
Keywords: Ad-hoc network; cross-layer scheduling; multi-agent deep reinforcement learning; interference elimination; power control; queue scheduling; actor-critic methods; Markov decision process
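The abstract above centers on two ideas: each node adjusts its transmit power from purely local observations (channel state, queue length), and a graph attention network weighs what neighbors report. The sketch below illustrates only that aggregation-and-decision step; the feature layout, projection sizes, and the sigmoid power mapping are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def attention_aggregate(own, neighbors, W, a):
    """Single-head GAT-style aggregation of neighbor messages.

    own:       (d,) local features, e.g. [channel gain, queue length]
    neighbors: (k, d) features received from one-hop neighbors
    W:         (d, h) shared linear projection
    a:         (2*h,) attention vector
    """
    h_own = own @ W                                   # (h,)
    h_nbr = neighbors @ W                             # (k, h)
    pair = np.hstack([np.tile(h_own, (len(h_nbr), 1)), h_nbr])
    z = pair @ a
    logits = np.where(z > 0, z, 0.2 * z)              # LeakyReLU
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()                              # softmax over neighbors
    return alpha @ h_nbr                              # weighted neighborhood summary

def choose_power(own, neighbors, W, a, w_out, p_min=0.01, p_max=1.0):
    """Map local + aggregated features to a transmit power in [p_min, p_max]."""
    ctx = attention_aggregate(own, neighbors, W, a)
    score = np.concatenate([own @ W, ctx]) @ w_out    # w_out: (2*h,)
    frac = 1.0 / (1.0 + np.exp(-score))               # squash to (0, 1)
    return p_min + frac * (p_max - p_min)
```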
2. Exploring Local Chemical Space in De Novo Molecular Generation Using Multi-Agent Deep Reinforcement Learning (Cited by: 2)
Authors: Wei Hu. Natural Science, 2021, No. 9, pp. 412-424 (13 pages)
Single-agent reinforcement learning (RL) is commonly used to learn how to play computer games, in which the agent makes one move before making the next in a sequential decision process. Recently, single agents have also been employed in the design of molecules and drugs. While a single agent is a good fit for computer games, it has limitations when used in molecule design: its sequential learning makes it impossible to modify or improve the previous steps while working on the current step. In this paper, we propose to apply the multi-agent RL approach to the study of molecules, which can optimize all sites of a molecule simultaneously. To illustrate the validity of our approach, we chose one chemical compound, Favipiravir, and explored its local chemical space. Favipiravir is a broad-spectrum inhibitor of viral RNA polymerase, and is one of the compounds currently being used in SARS-CoV-2 (COVID-19) clinical trials. Our experiments revealed the collaborative learning of a team of deep RL agents as well as the learning of each individual agent in the exploration of Favipiravir. In particular, our multiple agents discovered not only the molecules near Favipiravir in chemical space, but also the learnability of each site in the string representation of Favipiravir, critical information for understanding the underlying mechanism that supports machine learning of molecules.
Keywords: multi-agent reinforcement learning; actor-critic; molecule design; SARS-CoV-2; COVID-19
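The core idea above is to assign one learning agent to every site of the molecule's string representation so that all sites are optimized simultaneously against a shared reward. The following sketch shows only that coordination pattern under assumed interfaces: the toy token vocabulary, the tabular per-site learners, and the external `score` oracle are placeholders rather than the paper's model.

```python
import random
from collections import defaultdict

VOCAB = ["C", "N", "O", "F", "c1ccccc1", ""]   # toy token set (placeholder)

class SiteAgent:
    """One agent per site of the string; learns site-level token values."""
    def __init__(self, eps=0.2, lr=0.1):
        self.q = defaultdict(float)            # token -> estimated value
        self.eps, self.lr = eps, lr

    def act(self):
        if random.random() < self.eps:
            return random.choice(VOCAB)
        return max(VOCAB, key=lambda t: self.q[t])

    def learn(self, token, reward):
        self.q[token] += self.lr * (reward - self.q[token])

def explore(seed_tokens, score, episodes=200):
    """All sites propose edits in parallel; a shared reward updates every agent."""
    agents = [SiteAgent() for _ in seed_tokens]
    best, best_score = list(seed_tokens), float("-inf")
    for _ in range(episodes):
        candidate = [a.act() for a in agents]
        r = score(candidate)                   # e.g. a property/validity oracle
        for a, t in zip(agents, candidate):
            a.learn(t, r)
        if r > best_score:
            best, best_score = candidate, r
    return best, best_score
```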
3. Multi-Agent Reinforcement Learning Algorithm Based on Action Prediction
Authors: 童亮, 陆际联. Journal of Beijing Institute of Technology, EI CAS, 2006, No. 2, pp. 133-137 (5 pages)
Multi-agent reinforcement learning algorithms are studied. A prediction-based multi-agent reinforcement learning algorithm is presented for a multi-robot cooperation task. A multi-robot cooperation experiment based on a multi-agent inverted pendulum is carried out to test the efficiency of the new algorithm, and the experimental results show that the new algorithm can reach the cooperation strategy much faster than the primitive multi-agent reinforcement learning algorithm.
Keywords: multi-agent system; reinforcement learning; action prediction; robot
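The algorithm above speeds up cooperation by letting each agent predict its partner's action and condition its own values on that prediction. A minimal tabular sketch of that idea, assuming an empirical-frequency predictor and standard Q-learning updates (not necessarily the paper's exact scheme):

```python
import random
from collections import defaultdict

class PredictiveQAgent:
    """Tabular Q-learning that conditions on a predicted teammate action."""
    def __init__(self, actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(float)                           # (state, my_a, their_a) -> value
        self.counts = defaultdict(lambda: defaultdict(int))   # state -> teammate action -> count
        self.actions, self.alpha, self.gamma, self.eps = actions, alpha, gamma, eps

    def predict_teammate(self, state):
        seen = self.counts[state]
        if not seen:
            return random.choice(self.actions)                # no history yet
        return max(seen, key=seen.get)                        # most frequent past action

    def act(self, state):
        if random.random() < self.eps:
            return random.choice(self.actions)
        pred = self.predict_teammate(state)
        return max(self.actions, key=lambda a: self.q[(state, a, pred)])

    def update(self, state, my_a, their_a, reward, next_state):
        self.counts[state][their_a] += 1                      # refine the action predictor
        pred_next = self.predict_teammate(next_state)
        best_next = max(self.q[(next_state, a, pred_next)] for a in self.actions)
        key = (state, my_a, their_a)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
```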
4. Cooperative Multi-Agent Reinforcement Learning with Constraint-Reduced DCOP
Authors: Yi Xie, Zhongyi Liu, Zhao Liu, Yijun Gu. Journal of Beijing Institute of Technology, EI CAS, 2017, No. 4, pp. 525-533 (9 pages)
Cooperative multi-agent reinforcement learning (MARL) is an important topic in the field of artificial intelligence, in which distributed constraint optimization (DCOP) algorithms have been widely used to coordinate the actions of multiple agents. However, dense communication among agents affects the practicability of DCOP algorithms. In this paper, we propose a novel DCOP algorithm that deals with the communication problem of previous DCOP algorithms by reducing constraints. The contributions of this paper are primarily threefold: (1) it is proved that removing constraints can effectively reduce the communication burden of DCOP algorithms; (2) a criterion is provided to identify insignificant constraints whose elimination does not have a great impact on the performance of the whole system; (3) a constraint-reduced DCOP algorithm is proposed that adopts a variant of the spectral clustering algorithm to detect and eliminate insignificant constraints. Our algorithm reduces the communication burden of the benchmark DCOP algorithm while keeping its overall performance unaffected. The performance of the constraint-reduced DCOP algorithm is evaluated on four configurations of cooperative sensor networks. The effectiveness of communication reduction is also verified by comparisons between the constraint-reduced DCOP and the benchmark DCOP.
Keywords: reinforcement learning; cooperative multi-agent system; distributed constraint optimization (DCOP); constraint-reduced DCOP
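The paper's third contribution detects insignificant constraints with a spectral-clustering variant so they can be dropped and communication reduced. The fragment below only illustrates that general recipe with scikit-learn's stock SpectralClustering: cluster agents on the constraint-weight matrix and flag weak cross-cluster constraints as removal candidates. The weight matrix, cluster count, and significance rule are assumptions, not the paper's criterion.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def removable_constraints(weights, n_clusters=3, keep_quantile=0.5):
    """weights: symmetric (n, n) matrix of constraint strengths (0 = no constraint).

    Returns (i, j) pairs whose constraints look safe to drop: they cross cluster
    boundaries and are weaker than the chosen quantile of all constraint weights.
    """
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(weights)
    present = weights[np.triu_indices_from(weights, k=1)]
    threshold = np.quantile(present[present > 0], keep_quantile)
    drop = []
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            if weights[i, j] > 0 and labels[i] != labels[j] and weights[i, j] < threshold:
                drop.append((i, j))
    return drop
```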
5. Exploring Deep Reinforcement Learning with Multi Q-Learning (Cited by: 25)
Authors: Ethan Duryea, Michael Ganger, Wei Hu. Intelligent Control and Automation, 2016, No. 4, pp. 129-144 (16 pages)
Q-learning is a popular temporal-difference reinforcement learning algorithm which often explicitly stores state values using lookup tables. This implementation has been proven to converge to the optimal solution, but it is often beneficial to use a function-approximation system, such as deep neural networks, to estimate state values. It has been previously observed that Q-learning can be unstable when using value function approximation or when operating in a stochastic environment. This instability can adversely affect the algorithm's ability to maximize its returns. In this paper, we present a new algorithm called Multi Q-learning to attempt to overcome the instability seen in Q-learning. We test our algorithm on a 4 × 4 grid-world with different stochastic reward functions using various deep neural networks and convolutional networks. Our results show that in most cases, Multi Q-learning outperforms Q-learning, achieving average returns up to 2.5 times higher than Q-learning and having a standard deviation of state values as low as 0.58.
Keywords: reinforcement learning; deep learning; multi Q-learning
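Multi Q-learning keeps several independent value estimates to damp the instability of a single estimator. The tabular sketch below follows one common reading of that idea: act greedily on the ensemble average and update one randomly chosen table toward a target built from the average. It may differ in detail from the paper's exact update rule.

```python
import random
from collections import defaultdict

class MultiQ:
    """Ensemble of tabular Q estimates; the average drives action selection."""
    def __init__(self, actions, n_tables=4, alpha=0.1, gamma=0.99, eps=0.1):
        self.tables = [defaultdict(float) for _ in range(n_tables)]
        self.actions, self.alpha, self.gamma, self.eps = actions, alpha, gamma, eps

    def q_avg(self, s, a):
        return sum(t[(s, a)] for t in self.tables) / len(self.tables)

    def act(self, s):
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q_avg(s, a))   # greedy on the average

    def update(self, s, a, r, s_next):
        target = r + self.gamma * max(self.q_avg(s_next, b) for b in self.actions)
        t = random.choice(self.tables)                             # refresh one estimate
        t[(s, a)] += self.alpha * (target - t[(s, a)])
```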
6. Multi-task Coalition Parallel Formation Strategy Based on Reinforcement Learning (Cited by: 6)
Authors: JIANG Jian-Guo, SU Zhao-Pin, QI Mei-Bin, ZHANG Guo-Fu. 《自动化学报》 (Acta Automatica Sinica), EI CSCD 北大核心, 2008, No. 3, pp. 349-352 (4 pages)
Agent coalition is an important manner of agent coordination and cooperation. By forming coalitions, agents can improve their ability to solve problems and obtain more utility. In this paper, a novel multi-task coalition parallel formation strategy is presented, and it is theoretically proved that the process of multi-task coalition formation is a Markov decision process. Moreover, reinforcement learning is used to solve the agents' behavior strategies in parallel multi-task coalition formation, and the formation process is described. In multi-task-oriented domains, the strategy can form multi-task coalitions effectively and in parallel.
Keywords: reinforcement learning; multi-task coalition; parallel formation; Markov decision process
7. Pass-ball training based on genetic reinforcement learning
Authors: 褚海涛, 洪炳熔. Journal of Harbin Institute of Technology (New Series), EI CAS, 2001, No. 3, pp. 279-282 (4 pages)
This paper introduces a computation model that mixes a genetic algorithm with reinforcement learning for independent agent learning in continuous, distributed, open environments. The model takes full advantage of the reactivity and robustness of the reinforcement learning algorithm and of the suitability of genetic algorithms for problems with high dimensionality, large populations, and complex environments. With proper training, the results verify that this method is effective in a complex multi-agent environment.
Keywords: reinforcement learning; genetic algorithm; multi-agent; genetic reinforcement learning
8. AGVs Dispatching Using Multiple Cooperative Learning Agents
Authors: 李晓萌, Yang Yupu. High Technology Letters, EI CAS, 2002, No. 3, pp. 83-87 (5 pages)
AGVs dispatching, one of the hot problems in FMS, has attracted widespread interest in recent years. It is hard to dynamically schedule AGVs with pre-designed rules because of the uncertainty and dynamic nature of the AGVs dispatching process, so the AGVs system in this paper is treated as a cooperative learning multi-agent system, in which each agent adopts a multilevel decision method with two levels of decisions: the option level and the action level. On the option level, an agent learns a policy to execute a subtask with the best response to the other AGVs' current options. On the action level, an agent learns an optimal policy of actions for achieving its planned option. The method is applied to an AGVs dispatching simulation, and the performance of the AGVs system based on this method is verified.
Keywords: multi-agent; reinforcement learning; multilevel decision; AGVs dispatching
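Each dispatching agent above decides at two levels: an option level that picks a subtask as a best response to the other AGVs' options, and an action level that learns to carry the chosen option out. The sketch below wires two tabular learners together to show that structure; the option set, the SMDP-style discounting, and the reward split are assumptions.

```python
import random
from collections import defaultdict

class TwoLevelAgent:
    """Option level chooses a subtask; action level learns to carry it out."""
    def __init__(self, options, actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.q_opt = defaultdict(float)     # (state, option)
        self.q_act = defaultdict(float)     # (option, state, action)
        self.options, self.actions = options, actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def _greedy(self, table, keys, make_key):
        if random.random() < self.eps:
            return random.choice(keys)
        return max(keys, key=lambda k: table[make_key(k)])

    def choose_option(self, state):
        return self._greedy(self.q_opt, self.options, lambda o: (state, o))

    def choose_action(self, option, state):
        return self._greedy(self.q_act, self.actions, lambda a: (option, state, a))

    def update_action(self, option, s, a, r, s_next):
        best = max(self.q_act[(option, s_next, b)] for b in self.actions)
        key = (option, s, a)
        self.q_act[key] += self.alpha * (r + self.gamma * best - self.q_act[key])

    def update_option(self, s, option, total_r, s_next, steps):
        # SMDP-style update: the option ran for `steps` primitive actions
        best = max(self.q_opt[(s_next, o)] for o in self.options)
        key = (s, option)
        self.q_opt[key] += self.alpha * (total_r + self.gamma ** steps * best - self.q_opt[key])
```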
9. DP-Q(λ): A Real-Time Multi-Agent Path Planning Algorithm for Large-Scale Web3D Scenes (Cited by: 2)
Authors: 闫丰亭, 贾金原. 《系统仿真学报》, CAS CSCD 北大核心, 2019, No. 1, pp. 16-26 (11 pages)
A multi-agent visual path planning algorithm for large-scale scenes needs to achieve real-time, stable collision avoidance on Web3D. A dynamic-probability single-chain convergent backtracking DP-Q(λ) algorithm is proposed. It uses direction-heuristic constraints and a high-reward/heavy-penalty training method; for a single agent, a probability p (a random number in [0, 1]) adjusts the reward and penalty values and decides the next path-finding strategy, while the agent senses whether the next position is free so as to avoid collisions while moving. The single-agent path planning scheme is then extended to a multi-agent path planning scheme, which is further implemented on Web3D. Experimental results show that the real-time multi-agent path planning realized by this algorithm meets the requirements of efficiency and stability for autonomous learning on Web3D.
Keywords: Web3D; large-scale unknown environment; multi-agent; reinforcement learning; dynamic reward p; path planning
10. Multi-Agent Modeling of Smart Home Systems (Cited by: 1)
Authors: 曲宗峰. 《家电科技》, 2022, No. 5, pp. 16-21 (6 pages)
Taking the smart home system as the research object, a model is established based on multi-agent theory, and Value Decomposition Networks (VDN) are adopted as the model algorithm to optimize and analyze the Q function. On this basis, suggestions are put forward to build rule bases and knowledge bases for the agents in the smart home and to construct, in an open and systematic way, each agent's BDI (belief-desire-intention) set, so that the utility of smart home use can be quantitatively analyzed and optimized from philosophical logic through to benefit modeling.
Keywords: multi-agent system; smart home; scenario; reinforcement learning; Q-learning
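The model above relies on Value Decomposition Networks (VDN), in which the joint action value is the sum of per-agent values trained from a single team reward. A minimal PyTorch sketch of that decomposition follows; the network sizes, batch layout, and training-loop details are assumptions.

```python
import torch
import torch.nn as nn

class AgentQ(nn.Module):
    """Per-agent utility network: local observation -> Q-value per local action."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs):
        return self.net(obs)

class VDN(nn.Module):
    """Q_tot(s, a) = sum_i Q_i(o_i, a_i); trained end-to-end from the team reward."""
    def __init__(self, n_agents, obs_dim, n_actions):
        super().__init__()
        self.agents = nn.ModuleList(AgentQ(obs_dim, n_actions) for _ in range(n_agents))

    def forward(self, obs, actions):
        # obs: (batch, n_agents, obs_dim); actions: (batch, n_agents) long tensor
        per_agent = [net(obs[:, i]).gather(1, actions[:, i:i + 1])
                     for i, net in enumerate(self.agents)]
        return torch.cat(per_agent, dim=1).sum(dim=1)   # (batch,) joint value

def td_loss(model, target_model, batch, gamma=0.99):
    obs, actions, reward, next_obs, done = batch
    q_tot = model(obs, actions)
    with torch.no_grad():
        # greedy next actions chosen per agent from its own utility network
        next_a = torch.stack([net(next_obs[:, i]).argmax(dim=1)
                              for i, net in enumerate(target_model.agents)], dim=1)
        target = reward + gamma * (1 - done) * target_model(next_obs, next_a)
    return nn.functional.mse_loss(q_tot, target)
```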
11. A Multi-Agent System Architecture Model Based on Q-learning (Cited by: 2)
Authors: 许培, 薛伟. 《计算机与数字工程》, 2011, No. 8, pp. 8-11 (4 pages)
Multi-agent systems have been a popular research field in recent years, and Q-learning is one of the best-known and most widely applied reinforcement learning algorithms. Based on the single-agent Q-learning algorithm, a new cooperative learning algorithm is proposed, and from it a new architecture model for multi-agent systems is derived. The most distinctive features of this architecture are a knowledge-sharing mechanism, a team-structure concept, and the introduction of a service-provider role. Finally, simulation experiments demonstrate the advantages of the architecture.
Keywords: multi-agent system; reinforcement learning; Q-learning; architecture; knowledge sharing
12. A Q-Learning-Based Cognitive Jamming Decision Method for Multifunction Radar (Cited by: 15)
Authors: 张柏开, 朱卫纲. 《电讯技术》, 北大核心, 2020, No. 2, pp. 129-136 (8 pages)
The rapid development of multifunction radar and cognitive electronic warfare has made it difficult for traditional jamming decision methods to adapt to modern warfare. To address this, a Q-Learning-based cognitive jamming decision method for multifunction radar is proposed. By comparing the idea of cognition with the principles of jamming decision making, Q-Learning is applied to cognitive jamming decisions and the algorithmic steps are given. Taking a specific multifunction radar as an example, a radar state-transition graph is built by analyzing its operating states and the corresponding jamming styles, and simulation experiments analyze the influence of each parameter on decision performance, providing a reference for actual battlefield use. The decision process when new states are added, the influence of transition probabilities on the decision path in an actual battlefield, and a comparison of the decision performance of four main jamming decision methods are simulated. The experiments show that the method can complete jamming decisions by autonomously learning from the jamming effect, fits the actual battlefield better, and offers useful reference for the development of cognitive electronic warfare.
Keywords: multifunction radar; cognitive electronic warfare; jamming decision; Q-learning; reinforcement learning
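The method above runs Q-learning over a graph of the radar's operating states, learning which jamming style pushes the radar toward less threatening modes. The toy loop below shows only that framing; the state names, jamming actions, transition model, and reward are illustrative stand-ins, not taken from the paper.

```python
import random
from collections import defaultdict

STATES = ["search", "acquisition", "track", "guidance"]   # illustrative radar modes
ACTIONS = ["noise", "range_deception", "velocity_deception", "dense_false_targets"]
THREAT = {"search": 0, "acquisition": 1, "track": 2, "guidance": 3}

def simulate_radar(state, action):
    """Stand-in environment: effective jamming nudges the radar to lower-threat modes."""
    idx = THREAT[state]
    good = random.random() < (0.4 if action == "noise" else 0.6)
    new_idx = max(idx - 1, 0) if good else min(idx + 1, len(STATES) - 1)
    reward = THREAT[state] - THREAT[STATES[new_idx]]       # reward = threat reduction
    return STATES[new_idx], reward

def train(episodes=500, alpha=0.1, gamma=0.9, eps=0.2):
    q = defaultdict(float)
    for _ in range(episodes):
        s = "guidance"
        for _ in range(20):
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda x: q[(s, x)]))
            s2, r = simulate_radar(s, a)
            best = max(q[(s2, x)] for x in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best - q[(s, a)])
            s = s2
    return q
```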
13. Centralized Dynamic Spectrum Allocation in Cognitive Radio Networks Based on Fuzzy Logic and Q-Learning (Cited by: 4)
Authors: 张文柱, 刘栩辰. China Communications, SCIE CSCD, 2011, No. 7, pp. 46-54 (9 pages)
A novel centralized approach for Dynamic Spectrum Allocation (DSA) in the Cognitive Radio (CR) network is presented in this paper. Instead of giving the solution in terms of formulas modeling the network environment, such as linear programming or convex optimization, the new approach obtains the capability of iteratively learning the environment's performance online by using a Reinforcement Learning (RL) algorithm, after observing the variability and uncertainty of heterogeneous wireless networks. Appropriate decision-making access actions can then be obtained by employing a Fuzzy Inference System (FIS), which ensures that the strategy is able to explore possible states and exploit experience sufficiently. The new approach effectively considers multiple objectives such as spectrum efficiency and fairness between CR Access Points (APs). By interacting with the environment and accumulating comprehensive advantages, it can achieve the largest expected long-term reward on the desired objectives and implement the best action. Moreover, the present algorithm is relatively simple and does not require complex calculations. Simulation results show that the proposed approach achieves better performance than a fixed frequency planning scheme or a general dynamic spectrum allocation policy.
Keywords: cognitive radio; dynamic spectrum allocation; fuzzy inference; reinforcement learning; multi-objective
14. An Automated Intrusion Response Decision Method Based on Q-Learning (Cited by: 3)
Authors: 刘璟, 张玉臣, 张红旗. 《信息网络安全》, CSCD 北大核心, 2021, No. 6, pp. 26-35 (10 pages)
To address the poor adaptability of existing automated intrusion response decisions, this paper proposes a Q-Learning-based automated intrusion response decision method, Q-AIRD. Q-AIRD formalizes the states and actions of network attack and defense on an attack graph, and introduces an attack-pattern layer to recognize attackers of different capability so that targeted response actions can be taken. Tailored to the characteristics of intrusion response, a Softmax algorithm is used for response policy selection, with a safety threshold θ, a stability reward factor μ, and a penalty factor ν. A voting mechanism evaluates policies against multiple response objectives, meeting the need for multi-purpose responses. On this basis, a Q-Learning-based automated intrusion response decision algorithm is designed. Simulation experiments show that Q-AIRD is highly adaptive and achieves timely and effective intrusion response decisions.
Keywords: reinforcement learning; automated intrusion response; Softmax algorithm; multi-objective decision making
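Q-AIRD selects responses with a Softmax (Boltzmann) policy shaped by a safety threshold θ, a stability reward factor μ, and a penalty factor ν. The helper below sketches only the selection step under an assumed reading in which actions valued below θ are masked out before sampling; the roles of μ and ν in the paper's reward are not reproduced.

```python
import numpy as np

def softmax_select(q_values, theta=0.0, temperature=1.0, rng=None):
    """Sample a response action from a Boltzmann distribution over Q-values.

    q_values:    (n,) estimated values of candidate response actions
    theta:       safety threshold; actions valued below it are never sampled
    temperature: high -> near-uniform exploration, low -> near-greedy
    """
    rng = rng or np.random.default_rng()
    q = np.asarray(q_values, dtype=float)
    allowed = q >= theta
    if not allowed.any():                      # nothing clears the threshold:
        allowed = np.ones_like(q, dtype=bool)  # fall back to all actions
    logits = np.where(allowed, q / temperature, -np.inf)
    probs = np.exp(logits - logits[allowed].max())
    probs /= probs.sum()
    return rng.choice(len(q), p=probs)

# Example: three candidate responses, the last one is below the safety threshold
print(softmax_select([0.8, 0.5, -0.3], theta=0.0, temperature=0.5))
```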
15. Research on Multi-Agent Cooperation Based on a Two-Layer Q-Learning Algorithm
Authors: 王帅. 《煤矿机电》, 2013, No. 5, pp. 74-76 (3 pages)
To make multi-agent systems better adapt to complex environments, a hierarchical approach is introduced into reinforcement learning. A two-layer Q-Learning algorithm is applied to a computer simulation in which four agents cooperate to push a disc-shaped object and plan a path in an unknown environment. Simulation results demonstrate the effectiveness and feasibility of the method.
Keywords: reinforcement learning; Q-learning; multi-agent cooperation; path planning
16. Optimal Dispatch of Integrated-Energy Multi-Microgrid Systems Based on Hierarchical Constrained Reinforcement Learning (Cited by: 4)
Authors: 董雷, 杨子民, 乔骥, 陈盛, 王新迎, 蒲天骄. 《电工技术学报》, EI CSCD 北大核心, 2024, No. 5, pp. 1436-1453 (18 pages)
Building multi-microgrid systems is an effective way to accommodate renewable energy and improve grid stability. Coordinated dispatch among the microgrids can effectively raise their operating benefits and the level of renewable energy accommodation. Existing multi-microgrid optimization problems involve diverse scenarios and many variables, and source-load uncertainty together with the data-privacy protection of the individual microgrid owners makes efficient solution highly challenging. To address this, a hierarchical constrained reinforcement learning optimization method is proposed. First, a hierarchical reinforcement learning framework for multi-microgrids is constructed: at the upper layer, an agent provides each microgrid's energy-storage strategy and the power-exchange strategy between microgrids; at the lower layer, each microgrid, taking the upper-layer strategy as a constraint and using its own state information, autonomously optimizes the output of its internal distributed generation by mathematical programming. The hierarchical architecture reduces communication pressure, protects the data privacy inside each microgrid, makes full use of reinforcement learning's adaptability to source-load uncertainty, greatly accelerates model solution, and effectively retains the solution accuracy of mathematical programming. In addition, the Lagrange multiplier method is combined with conventional reinforcement learning to form a constrained reinforcement learning solution method, which effectively handles the constraint violations that conventional reinforcement learning struggles to deal with. Finally, case studies verify the effectiveness and advantages of the method.
Keywords: multi-microgrid system; hierarchical constrained reinforcement learning; uncertainty; data privacy protection
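The upper-layer agent above is trained with the Lagrange multiplier method so that constraint violations are penalized adaptively rather than ignored. The fragment below sketches only that coupling (a penalized reward plus a dual update of the multiplier) with assumed limits and step sizes; the microgrid model itself is out of scope.

```python
import numpy as np

class LagrangianPenalty:
    """Adaptive penalty for a constraint E[c(s, a)] <= limit inside an RL training loop."""
    def __init__(self, limit, lr=0.01, lam_init=0.0):
        self.limit = limit        # e.g. allowed violation budget for a power-flow constraint
        self.lr = lr              # dual step size
        self.lam = lam_init       # Lagrange multiplier, kept non-negative

    def shaped_reward(self, reward, cost):
        """Reward the agent actually trains on: task reward minus weighted violation."""
        return reward - self.lam * cost

    def dual_update(self, episode_costs):
        """Gradient ascent on the dual: grow lambda while the constraint is violated."""
        violation = np.mean(episode_costs) - self.limit
        self.lam = max(0.0, self.lam + self.lr * violation)
        return self.lam

# Schematic use inside a training loop:
# pen = LagrangianPenalty(limit=0.05)
# for episode in range(n_episodes):
#     costs = []
#     ...  # r_shaped = pen.shaped_reward(r, c); costs.append(c)
#     pen.dual_update(costs)
```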
17. A Resource Allocation Algorithm for Multi-Beam Low Earth Orbit Satellite Networks Based on Decision Performance Evaluation
Authors: 王朝炜, 庞明亮, 王粟, 赵玲莉, 高飞飞, 崔高峰, 王卫东. 《通信学报》, EI CSCD 北大核心, 2024, No. 7, pp. 37-47 (11 pages)
To address co-channel interference between beams, spectrum shortage, and unevenly distributed traffic in multi-beam low Earth orbit (LEO) satellites, and to overcome the drawbacks of a single decision network (lack of self-correction, a tendency to fall into local optima, and insufficient consideration of long-term effects), a resource allocation algorithm based on decision performance evaluation is proposed. The algorithm introduces a service-satisfaction index for different users to measure system fairness and optimizes system throughput under the premise of fairness, modeling the problem as a multi-objective optimization problem. The time-correlated continuous resource allocation process is modeled as a Markov process, and the proposed algorithm adjusts the decision network's parameters according to the evaluation network's results, thereby improving the resource allocation scheme while also updating the evaluation network's own parameters. Through iterative optimization, accurate prediction by the decision network is achieved. Simulation results show that the proposed algorithm outperforms traditional resource allocation algorithms in throughput and fairness.
Keywords: multi-beam satellite; deep reinforcement learning; multi-objective optimization; resource management
18. Advances in Multi-Agent Deep Reinforcement Learning
Authors: 丁世飞, 杜威, 张健, 郭丽丽, 丁玲. 《计算机学报》, EI CAS CSCD 北大核心, 2024, No. 7, pp. 1547-1567 (21 pages)
Deep reinforcement learning (DRL) has received wide attention in recent years and has achieved remarkable success in various fields. Since real-world environments usually involve multiple agents interacting with the environment, multi-agent deep reinforcement learning (MADRL) has developed vigorously and performs well on a variety of complex sequential decision tasks. This paper surveys progress in MADRL in three parts. First, several common problem formulations of multi-agent reinforcement learning and the corresponding cooperative, competitive, and mixed tasks are reviewed. Second, a new multi-dimensional taxonomy of current MADRL methods is given and the categories are introduced in turn, with emphasis on value function decomposition methods, communication-based MADRL methods, and MADRL methods based on graph neural networks. Finally, the main applications of MADRL in real-world scenarios are examined. The survey is intended to help new researchers entering this fast-developing field as well as existing experts who want a comprehensive overview and new directions based on the latest progress.
Keywords: multi-agent deep reinforcement learning; value-function based; policy based; communication learning; graph neural networks
19. A Deep Reinforcement Learning Based Optimal Dispatch Method for Hydrogen-Integrated Energy Systems
Authors: 张磊, 吴红斌, 何叶, 徐斌, 张明星, 丁明. 《电力系统自动化》, EI CSCD 北大核心, 2024, No. 16, pp. 132-141 (10 pages)
To achieve carbon-reduction targets, combining hydrogen with integrated energy systems has become one of the most promising directions. To address the limited flexibility of current dispatch strategies for hydrogen-integrated energy systems and the difficulty of multi-objective optimization in such complex systems, an optimal dispatch method based on deep reinforcement learning is proposed. First, variable-operating-condition models of the coupling devices are used to construct a wind-solar-hydrogen-cooling-heat-electricity integrated energy system, expanding the space for joint energy supply. Second, considering operating cost, carbon emissions, the system's self-supply balance, and renewable energy utilization, a multi-objective optimization model is built based on the distance to the optimal solution, stimulating the agent's exploration. Then, the deep reinforcement learning algorithm is improved with temporal-segment representations, enhancing the agent's estimation accuracy for system state changes. Finally, simulation cases are designed on measured source-load data. The results show that the proposed method effectively improves the dispatch flexibility of the hydrogen-integrated energy system, fully exploits the carbon-reduction potential of hydrogen, and achieves joint optimization of economy and environmental friendliness.
Keywords: integrated energy system; hydrogen energy; optimal dispatch; deep reinforcement learning; multi-objective optimization; renewable energy
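The dispatch model above scores a candidate schedule by its distance to the best attainable value of each objective (cost, carbon, self-supply balance, renewable utilization). A small helper illustrating that kind of distance-to-ideal scalarization, with assumed normalization and weighting:

```python
import numpy as np

def distance_to_ideal(values, ideal, worst, weights=None):
    """Scalarize multiple objectives as a (negated) distance to the ideal point.

    values: (m,) objective values of one candidate schedule (lower is better)
    ideal:  (m,) best value attainable for each objective on its own
    worst:  (m,) worst acceptable value, used to normalize scales
    Returns a reward: 1.0 at the ideal point, approaching 0 near the worst point.
    """
    values, ideal, worst = map(np.asarray, (values, ideal, worst))
    weights = np.ones_like(values, dtype=float) if weights is None else np.asarray(weights)
    normalized = (values - ideal) / (worst - ideal)          # 0 = ideal, 1 = worst
    dist = np.sqrt(np.sum(weights * normalized ** 2) / weights.sum())
    return 1.0 - dist

# Example: [operating cost, CO2, 1 - self-supply, 1 - renewable utilization]
print(distance_to_ideal(values=[120.0, 30.0, 0.2, 0.15],
                        ideal=[100.0, 20.0, 0.0, 0.0],
                        worst=[200.0, 80.0, 1.0, 1.0]))
```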
20. A Joint Edge Server Deployment and Service Placement Method
Authors: 张俊娜, 韩超臣, 陈家伟, 赵晓焱, 袁培燕. 《计算机工程》, CAS CSCD 北大核心, 2024, No. 10, pp. 266-280 (15 pages)
Edge computing (EC) deploys edge servers (ESs) at the network edge close to users and places services on them to satisfy users' service demands. Much work has studied ES deployment and service placement independently, yet the two are highly coupled. Considering the revenue of the EC system, it is necessary to offer paid services so that the EC system earns income when handling user service requests; at the same time, handling those requests incurs delay and energy costs. To maximize the EC system's revenue under different user service requests and service prices, a suitable service placement scheme is required. Therefore, under constraints including the positional relations between ESs and base stations, the coupling between ES deployment and service placement, the number of service replicas, and service prices, a two-step method comprising an improved k-means algorithm and a multi-agent reinforcement learning algorithm is proposed to maximize the EC system's revenue. First, a joint ES deployment and service placement model is built, in which ES deployment explicitly considers the positional relations among base stations, and service placement explicitly considers the ES deployment locations as well as different service requests and prices. Then, based on the positional relations among base stations and their service request loads, a constrained k-means algorithm determines the optimal ES deployment locations and ES cooperation domains under different constraints. Finally, with the goal of maximizing EC system revenue, services are placed on the ESs by a multi-agent reinforcement learning algorithm. Experimental results show that, compared with baseline methods, the proposed method improves revenue by 7% to 23%.
Keywords: edge computing; edge server deployment; service placement; k-means clustering algorithm; multi-agent reinforcement learning algorithm
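The first step of the method above groups base stations with a constrained k-means so that each deployed edge server's load stays within its capacity; services are then placed by multi-agent reinforcement learning. The snippet below sketches only that first step as a greedy, load-capped assignment inside an otherwise standard k-means loop; the capacity rule and the lack of a convergence test are simplifications, not the paper's algorithm.

```python
import numpy as np

def constrained_kmeans(points, loads, k, capacity, iters=50, seed=0):
    """Cluster base stations so that the summed request load per cluster <= capacity.

    points:   (n, 2) base-station coordinates
    loads:    (n,) request load of each base station
    capacity: maximum load one edge server (one per cluster) can serve
    Returns (centers, labels) with centers placed at cluster centroids.
    """
    points = np.asarray(points, dtype=float)
    loads = np.asarray(loads, dtype=float)
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        used = np.zeros(k)
        # assign heaviest stations first so they get the nearest feasible server
        for i in np.argsort(-loads):
            order = np.argsort(np.linalg.norm(centers - points[i], axis=1))
            for c in order:                       # nearest center with spare capacity
                if used[c] + loads[i] <= capacity:
                    labels[i], used[c] = c, used[c] + loads[i]
                    break
            else:
                labels[i] = order[0]              # overflow: accept the nearest anyway
                used[order[0]] += loads[i]
        for c in range(k):                        # recompute centroids
            members = points[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return centers, labels
```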