66 articles found
Policy Optimization Study Based on Evolutionary Learning
1
Authors: 刘素平, 丁永生. 《Journal of Donghua University(English Edition)》 EI CAS, 2009, No. 6, pp. 621-624 (4 pages)
In order to achieve an intelligent and automated self-managing network, dynamic policy configuration and selection are needed. A given policy suits only a certain network environment; if the network environment changes, the policy no longer fits. Policy-based management should therefore have a similar "natural selection" process: useful policies are retained, and policies that have lost their effectiveness are eliminated. A policy optimization method based on evolutionary learning is proposed. Policies are ranked by their shooting times (hit counts): a policy with many shooting times is given higher priority, a policy with few is given lower priority, and a policy with no shooting for a long time becomes dormant. Survival of the fittest is thus realized, and the degree of self-learning in policy management is improved.
Keywords: policy-based management; evolution learning; policy optimization
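The priority rule described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Policy` class, the thresholds `promote_at` and `dormant_after`, and the unit priority step are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    priority: int = 0
    hits: int = 0          # times the policy matched in the current window
    idle_windows: int = 0  # consecutive windows without any hit
    dormant: bool = False


def evolve(policies, promote_at=5, dormant_after=3):
    """Apply one round of "natural selection" and reset the hit counters.

    promote_at and dormant_after are illustrative thresholds (assumptions).
    """
    for p in policies:
        if p.hits == 0:
            p.idle_windows += 1
            if p.idle_windows >= dormant_after:
                p.dormant = True  # long-unused policies go dormant
        else:
            p.idle_windows = 0
            p.dormant = False
            # Frequently hit policies move up; rarely hit ones move down.
            p.priority += 1 if p.hits >= promote_at else -1
        p.hits = 0
    # Active policies first, then highest priority first.
    return sorted(policies, key=lambda p: (p.dormant, -p.priority))
```

Calling `evolve` once per observation window keeps the ranking current; a dormant policy is kept rather than deleted, so it can be reactivated if it is hit again.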
A Lyapunov characterization of robust policy optimization
2
Authors: Leilei Cui, Zhong-Ping Jiang. 《Control Theory and Technology》 EI CSCD, 2023, No. 3, pp. 374-389 (16 pages)
In this paper, we study the robustness of policy optimization (in particular the Gauss-Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and utilizing Lyapunov's direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions are provided for the upper bound on the noise and for the size of the neighborhood to which the policies ultimately converge. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed. The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix related to an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated by the input-to-state stability of the policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.
Keywords: policy optimization; policy iteration (PI); input-to-state stability (ISS); Lyapunov's direct method
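The Hankel rank check mentioned in this abstract can be illustrated with a small sketch. It assumes a scalar exploration signal and uses the standard definition that a signal is persistently exciting of a given order when the Hankel matrix with that many rows has full row rank. The function names are hypothetical, and the paper's own matrix may be built from block rows of vector signals.

```python
from fractions import Fraction


def hankel(signal, order):
    """Hankel matrix with `order` rows built from a scalar sequence (order >= 1)."""
    cols = len(signal) - order + 1
    if cols < 1:
        raise ValueError("signal is shorter than the requested order")
    return [[Fraction(signal[i + j]) for j in range(cols)] for i in range(order)]


def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [row[:] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_persistently_exciting(signal, order):
    """True if the `order`-row Hankel matrix of the signal has full row rank."""
    return rank(hankel(signal, order)) == order
```

Exact rational arithmetic avoids choosing a numerical rank tolerance; with noisy floating-point data, a singular-value threshold would be the more usual choice.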
Proximal policy optimization with an integral compensator for quadrotor control (Cited by: 6)
3
Authors: Huan HU, Qing-ling WANG. 《Frontiers of Information Technology & Electronic Engineering》 SCIE EI CSCD, 2020, No. 5, pp. 777-795 (19 pages)
We use the advanced proximal policy optimization (PPO) reinforcement learning algorithm to optimize the stochastic control strategy and achieve speed control of the "model-free" quadrotor. The model is controlled by four learned neural networks, which directly map the system states to control commands in an end-to-end style. By introducing an integral compensator into the actor-critic framework, the speed tracking accuracy and robustness are greatly enhanced. In addition, a two-phase learning scheme that includes both offline and online learning is developed for practical use. A model with strong generalization ability is learned in the offline phase. Then, the flight policy of the model is continuously optimized in the online learning phase. Finally, the performance of the proposed algorithm is compared with that of the traditional PID algorithm.
Keywords: Reinforcement learning; Proximal policy optimization; Quadrotor control; Neural network
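Several abstracts in this listing rely on PPO without stating its loss. As background only, the sketch below computes the widely used clipped surrogate objective on a batch of samples. It does not model the integral compensator or the network architecture of this paper; the function name and the default `eps=0.2` are assumptions.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Taking the minimum removes the incentive to move the ratio
        # outside [1 - eps, 1 + eps] in the direction that raises the objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In practice the same expression is written with tensors and differentiated automatically; the scalar loop here only makes the clipping logic explicit.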
STUDY ON THE OPTIMIZATION OF TRANSPORT CONTROL POLICY IN COMMUNICATION NETWORK (Cited by: 1)
4
Authors: Fan Shuyan, Han Weizhan, Lu Ran. 《Journal of Electronics(China)》, 2010, No. 2, pp. 261-266 (6 pages)
In communication networks with a policy-based Transport Control on-Demand (TCoD) function, the transport control policies have a great impact on network effectiveness. To evaluate and optimize the transport policies in a communication network, a policy-based TCoD network model is given, and a comprehensive evaluation index system for network effectiveness is put forward from both the network application and the handling mechanism perspectives. A TCoD network prototype system based on Asynchronous Transfer Mode/Multi-Protocol Label Switching (ATM/MPLS) is introduced, and some experiments are performed on it. The prototype system is evaluated and analyzed with the comprehensive evaluation index system. The results show that the index system can be used to judge whether the communication network meets the application requirements, and can provide references for optimizing the transport policies so as to improve the effectiveness of the communication network.
Keywords: Communication network; Comprehensive evaluation index system; Network Application Effectiveness (NAE); Transport Control on-Demand (TCoD); policy optimization
A STOCHASTIC TRUST-REGION FRAMEWORK FOR POLICY OPTIMIZATION
5
Authors: Mingming Zhao, Yongfeng Li, Zaiwen Wen. 《Journal of Computational Mathematics》 SCIE CSCD, 2022, No. 6, pp. 1004-1030 (27 pages)
In this paper, we study a few challenging theoretical and numerical issues of the well-known trust region policy optimization for deep reinforcement learning. The goal is to find a policy that maximizes the total expected reward when the agent acts according to the policy. The trust region subproblem is constructed with a surrogate function coherent with the total expected reward and a general distance constraint around the latest policy. We solve the subproblem using a preconditioned stochastic gradient method with a line search scheme to ensure that each step promotes the model function and stays in the trust region. To overcome the bias that sampling introduces into the function estimates under the random settings, we add the empirical standard deviation of the total expected reward to the predicted increase in a ratio, which is used to update the trust region radius and to decide whether the trial point is accepted. Moreover, for a Gaussian policy, which is commonly used for continuous action spaces, the maximization with respect to the mean and the covariance is performed separately to control the entropy loss. Our theoretical analysis shows that the deterministic version of the proposed algorithm tends to generate a monotonic improvement of the total expected reward, and global convergence is guaranteed under moderate assumptions. Comparisons with state-of-the-art methods demonstrate the effectiveness and robustness of our method on robotic control and game playing tasks from OpenAI Gym.
Keywords: Deep reinforcement learning; Stochastic trust region method; policy optimization; Global convergence; Entropy control
Two-Stage Client Selection Scheme for Blockchain-Enabled Federated Learning in IoT
6
Authors: Xiaojun Jin, Chao Ma, Song Luo, Pengyi Zeng, Yifei Wei. 《Computers, Materials & Continua》 SCIE EI, 2024, No. 11, pp. 2317-2336 (20 pages)
Federated learning enables data owners in the Internet of Things (IoT) to collaborate in training models without sharing private data, creating new business opportunities for building a data market. However, in practical operation, there are still some problems with federated learning applications. Blockchain has the characteristics of decentralization, distribution, and security. Blockchain-enabled federated learning further improves the security and performance of model training, while also expanding the application scope of federated learning. Blockchain has natural financial attributes that help establish a federated learning data market. However, the data of federated learning tasks may be distributed across a large number of resource-constrained IoT devices, which have different computing, communication, and storage resources, and the data quality of each device may also vary. Therefore, how to effectively select the clients with the data required for a federated learning task is a research hotspot. In this paper, a two-stage client selection scheme for blockchain-enabled federated learning is proposed. It first selects clients that satisfy the federated learning task through attribute-based encryption, protecting the attribute privacy of clients. Then blockchain nodes select some clients for local model aggregation using the proximal policy optimization algorithm. Experiments show that the model performance of our two-stage client selection scheme is higher than that of other client selection algorithms when some clients are offline and the data quality is poor.
Keywords: Blockchain; federated learning; attribute-based encryption; client selection; proximal policy optimization
OPTIMAL HARVESTING POLICY FOR INSHORE-OFFSHORE FISHERY MODEL WITH IMPULSIVE DIFFUSION (Cited by: 7)
7
Authors: 董玲珍, 陈兰荪, 孙丽华. 《Acta Mathematica Scientia》 SCIE CSCD, 2007, No. 2, pp. 405-412 (8 pages)
This article studies the inshore-offshore fishery model with impulsive diffusion. The existence and global asymptotic stability of both the trivial periodic solution and the positive periodic solution are obtained. The complexity of this system is also analyzed. Moreover, the optimal harvesting policy is given for the inshore subpopulation, which includes the maximum sustainable yield and the corresponding harvesting effort.
Keywords: Impulsive diffusion; inshore-offshore fishery model; global asymptotic stability; periodic solution; optimal harvesting policy
Optimal switching policy for performance enhancement of distributed parameter systems based on event-driven control (Cited by: 1)
8
Authors: 穆文英, 崔宝同, 楼旭阳, 李纹. 《Chinese Physics B》 SCIE EI CAS CSCD, 2014, No. 7, pp. 211-217 (7 pages)
This paper aims to improve the performance of a class of distributed parameter systems for the optimal switching of actuators and controllers based on event-driven control. It is assumed that, among the available multiple actuators, only one actuator can receive the control signal and be activated over an unfixed time interval, while the other actuators remain dormant. After incorporating a state observer into the event generator, the event-driven control loop and the minimum inter-event time are ultimately bounded. Based on the event-driven state feedback control, the time intervals of unfixed length can be obtained. The optimal switching policy is based on finite horizon linear quadratic optimal control at the beginning of each time subinterval. A simulation example demonstrates the effectiveness of the proposed policy.
Keywords: distributed parameter systems; optimal switching policy; event-driven
RECURSIVE UTILITY, PRODUCTIVE GOVERNMENT EXPENDITURE AND OPTIMAL FISCAL POLICY (Cited by: 1)
9
Authors: Wang Haijun, Hu Shigeng, Zhang Xueqing. 《Applied Mathematics(A Journal of Chinese Universities)》 SCIE CSCD, 2005, No. 3, pp. 277-288 (12 pages)
This paper employs a stochastic endogenous growth model, extended to the case of a recursive utility function that can disentangle intertemporal substitution from risk aversion, to analyze productive government expenditure and optimal fiscal policy, with particular stress on the importance of factor income. First, the explicit solutions of the central planner's stochastic optimization problem are derived, and the growth-maximizing and welfare-maximizing government expenditure policies are obtained; whether the two conflict or coincide depends on intertemporal substitution. Second, the explicit solutions of the representative individual's stochastic optimization problem, which permits taxing capital income and labor income separately, are derived, and it is found that the effect of risk on growth crucially depends on the degree of risk aversion, the intertemporal elasticity of substitution, and the capital income share. Finally, a flexible optimal tax policy that can be internally adjusted to a certain extent is derived, and it is found that the distribution of factor income plays an important role in designing the optimal tax policy.
Keywords: endogenous growth; recursive utility; productive government expenditure; optimal fiscal policy
The Dragon-shape Strategy of China's Regional Economic Development and Policy Analysis (Cited by: 1)
10
Authors: Jiankun Song, Wenjie Zhang. 《Chinese Business Review》, 2004, No. 7, pp. 50-53 (4 pages)
According to this paper, the dragon-shape strategy is the optimal choice for China's future strategy with respect to the geographic distribution of the regional economy.
Keywords: geographic distribution; optimization of strategy; mode of policy
Optimal policy for controlling two-server queueing systems with jockeying
11
Authors: LIN Bing, LIN Yuchen, BHATNAGAR Rohit. 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD, 2022, No. 1, pp. 144-155 (12 pages)
This paper studies the optimal policy for joint control of admission, routing, service, and jockeying in a queueing system consisting of two exponential servers in parallel. Jobs arrive according to a Poisson process. Upon each arrival, an admission/routing decision is made, and the accepted job is routed to one of the two servers, each of which is associated with a queue. After each service completion, a server has the option of serving a job from its own queue, serving a jockeying job from the other queue, or staying idle. The system performance accounts for the revenues from accepted jobs, the costs of holding jobs in queues, the service costs, and the job jockeying costs. To maximize the total expected discounted return, we formulate a Markov decision process (MDP) model for this system. The value iteration method is employed to characterize the optimal policy as a hedging point policy. Numerical studies verify the structure of the hedging point policy, which is convenient for implementing control actions in practice.
Keywords: queueing system; jockeying; optimal policy; Markov decision process (MDP); dynamic programming
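The value iteration method named in this abstract can be sketched for a generic finite discounted MDP. This is a minimal illustration, not the paper's two-server model: the callback interface (`actions`, `transition`, `reward`) and the default `gamma` and `tol` are assumptions, and the queueing dynamics would have to be encoded in those callbacks.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-8):
    """Return (V, policy) maximizing total expected discounted return.

    actions(s)       -> iterable of actions available in state s
    transition(s, a) -> list of (probability, next_state) pairs
    reward(s, a)     -> expected immediate reward
    """
    def q(s, a, V):
        # One-step lookahead value of taking action a in state s.
        return reward(s, a) + gamma * sum(p * V[s2] for p, s2 in transition(s, a))

    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q(s, a, V) for a in actions(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(actions(s), key=lambda a, s=s: q(s, a, V)) for s in states}
    return V, policy
```

For a structured result such as a hedging point policy, one would inspect the returned `policy` over a grid of queue lengths and look for the threshold at which the chosen action switches.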
A generalized geometric process based repairable system model with bivariate policy
12
Authors: MA Ning, YE Jimin, WANG Junyuan. 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD, 2021, No. 3, pp. 631-641 (11 pages)
The maintenance model of a simple repairable system is studied. We assume that there are two types of failure, namely type Ⅰ failure (repairable failure) and type Ⅱ failure (irreparable failure). Whenever a type Ⅰ failure occurs, the system is repaired immediately, which is failure repair (FR). Between the (n-1)th and the nth FR, the system is preventively repaired (PR) when its consecutive working time reaches λ^(n-1) T, where λ and T are specified values. Further, we assume that the system goes on working when the repair is finished and is replaced at the occurrence of the Nth type Ⅰ failure or of the first type Ⅱ failure, whichever occurs first. In practice, the system degrades as the number of repairs increases. That is, the consecutive working times of the system form a decreasing generalized geometric process (GGP), whereas the successive repair times form an increasing GGP. A simple bivariate policy (T,N) repairable model is introduced based on the GGP. The alternative searching method is used to minimize the cost rate function C(N,T), and the optimal (T,N)^(*) is obtained. Finally, numerical cases are applied to demonstrate the reasonability of this model.
Keywords: renewal reward theorem; generalized geometric process (GGP); average cost rate; optimal policy; replacement
UAV path planning based on multi-agent deep reinforcement learning (Cited by: 4)
13
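Authors: 司鹏搏, 吴兵, 杨睿哲, 李萌, 孙艳华. 《Journal of Beijing University of Technology(北京工业大学学报)》 CAS CSCD PKU Core, 2023, No. 4, pp. 449-458 (10 pages)
To address path planning for multiple unmanned aerial vehicles (UAVs) in complex environments, a multi-agent deep reinforcement learning framework for UAV path planning is proposed. The framework first models the path planning problem as a partially observable Markov process and extends the proximal policy optimization algorithm to the multi-agent setting; obstacle-free path planning for multiple UAVs is achieved by designing the UAVs' state observation space, action space, and reward function. Second, to suit the limited computing resources carried by UAVs, a network pruning-based multi-agent proximal policy optimization (NP-MAPPO) algorithm is further proposed, which improves training efficiency. Simulation results verify the effectiveness of the proposed multi-UAV path planning framework under various parameter configurations and the superiority of the NP-MAPPO algorithm in training time.
Keywords: unmanned aerial vehicle (UAV); complex environment; path planning; Markov decision process; multi-agent proximal policy optimization (MAPPO); network pruning (NP)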
Distributive Disturbance and Optimal Policy in Stochastic Control Model
14
Authors: 汪红初, 胡适耕, 张学清. 《Journal of Southwest Jiaotong University(English Edition)》, 2006, No. 4, pp. 408-414 (7 pages)
To investigate the equilibrium relationships between the volatility of capital and income, taxation, and ance in a stochastic control model, the uniqueness of the solution to this model was proved by using the method of dynamic programming under the introduction of distributive disturbance and elastic labor supply. Furthermore, the effects of two types of shocks on labor-leisure choice, economic growth rate, and welfare were numerically analyzed, and then the optimal tax policy was derived.
Keywords: Stochastic optimization; Dynamic programming; Bellman equation; Macroeconomic equilibrium; Optimal policy
Task assignment in ground-to-air confrontation based on multiagent deep reinforcement learning (Cited by: 3)
15
Authors: Jia-yi Liu, Gang Wang, Qiang Fu, Shao-hua Yue, Si-yuan Wang. 《Defence Technology(防务技术)》 SCIE EI CAS CSCD, 2023, No. 1, pp. 210-219 (10 pages)
The scale of ground-to-air confrontation task assignment is large, and it must deal with many concurrent task assignments and random events. When existing task assignment methods are applied to ground-to-air confrontation, they handle complex tasks inefficiently, and interactive conflicts arise in multiagent systems. This study proposes a multiagent architecture based on one general agent with multiple narrow agents (OGMN) to reduce task assignment conflicts. Considering the slow speed of traditional dynamic task assignment algorithms, this paper proposes the proximal policy optimization for task assignment of general and narrow agents (PPO-TAGNA) algorithm. The algorithm, based on the idea of the optimal assignment strategy algorithm and combined with the training framework of deep reinforcement learning (DRL), adds a multihead attention mechanism and a stage reward mechanism to the bilateral band clipping PPO algorithm to solve the problem of low training efficiency. Finally, simulation experiments are carried out in the digital battlefield. The multiagent architecture based on OGMN combined with the PPO-TAGNA algorithm obtains higher rewards faster and has a higher win ratio. By analyzing agent behavior, the efficiency, superiority, and rationality of resource utilization of this method are verified.
Keywords: Ground-to-air confrontation; Task assignment; General and narrow agents; Deep reinforcement learning; Proximal policy optimization (PPO)
Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning
16
Authors: Jiawei Xia, Yasong Luo, Zhikun Liu, Yalun Zhang, Haoran Shi, Zhong Liu. 《Defence Technology(防务技术)》 SCIE EI CAS CSCD, 2023, No. 11, pp. 80-94 (15 pages)
To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. Firstly, the hunting environment and a kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting Proximal Policy Optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function, and an action space applicable to multi-target hunting tasks are designed. To deal with the dynamically changing dimension of the observational features input by partially observable systems, a feature embedding block is proposed. By combining the two feature compression methods of column-wise max pooling (CMP) and column-wise average pooling (CAP), an observational feature encoding is established. Finally, the centralized training and decentralized execution framework is adopted to complete the training of the hunting strategy. Each USV in the fleet shares the same policy and performs actions independently. Simulation experiments have verified the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are comprehensively analyzed in terms of algorithm performance, migration effect across task scenarios, and self-organization capability after being damaged, and the potential deployment and application of DPOMH-PPO in the real environment is verified.
Keywords: Unmanned surface vehicles; Multi-agent deep reinforcement learning; Cooperative hunting; Feature embedding; Proximal policy optimization
B-Spline-Based Curve Fitting to Cam Pitch Curve Using Reinforcement Learning (Cited by: 1)
17
Authors: Zhiwei Lin, Tianding Chen, Yingtao Jiang, Hui Wang, Shuqin Lin, Ming Zhu. 《Intelligent Automation & Soft Computing》 SCIE, 2023, No. 5, pp. 2145-2164 (20 pages)
Directly applying the B-spline interpolation function to process plate cams in a computer numerical control (CNC) system may produce verbose tool-path codes and unsmooth trajectories. This paper is devoted to addressing the problem of B-spline fitting for cam pitch curves. Considering that the B-spline curve needs to meet the motion law of the follower to approximate the pitch curve, we use the radial error to quantify the deviation between the fitting B-spline curve and the pitch curve. The problem thus boils down to solving a difficult global optimization problem: to find the numbers and positions of the control points or data points of the B-spline curve such that the cumulative radial error between the fitting curve and the original curve is minimized. This problem is attempted in this paper with a double deep Q-network (DDQN) reinforcement learning (RL) algorithm with data point traceability. Specifically, the RL environment, the action set, and the current state set are designed to facilitate the search for the data points, along with the design of the reward function and the initialization of the neural network. The experimental results show that when the angle division value of the action set is fixed, the proposed algorithm can maximize the number of data points of the B-spline curve and accurately place these data points at the right positions, with the minimum average radial error. Our work establishes the theoretical foundation for studying spline fitting using the RL method.
Keywords: B-spline fitting; radial error; DDQN RL algorithm; global optimal policy
Optimal quasi-periodic maintenance policies for two-unit series system (Cited by: 2)
18
Authors: 高文科, 张志胜, 周一帆, 甘淑媛. 《Journal of Southeast University(English Edition)》 EI CAS, 2013, No. 4, pp. 450-455 (6 pages)
To investigate the effects of various random factors on the preventive maintenance (PM) decision-making of one type of two-unit series system, an optimal quasi-periodic PM policy is introduced. The model assumes that PM is perfect for unit 1 and is only a mechanical service for unit 2. PM activity is performed randomly according to a dynamic PM plan distributed over each implementation period. A replacement is determined by the competing results of unplanned and planned replacements. The unplanned replacement is triggered by a catastrophic failure of unit 2, and the planned replacement is executed when the PM number reaches the threshold N. Through modeling and analysis, a solution algorithm for the optimal implementation period and PM number is given, and the optimization process and parametric sensitivity are illustrated by a numerical example. Results show that the implementation period should be made as short as practical needs allow, which can increase the mean operating time and decrease the long-run cost rate.
Keywords: maintenance policy; optimization; quasi-periodic; preventive maintenance; two-unit series system
Analysis of a POMDP Model for an Optimal Maintenance Problem with Multiple Imperfect Repairs
19
Authors: Nobuyuki Tamura. 《American Journal of Operations Research》, 2023, No. 6, pp. 133-146 (14 pages)
I consider a system whose deterioration follows a discrete-time and discrete-state Markov chain with an absorbing state. When the system is put into practice, I may select operation (wait), imperfect repair, or replacement at each discrete time point. The true state of the system is not known when it is operated. Instead, the system is monitored after operation, and some incomplete information about the deterioration is obtained for decision making. Since there are multiple imperfect repairs, I can select one of them when imperfect repair is preferable to operation and replacement. To express this situation, I propose a POMDP model and theoretically investigate the structure of an optimal maintenance policy minimizing the total expected discounted cost over an unbounded horizon. Two stochastic orders are then used for the analysis of the problem.
Keywords: Partially Observable Markov Decision Process; Imperfect Repair; Stochastic Order; Monotone Property; Optimal Maintenance policy
Multi-agent reinforcement learning for edge information sharing in vehicular networks (Cited by: 3)
20
Authors: Ruyan Wang, Xue Jiang, Yujie Zhou, Zhidu Li, Dapeng Wu, Tong Tang, Alexander Fedotov, Vladimir Badenko. 《Digital Communications and Networks》 SCIE CSCD, 2022, No. 3, pp. 267-277 (11 pages)
To guarantee the heterogeneous delay requirements of diverse vehicular services, it is necessary to design a fully cooperative policy for both Vehicle to Infrastructure (V2I) and Vehicle to Vehicle (V2V) links. This paper investigates the reduction of the delay in edge information sharing for V2V links while satisfying the delay requirements of the V2I links. Specifically, a mean delay minimization problem and a maximum individual delay minimization problem are formulated to improve the global network performance and to ensure fairness for individual users, respectively. A multi-agent reinforcement learning framework is designed to solve these two problems, where a new reward function is proposed to evaluate the utilities of the two optimization objectives in a unified framework. Thereafter, a proximal policy optimization approach is proposed to enable each V2V user to learn its policy using the shared global network reward. The effectiveness of the proposed approach is finally validated by comparing the obtained results with those of other baseline approaches through extensive simulation experiments.
Keywords: Vehicular networks; Edge information sharing; Delay guarantee; Multi-agent reinforcement learning; Proximal policy optimization