164 articles found
A new optimal adaptive backstepping control approach for nonlinear systems under deception attacks via reinforcement learning
1
Authors: Wendi Chen, Qinglai Wei. Source: Journal of Automation and Intelligence, 2024, No. 1, pp. 34-39 (6 pages).
This paper presents a new optimal adaptive backstepping control approach for nonlinear systems under deception attacks via reinforcement learning. The nonlinear terms in the studied system make it very difficult to design an optimal controller with traditional methods. To achieve optimal control, an RL algorithm based on a critic-actor architecture is considered for the nonlinear system. Because network transmission carries significant security risks, the system is vulnerable to deception attacks, which can make all system states unavailable. By using the attacked states to design the coordinate transformation, the harm caused by unknown deception attacks is overcome. The presented control strategy ensures that all signals in the closed-loop system are semi-globally ultimately bounded. Finally, a simulation experiment demonstrates the effectiveness of the strategy.
Keywords: nonlinear systems; reinforcement learning; optimal control; backstepping method
Data-Driven Human-Robot Interaction Without Velocity Measurement Using Off-Policy Reinforcement Learning (cited by: 3)
2
Authors: Yongliang Yang, Zihao Ding, Rui Wang, Hamidreza Modares, Donald C. Wunsch. Source: IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2022, No. 1, pp. 47-63 (17 pages).
In this paper, we present a novel data-driven design method for the human-robot interaction (HRI) system, where a given task is achieved by cooperation between the human and the robot. The presented HRI controller design is a two-level approach consisting of a task-oriented performance optimization design and a plant-oriented impedance controller design. The task-oriented design minimizes the human effort and guarantees perfect task tracking in the outer loop, while the plant-oriented design achieves the desired impedance from the human to the robot manipulator end-effector in the inner loop. Data-driven reinforcement learning techniques are used for performance optimization in the outer loop to assign the optimal impedance parameters. In the inner loop, a velocity-free filter is designed to avoid the requirement of end-effector velocity measurement. On this basis, an adaptive controller is designed to achieve the desired impedance of the robot manipulator in the task space. A simulation and an experiment with a robot manipulator are conducted to verify the efficacy of the presented HRI design framework.
Keywords: adaptive impedance control; data-driven method; human-robot interaction (HRI); reinforcement learning; velocity-free
Structural Topology Optimization by Combining BESO with Reinforcement Learning (cited by: 1)
3
Authors: Hongbo Sun, Ling Ma. Source: Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2021, No. 1, pp. 85-96 (12 pages).
In this paper, a new algorithm combining the features of bi-direction evolutionary structural optimization (BESO) and reinforcement learning (RL) is proposed for continuum structural topology optimization (STO). In contrast to conventional approaches, which generate only a single quasi-optimal solution, the goal of the combined method is to provide designers with more quasi-optimal solutions, in the spirit of generative design. Two key components were adopted. First, besides sensitivity, a value function updated by Monte-Carlo reinforcement learning was used to measure the importance of each element, which made the solving process convergent and closer to the optimum. Second, an ε-greedy policy added a random perturbation to the main search direction so as to extend the search ability. Finally, the quality and diversity of solutions could be guaranteed by controlling the value of compliance as well as Intersection-over-Union (IoU). Results of several 2D and 3D compliance minimization problems, including a geometrically nonlinear case, show that the combined method is capable of generating a group of good and different solutions that satisfy various possible requirements in engineering design within acceptable computation cost.
Keywords: structural topology optimization; bi-direction evolutionary structural optimization; reinforcement learning; first-visit Monte-Carlo method; ε-greedy policy; generative design
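As a rough illustration of the two ingredients named in this abstract, the sketch below shows a generic ε-greedy choice and a first-visit Monte-Carlo value update. It is not the authors' implementation: the function names, the (state, reward) episode format, and the incremental-mean update are assumptions made for the example.

```python
import random


def epsilon_greedy(values, epsilon, rng=random):
    """Return a random index with probability epsilon, else the index of the largest value."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])


def first_visit_mc_update(values, counts, episode, gamma=1.0):
    """Update state values from one episode using first-visit returns.

    episode is a list of (state, reward) pairs (assumed format); the reward
    is the one received after leaving that state.
    """
    ret = 0.0
    first_returns = {}
    # Walk backwards; earlier visits overwrite later ones, so the first visit wins.
    for state, reward in reversed(episode):
        ret = reward + gamma * ret
        first_returns[state] = ret
    for state, g in first_returns.items():
        counts[state] = counts.get(state, 0) + 1
        old = values.get(state, 0.0)
        values[state] = old + (g - old) / counts[state]  # running mean of returns
```

With epsilon set to 0 the choice is purely greedy, and with epsilon set to 1 it is purely random; intermediate values trade search diversity against convergence.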
Multi-Agent Deep Reinforcement Learning for Cross-Layer Scheduling in Mobile Ad-Hoc Networks
4
Authors: Xinxing Zheng, Yu Zhao, Joohyun Lee, Wei Chen. Source: China Communications (SCIE, CSCD), 2023, No. 8, pp. 78-88 (11 pages).
Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to deal with congestion in ad-hoc networks with effective algorithms is still open and challenging. In this paper, we focus on enabling congestion control to minimize network transmission delays through flexible power control. To solve the congestion problem effectively, we propose a distributed cross-layer scheduling algorithm, which is empowered by graph-based multi-agent deep reinforcement learning. The transmit power is adaptively adjusted in real time by our algorithm based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low due to the regional cooperation based on the graph attention network. In the evaluation, we show that our algorithm can reduce the transmission delay of data flows under severe signal interference and drastically changing channel states, and we demonstrate its adaptability and stability in different topologies. The method is general and can be extended to various types of topologies.
Keywords: ad-hoc network; cross-layer scheduling; multi-agent deep reinforcement learning; interference elimination; power control; queue scheduling; actor-critic methods; Markov decision process
A Reinforcement Learning System to Dynamic Movement and Multi-Layer Environments
5
Authors: Uthai Phommasak, Daisuke Kitakoshi, Hiroyuki Shioya, Junji Maeda. Source: Journal of Intelligent Learning Systems and Applications, 2014, No. 4, pp. 176-185 (10 pages).
Many policy-improving systems have been proposed for Reinforcement Learning (RL) agents that adapt quickly to environmental change by using statistical methods such as mixture models of Bayesian networks, mixture probability, and clustering distributions. However, such methods increase the computational complexity. In addition, these systems need to adapt to more complex environments, such as multi-layer environments. In this study, we used the profit-sharing method for the agent to learn its policy, and added a mixture probability to the RL system to recognize changes in the environment and appropriately improve the agent's policy to adjust to the changing environment. We also introduced clustering that enables a smaller, suitable selection in order to reduce the computational complexity while maintaining the system's performance. Experimental results showed that the agent successfully learned the policy and efficiently adjusted to changes in a multi-layer environment. Finally, the computational complexity and the decline in effectiveness of the policy improvement were controlled by using our proposed system.
Keywords: reinforcement learning; profit-sharing method; mixture probability; clustering
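Path Planning of Mobile Robots Based on an Improved Q-learning Algorithm
6
Authors: 井征淼, 刘宏杰, 周永录. Source: 《火力与指挥控制》 (Fire Control & Command Control) (CSCD, PKU Core), 2024, No. 3, pp. 135-141 (7 pages).
To address the slow convergence, long running time, and poor learning efficiency of the traditional Q-learning algorithm in path planning, an improved Q-learning algorithm is proposed that combines the artificial potential field method with traditional Q-learning. The algorithm introduces the attraction and repulsion functions of the artificial potential field method. It dynamically selects the reward value by comparing attraction function values, computes a "posture value" (姿值 in the source) by comparing repulsion function values, and dynamically updates the Q-value, so that the mobile robot explores purposefully and preferentially moves to positions farther from obstacles. Simulation experiments show that, compared with the traditional Q-learning algorithm and an algorithm that introduces only the attraction field, the improved Q-learning algorithm converges faster, shortens the running time, improves learning efficiency, and lowers the probability of collision with obstacles, enabling the mobile robot to quickly find a collision-free path.
Keywords: mobile robot; path planning; improved Q-learning; artificial potential field method; reinforcement learning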
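To make the idea in this abstract more concrete, here is a minimal sketch of a tabular Q-learning step whose reward is chosen by comparing attractive potentials. The abstract does not give the paper's actual reward rule, gains, or repulsion handling, so the names `attraction` and `shaped_reward`, the quadratic potential, the ±1 reward values, and the grid coordinates are all assumptions for illustration only.

```python
from collections import defaultdict


def attraction(pos, goal, k=1.0):
    """Attractive potential 0.5 * k * d^2, with d the Euclidean distance to the goal."""
    dx, dy = pos[0] - goal[0], pos[1] - goal[1]
    return 0.5 * k * (dx * dx + dy * dy)


def shaped_reward(pos, next_pos, goal):
    """Hypothetical rule: +1 if the move lowers the attractive potential, else -1."""
    return 1.0 if attraction(next_pos, goal) < attraction(pos, goal) else -1.0


def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One standard tabular Q-learning step; returns the updated Q-value."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return Q[(state, action)]


ACTIONS = ["up", "down", "left", "right"]
Q = defaultdict(float)  # unseen (state, action) pairs start at 0
r = shaped_reward((0, 0), (0, 1), goal=(3, 3))
new_q = q_update(Q, (0, 0), "right", r, (0, 1), ACTIONS)
```

A repulsion term could be compared in the same way to penalize moves toward obstacles; it is left out here because the abstract does not specify how it enters the update.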
Optimal pivot path of the simplex method for linear programming based on reinforcement learning (cited by: 1)
7
Authors: Anqi Li, Tiande Guo, Congying Han, Bonan Li, Haoran Li. Source: Science China Mathematics (SCIE, CSCD), 2024, No. 6, pp. 1263-1286 (24 pages).
Based on the existing pivot rules, the simplex method for linear programming is not polynomial in the worst case. Therefore, the optimal pivot of the simplex method is crucial. In this paper, we propose the optimal rule to find all the shortest pivot paths of the simplex method for linear programming problems based on Monte Carlo tree search. Specifically, we first propose the SimplexPseudoTree to transfer the simplex method into tree search mode while avoiding repeated basis variables. Secondly, we propose four reinforcement learning models with two actions and two rewards to make the Monte Carlo tree search suitable for the simplex method. Thirdly, we set a new action selection criterion to ameliorate the inaccurate evaluation in the initial exploration. It is proved that when the number of vertices in the feasible region is C_n^m, our method can generate all the shortest pivot paths, which is polynomial in the number of variables. In addition, we experimentally validate that the proposed schedule can avoid unnecessary search and provide the optimal pivot path. Furthermore, this method can provide the best pivot labels for all kinds of supervised learning methods to solve linear programming problems.
Keywords: simplex method; linear programming; pivot rules; reinforcement learning
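Optimal Operation of Multi-Reservoir Systems Based on n-step Q-learning Under the Discrete Four-Reservoir Problem Benchmark (cited by: 4)
8
Authors: 胡鹤轩, 钱泽宇, 胡强, 张晔. Source: 《中国水利水电科学研究院学报(中英文)》 (Journal of China Institute of Water Resources and Hydropower Research) (PKU Core), 2023, No. 2, pp. 138-147 (10 pages).
Optimal reservoir operation is an optimization problem with the Markov property. Reinforcement learning is currently a research focus for solving Markov decision process problems and performs well on single-reservoir operation, but the complexity of multi-reservoir systems makes it hard to apply. For the complex multi-reservoir operation problem, a method based on n-step Q-learning under the discrete four-reservoir problem benchmark is proposed. Based on the n-step Q-learning algorithm, it builds a reinforcement learning model of multi-reservoir operation for the discrete four-reservoir benchmark and, by optimizing over exploration experience, eventually generates the optimal operation scheme. Experimental analysis shows that, given enough exploration experience to learn from, one-step Q-learning combined with a penalty function can reach the theoretical optimum. When the feasible direction method replaces the penalty function to enforce constraints, a table of feasible states per time step and a hash table of the actions available to each state at each time step are built from the benchmark constraints; this effectively reduces the dimension of the state-action space and greatly shortens the optimization time. The exploration strategy determines how useful the exploration experience is and hence the optimization efficiency, especially for complex multi-reservoir problems. An improved ε-greedy strategy is proposed and compared with three strategies (conventional ε-greedy, upper confidence bound (UCB), and Boltzmann exploration), which verifies its effectiveness. On this basis, n-step returns are introduced to extend the method to n-step Q-learning, and suitable hyperparameters such as the number of steps n and the learning rate are determined, further improving optimization efficiency.
Keywords: optimal reservoir operation; reinforcement learning; Q-learning; penalty function; feasible direction method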
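The n-step return mentioned in this abstract can be sketched generically as follows. This is the textbook n-step target, not the paper's reservoir model; the function names and the way the bootstrap value is passed in are assumptions for the example.

```python
def n_step_target(rewards, bootstrap_value, gamma):
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.

    rewards holds the n rewards observed along the trajectory; bootstrap_value
    is the estimate max_a Q(s_n, a) at the state reached after n steps.
    """
    target = bootstrap_value
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


def n_step_q_update(q_value, rewards, bootstrap_value, alpha, gamma):
    """Move the current Q-value toward the n-step target with learning rate alpha."""
    return q_value + alpha * (n_step_target(rewards, bootstrap_value, gamma) - q_value)
```

With a single reward in the list this reduces to the one-step Q-learning target; a larger n propagates reward information back faster at the cost of higher variance, which is why n and the learning rate are tuned together.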
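Application of an Improved Q-Learning Algorithm in Path Planning (cited by: 17)
9
Authors: 高乐, 马天录, 刘凯, 张宇轩. Source: 《吉林大学学报(信息科学版)》 (Journal of Jilin University (Information Science Edition)) (CAS), 2018, No. 4, pp. 439-443 (5 pages).
To address the low running efficiency and slow learning of the Q-Learning algorithm in discrete state spaces, an improved Q-Learning algorithm is proposed. It adds one more learning layer to the original algorithm, so that the environment is learned in greater depth. Simulation experiments were carried out in a grid environment, and the algorithm was successfully applied to path planning of a mobile robot in a multi-obstacle environment; the results demonstrate its feasibility. The improved Q-Learning algorithm converges faster, needs markedly fewer learning episodes, and improves efficiency by up to 20%. The algorithm framework is also fairly general for solving similar problems.
Keywords: path planning; improved Q-learning algorithm; reinforcement learning; grid method; robot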
Enhancing cut selection through reinforcement learning (cited by: 1)
10
Authors: Shengchao Wang, Liang Chen, Lingfeng Niu, Yu-Hong Dai. Source: Science China Mathematics (SCIE, CSCD), 2024, No. 6, pp. 1377-1394 (18 pages).
With the rapid development of artificial intelligence in recent years, applying various learning techniques to solve mixed-integer linear programming (MILP) problems has emerged as a burgeoning research domain. Apart from constructing end-to-end models directly, integrating learning approaches with some modules in the traditional methods for solving MILPs is also a promising direction. The cutting plane method is one of the fundamental algorithms used in modern MILP solvers, and the selection of appropriate cuts from the candidate cut subset is crucial for enhancing efficiency. Because they rely on expert knowledge and problem-specific heuristics, classical cut selection methods are not always transferable and often limit the scalability and generalizability of the cutting plane method. To provide a more efficient and generalizable strategy, we propose a reinforcement learning (RL) framework to enhance cut selection in the solving process of MILPs. Firstly, we design feature vectors to incorporate the inherent properties of MILP and computational information from the solver, and represent MILP instances as bipartite graphs. Secondly, we choose weighted metrics to approximate the proximity of feasible solutions to the convex hull and use the learning method to determine the weights assigned to each metric. Thirdly, a graph convolutional neural network with a self-attention mechanism is adopted to predict the values of the weighting factors. Finally, we transform the cut selection process into a Markov decision process and use an RL method to train the model. Extensive experiments are conducted with SCIP, a leading open-source MILP solver. Results on both general and specific datasets validate the effectiveness and efficiency of our proposed approach.
Keywords: reinforcement learning; mixed-integer linear programming; cutting plane method; cut selection
Relevant experience learning: A deep reinforcement learning method for UAV autonomous motion planning in complex unknown environments (cited by: 17)
11
Authors: Zijian HU, Xiaoguang GAO, Kaifang WAN, Yiwei ZHAI, Qianglong WANG. Source: Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2021, No. 12, pp. 187-204 (18 pages).
Unmanned Aerial Vehicles (UAVs) play a vital role in military warfare. In a variety of battlefield mission scenarios, UAVs are required to fly safely to designated locations without human intervention. Therefore, finding a suitable method to solve the UAV Autonomous Motion Planning (AMP) problem can improve the success rate of UAV missions to a certain extent. In recent years, many studies have used Deep Reinforcement Learning (DRL) methods to address the AMP problem and have achieved good results. From the perspective of sampling, this paper designs a sampling method with double screening, combines it with the Deep Deterministic Policy Gradient (DDPG) algorithm, and proposes the Relevant Experience Learning-DDPG (REL-DDPG) algorithm. The REL-DDPG algorithm uses a Prioritized Experience Replay (PER) mechanism to break the correlation of consecutive experiences in the experience pool, finds the experiences most similar to the current state to learn from, following a theory from human education, and expands the influence of the learning process on action selection at the current state. All experiments are carried out in a complex unknown simulation environment constructed from the parameters of a real UAV. The training experiments show that REL-DDPG improves the convergence speed and the convergence result compared to the state-of-the-art DDPG algorithm, while the testing experiments show the applicability of the algorithm and investigate its performance under different parameter conditions.
Keywords: Autonomous Motion Planning (AMP); Deep Deterministic Policy Gradient (DDPG); Deep Reinforcement Learning (DRL); sampling method; UAV
Real-time control for fuel-optimal Moon landing based on an interactive deep reinforcement learning algorithm (cited by: 9)
12
Authors: Lin Cheng, Zhenbo Wang, Fanghua Jiang. Source: Astrodynamics (CSCD), 2019, No. 4, pp. 375-386 (12 pages).
In this study, a real-time optimal control approach is proposed using an interactive deep reinforcement learning algorithm for the fuel-optimal Moon landing problem. Considering the remote communication restrictions and environmental uncertainties, advanced landing control techniques are needed to meet the high requirements of real-time performance and autonomy in Moon landing missions. Deep reinforcement learning (DRL) algorithms have recently been developed for real-time optimal control but suffer from slow convergence and difficult reward function design. To address these problems, a DRL algorithm is developed using an actor-indirect method architecture to achieve optimal control of the Moon landing mission. In this DRL algorithm, an indirect method is employed to generate the optimal control actions for deep neural network (DNN) learning, while the trained DNNs provide good initial guesses for the indirect method to make training data generation more efficient. Through sufficient learning of the state-action relationship, the trained DNNs can approximate the optimal actions and steer the spacecraft to the target in real time. Additionally, a nonlinear feedback controller is developed to improve the terminal landing accuracy. Numerical simulations are given to verify the effectiveness of the proposed DRL algorithm and demonstrate the performance of the developed optimal landing controller.
Keywords: fuel-optimal landing problem; indirect methods; deep reinforcement learning; interactive network learning; real-time optimal control
Rich-text document styling restoration via reinforcement learning (cited by: 1)
13
Authors: Hongwei LI, Yingpeng HU, Yixuan CAO, Ganbin ZHOU, Ping LUO. Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2021, No. 4, pp. 93-103 (11 pages).
Richly formatted documents, such as financial disclosures, scientific articles, and government regulations, are widely available on the Web. However, since most of these documents are only meant for public reading, the styling information inside them is usually missing, which makes them awkward or even burdensome to display and edit in different formats and on different platforms. In this study we formulate the task of document styling restoration as an optimization problem, which aims to identify the styling settings of the document elements (e.g., lines, table cells, text) so that rendering with the output styling settings yields a document in which each element holds (nearly) the same position as in the original document. Considering that each styling setting is a decision, this problem can be transformed into a multi-step decision-making task over all the document elements and then solved by reinforcement learning. Specifically, Monte-Carlo Tree Search (MCTS) is leveraged to explore the different styling settings, and the policy function is learnt under the supervision of the delayed rewards. As a case study, we restore the styling information inside tables, where structural and functional data in documents are usually presented. Experiments show that our best reinforcement method successfully restores the stylings in 87.65% of the tables, a 25.75% absolute improvement over the greedy method. We also discuss the tradeoff between inference time and restoration success rate, and argue that although the reinforcement methods cannot be used in real-time scenarios, they are suitable for offline tasks with high quality requirements. Finally, this model has been applied in a PDF parser to support cross-format display.
Keywords: styling restoration; Monte-Carlo tree search; reinforcement learning; richly formatted documents; tables
A survey of inverse reinforcement learning techniques (cited by: 1)
14
Authors: Shao Zhifei, Er Meng Joo. Source: International Journal of Intelligent Computing and Cybernetics (EI), 2012, No. 3, pp. 293-311 (19 pages).
Purpose: The purpose of this paper is to provide an overview of the theoretical background and applications of inverse reinforcement learning (IRL). Design/methodology/approach: Reinforcement learning (RL) techniques provide a powerful solution for sequential decision making problems under uncertainty. RL uses an agent equipped with a reward function to find a policy through interactions with a dynamic environment. However, one major assumption of existing RL algorithms is that the reward function, the most succinct representation of the designer's intention, needs to be provided beforehand. In practice, the reward function can be very hard to specify and exhausting to tune for large and complex problems, and this inspires the development of IRL, an extension of RL, which directly tackles this problem by learning the reward function through expert demonstrations. In this paper, the original IRL algorithms and their close variants, as well as their recent advances, are reviewed and compared. Findings: This paper can serve as an introductory guide to the fundamental theory and developments, as well as the applications, of IRL. Originality/value: This paper surveys the theories and applications of IRL, which is the latest development of RL and has not been surveyed so far.
Keywords: inverse reinforcement learning; reward function; reinforcement learning; artificial intelligence; learning methods
Embedding expert demonstrations into clustering buffer for effective deep reinforcement learning
15
Authors: Shihmin WANG, Binqi ZHAO, Zhengfeng ZHANG, Junping ZHANG, Jian PU. Source: Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2023, No. 11, pp. 1541-1556 (16 pages).
As one of the most fundamental topics in reinforcement learning (RL), sample efficiency is essential to the deployment of deep RL algorithms. Unlike most existing exploration methods, which sample an action from different types of posterior distributions, we focus on the policy sampling process and propose an efficient selective sampling approach that improves sample efficiency by modeling the internal hierarchy of the environment. Specifically, we first employ clustering methods in the policy sampling process to generate an action candidate set. Then we introduce a clustering buffer for modeling the internal hierarchy, which consists of on-policy data, off-policy data, and expert data, to evaluate actions from the clusters in the action candidate set in the exploration stage. In this way, our approach is able to take advantage of the supervision information in the expert demonstration data. Experiments on six different continuous locomotion environments demonstrate superior reinforcement learning performance and faster convergence of selective sampling. In particular, on the LGSVL task, our method can reduce the number of convergence steps by 46.7% and the convergence time by 28.5%. Furthermore, our code is open-source for reproducibility. The code is available at https://github.com/Shihwin/SelectiveSampling.
Keywords: reinforcement learning; sample efficiency; sampling process; clustering methods; autonomous driving
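Model-Data Jointly Driven Volt/Var Control for Distribution Networks with Photovoltaic Clusters (cited by: 2)
16
Authors: 路小俊, 吴在军, 李培帅, 沈嘉伟, 胡敏强. Source: 《电力系统自动化》 (Automation of Electric Power Systems) (EI, CSCD, PKU Core), 2024, No. 9, pp. 97-106 (10 pages).
Traditional volt/var control (VVC) methods for distribution networks struggle to achieve both global optimality of control decisions and real-time responsiveness, and the dispersed, high-penetration grid connection of distributed photovoltaics (DPV) makes this conflict increasingly prominent. Combining the search capability of model-based optimization with the online decision efficiency of deep reinforcement learning, a model-data jointly driven VVC strategy for distribution networks with photovoltaic (PV) clusters is proposed. First, considering the operating characteristics of day-ahead optimal scheduling and intraday real-time control, and using a partition of the DPV into clusters, a distributed two-stage VVC framework for distribution networks is constructed. Then, with minimum network loss as the objective, a distributed day-ahead VVC model is established and a distributed solution algorithm based on Nesterov accelerated gradient is proposed. Next, taking the day-ahead decisions as inputs, a real-time VVC model based on a partially observable Markov game is established, and an improved multi-agent deep deterministic policy gradient algorithm based on an iteration-termination penalty function is proposed. Finally, case studies on the MATLAB/PyCharm software platform verify the global near-optimality and real-time responsiveness of the proposed method, which improves the economy and security of distribution networks with high PV penetration.
Keywords: distribution network; photovoltaic cluster; volt/var control; accelerated alternating direction method of multipliers; deep reinforcement learning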
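Deep Reinforcement Learning Based on the Quasi-Newton Method for Edge Computing in the Internet of Vehicles (cited by: 1)
17
Authors: 章坚武, 芦泽韬, 章谦骅, 詹明. Source: 《通信学报》 (Journal on Communications) (EI, CSCD, PKU Core), 2024, No. 5, pp. 90-100 (11 pages).
To address suboptimal task offloading decisions in the Internet of Vehicles (IoV) caused by multiple tasks and resource constraints, a two-stage online offloading algorithm using deep reinforcement learning with the quasi-Newton method (QNRLO) is proposed. The algorithm first introduces batch normalization to optimize the training of the deep neural network, and then applies the quasi-Newton method for optimization to approach the optimal solution effectively. Through this two-stage optimization, the algorithm markedly improves performance under multi-task and dynamic wireless channel conditions and raises computational efficiency. By introducing Lagrange multipliers and a reconstructed dual function, the non-convex optimization problem is transformed into a convex optimization problem of the dual function, ensuring the global optimality of the algorithm. In addition, the algorithm takes the allocation of system transmission time in the IoV model into account, which makes the model more practical. Compared with existing algorithms, the proposed algorithm markedly improves the convergence and stability of task offloading, handles task offloading in the IoV effectively, and offers high practicality and reliability.
Keywords: Internet of Vehicles; task offloading; deep reinforcement learning; quasi-Newton method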
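Robot Manipulation Skill Learning for Sparse Rewards
18
Authors: 吴培良, 张彦, 毛秉毅, 陈雯柏, 高国伟. Source: 《控制理论与应用》 (Control Theory & Applications) (EI, CAS, CSCD, PKU Core), 2024, No. 1, pp. 99-108 (10 pages).
Robot manipulation skill learning based on deep reinforcement learning has become a research focus, but learning efficiency is low because the tasks have sparse rewards. This paper proposes a meta-learning-based, dual-experience-pool, adaptive soft-update hindsight experience replay method and applies it to robot manipulation skill learning with sparse rewards. First, building on the soft-update hindsight experience replay algorithm, a simplified value function that improves algorithm efficiency is derived, and an adaptive temperature adjustment strategy is added that dynamically adjusts the temperature parameter to suit different task environments. Second, drawing on meta-learning, the experience replay is partitioned and the ratio between real sampled data and constructed virtual data is adjusted dynamically during training, yielding the DAS-HER method. Then, DAS-HER is applied to robot manipulation skill learning to build a general learning framework for sparse-reward environments. Finally, comparative experiments on eight tasks in the Fetch and Hand environments in MuJoCo show that the proposed algorithm outperforms the other algorithms in both training efficiency and success rate.
Keywords: robot manipulation skill learning; reinforcement learning; sparse reward; maximum entropy method; adaptive temperature parameter; meta-learning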
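Policy Search Reinforcement Learning Method in Latent Space
19
Authors: 赵婷婷, 王莹, 孙威, 陈亚瑞, 王嫄, 杨巨成. Source: 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology) (CSCD, PKU Core), 2024, No. 4, pp. 1032-1046 (15 pages).
Policy search is an efficient learning approach in deep reinforcement learning that can handle large-scale continuous state and action spaces, and it is widely applied to real-world problems. However, such methods usually require large numbers of learning samples and long training time, and they generalize poorly: the learned policy model struggles to generalize to seemingly small changes in the environment. To address these problems, a policy search reinforcement learning method based on latent space is proposed. The idea of learning state representations is extended to action representations: the policy is learned in the latent space of action representations, and the action representations are then mapped to the real action space. By introducing representation learning models, end-to-end training is abandoned and the reinforcement learning task is split into a large-scale representation model part and a small-scale policy model part. The representation models are learned with unsupervised methods, and the small-scale policy model is learned with policy search reinforcement learning. The large-scale representation models retain the needed generalization and expressive power, while the small-scale policy model lightens the burden of policy learning, which to some extent alleviates low sample utilization, low learning efficiency, and weak generalization of action selection in deep reinforcement learning. Finally, the effectiveness of introducing latent state and action representations is verified on the intelligent control tasks CarRacing and Cheetah.
Keywords: model-free reinforcement learning; policy model; state representation; action representation; continuous action space; policy search reinforcement learning method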
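Reliability-Based Protection Method for Satellite Service Function Chains (cited by: 1)
20
Authors: 王忠, 吴炀, 宋化宇, 李飞龙. Source: 《指挥控制与仿真》 (Command Control & Simulation), 2024, No. 1, pp. 147-153 (7 pages).
The reliable deployment of service function chains (SFCs) is studied on a hybrid satellite network based on the SDN architecture. First, the SFC reliability protection problem is described and models of the substrate network and of SFC requests are established. Then a reliability requirement model for network service functions and one for low-Earth-orbit (LEO) satellite links are established, and the optimization objective and constraints are specified. Next, a reliability-based protection method for satellite SFCs is proposed, comprising a reliability protection algorithm based on deep reinforcement learning and a reliability backup algorithm for LEO satellite nodes and links. Experiments show that the proposed method raises the SFC request acceptance rate and reduces the average delay on the SDN-based hybrid satellite network, and that it maintains a high request acceptance rate under different SFC reliability requirements.
Keywords: satellite network; reliability protection; service function chain; reliability protection algorithm; deep reinforcement learning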