Journal articles: 275 results found
1. Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications (Cited by: 4)
Authors: Ding Wang, Ning Gao, Derong Liu, Jinna Li, Frank L. Lewis — 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD, 2024, No. 1, pp. 18-36 (19 pages)
Reinforcement learning (RL) has roots in dynamic programming and it is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are presented, where the main results for discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environments is discussed, where event-based design, robust stabilization, and game design are reviewed. Moreover, extensions of ADP for addressing control problems under complex environments have attracted enormous attention. The ADP architecture is revisited from the perspective of data-driven and RL frameworks, showing how they significantly advance the ADP formulation. Finally, several typical control applications of RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, this comprehensive survey of ADP and RL for advanced control applications demonstrates their remarkable potential in the artificial intelligence era, as well as their vital role in promoting environmental protection and industrial intelligence.
Keywords: adaptive dynamic programming (ADP); advanced control; complex environment; data-driven control; event-triggered design; intelligent control; neural networks; nonlinear systems; optimal control; reinforcement learning (RL)
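As a concrete illustration of the adaptive-critic ideas surveyed above, the sketch below runs a value-iteration (HDP-style) critic update on a toy scalar linear-quadratic regulation problem; the dynamics, cost weights, and quadratic critic structure are illustrative assumptions, not material from the paper.

```python
import numpy as np

# Toy regulation problem: x_{k+1} = a*x_k + b*u_k, stage cost = q*x^2 + r*u^2.
a, b, q, r, gamma = 0.9, 0.5, 1.0, 1.0, 1.0

# Critic: V(x) = p * x^2 (the quadratic structure is exact for this LQR case).
p, k = 0.0, 0.0
for _ in range(200):
    # Greedy policy w.r.t. the current critic: u = -k*x minimizes
    # q*x^2 + r*u^2 + gamma*p*(a*x + b*u)^2 in closed form.
    k = gamma * p * a * b / (r + gamma * p * b * b)
    # Value-iteration (HDP-style) update of the critic parameter.
    p = q + r * k * k + gamma * p * (a - b * k) ** 2

print("converged critic parameter p =", p, "feedback gain k =", k)
```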
2. Combining reinforcement learning with mathematical programming: An approach for optimal design of heat exchanger networks
Authors: Hui Tan, Xiaodong Hong, Zuwei Liao, Jingyuan Sun, Yao Yang, Jingdai Wang, Yongrong Yang — 《Chinese Journal of Chemical Engineering》 SCIE EI CAS CSCD, 2024, No. 5, pp. 63-71 (9 pages)
Heat integration is important for energy saving in the process industry. It is linked to the persistently challenging task of optimal design of heat exchanger networks (HEN). Due to the inherently nonconvex, nonlinear, and combinatorial nature of the HEN problem, it is not easy to find high-quality solutions for large-scale problems. The reinforcement learning (RL) method, which learns strategies through ongoing exploration and exploitation, shows advantages in this area. However, due to the complexity of the HEN design problem, the RL method must be tailored to it. A hybrid strategy combining RL with mathematical programming is proposed to take better advantage of both methods. An insightful state representation of the HEN structure as well as a customized reward function is introduced. A Q-learning algorithm is applied to update the HEN structure using the ε-greedy strategy. Better results are obtained on three literature cases of different scales.
Keywords: heat exchanger network; reinforcement learning; mathematical programming; process design
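The abstract above applies Q-learning with an ε-greedy strategy to update the HEN structure. A minimal tabular sketch of that update rule is given below; the environment interface (reset/step/legal_actions) and the HEN-specific state encoding and reward are placeholders, not the paper's formulation.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning with an epsilon-greedy policy. The `env` object is an
# assumed interface; only the update rule itself is shown.
def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    Q = defaultdict(float)                         # Q[(state, action)] -> value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            actions = env.legal_actions(state)
            if random.random() < eps:              # explore
                action = random.choice(actions)
            else:                                  # exploit
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(
                Q[(next_state, a)] for a in env.legal_actions(next_state))
            # Temporal-difference update toward the one-step bootstrap target.
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```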
3. Impact of chronic disease self-management programs on type 2 diabetes management in primary care (Cited by: 6)
Authors: Samuel N Forjuoh, Marcia G Ory, Luohua Jiang, Ann M Vuong, Jane N Bolin — 《World Journal of Diabetes》 SCIE CAS, 2014, No. 3, pp. 407-414 (8 pages)
AIM: To assess the effectiveness of the Chronic Disease Self-Management Program (CDSMP) on glycated hemoglobin A1c (HbA1c) and selected self-reported measures. METHODS: We compared patients who received a diabetes self-care behavioral intervention, the CDSMP developed at Stanford University, with controls who received usual care, on their HbA1c and selected self-reported measures, including diabetes self-care activities, health-related quality of life (HRQOL), pain, and fatigue. The subjects were a subset of participants enrolled in a randomized controlled trial that took place at seven regional clinics of a university-affiliated integrated healthcare system of a multi-specialty group practice between January 2009 and June 2011. The primary outcome was change in HbA1c from randomization to 12 mo. Data were analyzed using multilevel statistical models and linear mixed models to provide unbiased estimates of intervention effects. RESULTS: Demographic and baseline clinical characteristics were generally comparable between the two groups. The average baseline HbA1c values in the CDSMP and control groups were 9.4% and 9.2%, respectively. Significant reductions in HbA1c were seen at 12 mo for the two groups, with adjusted changes around 0.6% (P < 0.0001), but the reductions did not differ significantly between the two groups (P = 0.885). Few significant differences were observed in participants' diabetes self-care activities. No significant differences were observed in the participants' HRQOL, pain, or fatigue measures. CONCLUSION: The CDSMP intervention may not lower HbA1c any better than good routine care in an integrated healthcare system. More research is needed to understand the benefits of self-management programs in primary care in different settings and populations.
Keywords: Type 2 diabetes; Self-management; Chronic disease self-management program; Glycemic control; Glycated hemoglobin
4. Call for papers — Journal of Control Theory and Applications special issue on approximate dynamic programming and reinforcement learning
《控制理论与应用(英文版)》 EI, 2010, No. 2, p. 257 (1 page)
Approximate dynamic programming (ADP) is a general and effective approach for solving optimal control and estimation problems by adapting to uncertain and nonconvex environments over time.
Keywords: call for papers; Journal of Control Theory and Applications; special issue; approximate dynamic programming; reinforcement learning
5. Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations (Cited by: 15)
Author: Dimitri P. Bertsekas — 《IEEE/CAA Journal of Automatica Sinica》 EI CSCD, 2019, No. 1, pp. 1-31 (31 pages)
In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
Keywords: reinforcement learning; dynamic programming; Markovian decision problems; aggregation; feature-based architectures; policy iteration; deep neural networks; rollout algorithms
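A minimal sketch of the hard-aggregation idea described above: base states are grouped by a feature map, the smaller aggregate MDP is solved by value iteration, and the aggregate values are lifted back to the base states. The toy transition matrices, costs, and uniform disaggregation probabilities are assumptions for illustration.

```python
import numpy as np

# Hard aggregation: group base states by the feature map phi, solve the aggregate
# problem by value iteration, and lift the result back to the base state space.
def aggregate_value_iteration(P, g, phi, n_agg, gamma=0.95, iters=500):
    # P[a]: (n, n) transition matrix for action a; g[a]: (n,) stage cost; phi: (n,) group index.
    n, n_actions = P[0].shape[0], len(P)
    D = np.zeros((n_agg, n))                    # disaggregation: uniform over each group
    for j in range(n_agg):
        members = np.flatnonzero(phi == j)
        D[j, members] = 1.0 / len(members)
    Phi = np.zeros((n, n_agg))                  # aggregation: group membership indicator
    Phi[np.arange(n), phi] = 1.0
    V_agg = np.zeros(n_agg)
    for _ in range(iters):
        # Bellman backup on the aggregate problem (minimization over actions).
        Q = np.stack([D @ (g[a] + gamma * P[a] @ (Phi @ V_agg)) for a in range(n_actions)])
        V_agg = Q.min(axis=0)
    return Phi @ V_agg                          # lifted approximation of the base cost function

# Tiny usage example: 4 base states, 2 actions, 2 aggregate states.
rng = np.random.default_rng(0)
P = [rng.dirichlet(np.ones(4), size=4) for _ in range(2)]
g = [rng.random(4) for _ in range(2)]
print(aggregate_value_iteration(P, g, phi=np.array([0, 0, 1, 1]), n_agg=2))
```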
6. Multiagent Reinforcement Learning: Rollout and Policy Iteration (Cited by: 3)
Author: Dimitri Bertsekas — 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD, 2021, No. 2, pp. 249-272 (24 pages)
We discuss the solution of complex multistage decision problems using methods that are based on the idea of policy iteration (PI), i.e., start from some base policy and generate an improved policy. Rollout is the simplest method of this type, where just one improved policy is generated. We can view PI as repeated application of rollout, where the rollout policy at each iteration serves as the base policy for the next iteration. In contrast with PI, rollout has a robustness property: it can be applied on-line and is suitable for on-line replanning. Moreover, rollout can use as base policy one of the policies produced by PI, thereby improving on that policy. This is the type of scheme underlying the prominently successful AlphaZero chess program. In this paper we focus on rollout and PI-like methods for problems where the control consists of multiple components, each selected (conceptually) by a separate agent. This is the class of multiagent problems where the agents have a shared objective function, and shared and perfect state information. Based on a problem reformulation that trades off control space complexity with state space complexity, we develop an approach whereby, at every stage, the agents sequentially (one at a time) execute a local rollout algorithm that uses a base policy, together with some coordinating information from the other agents. The amount of total computation required at every stage grows linearly with the number of agents. By contrast, in the standard rollout algorithm, the amount of total computation grows exponentially with the number of agents. Despite the dramatic reduction in required computation, we show that our multiagent rollout algorithm has the fundamental cost improvement property of standard rollout: it guarantees improved performance relative to the base policy. We also discuss autonomous multiagent rollout schemes that allow the agents to make decisions autonomously through the use of precomputed signaling information, which is sufficient to maintain the cost improvement property without any on-line coordination of control selection between the agents. For discounted and other infinite horizon problems, we also consider exact and approximate PI algorithms involving a new type of one-agent-at-a-time policy improvement operation. For one of our PI algorithms, we prove convergence to an agent-by-agent optimal policy, thus establishing a connection with the theory of teams. For another PI algorithm, which is executed over a more complex state space, we prove convergence to an optimal policy. Approximate forms of these algorithms are also given, based on the use of policy and value neural networks. These PI algorithms, in both their exact and their approximate form, are strictly off-line methods, but they can be used to provide a base policy for use in an on-line multiagent rollout scheme.
Keywords: dynamic programming; multiagent problems; neuro-dynamic programming; policy iteration; reinforcement learning; rollout
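The core of the one-agent-at-a-time rollout described above can be sketched as follows; the model interface (actions, step, stage_cost, base_policy, simulate_cost) is an assumed stub, and the cost-to-go simulation under the base policy is left abstract.

```python
# One-agent-at-a-time rollout: at the current state, each agent in turn picks its
# own control component by one-step lookahead, holding already-chosen components
# fixed and letting the base policy fill in the remaining agents. Total work per
# stage grows linearly (not exponentially) with the number of agents.
def multiagent_rollout_action(model, state, n_agents, horizon=20):
    chosen = []
    for i in range(n_agents):
        best_u, best_cost = None, float("inf")
        for u_i in model.actions(state, agent=i):
            # Complete the joint action: fixed choices, candidate u_i, base policy for the rest.
            tail = [model.base_policy(state, j) for j in range(i + 1, n_agents)]
            joint = tuple(chosen) + (u_i,) + tuple(tail)
            next_state = model.step(state, joint)
            cost = model.stage_cost(state, joint) + model.simulate_cost(
                next_state, horizon)            # cost-to-go estimated under the base policy
            if cost < best_cost:
                best_u, best_cost = u_i, cost
        chosen.append(best_u)
    return tuple(chosen)
```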
7. Robotic Knee Tracking Control to Mimic the Intact Human Knee Profile Based on Actor-Critic Reinforcement Learning (Cited by: 2)
Authors: Ruofan Wu, Zhikai Yao, Jennie Si, He (Helen) Huang — 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD, 2022, No. 1, pp. 19-30 (12 pages)
We address a state-of-the-art reinforcement learning (RL) control approach to automatically configure robotic prosthesis impedance parameters to enable end-to-end, continuous locomotion intended for transfemoral amputee subjects. Specifically, our actor-critic based RL provides tracking control of a robotic knee prosthesis to mimic the intact knee profile. This is a significant advance from our previous RL-based automatic tuning of prosthesis control parameters, which centered on regulation control with a designer-prescribed robotic knee profile as the target. In addition to presenting the tracking control algorithm based on direct heuristic dynamic programming (dHDP), we provide a control performance guarantee including the case of constrained inputs. We show that our proposed tracking control possesses several important properties, such as weight convergence of the learning networks, Bellman (sub)optimality of the cost-to-go value function and control input, and practical stability of the human-robot system. We further provide a systematic simulation of the proposed tracking control using a realistic human-robot system simulator, OpenSim, to emulate how the dHDP enables level-ground walking, walking on different terrains and at different paces. These results show that our proposed dHDP-based tracking control is not only theoretically suitable, but also practically useful.
Keywords: automatic tracking of intact knee; configuration of robotic knee prosthesis; direct heuristic dynamic programming (dHDP); reinforcement learning control
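For orientation, here is a generic actor-critic temporal-difference update in the spirit of dHDP, where the critic learns the cost-to-go and the actor is improved through the critic; the network sizes, step sizes, and cost signal are assumptions, and the paper's specific dHDP formulation and stability machinery are not reproduced.

```python
import torch
import torch.nn as nn

# Generic actor-critic TD update: critic approximates the cost-to-go of (state, action),
# actor is pushed toward actions with lower predicted cost-to-go.
state_dim, action_dim, gamma = 8, 2, 0.95
actor = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(), nn.Linear(32, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 32), nn.Tanh(), nn.Linear(32, 1))
opt_a = torch.optim.SGD(actor.parameters(), lr=1e-3)
opt_c = torch.optim.SGD(critic.parameters(), lr=1e-2)

def update(s, a, cost, s_next):
    s, a, s_next = (torch.as_tensor(x, dtype=torch.float32) for x in (s, a, s_next))
    # Critic: minimize the squared TD error of the cost-to-go estimate.
    with torch.no_grad():
        target = cost + gamma * critic(torch.cat([s_next, actor(s_next)]))
    td_err = critic(torch.cat([s, a])) - target
    opt_c.zero_grad()
    (0.5 * td_err.pow(2)).mean().backward()
    opt_c.step()
    # Actor: descend the critic's predicted cost-to-go at the current state.
    actor_loss = critic(torch.cat([s, actor(s)])).mean()
    opt_a.zero_grad()
    actor_loss.backward()
    opt_a.step()
```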
8. Optimal pivot path of the simplex method for linear programming based on reinforcement learning (Cited by: 1)
Authors: Anqi Li, Tiande Guo, Congying Han, Bonan Li, Haoran Li — 《Science China Mathematics》 SCIE CSCD, 2024, No. 6, pp. 1263-1286 (24 pages)
Based on the existing pivot rules, the simplex method for linear programming is not polynomial in the worst case. Therefore, the optimal pivot of the simplex method is crucial. In this paper, we propose the optimal rule to find all the shortest pivot paths of the simplex method for linear programming problems based on Monte Carlo tree search. Specifically, we first propose the SimplexPseudoTree to transfer the simplex method into tree search mode while avoiding repeated basis variables. Secondly, we propose four reinforcement learning models with two actions and two rewards to make the Monte Carlo tree search suitable for the simplex method. Thirdly, we set a new action selection criterion to ameliorate the inaccurate evaluation in the initial exploration. It is proved that, when the number of vertices in the feasible region is C_n^m, our method can generate all the shortest pivot paths, which is polynomial in the number of variables. In addition, we experimentally validate that the proposed schedule can avoid unnecessary search and provide the optimal pivot path. Furthermore, this method can provide the best pivot labels for all kinds of supervised learning methods to solve linear programming problems.
Keywords: simplex method; linear programming; pivot rules; reinforcement learning
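Monte Carlo tree search, which the abstract builds on, repeatedly selects children by an upper-confidence rule. A minimal UCB1 selection step is sketched below; the node structure is assumed, and the paper's SimplexPseudoTree and modified selection criterion are not reproduced.

```python
import math

# Minimal UCT (UCB1) child selection used inside Monte Carlo tree search.
# Node fields (children, visits, total_reward) are assumed for illustration.
def select_child(node, c=1.4):
    def ucb(child):
        if child.visits == 0:
            return float("inf")              # always try unvisited children first
        exploit = child.total_reward / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children, key=ucb)
```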
9. PDP: Parallel Dynamic Programming (Cited by: 15)
Authors: Fei-Yue Wang, Jie Zhang, Qinglai Wei, Xinhu Zheng, Li Li — 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD, 2017, No. 1, pp. 1-5 (5 pages)
Deep reinforcement learning is a research focus in artificial intelligence. The principle of optimality in dynamic programming is key to the success of reinforcement learning methods. The principle of adaptive dynamic programming (ADP) is first presented instead of direct dynamic programming (DP), and the inherent relationship between ADP and deep reinforcement learning is developed. Next, analytics intelligence, as the necessary requirement for real reinforcement learning, is discussed. Finally, the principle of parallel dynamic programming, which integrates dynamic programming and analytics intelligence, is presented as the future of computational intelligence.
Keywords: artificial intelligence; neural networks; reinforcement learning
10. Two-timescale coordinated optimal voltage control of active distribution networks combining data-driven and physics-based models (Cited by: 3)
Authors: 张剑, 崔明建, 何怡刚 — 《电工技术学报》 EI CSCD 北大核心, 2024, No. 5, pp. 1327-1339 (13 pages)
The integration of high shares of electric vehicles, distributed wind power, and photovoltaics into distribution networks causes frequent and severe voltage fluctuations. Conventional voltage-regulation devices and inverters act on vastly different timescales, and coordinating them is a difficult problem. Combining data-driven and physics-based modeling, this paper proposes a two-timescale coordinated voltage optimization control strategy for distribution networks. For short-timescale (minute-level) voltage fluctuations, a quadratic programming model is built for both balanced and unbalanced networks, taking the reactive power of static var compensators and distributed generators as decision variables, minimizing the squared voltage deviation as the objective, and accounting for physical constraints based on the branch power flow equations. For long-timescale (hour-level) voltage fluctuations, a Markov decision process is constructed in which the actions are the voltage regulator turns ratio, the tap positions of switchable capacitors and reactors, and the charge/discharge power of energy storage systems; the state is the nodal power of the network in the current period; and the cost is the squared nodal voltage deviation. To overcome the curse of dimensionality of the mixed continuous-discrete action space, a deep deterministic policy gradient reinforcement learning algorithm based on a relaxation-prediction-correction scheme is proposed. Finally, the effectiveness of the proposed method is verified on the balanced IEEE 33-bus and unbalanced IEEE 123-bus distribution networks.
Keywords: smart distribution network; voltage control; deep reinforcement learning; quadratic programming; two timescales
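The short-timescale stage described above is a quadratic program. A minimal sketch using a linearized voltage-sensitivity model (an assumption standing in for the branch-flow formulation in the paper) is shown below with the cvxpy modeling library and random toy data.

```python
import numpy as np
import cvxpy as cp

# Short-timescale reactive-power dispatch as a QP: minimize squared voltage
# deviation subject to device limits, with an assumed linear model v = v0 + X q.
rng = np.random.default_rng(0)
n = 6                                    # number of controllable buses
X = rng.uniform(0.01, 0.05, (n, n))      # assumed sensitivity of voltage to reactive injections
X = (X + X.T) / 2
v0 = 1.0 + rng.uniform(-0.05, 0.05, n)   # voltages before control (p.u.)
q_max = 0.3                              # device capability limit (p.u.)

q = cp.Variable(n)                       # reactive power of SVCs / distributed generators
v = v0 + X @ q
problem = cp.Problem(cp.Minimize(cp.sum_squares(v - 1.0)), [cp.abs(q) <= q_max])
problem.solve()
print("optimal reactive injections:", np.round(q.value, 3))
```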
11. Adaptive optimal containment control of nonlinear multi-agent systems with time-varying output constraints
Authors: 张天平, 刘涛, 章恩泽 — 《控制理论与应用》 EI CAS CSCD 北大核心, 2024, No. 10, pp. 1899-1912 (14 pages)
This paper proposes an optimal containment control method for uncertain strict-feedback nonlinear multi-agent systems with time-varying output constraints and unmodeled dynamics. A novel integral-type barrier Lyapunov function is used to handle the output constraints, a dynamic signal is used to handle the unmodeled dynamics, and the dynamic surface control method is used to design the feedforward controller. The optimal feedback controller is designed by combining adaptive dynamic programming with integral reinforcement learning, where neural networks approximate the corresponding cost functions online and weight update laws are designed. Theoretical analysis proves that the outputs of all followers converge to the convex hull spanned by the leaders, that the closed-loop system composed of all followers is semi-globally uniformly ultimately bounded, and that the followers' outputs remain within the given constraint sets while the cost functions are minimized. Simulation results verify the effectiveness of the proposed method.
Keywords: adaptive dynamic programming; integral reinforcement learning; optimal control; dynamic surface control; integral barrier Lyapunov function; multi-agent systems
12. Design of a DRL signal-timing simulation system for signalized intersections integrating Vissim-Python and Qt
Authors: 任安虎, 李珊 — 《计算机应用与软件》 北大核心, 2024, No. 11, pp. 53-59, 122 (8 pages)
To address the poor algorithm portability and limited practical applicability of current Vissim-Python co-simulation systems in DRL signal-timing research for signalized intersections, an intersection signal-timing simulation system is designed. To meet practical requirements, the system adopts a DRL timing model that considers both detector data and signal countdowns; based on the model's needs, the necessary Vissim component interfaces are implemented in Python and wrapped as a Gym reinforcement learning environment. To solve the portability problem, the timing algorithm interfaces are standardized on the deep learning framework PyTorch, and a visual operating interface is built with PyQt5 so that algorithm parameters can be adjusted flexibly. Four techniques are used to accelerate the simulation process and improve runtime efficiency. Finally, a simulation test is carried out on the intersection of Ningxi Road and Xingye Road in Zhuhai. The results show that the system runs well and can both evaluate the timing performance of real intersections and serve as a test platform for algorithm researchers.
Keywords: intersection signal timing; deep reinforcement learning; VISSIM simulation; Python program; Qt interface; PyTorch
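The abstract describes wrapping the simulator interfaces as a Gym reinforcement learning environment. A skeleton of such a wrapper is sketched below; the `sim` object and its methods are placeholders for the Vissim component interface, not the actual COM API.

```python
import numpy as np
import gym
from gym import spaces

# Skeleton of wrapping an external traffic simulator as a Gym environment.
class IntersectionTimingEnv(gym.Env):
    def __init__(self, sim, n_detectors=8, n_phases=4):
        super().__init__()
        self.sim = sim
        # Observation: detector occupancies plus the remaining green-countdown fraction.
        self.observation_space = spaces.Box(0.0, 1.0, shape=(n_detectors + 1,), dtype=np.float32)
        # Action: which signal phase to serve next.
        self.action_space = spaces.Discrete(n_phases)

    def reset(self):
        self.sim.restart()                                  # placeholder call
        return self._observe()

    def step(self, action):
        self.sim.set_phase(int(action))                     # placeholder call
        self.sim.advance(seconds=5)                         # placeholder call
        reward = -float(self.sim.total_queue_length())      # placeholder: fewer queued vehicles is better
        return self._observe(), reward, self.sim.is_done(), {}

    def _observe(self):
        det = np.asarray(self.sim.detector_occupancy(), dtype=np.float32)
        return np.append(det, np.float32(self.sim.green_countdown_fraction()))
```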
13. An intelligent high-speed train rescheduling method based on deep reinforcement learning (DDDQN) (Cited by: 1)
Authors: 吴卫, 阴佳腾, 陈照森, 唐涛 — 《铁道科学与工程学报》 EI CAS CSCD 北大核心, 2024, No. 4, pp. 1298-1308 (11 pages)
In the daily operation of high-speed railway systems, trains are frequently delayed by various disturbances, which severely degrades the passenger experience. To produce a train rescheduling plan in a short time and shorten train delays as much as possible, an intelligent train rescheduling method (DDDQN) combining deep reinforcement learning with an integer programming model is proposed. First, the line is divided into connected track sections and, based on the job-shop scheduling problem, an integer programming model describing the train operation process is built with the objective of minimizing the total delay of all trains. Then, each train is treated as an agent; the states, actions, and reward functions of the multi-agent system are defined according to practical operational requirements, and two deep neural networks are constructed to approximate the value function. Finally, a training method for DDDQN is designed in combination with the integer programming model: the agents first explore the simulation environment to find feasible solutions, and the network parameters are updated through a "mutual feedback" mechanism between the two neural networks. On this basis, solving the integer programming model yields the optimal solution in a short time. Simulation experiments using real line and operational data from the Beijing-Zhangjiakou high-speed railway compare the total train delay and solution time of three solution methods under ten disturbance scenarios, verifying that the proposed DDDQN model can obtain the optimal solution in a short time, reducing train delays by up to 30.43% and solution time by up to 68.33%. DDDQN provides an intelligent method and reference for improving the emergency response capability and transport organization efficiency of high-speed railway systems under disturbances.
Keywords: intelligent train rescheduling; train delay time; deep reinforcement learning; integer programming model; neural networks
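As background for the two-network training mentioned above, the following shows a standard double-DQN target computation in PyTorch, where an online network selects the action and a target network evaluates it; the paper's exact DDDQN variant and its coupling with the integer program are not reproduced.

```python
import torch
import torch.nn as nn

# Standard double-DQN update with an online and a target network. Network sizes
# and replay-batch shapes are assumptions for illustration.
def make_q_net(state_dim=16, n_actions=6):
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

online, target = make_q_net(), make_q_net()
target.load_state_dict(online.state_dict())
optimizer = torch.optim.Adam(online.parameters(), lr=1e-3)

def train_step(s, a, r, s_next, done, gamma=0.99):
    # s, s_next: (B, state_dim); a: (B,) long; r, done: (B,) float tensors.
    with torch.no_grad():
        best_a = online(s_next).argmax(dim=1, keepdim=True)     # action chosen by the online net
        q_next = target(s_next).gather(1, best_a).squeeze(1)    # evaluated by the target net
        y = r + gamma * (1.0 - done) * q_next
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.smooth_l1_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```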
14. Enhancing cut selection through reinforcement learning (Cited by: 1)
Authors: Shengchao Wang, Liang Chen, Lingfeng Niu, Yu-Hong Dai — 《Science China Mathematics》 SCIE CSCD, 2024, No. 6, pp. 1377-1394 (18 pages)
With the rapid development of artificial intelligence in recent years, applying various learning techniques to solve mixed-integer linear programming (MILP) problems has emerged as a burgeoning research domain. Apart from constructing end-to-end models directly, integrating learning approaches with some modules of traditional methods for solving MILPs is also a promising direction. The cutting plane method is one of the fundamental algorithms used in modern MILP solvers, and the selection of appropriate cuts from the candidate cut subset is crucial for enhancing efficiency. Due to the reliance on expert knowledge and problem-specific heuristics, classical cut selection methods are not always transferable and often limit the scalability and generalizability of the cutting plane method. To provide a more efficient and generalizable strategy, we propose a reinforcement learning (RL) framework to enhance cut selection in the solving process of MILPs. Firstly, we design feature vectors to incorporate the inherent properties of MILP and computational information from the solver, and represent MILP instances as bipartite graphs. Secondly, we choose weighted metrics to approximate the proximity of feasible solutions to the convex hull and utilize the learning method to determine the weights assigned to each metric. Thirdly, a graph convolutional neural network with a self-attention mechanism is adopted to predict the value of the weighting factors. Finally, we transform the cut selection process into a Markov decision process and utilize an RL method to train the model. Extensive experiments are conducted based on the leading open-source MILP solver SCIP. Results on both general and specific datasets validate the effectiveness and efficiency of our proposed approach.
Keywords: reinforcement learning; mixed-integer linear programming; cutting plane method; cut selection
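A toy version of weighted cut scoring, in the spirit of the learned metric weighting described above: each candidate cut is scored by a fixed combination of common quality measures and the top-k are kept. The metrics and weights here are illustrative stand-ins, not the paper's learned model.

```python
import numpy as np

# Score candidate cuts a^T x <= b by a weighted combination of common cut-quality
# metrics (efficacy, objective parallelism, support) and keep the top-k.
def select_cuts(cuts, x_lp, c_obj, weights=(0.5, 0.3, 0.2), k=5):
    scores = []
    for a, b in cuts:                                   # each cut is (coefficients a, right-hand side b)
        a = np.asarray(a, dtype=float)
        efficacy = (a @ x_lp - b) / (np.linalg.norm(a) + 1e-12)   # how far the cut pushes the LP point
        obj_parallelism = abs(a @ c_obj) / (np.linalg.norm(a) * np.linalg.norm(c_obj) + 1e-12)
        support = np.count_nonzero(a) / a.size                    # density of the cut
        scores.append(weights[0] * efficacy + weights[1] * obj_parallelism - weights[2] * support)
    order = np.argsort(scores)[::-1]
    return [cuts[i] for i in order[:k]]
```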
15. A multi-agent path planning algorithm based on game theory and reinforcement learning (Cited by: 1)
Authors: 熊文博, 郭磊, 焦彤宇 — 《深圳大学学报(理工版)》 CAS CSCD 北大核心, 2024, No. 3, pp. 274-282 (9 pages)
To address the slow speed and low efficiency common to path planning algorithms for multiple agents in the plane, the multi-agent path planning problem is formulated as a non-zero-sum stochastic game, and the multi-agent reinforcement learning algorithm win or learn fast-policy hill-climbing (WoLF-PHC) is used to obtain Nash equilibrium strategies that give each agent a conflict-free optimal path decision. A fast adaptive WoLF-PHC (FA-WoLF-PHC) algorithm is proposed, in which the learning rate is adaptively updated via gradient descent on a constructed objective function. FA-WoLF-PHC is applied to two game scenarios, matching pennies and a custom payoff matrix, and compared with the policy hill-climbing (PHC) and WoLF-PHC algorithms. The results show that FA-WoLF-PHC learns faster than WoLF-PHC and effectively reduces the oscillation that WoLF-PHC and PHC exhibit during learning. In the multi-agent path planning problem, FA-WoLF-PHC improves the learning speed by 16.01% over WoLF-PHC. When the grid map is enlarged to 6×6 and the number of agents is increased to 3, the FA-WoLF-PHC, WoLF-PSP, and physarum polycephalum-artificial potential Sarsa (PP-AP Sarsa) algorithms need on average 16.30, 20.59, and 17.72 s, respectively, over 10 runs to learn the final policy. In the multi-agent path planning problem, FA-WoLF-PHC obtains Nash equilibrium strategies for all agents and learns significantly faster than WoLF-PSP and PP-AP Sarsa. FA-WoLF-PHC quickly reaches Nash strategies in common game scenarios, generates conflict-free optimal paths for multiple agents in the path planning problem, and significantly outperforms the other algorithms in learning speed.
Keywords: artificial intelligence; game theory; dynamic programming; Nash equilibrium strategy; reinforcement learning; multi-agent path planning
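A minimal WoLF-PHC update for one agent in a repeated matrix game is sketched below; the "win or learn fast" behavior comes from using a smaller step when winning. The fast-adaptive learning-rate scheme (FA-WoLF-PHC) proposed in the paper is not reproduced, and the projection back onto the probability simplex is simplified to clipping and renormalization.

```python
import numpy as np

# Minimal WoLF-PHC agent for a single-state (repeated matrix) game.
class WoLFPHC:
    def __init__(self, n_actions, alpha=0.1, delta_win=0.01, delta_lose=0.04, gamma=0.9):
        self.Q = np.zeros(n_actions)
        self.pi = np.full(n_actions, 1.0 / n_actions)     # current policy
        self.pi_avg = self.pi.copy()                      # running average policy
        self.count = 0
        self.alpha, self.dw, self.dl, self.gamma = alpha, delta_win, delta_lose, gamma

    def act(self):
        return int(np.random.choice(len(self.pi), p=self.pi))

    def update(self, action, reward):
        # Q-learning update (single state, so the bootstrap term is max Q).
        self.Q[action] += self.alpha * (reward + self.gamma * self.Q.max() - self.Q[action])
        # Update the running average policy.
        self.count += 1
        self.pi_avg += (self.pi - self.pi_avg) / self.count
        # "Winning" if the current policy does better than the average policy under Q.
        delta = self.dw if self.pi @ self.Q >= self.pi_avg @ self.Q else self.dl
        # Policy hill-climbing: move probability mass toward the greedy action.
        greedy = int(self.Q.argmax())
        self.pi -= delta / (len(self.pi) - 1)
        self.pi[greedy] += delta + delta / (len(self.pi) - 1)
        self.pi = np.clip(self.pi, 1e-6, 1.0)
        self.pi /= self.pi.sum()
```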
16. Full-mission decision-making for multistage launch vehicles based on approximate dynamic programming
Authors: 李超兵, 包为民, 李忠奎, 禹春梅, 程晓明 — 《宇航学报》 EI CAS CSCD 北大核心, 2024, No. 8, pp. 1251-1260 (10 pages)
For the mission decision-making problem of a launch vehicle under thrust-drop faults, a full-mission decision-making method for multistage rockets based on approximate dynamic programming is proposed. First, by specifying the initial state set, the decision options, the reward function, and the Q-function iteration scheme, a hierarchical reinforcement learning model for rocket mission decision-making is established, yielding an "evaluation network" that assesses the remainder of the flight. Then, an online capability assessment and trajectory planning method based on convex optimization provides the "decision generation" module of the approximate dynamic programming scheme. Finally, combining the two completes the decisions on the continuous trajectory and the discrete orbital elements of each flight phase for the remainder of the flight under the fault. Simulation results show that the method can make full-mission flight decisions and produce flight trajectories under non-fatal thrust-drop faults.
Keywords: launch vehicle; thrust fault; mission decision-making; approximate dynamic programming; hierarchical reinforcement learning
17. Prediction of the interfacial bond strength between FRP bars and concrete under freeze-thaw cycles
Authors: 高旭, 黄丽华 — 《大连理工大学学报》 CAS CSCD 北大核心, 2024, No. 1, pp. 57-63 (7 pages)
In harsh service environments such as freeze-thaw and corrosion, replacing steel reinforcement with fiber-reinforced polymer (FRP) bars to improve the durability of concrete structures is increasingly common in civil engineering. Because the bond mechanism at the FRP-concrete interface under freeze-thaw cycles is complex and a theoretical model of the interface behavior is difficult to construct, a back-propagation neural network optimized by a genetic algorithm (GA-BPNN) is used to predict the FRP-concrete bond strength, based on 110 sets of pull-out test data under freeze-thaw cycles collected from the literature. By analyzing the parameter sensitivity of the weight matrices, the main parameters affecting the bond strength are selected as variables, and a gene expression programming (GEP) method is used to derive a formula for the bond strength. Compared with the only two theoretical models available in the literature, the proposed formula is more accurate and generalizes better when computing the FRP-concrete bond strength under freeze-thaw cycles.
Keywords: FRP-reinforced concrete; bond strength; freeze-thaw cycles; back-propagation neural network (BPNN); gene expression programming (GEP)
18. A multi-period Bayesian reinforcement learning model for robust portfolio selection
Authors: 李柔佳, 段启宏, 冯卓航, 刘嘉 — 《工程数学学报》 CSCD 北大核心, 2024, No. 2, pp. 232-244 (13 pages)
In traditional multi-period distributionally robust portfolio selection models, estimating the uncertainty set is a challenging problem. A Bayesian reinforcement learning method is used to dynamically update the model parameters in the uncertainty set, such as the first and second moments, and the solution of the mean-worst-case robust CVaR model within the Bayesian reinforcement learning framework is then studied. A two-layer decomposition solution framework is designed by combining dynamic programming with the progressive hedging algorithm. The lower layer solves a sequence of second-order cone programs to obtain the optimal policy of each subproblem under given model parameters, while the upper layer uses Bayes' formula to obtain an implementable non-anticipative investment policy. Empirical results based on the US stock market show that the multi-period robust reinforcement learning portfolio selection model achieves better out-of-sample investment performance than traditional models.
Keywords: Bayesian reinforcement learning; robust risk measures; portfolio selection; second-order cone programming
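To illustrate the kind of moment update mentioned above, the sketch below performs a conjugate Bayesian update of the mean return vector under a Gaussian model with known covariance; the paper's joint first/second-moment update and the robust CVaR optimization layer are not reproduced.

```python
import numpy as np

# Conjugate Bayesian update of the mean return vector: prior N(mu0, Lambda0),
# Gaussian returns with known covariance Sigma, posterior after observing R.
def update_mean_posterior(mu0, Lambda0, Sigma, R):
    n = R.shape[0]                                   # number of observed return vectors
    r_bar = R.mean(axis=0)
    Lambda0_inv = np.linalg.inv(Lambda0)
    Sigma_inv = np.linalg.inv(Sigma)
    Lambda_post = np.linalg.inv(Lambda0_inv + n * Sigma_inv)   # posterior covariance of the mean
    mu_post = Lambda_post @ (Lambda0_inv @ mu0 + n * Sigma_inv @ r_bar)
    return mu_post, Lambda_post

# Toy usage with 3 assets and 60 simulated monthly returns.
rng = np.random.default_rng(1)
Sigma = 0.02 * np.eye(3)
R = rng.multivariate_normal(mean=[0.01, 0.008, 0.012], cov=Sigma, size=60)
mu_post, _ = update_mean_posterior(np.zeros(3), 0.01 * np.eye(3), Sigma, R)
print("posterior mean of returns:", np.round(mu_post, 4))
```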
19. A hierarchical deep reinforcement learning energy management strategy for hybrid electric vehicles
Authors: 戴科峰, 胡明辉 — 《重庆大学学报》 CAS CSCD 北大核心, 2024, No. 1, pp. 41-51 (11 pages)
To improve the fuel economy of hybrid electric vehicles and the stability of the control strategy, a hierarchical energy management strategy combining the equivalent fuel consumption minimization strategy (ECMS) with deep reinforcement learning (DRL) is proposed, taking the third-generation Prius power-split hybrid as the study object. Simulation results show that the hierarchical control strategy not only allows the reinforcement learning agent to achieve adaptive energy-saving control without a model, but also guarantees that the SOC satisfies its constraints under all driving conditions. Compared with a rule-based energy management strategy, the hierarchical strategy improves fuel economy by 20.83%-32.66%; adding speed-prediction information for the agent further reduces fuel consumption by 5.12%; compared with a non-hierarchical deep reinforcement learning strategy, fuel economy improves by 8.04%; and compared with an adaptive ECMS (A-ECMS) using an SOC-deviation penalty, fuel economy improves by 5.81%-16.18%.
Keywords: hybrid electric vehicle; dynamic programming; reinforcement learning; deep neural network; equivalent fuel consumption
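A minimal ECMS step of the kind used in the lower layer described above: the battery power split is chosen to minimize the instantaneous fuel rate plus an equivalence-factor-weighted electrical consumption. The quadratic fuel-rate model and all parameters are illustrative; in the hierarchical scheme, the DRL agent would adapt the equivalence factor online.

```python
import numpy as np

# Toy engine fuel-rate model (kg/s) as a function of engine power in kW.
def fuel_rate(p_engine_kw):
    return 1e-4 * (0.2 * p_engine_kw ** 2 + 2.0 * p_engine_kw + 5.0)

def ecms_step(p_demand_kw, soc, s=2.5, fuel_lhv=42.6e3, p_batt_max=20.0):
    # soc would drive the adaptation of s in an A-ECMS; it is unused in this minimal sketch.
    candidates = np.linspace(-p_batt_max, p_batt_max, 81)     # battery power grid (kW)
    best_p_batt, best_cost = 0.0, float("inf")
    for p_batt in candidates:
        p_engine = p_demand_kw - p_batt
        if p_engine < 0.0:                                    # skip infeasible splits in this toy model
            continue
        # Equivalent fuel rate: actual fuel plus electrical energy weighted by the equivalence factor.
        equivalent = fuel_rate(p_engine) + s * (p_batt / fuel_lhv)
        if equivalent < best_cost:
            best_p_batt, best_cost = p_batt, equivalent
    return best_p_batt, best_cost

p_batt, cost = ecms_step(p_demand_kw=30.0, soc=0.6)
print(f"battery power: {p_batt:.1f} kW, equivalent fuel rate: {cost:.6f} kg/s")
```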
20. An improved class of reinforcement learning methods for stochastic linear quadratic problems
Author: 高晋鹏 — 《科技创新与应用》, 2024, No. 32, pp. 142-145 (4 pages)
Stochastic linear quadratic (LQ) problems are an important and relatively well-studied class of stochastic control problems. Among them, stochastic LQ problems under partial information refer to the case in which the state equation or the cost function contains unknown coefficients. Building on earlier work, this paper improves an online reinforcement learning algorithm for the optimal control of LQ problems under partial information. With unknown quantities in both the system equation and the cost function, the algorithm obtains the optimal control as well as the unknown coefficients of the cost function from observable sample trajectories and the reward function. Furthermore, proofs of the convergence of the iterative procedure and the stability of the control are given.
Keywords: stochastic linear quadratic problem; partial information; Lyapunov equation; reinforcement learning; dynamic programming principle
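For a model-based reference point, the sketch below runs policy iteration for a deterministic discrete-time LQR problem, evaluating each policy by solving a discrete Lyapunov equation and then improving it; the paper's partial-information, stochastic, model-free setting is not reproduced.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Policy iteration (Hewer's iteration) for deterministic discrete-time LQR:
# evaluate each policy via a Lyapunov equation, then improve the feedback gain.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[1.0]])

K = np.array([[10.0, 10.0]])              # initial stabilizing feedback u = -K x (chosen by hand)
for _ in range(50):
    Acl = A - B @ K
    # Policy evaluation: P solves Acl' P Acl - P + (Q + K' R K) = 0.
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Policy improvement.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

print("converged feedback gain K =", K)
```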