Journal Articles: 288 articles found
1. Approximate Dynamic Programming for Stochastic Resource Allocation Problems (Cited by: 4)
Authors: Ali Forootani, Raffaele Iervolino, Massimo Tipaldi, Joshua Neilson. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2020, No. 4, pp. 975-990 (16 pages)
A stochastic resource allocation model, based on the principles of Markov decision processes (MDPs), is proposed in this paper. In particular, a general-purpose framework is developed, which takes into account resource requests for both instant and future needs. The considered framework can handle two types of reservations (i.e., specified and unspecified time interval reservation requests), and implement an overbooking business strategy to further increase business revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which are solved by means of stochastic dynamic programming (DP) based algorithms. In this regard, Bellman's backward principle of optimality is exploited in order to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality, the inevitable issue of DP for both instant resource requests and future resource reservations, arises. In particular, an approximate dynamic programming (ADP) technique based on linear function approximations is applied to overcome such scalability issues. Several examples are provided to show the effectiveness of the proposed approach.
Keywords: approximate dynamic programming (ADP), dynamic programming (DP), Markov decision processes (MDPs), resource allocation problem
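A minimal sketch of the core idea in this entry, under stated assumptions: finite-horizon backward dynamic programming in which each stage's Bellman backup is projected onto a linear feature space by least squares. The MDP shapes, feature map, and toy instance below are generic illustrations, not the paper's reservation-pricing model.

```python
import numpy as np

def adp_backward(P, R, phi, T):
    """Backward ADP sketch: fit V_t(s) ~ phi(s) @ w_t by least squares.

    P: (A, S, S) transition tensor; R: (A, S) one-step rewards;
    phi: (S, K) state features; T: horizon. Returns per-stage weights
    and the greedy first-stage policy.
    """
    A, S, _ = P.shape
    V_next = np.zeros(S)                      # terminal value V_T = 0
    weights = []
    for t in reversed(range(T)):
        Q = R + P @ V_next                    # exact Bellman backup: (A, S)
        w, *_ = np.linalg.lstsq(phi, Q.max(axis=0), rcond=None)
        V_next = phi @ w                      # projected value, reused above
        weights.append(w)
    return weights[::-1], Q.argmax(axis=0)

# Toy instance (shapes only; no relation to the paper's pricing model).
rng = np.random.default_rng(0)
S, A, K, T = 50, 3, 6, 10
P = rng.dirichlet(np.ones(S), size=(A, S))    # row-stochastic transitions
R = rng.random((A, S))
phi = np.hstack([np.ones((S, 1)), rng.random((S, K - 1))])
w, pi0 = adp_backward(P, R, phi, T)
```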
2. Performance Potential-based Neuro-dynamic Programming for SMDPs (Cited by: 10)
Authors: TANG Hao, YUAN Ji-Bin, LU Yang, CHENG Wen-Juan. Acta Automatica Sinica (EI, CSCD, PKU Core), 2005, No. 4, pp. 642-645 (4 pages)
An alpha-uniformized Markov chain is defined by the concept of equivalent infinitesimal generator for a semi-Markov decision process (SMDP) with both average and discounted criteria. According to the relations of their performance measures and performance potentials, the optimization of an SMDP can be realized by simulating the chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and the performance error bound is shown, given that there are approximation error and improvement error in each iteration step. The obtained results may be extended to Markov systems and are broadly applicable. Finally, a numerical example is provided.
Keywords: decision processes, SMDP, performance potentials, neuro-dynamics, Markov, optimal design
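As a hedged illustration of the alpha-uniformization the abstract mentions: for a continuous-time chain with generator Q and any alpha >= max_i |q_ii|, the discrete-time chain P = I + Q/alpha is a proper stochastic matrix that can be simulated in place of the original process. The two-state generator below is a made-up example.

```python
import numpy as np

def uniformize(Q, alpha=None):
    """Uniformization sketch: embed a CTMC with generator Q into a DTMC."""
    Q = np.asarray(Q, dtype=float)
    if alpha is None:
        alpha = np.abs(np.diag(Q)).max()   # smallest valid uniformization rate
    P = np.eye(Q.shape[0]) + Q / alpha
    assert (P >= -1e-12).all() and np.allclose(P.sum(axis=1), 1.0)
    return P, alpha

# Example: a two-state generator (rows sum to zero).
Q = np.array([[-3.0, 3.0],
              [5.0, -5.0]])
P, alpha = uniformize(Q)   # alpha = 5; P = [[0.4, 0.6], [1.0, 0.0]]
```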
3. Recorded recurrent deep reinforcement learning guidance laws for intercepting endoatmospheric maneuvering missiles
Authors: Xiaoqi Qiu, Peng Lai, Changsheng Gao, Wuxing Jing. Defence Technology (SCIE, EI, CAS, CSCD), 2024, No. 1, pp. 457-470 (14 pages)
This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to solve the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles with uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck of traditional deep reinforcement learning methods while dealing with POMDPs. The measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network, since the detection frequency of an interceptor is usually higher than its guidance frequency. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partially observable problem that this RNN layer causes inside the agent. The training curves show that the proposed RRTD3 successfully enhances data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over some conventional guidance laws.
Keywords: endoatmospheric interception, missile guidance, reinforcement learning, Markov decision process, recurrent neural networks
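A minimal PyTorch sketch (assumed shapes and names, not the paper's RRTD3 network) of the architectural idea: stack the seeker measurements collected within one guidance cycle into a sequence, pass them through a GRU layer in the actor, and return the hidden state so it can be recorded alongside the transition for replay.

```python
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    """Sketch of a GRU-based policy head for POMDP guidance (illustrative)."""

    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(),
                                  nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, seq_len, obs_dim), e.g. all seeker measurements
        # gathered within one guidance cycle, stacked into a sequence.
        out, h_n = self.gru(obs_seq, h0)
        action = self.head(out[:, -1])    # act on the last step's features
        return action, h_n                # h_n would be recorded for replay

actor = RecurrentActor(obs_dim=6, act_dim=2)
obs = torch.randn(1, 10, 6)               # 10 measurements per guidance cycle
a, h = actor(obs)                          # store h with (s, a, r) in the buffer
```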
4. Deep Reinforcement Learning for Energy-Efficient Edge Caching in Mobile Edge Networks
Authors: Meng Deng, Zhou Huan, Jiang Kai, Zheng Hantong, Cao Yue, Chen Peng. China Communications (SCIE, CSCD), 2024, No. 11, pp. 243-256 (14 pages)
Edge caching has emerged as a promising application paradigm in 5G networks, and by building edge networks to cache content, it can alleviate the traffic load brought about by the rapid growth of Internet of Things (IoT) services and applications. Given the limited capacity of Edge Servers (ESs) and the large number of user demands, how to make caching decisions and utilize ES resources is significant. In this paper, we aim to minimize the total system energy consumption in a heterogeneous network and formulate the content caching optimization problem as a Mixed Integer Non-Linear Programming (MINLP) problem. To address the optimization problem, a Deep Q-Network (DQN)-based method is proposed to improve the overall performance of the system and reduce the backhaul traffic load. In addition, the DQN-based method can effectively overcome the limitations of traditional reinforcement learning (RL) in complex scenarios. Simulation results show that the proposed DQN-based method greatly outperforms other benchmark methods, significantly improving the cache hit rate and reducing the total system energy consumption in different scenarios.
Keywords: deep reinforcement learning, edge caching, energy consumption, Markov decision process
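A generic, hedged DQN update of the kind the abstract describes, shown on an abstract state vector. In an edge-caching setting the state would encode cache occupancy and content popularity and each action a caching decision, but those specifics are assumptions here, not the paper's design.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """State -> Q-value per discrete caching action (illustrative)."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_actions))
    def forward(self, s):
        return self.net(s)

def dqn_step(q, q_target, batch, opt, gamma=0.99):
    """One DQN update: TD target computed with a frozen target network."""
    s, a, r, s2, done = batch                 # tensors; a is int64, done is 0/1
    q_sa = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_target(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

q = QNet(state_dim=32, n_actions=10)
q_tgt = QNet(32, 10); q_tgt.load_state_dict(q.state_dict())
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
```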
5. Service Function Chain Deployment Algorithm Based on Multi-Agent Deep Reinforcement Learning
Authors: Wanwei Huang, Qiancheng Zhang, Tao Liu, Yaoli Xu, Dalei Zhang. Computers, Materials & Continua (SCIE, EI), 2024, No. 9, pp. 4875-4893 (19 pages)
To address the rapid growth of network services, which leads to long service request processing times and high deployment costs when deploying network function virtualization service function chains (SFCs) in 5G networks, this paper proposes a multi-agent deep deterministic policy gradient optimization algorithm for SFC deployment (MADDPG-SD). Initially, an optimization model that maximizes the request acceptance rate while minimizing SFC latency and deployment cost is constructed for the network resource-constrained case. Subsequently, we model the dynamic problem as a Markov decision process (MDP), facilitating adaptation to the evolving states of network resources. Finally, by allocating SFCs to different agents and adopting a collaborative deployment strategy, each agent aims to maximize the request acceptance rate or minimize latency and costs. These agents learn strategies from historical data of virtual network functions in SFCs to guide server node selection, and achieve approximately optimal SFC deployment strategies through a cooperative framework of centralized training and distributed execution. Experimental simulation results indicate that the proposed method, while simultaneously meeting performance requirements and resource capacity constraints, effectively increases the request acceptance rate compared to the comparative algorithms, reducing end-to-end latency by 4.942% and deployment cost by 8.045%.
Keywords: network function virtualization, service function chain, Markov decision process, multi-agent reinforcement learning
6. Optimal Policies for Quantum Markov Decision Processes (Cited by: 2)
Authors: Ming-Sheng Ying, Yuan Feng, Sheng-Gang Ying. International Journal of Automation and Computing (EI, CSCD), 2021, No. 3, pp. 410-421 (12 pages)
Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
Keywords: quantum Markov decision processes, quantum machine learning, reinforcement learning, dynamic programming, decision making
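A toy, hedged rendering of the qMDP flavor (not the paper's formalism): the state is a density matrix, each action applies a quantum channel in Kraus form, and the per-step reward is the expectation of an observable, so finite-horizon policy evaluation reduces to a loop of channel applications and traces.

```python
import numpy as np

def apply_channel(rho, kraus):
    """Apply a quantum channel given by Kraus operators {K_i}."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def evaluate_policy(rho0, policy, channels, observable, T):
    """Finite-horizon policy evaluation: sum of Tr(M rho_t) over t."""
    rho, total = rho0, 0.0
    for t in range(T):
        rho = apply_channel(rho, channels[policy[t]])
        total += np.trace(observable @ rho).real
    return total

# Toy example: one qubit, actions = {identity, bit flip}, reward = <|1><1|>.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
channels = {0: [I2], 1: [X]}
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)      # |0><0|
M = np.array([[0, 0], [0, 1]], dtype=complex)         # projector onto |1>
print(evaluate_policy(rho0, [1, 0, 1], channels, M, T=3))   # 2.0
```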
7. Solving Markov Decision Processes with Downside Risk Adjustment (Cited by: 1)
Authors: Abhijit Gosavi, Anish Parulekar. International Journal of Automation and Computing (EI, CSCD), 2016, No. 3, pp. 235-245 (11 pages)
Markov decision processes (MDPs) and their variants are widely studied in the theory of controls for stochastic discrete-event systems driven by Markov chains. Much of the literature focuses on the risk-neutral criterion in which the expected rewards, either average or discounted, are maximized. There exists some literature on MDPs that takes risks into account. Much of this addresses the exponential utility (EU) function and mechanisms to penalize different forms of variance of the rewards. EU functions have some numerical deficiencies, while variance measures variability both above and below the mean rewards; the variability above mean rewards is usually beneficial and should not be penalized or avoided. As such, risk metrics that account for pre-specified targets (thresholds) for rewards have been considered in the literature, where the goal is to penalize the risks of revenues falling below those targets. Existing work on MDPs that takes targets into account seeks to minimize risks of this nature. Minimizing risks can lead to poor solutions where the risk is zero or near zero, but the average rewards are also rather low. In this paper, hence, we study a risk-averse criterion, in particular the so-called downside risk, which equals the probability of the revenues falling below a given target, where, in contrast to minimizing such risks, we only reduce this risk at the cost of slightly lowered average rewards. A solution where the risk is low and the average reward is quite high, although not at its maximum attainable value, is very attractive in practice. To be more specific, in our formulation, the objective function is the expected value of the rewards minus a scalar times the downside risk. In this setting, we analyze the infinite-horizon MDP, the finite-horizon MDP, and the infinite-horizon semi-MDP (SMDP). We develop dynamic programming and reinforcement learning algorithms for the finite and infinite horizon. The algorithms are tested in numerical studies and show encouraging performance.
Keywords: downside risk, Markov decision processes, reinforcement learning, dynamic programming, targets, thresholds
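The abstract's objective, E[R] minus a scalar times P(R < target), is easy to estimate from simulated episode returns. The Monte Carlo scorer below is a hedged sketch of that objective only, not the paper's DP/RL algorithms; the two "policies" are synthetic return distributions.

```python
import numpy as np

def downside_risk_score(returns, target, theta):
    """Score a policy by E[R] - theta * P(R < target), estimated from
    simulated episode returns (the paper's style of objective)."""
    returns = np.asarray(returns, dtype=float)
    downside = (returns < target).mean()      # empirical downside risk
    return returns.mean() - theta * downside, downside

rng = np.random.default_rng(1)
risky = rng.normal(100, 40, size=10_000)      # high mean, high spread
safe = rng.normal(90, 10, size=10_000)        # lower mean, low spread
for name, r in [("risky", risky), ("safe", safe)]:
    score, dr = downside_risk_score(r, target=70, theta=50)
    print(f"{name}: mean={r.mean():.1f}, downside={dr:.3f}, score={score:.1f}")
```

On this toy data the lower-mean "safe" policy wins the risk-adjusted comparison because its downside probability is an order of magnitude smaller, which is exactly the trade-off the paper argues is attractive in practice.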
8. Singularly Perturbed Markov Decision Processes with Inclusion of Transient States (Cited by: 1)
Authors: R. H. Liu, Q. Zhang, G. Yin. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2001, No. 2, pp. 199-211 (13 pages)
This paper is concerned with continuous-time Markov decision processes (MDPs) having weak and strong interactions. Using a hierarchical approach, the state space of the underlying Markov chain can be decomposed into several groups of recurrent states and a group of transient states, resulting in a singularly perturbed MDP formulation. Instead of solving the original problem directly, a limit problem that is much simpler to handle is derived. On the basis of the optimal solution of the limit problem, nearly optimal decisions are constructed for the original problem. The asymptotic optimality of the constructed control is obtained; the rate of convergence is ascertained.
Keywords: Markov decision process, dynamic programming, asymptotically optimal control
9. Multi-Agent Deep Reinforcement Learning for Cross-Layer Scheduling in Mobile Ad-Hoc Networks
Authors: Xinxing Zheng, Yu Zhao, Joohyun Lee, Wei Chen. China Communications (SCIE, CSCD), 2023, No. 8, pp. 78-88 (11 pages)
Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to deal with congestion in ad-hoc networks with effective algorithms is still open and challenging. In this paper, we focus on enabling congestion control to minimize network transmission delays through flexible power control. To effectively solve the congestion problem, we propose a distributed cross-layer scheduling algorithm, which is empowered by graph-based multi-agent deep reinforcement learning. The transmit power is adaptively adjusted in real time by our algorithm based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low due to the regional cooperation based on the graph attention network. In the evaluation, we show that our algorithm can reduce the transmission delay of data flows under severe signal interference and drastically changing channel states, and we demonstrate its adaptability and stability in different topologies. The method is general and can be extended to various types of topologies.
Keywords: ad-hoc network, cross-layer scheduling, multi-agent deep reinforcement learning, interference elimination, power control, queue scheduling, actor-critic methods, Markov decision process
10. Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations (Cited by: 14)
Authors: Dimitri P. Bertsekas. IEEE/CAA Journal of Automatica Sinica (EI, CSCD), 2019, No. 1, pp. 1-31 (31 pages)
In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
Keywords: reinforcement learning, dynamic programming, Markovian decision problems, aggregation, feature-based architectures, policy iteration, deep neural networks, rollout algorithms
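A hedged sketch of the simplest instance of this idea, hard aggregation with uniform in-group weights: states sharing a feature value are pooled into one aggregate state, the small MDP is solved exactly, and its values are lifted back as a piecewise-constant approximation. The survey's framework is far more general (soft aggregation and disaggregation distributions, representative features); this is only the degenerate case, and it assumes every group contains at least one state.

```python
import numpy as np

def aggregate_value(P, R, agg_of, n_agg, gamma=0.95, iters=500):
    """Hard-aggregation sketch: pool states by a feature, average the
    dynamics within each group, solve the small MDP by value iteration,
    then lift the aggregate values back to the original states.

    P: (A, S, S) transitions; R: (A, S) rewards;
    agg_of: (S,) each state's aggregate (feature) index.
    """
    A, S, _ = P.shape
    D = np.zeros((S, n_agg))                 # disaggregation: group membership
    D[np.arange(S), agg_of] = 1.0
    W = D / D.sum(axis=0)                    # aggregation: uniform in-group weights
    P_agg = np.einsum('sx,ast,ty->axy', W, P, D)   # (A, n_agg, n_agg)
    R_agg = np.einsum('sx,as->ax', W, R)           # (A, n_agg)
    V = np.zeros(n_agg)
    for _ in range(iters):                   # value iteration on the small MDP
        V = (R_agg + gamma * P_agg @ V).max(axis=0)
    return D @ V                             # piecewise-constant approximation
```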
11. Reinforcement Learning-Based Joint Task Offloading and Migration Schemes Optimization in Mobility-Aware MEC Network (Cited by: 9)
Authors: Dongyu Wang, Xinqiao Tian, Haoran Cui, Zhaolin Liu. China Communications (SCIE, CSCD), 2020, No. 8, pp. 31-44 (14 pages)
Intelligent edge computing uses edge devices of the Internet of Things (IoT) for data collection, calculation, and intelligent analysis, so that data can be analyzed nearby and feedback given in a timely manner. Because of the mobility of mobile equipments (MEs), if MEs move beyond the reach of the small cell networks (SCNs), the offloaded tasks cannot be returned to the MEs successfully. As a result, migration incurs additional costs. In this paper, joint task offloading and migration schemes in a mobility-aware Mobile Edge Computing (MEC) network based on Reinforcement Learning (RL) are proposed to obtain the maximum system revenue. Firstly, the joint optimization problem of maximizing the total revenue of MEs is formulated, in view of the mobility-aware MEs. Secondly, considering time-varying computation tasks and resource conditions, the mixed integer non-linear programming (MINLP) problem is described as a Markov Decision Process (MDP). Then we propose a novel reinforcement learning-based optimization framework to solve the problem, instead of traditional methods. Finally, simulation results show that the proposed schemes can significantly raise the total revenue of MEs.
Keywords: MEC, computation offloading, mobility-aware, migration scheme, Markov decision process, reinforcement learning
12. A Heterogeneous Information Fusion Deep Reinforcement Learning for Intelligent Frequency Selection of HF Communication (Cited by: 6)
Authors: Xin Liu, Yuhua Xu, Yunpeng Cheng, Yangyang Li, Lei Zhao, Xiaobo Zhang. China Communications (SCIE, CSCD), 2018, No. 9, pp. 73-84 (12 pages)
High-frequency (HF) communication is one of the essential communication methods for military and emergency applications. However, the selection of the communication frequency channel is always a difficult problem owing to the crowded spectrum, time-varying channels, and malicious intelligent jamming. The existing frequency hopping, automatic link establishment, and some new anti-jamming technologies cannot completely solve the above problems. In this article, we adopt deep reinforcement learning to address this intractable challenge. First, the combination of the spectrum state and the channel gain state is defined as the complex environmental state, and the Markov property of the defined state is analyzed and proved. Then, considering that the spectrum state and channel gain state are heterogeneous information, a new deep Q-network (DQN) framework is designed, which contains multiple sub-networks to process different kinds of information. Finally, aiming to improve the learning speed and efficiency, the optimization targets of the corresponding sub-networks are reasonably designed, and a heterogeneous information fusion deep reinforcement learning (HIF-DRL) algorithm is designed for the specific frequency selection. Simulation results show that the proposed algorithm performs well in channel prediction, jamming avoidance, and frequency channel selection.
Keywords: HF communication, anti-jamming, intelligent frequency selection, Markov decision process, deep reinforcement learning
13. A guidance method for coplanar orbital interception based on reinforcement learning (Cited by: 4)
Authors: ZENG Xin, ZHU Yanwei, YANG Leping, ZHANG Chengming. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2021, No. 4, pp. 927-938 (12 pages)
This paper investigates a guidance method based on reinforcement learning (RL) for coplanar orbital interception in a continuous low-thrust scenario. The problem is formulated as a Markov decision process (MDP) model, then a well-designed RL algorithm, experience-based deep deterministic policy gradient (EBDDPG), is proposed to solve it. By taking advantage of prior information generated through the optimal control model, the proposed algorithm not only resolves the convergence problem of common RL algorithms, but also successfully trains an efficient deep neural network (DNN) controller for the chaser spacecraft to generate the control sequence. Numerical simulation results show that the proposed algorithm is feasible and that the trained DNN controller improves efficiency over traditional optimization methods by roughly two orders of magnitude.
Keywords: orbital interception, reinforcement learning (RL), Markov decision process (MDP), deep neural network (DNN)
14. Airport gate assignment problem with deep reinforcement learning (Cited by: 3)
Authors: Zhao Jiaming, Wu Wenjun, Liu Zhiming, Han Changhao, Zhang Xuanyi, Zhang Yanhua. High Technology Letters (EI, CAS), 2020, No. 1, pp. 102-107 (6 pages)
With the rapid development of air transportation in recent years, airport operations have attracted a lot of attention. Among them, the airport gate assignment problem (AGAP) has become a research hotspot. However, a real-time AGAP algorithm is still an open issue. In this study, a deep reinforcement learning based AGAP (DRL-AGAP) is proposed. The optimization objective is to maximize the rate of flights assigned to fixed gates. The real-time AGAP is modeled as a Markov decision process (MDP); the state space, action space, value, and rewards are defined. The DRL-AGAP algorithm is evaluated via simulation and compared with the flight pre-assignment results of the optimization software Gurobi and a greedy algorithm. Simulation results show that the performance of the proposed DRL-AGAP algorithm is close to that of pre-assignment obtained by the Gurobi optimization solver. Meanwhile, real-time assignment ability is ensured by the proposed DRL-AGAP algorithm due to its dynamic modeling and lower complexity.
Keywords: airport gate assignment problem (AGAP), deep reinforcement learning (DRL), Markov decision process (MDP)
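A toy gym-style environment sketch for the MDP framing the abstract describes. The state, action, and reward definitions below are illustrative guesses, not the paper's: state is each gate's next-free time plus the next flight's times, the action picks a gate (or the apron), and the reward counts fixed-gate assignments.

```python
import numpy as np

class GateAssignmentEnv:
    """Toy AGAP-as-MDP sketch (illustrative; the paper's definitions differ)."""

    def __init__(self, n_gates, flights):
        self.n_gates = n_gates
        self.flights = flights        # list of (arrival, departure) times
        self.reset()

    def reset(self):
        self.free_at = np.zeros(self.n_gates)   # when each gate frees up
        self.i = 0
        return self._obs()

    def _obs(self):
        arr, dep = self.flights[self.i]
        return np.concatenate([self.free_at, [arr, dep]])

    def step(self, gate):
        arr, dep = self.flights[self.i]
        if gate < self.n_gates and self.free_at[gate] <= arr:
            self.free_at[gate] = dep   # feasible fixed-gate assignment
            reward = 1.0
        else:
            reward = 0.0               # apron (or infeasible) assignment
        self.i += 1
        done = self.i == len(self.flights)
        return (None if done else self._obs()), reward, done
```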
15. Multi-task Coalition Parallel Formation Strategy Based on Reinforcement Learning (Cited by: 6)
Authors: JIANG Jian-Guo, SU Zhao-Pin, QI Mei-Bin, ZHANG Guo-Fu. Acta Automatica Sinica (EI, CSCD, PKU Core), 2008, No. 3, pp. 349-352 (4 pages)
Agent coalitions are an important manner of agent cooperation and collaboration. By forming a coalition, agents can improve their problem-solving ability and obtain more utility. In this paper, a novel multi-task coalition parallel formation strategy is presented, and it is theoretically proven that the process of multi-task coalition formation is a Markov decision process. Moreover, reinforcement learning is used to solve the agents' behavior strategy in parallel multi-task coalition formation, and the formation process is described. In multi-task-oriented domains, the strategy can form multi-task coalitions effectively and in parallel.
Keywords: reinforcement learning, multi-task coalition, parallel formation, Markov decision process
16. Reinforcement Learning: A Technical Introduction – Part I
Authors: Elmar Diederichs. Journal of Autonomous Intelligence, 2019, No. 2, pp. 25-41 (17 pages)
Reinforcement learning provides a cognitive-science perspective on behavior and sequential decision making, provided that reinforcement learning algorithms introduce a computational concept of agency to the learning problem. Hence it addresses an abstract class of problems that can be characterized as follows: an algorithm confronted with information from an unknown environment is supposed to find, step by step, an optimal way to behave based only on some sparse, delayed, or noisy feedback from an environment that changes according to the algorithm's behavior. Hence reinforcement learning offers an abstraction of the problem of goal-directed learning from interaction. The paper offers an opinionated introduction to the algorithmic advantages and drawbacks of several algorithmic approaches, in order to provide algorithmic design options.
Keywords: classical reinforcement learning, Markov decision processes, prediction and adaptive control in unknown environments, algorithmic design
17. Data-Driven Reactive Power and Voltage Optimization Control for Renewable Distribution Networks Considering Photovoltaic Source Reliability (Cited by: 1)
Authors: Zhang Bo, Gao Yuan, Li Tiecheng, Hu Xuekai, Jia Jiaoxin. Proceedings of the CSEE (EI, CSCD, PKU Core), 2024, No. 15, pp. 5934-5946, I0008 (14 pages)
Fully exploiting the reactive power support capability of distributed photovoltaic (PV) sources helps address the voltage fluctuation, voltage limit violation, and renewable energy accommodation problems brought by high PV penetration in distribution networks. However, reactive power output causes the junction temperature of the PV source's power devices to exceed limits or fluctuate sharply, seriously threatening its reliable operation. Therefore, a data-driven reactive power and voltage optimization control strategy for renewable distribution networks considering PV source reliability is proposed. First, a data-driven PV source reliability evaluation method is presented, which uses an XGBoost machine learning model to compute IGBT junction temperature, improving computational efficiency and removing the dependence of evaluation accuracy on IGBT parameters. A reactive power and voltage optimization model for the distribution network considering PV source reliability is then built, with the mean and fluctuation of IGBT junction temperature introduced into the optimization objective. The model is subsequently converted into a Markov decision process, and agents are trained with the deep deterministic policy gradient reinforcement learning algorithm. Finally, the advantages of the proposed strategy in fast reactive power and voltage optimization and in PV source reliability improvement are verified on the IEEE 33-bus system.
Keywords: distribution network, IGBT reliability, reactive power and voltage optimization, Markov decision process, reinforcement learning
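A hedged sketch of the XGBoost surrogate step the abstract describes: regress IGBT junction temperature on converter operating quantities. The feature names and synthetic data are hypothetical; only the use of an XGBoost regressor as a fast junction-temperature estimator comes from the abstract.

```python
import numpy as np
from xgboost import XGBRegressor

# Hypothetical features; the paper's actual inputs would be the converter's
# electro-thermal operating quantities (loading, ambient temperature, etc.).
rng = np.random.default_rng(0)
X = rng.random((2000, 4))          # e.g. [P_out, Q_out, T_ambient, f_sw]
y = 40 + 60 * X[:, 0] + 25 * X[:, 1] + 10 * X[:, 2] + rng.normal(0, 1, 2000)

model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X[:1500], y[:1500])
pred = model.predict(X[1500:])     # fast surrogate for IGBT junction temp.
print(float(np.abs(pred - y[1500:]).mean()))   # held-out mean absolute error
```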
18. A Comparison of PPO, TD3 and SAC Reinforcement Algorithms for Quadruped Walking Gait Generation
Authors: James W. Mock, Suresh S. Muknahallipatna. Journal of Intelligent Learning Systems and Applications, 2023, No. 1, pp. 36-56 (21 pages)
Deep reinforcement learning (deep RL) has the potential to replace classic robotic controllers. State-of-the-art deep reinforcement algorithms such as Proximal Policy Optimization, Twin Delayed Deep Deterministic Policy Gradient, and Soft Actor-Critic, to mention a few, have been investigated for training robots to walk. However, conflicting performance results for these algorithms have been reported in the literature. In this work, we present a performance analysis of the above three state-of-the-art deep reinforcement algorithms for a constant-velocity walking task on a quadruped. The performance is analyzed by simulating the walking task of a quadruped equipped with a range of sensors present on a physical quadruped robot. Simulations of the three algorithms across a range of sensor inputs and with domain randomization are performed. The strengths and weaknesses of each algorithm for the given task are discussed. We also identify a set of sensors that contribute to the best performance of each deep reinforcement algorithm.
Keywords: reinforcement learning, machine learning, Markov decision process, domain randomization
19. Efficient Temporal Difference Learning with Adaptive λ
Authors: Bi Jinbo, Wu Cangpu. Journal of Beijing Institute of Technology (EI, CAS), 1999, No. 3, pp. 251-257 (7 pages)
Aim: To find a more efficient learning method based on temporal difference learning for delayed reinforcement learning tasks. Methods: A Q-learning algorithm based on truncated TD(λ) with adaptive schemes of λ-value selection, applied to absorbing Markov decision processes, was presented and implemented on computers. Results and Conclusion: Simulations on shortest-path searching problems show that using adaptive λ in Q-learning based on TTD(λ) can speed up its convergence.
Keywords: dynamic programming, delayed reinforcement learning, absorbing Markov decision processes, temporal difference learning, Q-learning
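The paper's TTD(λ) scheme with adaptive λ is not reproduced here; as a hedged baseline, the sketch below is standard Watkins Q(λ) with accumulating eligibility traces, where an adaptive scheme would choose `lam` per step instead of keeping it constant. The `env_step`/`reset` callables are assumed interfaces.

```python
import numpy as np

def q_lambda(env_step, reset, n_states, n_actions, episodes=500,
             alpha=0.1, gamma=1.0, lam=0.8, eps=0.1, seed=0):
    """Watkins Q(lambda) with accumulating traces (a standard baseline).

    env_step(s, a) -> (s2, r, done); reset() -> s0. gamma=1.0 suits the
    absorbing MDPs the abstract targets.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        E = np.zeros_like(Q)                          # eligibility traces
        s, done = reset(), False
        while not done:
            greedy = int(Q[s].argmax())
            a = int(rng.integers(n_actions)) if rng.random() < eps else greedy
            s2, r, done = env_step(s, a)
            delta = r + gamma * (0.0 if done else Q[s2].max()) - Q[s, a]
            E[s, a] += 1.0
            Q += alpha * delta * E                    # credit along the trace
            E *= gamma * lam if a == greedy else 0.0  # Watkins: cut on explore
            s = s2
    return Q
```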
20. A Safety Optimization Method for Integrated Avionics Systems Based on Deep Reinforcement Learning
Authors: Zhao Changxiao, Li Daojun, Sun Yixuan, Jing Peng, Tian Yi. China Safety Science Journal (CAS, CSCD, PKU Core), 2024, No. 7, pp. 123-131 (9 pages)
To overcome the inability of traditional manual-inspection-based safety design methods to cope with the explosion of candidate hosting schemes brought about by large-scale integration of avionics systems, partition, task, and safety-criticality quantification models of the avionics system are constructed, the safety-aware integrated design optimization problem is cast as a Markov decision process (MDP), and an optimization method based on the soft actor-critic (SAC) algorithm under the actor-critic framework is proposed. To obtain the correlation between SAC parameter choices and training results, the parameter sensitivity of the algorithm is studied. To verify the superiority of the SAC-based method in optimizing safety-aware integrated design, comparative optimization experiments against the deep deterministic policy gradient (DDPG) algorithm and a traditional allocation algorithm are carried out. The results show that, under the best parameter combination, the maximum converged reward of the SAC algorithm improves by nearly 8% over other parameter combinations, while the convergence time shortens by nearly 16.6%. Compared with the DDPG algorithm and the traditional allocation algorithm under the same parameter settings, the maximum improvements in maximum reward, cumulative constraint violation rate, partition risk balancing, partition resource utilization, and solution time are 62%, 74.64%, 83.70%, 21.23%, and 7.75%, respectively.
Keywords: deep reinforcement learning, integrated avionics system, safety, optimization method, Markov decision process (MDP), integrated design