Journal Articles
343 articles found
1. Recorded recurrent deep reinforcement learning guidance laws for intercepting endoatmospheric maneuvering missiles
作者 Xiaoqi Qiu, Peng Lai, Changsheng Gao, Wuxing Jing 《Defence Technology (防务技术)》 SCIE EI CAS CSCD 2024, No. 1, pp. 457-470 (14 pages)
This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to solve the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles under uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck of traditional deep reinforcement learning methods when dealing with POMDPs. Since the detection frequency of an interceptor is usually higher than its guidance frequency, the measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partial-observability problem that this RNN layer causes inside the agent. The training curves show that the proposed RRTD3 improves data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over several conventional guidance laws.
关键词 endoatmospheric interception, missile guidance, reinforcement learning, Markov decision process, recurrent neural networks
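The "twin delayed deep deterministic" (TD3) component named in this abstract rests on two standard ingredients: target policy smoothing and a clipped double-Q critic target. A minimal NumPy sketch of those two pieces, as an illustration only (this is not the authors' RRTD3 code; the function names and shapes are assumptions):

```python
import numpy as np

def smoothed_target_action(mu, noise_std=0.2, noise_clip=0.5, act_limit=1.0, rng=None):
    """Target policy smoothing: add clipped Gaussian noise to the target
    policy's action, then clip the result to the action bounds."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = np.clip(rng.normal(0.0, noise_std, size=np.shape(mu)), -noise_clip, noise_clip)
    return np.clip(mu + eps, -act_limit, act_limit)

def clipped_double_q_target(r, q1_next, q2_next, done, gamma=0.99):
    """TD3 critic target: y = r + gamma * (1 - done) * min(Q1', Q2').
    Taking the minimum of the two target critics curbs value overestimation."""
    return r + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)
```

The "recurrent" and "recorded" parts of RRTD3 (the RNN policy layer and the recording of its hidden states) sit on top of these updates and are specific to the paper.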
2. Multi-Agent Deep Reinforcement Learning for Cross-Layer Scheduling in Mobile Ad-Hoc Networks
作者 Xinxing Zheng, Yu Zhao, Joohyun Lee, Wei Chen 《China Communications》 SCIE CSCD 2023, No. 8, pp. 78-88 (11 pages)
Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to handle congestion in ad-hoc networks with effective algorithms remains open and challenging. In this paper, we focus on enabling congestion control to minimize network transmission delay through flexible power control. To solve the congestion problem effectively, we propose a distributed cross-layer scheduling algorithm empowered by graph-based multi-agent deep reinforcement learning. The transmit power is adaptively adjusted in real time by our algorithm based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low thanks to regional cooperation based on a graph attention network. In the evaluation, we show that our algorithm can reduce the transmission delay of data flows under severe signal interference and drastically changing channel states, and we demonstrate its adaptability and stability in different topologies. The method is general and can be extended to various types of topologies.
关键词 ad-hoc network, cross-layer scheduling, multi-agent deep reinforcement learning, interference elimination, power control, queue scheduling, actor-critic methods, Markov decision process
3. Reinforcement Learning-Based Joint Task Offloading and Migration Schemes Optimization in Mobility-Aware MEC Network (被引量: 7)
作者 Dongyu Wang, Xinqiao Tian, Haoran Cui, Zhaolin Liu 《China Communications》 SCIE CSCD 2020, No. 8, pp. 31-44 (14 pages)
Intelligent edge computing employs the edge devices of the Internet of Things (IoT) for data collection, computation, and intelligent analysis, so that data are analyzed nearby and feedback is provided in a timely manner. Because of the mobility of mobile equipment (ME), if an ME moves out of the coverage of the small cell networks (SCNs), the offloaded tasks cannot be returned to it successfully; as a result, migration incurs additional costs. In this paper, joint task offloading and migration schemes based on reinforcement learning (RL) are proposed for a mobility-aware mobile edge computing (MEC) network to obtain the maximum system revenue. First, a joint optimization problem maximizing the total revenue of the MEs is formulated in view of the mobility-aware MEs. Second, considering time-varying computation tasks and resource conditions, the mixed-integer non-linear programming (MINLP) problem is described as a Markov decision process (MDP). We then propose a novel reinforcement learning-based optimization framework to solve the problem instead of traditional methods. Finally, simulation results show that the proposed schemes noticeably raise the total revenue of the MEs.
关键词 MEC, computation offloading, mobility-aware, migration scheme, Markov decision process, reinforcement learning
4. A guidance method for coplanar orbital interception based on reinforcement learning (被引量: 3)
作者 ZENG Xin, ZHU Yanwei, YANG Leping, ZHANG Chengming 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD 2021, No. 4, pp. 927-938 (12 pages)
This paper investigates a guidance method based on reinforcement learning (RL) for coplanar orbital interception in a continuous low-thrust scenario. The problem is formulated as a Markov decision process (MDP) model, and a well-designed RL algorithm, experience-based deep deterministic policy gradient (EBDDPG), is proposed to solve it. By taking advantage of prior information generated through the optimal control model, the proposed algorithm not only resolves the convergence problem of common RL algorithms but also successfully trains an efficient deep neural network (DNN) controller for the chaser spacecraft to generate the control sequence. Numerical simulation results show that the proposed algorithm is feasible and that the trained DNN controller improves efficiency over traditional optimization methods by roughly two orders of magnitude.
关键词 orbital interception, reinforcement learning (RL), Markov decision process (MDP), deep neural network (DNN)
5. Airport gate assignment problem with deep reinforcement learning (被引量: 3)
作者 Zhao Jiaming, Wu Wenjun, Liu Zhiming, Han Changhao, Zhang Xuanyi, Zhang Yanhua 《High Technology Letters》 EI CAS 2020, No. 1, pp. 102-107 (6 pages)
With the rapid development of air transportation in recent years, airport operations have attracted considerable attention, and the airport gate assignment problem (AGAP) has become a research hotspot. However, real-time AGAP algorithms remain an open issue. In this study, a deep reinforcement learning based AGAP method (DRL-AGAP) is proposed. The optimization objective is to maximize the rate of flights assigned to fixed gates. The real-time AGAP is modeled as a Markov decision process (MDP), with the state space, action space, values, and rewards defined. The DRL-AGAP algorithm is evaluated via simulation and compared with the flight pre-assignment results of the Gurobi optimization solver and a greedy method. Simulation results show that the performance of the proposed DRL-AGAP algorithm is close to that of the pre-assignment obtained by Gurobi, while real-time assignment is ensured thanks to the dynamic modeling and lower complexity of DRL-AGAP.
关键词 airport gate assignment problem (AGAP), deep reinforcement learning (DRL), Markov decision process (MDP)
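The optimization objective above, the rate of flights assigned to fixed gates, can be made concrete with the kind of greedy baseline the paper compares against. A minimal sketch under assumed conventions (flights as (arrival, departure) intervals, first free gate wins; the function name and the scheduling rule are illustrative, not the paper's):

```python
def greedy_gate_assignment(flights, num_gates):
    """Assign each flight, in order of arrival, to the first gate that is
    free at its arrival time; flights with no free gate go unassigned.
    Returns the fraction of flights assigned to fixed gates."""
    gate_free = [0.0] * num_gates          # time each gate becomes free
    assigned = 0
    for arr, dep in sorted(flights):
        for g in range(num_gates):
            if gate_free[g] <= arr:        # gate g is free when the flight arrives
                gate_free[g] = dep
                assigned += 1
                break
    return assigned / len(flights)
```

A DRL agent replaces the fixed "first free gate" rule with a learned state-dependent choice, which is where the reported gains over the greedy baseline come from.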
6. Multi-task Coalition Parallel Formation Strategy Based on Reinforcement Learning (被引量: 6)
作者 JIANG Jian-Guo, SU Zhao-Pin, QI Mei-Bin, ZHANG Guo-Fu 《自动化学报》 EI CSCD PKU Core 2008, No. 3, pp. 349-352 (4 pages)
Agent coalitions are an important form of agent coordination and cooperation. By forming a coalition, agents can enhance their problem-solving ability and obtain more utility. In this paper, a novel multi-task coalition parallel formation strategy is presented, and it is theoretically proved that the process of multi-task coalition formation is a Markov decision process. Moreover, reinforcement learning is used to solve the agents' behavior strategy in parallel multi-task coalition formation, and the formation process is described. In multi-task-oriented domains, the strategy can form multi-task coalitions effectively and in parallel.
关键词 reinforcement learning, multi-task coalition, parallel formation, Markov decision process
7. A Heterogeneous Information Fusion Deep Reinforcement Learning for Intelligent Frequency Selection of HF Communication (被引量: 6)
作者 Xin Liu, Yuhua Xu, Yunpeng Cheng, Yangyang Li, Lei Zhao, Xiaobo Zhang 《China Communications》 SCIE CSCD 2018, No. 9, pp. 73-84 (12 pages)
High-frequency (HF) communication is an essential communication method for military and emergency applications. However, selecting a communication frequency channel is always difficult because of the crowded spectrum, time-varying channels, and malicious intelligent jamming. Existing frequency hopping, automatic link establishment, and newer anti-jamming technologies cannot completely solve these problems. In this article, we adopt deep reinforcement learning to address this intractable challenge. First, the combination of the spectrum state and the channel gain state is defined as the composite environmental state, and the Markov property of the defined state is analyzed and proved. Then, considering that the spectrum state and the channel gain state are heterogeneous information, a new deep Q-network (DQN) framework is designed that contains multiple sub-networks to process the different kinds of information. Finally, to improve learning speed and efficiency, the optimization targets of the corresponding sub-networks are carefully designed, and a heterogeneous information fusion deep reinforcement learning (HIF-DRL) algorithm is proposed for frequency selection. Simulation results show that the proposed algorithm performs well in channel prediction, jamming avoidance, and frequency channel selection.
关键词 frequency selection, communication method, learning speed, information, heterogeneous, fusion, environmental state, HF
8. Price-Based Residential Demand Response Management in Smart Grids: A Reinforcement Learning-Based Approach (被引量: 1)
作者 Yanni Wan, Jiahu Qin, Xinghuo Yu, Tao Yang, Yu Kang 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2022, No. 1, pp. 123-134 (12 pages)
This paper studies price-based residential demand response management (PB-RDRM) in smart grids, involving both non-dispatchable and dispatchable loads (including general loads and plug-in electric vehicles (PEVs)). The PB-RDRM is composed of a bi-level optimization problem, in which the upper-level dynamic retail pricing problem aims to maximize the profit of a utility company (UC) by selecting optimal retail prices (RPs), while the lower-level demand response (DR) problem seeks to minimize the comprehensive cost of the loads by coordinating their energy consumption behavior. The challenges are mainly two-fold: 1) the uncertainty of energy consumption and RPs; 2) the flexible PEVs' temporally coupled constraints, which make it impossible to directly develop a model-based optimization algorithm to solve the PB-RDRM. To address these challenges, we first model the dynamic retail pricing problem as a Markovian decision process (MDP) and then employ a model-free reinforcement learning (RL) algorithm to learn the UC's optimal dynamic RPs according to the loads' responses. The proposed RL-based DR algorithm is benchmarked against two model-based optimization approaches (a distributed dual decomposition-based (DDB) method and a distributed primal-dual interior-point (PDI)-based method), which require exact load and electricity price models. The comparison shows that, relative to the benchmark solutions, the proposed algorithm can not only adaptively decide the RPs through online learning but also achieve larger social welfare within an unknown electricity market environment.
关键词 demand response management (DRM), Markovian decision process (MDP), Monte Carlo simulation, reinforcement learning (RL), smart grid
9. Reinforcement Learning: A Technical Introduction – Part I
作者 Elmar Diederichs 《Journal of Autonomous Intelligence》 2019, No. 2, pp. 25-41 (17 pages)
Reinforcement learning provides a cognitive-science perspective on behavior and sequential decision making, provided that reinforcement learning algorithms introduce a computational concept of agency to the learning problem. Hence it addresses an abstract class of problems that can be characterized as follows: an algorithm confronted with information from an unknown environment is supposed to find, step by step, an optimal way to behave based only on some sparse, delayed, or noisy feedback from an environment that changes according to the algorithm's behavior. Reinforcement learning thus offers an abstraction of the problem of goal-directed learning from interaction. The paper offers an opinionated introduction to the algorithmic advantages and drawbacks of several algorithmic approaches, so as to lay out the design options.
关键词 classical reinforcement learning, Markov decision processes, prediction and adaptive control in unknown environments, algorithmic design
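The abstract class of problems described here is usually formalized as a finite Markov decision process, and the textbook solver when the model is known is value iteration on the Bellman optimality equation. A self-contained sketch on a toy two-state MDP (all names and numbers are illustrative, not from the paper):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[a, s, s'] are transition probabilities, R[a, s] expected rewards.
    Iterates the Bellman optimality backup
    V(s) <- max_a [ R(a, s) + gamma * sum_s' P(s'|s, a) V(s') ]."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)              # Q[a, s], batched over actions
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)   # optimal values and greedy policy
        V = V_new

# Toy MDP: in state 0 one may "stay" (reward 0) or "jump" to absorbing
# state 1 (reward 1); state 1 yields no further reward.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay put
              [[0.0, 1.0], [0.0, 1.0]]])  # action 1: go to state 1
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
V, policy = value_iteration(P, R)          # V[0] = 1.0, policy[0] = 1
```

Reinforcement learning proper, the paper's subject, tackles the same backup when P and R are unknown and only sampled feedback is available.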
10. Reinforcement Learning Algorithm for Solving Load Commitment Problem Considering a General Load Model
作者 Thythodath Parambath Imthias Ahamed, Sayed Danish Maqbool, Nazar Hussain Malik 《Journal of Energy and Power Engineering》 2013, No. 6, pp. 1150-1162 (13 pages)
关键词 reinforcement learning algorithm, load scheduling, model solving, composite load, decision problem, consumer, utility, problem transformation
11. A Comparison of PPO, TD3 and SAC Reinforcement Algorithms for Quadruped Walking Gait Generation
作者 James W. Mock, Suresh S. Muknahallipatna 《Journal of Intelligent Learning Systems and Applications》 2023, No. 1, pp. 36-56 (21 pages)
Deep reinforcement learning (deep RL) has the potential to replace classic robotic controllers. State-of-the-art deep reinforcement learning algorithms such as Proximal Policy Optimization (PPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC), among others, have been investigated for training robots to walk. However, conflicting performance results for these algorithms have been reported in the literature. In this work, we present a performance analysis of these three state-of-the-art algorithms on a constant-velocity walking task for a quadruped. The performance is analyzed by simulating the walking task of a quadruped equipped with the range of sensors present on a physical quadruped robot. Simulations of the three algorithms across a range of sensor inputs and with domain randomization are performed. The strengths and weaknesses of each algorithm for the given task are discussed, and we identify the set of sensors that contributes to the best performance of each algorithm.
关键词 reinforcement learning, machine learning, Markov decision process, domain randomization
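Of the three algorithms compared, PPO's distinguishing piece is its clipped surrogate objective, which limits how far a single update can push the policy away from the one that collected the data. A minimal NumPy sketch of that objective (illustrative only, not the paper's implementation):

```python
import numpy as np

def ppo_clip_objective(ratio, adv, eps=0.2):
    """Clipped surrogate: mean of min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t),
    where r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t) and A_t is the advantage
    estimate. The min makes the clipping a pessimistic bound in both directions."""
    return float(np.mean(np.minimum(ratio * adv,
                                    np.clip(ratio, 1 - eps, 1 + eps) * adv)))
```

TD3 and SAC, by contrast, are off-policy actor-critic methods and do not use a probability-ratio objective, which is one reason their sample-efficiency and stability trade-offs differ on the walking task.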
12. Multi-AGV motion planning based on deep reinforcement learning (被引量: 1)
作者 Sun Hui, Yuan Wei 《计算机集成制造系统》 EI CSCD PKU Core 2024, No. 2, pp. 708-716 (9 pages)
To solve the conflict-free motion planning problem for multiple automated guided vehicles (AGVs) in a mobile-robot warehousing system, a Markov decision process model is established and a new solution method based on a deep Q-network (DQN) is proposed. Taking the AGV positions as input, the DQN estimates the maximum expected cumulative reward obtainable for each action in the current state, and the classic deep Q-learning algorithm is used for training. Computational results show that the method effectively avoids collisions within the AGV fleet, enabling it to complete rack-moving tasks without conflicts. Compared with existing heuristic algorithms, the motion plans obtained by the method require a shorter average makespan.
关键词 multiple automated guided vehicles, motion planning, Markov decision process, deep Q-network, deep Q-learning
13. A real-time object tracking method with template update based on proximal policy optimization
作者 Sun Yuya, Gong Shengrong, Zhong Shan, Zhou Lifan, Fan Li 《计算机工程与设计》 PKU Core 2024, No. 5, pp. 1499-1507 (9 pages)
Siamese-network-based trackers usually take the appearance features of the first frame as a fixed template, which makes it hard to cope with drastic changes in target appearance. To address this, the proposed algorithm introduces deep reinforcement learning on top of the Siamese network, models the template update problem as a Markov decision process, and optimizes it with the proximal policy optimization algorithm, reducing the error accumulation caused by appearance changes. To overcome the small search region of Siamese trackers, which prevents a global search for the target, a global detection module is introduced to recover lost targets. The resulting tracker adaptively updates its template and globally re-detects lost targets. Tests on the OTB and GOT-10k datasets show that, compared with representative methods, the proposed method offers strong real-time performance and high accuracy, and copes well with target deformation and target loss.
关键词 object tracking, deep reinforcement learning, proximal policy optimization, Markov decision process, global detection, template update, Siamese network
14. A Q-learning based joint PC-5/Uu interface offloading strategy for cellular V2X edge computing systems
作者 Feng Weiyang, Lin Siyu, Feng Jingtao, Li Yun, Kong Fanpeng, Ai Bo 《电子学报》 EI CAS CSCD PKU Core 2024, No. 2, pp. 385-395 (11 pages)
Intelligent transportation services such as autonomous driving impose stringent latency requirements; when a vehicle's own computing power is insufficient, it needs surrounding vehicles and roadside edge computing units to help complete its computation tasks. Building on existing offloading strategies for cellular V2X edge computing and considering the characteristic differences between the 5G-NR (Uu) and PC-5 links of the cellular V2X system, this paper proposes a Q-learning based joint PC-5/Uu edge computing offloading strategy. Based on a model of the PC-5 link transmission success probability, an expression for the PC-5 link transmission rate is derived. With minimization of the task processing delay as the objective, and the task vehicle's transmit power and the computing vehicles' computation energy consumption as constraints, a constrained Markov decision process for system delay minimization is constructed. Via the Lagrangian method, the constrained MDP is transformed into an equivalent min-max unconstrained MDP, and Q-learning is used to design the offloading strategy. Simulation results show that, compared with baseline schemes, the proposed algorithm reduces system delay by more than 27.3%.
关键词 cellular V2X, edge computing, constrained Markov decision process, computation offloading, Q-learning
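The Lagrangian step described in this abstract — turning the constrained MDP into an unconstrained min-max problem — typically scalarizes the objective with a multiplier and updates that multiplier by projected subgradient ascent on the constraint violation. A hedged sketch of the two pieces (variable names, step size, and the single-constraint form are assumptions, not the paper's):

```python
def lagrangian_cost(delay, energy, lmbda):
    """Scalarized cost for the unconstrained inner minimization:
    delay + lambda * energy. The agent minimizes this for fixed lambda."""
    return delay + lmbda * energy

def dual_update(lmbda, avg_energy, budget, step=0.05):
    """Projected subgradient ascent on the energy-constraint multiplier:
    lambda <- max(0, lambda + step * (average energy - budget)).
    The multiplier grows while the constraint is violated and decays
    (down to 0) once the energy budget is respected."""
    return max(0.0, lmbda + step * (avg_energy - budget))
```

Alternating the Q-learning inner loop with this outer dual update is the standard primal-dual recipe for constrained MDPs; the paper's min-max formulation is of this shape.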
15. A Q-learning based scheduling method for virtual network functions (被引量: 35)
作者 Wang Xiaolei, Chen Yunjie, Wang Chen, Niu Ben 《计算机工程》 CAS CSCD PKU Core 2019, No. 2, pp. 64-69 (6 pages)
Most existing scheduling methods ignore the virtual machine selection problem that arises when virtual network functions are instantiated. This paper proposes a new virtual network scheduling method. A virtual network function scheduling model based on a Markov decision process is built to minimize the service delay of all service function chains. A Q-learning based dynamic scheduling algorithm is designed to optimize the scheduling order of the virtual network functions together with the virtual machine selection, achieving the shortest network function virtualization scheduling time. Simulation results show that, compared with the traditional random virtual machine selection strategy, the method effectively reduces the scheduling time of virtual network functions; in large-scale networks in particular, the scheduling time can be reduced by about 40%.
关键词 network function virtualization, service function chain, scheduling model, Markov decision process, Q-learning
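The Q-learning backup at the heart of this entry (and several others above) is a one-line update. A generic tabular sketch, not tied to the paper's VNF state/action encoding, which the abstract does not spell out:

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning backup:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    Q maps each state to a list of per-action value estimates."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
    return Q
```

In a scheduling setting like this one, a state would encode which VNFs remain to be placed, an action would pick the next VNF and its virtual machine, and the reward would penalize the incurred service delay.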
16. Distance-based pursuit-evasion strategies: continuous stochastic games with belief states
作者 Chen Lingmin, Feng Yu, Li Yongqiang 《自动化学报》 EI CAS CSCD PKU Core 2024, No. 4, pp. 828-840 (13 pages)
Pursuit-evasion problems are of great practical significance in areas such as confrontation, tracking, and search. Using continuous stochastic games and Markov decision processes (MDPs), this paper studies optimal strategies for a multi-pursuer, single-evader problem in which measured distances are used. In this problem, only the leader of the pursuit group can measure the relative distance to the evader, while the evader has a global view. The solution is split into a pursuit-game process and a Markov decision process. For the pursuit strategy, belief-region states are introduced by partitioning the environment to estimate the evader's position, and the measured distance is used to correct the belief-region state, yielding a continuous stochastic pursuit game over belief-region states; the existence of a stationary Nash equilibrium strategy is proved via a fixed-point theorem. For the evasion strategy, the evader builds a Markov decision process over mixed states from its global information, together with the corresponding optimal Bellman equation. A reinforcement learning based algorithm for solving the stationary pursuit-evasion strategies is given, and a case study verifies its effectiveness.
关键词 pursuit-evasion problem, belief-region state, continuous stochastic game, Markov decision process, reinforcement learning
17. A low-carbon dispatch model for power systems with carbon capture equipment under operating cost constraints
作者 Chen Zhengping, Han Ye, Sun Lei, Li Wenzhong, Wang Fangdong, Cui Chen, Zhang Zhaogong 《黑龙江大学自然科学学报》 CAS 2024, No. 1, pp. 38-47 (10 pages)
Developing low-carbon power systems is fundamental to meeting the challenge of global warming, so making full use of carbon capture equipment and effectively solving the dispatch problem of power systems that contain it is crucial. For the low-carbon dispatch problem with carbon capture equipment, a joint dispatch model of wind power, energy storage, and thermal units is first established; with system operating cost as the optimization objective, this model yields the optimal system cost in the scenario without carbon capture equipment. On this basis, a low-carbon optimal dispatch model of wind power, energy storage, carbon capture, and thermal units is built, taking the optimal cost of the no-carbon-capture scenario as one of its constraints and reduction of carbon dioxide emissions as the dispatch objective. Since the low-carbon dispatch problem in this scenario can be viewed as a Markov decision process, it is solved with a deep reinforcement learning model. Experimental results show that the proposed low-carbon dispatch model with carbon capture equipment not only effectively controls the total system operating cost but also further reduces carbon dioxide emissions.
关键词 carbon capture, low-carbon dispatch, Markov decision process, deep reinforcement learning
18. An IRS-assisted data collection optimization scheme for UAV wireless sensor networks
作者 Jia Xiangdong, Zhang Xin, Yuan Shuaiqian, Li Yue 《信号处理》 CSCD PKU Core 2024, No. 6, pp. 1041-1051 (11 pages)
To address the low information timeliness and high system energy consumption caused by massive data collection in wireless sensor networks, an intelligent reflecting surface (IRS) assisted unmanned aerial vehicle (UAV) data collection optimization scheme is proposed. Multiple ground sensors with buffers collect environmental information, and an energy-constrained UAV collects the sensors' status updates with the assistance of the IRS. By jointly considering the freshness of system information and the UAV's propulsion energy consumption, the UAV's 3D flight trajectory, the ground sensor scheduling, and the IRS configuration are jointly optimized, and a weighted-sum optimization problem over the average age of information (AoI) and the propulsion energy is formulated. The non-convex problem is then modeled as a Markov decision process, and a deep reinforcement learning algorithm is used to train the UAV data collection process in a Manhattan-style simulated urban environment, yielding the optimized 3D flight trajectory and IRS configuration. Simulation results show that the proposed algorithm improves information freshness while effectively reducing system energy consumption; with the same number of IRS reflecting elements, system performance improves by up to about 50.64% over the baseline schemes, demonstrating the superiority of the proposed data collection scheme and the effectiveness of the IRS in improving system performance.
关键词 age of information, UAV, 3D trajectory optimization, intelligent reflecting surface, deep reinforcement learning
19. A deep reinforcement learning based real-time voltage control strategy for active distribution networks
作者 Chen Xiaoxiao, Zhou Yunhai, Zhang Taiyuan, Zheng Peicheng 《三峡大学学报(自然科学版)》 PKU Core 2024, No. 1, pp. 76-84 (9 pages)
Large-scale integration of distributed photovoltaics poses challenges for distribution network voltage control. To address the inability of droop control to coordinate individual PV inverters, this paper proposes a real-time voltage control strategy for active distribution networks based on multi-agent deep reinforcement learning. The strategy recasts the physical voltage control model of the active distribution network as a decentralized partially observable Markov decision process and trains the agents with the multi-agent twin delayed deep deterministic policy gradient algorithm, achieving coordinated voltage control of the PV inverters under a centralized-training, decentralized-execution framework. The strategy requires no precise physical model of the distribution network: each PV inverter acts as an agent in the reinforcement learning environment and learns the optimal control policy through interaction with it, so the strategy can cope with the stochastic variations of sources and loads and carry out voltage control in real time. Validation on a modified IEEE 33-node case shows that the proposed strategy offers good voltage regulation and loss reduction performance.
关键词 distribution network voltage control, distributed photovoltaics, multi-agent deep reinforcement learning, data-driven, Markov decision process
20. Research on optimizing neural network fuzzing with reinforcement learning algorithms
作者 Zhang Yuhao, Guan Xin 《计算机测量与控制》 2024, No. 3, pp. 131-137 (7 pages)
In the test sample generation stage, existing neural network fuzzing techniques usually mutate the initial samples randomly, producing low-quality samples and hence low test coverage. To address this, a neural network fuzzing technique based on reinforcement learning is proposed, which models the fuzzing process as a Markov decision process. In this model, test samples are regarded as environment states, the different mutation methods as the available action space, and neuron coverage as the reward feedback. A reinforcement learning algorithm is used to learn the optimal mutation policy, guiding the generation of test samples that attain the highest neuron coverage. Comparative experiments with mainstream neural network fuzzing methods show that the reinforcement learning based technique improves neuron coverage at different granularities.
关键词 fuzzing, neural network, reinforcement learning, Markov decision process, reward function