Journal Articles
68 articles found
1. Transformer-Aided Deep Double Dueling Spatial-Temporal Q-Network for Spatial Crowdsourcing Analysis
Authors: Yu Li, Mingxiao Li, Dongyang Ou, Junjie Guo, Fangyuan Pan. Computer Modeling in Engineering & Sciences, SCIE EI, 2024, Issue 4, pp. 893-909 (17 pages)
With the rapid development of mobile Internet, spatial crowdsourcing has become more and more popular. Spatial crowdsourcing consists of many different types of applications, such as spatial crowd-sensing services. In terms of spatial crowd-sensing, it collects and analyzes traffic sensing data from clients like vehicles and traffic lights to construct intelligent traffic prediction models. Besides collecting sensing data, spatial crowdsourcing also includes spatial delivery services like DiDi and Uber. Appropriate task assignment and worker selection dominate the service quality for spatial crowdsourcing applications. Previous research conducted task assignments via traditional matching approaches or using simple network models. However, advanced mining methods are lacking to explore the relationship between workers, task publishers, and the spatio-temporal attributes of tasks. Therefore, in this paper, we propose a Deep Double Dueling Spatial-temporal Q-Network (D3SQN) to adaptively learn the spatial-temporal relationship between tasks, task publishers, and workers in a dynamic environment to achieve optimal allocation. Specifically, D3SQN is revised through reinforcement learning by adding a spatial-temporal transformer that can estimate the expected state values and action advantages so as to improve the accuracy of task assignments. Extensive experiments are conducted over real data collected from DiDi and ELM, and the simulation results verify the effectiveness of our proposed models.
Keywords: historical behavior analysis; spatial crowdsourcing; deep double dueling Q-networks
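For reference, the dueling decomposition this abstract relies on (separate state-value and action-advantage estimates recombined into Q-values) is conventionally written with a mean-centered advantage term. This is the standard form from the dueling-architecture literature, not an equation taken from this paper:

$$Q(s,a) = V(s) + \Big( A(s,a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s,a') \Big)$$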
2. A Traffic Signal Control Method Based on Dueling Double DQN
Authors: 叶宝林, 陈栋, 刘春元, 陈滨, 吴维敏. 《计算机测量与控制》, 2024, Issue 7, pp. 154-161 (8 pages)
To improve intersection throughput and alleviate traffic congestion by mining the deep latent features contained in traffic state information, a Dueling Double DQN (D3QN)-based traffic signal control method for a single intersection is proposed. A traffic signal control model based on deep reinforcement learning with Double DQN (DDQN) is constructed, which optimizes the iterative computation of the estimated and target action-value functions and overcomes the slow convergence of DQN-based traffic signal control models. A new Dueling Network is designed to decouple the value of the traffic state from that of the phase actions, strengthening the ability of Double DQN (DDQN) to extract deep feature information. A single-intersection simulation framework and environment are built on the microscopic simulation platform SUMO for testing. Simulation results show that, compared with traditional traffic signal control methods and DQN-based deep reinforcement learning methods, the proposed method effectively reduces the average vehicle waiting time, average queue length, and average number of stops, and markedly improves intersection throughput.
Keywords: traffic signal control; deep reinforcement learning; Dueling Double DQN; Dueling Network
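As a hedged sketch of the Double DQN target update described above (illustrative only; the network objects and tensor shapes are assumptions, not the paper's code), the online network selects the greedy next action while the target network evaluates it:

```python
import torch

def double_dqn_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """Double DQN target: decouple action selection from action evaluation."""
    with torch.no_grad():
        # Online network picks the greedy next action ...
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)
        # ... and the target network evaluates it, curbing overestimation.
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```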
3. UAV Path Planning Based on Dueling DQN in Unknown Environments
Authors: 赵恬恬, 孔建国, 梁海军, 刘晨宇. 《现代计算机》, 2024, Issue 5, pp. 37-43 (7 pages)
To effectively solve the path planning problem for UAVs in unknown environments, a Dueling DQN-based path planning method is proposed. First, a dueling network architecture is introduced on top of DQN to improve the success rate. Second, the state space is designed, and discretized actions and an appropriate reward function are defined to guide the UAV toward learning the optimal path. Finally, DQN and Dueling DQN are trained in a simulation environment. The results show that: (1) Dueling DQN can plan a collision-free path from the start point to the goal in an unknown environment and obtains a higher reward; (2) after 50,000 training episodes, the success rate of Dueling DQN is 17.71% higher than that of DQN, the collision rate is 1.57% lower, and the rate of exceeding the maximum step length is 16.14% lower.
Keywords: UAV; path planning; deep reinforcement learning; Dueling DQN algorithm
4. Reinforcement Learning with an Ensemble of Binary Action Deep Q-Networks
Authors: A. M. Hafiz, M. Hassaballah, Abdullah Alqahtani, Shtwai Alsubai, Mohamed Abdel Hameed. Computer Systems Science & Engineering, SCIE EI, 2023, Issue 9, pp. 2651-2666 (16 pages)
With the advent of Reinforcement Learning (RL) and its continuous progress, state-of-the-art RL systems have come up for many challenging real-world tasks. Given the scope of this area, various techniques are found in the literature. One such notable technique, Multiple Deep Q-Network (DQN) based RL systems, uses multiple DQN-based entities which learn together and communicate with each other. In such a scheme, the learning has to be distributed wisely among all entities and the inter-entity communication protocol has to be carefully designed. As more complex DQNs come to the fore, the overall complexity of these multi-entity systems has increased manyfold, leading to issues like difficulty in training, need for high resources, more training time, and difficulty in fine-tuning, which in turn leads to performance issues. Taking a cue from the parallel processing found in nature and its efficacy, we propose a lightweight ensemble-based approach for solving core RL tasks. It uses multiple binary-action DQNs having shared state and reward. The benefits of the proposed approach are overall simplicity, faster convergence, and better performance compared to conventional DQN-based approaches. The approach can potentially be extended to any type of DQN by forming its ensemble. Conducting extensive experimentation, promising results are obtained using the proposed ensemble approach on OpenAI Gym tasks and Atari 2600 games as compared to recent techniques. The proposed approach gives a state-of-the-art score of 500 on the CartPole-v1 task, 259.2 on the LunarLander-v2 task, and state-of-the-art results on four out of five Atari 2600 games.
Keywords: deep Q-networks; ensemble learning; reinforcement learning; OpenAI Gym environments
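The abstract does not spell out how the binary-action members coordinate, so the following is one plausible reading given as a hedged sketch: each member scores "take"/"skip" for a single primitive action over the shared state, and the action whose "take" value is highest is executed. All names and sizes are assumptions:

```python
import torch
import torch.nn as nn

class BinaryActionDQN(nn.Module):
    """One ensemble member: Q-values for 'skip' vs. 'take' of one primitive action."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # index 0: skip, index 1: take
        )

    def forward(self, state):
        return self.net(state)

def ensemble_act(members, state):
    # Every member sees the same shared state; execute the action whose
    # 'take' Q-value is highest across the ensemble.
    take_q = torch.stack([m(state)[..., 1] for m in members], dim=-1)
    return take_q.argmax(dim=-1)
```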
5. UAV Autonomous Navigation for Wireless Powered Data Collection with Onboard Deep Q-Network
Authors: LI Yuting, DING Yi, GAO Jiangchuan, LIU Yusha, HU Jie, YANG Kun. ZTE Communications, 2023, Issue 2, pp. 80-87 (8 pages)
In a rechargeable wireless sensor network, utilizing an unmanned aerial vehicle (UAV) as a mobile base station (BS) to charge sensors and collect data effectively prolongs the network's lifetime. In this paper, we jointly optimize the UAV's flight trajectory and the sensor selection and operation modes to maximize the average data traffic of all sensors within a wireless sensor network (WSN) during the UAV's finite flight time, while ensuring the energy required by each sensor through wireless power transfer (WPT). We consider a practical scenario where the UAV has no prior knowledge of sensor locations. The UAV performs autonomous navigation based on the status information obtained within the coverage area, which is modeled as a Markov decision process (MDP). The deep Q-network (DQN) is employed to execute the navigation based on the UAV position, the battery level state, channel conditions, and current data traffic of sensors within the UAV's coverage area. Our simulation results demonstrate that the DQN algorithm significantly improves the network performance in terms of average data traffic and trajectory design.
Keywords: unmanned aerial vehicle; wireless power transfer; deep Q-network; autonomous navigation
6. Associative Tasks Computing Offloading Scheme in Internet of Medical Things with Deep Reinforcement Learning
Authors: Jiang Fan, Qin Junwei, Liu Lei, Tian Hui. China Communications, SCIE CSCD, 2024, Issue 4, pp. 38-52 (15 pages)
The Internet of Medical Things (IoMT) is regarded as a critical technology for intelligent healthcare in the foreseeable 6G era. Nevertheless, due to the limited computing capability of edge devices and task-related coupling relationships, IoMT faces unprecedented challenges. Considering the associative connections among tasks, this paper proposes a computation offloading policy for multiple user devices (UDs) that exploits device-to-device (D2D) communication and a multi-access edge computing (MEC) technique under an IoMT scenario. Specifically, to minimize the total delay and energy consumption with respect to the requirements of IoMT, we first analyze and model the detailed local execution, MEC execution, D2D execution, and associated-task offloading exchange models. Consequently, the associated-task offloading scheme for multiple UDs is formulated as a mixed-integer nonconvex optimization problem. Considering the advantages of deep reinforcement learning (DRL) in processing tasks with coupling relationships, a Double DQN based associative tasks computing offloading (DDATO) algorithm is then proposed to obtain the optimal solution, which can make the best offloading decision under the condition that the tasks of UDs are associative. Furthermore, to reduce the complexity of the DDATO algorithm, a cache-aided procedure is intentionally introduced before the data training process, which avoids redundant offloading and computing procedures for tasks that have already been cached by other UDs. In addition, we use a dynamic ε-greedy strategy in the action selection section of the algorithm, thus preventing the algorithm from falling into a locally optimal solution. Simulation results demonstrate that, compared with other existing methods for associative task models with different structures in the IoMT network, the proposed algorithm can lower the total cost more effectively and efficiently while also providing a tradeoff between delay and energy consumption tolerance.
Keywords: associative tasks; cache-aided procedure; double deep Q-network; Internet of Medical Things (IoMT); multi-access edge computing (MEC)
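The dynamic ε-greedy strategy is not detailed in the abstract; a common exponential-annealing variant, given here as a hedged sketch with assumed hyperparameters, looks like this:

```python
import math
import random

def epsilon_by_step(step, eps_start=1.0, eps_end=0.05, decay=5000.0):
    """Anneal the exploration rate from eps_start toward eps_end over training."""
    return eps_end + (eps_start - eps_end) * math.exp(-step / decay)

def select_action(q_values, step):
    # Explore with probability epsilon, otherwise act greedily on the Q estimates.
    if random.random() < epsilon_by_step(step):
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```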
7. A Robot Obstacle Avoidance Method Based on an Improved Dueling Network (Cited: 5)
Authors: 周翼, 陈渤. 《西安电子科技大学学报》, EI CAS CSCD 北大核心, 2019, Issue 1, pp. 46-50, 63 (6 pages)
To address the shortcomings of traditional reinforcement learning methods in motion planning, especially for robot obstacle avoidance, such as susceptibility to overestimation and difficulty adapting to complex environments, a new deep reinforcement learning model is proposed to improve robot obstacle avoidance performance. The model combines the dueling network architecture with the traditional Q-learning algorithm, and uses two independently trained dueling networks to process environment data and predict action values. At the output layer, the networks output the state value and action advantage separately and combine the two into the final action value. The model can handle higher-dimensional data to adapt to complex, changing environments, and outputs advantageous actions for the robot to select so as to obtain a higher cumulative reward. Experimental results show that the new model effectively improves robot obstacle avoidance performance.
Keywords: robot obstacle avoidance; deep reinforcement learning; dueling network; independent training
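A hedged PyTorch sketch of the dueling head described above: shared features split into a state-value stream and an advantage stream that are recombined into action values. Layer sizes are assumptions, and the mean-centered aggregation is the standard choice rather than anything this paper specifies:

```python
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                # state value V(s)
        self.advantage = nn.Linear(hidden, num_actions)  # advantages A(s, a)

    def forward(self, state):
        h = self.feature(state)
        v, a = self.value(h), self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```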
8. Convolutional Neural Network-Based Deep Q-Network (CNN-DQN) Resource Management in Cloud Radio Access Network (Cited: 1)
Authors: Amjad Iqbal, Mau-Luen Tham, Yoong Choon Chang. China Communications, SCIE CSCD, 2022, Issue 10, pp. 129-142 (14 pages)
The recent surge of mobile subscribers and user data traffic has accelerated the telecommunication sector towards the adoption of fifth-generation (5G) mobile networks. The cloud radio access network (CRAN) is a prominent framework in the 5G mobile network that meets the above requirements by deploying low-cost, intelligent, multiple distributed antennas known as remote radio heads (RRHs). However, achieving the optimal resource allocation (RA) in CRAN using the traditional approach is still challenging due to the complex structure. In this paper, we introduce the convolutional neural network-based deep Q-network (CNN-DQN) to balance energy consumption and guarantee the user quality of service (QoS) demand in downlink CRAN. We first formulate the Markov decision process (MDP) for energy efficiency (EE) and build a 3-layer CNN to capture the environment features as an input state space. We then use the DQN to turn the RRHs on/off dynamically based on the user QoS demand and energy consumption in the CRAN. Finally, we solve the RA problem based on the user constraints and transmit power to guarantee the user QoS demand and maximize the EE with a minimum number of active RRHs. In the end, we conduct a simulation to compare our proposed scheme with the Nature DQN and the traditional approach.
Keywords: energy efficiency (EE); Markov decision process (MDP); convolutional neural network (CNN); cloud RAN; deep Q-network (DQN)
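A hedged sketch in the spirit of the 3-layer CNN state encoder feeding a DQN head described above; channel counts, the grid-shaped state tensor, and the discrete action encoding for RRH on/off switching are all assumptions:

```python
import torch
import torch.nn as nn

class CNNDQN(nn.Module):
    """3-layer CNN over a grid-shaped network-state tensor, then a Q-value head."""
    def __init__(self, in_channels, grid, num_actions):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Linear(32 * grid * grid, num_actions)

    def forward(self, state):
        return self.head(self.encoder(state))

# Example: Q-values over 16 assumed on/off switching actions for a 10x10 state grid.
net = CNNDQN(in_channels=4, grid=10, num_actions=16)
q = net(torch.zeros(1, 4, 10, 10))
```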
9. Manipulator Pick-and-Place Control Based on Dueling Network and RRT (Cited: 2)
Authors: 王永, 李金泽. 《机床与液压》, 北大核心, 2021, Issue 17, pp. 59-64 (6 pages)
To overcome the limitations of current robotic manipulator grasping and placing, namely fixed motion patterns, single-purpose commands, and difficulty coping with complex unknown situations, a manipulator pick-and-place control method based on deep reinforcement learning and RRT is proposed. The method treats the object pick-and-place problem as a Markov process. It achieves autonomous grasping of unknown objects through a description of object features in the field of view and an improved deep reinforcement learning algorithm, Dueling Network; after key-point selection, the RRT algorithm places the object accurately at the target position according to task requirements. Experimental results show that the method is simple and effective: the manipulator grasps and places objects autonomously and flexibly, further enhancing its ability to handle unknown objects and meeting the requirements of diverse pick-and-place tasks.
Keywords: robotic manipulator; deep reinforcement learning; Dueling Network; RRT; pick-and-place control
10. Walking Stability Control Method for Biped Robot on Uneven Ground Based on Deep Q-Network
Authors: Baoling Han, Yuting Zhao, Qingsheng Luo. Journal of Beijing Institute of Technology, EI CAS, 2019, Issue 3, pp. 598-605 (8 pages)
A gait control method for a biped robot based on the deep Q-network (DQN) algorithm is proposed to enhance the stability of walking on uneven ground. This control strategy is an intelligent learning method of posture adjustment. The robot is taken as an agent and trained to walk steadily on an uneven surface with obstacles, using a simple reward function based on forward progress. The reward-punishment (RP) mechanism of the DQN algorithm is established after obtaining the offline gait, which is generated in advance by foot trajectory planning. Instead of implementing a complex dynamic model, the proposed method enables the biped robot to learn to adjust its posture on uneven ground and ensures walking stability. The performance and effectiveness of the proposed algorithm were validated in the V-REP simulation environment. The results demonstrate that the biped robot's lateral tilt angle is less than 3° after implementing the proposed method and that walking stability is obviously improved.
Keywords: deep Q-network (DQN); biped robot; uneven ground; walking stability; gait control
11. Multi-Agent Deep Q-Networks for Efficient Edge Federated Learning Communications in Software-Defined IoT
Authors: Prohim Tam, Sa Math, Ahyoung Lee, Seokhoon Kim. Computers, Materials & Continua, SCIE EI, 2022, Issue 5, pp. 3319-3335 (17 pages)
Federated learning (FL) activates distributed on-device computation techniques to model better algorithm performance through the interaction of local model updates and global model distributions in aggregation-averaging processes. However, in large-scale heterogeneous Internet of Things (IoT) cellular networks, massive multi-dimensional model update iterations and resource-constrained computation are challenging aspects to be tackled significantly. This paper introduces a system model converging software-defined networking (SDN) and network functions virtualization (NFV) to enable device/resource abstractions and provide NFV-enabled edge FL (eFL) aggregation servers for advancing automation and controllability. Multi-agent deep Q-networks (MADQNs) target to enforce a self-learning softwarization, optimize resource allocation policies, and advocate computation offloading decisions. With gathered network conditions and resource states, the proposed agent aims to explore various actions for estimating expected long-term rewards in a particular state observation. In the exploration phase, optimal actions for joint resource allocation and offloading decisions in different possible states are obtained by maximum Q-value selection. An action-based virtual network function (VNF) forwarding graph (VNFFG) is orchestrated to map VNFs onto an eFL aggregation server with sufficient communication and computation resources in the NFV infrastructure (NFVI). The proposed scheme indicates deficient allocation actions, modifies the VNF backup instances, and reallocates the virtual resources for the exploitation phase. A deep neural network (DNN) is used as a value function approximator, and an epsilon-greedy algorithm balances exploration and exploitation. The scheme primarily considers the criticalities of FL model services and congestion states to optimize the long-term policy. Simulation results presented the outperformance of the proposed scheme over reference schemes in terms of Quality of Service (QoS) performance metrics, including packet drop ratio, packet drop counts, packet delivery ratio, delay, and throughput.
Keywords: deep Q-networks; federated learning; network functions virtualization; quality of service; software-defined networking
12. A Deep Reinforcement Learning Stock Trading Strategy Incorporating Behavior Cloning
Authors: 杨兴雨, 陈亮威, 郑萧腾, 张永. 《系统管理学报》, CSCD 北大核心, 2024, Issue 1, pp. 150-161 (12 pages)
To increase the return and reduce the risk of stock investment, the behavior cloning idea from imitation learning is introduced into a deep reinforcement learning framework to design stock trading strategies. In the strategy design, the dueling DQN deep reinforcement learning algorithm is combined with behavior cloning, so that the agent imitates the decisions of a pre-constructed investment expert while exploring autonomously. Numerical experiments on stocks from different industries show that the designed trading strategy outperforms the baseline strategies on return and risk metrics such as annualized return, Sharpe ratio, and Calmar ratio. The results indicate that combining imitation learning with deep reinforcement learning gives the agent both exploration and imitation capabilities, thereby improving the generalization ability of the model and the applicability of the strategy.
Keywords: stock trading strategy; deep reinforcement learning; imitation learning; behavior cloning; dueling deep Q-network
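One common way to combine the two learning signals, shown as a hedged sketch rather than the paper's formulation, is to add a behavior cloning term that pushes the network's greedy action toward the expert's action on expert-labeled states; the loss weight is an assumed hyperparameter:

```python
import torch.nn.functional as F

def bc_dqn_loss(q_net, states, expert_actions, td_loss, bc_weight=0.5):
    """Total loss = TD loss + lambda * behavior cloning loss."""
    q_values = q_net(states)
    # Treating Q-values as logits encourages the expert action to be the argmax.
    bc_loss = F.cross_entropy(q_values, expert_actions)
    return td_loss + bc_weight * bc_loss
```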
13. A Multi-Microgrid Energy Management Strategy Based on an Improved Federated Dueling Deep Q-Network
Authors: 黎海涛, 刘伊然, 杨艳红, 肖浩, 谢冬雪, 裴玮. 《电力系统自动化》, EI CSCD 北大核心, 2024, Issue 8, pp. 174-184 (11 pages)
Existing research on microgrid (MG) energy management based on federated deep reinforcement learning does not consider multi-type energy conversion or inter-MG electricity trading, and the frequent exchange of model parameters incurs large communication delays. Taking an MG containing multiple energy types (wind, solar, electricity, and gas) as the research object, this paper constructs an energy management model that supports inter-MG electricity trading and intra-MG energy conversion, proposes a federated dueling deep Q-network learning algorithm based on the sine-cosine algorithm, and designs a multi-MG energy management and optimization strategy that accounts for energy trading and conversion. Simulation results show that the proposed strategy obtains higher rewards and maximizes MG economic benefits while protecting data privacy, and also reduces communication delay.
Keywords: microgrid (MG); federated learning; dueling deep Q-network; sine-cosine algorithm; energy management
14. Intelligent Emergency Generator-Tripping Decisions Based on Knowledge Fusion and Deep Reinforcement Learning
Authors: 李舟平, 曾令康, 姚伟, 胡泽, 帅航, 汤涌, 文劲宇. 《中国电机工程学报》, EI CSCD 北大核心, 2024, Issue 5, pp. 1675-1687, I0001 (14 pages)
Emergency control is an important means of maintaining the transient security and stability of a power system after a severe fault. The offline, human-in-the-loop approach commonly used to formulate emergency control decisions suffers from low efficiency and heavy reliance on expert experience. This paper proposes an intelligent emergency generator-tripping decision method based on knowledge fusion and deep reinforcement learning (DRL). First, a DRL-based framework for emergency generator-tripping decisions is constructed. Then, since the high-dimensional decision space arising when the agent handles multiple generators makes training difficult, two solutions are proposed: decision-space compression and a branching dueling Q (BDQ) network. Next, to further improve the agent's exploration efficiency and decision quality, knowledge and experience related to emergency generator tripping are fused into agent training. Finally, simulation results on the 10-machine 39-bus system show that the proposed method quickly produces effective generator-tripping decisions for multiple generators; the BDQ network yields better decision performance than decision-space compression, and the knowledge fusion strategy guides the agent away from ineffective exploration, further improving decision performance.
Keywords: emergency generator-tripping decision; deep reinforcement learning; decision space; branching dueling Q-network; knowledge fusion
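A hedged sketch of a branching dueling Q (BDQ) head in the standard formulation: a shared trunk, a common state value, and one advantage branch per action dimension (e.g., per generator), so the number of outputs grows linearly rather than combinatorially with the action dimensions. Sizes and branch count are illustrative assumptions:

```python
import torch.nn as nn

class BranchingDuelingQ(nn.Module):
    def __init__(self, state_dim, num_branches, actions_per_branch, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)  # shared state value V(s)
        # One advantage head per action dimension.
        self.branches = nn.ModuleList(
            nn.Linear(hidden, actions_per_branch) for _ in range(num_branches)
        )

    def forward(self, state):
        h = self.trunk(state)
        v = self.value(h)
        qs = []
        for branch in self.branches:
            a = branch(h)
            # Per branch d: Q_d(s, a_d) = V(s) + A_d(s, a_d) - mean_a' A_d(s, a')
            qs.append(v + a - a.mean(dim=1, keepdim=True))
        return qs
```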
15. Storage Location Optimization for Returned Goods in Automated Storage and Retrieval Systems and Its Solution Algorithm
Authors: 何在祥, 李丽, 张云峰, 郗琳. 《重庆理工大学学报(自然科学)》, CAS 北大核心, 2024, Issue 3, pp. 183-194 (12 pages)
To address the problem of returning leftover goods to storage during outbound operations in an automated storage and retrieval system (AS/RS), a storage location optimization model for returned goods is established, with minimization of the stacker crane's total operating energy consumption as the objective and storage location assignment as the decision variable, and a deep reinforcement learning based optimization framework is proposed. Within this framework, a multi-dimensional state is built from the warehouse's real-time storage information and outbound operation information, actions are defined as storage location selections, and a Markov decision process model of the problem is established. The multi-dimensional state features are fed into a two-stream dueling network, and the dueling double deep Q-network (D3QN) algorithm is used to train the network model and predict the target value of return actions, thereby determining the agent's optimal policy. Experimental results show that the D3QN algorithm is stable in solving large-scale storage location optimization problems for returned goods.
Keywords: automated storage and retrieval system; storage location optimization for returned goods; deep reinforcement learning; D3QN
16. An Economic Dispatch Method for AC/DC Distribution Networks Based on Dual-Agent Deep Reinforcement Learning
Authors: 赵倩宇, 韩照洋, 王守相, 尹孜阳, 董逸超, 钱广超. 《天津大学学报(自然科学与工程技术版)》, EI CAS CSCD 北大核心, 2024, Issue 6, pp. 624-632 (9 pages)
With the integration of large numbers of DC sources and loads, hybrid AC/DC distribution networks have become the development trend of future distribution networks. However, source-load uncertainty and the diversity of dispatchable devices pose great challenges for distribution network dispatch. This paper proposes an AC/DC distribution network dispatch method based on dual-agent deep reinforcement learning with a branching dueling Q-network (BDQ) and a soft actor-critic (SAC). The method first maps the economic dispatch problem to the actions, rewards, and states of the two agents, establishing a Markov decision process for economic dispatch, and sets up two agents based on the BDQ and SAC methods respectively, where the BDQ agent controls the discrete-action devices in the network and the SAC agent controls the continuous-action devices. Then, through centralized training with decentralized execution, the two agents interact with the environment and are trained offline. Finally, the agents' parameters are fixed for online dispatch. The advantage of this method is that the two agents can simultaneously control the discrete-action devices (capacitor banks and on-load tap-changing transformers) and the continuous-action devices (converters and energy storage), while centralized training of both agents adapts to source-load uncertainty. Tests on a modified IEEE 33-bus AC/DC distribution network verify the effectiveness of the proposed method.
Keywords: AC/DC distribution network; deep reinforcement learning; economic dispatch; branching dueling Q-network; soft actor-critic
17. AUV Path Planning Based on Deep Reinforcement Learning
Authors: 房鹏程, 周焕银, 董玫君. 《机床与液压》, 北大核心, 2024, Issue 9, pp. 134-141 (8 pages)
For the path planning problem of autonomous underwater vehicles (AUVs) in 3D marine environments, traditional path planning algorithms search slowly in 3D space, depend heavily on the environment, and must re-plan whenever the environment changes, failing to meet real-time requirements. To enable the AUV to learn scenes autonomously and make decisions, an improved Dueling DQN algorithm is proposed that modifies the conventional network structure to fit AUV path planning scenarios. In addition, to address the difficulty of finding target points in 3D space, an experience distillation replay buffer is proposed on top of the original prioritized experience replay buffer, letting the agent learn from failure experiences to improve the convergence speed and stability of the model in the early training stage. Simulation results show that the proposed algorithm achieves better real-time performance and shorter planned paths than traditional path planning algorithms, and outperforms the standard DQN in both convergence speed and stability.
Keywords: autonomous underwater vehicle (AUV); 3D path planning; deep reinforcement learning; Dueling DQN algorithm
18. A Deep Reinforcement Learning Method for Multi-Stage Equipment Development Planning in Uncertain Environments
Authors: LIU Peng, XIA Boyuan, YANG Zhiwei, LI Jichao, TAN Yuejin. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2022, Issue 6, pp. 1159-1175 (17 pages)
Equipment development planning (EDP) is usually a long-term process often performed in an environment with high uncertainty. Traditional multi-stage dynamic programming cannot cope with this kind of uncertainty and its unpredictable situations. To deal with this problem, a multi-stage EDP model based on a deep reinforcement learning (DRL) algorithm is proposed to respond quickly to any environmental change within a reasonable range. Firstly, the basic problem of multi-stage EDP is described, and a mathematical planning model is constructed. Then, for two kinds of uncertainty (future capability requirements and the amount of investment in each stage), a corresponding DRL framework is designed to define the environment, state, action, and reward function for multi-stage EDP. After that, the dueling deep Q-network (Dueling DQN) algorithm is used to solve the multi-stage EDP and generate an approximately optimal multi-stage equipment development scheme. Finally, a case of ten kinds of equipment in 100 randomly generated possible environments is used to test the feasibility and effectiveness of the proposed models. The results show that the algorithm can respond instantaneously in any state of the multi-stage EDP environment and, unlike traditional algorithms, does not need to re-optimize the problem for any change in the environment. In addition, the algorithm can flexibly adjust at subsequent planning stages in the event of a change to the equipment capability requirements, so as to adapt to the new requirements.
Keywords: equipment development planning (EDP); multi-stage; reinforcement learning; uncertainty; dueling deep Q-network (Dueling DQN)
19. An Intelligent Optimization Method for Coal Terminal Unloading Scheduling Based on an Improved D3QN
Authors: 秦保新, 张羽霄, 吴思锐, 曹卫冲, 李湛. 《系统仿真学报》, CAS CSCD 北大核心, 2024, Issue 3, pp. 770-781 (12 pages)
Intelligent scheduling can improve the operating efficiency of large ports and is an important direction for applying artificial intelligence in smart-port scenarios. The unloading scheduling task at a coal terminal is abstracted as a Markov sequential decision problem, and a deep reinforcement learning model of the problem is established. Given the model's high-dimensional action space with sparse feasible actions, an improved D3QN algorithm is proposed to achieve intelligent optimization of unloading scheduling decisions. Simulation results show that, for the same random task sequence, the optimized scheduling policy clearly improves efficiency over a random policy. Moreover, applying the trained policy to newly generated random task sequences yields a 5%-7% improvement in scheduling efficiency, indicating good generalization. In addition, as decision models grow more complex, traditional heuristic optimization algorithms face prominent difficulties in modeling and low solution efficiency; the proposed algorithm offers a new approach for this class of problems and promises wider application of deep reinforcement learning based intelligent decision-making in port scheduling tasks.
Keywords: terminal unloading scheduling; scheduling policy optimization; intelligent decision-making; deep reinforcement learning; Dueling Double DQN algorithm
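The abstract highlights a high-dimensional action space with sparse feasible actions; one standard remedy, shown here as a hedged sketch and not necessarily the paper's specific improvement, is to mask infeasible actions before greedy selection:

```python
import torch

def masked_greedy_action(q_values, feasible_mask):
    """Greedy action over feasible actions only.

    q_values:      (batch, num_actions) Q estimates
    feasible_mask: (batch, num_actions) bool, True where an action is feasible
    """
    # Infeasible actions get -inf so they can never be chosen.
    masked = q_values.masked_fill(~feasible_mask, float("-inf"))
    return masked.argmax(dim=1)
```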
20. Deep Reinforcement Learning-Based Computation Offloading for 5G Vehicle-Aware Multi-Access Edge Computing Network (Cited: 11)
Authors: Ziying Wu, Danfeng Yan. China Communications, SCIE CSCD, 2021, Issue 11, pp. 26-41 (16 pages)
Multi-access edge computing (MEC) is one of the key technologies of the future 5G network. By deploying edge computing centers at the edge of the wireless access network, computation tasks can be offloaded to edge servers rather than a remote cloud server to meet the requirements of 5G low-latency and high-reliability application scenarios. Meanwhile, with the development of Internet of Vehicles (IoV) technology, various delay-sensitive and compute-intensive in-vehicle applications continue to appear. Compared with traditional Internet business, these computation tasks have higher processing priority and lower delay requirements. In this paper, we design a 5G-based vehicle-aware multi-access edge computing network (VAMECN) and pose a joint optimization problem of minimizing the total system cost. In view of this problem, a deep reinforcement learning based joint computation offloading and task migration optimization (JCOTM) algorithm is proposed, considering the influence of multiple factors such as concurrent computation tasks, the distribution of system computing resources, and network communication bandwidth. The mixed-integer nonlinear programming problem is described as a Markov decision process. Experiments show that, compared with other computation offloading policies, our proposed algorithm can effectively reduce task processing delay and device energy consumption, optimize the computation offloading and resource allocation scheme, and improve system resource utilization.
Keywords: multi-access edge computing; computation offloading; 5G; vehicle-aware; deep reinforcement learning; deep Q-network