81 articles found
Automatic depth matching method of well log based on deep reinforcement learning
1
Authors: XIONG Wenjun, XIAO Lizhi, YUAN Jiangru, YUE Wenzheng 《Petroleum Exploration and Development》 SCIE 2024, No. 3, pp. 634-646 (13 pages)
Traditional well log depth matching requires manual adjustment, which is highly labor-intensive for multiple wells and leads to low work efficiency. This paper introduces a multi-agent deep reinforcement learning (MARL) method to automate the depth matching of multi-well logs. The method defines multiple top-down dual sliding windows based on a convolutional neural network (CNN) to extract and capture similar feature sequences on well logs, and it establishes an interaction mechanism between agents and the environment to control the depth matching process. Specifically, the agent selects an action to translate or scale the feature sequence based on the double deep Q-network (DDQN). Through the feedback of the reward signal, it evaluates the effectiveness of each action, aiming to obtain the optimal strategy and improve the accuracy of the matching task. The experiments show that MARL can automatically perform depth matching for well logs in multiple wells and reduce manual intervention. In an oil field application, a comparative analysis of dynamic time warping (DTW), deep Q-learning network (DQN), and DDQN methods revealed that the DDQN algorithm, with its dual-network evaluation mechanism, significantly improves performance by identifying and aligning more details in the well log feature sequences, thus achieving higher depth matching accuracy.
Keywords: artificial intelligence; machine learning; depth matching; well log; multi-agent deep reinforcement learning; convolutional neural network; double deep Q-network
Exploring Deep Reinforcement Learning with Multi Q-Learning (cited by: 26)
2
Authors: Ethan Duryea, Michael Ganger, Wei Hu 《Intelligent Control and Automation》 2016, No. 4, pp. 129-144 (16 pages)
Q-learning is a popular temporal-difference reinforcement learning algorithm which often explicitly stores state values using lookup tables. This implementation has been proven to converge to the optimal solution, but it is often beneficial to use a function-approximation system, such as deep neural networks, to estimate state values. It has been previously observed that Q-learning can be unstable when using value function approximation or when operating in a stochastic environment. This instability can adversely affect the algorithm's ability to maximize its returns. In this paper, we present a new algorithm called Multi Q-learning to attempt to overcome the instability seen in Q-learning. We test our algorithm on a 4 × 4 grid-world with different stochastic reward functions using various deep neural networks and convolutional networks. Our results show that in most cases, Multi Q-learning outperforms Q-learning, achieving average returns up to 2.5 times higher than Q-learning and having a standard deviation of state values as low as 0.58.
Keywords: reinforcement learning; deep learning; Multi Q-learning
Deep reinforcement learning for UAV swarm rendezvous behavior
3
Authors: ZHANG Yaozhong, LI Yike, WU Zhuoran, XU Jialin 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD 2023, No. 2, pp. 360-373 (14 pages)
Unmanned aerial vehicle (UAV) swarm technology has been a research hotspot in recent years. With the continuous improvement of the autonomous intelligence of UAVs, swarm technology will become one of the main trends of future UAV development. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q network (DDQN) algorithm. We design a guided reward function to effectively solve the convergence problem caused by sparse returns in deep reinforcement learning (DRL) for long-period tasks. We also propose the concept of a temporary storage area, which optimizes the memory playback (experience replay) unit of the traditional DDQN algorithm, improves the convergence speed of the algorithm, and speeds up its training process. Unlike the traditional task environment, this paper establishes a continuous state-space task environment model to improve the authentication process of the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm are trained in different task scenarios. The experimental results validate that the DDQN algorithm is efficient in training the UAV swarm to complete the given collaborative tasks while meeting the swarm's requirements for centralization and autonomy and improving the intelligence of collaborative task execution. The simulation results show that, after training, the proposed UAV swarm can carry out the rendezvous task well, and the mission success rate reaches 90%.
Keywords: double deep Q network (DDQN) algorithms; unmanned aerial vehicle (UAV) swarm; task decision; deep reinforcement learning (DRL); sparse returns
RIS-Assisted UAV-D2D Communications Exploiting Deep Reinforcement Learning
4
Authors: YOU Qian, XU Qian, YANG Xin, ZHANG Tao, CHEN Ming 《ZTE Communications》 2023, No. 2, pp. 61-69 (9 pages)
Device-to-device (D2D) communications underlaying cellular networks enabled by unmanned aerial vehicles (UAV) have been regarded as promising techniques for next-generation communications. To mitigate the strong interference caused by the line-of-sight (LoS) air-to-ground channels, we deploy a reconfigurable intelligent surface (RIS) to rebuild the wireless channels. A joint optimization problem of the transmit power of the UAV, the transmit power of the D2D users, and the RIS phase configuration is investigated to maximize the achievable rate of the D2D users while satisfying the quality of service (QoS) requirement of the cellular users. Due to the high channel dynamics and the coupling among the cellular users, the RIS, and the D2D users, it is challenging to find a proper solution. Thus, a RIS softmax deep double deterministic (RIS-SD3) policy gradient method is proposed, which can smooth the optimization space as well as reduce the number of local optima. Specifically, the SD3 algorithm maximizes the reward of the agent by training the agent to maximize the value function after the softmax operator is introduced. Simulation results show that the proposed RIS-SD3 algorithm can significantly improve the rate of the D2D users while controlling the interference to the cellular user. Moreover, the proposed RIS-SD3 algorithm is more robust than the twin delayed deep deterministic (TD3) policy gradient algorithm in a dynamic environment.
Keywords: device-to-device communications; reconfigurable intelligent surface; deep reinforcement learning; softmax deep double deterministic policy gradient
Supervisory control of the hybrid off-highway vehicle for fuel economy improvement using predictive double Q-learning with backup models
5
Authors: SHUAI Bin, LI Yan-fei, ZHOU Quan, XU Hong-ming, SHUAI Shi-jin 《Journal of Central South University》 SCIE EI CAS CSCD 2022, No. 7, pp. 2266-2278 (13 pages)
This paper studies a supervisory control system for a hybrid off-highway electric vehicle under the charge-sustaining (CS) condition. A new predictive double Q-learning with backup models (PDQL) scheme is proposed to optimize engine fuel use in real-world driving and improve energy efficiency with a faster and more robust learning process. Unlike existing "model-free" methods, which follow solely on-policy or solely off-policy learning to update knowledge bases (Q-tables), the PDQL is developed with the capability to merge on-policy and off-policy learning by introducing a backup model (Q-table). Experimental evaluations are conducted on software-in-the-loop (SiL) and hardware-in-the-loop (HiL) test platforms based on real-time modelling of the studied vehicle. Compared to the standard double Q-learning (SDQL), the PDQL needs only half of the learning iterations to achieve better energy efficiency than the SDQL reaches at the end of the learning process. In the SiL under 35 rounds of learning, the results show that the PDQL achieves 1.75% higher vehicle energy efficiency than the SDQL. Implemented in HiL under four predefined real-world conditions, the PDQL robustly saves more than 5.03% energy compared with the SDQL scheme.
Keywords: supervisory charge-sustaining control; hybrid electric vehicle; reinforcement learning; predictive double Q-learning
Double Sarsa and Double Expected Sarsa with Shallow and Deep Learning (cited by: 10)
6
Authors: Michael Ganger, Ethan Duryea, Wei Hu 《Journal of Data Analysis and Information Processing》 2016, No. 4, pp. 159-176 (18 pages)
Double Q-learning has been shown to be effective in reinforcement learning scenarios when the reward system is stochastic. We apply the idea of double learning that this algorithm uses to Sarsa and Expected Sarsa, producing two new algorithms called Double Sarsa and Double Expected Sarsa that are shown to be more robust than their single counterparts when rewards are stochastic. We find that these algorithms add a significant amount of stability to the learning process at only a minor computational cost, which leads to higher returns when using an on-policy algorithm. We then use shallow and deep neural networks to approximate the action-value function, and show that Double Sarsa and Double Expected Sarsa are much more stable after convergence and can collect larger rewards than the single versions.
Keywords: Double Sarsa; Double Expected Sarsa; reinforcement learning; deep learning
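The "double learning" idea this abstract builds on can be illustrated with the tabular Double Q-learning update, in which one table chooses the greedy next action and the other table evaluates it. The sketch below is only that baseline idea, not the Double Sarsa or Double Expected Sarsa algorithms of the listed paper; the function name `double_q_update`, its parameters, and the state and action labels are hypothetical.

```python
import random
from collections import defaultdict


def double_q_update(qa, qb, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.99, rng=random):
    """Apply one tabular Double Q-learning step to two Q-tables in place."""
    # Pick at random which table is updated on this step.
    if rng.random() < 0.5:
        select, evaluate = qa, qb
    else:
        select, evaluate = qb, qa
    # The updated table selects the greedy next action...
    best = max(actions, key=lambda a: select[(next_state, a)])
    # ...and the other table supplies the value of that action.
    target = reward + gamma * evaluate[(next_state, best)]
    select[(state, action)] += alpha * (target - select[(state, action)])


# Illustrative usage with hypothetical states and actions.
qa, qb = defaultdict(float), defaultdict(float)
double_q_update(qa, qb, "s0", "left", 1.0, "s1", ["left", "right"])
```

Separating selection from evaluation is what reduces the upward bias of a single max over noisy estimates.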
Deep Q-Learning Based Optimal Query Routing Approach for Unstructured P2P Network (cited by: 1)
7
Authors: Mohammad Shoab, Abdullah Shawan Alotaibi 《Computers, Materials & Continua》 SCIE EI 2022, No. 3, pp. 5765-5781 (17 pages)
Deep Reinforcement Learning (DRL) is a class of Machine Learning (ML) that combines Deep Learning with Reinforcement Learning and provides a framework by which a system can learn from its previous actions in an environment to select its future actions efficiently. DRL has been used in many application fields, including games, robots, and networks, for creating autonomous systems that improve themselves with experience. It is well acknowledged that DRL is well suited to solving optimization problems in distributed systems in general and network routing in particular. Therefore, a novel query routing approach called Deep Reinforcement Learning based Route Selection (DRLRS) is proposed for unstructured P2P networks based on a Deep Q-Learning algorithm. The main objective of this approach is to achieve better retrieval effectiveness at reduced search cost through fewer connected peers, fewer exchanged messages, and less time. The simulation results show a significant improvement in resource searching in comparison with k-Random Walker and Directed BFS. Here, retrieval effectiveness, search cost in terms of connected peers, and average overhead are 1.28, 106, and 149, respectively.
Keywords: reinforcement learning; deep Q-learning; unstructured P2P network; query routing
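Traffic Signal Control Method Based on Dueling Double DQN
8
Authors: 叶宝林, 陈栋, 刘春元, 陈滨, 吴维敏 《计算机测量与控制》 2024, No. 7, pp. 154-161 (8 pages)
To improve intersection throughput and relieve traffic congestion, and to mine the deep latent features contained in traffic state information, a traffic signal control method for a single intersection based on Dueling Double DQN (D3QN) is proposed. A traffic signal control model based on the deep reinforcement learning algorithm Double DQN (DDQN) is constructed; it optimizes the iterative computation of the estimated and target values of the action-value function and overcomes the slow convergence of DQN-based traffic signal control models. A new Dueling Network is designed to decouple the values of traffic states and phase actions, strengthening the ability of Double DQN (DDQN) to extract deep feature information. A single-intersection simulation framework and environment is built on the microscopic simulation platform SUMO for simulation tests. The results show that, compared with traditional traffic signal control methods and the DQN-based traffic signal control method, the proposed method effectively reduces average vehicle waiting time, average queue length, and average number of stops, and clearly improves intersection throughput.
Keywords: traffic signal control; deep reinforcement learning; Dueling Double DQN; Dueling Network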
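As a rough illustration of how a dueling head decouples the state value from the per-action advantages, the sketch below uses the widely used mean-subtracted aggregation Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)). This is an assumption about the general technique; the "new Dueling Network" of the listed paper may differ, and the function name `dueling_q_values` is hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# Illustrative usage: one state value and three signal-phase advantages.
q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])  # [3.0, 2.0, 1.0]
```

Subtracting the mean advantage keeps the split between value and advantage identifiable, since otherwise a constant could be shifted freely between the two streams.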
Double DQN Method For Botnet Traffic Detection System
9
Authors: Yutao Hu, Yuntao Zhao, Yongxin Feng, Xiangyu Ma 《Computers, Materials & Continua》 SCIE EI 2024, No. 4, pp. 509-530 (22 pages)
In the face of the increasingly severe Botnet problem on the Internet, how to effectively detect Botnet traffic in real time has become a critical problem. Although the existing deep Q network (DQN) algorithm in deep reinforcement learning can solve the problem of real-time updating, its prediction results are always higher than the actual results. In Botnet traffic detection, it performs well on the training set, where the accuracy rate of predicting traffic is as high as %; however, on the test set its accuracy declines, and it cannot adjust its prediction strategy in time based on new data samples. On new datasets, its accuracy declines significantly. Therefore, this paper proposes a Botnet traffic detection system based on double-layer DQN (DDQN). Two Q-values are designed to adjust the model in policy and action, respectively, to achieve real-time model updates and improve the universality and robustness of the model under different datasets. Experiments show that, compared with the DQN model, the Q-value is not too high when using DDQN, and the detection model improves the accuracy and precision of Botnet traffic detection. Moreover, on Botnet datasets other than the test set, the accuracy and precision of the DDQN model are still higher than those of DQN.
Keywords: DQN; DDQN; deep reinforcement learning; botnet detection; feature classification
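Dual-Mode Multi-Objective Signal Timing Method Based on Double DQN
10
Authors: 聂雷, 张明萱, 黄庆涵, 鲍海洲 《计算机技术与发展》 2024, No. 8, pp. 143-150 (8 pages)
In recent years, deep reinforcement learning, as an efficient and reliable machine learning method, has been widely applied to traffic signal control. Existing traffic signal timing methods usually ignore priority passage for special vehicles (for example, ambulances and fire engines). In addition, signal timing methods based on conventional deep reinforcement learning optimize a rather narrow objective, which leads to poor performance in complex traffic scenarios. To address these problems, a dual-mode multi-objective signal timing method based on Double DQN (DMDD) that incorporates priority passage for special vehicles is proposed to improve intersection throughput under different traffic scenarios. The method first selects the signal control mode according to the saturation state of the intersection; in the emergency control mode, special vehicles are given a higher passage weight, which helps them clear the intersection faster. Next, neural networks are designed separately for three indicators (waiting time, queue length, and CO2 emissions) to compute the reward. Finally, Double DQN is used to select the optimal signal phase, and flexible switching of signal phases improves throughput. SUMO-based experimental results show that, compared with the baseline methods, DMDD effectively reduces the waiting time, queue length, and CO2 emissions of special vehicles at intersections; special vehicles pass through intersections faster, and throughput is effectively improved.
Keywords: traffic signal timing; deep reinforcement learning; dual-mode multi-objective; Double DQN; SUMO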
A deep Q-learning model for sequential task offloading in edge AI systems
11
Authors: Dong Liu, Shiheng Gu, Xinyu Fan, Xu Zheng 《Intelligent and Converged Networks》 EI 2024, No. 3, pp. 207-221 (15 pages)
Currently, edge Artificial Intelligence (AI) systems have significantly facilitated the functionalities of intelligent devices such as smartphones and smart cars, and supported diverse applications and services. This fundamental support comes from continuous data analysis and computation over these devices. Considering the resource constraints of terminal devices, multi-layer edge AI systems improve the overall computing power of the system by scheduling computing tasks to edge and cloud servers for execution. Previous efforts tend to ignore the strongly pipelined nature of processing tasks in edge AI systems, such as the encryption, decryption, and consensus algorithms supporting the implementation of Blockchain techniques. Therefore, this paper proposes a new pipelined task scheduling algorithm (referred to as PTS-RDQN), which utilizes the system representation ability of deep reinforcement learning and integrates multi-dimensional information to achieve global task scheduling. Specifically, a co-optimization strategy based on Rainbow Deep Q-Learning (RainbowDQN) is proposed to allocate computation tasks among mobile devices, edge servers, and cloud servers; it comprehensively considers the balance of task turnaround time, link quality, and other factors, thus effectively improving system performance and user experience. In addition, a task scheduling strategy based on PTS-RDQN is proposed, which is capable of dynamic task allocation according to device load. The results of extensive simulation experiments show that the proposed method can effectively improve resource utilization and provide an effective task scheduling strategy for edge computing systems with a cloud-edge-end architecture.
Keywords: edge computing; task scheduling; reinforcement learning; Rainbow deep Q-learning (RainbowDQN)
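A DouDizhu ("Fight the Landlord") Strategy Based on an Improved DDQN Method
12
Authors: 孔燕, 吴晓聪, 芮烨锋, 史鸿远 《信息技术》 2024, No. 5, pp. 66-72, 80 (8 pages)
To address the long training time, large action space, and low win rate of some existing methods in card games, improvements to the network architecture and encoding scheme of the DDQN algorithm are proposed. Hand-card features are encoded in binary; a hand-splitting method divides the neural network into a main-card network and a secondary-card network; and a GRU network is added to process action sequences. Experiments show that the training time of the algorithm is 13% shorter than that of the conventional DQN algorithm, and that its average win rates in the "landlord" and "peasant" positions are 70% and 75%, higher than the 28% and 60% of the DQN algorithm, demonstrating the advantages of the improved algorithm on these metrics.
Keywords: deep reinforcement learning; double deep Q-learning; computer game playing; Gate Recurrent Unit neural network; large-scale discrete action space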
Improved Double Deep Q Network-Based Task Scheduling Algorithm in Edge Computing for Makespan Optimization
13
Authors: Lei Zeng, Qi Liu, Shigen Shen, Xiaodong Liu 《Tsinghua Science and Technology》 SCIE EI CAS CSCD 2024, No. 3, pp. 806-817 (12 pages)
Edge computing nodes undertake an increasing number of tasks with the rise of business density. Therefore, how to efficiently allocate large-scale and dynamic workloads to edge computing resources has become a critical challenge. This study proposes an edge task scheduling approach based on an improved Double Deep Q Network (DQN), which is adopted to separate the calculation of target Q values and the selection of the action into two networks. A new reward function is designed, and a control unit is added to the experience replay unit of the agent. The management of experience data is also modified to fully utilize its value and improve learning efficiency. Reinforcement learning agents usually learn from an ignorant state, which is inefficient. As such, this study proposes a novel particle swarm optimization algorithm with an improved fitness function, which can generate optimal solutions for task scheduling. These optimized solutions are provided to the agent to pre-train network parameters and obtain a better cognition level. The proposed algorithm is compared with six other methods in simulation experiments. Results show that the proposed algorithm outperforms the other benchmark methods regarding makespan.
Keywords: edge computing; task scheduling; reinforcement learning; makespan; double deep Q Network (DQN)
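The separation of action selection and target-value calculation across two networks, as described in this abstract, matches the standard Double DQN target. The following is a minimal sketch under that assumption, with plain lists standing in for the outputs of the two networks; the function name `double_dqn_target` and its parameters are hypothetical, and the paper's improved variant (reward function, replay control unit, pre-training) is not modelled.

```python
def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Return the Double DQN bootstrap target for a single transition."""
    if done:
        # Terminal transitions have no bootstrap term.
        return reward
    # The online network selects the greedy next action...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ...and the target network evaluates that action.
    return reward + gamma * next_q_target[best_action]


# Illustrative usage with made-up Q-value vectors for the next state.
y = double_dqn_target(1.0, False, [0.2, 0.9], [0.5, 0.1], gamma=0.5)
```

In this usage the online network prefers action 1, so the target bootstraps from the target network's value for action 1 (0.1) rather than from its own maximum (0.5), which is how a plain DQN target would differ.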
Learning-based joint UAV trajectory and power allocation optimization for secure IoT networks (cited by: 3)
14
Authors: Dan Deng, Xingwang Li, Varun Menon, Md Jalil Piran, Hui Chen, Mian Ahmad Jan 《Digital Communications and Networks》 SCIE CSCD 2022, No. 4, pp. 415-421 (7 pages)
Non-Orthogonal Multiple Access (NOMA) can be deployed in Unmanned Aerial Vehicle (UAV) networks to improve spectrum efficiency. Due to the broadcasting feature of NOMA-UAV networks, it is essential to focus on the security of the wireless system. This paper focuses on maximizing the secrecy sum rate under the constraint of the achievable rate of the legitimate channels. To tackle the non-convex optimization problem, a reinforcement learning-based alternative optimization algorithm is proposed. First, with the help of successive convex approximations, the optimal power allocation scheme for a given UAV trajectory is obtained using convex optimization tools. Afterwards, through extensive exploration of the wireless environment, the Q-learning networks approach the optimal location transition strategy of the UAV, even without the wireless channel state information.
Keywords: unmanned aerial vehicle (UAV); NOMA; reinforcement learning; secure communications; deep Q-learning
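DDQN-Based Inventory Cost Control Model for Fresh Agricultural Product Retailers (cited by: 2)
15
Authors: 李姣姣, 何利力, 郑军红 《智能计算机与应用》 2023, No. 10, pp. 60-64, 72 (6 pages)
To address inventory cost control for fresh agricultural product retailers, the problem is formulated as a Markov decision process. A three-parameter Weibull function is introduced to describe the deterioration of fresh agricultural products, and expiry, deterioration, stockout, ordering, and holding costs are considered. An inventory cost control model for fresh agricultural products is built from a supply chain perspective, and the Double Deep Q Network (DDQN) from deep reinforcement learning is used to optimize ordering so as to control total inventory cost. Experimental results show that, compared with a single-period stochastic inventory cost control model and a fixed-order-quantity inventory cost control model, the DDQN model reduces total cost by about 6% and 11%, respectively, and is of practical value.
Keywords: fresh agricultural products; deep reinforcement learning; double deep Q network; inventory cost control; supply chain; Weibull distribution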
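UAV Air Combat Maneuver Decision-Making Based on Guided Minimax-DDQN (cited by: 3)
16
Authors: 王昱, 任田君, 范子琳 《计算机应用》 CSCD PKU Core 2023, No. 8, pp. 2636-2643 (8 pages)
To address the difficulty of predicting enemy maneuver strategies and the low combat win rate caused by the complex environmental information and strongly adversarial nature of unmanned aerial vehicle (UAV) air combat, a guided Minimax-DDQN (Minimax-Double Deep Q-Network) algorithm is designed. First, a guided policy exploration mechanism is proposed on the basis of the Minimax decision method. Then, combined with the guided Minimax policy, a DDQN (Double Deep Q-Network) algorithm is designed with the aim of improving the update efficiency of the Q-network. Finally, a progressive three-stage network training method is proposed, in which adversarial training between different decision models yields a better-optimized decision model. Experimental results show that, compared with algorithms such as Minimax-DQN and Minimax-DDQN, the proposed algorithm improves the success rate of pursuing a straight-line target by 14% to 60%, and its win rate against the DDQN algorithm is no lower than 60%. Compared with algorithms such as DDQN and Minimax-DDQN, the proposed algorithm therefore has stronger decision-making capability and better adaptability in highly adversarial combat environments.
Keywords: UAV air combat; autonomous decision-making; deep reinforcement learning; double deep Q-network; multi-stage training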
Locally generalised multi-agent reinforcement learning for demand and capacity balancing with customised neural networks (cited by: 1)
17
Authors: Yutong CHEN, Minghua HU, Yan XU, Lei YANG 《Chinese Journal of Aeronautics》 SCIE EI CAS CSCD 2023, No. 4, pp. 338-353 (16 pages)
Reinforcement Learning (RL) techniques are being studied to solve Demand and Capacity Balancing (DCB) problems and fully exploit their computational performance. A locally generalised Multi-Agent Reinforcement Learning (MARL) method for real-world DCB problems is proposed. The proposed method can deploy trained agents directly to unseen scenarios in a specific Air Traffic Flow Management (ATFM) region to quickly obtain a satisfactory solution. In this method, the agents of all flights in a scenario form a multi-agent decision-making system based on partial observation. The trained agent with the customised neural network can be deployed directly on the corresponding flight, allowing the agents to solve the DCB problem jointly. A cooperation coefficient is introduced in the reward function to adjust the agent's cooperation preference in the multi-agent system, thereby controlling the distribution of flight delay time allocation. A multi-iteration mechanism is designed for the DCB decision-making framework to deal with problems arising from non-stationarity in MARL and to ensure that all hotspots are eliminated. Experiments based on large-scale, high-complexity real-world scenarios are conducted to verify the effectiveness and efficiency of the method. From a statistical point of view, it is shown that the proposed method generalises within the scope of the flights and sectors of interest, and that its optimisation performance exceeds that of standard computer-assisted slot allocation and state-of-the-art RL-based DCB methods. The sensitivity analysis preliminarily reveals the effect of the cooperation coefficient on delay time allocation.
Keywords: air traffic flow management; demand and capacity balancing; deep Q-learning network; flight delays; generalisation; ground delay program; multi-agent reinforcement learning
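Dynamic Task Allocation Strategy for Crowd Sensing Based on Deep Reinforcement Learning and Privacy Protection
18
Authors: 傅彦铭, 陆盛林, 陈嘉元, 覃华 《信息网络安全》 CSCD PKU Core 2024, No. 3, pp. 449-461 (13 pages)
In mobile crowd sensing (MCS), the outcome of dynamic task allocation is crucial to improving system efficiency and ensuring data quality. However, most existing studies simplify dynamic task allocation to a bipartite matching model, which does not fully account for the influence of task attributes and worker attributes on the matching result and neglects the protection of workers' location privacy. To address these shortcomings, this article proposes a dynamic task allocation strategy for crowd sensing based on deep reinforcement learning and privacy protection. The strategy first adds noise to worker locations using differential privacy to protect worker privacy. It then uses deep reinforcement learning to adaptively adjust batch task allocation. Finally, it uses a greedy algorithm based on a threshold on workers' task execution capability to compute the total platform utility under the optimal policy. Experimental results on real datasets show that the strategy maintains superior performance under different parameter settings while effectively protecting workers' location privacy.
Keywords: crowd sensing; deep reinforcement learning; privacy protection; double deep Q-network; capability-threshold greedy algorithm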
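The abstract states that noise is added to worker locations through differential privacy but does not say which mechanism is used. The sketch below assumes the Laplace mechanism with independent noise on each coordinate, which is one common choice and not necessarily the paper's; the names `laplace_noise` and `perturb_location` and the parameters `epsilon` and `sensitivity` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_location(x, y, epsilon, sensitivity=1.0, rng=None):
    """Return (x, y) with Laplace noise of scale sensitivity / epsilon added to each coordinate."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return x + laplace_noise(scale, rng), y + laplace_noise(scale, rng)


# Illustrative usage: a smaller epsilon gives stronger privacy and larger noise.
noisy_x, noisy_y = perturb_location(120.0, 30.0, epsilon=0.5, rng=random.Random(7))
```

The platform would then match tasks against the perturbed coordinates, trading some allocation accuracy for location privacy.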
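Automatic Depth Correction Method for Well Logs Based on Deep Reinforcement Learning (cited by: 3)
19
Authors: 熊文君, 肖立志, 袁江如, 岳文正 《石油勘探与开发》 EI CAS CSCD PKU Core 2024, No. 3, pp. 553-564 (12 pages)
Conventional depth correction of well logs requires manual adjustment of the curves; for multiple wells the workload is huge, extensive human involvement is needed, and efficiency is low. To address this, a multi-agent deep reinforcement learning (MARL) method is proposed to automatically depth-match multiple well logs. Based on a convolutional neural network (CNN), the method defines multiple top-down dual sliding windows to capture similar feature sequences on the well logs, and designs an interaction mechanism between agents and the environment to control the depth matching process. A double deep Q-learning network (DDQN) selects an action to translate or scale the log feature sequence, and the reward signal fed back is used to evaluate each action, so that the optimal control policy is learned and depth correction accuracy is improved. The study shows that the MARL method can automatically complete depth correction for multiple wells and different well logs, reducing manual intervention. In an oilfield application, test results of dynamic time warping (DTW), deep Q-learning network (DQN), and DDQN methods were compared; the DDQN algorithm, with its dual-network evaluation mechanism, effectively improves performance, identifies and aligns more details on the well log feature sequences, and achieves higher depth matching accuracy.
Keywords: artificial intelligence; machine learning; depth correction; well log; multi-agent deep reinforcement learning; convolutional neural network; double deep Q-learning network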
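Storage Location Optimization for Returned Goods in Automated Storage and Retrieval Systems and Its Solution Algorithm (cited by: 1)
20
Authors: 何在祥, 李丽, 张云峰, 郗琳 《重庆理工大学学报(自然科学)》 CAS PKU Core 2024, No. 3, pp. 183-194 (12 pages)
To address the return to storage of leftover goods during outbound operations in an automated storage and retrieval system, a storage location optimization model for returned goods is established, with minimization of the total energy consumption of stacker crane operations as the objective and the allocation of return storage locations as the decision variable, and a framework for return storage location optimization based on deep reinforcement learning is proposed. Within this framework, a multi-dimensional state is built from the real-time storage information and outbound operation information of the warehouse, actions are defined as the choice of return storage location, and a Markov decision process model of return storage location optimization is established. The multi-dimensional state features of the warehouse are fed into a two-layer dueling network, and the dueling double deep Q-network (D3QN) algorithm is used to train the network model and predict the target value of return actions, so as to determine the agent's optimal behavior policy. Experimental results show that the D3QN algorithm has good stability in solving large-scale return storage location optimization problems.
Keywords: automated storage and retrieval system; return storage location optimization; deep reinforcement learning; D3QN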