Journal Articles
31 articles found
1. Reinforcement Learning with an Ensemble of Binary Action Deep Q-Networks
Authors: A.M. Hafiz, M. Hassaballah, Abdullah Alqahtani, Shtwai Alsubai, Mohamed Abdel Hameed. Computer Systems Science & Engineering, SCIE EI, 2023, Issue 9, pp. 2651-2666 (16 pages).
With the advent of Reinforcement Learning (RL) and its continuous progress, state-of-the-art RL systems have come up for many challenging and real-world tasks. Given the scope of this area, various techniques are found in the literature. One such notable technique, Multiple Deep Q-Network (DQN) based RL systems, uses multiple DQN-based entities which learn together and communicate with each other. The learning has to be distributed wisely among all entities in such a scheme, and the inter-entity communication protocol has to be carefully designed. As more complex DQNs come to the fore, the overall complexity of these multi-entity systems has increased manyfold, leading to issues like difficulty in training, need for high resources, more training time, and difficulty in fine-tuning, which in turn cause performance issues. Taking a cue from the parallel processing found in nature and its efficacy, we propose a lightweight ensemble-based approach for solving the core RL tasks. It uses multiple binary-action DQNs having shared state and reward. The benefits of the proposed approach are overall simplicity, faster convergence and better performance compared to conventional DQN-based approaches. The approach can potentially be extended to any type of DQN by forming its ensemble. Conducting extensive experimentation, promising results are obtained using the proposed ensemble approach on OpenAI Gym tasks and Atari 2600 games as compared to recent techniques. The proposed approach gives a state-of-the-art score of 500 on the CartPole-v1 task, 259.2 on the LunarLander-v2 task, and state-of-the-art results on four out of five Atari 2600 games.
Keywords: deep Q-networks; ensemble learning; reinforcement learning; OpenAI Gym environments
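The exact inter-member mechanics are specific to the paper, but the core idea can be illustrated with a small sketch. The following is a minimal, hypothetical PyTorch reading, assuming each ensemble member is a two-output Q-network ("take this action" vs. "skip it") over the shared state, and the executed action is the member with the highest "take" value; all names and dimensions are illustrative, not taken from the paper.

```python
# A minimal, hypothetical sketch of an ensemble of binary-action DQNs with a
# shared state: each member scores one candidate environment action as
# "take" vs. "skip", and the executed action is the member whose "take"
# Q-value is largest. Class and variable names are illustrative only.
import torch
import torch.nn as nn

class BinaryActionDQN(nn.Module):
    """Tiny Q-network with two outputs: Q(s, take) and Q(s, skip)."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # index 0: Q_take, index 1: Q_skip
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def ensemble_act(members, state):
    """Execute the action whose member reports the largest Q(s, take)."""
    with torch.no_grad():
        q_take = torch.stack([m(state)[..., 0] for m in members])
    return int(torch.argmax(q_take).item())

# One member per discrete action, e.g. CartPole-v1 has two actions.
members = [BinaryActionDQN(state_dim=4) for _ in range(2)]
action = ensemble_act(members, torch.randn(4))
```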
2. Multi-Agent Deep Q-Networks for Efficient Edge Federated Learning Communications in Software-Defined IoT
Authors: Prohim Tam, Sa Math, Ahyoung Lee, Seokhoon Kim. Computers, Materials & Continua, SCIE EI, 2022, Issue 5, pp. 3319-3335 (17 pages).
Federated learning (FL) activates distributed on-device computation techniques to model a better algorithm performance with the interaction of local model updates and global model distributions in aggregation averaging processes. However, in large-scale heterogeneous Internet of Things (IoT) cellular networks, massive multi-dimensional model update iterations and resource-constrained computation are challenging aspects to be tackled significantly. This paper introduces a system model converging software-defined networking (SDN) and network functions virtualization (NFV) to enable device/resource abstractions and provide NFV-enabled edge FL (eFL) aggregation servers for advancing automation and controllability. Multi-agent deep Q-networks (MADQNs) target to enforce a self-learning softwarization, optimize resource allocation policies, and advocate computation offloading decisions. With gathered network conditions and resource states, the proposed agent aims to explore various actions for estimating expected long-term rewards in a particular state observation. In the exploration phase, optimal actions for joint resource allocation and offloading decisions in different possible states are obtained by maximum Q-value selections. An action-based virtual network function (VNF) forwarding graph (VNFFG) is orchestrated to map VNFs towards the eFL aggregation server with sufficient communication and computation resources in the NFV infrastructure (NFVI). The proposed scheme indicates deficient allocation actions, modifies the VNF backup instances, and reallocates the virtual resources for the exploitation phase. A deep neural network (DNN) is used as a value function approximator, and an epsilon-greedy algorithm balances exploration and exploitation. The scheme primarily considers the criticalities of FL model services and congestion states to optimize the long-term policy. Simulation results showed that the proposed scheme outperforms reference schemes in terms of Quality of Service (QoS) performance metrics, including packet drop ratio, packet drop counts, packet delivery ratio, delay, and throughput.
Keywords: deep Q-networks; federated learning; network functions virtualization; quality of service; software-defined networking
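The scheme above balances exploration and exploitation with standard epsilon-greedy selection over Q-values from a DNN approximator. A minimal sketch of that selection step follows, with a placeholder network and action space (sizes are assumptions, not taken from the paper):

```python
# A minimal epsilon-greedy step as described above: exploit the maximum
# Q-value with probability 1 - epsilon, otherwise explore a random joint
# resource-allocation/offloading action. The DNN approximator and the size
# of the state/action spaces are placeholder assumptions.
import random
import torch
import torch.nn as nn

def epsilon_greedy(q_net: nn.Module, state: torch.Tensor,
                   n_actions: int, epsilon: float) -> int:
    if random.random() < epsilon:
        return random.randrange(n_actions)        # exploration
    with torch.no_grad():
        return int(q_net(state).argmax().item())  # exploitation

q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
action = epsilon_greedy(q_net, torch.randn(8), n_actions=4, epsilon=0.1)
```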
3. Transformer-Aided Deep Double Dueling Spatial-Temporal Q-Network for Spatial Crowdsourcing Analysis
Authors: Yu Li, Mingxiao Li, Dongyang Ou, Junjie Guo, Fangyuan Pan. Computer Modeling in Engineering & Sciences, SCIE EI, 2024, Issue 4, pp. 893-909 (17 pages).
With the rapid development of mobile Internet, spatial crowdsourcing has become more and more popular. Spatial crowdsourcing consists of many different types of applications, such as spatial crowd-sensing services. Spatial crowd-sensing collects and analyzes traffic sensing data from clients like vehicles and traffic lights to construct intelligent traffic prediction models. Besides collecting sensing data, spatial crowdsourcing also includes spatial delivery services like DiDi and Uber. Appropriate task assignment and worker selection dominate the service quality for spatial crowdsourcing applications. Previous research conducted task assignments via traditional matching approaches or using simple network models. However, advanced mining methods are lacking to explore the relationships between workers, task publishers, and the spatio-temporal attributes in tasks. Therefore, in this paper, we propose a Deep Double Dueling Spatial-temporal Q-Network (D3SQN) to adaptively learn the spatial-temporal relationships between tasks, task publishers, and workers in a dynamic environment to achieve optimal allocation. Specifically, D3SQN is revised through reinforcement learning by adding a spatial-temporal transformer that can estimate the expected state values and action advantages so as to improve the accuracy of task assignments. Extensive experiments are conducted over real data collected from DiDi and ELM, and the simulation results verify the effectiveness of our proposed models.
Keywords: historical behavior analysis; spatial crowdsourcing; deep double dueling Q-networks
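The dueling architecture named in D3SQN decomposes Q-values into a state value and per-action advantages. A minimal sketch of the standard dueling head follows, with the paper's spatial-temporal transformer abstracted into a generic feature vector; all dimensions are illustrative:

```python
# A minimal dueling head using the standard decomposition
# Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); the encoder that produces the
# feature vector (a transformer in the paper above) is abstracted away.
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, feat_dim: int, n_actions: int):
        super().__init__()
        self.value = nn.Linear(feat_dim, 1)              # state value V(s)
        self.advantage = nn.Linear(feat_dim, n_actions)  # advantages A(s, a)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v = self.value(features)
        a = self.advantage(features)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=-1, keepdim=True)

head = DuelingHead(feat_dim=128, n_actions=6)
q_values = head(torch.randn(1, 128))  # shape: (1, 6)
```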
4. UAV Autonomous Navigation for Wireless Powered Data Collection with Onboard Deep Q-Network
Authors: LI Yuting, DING Yi, GAO Jiangchuan, LIU Yusha, HU Jie, YANG Kun. ZTE Communications, 2023, Issue 2, pp. 80-87 (8 pages).
In a rechargeable wireless sensor network, utilizing an unmanned aerial vehicle (UAV) as a mobile base station (BS) to charge sensors and collect data effectively prolongs the network's lifetime. In this paper, we jointly optimize the UAV's flight trajectory and the sensor selection and operation modes to maximize the average data traffic of all sensors within a wireless sensor network (WSN) during the UAV's finite flight time, while ensuring the energy required for each sensor by wireless power transfer (WPT). We consider a practical scenario where the UAV has no prior knowledge of sensor locations. The UAV performs autonomous navigation based on the status information obtained within the coverage area, which is modeled as a Markov decision process (MDP). The deep Q-network (DQN) is employed to execute the navigation based on the UAV position, the battery level state, channel conditions and current data traffic of sensors within the UAV's coverage area. Our simulation results demonstrate that the DQN algorithm significantly improves the network performance in terms of the average data traffic and trajectory design.
Keywords: unmanned aerial vehicle; wireless power transfer; deep Q-network; autonomous navigation
5. Automatic depth matching method of well log based on deep reinforcement learning
Authors: XIONG Wenjun, XIAO Lizhi, YUAN Jiangru, YUE Wenzheng. Petroleum Exploration and Development, SCIE, 2024, Issue 3, pp. 634-646 (13 pages).
In traditional well log depth matching tasks, manual adjustments are required, which is significantly labor-intensive for multiple wells and leads to low work efficiency. This paper introduces a multi-agent deep reinforcement learning (MARL) method to automate the depth matching of multi-well logs. This method defines multiple top-down dual sliding windows based on a convolutional neural network (CNN) to extract and capture similar feature sequences on well logs, and it establishes an interaction mechanism between agents and the environment to control the depth matching process. Specifically, the agent selects an action to translate or scale the feature sequence based on the double deep Q-network (DDQN). Through the feedback of the reward signal, it evaluates the effectiveness of each action, aiming to obtain the optimal strategy and improve the accuracy of the matching task. Our experiments show that MARL can automatically perform depth matching for well logs in multiple wells and reduce manual intervention. In an oil field application, a comparative analysis of dynamic time warping (DTW), deep Q-learning network (DQN), and DDQN methods revealed that the DDQN algorithm, with its dual-network evaluation mechanism, significantly improves performance by identifying and aligning more details in the well log feature sequences, thus achieving higher depth matching accuracy.
Keywords: artificial intelligence; machine learning; depth matching; well log; multi-agent deep reinforcement learning; convolutional neural network; double deep Q-network
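The dual-network evaluation mechanism credited above is the standard double-DQN target: the online network selects the next action and the target network evaluates it. A minimal sketch under assumed shapes and discount factor (not taken from the paper):

```python
# A minimal double-DQN target: the online network picks argmax_a Q(s', a)
# and the target network evaluates that action, reducing the overestimation
# bias of vanilla DQN. Shapes and the discount factor are assumptions.
import torch
import torch.nn as nn

def ddqn_target(online: nn.Module, target: nn.Module, reward: torch.Tensor,
                next_state: torch.Tensor, done: torch.Tensor,
                gamma: float = 0.99) -> torch.Tensor:
    with torch.no_grad():
        best = online(next_state).argmax(dim=1, keepdim=True)   # selection
        next_q = target(next_state).gather(1, best).squeeze(1)  # evaluation
    return reward + gamma * (1.0 - done) * next_q

def make_net():
    return nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 5))

online, target = make_net(), make_net()
y = ddqn_target(online, target, reward=torch.zeros(32),
                next_state=torch.randn(32, 10), done=torch.zeros(32))
```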
6. Associative Tasks Computing Offloading Scheme in Internet of Medical Things with Deep Reinforcement Learning
Authors: Jiang Fan, Qin Junwei, Liu Lei, Tian Hui. China Communications, SCIE CSCD, 2024, Issue 4, pp. 38-52 (15 pages).
The Internet of Medical Things (IoMT) is regarded as a critical technology for intelligent healthcare in the foreseeable 6G era. Nevertheless, due to the limited computing power of edge devices and task-related coupling relationships, IoMT faces unprecedented challenges. Considering the associative connections among tasks, this paper proposes a computing offloading policy for multiple user devices (UDs), considering device-to-device (D2D) communication and a multi-access edge computing (MEC) technique under the IoMT scenario. Specifically, to minimize the total delay and energy consumption concerning the requirements of IoMT, we first analyze and model the detailed local execution, MEC execution, D2D execution, and associated tasks offloading exchange model. Consequently, the associated tasks' offloading scheme of multi-UDs is formulated as a mixed-integer nonconvex optimization problem. Considering the advantages of deep reinforcement learning (DRL) in processing tasks related to coupling relationships, a double-DQN based associative tasks computing offloading (DDATO) algorithm is then proposed to obtain the optimal solution, which can make the best offloading decision under the condition that the tasks of UDs are associative. Furthermore, to reduce the complexity of the DDATO algorithm, a cache-aided procedure is intentionally introduced before the data training process. This avoids redundant offloading and computing procedures for tasks that have previously been cached by other UDs. In addition, we use a dynamic ε-greedy strategy in the action selection section of the algorithm, thus preventing the algorithm from falling into a locally optimal solution. Simulation results demonstrate that, compared with other existing methods for associative task models concerning different structures in the IoMT network, the proposed algorithm can lower the total cost more effectively and efficiently while also providing a tradeoff between delay and energy consumption tolerance.
Keywords: associative tasks; cache-aided procedure; double deep Q-network; Internet of Medical Things (IoMT); multi-access edge computing (MEC)
7. Value Function Mechanism in WSNs-Based Mango Plantation Monitoring System
Authors: Wen-Tsai Sung, Indra Griha Tofik Isa, Sung-Jung Hsiao. Computers, Materials & Continua, SCIE EI, 2024, Issue 9, pp. 3733-3759 (27 pages).
Mango fruit is one of the main fruit commodities that contributes to Taiwan's income. The implementation of technology is an alternative to increasing the quality and quantity of mango plantation productivity. In this study, a Wireless Sensor Networks (WSNs)-based intelligent mango plantation monitoring system is developed that implements deep reinforcement learning (DRL) technology to perform prediction tasks over three classes, "optimal," "sub-optimal," or "not-optimal" conditions, based on three parameters: humidity, temperature, and soil moisture. The key idea is how to provide a precise decision-making mechanism in the real-time monitoring system. A value function-based DRL model, the deep Q-network (DQN), is employed, which contributes to optimizing the future reward and providing precise decision recommendations to the agent and system behavior. The WSNs experiment result indicates that the system's accuracy in capturing the real-time environment parameters is 98.39%. Meanwhile, the comparative accuracies of the proposed DQN, individual Q-learning, uniform coverage (UC), and Naïve Bayes classifier (NBC) models are 97.60%, 95.30%, 96.50%, and 92.30%, respectively. From the results of the comparative experiment, it can be seen that the proposed DQN used in the study has the most optimal accuracy. Testing with 22 test scenarios for "optimal," "sub-optimal," and "not-optimal" conditions was carried out to ensure the system runs well on real-world data. The accuracy generated from the real-world data reaches 95.45%. From the results of the cost analysis, the system is low-cost compared to the conventional system.
Keywords: intelligent monitoring system; deep reinforcement learning (DRL); wireless sensor networks (WSNs); deep Q-network (DQN)
8. Convolutional Neural Network-Based Deep Q-Network (CNN-DQN) Resource Management in Cloud Radio Access Network [Cited by 2]
Authors: Amjad Iqbal, Mau-Luen Tham, Yoong Choon Chang. China Communications, SCIE CSCD, 2022, Issue 10, pp. 129-142 (14 pages).
The recent surge of mobile subscribers and user data traffic has accelerated the telecommunication sector towards the adoption of fifth-generation (5G) mobile networks. The cloud radio access network (CRAN) is a prominent framework in the 5G mobile network to meet the above requirements by deploying low-cost and intelligent multiple distributed antennas known as remote radio heads (RRHs). However, achieving the optimal resource allocation (RA) in CRAN using the traditional approach is still challenging due to the complex structure. In this paper, we introduce the convolutional neural network-based deep Q-network (CNN-DQN) to balance the energy consumption and guarantee the user quality of service (QoS) demand in downlink CRAN. We first formulate the Markov decision process (MDP) for energy efficiency (EE) and build up a 3-layer CNN to capture the environment features as an input state space. We then use the DQN to turn the RRHs on/off dynamically based on the user QoS demand and energy consumption in the CRAN. Finally, we solve the RA problem based on the user constraint and transmit power to guarantee the user QoS demand and maximize the EE with a minimum number of active RRHs. In the end, we conduct a simulation to compare our proposed scheme with the Nature DQN and the traditional approach.
Keywords: energy efficiency (EE); Markov decision process (MDP); convolutional neural network (CNN); cloud RAN; deep Q-network (DQN)
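A minimal sketch of a CNN-fronted Q-network in the spirit of the CNN-DQN above, assuming an image-like state tensor standing in for the CRAN environment features and Q-values over RRH on/off actions; all dimensions are illustrative:

```python
# A minimal CNN-fronted Q-network: a 3-layer convolutional encoder maps an
# image-like state tensor (a stand-in for the CRAN environment features) to
# Q-values over RRH on/off actions. All dimensions are illustrative.
import torch
import torch.nn as nn

class CNNDQN(nn.Module):
    def __init__(self, in_channels: int, n_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.q_head = nn.Linear(32, n_actions)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.q_head(self.encoder(state))

net = CNNDQN(in_channels=1, n_actions=8)
q = net(torch.randn(1, 1, 16, 16))  # shape: (1, 8)
```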
9. Walking Stability Control Method for Biped Robot on Uneven Ground Based on Deep Q-Network
Authors: Baoling Han, Yuting Zhao, Qingsheng Luo. Journal of Beijing Institute of Technology, EI CAS, 2019, Issue 3, pp. 598-605 (8 pages).
A gait control method for a biped robot based on the deep Q-network (DQN) algorithm is proposed to enhance the stability of walking on uneven ground. This control strategy is an intelligent learning method of posture adjustment. A robot is taken as an agent and trained to walk steadily on an uneven surface with obstacles, using a simple reward function based on forward progress. The reward-punishment (RP) mechanism of the DQN algorithm is established after obtaining the offline gait, which was generated in advance by foot trajectory planning. Instead of implementing a complex dynamic model, the proposed method enables the biped robot to learn to adjust its posture on the uneven ground and ensures walking stability. The performance and effectiveness of the proposed algorithm were validated in the V-REP simulation environment. The results demonstrate that the biped robot's lateral tilt angle is less than 3° after implementing the proposed method and the walking stability is obviously improved.
Keywords: deep Q-network (DQN); biped robot; uneven ground; walking stability; gait control
10. Modeling Fish Schooling Behavior Based on Deep Reinforcement Learning
Authors: Chen Pengyu, Wang Fang, Liu Shuo, Yue Shengzhi, Song Yanan, Jin Zhaoyi, Lin Yuanshan. Journal of Guangdong Ocean University, CAS CSCD (Peking University Core), 2023, Issue 3, pp. 1-9 (9 pages).
[Objective] To model fish schooling behavior using deep reinforcement learning and explore the formation mechanism of schooling behavior. [Methods] Traditional rule-based schooling behavior modeling methods rely heavily on human prior knowledge and may fail to characterize schooling behavior well. To address this, a fish schooling behavior modeling method based on Deep Q-Networks (DQN) is proposed: the state of an individual fish is expressed as the (continuous) angle between its heading and the average heading of its neighbors, its action as a discretized turning angle, and its movement policy as a neural network. In an environment with a single learner and multiple teachers, with the change in the number of neighbors as the immediate reward, the DQN algorithm is used to train the neural network and obtain the individual movement policy. [Results] With the proposed method, individual fish can learn the teachers' movement policy; the learned individual policies give rise to schooling behavior in different scenarios, and the properties of the emergent schooling behavior are similar to those of real fish schools. [Conclusion] The proposed method can effectively model fish schooling behavior and helps analyze and understand complex fish population behavior. The local interaction mechanism between individual fish obtained from the analysis provides a new perspective for understanding school formation, fish migration, and fishing ground formation, and can also serve as a reference for high-density industrial aquaculture.
Keywords: fish school; schooling behavior modeling; deep reinforcement learning; deep Q-networks
11. Path Planning for Mobile Robots Based on Memristive Reinforcement Learning in Dynamic Environments [Cited by 2]
Authors: Yang Hailan, Qi Yongqiang, Wu Baolei, Rong Dan, Hong Miaoying, Wang Jun. Journal of System Simulation, CAS CSCD (Peking University Core), 2023, Issue 7, pp. 1619-1633 (15 pages).
To solve the path planning problem of mobile robots in dynamic environments, a two-layer path planning algorithm is proposed, based on an improved ant colony algorithm and a memristor-array-based DQN (deep Q-network) algorithm. Static global path planning is completed by an ant colony algorithm with an improved probability transfer function and pheromone update rules. Exploiting the in-memory-computing property of memristors, they are used as the synaptic structure of the neural network, improving the structure of the traditional DQN algorithm and achieving local dynamic obstacle avoidance for the mobile robot. The path planning mechanism is switched according to whether there are dynamic obstacles within the robot's sensing range, completing the path planning task in dynamic environments. Simulation results show that the algorithm is effective and feasible, and can plan feasible paths for mobile robots in real time in dynamic environments.
Keywords: dynamic environment; DQN (deep Q-network); memristor; in-memory computing; path planning
12. Research on High-Frequency Quantitative Trading Strategies Based on Deep Reinforcement Learning [Cited by 1]
Author: Wen Xinxian. Modern Electronics Technique, 2023, Issue 2, pp. 125-131 (7 pages).
Investment trading in domestic financial markets has gradually shifted from subjective trading based on traditional technical analysis to programmatic quantitative strategy trading. While there is a large body of research on quantitative strategies for the stock market, research on quantitative trading strategies for the futures market is still insufficient, and the investment returns and risk control of existing strategies in intraday high-frequency trading remain to be optimized. To improve the profitability and risk control of high-frequency quantitative strategies for futures, this paper designs a futures trading environment that takes high-frequency K-lines at 1-minute granularity as the environment state, and constructs the corresponding action space and algorithms for the position states and trading operations in futures trading. An LSTM-based deep reinforcement learning model, LSTM-Dueling DQN, is adopted, making the model better suited to processing sequential input state spaces and significantly improving its learning speed. Compared with three baseline models, DQN, Double DQN, and a fully-connected-network-based Dueling DQN (FF-Dueling DQN), the proposed trading strategy achieves a cumulative return of up to 43% and an annualized return of 153% on four ferrous-series commodity futures, with the maximum drawdown kept within 10.7%. The experimental results show that the proposed strategy achieves excess returns above the performance benchmark in both range-bound and trending markets.
Keywords: trading strategy; deep reinforcement learning; LSTM; deep Q-network; high-frequency trading; futures; quantitative finance
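A minimal sketch of an LSTM-fronted dueling Q-network like the LSTM-Dueling DQN described above, assuming a window of 1-minute K-line feature vectors and a small discrete trading action space; the OHLCV feature layout and the three-action space are assumptions, not taken from the paper:

```python
# A minimal LSTM-fronted dueling Q-network: the LSTM consumes a window of
# 1-minute K-line feature vectors and a dueling head scores the trading
# actions. The OHLCV feature layout and the three-action space
# (long/flat/short) are assumptions.
import torch
import torch.nn as nn

class LSTMDuelingDQN(nn.Module):
    def __init__(self, feat_dim: int, hidden: int, n_actions: int):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, kline_seq: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(kline_seq)  # (batch, time, hidden)
        h = out[:, -1]                 # last step summarizes the window
        a = self.advantage(h)
        return self.value(h) + a - a.mean(dim=-1, keepdim=True)

net = LSTMDuelingDQN(feat_dim=5, hidden=64, n_actions=3)
q = net(torch.randn(1, 30, 5))  # 30 one-minute bars of OHLCV features
```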
13. Multi-Agent Path Planning Method Based on Improved Deep Q-Network in Dynamic Environments
Authors: LI Shuyi, LI Minzhe, JING Zhongliang. Journal of Shanghai Jiaotong University (Science), EI, 2024, Issue 4, pp. 601-612 (12 pages).
The multi-agent path planning problem presents significant challenges in dynamic environments, primarily due to the ever-changing positions of obstacles and the complex interactions between agents' actions. These factors contribute to a tendency for the solution to converge slowly, and in some cases, diverge altogether. To address this issue, this paper introduces a novel approach utilizing a double dueling deep Q-network (D3QN), tailored for dynamic multi-agent environments. A novel reward function based on multi-agent positional constraints is designed, and a training strategy based on incremental learning is performed to achieve collaborative path planning of multiple agents. Moreover, the greedy and Boltzmann probability selection policy is introduced for action selection, avoiding convergence to local extrema. To match radar and image sensors, a convolutional neural network-long short-term memory (CNN-LSTM) architecture is constructed to extract the features of multi-source measurements as the input of the D3QN. The algorithm's efficacy and reliability are validated in a simulated environment utilizing Robot Operating System and Gazebo. The simulation results show that the proposed algorithm provides a real-time solution for path planning tasks in dynamic scenarios. In terms of the average success rate and accuracy, the proposed method is superior to other deep learning algorithms, and the convergence speed is also improved.
Keywords: multi-agent; path planning; deep reinforcement learning; deep Q-network
14. A New Reward System Based on Human Demonstrations for Hard Exploration Games
Authors: Wadhah Zeyad Tareq, Mehmet Fatih Amasyali. Computers, Materials & Continua, SCIE EI, 2022, Issue 2, pp. 2401-2414 (14 pages).
The main idea of reinforcement learning is evaluating the chosen action depending on the current reward. According to this concept, many algorithms achieved proper performance on classic Atari 2600 games. The main challenge arises when the reward is sparse or missing. Such environments are complex exploration environments like the Montezuma's Revenge, Pitfall, and Private Eye games. Approaches built to deal with such challenges were very demanding. This work introduced a different reward system that enables a simple classical algorithm to learn fast and achieve high performance in hard exploration environments. Moreover, we added some simple enhancements to several hyperparameters, such as the number of actions and the sampling ratio, that helped improve performance. We include the extra reward within the human demonstrations. After that, we used Prioritized Double Deep Q-Networks (Prioritized DDQN) to learn from these demonstrations. Our approach enabled the Prioritized DDQN, with a short learning time, to finish the first level of Montezuma's Revenge and to perform well in both Pitfall and Private Eye. We used the same games to compare our results with several baselines, such as the Rainbow and deep Q-learning from demonstrations (DQfD) algorithms. The results showed that the new reward system enabled Prioritized DDQN to outperform the baselines in the hard exploration games with a short learning time.
Keywords: deep reinforcement learning; human demonstrations; prioritized double deep Q-networks; Atari
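Prioritized replay, as used in the Prioritized DDQN above, draws transitions in proportion to their priorities (typically derived from the absolute TD error). A minimal plain-array sketch of proportional sampling with importance-sampling weights follows, assuming standard alpha/beta hyperparameters; production implementations use a sum-tree instead:

```python
# A minimal proportional prioritized-replay sampler: transitions are drawn
# with probability p_i^alpha / sum_j p_j^alpha and corrected by
# importance-sampling weights (beta). A plain-array sketch; real
# implementations use a sum-tree for efficiency.
import numpy as np

def sample_prioritized(priorities: np.ndarray, batch_size: int,
                       alpha: float = 0.6, beta: float = 0.4):
    scaled = priorities ** alpha
    probs = scaled / scaled.sum()
    idx = np.random.choice(len(priorities), size=batch_size, p=probs)
    # Importance-sampling weights undo the non-uniform sampling bias.
    weights = (len(priorities) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights

priorities = np.abs(np.random.randn(1000)) + 1e-6  # e.g. |TD error| + eps
idx, w = sample_prioritized(priorities, batch_size=32)
```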
15. Optimal Operation Method for the Energy Storage System of a Combined Photovoltaic-Storage Power Plant Based on Deep Reinforcement Learning in the Spot Market Environment [Cited by 7]
Authors: Gong Kai, Wang Xu, Deng Hui, Jiang Chuanwen, Ma Junchao, Fang Le. Power System Technology, EI CSCD (Peking University Core), 2022, Issue 9, pp. 3365-3375 (11 pages).
A combined photovoltaic-storage power plant can not only effectively reduce the real-time deviation of photovoltaic (PV) output but is also a potential market participant that can provide both energy and frequency-regulation ancillary services. Achieving these three goals requires an energy storage dispatch strategy coordinated with the PV output; however, most existing dispatch strategies for such plants cannot simultaneously coordinate the reduction of real-time PV output deviation with participation in the energy and frequency-regulation ancillary service markets. Moreover, the uncertainty of spot market prices and regulation signals, together with the storage dispatch strategy, turns the storage operation optimization of the combined plant into a stochastic, dynamic, nonconvex optimization problem. Most existing studies handle the nonconvexity with stochastic scenario methods or metaheuristic algorithms, so the resulting operation schemes have certain limitations and can hardly be adapted dynamically from real-time data. This paper therefore proposes a DQN (deep Q-network)-based optimal operation method for the storage system of a combined photovoltaic-storage plant in the spot market environment. The method overcomes the nonconvexity difficulty and, combined with the proposed closed-loop storage energy dispatch strategy, achieves hour-level dynamic optimal operation of the storage system that accounts for deviation assessment costs, energy revenue, and frequency-regulation revenue, thereby maximizing the plant's economic returns. Test cases based on real market data verify the feasibility and effectiveness of the proposed method.
Keywords: energy storage; deep Q-network; uncertainty; electricity market; optimal operation
16. Single-Intersection Traffic Signal Control Algorithm Based on Deep Reinforcement Learning [Cited by 11]
Authors: Guo Mengjie, Ren Anhu. Electronic Measurement Technology, 2019, Issue 24, pp. 49-52 (4 pages).
To address traffic congestion, a method combining deep reinforcement learning with traffic signal control is used to construct a single-intersection road model, casting traffic signal control as a reinforcement learning problem in which an agent interacts with the intersection at discrete time steps, with the waiting time at the intersection as the objective function. Leveraging the decision-making ability of reinforcement learning and the perception ability of deep learning, the agent observes the environment state, selects and executes the likely optimal control policy for the current state, and updates the next state according to the reward function. Simulation experiments on the traffic simulator SUMO show that, compared with fixed-time control, the proposed method improves vehicle waiting times to varying degrees under different traffic saturation levels, verifying the effectiveness of the algorithm.
Keywords: deep learning; traffic signal control; deep Q-network; SUMO
17. Deep Reinforcement Learning-Based Computation Offloading for 5G Vehicle-Aware Multi-Access Edge Computing Network [Cited by 14]
Authors: Ziying Wu, Danfeng Yan. China Communications, SCIE CSCD, 2021, Issue 11, pp. 26-41 (16 pages).
Multi-access edge computing (MEC) is one of the key technologies of the future 5G network. By deploying edge computing centers at the edge of the wireless access network, computation tasks can be offloaded to edge servers rather than the remote cloud server to meet the requirements of 5G low-latency and high-reliability application scenarios. Meanwhile, with the development of IoV (Internet of Vehicles) technology, various delay-sensitive and compute-intensive in-vehicle applications continue to appear. Compared with traditional Internet business, these computation tasks have higher processing priority and lower delay requirements. In this paper, we design a 5G-based vehicle-aware multi-access edge computing network (VAMECN) and propose a joint optimization problem of minimizing the total system cost. In view of this problem, a deep reinforcement learning-based joint computation offloading and task migration optimization (JCOTM) algorithm is proposed, considering the influences of multiple factors such as concurrent multiple computation tasks, the distribution of system computing resources, and network communication bandwidth. The mixed-integer nonlinear programming problem is described as a Markov decision process. Experiments show that our proposed algorithm can effectively reduce task processing delay and equipment energy consumption, optimize the computation offloading and resource allocation schemes, and improve system resource utilization, compared with other computation offloading policies.
Keywords: multi-access edge computing; computation offloading; 5G; vehicle-aware; deep reinforcement learning; deep Q-network
18. Autonomous maneuver decision-making for a UCAV in short-range aerial combat based on an MS-DDQN algorithm [Cited by 4]
Authors: Yong-feng Li, Jing-ping Shi, Wei Jiang, Wei-guo Zhang, Yong-xi Lyu. Defence Technology, SCIE EI CAS CSCD, 2022, Issue 9, pp. 1697-1714 (18 pages).
To solve the problem of realizing autonomous aerial combat decision-making for unmanned combat aerial vehicles (UCAVs) rapidly and accurately in an uncertain environment, this paper proposes a decision-making method based on an improved deep reinforcement learning (DRL) algorithm: the multi-step double deep Q-network (MS-DDQN) algorithm. First, a six-degree-of-freedom UCAV model based on an aircraft control system is established on a simulation platform, and the situation assessment functions of the UCAV and its target are established by considering their angles, altitudes, environments, missile attack performance, and UCAV performance. By controlling the flight path angle, roll angle, and flight velocity, 27 common basic actions are designed. On this basis, aiming to overcome the defects of traditional DRL in terms of training speed and convergence speed, the improved MS-DDQN method is introduced to incorporate the final return value into the previous steps. Finally, the pre-training learning model is used as the starting point for the second learning model to simulate the UCAV aerial combat decision-making process based on the basic training method, which helps to shorten the training time and improve the learning efficiency. The improved DRL algorithm significantly accelerates the training speed and estimates the target value more accurately during training, and it can be applied to aerial combat decision-making.
Keywords: unmanned combat aerial vehicle; aerial combat decision; multi-step double deep Q-network; six-degree-of-freedom; aerial combat maneuver library
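The MS-DDQN idea of incorporating the final return into the previous steps corresponds to an n-step bootstrapped target, y = sum_{k=0}^{n-1} gamma^k r_{t+k} + gamma^n Q(s_{t+n}, a*). A minimal sketch with the bootstrap Q-value abstracted to a scalar:

```python
# A minimal n-step target in the spirit of "incorporating the final return
# into the previous steps": y = sum_{k=0}^{n-1} gamma^k * r_{t+k}
#                              + gamma^n * Q(s_{t+n}, a*).
# The bootstrap Q-value is abstracted to a scalar here.
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """rewards: r_t .. r_{t+n-1}; bootstrap_value: Q at state s_{t+n}."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three-step example: r0 + g*r1 + g^2*r2 + g^3 * bootstrap.
y = n_step_return([1.0, 0.5, 0.2], bootstrap_value=2.0)
```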
19. Improving SINR via Joint Beam and Power Management for GEO and LEO Spectrum-Sharing Satellite Communication Systems [Cited by 1]
Authors: Xiaojin Ding, Zhuangzhuang Ren, Huanbin Lu, Gengxin Zhang. China Communications, SCIE CSCD, 2022, Issue 7, pp. 25-36 (12 pages).
In this paper, we investigate a geosynchronous earth orbit (GEO) and low earth orbit (LEO) coexisting satellite communication system. To decrease the interference imposed on the GEO user by LEO satellites, we propose a joint beam-management and power-allocation (JBMPA) scheme to maximize the signal-to-interference-plus-noise ratio (SINR) at the GEO user, whilst maintaining the ongoing wireless links spanning from LEO satellites to their corresponding users. Specifically, we first analyze the overlapping coverage among GEO and LEO satellites to obtain the set of LEO satellites whose beams impose interference on the GEO user. Then, considering the traffic of LEO satellites in the obtained set, we design a beam-management method to turn off and switch the interfering beams of LEO satellites. Finally, we further propose a deep Q-network (DQN)-aided power allocation algorithm to allocate the transmit power for the ongoing LEO satellites in the obtained set whose beams cannot be managed. Numerical results show that, compared with the traditional fixed beam with power allocation (FBPA) scheme, the proposed JBMPA achieves a higher SINR and a lower outage probability, whilst guaranteeing the ongoing wireless transmissions of LEO satellites.
Keywords: beam management; deep Q-network; GEO-LEO coexistence; power allocation
20. A deep reinforcement learning method for multi-stage equipment development planning in uncertain environments
Authors: LIU Peng, XIA Boyuan, YANG Zhiwei, LI Jichao, TAN Yuejin. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2022, Issue 6, pp. 1159-1175 (17 pages).
Equipment development planning (EDP) is usually a long-term process often performed in an environment with high uncertainty. Traditional multi-stage dynamic programming cannot cope with this kind of uncertainty with unpredictable situations. To deal with this problem, a multi-stage EDP model based on a deep reinforcement learning (DRL) algorithm is proposed to respond quickly to any environmental changes within a reasonable range. Firstly, the basic problem of multi-stage EDP is described, and a mathematical planning model is constructed. Then, for two kinds of uncertainties (future capability requirements and the amount of investment in each stage), a corresponding DRL framework is designed to define the environment, state, action, and reward function for multi-stage EDP. After that, the dueling deep Q-network (Dueling DQN) algorithm is used to solve the multi-stage EDP and generate an approximately optimal multi-stage equipment development scheme. Finally, a case of ten kinds of equipment in 100 randomly generated possible environments is used to test the feasibility and effectiveness of the proposed models. The results show that the algorithm can respond instantaneously in any state of the multi-stage EDP environment and, unlike traditional algorithms, does not need to re-optimize the problem for any change in the environment. In addition, the algorithm can flexibly adjust at subsequent planning stages in the event of a change to the equipment capability requirements to adapt to the new requirements.
Keywords: equipment development planning (EDP); multi-stage; reinforcement learning; uncertainty; dueling deep Q-network (Dueling DQN)