期刊文献+
共找到2,710篇文章
< 1 2 136 >
每页显示 20 50 100
UAV-Assisted Dynamic Avatar Task Migration for Vehicular Metaverse Services: A Multi-Agent Deep Reinforcement Learning Approach 被引量:1
1
作者 Jiawen Kang Junlong Chen +6 位作者 Minrui Xu Zehui Xiong Yutao Jiao Luchao Han Dusit Niyato Yongju Tong Shengli Xie 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2024年第2期430-445,共16页
Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metavers... Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation,which consumes intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units(RSU)or unmanned aerial vehicles(UAV) for execution is a promising solution to decrease computation overhead and reduce task processing latency, while the high mobility of vehicles brings challenges for vehicles to independently perform avatar migration decisions depending on current and future vehicle status. To address these challenges, in this paper, we propose a novel avatar task migration system based on multi-agent deep reinforcement learning(MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization(MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome slow convergence resulting from the curse of dimensionality and non-stationary issues caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach via sequential decision-making models for the efficient representation of relationships among agents. Finally, to motivate terrestrial or non-terrestrial edge servers(e.g., RSUs or UAVs) to share computation resources and ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and effectively reduces approximately 20% of the latency of avatar task execution in UAV-assisted vehicular Metaverses. 展开更多
关键词 AVATAR blockchain metaverses multi-agent deep reinforcement learning transformer UAVS
下载PDF
Constrained Multi-Objective Optimization With Deep Reinforcement Learning Assisted Operator Selection
2
作者 Fei Ming Wenyin Gong +1 位作者 Ling Wang Yaochu Jin 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2024年第4期919-931,共13页
Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention.Various constrained multi-objective optimization evolutionary algorithms(CMOEAs)have been dev... Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention.Various constrained multi-objective optimization evolutionary algorithms(CMOEAs)have been developed with the use of different algorithmic strategies,evolutionary operators,and constraint-handling techniques.The performance of CMOEAs may be heavily dependent on the operators used,however,it is usually difficult to select suitable operators for the problem at hand.Hence,improving operator selection is promising and necessary for CMOEAs.This work proposes an online operator selection framework assisted by Deep Reinforcement Learning.The dynamics of the population,including convergence,diversity,and feasibility,are regarded as the state;the candidate operators are considered as actions;and the improvement of the population state is treated as the reward.By using a Q-network to learn a policy to estimate the Q-values of all actions,the proposed approach can adaptively select an operator that maximizes the improvement of the population according to the current state and thereby improve the algorithmic performance.The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems.The experimental results reveal that the proposed Deep Reinforcement Learning-assisted operator selection significantly improves the performance of these CMOEAs and the resulting algorithm obtains better versatility compared to nine state-of-the-art CMOEAs. 展开更多
关键词 Constrained multi-objective optimization deep Qlearning deep reinforcement learning(drl) evolutionary algorithms evolutionary operator selection
下载PDF
A deep reinforcement learning approach to gasoline blending real-time optimization under uncertainty
3
作者 Zhiwei Zhu Minglei Yang +3 位作者 Wangli He Renchu He Yunmeng Zhao Feng Qian 《Chinese Journal of Chemical Engineering》 SCIE EI CAS CSCD 2024年第7期183-192,共10页
The gasoline inline blending process has widely used real-time optimization techniques to achieve optimization objectives,such as minimizing the cost of production.However,the effectiveness of real-time optimization i... The gasoline inline blending process has widely used real-time optimization techniques to achieve optimization objectives,such as minimizing the cost of production.However,the effectiveness of real-time optimization in gasoline blending relies on accurate blending models and is challenged by stochastic disturbances.Thus,we propose a real-time optimization algorithm based on the soft actor-critic(SAC)deep reinforcement learning strategy to optimize gasoline blending without relying on a single blending model and to be robust against disturbances.Our approach constructs the environment using nonlinear blending models and feedstocks with disturbances.The algorithm incorporates the Lagrange multiplier and path constraints in reward design to manage sparse product constraints.Carefully abstracted states facilitate algorithm convergence,and the normalized action vector in each optimization period allows the agent to generalize to some extent across different target production scenarios.Through these well-designed components,the algorithm based on the SAC outperforms real-time optimization methods based on either nonlinear or linear programming.It even demonstrates comparable performance with the time-horizon based real-time optimization method,which requires knowledge of uncertainty models,confirming its capability to handle uncertainty without accurate models.Our simulation illustrates a promising approach to free real-time optimization of the gasoline blending process from uncertainty models that are difficult to acquire in practice. 展开更多
关键词 deep reinforcement learning Gasoline blending Real-time optimization PETROLEUM Computer simulation Neural networks
下载PDF
Automatic depth matching method of well log based on deep reinforcement learning
4
作者 XIONG Wenjun XIAO Lizhi +1 位作者 YUAN Jiangru YUE Wenzheng 《Petroleum Exploration and Development》 SCIE 2024年第3期634-646,共13页
In the traditional well log depth matching tasks,manual adjustments are required,which means significantly labor-intensive for multiple wells,leading to low work efficiency.This paper introduces a multi-agent deep rei... In the traditional well log depth matching tasks,manual adjustments are required,which means significantly labor-intensive for multiple wells,leading to low work efficiency.This paper introduces a multi-agent deep reinforcement learning(MARL)method to automate the depth matching of multi-well logs.This method defines multiple top-down dual sliding windows based on the convolutional neural network(CNN)to extract and capture similar feature sequences on well logs,and it establishes an interaction mechanism between agents and the environment to control the depth matching process.Specifically,the agent selects an action to translate or scale the feature sequence based on the double deep Q-network(DDQN).Through the feedback of the reward signal,it evaluates the effectiveness of each action,aiming to obtain the optimal strategy and improve the accuracy of the matching task.Our experiments show that MARL can automatically perform depth matches for well-logs in multiple wells,and reduce manual intervention.In the application to the oil field,a comparative analysis of dynamic time warping(DTW),deep Q-learning network(DQN),and DDQN methods revealed that the DDQN algorithm,with its dual-network evaluation mechanism,significantly improves performance by identifying and aligning more details in the well log feature sequences,thus achieving higher depth matching accuracy. 展开更多
关键词 artificial intelligence machine learning depth matching well log multi-agent deep reinforcement learning convolutional neural network double deep Q-network
下载PDF
QoS Routing Optimization Based on Deep Reinforcement Learning in SDN
5
作者 Yu Song Xusheng Qian +2 位作者 Nan Zhang Wei Wang Ao Xiong 《Computers, Materials & Continua》 SCIE EI 2024年第5期3007-3021,共15页
To enhance the efficiency and expediency of issuing e-licenses within the power sector, we must confront thechallenge of managing the surging demand for data traffic. Within this realm, the network imposes stringentQu... To enhance the efficiency and expediency of issuing e-licenses within the power sector, we must confront thechallenge of managing the surging demand for data traffic. Within this realm, the network imposes stringentQuality of Service (QoS) requirements, revealing the inadequacies of traditional routing allocation mechanismsin accommodating such extensive data flows. In response to the imperative of handling a substantial influx of datarequests promptly and alleviating the constraints of existing technologies and network congestion, we present anarchitecture forQoS routing optimizationwith in SoftwareDefinedNetwork (SDN), leveraging deep reinforcementlearning. This innovative approach entails the separation of SDN control and transmission functionalities, centralizingcontrol over data forwardingwhile integrating deep reinforcement learning for informed routing decisions. Byfactoring in considerations such as delay, bandwidth, jitter rate, and packet loss rate, we design a reward function toguide theDeepDeterministic PolicyGradient (DDPG) algorithmin learning the optimal routing strategy to furnishsuperior QoS provision. In our empirical investigations, we juxtapose the performance of Deep ReinforcementLearning (DRL) against that of Shortest Path (SP) algorithms in terms of data packet transmission delay. Theexperimental simulation results show that our proposed algorithm has significant efficacy in reducing networkdelay and improving the overall transmission efficiency, which is superior to the traditional methods. 展开更多
关键词 deep reinforcement learning SDN route optimization QOS
下载PDF
Deep Reinforcement Learning Based Joint Cooperation Clustering and Downlink Power Control for Cell-Free Massive MIMO
6
作者 Du Mingjun Sun Xinghua +2 位作者 Zhang Yue Wang Junyuan Liu Pei 《China Communications》 SCIE CSCD 2024年第11期1-14,共14页
In recent times,various power control and clustering approaches have been proposed to enhance overall performance for cell-free massive multipleinput multiple-output(CF-mMIMO)networks.With the emergence of deep reinfo... In recent times,various power control and clustering approaches have been proposed to enhance overall performance for cell-free massive multipleinput multiple-output(CF-mMIMO)networks.With the emergence of deep reinforcement learning(DRL),significant progress has been made in the field of network optimization as DRL holds great promise for improving network performance and efficiency.In this work,our focus delves into the intricate challenge of joint cooperation clustering and downlink power control within CF-mMIMO networks.Leveraging the potent deep deterministic policy gradient(DDPG)algorithm,our objective is to maximize the proportional fairness(PF)for user rates,thereby aiming to achieve optimal network performance and resource utilization.Moreover,we harness the concept of“divide and conquer”strategy,introducing two innovative methods termed alternating DDPG(A-DDPG)and hierarchical DDPG(H-DDPG).These approaches aim to decompose the intricate joint optimization problem into more manageable sub-problems,thereby facilitating a more efficient resolution process.Our findings unequivo-cally showcase the superior efficacy of our proposed DDPG approach over the baseline schemes in both clustering and downlink power control.Furthermore,the A-DDPG and H-DDPG obtain higher performance gain than DDPG with lower computational complexity. 展开更多
关键词 cell-free massive MIMO CLUSTERING deep reinforcement learning power control
下载PDF
Deep Reinforcement Learning for Energy-Efficient Edge Caching in Mobile Edge Networks
7
作者 Meng Deng Zhou Huan +3 位作者 Jiang Kai Zheng Hantong Cao Yue Chen Peng 《China Communications》 SCIE CSCD 2024年第11期243-256,共14页
Edge caching has emerged as a promising application paradigm in 5G networks,and by building edge networks to cache content,it can alleviate the traffic load brought about by the rapid growth of Internet of Things(IoT)... Edge caching has emerged as a promising application paradigm in 5G networks,and by building edge networks to cache content,it can alleviate the traffic load brought about by the rapid growth of Internet of Things(IoT)services and applications.Due to the limitations of Edge Servers(ESs)and a large number of user demands,how to make the decision and utilize the resources of ESs are significant.In this paper,we aim to minimize the total system energy consumption in a heterogeneous network and formulate the content caching optimization problem as a Mixed Integer Non-Linear Programming(MINLP).To address the optimization problem,a Deep Q-Network(DQN)-based method is proposed to improve the overall performance of the system and reduce the backhaul traffic load.In addition,the DQN-based method can effectively solve the limitation of traditional reinforcement learning(RL)in complex scenarios.Simulation results show that the proposed DQN-based method can greatly outperform other benchmark methods,and significantly improve the cache hit rate and reduce the total system energy consumption in different scenarios. 展开更多
关键词 deep reinforcement learning edge caching energy consumption markov decision process
下载PDF
Policy Network-Based Dual-Agent Deep Reinforcement Learning for Multi-Resource Task Offloading in Multi-Access Edge Cloud Networks
8
作者 Feng Chuan Zhang Xu +2 位作者 Han Pengchao Ma Tianchun Gong Xiaoxue 《China Communications》 SCIE CSCD 2024年第4期53-73,共21页
The Multi-access Edge Cloud(MEC) networks extend cloud computing services and capabilities to the edge of the networks. By bringing computation and storage capabilities closer to end-users and connected devices, MEC n... The Multi-access Edge Cloud(MEC) networks extend cloud computing services and capabilities to the edge of the networks. By bringing computation and storage capabilities closer to end-users and connected devices, MEC networks can support a wide range of applications. MEC networks can also leverage various types of resources, including computation resources, network resources, radio resources,and location-based resources, to provide multidimensional resources for intelligent applications in 5/6G.However, tasks generated by users often consist of multiple subtasks that require different types of resources. It is a challenging problem to offload multiresource task requests to the edge cloud aiming at maximizing benefits due to the heterogeneity of resources provided by devices. To address this issue,we mathematically model the task requests with multiple subtasks. Then, the problem of task offloading of multi-resource task requests is proved to be NP-hard. Furthermore, we propose a novel Dual-Agent Deep Reinforcement Learning algorithm with Node First and Link features(NF_L_DA_DRL) based on the policy network, to optimize the benefits generated by offloading multi-resource task requests in MEC networks. Finally, simulation results show that the proposed algorithm can effectively improve the benefit of task offloading with higher resource utilization compared with baseline algorithms. 展开更多
关键词 benefit maximization deep reinforcement learning multi-access edge cloud task offloading
下载PDF
Deep Reinforcement Learning-Based Task Offloading and Service Migrating Policies in Service Caching-Assisted Mobile Edge Computing
9
作者 Ke Hongchang Wang Hui +1 位作者 Sun Hongbin Halvin Yang 《China Communications》 SCIE CSCD 2024年第4期88-103,共16页
Emerging mobile edge computing(MEC)is considered a feasible solution for offloading the computation-intensive request tasks generated from mobile wireless equipment(MWE)with limited computational resources and energy.... Emerging mobile edge computing(MEC)is considered a feasible solution for offloading the computation-intensive request tasks generated from mobile wireless equipment(MWE)with limited computational resources and energy.Due to the homogeneity of request tasks from one MWE during a longterm time period,it is vital to predeploy the particular service cachings required by the request tasks at the MEC server.In this paper,we model a service caching-assisted MEC framework that takes into account the constraint on the number of service cachings hosted by each edge server and the migration of request tasks from the current edge server to another edge server with service caching required by tasks.Furthermore,we propose a multiagent deep reinforcement learning-based computation offloading and task migrating decision-making scheme(MBOMS)to minimize the long-term average weighted cost.The proposed MBOMS can learn the near-optimal offloading and migrating decision-making policy by centralized training and decentralized execution.Systematic and comprehensive simulation results reveal that our proposed MBOMS can converge well after training and outperforms the other five baseline algorithms. 展开更多
关键词 deep reinforcement learning mobile edge computing service caching service migrating
下载PDF
Resource Allocation for Cognitive Network Slicing in PD-SCMA System Based on Two-Way Deep Reinforcement Learning
10
作者 Zhang Zhenyu Zhang Yong +1 位作者 Yuan Siyu Cheng Zhenjie 《China Communications》 SCIE CSCD 2024年第6期53-68,共16页
In this paper,we propose the Two-way Deep Reinforcement Learning(DRL)-Based resource allocation algorithm,which solves the problem of resource allocation in the cognitive downlink network based on the underlay mode.Se... In this paper,we propose the Two-way Deep Reinforcement Learning(DRL)-Based resource allocation algorithm,which solves the problem of resource allocation in the cognitive downlink network based on the underlay mode.Secondary users(SUs)in the cognitive network are multiplexed by a new Power Domain Sparse Code Multiple Access(PD-SCMA)scheme,and the physical resources of the cognitive base station are virtualized into two types of slices:enhanced mobile broadband(eMBB)slice and ultrareliable low latency communication(URLLC)slice.We design the Double Deep Q Network(DDQN)network output the optimal codebook assignment scheme and simultaneously use the Deep Deterministic Policy Gradient(DDPG)network output the optimal power allocation scheme.The objective is to jointly optimize the spectral efficiency of the system and the Quality of Service(QoS)of SUs.Simulation results show that the proposed algorithm outperforms the CNDDQN algorithm and modified JEERA algorithm in terms of spectral efficiency and QoS satisfaction.Additionally,compared with the Power Domain Non-orthogonal Multiple Access(PD-NOMA)slices and the Sparse Code Multiple Access(SCMA)slices,the PD-SCMA slices can dramatically enhance spectral efficiency and increase the number of accessible users. 展开更多
关键词 cognitive radio deep reinforcement learning network slicing power-domain non-orthogonal multiple access resource allocation
下载PDF
Deep reinforcement learning using least-squares truncated temporal-difference
11
作者 Junkai Ren Yixing Lan +3 位作者 Xin Xu Yichuan Zhang Qiang Fang Yujun Zeng 《CAAI Transactions on Intelligence Technology》 SCIE EI 2024年第2期425-439,共15页
Policy evaluation(PE)is a critical sub-problem in reinforcement learning,which estimates the value function for a given policy and can be used for policy improvement.However,there still exist some limitations in curre... Policy evaluation(PE)is a critical sub-problem in reinforcement learning,which estimates the value function for a given policy and can be used for policy improvement.However,there still exist some limitations in current PE methods,such as low sample efficiency and local convergence,especially on complex tasks.In this study,a novel PE algorithm called Least-Squares Truncated Temporal-Difference learning(LST2D)is proposed.In LST2D,an adaptive truncation mechanism is designed,which effectively takes advantage of the fast convergence property of Least-Squares Temporal Difference learning and the asymptotic convergence property of Temporal Difference learning(TD).Then,two feature pre-training methods are utilised to improve the approximation ability of LST2D.Furthermore,an Actor-Critic algorithm based on LST2D and pre-trained feature representations(ACLPF)is proposed,where LST2D is integrated into the critic network to improve learning-prediction efficiency.Comprehensive simulation studies were conducted on four robotic tasks,and the corresponding results illustrate the effectiveness of LST2D.The proposed ACLPF algorithm outperformed DQN,ACER and PPO in terms of sample efficiency and stability,which demonstrated that LST2D can be applied to online learning control problems by incorporating it into the actor-critic architecture. 展开更多
关键词 deep reinforcement learning policy evaluation temporal difference value function approximation
下载PDF
Network Defense Decision-Making Based on Deep Reinforcement Learning and Dynamic Game Theory
12
作者 Huang Wanwei Yuan Bo +2 位作者 Wang Sunan Ding Yi Li Yuhua 《China Communications》 SCIE CSCD 2024年第9期262-275,共14页
Existing researches on cyber attackdefense analysis have typically adopted stochastic game theory to model the problem for solutions,but the assumption of complete rationality is used in modeling,ignoring the informat... Existing researches on cyber attackdefense analysis have typically adopted stochastic game theory to model the problem for solutions,but the assumption of complete rationality is used in modeling,ignoring the information opacity in practical attack and defense scenarios,and the model and method lack accuracy.To such problem,we investigate network defense policy methods under finite rationality constraints and propose network defense policy selection algorithm based on deep reinforcement learning.Based on graph theoretical methods,we transform the decision-making problem into a path optimization problem,and use a compression method based on service node to map the network state.On this basis,we improve the A3C algorithm and design the DefenseA3C defense policy selection algorithm with online learning capability.The experimental results show that the model and method proposed in this paper can stably converge to a better network state after training,which is faster and more stable than the original A3C algorithm.Compared with the existing typical approaches,Defense-A3C is verified its advancement. 展开更多
关键词 A3C cyber attack-defense analysis deep reinforcement learning stochastic game theory
下载PDF
Enhancing Image Description Generation through Deep Reinforcement Learning:Fusing Multiple Visual Features and Reward Mechanisms
13
作者 Yan Li Qiyuan Wang Kaidi Jia 《Computers, Materials & Continua》 SCIE EI 2024年第2期2469-2489,共21页
Image description task is the intersection of computer vision and natural language processing,and it has important prospects,including helping computers understand images and obtaining information for the visually imp... Image description task is the intersection of computer vision and natural language processing,and it has important prospects,including helping computers understand images and obtaining information for the visually impaired.This study presents an innovative approach employing deep reinforcement learning to enhance the accuracy of natural language descriptions of images.Our method focuses on refining the reward function in deep reinforcement learning,facilitating the generation of precise descriptions by aligning visual and textual features more closely.Our approach comprises three key architectures.Firstly,it utilizes Residual Network 101(ResNet-101)and Faster Region-based Convolutional Neural Network(Faster R-CNN)to extract average and local image features,respectively,followed by the implementation of a dual attention mechanism for intricate feature fusion.Secondly,the Transformer model is engaged to derive contextual semantic features from textual data.Finally,the generation of descriptive text is executed through a two-layer long short-term memory network(LSTM),directed by the value and reward functions.Compared with the image description method that relies on deep learning,the score of Bilingual Evaluation Understudy(BLEU-1)is 0.762,which is 1.6%higher,and the score of BLEU-4 is 0.299.Consensus-based Image Description Evaluation(CIDEr)scored 0.998,Recall-Oriented Understudy for Gisting Evaluation(ROUGE)scored 0.552,the latter improved by 0.36%.These results not only attest to the viability of our approach but also highlight its superiority in the realm of image description.Future research can explore the integration of our method with other artificial intelligence(AI)domains,such as emotional AI,to create more nuanced and context-aware systems. 展开更多
关键词 Image description deep reinforcement learning attention mechanism
下载PDF
Energy-Efficient Traffic Offloading for RSMA-Based Hybrid Satellite Terrestrial Networks with Deep Reinforcement Learning
14
作者 Qingmiao Zhang Lidong Zhu +1 位作者 Yanyan Chen Shan Jiang 《China Communications》 SCIE CSCD 2024年第2期49-58,共10页
As the demands of massive connections and vast coverage rapidly grow in the next wireless communication networks, rate splitting multiple access(RSMA) is considered to be the new promising access scheme since it can p... As the demands of massive connections and vast coverage rapidly grow in the next wireless communication networks, rate splitting multiple access(RSMA) is considered to be the new promising access scheme since it can provide higher efficiency with limited spectrum resources. In this paper, combining spectrum splitting with rate splitting, we propose to allocate resources with traffic offloading in hybrid satellite terrestrial networks. A novel deep reinforcement learning method is adopted to solve this challenging non-convex problem. However, the neverending learning process could prohibit its practical implementation. Therefore, we introduce the switch mechanism to avoid unnecessary learning. Additionally, the QoS constraint in the scheme can rule out unsuccessful transmission. The simulation results validates the energy efficiency performance and the convergence speed of the proposed algorithm. 展开更多
关键词 deep reinforcement learning energy efficiency hybrid satellite terrestrial networks rate splitting multiple access traffic offloading
下载PDF
UAV maneuvering decision-making algorithm based on deep reinforcement learning under the guidance of expert experience
15
作者 ZHAN Guang ZHANG Kun +1 位作者 LI Ke PIAO Haiyin 《Journal of Systems Engineering and Electronics》 SCIE CSCD 2024年第3期644-665,共22页
Autonomous umanned aerial vehicle(UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devo... Autonomous umanned aerial vehicle(UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAV in an interactive environment, where finding the optimal maneuvering decisionmaking policy became one of the key issues for enabling the intelligence of UAV. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance towards area and guidance towards specific point tasks for the air-delivery process based on the traditional air-to-surface fire control methods.Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes(MDPs). Specifically, we present a reward shaping method for the guidance towards area and guidance towards specific point tasks using potential-based function and expert-guided advice. The proposed algorithm could accelerate the convergence of the maneuvering decision-making policy and increase the stability of the policy in terms of the output during the later stage of training process. The effectiveness of the proposed maneuvering decision-making policy is illustrated by the curves of training parameters and extensive experimental results for testing the trained policy. 展开更多
关键词 unmanned aerial vehicle(UAV) maneuvering decision-making autonomous air-delivery deep reinforcement learning reward shaping expert experience
下载PDF
Dynamic Economic Scheduling with Self-Adaptive Uncertainty in Distribution Network Based on Deep Reinforcement Learning
16
作者 Guanfu Wang Yudie Sun +5 位作者 Jinling Li Yu Jiang Chunhui Li Huanan Yu He Wang Shiqiang Li 《Energy Engineering》 EI 2024年第6期1671-1695,共25页
Traditional optimal scheduling methods are limited to accurate physical models and parameter settings, which aredifficult to adapt to the uncertainty of source and load, and there are problems such as the inability to... Traditional optimal scheduling methods are limited to accurate physical models and parameter settings, which aredifficult to adapt to the uncertainty of source and load, and there are problems such as the inability to make dynamicdecisions continuously. This paper proposed a dynamic economic scheduling method for distribution networksbased on deep reinforcement learning. Firstly, the economic scheduling model of the new energy distributionnetwork is established considering the action characteristics of micro-gas turbines, and the dynamic schedulingmodel based on deep reinforcement learning is constructed for the new energy distribution network system with ahigh proportion of new energy, and the Markov decision process of the model is defined. Secondly, Second, for thechanging characteristics of source-load uncertainty, agents are trained interactively with the distributed networkin a data-driven manner. Then, through the proximal policy optimization algorithm, agents adaptively learn thescheduling strategy and realize the dynamic scheduling decision of the new energy distribution network system.Finally, the feasibility and superiority of the proposed method are verified by an improved IEEE 33-node simulationsystem. 展开更多
关键词 SELF-ADAPTIVE the uncertainty of sources and load deep reinforcement learning dynamic economic scheduling
下载PDF
A Deep Reinforcement Learning-Based Technique for Optimal Power Allocation in Multiple Access Communications
17
作者 Sepehr Soltani Ehsan Ghafourian +2 位作者 Reza Salehi Diego Martín Milad Vahidi 《Intelligent Automation & Soft Computing》 2024年第1期93-108,共16页
Formany years,researchers have explored power allocation(PA)algorithms driven bymodels in wireless networks where multiple-user communications with interference are present.Nowadays,data-driven machine learning method... Formany years,researchers have explored power allocation(PA)algorithms driven bymodels in wireless networks where multiple-user communications with interference are present.Nowadays,data-driven machine learning methods have become quite popular in analyzing wireless communication systems,which among them deep reinforcement learning(DRL)has a significant role in solving optimization issues under certain constraints.To this purpose,in this paper,we investigate the PA problem in a k-user multiple access channels(MAC),where k transmitters(e.g.,mobile users)aim to send an independent message to a common receiver(e.g.,base station)through wireless channels.To this end,we first train the deep Q network(DQN)with a deep Q learning(DQL)algorithm over the simulation environment,utilizing offline learning.Then,the DQN will be used with the real data in the online training method for the PA issue by maximizing the sumrate subjected to the source power.Finally,the simulation results indicate that our proposedDQNmethod provides better performance in terms of the sumrate compared with the available DQL training approaches such as fractional programming(FP)and weighted minimum mean squared error(WMMSE).Additionally,by considering different user densities,we show that our proposed DQN outperforms benchmark algorithms,thereby,a good generalization ability is verified over wireless multi-user communication systems. 展开更多
关键词 deep reinforcement learning deep Q learning multiple access channel power allocation
下载PDF
Deep Reinforcement Learning Solves Job-shop Scheduling Problems
18
作者 Anjiang Cai Yangfan Yu Manman Zhao 《Instrumentation》 2024年第1期88-100,共13页
To solve the sparse reward problem of job-shop scheduling by deep reinforcement learning,a deep reinforcement learning framework considering sparse reward problem is proposed.The job shop scheduling problem is transfo... To solve the sparse reward problem of job-shop scheduling by deep reinforcement learning,a deep reinforcement learning framework considering sparse reward problem is proposed.The job shop scheduling problem is transformed into Markov decision process,and six state features are designed to improve the state feature representation by using two-way scheduling method,including four state features that distinguish the optimal action and two state features that are related to the learning goal.An extended variant of graph isomorphic network GIN++is used to encode disjunction graphs to improve the performance and generalization ability of the model.Through iterative greedy algorithm,random strategy is generated as the initial strategy,and the action with the maximum information gain is selected to expand it to optimize the exploration ability of Actor-Critic algorithm.Through validation of the trained policy model on multiple public test data sets and comparison with other advanced DRL methods and scheduling rules,the proposed method reduces the minimum average gap by 3.49%,5.31%and 4.16%,respectively,compared with the priority rule-based method,and 5.34%compared with the learning-based method.11.97%and 5.02%,effectively improving the accuracy of DRL to solve the approximate solution of JSSP minimum completion time. 展开更多
关键词 job shop scheduling problems deep reinforcement learning state characteristics policy network
下载PDF
Joint Flexible Duplexing and Power Allocation with Deep Reinforcement Learning in Cell-Free Massive MIMO System 被引量:5
19
作者 Danhao Deng Chaowei Wang +2 位作者 Zhi Zhang Lihua Li Weidong Wang 《China Communications》 SCIE CSCD 2023年第4期73-85,共13页
Network-assisted full duplex(NAFD)cellfree(CF)massive MIMO has drawn increasing attention in 6G evolvement.In this paper,we build an NAFD CF system in which the users and access points(APs)can flexibly select their du... Network-assisted full duplex(NAFD)cellfree(CF)massive MIMO has drawn increasing attention in 6G evolvement.In this paper,we build an NAFD CF system in which the users and access points(APs)can flexibly select their duplex modes to increase the link spectral efficiency.Then we formulate a joint flexible duplexing and power allocation problem to balance the user fairness and system spectral efficiency.We further transform the problem into a probability optimization to accommodate the shortterm communications.In contrast with the instant performance optimization,the probability optimization belongs to a sequential decision making problem,and thus we reformulate it as a Markov Decision Process(MDP).We utilizes deep reinforcement learning(DRL)algorithm to search the solution from a large state-action space,and propose an asynchronous advantage actor-critic(A3C)-based scheme to reduce the chance of converging to the suboptimal policy.Simulation results demonstrate that the A3C-based scheme is superior to the baseline schemes in term of the complexity,accumulated log spectral efficiency,and stability. 展开更多
关键词 cell-free massive MIMO flexible duplexing sum fair spectral efficiency deep reinforcement learning asynchronous advantage actor-critic
下载PDF
Task assignment in ground-to-air confrontation based on multiagent deep reinforcement learning 被引量:3
20
作者 Jia-yi Liu Gang Wang +2 位作者 Qiang Fu Shao-hua Yue Si-yuan Wang 《Defence Technology(防务技术)》 SCIE EI CAS CSCD 2023年第1期210-219,共10页
The scale of ground-to-air confrontation task assignments is large and needs to deal with many concurrent task assignments and random events.Aiming at the problems where existing task assignment methods are applied to... The scale of ground-to-air confrontation task assignments is large and needs to deal with many concurrent task assignments and random events.Aiming at the problems where existing task assignment methods are applied to ground-to-air confrontation,there is low efficiency in dealing with complex tasks,and there are interactive conflicts in multiagent systems.This study proposes a multiagent architecture based on a one-general agent with multiple narrow agents(OGMN)to reduce task assignment conflicts.Considering the slow speed of traditional dynamic task assignment algorithms,this paper proposes the proximal policy optimization for task assignment of general and narrow agents(PPOTAGNA)algorithm.The algorithm based on the idea of the optimal assignment strategy algorithm and combined with the training framework of deep reinforcement learning(DRL)adds a multihead attention mechanism and a stage reward mechanism to the bilateral band clipping PPO algorithm to solve the problem of low training efficiency.Finally,simulation experiments are carried out in the digital battlefield.The multiagent architecture based on OGMN combined with the PPO-TAGNA algorithm can obtain higher rewards faster and has a higher win ratio.By analyzing agent behavior,the efficiency,superiority and rationality of resource utilization of this method are verified. 展开更多
关键词 Ground-to-air confrontation Task assignment General and narrow agents deep reinforcement learning Proximal policy optimization(PPO)
下载PDF
上一页 1 2 136 下一页 到第
使用帮助 返回顶部