Journal literature: 1,814 articles found
Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications (cited by: 2)
1
Authors: Ding Wang, Ning Gao, Derong Liu, Jinna Li, Frank L. Lewis. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 1, pp. 18-36 (19 pages).
Reinforcement learning (RL) has roots in dynamic programming and it is called adaptive/approximate dynamic programming (ADP) within the control community. This paper reviews recent developments in ADP along with RL and its applications to various advanced control fields. First, the background of the development of ADP is described, emphasizing the significance of regulation and tracking control problems. Some effective offline and online algorithms for ADP/adaptive critic control are displayed, where the main results towards discrete-time systems and continuous-time systems are surveyed, respectively. Then, the research progress on adaptive critic control based on the event-triggered framework and under uncertain environment is discussed, respectively, where event-based design, robust stabilization, and game design are reviewed. Moreover, the extensions of ADP for addressing control problems under complex environment attract enormous attention. The ADP architecture is revisited under the perspective of data-driven and RL frameworks, showing how they promote ADP formulation significantly. Finally, several typical control applications with respect to RL and ADP are summarized, particularly in the fields of wastewater treatment processes and power systems, followed by some general prospects for future research. Overall, the comprehensive survey on ADP and RL for advanced control applications has demonstrated its remarkable potential within the artificial intelligence era. In addition, it also plays a vital role in promoting environmental protection and industrial intelligence.
Keywords: Adaptive dynamic programming (ADP); advanced control; complex environment; data-driven control; event-triggered design; intelligent control; neural networks; nonlinear systems; optimal control; reinforcement learning (RL)
UAV-Assisted Dynamic Avatar Task Migration for Vehicular Metaverse Services: A Multi-Agent Deep Reinforcement Learning Approach (cited by: 1)
2
Authors: Jiawen Kang, Junlong Chen, Minrui Xu, Zehui Xiong, Yutao Jiao, Luchao Han, Dusit Niyato, Yongju Tong, Shengli Xie. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 2, pp. 430-445 (16 pages).
Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising solution to decrease computation overhead and reduce task processing latency, while the high mobility of vehicles brings challenges for vehicles to independently perform avatar migration decisions depending on current and future vehicle status. To address these challenges, in this paper, we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome slow convergence resulting from the curse of dimensionality and non-stationary issues caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach via sequential decision-making models for the efficient representation of relationships among agents. Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and reduces the latency of avatar task execution in UAV-assisted vehicular Metaverses by approximately 20%.
Keywords: Avatar; blockchain; metaverses; multi-agent deep reinforcement learning; transformer; UAVs
Stability behavior of the Lanxi ancient flood control levee after reinforcement with upside-down hanging wells and grouting curtain
3
Authors: QIN Zipeng, TIAN Yan, GAO Siyuan, ZHOU Jianfen, HE Xiaohui, HE Weizhong, GAO Jingquan. Journal of Mountain Science (SCIE, CSCD), 2024, No. 1, pp. 84-99 (16 pages).
The stability of ancient flood control levees is mainly influenced by water level fluctuations, groundwater concentration and rainfall. This paper takes the Lanxi ancient levee as a research object to study the evolution laws of its seepage, displacement and stability before and after reinforcement with the upside-down hanging wells and grouting curtain through numerical simulation methods combined with experiments and observations. The study results indicate that the filled soil is less affected by water level fluctuations and groundwater concentration after reinforcement. A high groundwater level is detrimental to the levee's long-term stability, and the drainage issues need to be fully considered. The deformation of the reinforced levee is effectively controlled since the fill deformation is mainly borne by the upside-down hanging wells. The safety factors of the levee before reinforcement vary significantly with the water level. The minimum value of the safety factors is 0.886 during the water level decreasing period, indicating a very high risk of instability. After reinforcement it reached 1.478, so the stability of the ancient levee is improved by a large margin.
Keywords: Stability analysis; Multiple factors; Anti-seepage reinforcement; Upside-down hanging well; Grouting curtain; Ancient levee
Toward Trustworthy Decision-Making for Autonomous Vehicles:A Robust Reinforcement Learning Approach with Safety Guarantees
4
Authors: Xiangkun He, Wenhui Huang, Chen Lv. Engineering (SCIE, EI, CAS, CSCD), 2024, No. 2, pp. 77-89 (13 pages).
While autonomous vehicles are vital components of intelligent transportation systems, ensuring the trustworthiness of decision-making remains a substantial challenge in realizing autonomous driving. Therefore, we present a novel robust reinforcement learning approach with safety guarantees to attain trustworthy decision-making for autonomous vehicles. The proposed technique ensures decision trustworthiness in terms of policy robustness and collision safety. Specifically, an adversary model is learned online to simulate the worst-case uncertainty by approximating the optimal adversarial perturbations on the observed states and environmental dynamics. In addition, an adversarial robust actor-critic algorithm is developed to enable the agent to learn robust policies against perturbations in observations and dynamics. Moreover, we devise a safety mask to guarantee the collision safety of the autonomous driving agent during both the training and testing processes using an interpretable knowledge model known as the Responsibility-Sensitive Safety Model. Finally, the proposed approach is evaluated through both simulations and experiments. These results indicate that the autonomous driving agent can make trustworthy decisions and drastically reduce the number of collisions through robust safety policies.
Keywords: Autonomous vehicle; Decision-making; Reinforcement learning; Adversarial attack; Safety guarantee
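The safety mask mentioned in the abstract above can be pictured as filtering out unsafe actions before the agent picks one. The following is only a minimal sketch under stated assumptions: it assumes a discrete action set, and both `q_values` and the boolean `safe` flags are hypothetical inputs. The paper derives safety from the Responsibility-Sensitive Safety Model, which is not reproduced here.

```python
import math


def masked_greedy_action(q_values, safe):
    """Return the index of the highest-valued action among those flagged safe.

    q_values: list of floats, one per discrete action (hypothetical critic output).
    safe:     list of bools of the same length (hypothetical safety-check output).
    """
    if len(q_values) != len(safe):
        raise ValueError("q_values and safe must have the same length")
    if not any(safe):
        raise ValueError("no safe action available")
    # Unsafe actions get a value of -inf so they can never win the argmax.
    masked = [q if ok else -math.inf for q, ok in zip(q_values, safe)]
    return max(range(len(masked)), key=masked.__getitem__)
```

For example, `masked_greedy_action([1.0, 5.0, 3.0], [True, False, True])` skips the highest-valued but unsafe action at index 1 and selects index 2.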
Self organizing optimization and phase transition in reinforcement learning minority game system
5
Authors: Si-Ping Zhang, Jia-Qi Dong, Hui-Yu Zhang, Yi-Xuan Lü, Jue Wang, Zi-Gang Huang. Frontiers of Physics (SCIE, CSCD), 2024, No. 4, pp. 297-309 (13 pages).
Whether the complex game system composed of a large number of artificial intelligence (AI) agents empowered with reinforcement learning can produce extremely favorable collective behaviors just through the way of agent self-exploration is a matter of practical importance. In this paper, we address this question by combining the typical theoretical model of resource allocation system, the minority game model, with reinforcement learning. Each individual participating in the game is set to have a certain degree of intelligence based on a reinforcement learning algorithm. In particular, we demonstrate that as AI agents gradually become familiar with the unknown environment and try to provide optimal actions to maximize payoff, the whole system continues to approach the optimal state under certain parameter combinations, and herding is effectively suppressed by an oscillating collective behavior, which is a self-organizing pattern without any external interference. An interesting phenomenon is that a first-order phase transition is revealed based on some numerical results in our multi-agent system with reinforcement learning. In order to further understand the dynamic behavior of agent learning, we define and analyze the conversion path of belief mode, and find that the self-organizing condensation of belief modes appeared for the given trial and error rates in the AI system. Finally, we provide a detection method for period-two oscillation collective pattern emergence based on the Kullback–Leibler divergence and give the parameter position where the period-two appears.
Keywords: oscillatory evolution; collective behaviors; phase transition; reinforcement learning; minority game
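The detection method in the abstract above is built on the Kullback–Leibler divergence, which for two discrete distributions P and Q is D_KL(P || Q) = sum_i p_i * log(p_i / q_i). The sketch below shows only that standard quantity; how the paper constructs the distributions being compared is not stated in the abstract, so the function name and inputs are illustrative assumptions.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions given as lists.

    Terms with p_i == 0 contribute nothing; if p_i > 0 where q_i == 0,
    the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total
```

The divergence is zero when the two distributions coincide and grows as they separate, which is what makes it usable as a detector for a change in collective pattern.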
Cognitive interference decision method for air defense missile fuze based on reinforcement learning
6
Authors: Dingkun Huang, Xiaopeng Yan, Jian Dai, Xinwei Wang, Yangtian Liu. Defence Technology (防务技术) (SCIE, EI, CAS, CSCD), 2024, No. 2, pp. 393-404 (12 pages).
To solve the problem of the low interference success rate of air defense missile radio fuzes due to the unified interference form of the traditional fuze interference system, an interference decision method based on the Q-learning algorithm is proposed. First, the distance between the missile and the target is divided into multiple states to increase the number of states. Second, a multidimensional motion space is utilized, whose search range changes with the distance of the projectile, to select parameters and minimize the number of ineffective interference parameters. The interference effect is determined by detecting whether the fuze signal disappears. Finally, a weighted reward function is used to determine the reward value based on the range state, output power, and parameter quantity information of the interference form. The effectiveness of the proposed method in selecting the range of motion space parameters and designing the discrimination degree of the reward function has been verified through offline experiments involving full-range missile rendezvous. The optimal interference form for each distance state has been obtained. Compared with the single-interference decision method, the proposed decision method can effectively improve the success rate of interference.
Keywords: Cognitive radio; Interference decision; Radio fuze; Reinforcement learning; Interference strategy optimization
Recorded recurrent deep reinforcement learning guidance laws for intercepting endoatmospheric maneuvering missiles
7
Authors: Xiaoqi Qiu, Peng Lai, Changsheng Gao, Wuxing Jing. Defence Technology (防务技术) (SCIE, EI, CAS, CSCD), 2024, No. 1, pp. 457-470 (14 pages).
This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to solve the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles with uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequence information, an RNN layer is incorporated into the agent’s policy network to alleviate the bottleneck of traditional deep reinforcement learning methods while dealing with POMDPs. The measurements from the interceptor’s seeker during each guidance cycle are combined into one sequence as the input to the policy network since the detection frequency of an interceptor is usually higher than its guidance frequency. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partially observable problem that this RNN layer causes inside the agent. The training curves show that the proposed RRTD3 successfully enhances data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over some conventional guidance laws.
Keywords: Endoatmospheric interception; Missile guidance; Reinforcement learning; Markov decision process; Recurrent neural networks
A digital twins enabled underwater intelligent internet vehicle path planning system via reinforcement learning and edge computing
8
Authors: Jiachen Yang, Meng Xi, Jiabao Wen, Yang Li, Houbing Herbert Song. Digital Communications and Networks (SCIE, CSCD), 2024, No. 2, pp. 282-291 (10 pages).
The Autonomous Underwater Glider (AUG) is a kind of prevailing underwater intelligent internet vehicle and occupies a dominant position in industrial applications, in which path planning is an essential problem. Due to the complexity and variability of the ocean, accurate environment modeling and flexible path planning algorithms are pivotal challenges. The traditional models mainly utilize mathematical functions, which are not complete and reliable. Most existing path planning algorithms depend on the environment and lack flexibility. To overcome these challenges, we propose a path planning system for underwater intelligent internet vehicles. It applies digital twins and sensor data to map the real ocean environment to a virtual digital space, which provides a comprehensive and reliable environment for path simulation. We design a value-based reinforcement learning path planning algorithm and explore the optimal network structure parameters. The path simulation is controlled by a closed-loop model integrated into the terminal vehicle through edge computing. The integration of state input enriches the learning of neural networks and helps to improve generalization and flexibility. The task-related reward function promotes the rapid convergence of the training. The experimental results prove that our reinforcement learning based path planning algorithm has great flexibility and can effectively adapt to a variety of different ocean conditions.
Keywords: Digital twins; Reinforcement learning; Edge computing; Underwater intelligent internet vehicle; Path planning
Combining reinforcement learning with mathematical programming:An approach for optimal design of heat exchanger networks
9
Authors: Hui Tan, Xiaodong Hong, Zuwei Liao, Jingyuan Sun, Yao Yang, Jingdai Wang, Yongrong Yang. Chinese Journal of Chemical Engineering (SCIE, EI, CAS, CSCD), 2024, No. 5, pp. 63-71 (9 pages).
Heat integration is important for energy-saving in the process industry. It is linked to the persistently challenging task of optimal design of heat exchanger networks (HEN). Due to the inherent highly nonconvex nonlinear and combinatorial nature of the HEN problem, it is not easy to find solutions of high quality for large-scale problems. The reinforcement learning (RL) method, which learns strategies through ongoing exploration and exploitation, reveals advantages in such areas. However, due to the complexity of the HEN design problem, the RL method for HEN must be specially designed. A hybrid strategy combining RL with mathematical programming is proposed to take better advantage of both methods. An insightful state representation of the HEN structure as well as a customized reward function is introduced. A Q-learning algorithm is applied to update the HEN structure using the ε-greedy strategy. Better results are obtained from three literature cases of different scales.
Keywords: Heat exchanger network; Reinforcement learning; Mathematical programming; Process design
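The abstract above names Q-learning with an ε-greedy strategy. As a generic illustration of those two textbook pieces (not the authors' HEN-specific state representation or reward, which the abstract does not detail), a tabular sketch might look like the following; the table layout `q[state][action]` and the default values of `alpha` and `gamma` are assumptions.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon explore (uniform random action), else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    Mutates q in place and returns the updated entry.
    """
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

Setting `epsilon` to 0 makes the choice purely greedy, while 1 makes it purely random; intermediate values trade exploration against exploitation.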
Resource Allocation for Cognitive Network Slicing in PD-SCMA System Based on Two-Way Deep Reinforcement Learning
10
Authors: Zhang Zhenyu, Zhang Yong, Yuan Siyu, Cheng Zhenjie. China Communications (SCIE, CSCD), 2024, No. 6, pp. 53-68 (16 pages).
In this paper, we propose the Two-way Deep Reinforcement Learning (DRL)-Based resource allocation algorithm, which solves the problem of resource allocation in the cognitive downlink network based on the underlay mode. Secondary users (SUs) in the cognitive network are multiplexed by a new Power Domain Sparse Code Multiple Access (PD-SCMA) scheme, and the physical resources of the cognitive base station are virtualized into two types of slices: enhanced mobile broadband (eMBB) slice and ultra-reliable low latency communication (URLLC) slice. We design a Double Deep Q Network (DDQN) to output the optimal codebook assignment scheme and simultaneously use a Deep Deterministic Policy Gradient (DDPG) network to output the optimal power allocation scheme. The objective is to jointly optimize the spectral efficiency of the system and the Quality of Service (QoS) of SUs. Simulation results show that the proposed algorithm outperforms the CNDDQN algorithm and modified JEERA algorithm in terms of spectral efficiency and QoS satisfaction. Additionally, compared with the Power Domain Non-orthogonal Multiple Access (PD-NOMA) slices and the Sparse Code Multiple Access (SCMA) slices, the PD-SCMA slices can dramatically enhance spectral efficiency and increase the number of accessible users.
Keywords: cognitive radio; deep reinforcement learning; network slicing; power-domain non-orthogonal multiple access; resource allocation
Numerical investigation of geostress influence on the grouting reinforcement effectiveness of tunnel surrounding rock mass in fault fracture zones
11
Authors: Xiangyu Xu, Zhijun Wu, Lei Weng, Zhaofei Chu, Quansheng Liu, Yuan Zhou. Journal of Rock Mechanics and Geotechnical Engineering (SCIE, CSCD), 2024, No. 1, pp. 81-101 (21 pages).
Grouting is a widely used approach to reinforce broken surrounding rock mass during the construction of underground tunnels in fault fracture zones, and its reinforcement effectiveness is highly affected by geostress. In this study, a numerical manifold method (NMM) based simulator has been developed to examine the impact of geostress conditions on grouting reinforcement during tunnel excavation. To develop this simulator, a detection technique for identifying slurry migration channels and an improved fluid-solid coupling (F-S) framework, which considers the influence of fracture properties and geostress states, are developed and incorporated into a zero-thickness cohesive element (ZE) based NMM (Co-NMM) for simulating tunnel excavation. Additionally, to simulate coagulation of injected slurry, a bonding repair algorithm is further proposed based on the ZE model. To verify the accuracy of the proposed simulator, a series of simulations about slurry migration in single fractures and fracture networks are numerically reproduced, and the results align well with analytical and laboratory test results. Furthermore, these numerical results show that neglecting the influence of geostress condition can lead to a serious over-estimation of slurry migration range and reinforcement effectiveness. After validations, a series of simulations about tunnel grouting reinforcement and tunnel excavation in fault fracture zones with varying fracture densities under different geostress conditions are conducted. Based on these simulations, the influence of geostress conditions and the optimization of grouting schemes are discussed.
Keywords: Numerical manifold method (NMM); Grouting reinforcement; Geostress condition; Fault fracture zone; Tunnel excavation
Preparation and Reinforcement Adaptability of Jute Fiber Reinforced Magnesium Phosphate Cement Based Composite Materials
12
Authors: LIU Xinzhou (刘芯州), GUO Yuanchen (郭远臣), WANG Rui, XIANG Kai, WANG Xue, YE Qing. Journal of Wuhan University of Technology (Materials Science) (SCIE, EI, CAS, CSCD), 2024, No. 4, pp. 999-1009 (11 pages).
To improve the brittleness characteristics of magnesium phosphate cement-based materials (MPC) and to promote their application in the field of structural reinforcement and repair, this study aimed to increase the toughness of MPC by adding jute fiber, explore the effects of different amounts of jute fiber on the working and mechanical properties of MPC, and prepare jute fiber reinforced magnesium phosphate cement-based materials (JFRMPC) to reinforce damaged beams. The improvement effect of beam performance before and after reinforcement was compared, and the strengthening and toughening mechanisms of jute fiber on MPC were explored through microscopic analysis. The experimental results show that, as the content of jute fiber (JF) increases, the fluidity and setting time of MPC decrease continuously. When the content of jute fiber is 0.8%, the compressive strength, flexural strength, and bonding strength of MPC at 28 days reach their maximum values, which are increased by 18.0%, 20.5%, and 22.6% compared to those of M0, respectively. The beam strengthened with JFRMPC can withstand greater deformation, with a deflection of 2.3 times that of the unreinforced beam at failure. The strain of the steel bar is greatly reduced, and the initial crack and failure loads of the reinforced beam are increased by 192.1% and 16.1%, respectively, compared to those of the unreinforced beam. The JF added to the MPC matrix dissipates energy through tensile fracture and debonding pull-out, slowing down stress concentration and inhibiting the free development of cracks in the matrix, enabling JFRMPC to exhibit higher strength and better toughness. The JF does not cause the hydration of MPC to generate new compounds but reduces the amount of hydration products generated.
Keywords: magnesium phosphate cement; jute fiber; reinforcement of damaged beam; flexural behavior
Distributed Graph Database Load Balancing Method Based on Deep Reinforcement Learning
13
Authors: Shuming Sha, Naiwang Guo, Wang Luo, Yong Zhang. Computers, Materials & Continua (SCIE, EI), 2024, No. 6, pp. 5105-5124 (20 pages).
This paper focuses on the scheduling problem of workflow tasks that exhibit interdependencies. Unlike independent batch tasks, workflows typically consist of multiple subtasks with intrinsic correlations and dependencies. It necessitates the distribution of various computational tasks to appropriate computing node resources in accordance with task dependencies to ensure the smooth completion of the entire workflow. Workflow scheduling must consider an array of factors, including task dependencies, availability of computational resources, and the schedulability of tasks. Therefore, this paper delves into the distributed graph database workflow task scheduling problem and proposes a workflow scheduling methodology based on deep reinforcement learning (DRL). The method optimizes the maximum completion time (makespan) and response time of workflow tasks, aiming to enhance the responsiveness of workflow tasks while ensuring the minimization of the makespan. The experimental results indicate that the Q-learning Deep Reinforcement Learning (Q-DRL) algorithm markedly diminishes the makespan and refines the average response time within distributed graph database environments. In quantifying makespan, Q-DRL achieves mean reductions of 12.4% and 11.9% over established First-fit and Random scheduling strategies, respectively. Additionally, Q-DRL surpasses the performance of both DRL-Cloud and Improved Deep Q-learning Network (IDQN) algorithms, with improvements standing at 4.4% and 2.6%, respectively. With reference to average response time, the Q-DRL approach exhibits a significantly enhanced performance in the scheduling of workflow tasks, decreasing the average by 2.27% and 4.71% when compared to IDQN and DRL-Cloud, respectively. The Q-DRL algorithm also demonstrates a notable increase in the efficiency of system resource utilization, reducing the average idle rate by 5.02% and 9.30% in comparison to IDQN and DRL-Cloud, respectively. These findings support the assertion that Q-DRL not only upholds a lower average idle rate but also effectively curtails the average response time, thereby substantially improving processing efficiency and optimizing resource utilization within distributed graph database systems.
Keywords: reinforcement learning; workflow; task scheduling; load balancing
Constrained Multi-Objective Optimization With Deep Reinforcement Learning Assisted Operator Selection
14
Authors: Fei Ming, Wenyin Gong, Ling Wang, Yaochu Jin. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 4, pp. 919-931 (13 pages).
Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention. Various constrained multi-objective optimization evolutionary algorithms (CMOEAs) have been developed with the use of different algorithmic strategies, evolutionary operators, and constraint-handling techniques. The performance of CMOEAs may be heavily dependent on the operators used; however, it is usually difficult to select suitable operators for the problem at hand. Hence, improving operator selection is promising and necessary for CMOEAs. This work proposes an online operator selection framework assisted by Deep Reinforcement Learning. The dynamics of the population, including convergence, diversity, and feasibility, are regarded as the state; the candidate operators are considered as actions; and the improvement of the population state is treated as the reward. By using a Q-network to learn a policy to estimate the Q-values of all actions, the proposed approach can adaptively select an operator that maximizes the improvement of the population according to the current state and thereby improve the algorithmic performance. The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems. The experimental results reveal that the proposed Deep Reinforcement Learning-assisted operator selection significantly improves the performance of these CMOEAs and the resulting algorithm obtains better versatility compared to nine state-of-the-art CMOEAs.
Keywords: Constrained multi-objective optimization; deep Q-learning; deep reinforcement learning (DRL); evolutionary algorithms; evolutionary operator selection
Efficient Penetration Testing Path Planning Based on Reinforcement Learning with Episodic Memory
15
Authors: Ziqiao Zhou, Tianyang Zhou, Jinghao Xu, Junhu Zhu. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 9, pp. 2613-2634 (22 pages).
Intelligent penetration testing is of great significance for the improvement of the security of information systems, and the critical issue is the planning of penetration test paths. In view of the difficulty for attackers to obtain complete network information in realistic network scenarios, Reinforcement Learning (RL) is a promising solution to discover the optimal penetration path under incomplete information about the target network. Existing RL-based methods are challenged by the sizeable discrete action space, which leads to difficulties in the convergence. Moreover, most methods still rely on experts’ knowledge. To address these issues, this paper proposes a penetration path planning method based on reinforcement learning with episodic memory. First, the penetration testing problem is formally described in terms of reinforcement learning. To speed up the training process without specific prior knowledge, the proposed algorithm introduces episodic memory to store experienced advantageous strategies for the first time. Furthermore, the method offers an exploration strategy based on episodic memory to guide the agents in learning. The design makes full use of historical experience to achieve the purpose of reducing blind exploration and improving planning efficiency. Ultimately, comparison experiments are carried out with the existing RL-based methods. The results reveal that the proposed method has better convergence performance. The running time is reduced by more than 20%.
Keywords: Intelligent penetration testing; penetration testing path planning; reinforcement learning; episodic memory; exploration strategy
Achieving dynamic privacy measurement and protection based on reinforcement learning for mobile edge crowdsensing of IoT
16
作者 Renwan Bi Mingfeng Zhao +2 位作者 Zuobin Ying Youliang Tian Jinbo Xiong 《Digital Communications and Networks》 SCIE CSCD 2024年第2期380-388,共9页
With the maturity and development of the 5G field, Mobile Edge CrowdSensing (MECS), as an intelligent data collection paradigm, provides a broad prospect for various applications in IoT. However, sensing users as data uploaders lack a balance between data benefits and privacy threats, leading to conservative data uploads and low revenue or excessive uploads and privacy breaches. To solve this problem, a Dynamic Privacy Measurement and Protection (DPMP) framework is proposed based on differential privacy and reinforcement learning. Firstly, a DPM model is designed to quantify the amount of data privacy, and a calculation method for the personalized privacy threshold of different users is also designed. Furthermore, a Dynamic Private sensing data Selection (DPS) algorithm is proposed to help sensing users maximize data benefits within their privacy thresholds. Finally, theoretical analysis and ample experimental results show that the DPMP framework is effective and efficient in achieving a balance between data benefits and sensing user privacy protection. In particular, the proposed DPMP framework has 63% and 23% higher training efficiency and data benefits, respectively, compared to the Monte Carlo algorithm.
Keywords: Mobile edge crowdsensing; Dynamic privacy measurement; Personalized privacy threshold; Privacy protection; reinforcement learning
Numerical Simulation of Surrounding Rock Deformation and Grouting Reinforcement of Cross-Fault Tunnel under Different Excavation Methods
17
Authors: Duan Zhu, Zhende Zhu, Cong Zhang, Lun Dai, Baotian Wang. Source: Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 3, pp. 2445-2470 (26 pages)
Tunnel construction is susceptible to accidents such as loosening, deformation, collapse, and water inrush, especially under complex geological conditions like dense fault areas. These accidents can cause instability and damage to the tunnel. As a result, it is essential to conduct research on tunnel construction and grouting reinforcement technology in fault fracture zones to address these issues and ensure the safety of tunnel excavation projects. This study utilized the Xianglushan cross-fault tunnel to conduct a comprehensive analysis on the construction, support, and reinforcement of a tunnel crossing a fault fracture zone using the three-dimensional finite element numerical method. The study yielded the following research conclusions: The excavation conditions of the cross-fault tunnel array were analyzed to determine the optimal construction method for excavation while controlling deformation and stress in the surrounding rock. The middle partition method (CD method) was found to be the most suitable. Additionally, the effects of advanced reinforcement grouting on the cross-fault fracture zone tunnel were studied, and the optimal combination of grouting reinforcement range (140°) and grouting thickness (1 m) was determined. The stress and deformation data obtained from on-site monitoring of the surrounding rock was slightly lower than the numerical simulation results. However, the change trend of both sets of data was found to be consistent. These research findings provide technical analysis and data support for the construction and design of cross-fault tunnels.
Keywords: Cross-fault tunnel; finite element analysis; excavation methods; surrounding rock deformation; grouting reinforcement
Reinforcement learning based adaptive control for uncertain mechanical systems with asymptotic tracking
18
Authors: Xiang-long Liang, Zhi-kai Yao, Yao-wen Ge, Jian-yong Yao. Source: Defence Technology(防务技术) (SCIE, EI, CAS, CSCD), 2024, No. 4, pp. 19-28 (10 pages)
This paper mainly focuses on the development of a learning-based controller for a class of uncertain mechanical systems modeled by the Euler-Lagrange formulation. The considered system can depict the behavior of a large class of engineering systems, such as vehicular systems, robot manipulators and satellites. All these systems are often characterized by highly nonlinear characteristics, heavy modeling uncertainties and unknown perturbations; therefore, accurate-model-based nonlinear control approaches become unavailable. Motivated by this challenge, a reinforcement learning (RL) adaptive control methodology based on the actor-critic framework is investigated to compensate for the uncertain mechanical dynamics. The approximation inaccuracies caused by RL and the exogenous unknown disturbances are circumvented via a continuous robust integral of the sign of the error (RISE) control approach. Different from a classical RISE control law, a tanh(·) function is utilized instead of a sign(·) function to acquire a smoother control signal. The developed controller requires very little prior knowledge of the dynamic model, is robust to unknown dynamics and exogenous disturbances, and can achieve asymptotic output tracking. Finally, co-simulations through ADAMS and MATLAB/Simulink on a three degrees-of-freedom (3-DOF) manipulator and experiments on a real-time electromechanical servo system are performed to verify the performance of the proposed approach.
Keywords: Adaptive control; reinforcement learning; Uncertain mechanical systems; Asymptotic tracking
Decoding topological XYZ^(2) codes with reinforcement learning based on attention mechanisms
19
Authors: 陈庆辉, 姬宇欣, 王柯涵, 马鸿洋, 纪乃华. Source: Chinese Physics B (SCIE, EI, CAS, CSCD), 2024, No. 6, pp. 262-270 (9 pages)
Quantum error correction, a technique that relies on the principle of redundancy to encode logical information into additional qubits to better protect the system from noise, is necessary to design a viable quantum computer. The new topological stabilizer code considered here, the XYZ^(2) code defined on the cellular lattice, is implemented on a hexagonal lattice of qubits and encodes the logical qubits with the help of stabilizer measurements of weight six and weight two. However, topological stabilizer codes in cellular lattice quantum systems suffer from the detrimental effects of noise due to interaction with the environment. Several decoding approaches have been proposed to address this problem. Here, we propose the use of a state-attention based reinforcement learning decoder to decode XYZ^(2) codes, which enables the decoder to focus more accurately on the information related to the current decoding position. The error correction accuracy of our reinforcement learning decoder model under the optimisation conditions can reach 83.27% under the depolarizing noise model, and we have measured thresholds of 0.18856 and 0.19043 for XYZ^(2) codes at code spacing of 3–7 and 7–11, respectively. Our study provides directions and ideas for applications of decoding schemes combining reinforcement learning attention mechanisms to other topological quantum error-correcting codes.
Keywords: quantum error correction; topological quantum stabilizer code; reinforcement learning; attention mechanism
QoS Routing Optimization Based on Deep Reinforcement Learning in SDN
20
Authors: Yu Song, Xusheng Qian, Nan Zhang, Wei Wang, Ao Xiong. Source: Computers, Materials & Continua (SCIE, EI), 2024, No. 5, pp. 3007-3021 (15 pages)
To enhance the efficiency and expediency of issuing e-licenses within the power sector, we must confront the challenge of managing the surging demand for data traffic. Within this realm, the network imposes stringent Quality of Service (QoS) requirements, revealing the inadequacies of traditional routing allocation mechanisms in accommodating such extensive data flows. In response to the imperative of handling a substantial influx of data requests promptly and alleviating the constraints of existing technologies and network congestion, we present an architecture for QoS routing optimization within Software Defined Network (SDN), leveraging deep reinforcement learning. This innovative approach entails the separation of SDN control and transmission functionalities, centralizing control over data forwarding while integrating deep reinforcement learning for informed routing decisions. By factoring in considerations such as delay, bandwidth, jitter rate, and packet loss rate, we design a reward function to guide the Deep Deterministic Policy Gradient (DDPG) algorithm in learning the optimal routing strategy to furnish superior QoS provision. In our empirical investigations, we juxtapose the performance of Deep Reinforcement Learning (DRL) against that of Shortest Path (SP) algorithms in terms of data packet transmission delay. The experimental simulation results show that our proposed algorithm has significant efficacy in reducing network delay and improving the overall transmission efficiency, which is superior to the traditional methods.
Keywords: Deep reinforcement learning; SDN; route optimization; QoS