Most research on target-encircling control focuses on moving along a circular orbit in an ideal environment free from external disturbances. However, elliptical encirclement with a time-varying observation radius permits a more flexible and efficient enclosing solution, while the non-orthogonality between the axial and tangential speed components, non-negligible environmental perturbations, and strict assignment requirements make elliptical encircling control more challenging, and the relevant investigations remain open. Following this line, an appointed-time elliptical encircling control law capable of reinforcing circumnavigation performance is developed to enable Unmanned Aerial Vehicles (UAVs) to move along a specified elliptical path within a predetermined reaching time. The remarkable merit of the designed strategy is that the relative-distance control error is guaranteed to evolve within specified regions with designer-specified convergence behavior. Meanwhile, wind perturbations are counteracted online by an unknown system dynamics estimator (USDE) with only one tuning parameter and high computational efficiency. A Lyapunov analysis demonstrates that all involved error variables are ultimately bounded, and simulations are implemented to confirm the usability of the suggested control algorithm.
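The elliptical reference orbit underlying this abstract can be sketched numerically. This is a minimal illustration of the geometry only, not the paper's controller; the semi-axes `a`, `b` and angular rate `omega` are assumed values for the example.

```python
import math

def ellipse_reference(t, center=(0.0, 0.0), a=5.0, b=3.0, omega=0.2):
    """Reference point, velocity, and observation radius on an elliptical orbit.

    Because a != b, the observation radius r(t) is time-varying, and the
    radial and tangential directions are generally not orthogonal, which
    is part of what makes elliptical encirclement harder than circular.
    """
    theta = omega * t
    x = center[0] + a * math.cos(theta)
    y = center[1] + b * math.sin(theta)
    vx = -a * omega * math.sin(theta)   # tangential velocity components
    vy = b * omega * math.cos(theta)
    r = math.hypot(x - center[0], y - center[1])  # time-varying radius
    return (x, y), (vx, vy), r
```

At the semi-major axis the radius equals `a`, at the semi-minor axis it equals `b`, and in between the position and velocity vectors are not perpendicular.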
The new energy vehicle plays a crucial role in green transportation, and the energy management strategy of hybrid power systems is essential for ensuring energy-efficient driving. This paper presents a state-of-the-art survey and review of reinforcement learning-based energy management strategies for hybrid power systems. Additionally, it envisions the outlook for autonomous intelligent hybrid electric vehicles, with reinforcement learning as the foundational technology. First, to provide a macro view of historical development, a brief history of deep learning, reinforcement learning, and deep reinforcement learning is presented in the form of a timeline. Then, a comprehensive survey and review are conducted by collecting papers from mainstream academic databases. Enumerating most of the contributions along three main directions (algorithm innovation, powertrain innovation, and environment innovation) provides an objective review of the research status. Finally, to advance the application of reinforcement learning in autonomous intelligent hybrid electric vehicles, future research plans positioned as "Alpha HEV" are envisioned, integrating Autopilot and energy-saving control.
In the face of the increasingly severe botnet problem on the Internet, how to effectively detect botnet traffic in real time has become a critical problem. Although the existing deep Q-network (DQN) algorithm in deep reinforcement learning can solve the problem of real-time updating, its predicted values are consistently higher than the actual ones. In botnet traffic detection, the model performs well on the training set with high prediction accuracy, but its accuracy declines on the test set, and it cannot adjust its prediction strategy in time based on new data samples; on new datasets, its accuracy declines significantly. Therefore, this paper proposes a botnet traffic detection system based on a double-layer DQN (DDQN). Two Q-values are designed to adjust the model in policy and action, respectively, to achieve real-time model updates and improve the universality and robustness of the model across different datasets. Experiments show that, compared with the DQN model, the DDQN's Q-values are not excessively high, and the detection model improves the accuracy and precision of botnet traffic detection. Moreover, on botnet datasets other than the test set, the accuracy and precision of the DDQN model remain higher than those of DQN.
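The Q-value inflation attributed to vanilla DQN above, and the benefit of decoupling the two Q-values, can be illustrated with the standard Double-DQN target. This is a sketch of the well-known decoupling of action selection from action evaluation, not the paper's exact double-layer design:

```python
import numpy as np

def dqn_target(q_target_next, reward, gamma=0.99):
    # Vanilla DQN: the same network both selects and evaluates the
    # next action, which is the source of systematic overestimation.
    return reward + gamma * float(np.max(q_target_next))

def double_dqn_target(q_online_next, q_target_next, reward, gamma=0.99):
    # Double DQN: the online network selects the action, the target
    # network evaluates it, keeping Q-values from drifting too high.
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * float(q_target_next[a_star])
```

Whenever the two networks disagree about the best next action, the double estimate is no larger than the vanilla one, which matches the observation that the DDQN's Q-values "are not too high."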
In the rapidly evolving landscape of today's digital economy, Financial Technology (Fintech) emerges as a transformative force, propelled by the dynamic synergy between Artificial Intelligence (AI) and Algorithmic Trading. Our in-depth investigation delves into the intricacies of merging Multi-Agent Reinforcement Learning (MARL) and Explainable AI (XAI) within Fintech, aiming to refine Algorithmic Trading strategies. Through meticulous examination, we uncover the nuanced interactions of AI-driven agents as they collaborate and compete within the financial realm, employing sophisticated deep learning techniques to enhance the clarity and adaptability of trading decisions. These AI-infused Fintech platforms harness collective intelligence to unearth trends, mitigate risks, and provide tailored financial guidance, fostering benefits for individuals and enterprises navigating the digital landscape. Our research holds the potential to revolutionize finance, opening doors to fresh avenues for investment and asset management in the digital age. Additionally, our statistical evaluation yields encouraging results, with metrics such as Accuracy = 0.85, Precision = 0.88, and F1 Score = 0.86, reaffirming the efficacy of our approach within Fintech and emphasizing its reliability and innovative prowess.
This paper focuses on the development of a learning-based controller for a class of uncertain mechanical systems modeled by the Euler-Lagrange formulation. The considered system can depict the behavior of a large class of engineering systems, such as vehicular systems, robot manipulators, and satellites. All these systems are often characterized by highly nonlinear dynamics, heavy modeling uncertainties, and unknown perturbations, so accurate-model-based nonlinear control approaches become inapplicable. Motivated by this challenge, a reinforcement learning (RL) adaptive control methodology based on the actor-critic framework is investigated to compensate for the uncertain mechanical dynamics. The approximation inaccuracies caused by RL and the exogenous unknown disturbances are circumvented via a continuous robust integral of the sign of the error (RISE) control approach. Different from a classical RISE control law, a tanh(·) function is utilized instead of a sign(·) function to acquire a smoother control signal. The developed controller requires very little prior knowledge of the dynamic model, is robust to unknown dynamics and exogenous disturbances, and can achieve asymptotic output tracking. Finally, co-simulations through ADAMS and MATLAB/Simulink on a three-degrees-of-freedom (3-DOF) manipulator and experiments on a real-time electromechanical servo system are performed to verify the performance of the proposed approach.
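The smoothing substitution described above, tanh(·) in place of sign(·), can be sketched as follows. The gains `beta` and `epsilon` are illustrative assumptions, not values from the paper:

```python
import math

def robust_term_sign(e, beta=2.0):
    # Classical RISE-style robust term: discontinuous at e = 0,
    # which tends to produce chattering in the control signal.
    return beta * math.copysign(1.0, e) if e != 0 else 0.0

def robust_term_tanh(e, beta=2.0, epsilon=0.1):
    # Smooth surrogate: tanh(e/epsilon) approaches sign(e) as
    # epsilon -> 0 but is infinitely differentiable everywhere.
    return beta * math.tanh(e / epsilon)
```

Away from the origin the two terms nearly coincide, while near zero the tanh version varies smoothly instead of jumping between -beta and +beta.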
The Autonomous Underwater Glider (AUG) is a prevailing type of underwater intelligent internet vehicle and occupies a dominant position in industrial applications, in which path planning is an essential problem. Due to the complexity and variability of the ocean, accurate environment modeling and flexible path planning algorithms are pivotal challenges. Traditional models mainly utilize mathematical functions, which are neither complete nor reliable, and most existing path planning algorithms depend on the environment and lack flexibility. To overcome these challenges, we propose a path planning system for underwater intelligent internet vehicles. It applies digital twins and sensor data to map the real ocean environment to a virtual digital space, which provides a comprehensive and reliable environment for path simulation. We design a value-based reinforcement learning path planning algorithm and explore the optimal network structure parameters. The path simulation is controlled by a closed-loop model integrated into the terminal vehicle through edge computing. The integration of state input enriches the learning of the neural networks and helps to improve generalization and flexibility. The task-related reward function promotes rapid convergence of the training. The experimental results prove that our reinforcement learning-based path planning algorithm has great flexibility and can effectively adapt to a variety of different ocean conditions.
In this paper, we propose a two-way Deep Reinforcement Learning (DRL)-based resource allocation algorithm, which solves the problem of resource allocation in the cognitive downlink network under the underlay mode. Secondary users (SUs) in the cognitive network are multiplexed by a new Power Domain Sparse Code Multiple Access (PD-SCMA) scheme, and the physical resources of the cognitive base station are virtualized into two types of slices: an enhanced mobile broadband (eMBB) slice and an ultra-reliable low latency communication (URLLC) slice. We design a Double Deep Q Network (DDQN) to output the optimal codebook assignment scheme and simultaneously use a Deep Deterministic Policy Gradient (DDPG) network to output the optimal power allocation scheme. The objective is to jointly optimize the spectral efficiency of the system and the Quality of Service (QoS) of the SUs. Simulation results show that the proposed algorithm outperforms the CNDDQN algorithm and the modified JEERA algorithm in terms of spectral efficiency and QoS satisfaction. Additionally, compared with Power Domain Non-orthogonal Multiple Access (PD-NOMA) slices and Sparse Code Multiple Access (SCMA) slices, the PD-SCMA slices can dramatically enhance spectral efficiency and increase the number of accessible users.
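The split described above, a value network for the discrete codebook choice and a deterministic actor for the continuous power level, can be sketched at the action-selection step. All names here are illustrative stand-ins (a linear actor in place of the DDPG network), not the paper's actual architecture:

```python
import numpy as np

def select_hybrid_action(q_values, actor_weights, state, p_max=1.0):
    """Hybrid discrete + continuous action selection.

    q_values:      DDQN-style outputs, one Q-value per candidate codebook.
    actor_weights: stand-in linear actor mapping state -> raw power signal.
    """
    codebook_idx = int(np.argmax(q_values))        # discrete choice (value network)
    raw_power = float(state @ actor_weights)       # continuous choice (actor)
    power = p_max / (1.0 + np.exp(-raw_power))     # squash into (0, p_max)
    return codebook_idx, power
```

In training, the two networks would be updated by their own losses (TD error for the DDQN head, deterministic policy gradient for the actor) against a shared reward built from spectral efficiency and QoS.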
Currently, cybersecurity threats such as data breaches and phishing have been on the rise due to the many different attack strategies of cyber attackers, significantly increasing risks to individuals and organizations. Traditional security technologies such as intrusion detection have been developed to respond to these cyber threats. Recently, advanced integrated cybersecurity that incorporates Artificial Intelligence has been the focus. In this paper, we propose a response strategy using a reinforcement-learning-based cyber-attack-defense simulation tool to address continuously evolving cyber threats. Additionally, we have implemented an effective reinforcement-learning-based cyber-attack scenario using Cyber Battle Simulation, which is a cyber-attack-defense simulator. This scenario involves important security components such as node value, cost, firewalls, and services. Furthermore, we applied a new vulnerability assessment method based on the Common Vulnerability Scoring System. This approach can design an optimal attack strategy by considering the importance of attack goals, which helps in developing more effective response strategies. These attack strategies are evaluated by comparing their performance using a variety of Reinforcement Learning methods. The experimental results show that RL models demonstrate improved learning performance with the proposed attack strategy compared to the original strategies. In particular, the success rate of the Advantage Actor-Critic-based attack strategy improved by 5.04 percentage points, reaching 10.17%, which represents an impressive 98.24% increase over the original scenario. Consequently, the proposed method can enhance security and risk management capabilities in cyber environments, improving the efficiency of security management and significantly contributing to the development of security systems.
This article studies the adaptive optimal output regulation problem for a class of interconnected singularly perturbed systems (SPSs) with unknown dynamics based on reinforcement learning (RL). Taking into account the slow and fast characteristics among system states, the interconnected SPS is decomposed into slow time-scale dynamics and fast time-scale dynamics through singular perturbation theory. For the fast time-scale dynamics with interconnections, we devise a decentralized optimal control strategy by selecting appropriate weight matrices in the cost function. For the slow time-scale dynamics with unknown system parameters, an off-policy RL algorithm with a convergence guarantee is given to learn the optimal control strategy from measurement data. By combining the slow and fast controllers, we establish the composite decentralized adaptive optimal output regulator, and rigorously analyze the stability and optimality of the closed-loop system. The proposed decomposition design not only bypasses the numerical stiffness but also alleviates the high dimensionality. The efficacy of the proposed methodology is validated by a load-frequency control application in a two-area power system.
Mango fruit is one of the main fruit commodities that contributes to Taiwan's income. The implementation of technology is an alternative for increasing the quality and productivity of mango plantation products. In this study, a Wireless Sensor Networks (WSNs)-based intelligent mango plantation monitoring system is developed that implements deep reinforcement learning (DRL) technology to carry out prediction tasks over three classes, "optimal," "sub-optimal," or "not-optimal" conditions, based on three parameters: humidity, temperature, and soil moisture. The key idea is to provide a precise decision-making mechanism in the real-time monitoring system. A value-function-based DRL model called deep Q-network (DQN) is employed, which contributes to optimizing the future reward and delivering precise decision recommendations to the agent and system behavior. The WSN experiments indicate that the system's accuracy in capturing the real-time environment parameters is 98.39%. Meanwhile, the comparative accuracies of the proposed DQN, individual Q-learning, uniform coverage (UC), and Naïve Bayes classifier (NBC) models are 97.60%, 95.30%, 96.50%, and 92.30%, respectively. From these comparative experiments, it can be seen that the proposed DQN has the highest accuracy. Testing with 22 test scenarios for "optimal," "sub-optimal," and "not-optimal" conditions was carried out to ensure the system runs well on real-world data, where the accuracy reaches 95.45%. The cost analysis shows that the system is low-cost compared to a conventional system.
In this paper, a new optimal adaptive backstepping control approach for nonlinear systems under deception attacks is presented via reinforcement learning. The existence of nonlinear terms in the studied system makes it very difficult to design the optimal controller using traditional methods. To achieve optimal control, an RL algorithm based on the critic-actor architecture is considered for the nonlinear system. Due to the significant security risks of network transmission, the system is vulnerable to deception attacks, which can make all the system states unavailable. By using the attacked states to design the coordinate transformation, the harm brought by unknown deception attacks is overcome. The presented control strategy ensures that all signals in the closed-loop system are semi-globally ultimately bounded. Finally, a simulation experiment is shown to prove the effectiveness of the strategy.
A suitable bearing capacity of the foundation is critical for the safety of civil structures. Sometimes foundation reinforcement is necessary, and an effective and environmentally friendly method would be the preferred choice. In this study, the potential application of enzyme-induced carbonate precipitation (EICP) was investigated for reinforcing a 0.6 m bedding layer on top of clay to improve the bearing capacity of the foundation underneath an underground cable duct. Laboratory experiments were conducted to determine the optimal operational parameters for the extraction of crude urease liquid and the optimal grain size range of sea sands to be used to construct the bedding layer. Field tests were planned based on an orthogonal experimental design to study the factors that would significantly affect the bio-cementation effect on site. The dynamic deformation modulus, calcium carbonate content, and long-term ground stress variations were used to evaluate the bio-cementation effect and the long-term performance of the EICP-treated bedding layer. The laboratory test results showed that the optimal duration for the extraction of crude urease liquid is 1 h and the optimal usage of soybean husk powder in the urease extraction solution is 100 g/L. The calcium carbonate production rate decreases significantly when the concentration of the cementation solution exceeds 0.5 mol/L. The results of the site trial showed that the number of EICP treatments has the most significant impact on the effectiveness of EICP treatment, and the highest dynamic deformation modulus (Evd) of the EICP-treated bedding layer reached 50.55 MPa. The area with a better bio-cementation effect was found to take higher ground stress, which validates that the EICP treatment could improve the bearing capacity of the foundation by reinforcing the bedding layer. The field trial described and the analysis introduced in this paper can provide a practical basis for applying EICP technology to the reinforcement of bedding layers in poor ground conditions.
Network-assisted full duplex (NAFD) cell-free (CF) massive MIMO has drawn increasing attention in 6G evolvement. In this paper, we build an NAFD CF system in which the users and access points (APs) can flexibly select their duplex modes to increase the link spectral efficiency. We then formulate a joint flexible duplexing and power allocation problem to balance user fairness and system spectral efficiency, and further transform the problem into a probability optimization to accommodate short-term communications. In contrast with instant performance optimization, the probability optimization is a sequential decision-making problem, so we reformulate it as a Markov Decision Process (MDP). We utilize a deep reinforcement learning (DRL) algorithm to search for the solution in a large state-action space, and propose an asynchronous advantage actor-critic (A3C)-based scheme to reduce the chance of converging to a suboptimal policy. Simulation results demonstrate that the A3C-based scheme is superior to the baseline schemes in terms of complexity, accumulated log spectral efficiency, and stability.
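The A3C update mentioned above is driven by n-step advantage estimates computed from each worker's rollout. A minimal sketch of that computation, with `gamma` as an assumed discount factor rather than a value from the paper:

```python
import numpy as np

def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """n-step advantage estimates as used in A3C-style updates.

    rewards:         r_0 .. r_{n-1} from one worker's rollout
    values:          critic estimates V(s_0) .. V(s_{n-1})
    bootstrap_value: V(s_n) for the state after the rollout
    """
    returns = []
    R = bootstrap_value
    for r in reversed(rewards):          # accumulate discounted returns backwards
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return np.array(returns) - np.array(values)
```

Each asynchronous worker would feed these advantages into its policy-gradient step, which is what gives A3C its reduced tendency to settle on a single suboptimal policy.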
In the era of an energy revolution, grid decentralization has emerged as a viable solution to meet the increasing global energy demand by incorporating renewables at the distributed level. Microgrids are considered a driving component for accelerating grid decentralization. To optimally utilize the available resources and address potential challenges, there is a need for an intelligent and reliable energy management system (EMS) for the microgrid. The artificial intelligence field has the potential to address the problems in EMS and can provide resilient, efficient, reliable, and scalable solutions. This paper presents an overview of existing conventional and AI-based techniques for energy management systems in microgrids. We analyze EMS methods for centralized, decentralized, and distributed microgrids separately. Then, we summarize machine learning techniques such as ANNs, federated learning, LSTMs, RNNs, and reinforcement learning for EMS objectives such as economic dispatch, optimal power flow, and scheduling. With the incorporation of AI, microgrids can achieve greater performance efficiency and more reliability for managing a large number of energy resources. However, challenges such as data privacy, security, scalability, and explainability need to be addressed. To conclude, the authors state possible future research directions for exploring the potential of AI-based EMSs in real-world applications.
In this paper, we investigate the downlink orthogonal frequency division multiplexing (OFDM) transmission system assisted by reconfigurable intelligent surfaces (RISs). Considering multiple antennas at the base station (BS) and multiple single-antenna users, the joint optimization of the precoder at the BS and the phase shift design at the RIS is studied to minimize the transmit power under a certain quality-of-service constraint. A deep reinforcement learning (DRL)-based algorithm is proposed, in which maximum ratio transmission (MRT) precoding is utilized at the BS and the twin delayed deep deterministic policy gradient (TD3) method is utilized for RIS phase shift optimization. Numerical results demonstrate that the proposed DRL-based algorithm can achieve a transmit power almost the same as the lower bound achieved by the manifold optimization (MO) algorithm while incurring much lower computational delay.
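TD3, used above for the phase-shift optimization, differs from plain DDPG mainly in its clipped double-Q target and target-policy smoothing. A minimal sketch of those two ingredients (the noise and clip parameters are TD3's common defaults, not values from this paper):

```python
import numpy as np

def td3_target(reward, q1_next, q2_next, gamma=0.99):
    # Clipped double-Q: take the minimum of two target critics to
    # curb the overestimation that plain DDPG suffers from.
    return reward + gamma * min(q1_next, q2_next)

def smoothed_target_action(actor_action, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0):
    # Target-policy smoothing: add clipped Gaussian noise to the
    # target action so the critic cannot exploit narrow Q peaks.
    noise = np.clip(np.random.normal(0.0, noise_std), -noise_clip, noise_clip)
    return float(np.clip(actor_action + noise, low, high))
```

In an RIS setting the action vector would hold the phase shifts, and the smoothed, clipped targets keep the twin critics' value estimates conservative during training.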
In this paper, a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems. Motivated by the classical on-policy and off-policy algorithms of reinforcement learning, the online feedback relearning (FR) algorithm is developed, where the collected data include the influence of disturbance signals. The FR algorithm has better adaptability to environmental changes (such as control channel disturbances) than the off-policy algorithm, and has higher computational efficiency and better convergence performance than the on-policy algorithm. Data processing based on experience replay is used to improve data efficiency and convergence stability. Simulation experiments are presented to illustrate the convergence stability, optimality, and performance of the FR algorithm by comparison.
Reconfigurable intelligent surfaces (RISs) for wireless networks have drawn much attention in both the academic and industry communities. An RIS can dynamically control the phases of its reflection elements to send the signal in a desired direction, thus providing supplementary links for wireless networks. Most prior works on RIS-aided wireless communication systems consider continuous phase shifts, but the phase shifts of an RIS are discrete in practical hardware. We therefore focus on actual discrete phase shifts at the RIS in this paper. Using advanced deep reinforcement learning (DRL), we jointly optimize the transmit beamforming matrix from the discrete Fourier transform (DFT) codebook at the base station (BS) and the discrete phase shifts at the RIS to maximize the received signal-to-interference-plus-noise ratio (SINR). Unlike traditional schemes, which usually use alternating optimization methods to solve for the transmit beamforming and phase shifts, the DRL algorithm proposed in this paper can jointly design the transmit beamforming and phase shifts as the output of the DRL neural network. Numerical results indicate that the proposed DRL approach can handle the complicated optimization problem with low computational complexity.
In this paper, a reinforcement learning-based multi-battery energy storage system (MBESS) scheduling policy is proposed to minimize the consumers' electricity cost. The MBESS scheduling problem is modeled as a Markov decision process (MDP) with unknown transition probability. However, the optimal value function is time-dependent and difficult to obtain because of the periodicity of the electricity price and residential load. Therefore, a series of time-independent action-value functions are proposed to describe every period of a day. To approximate every action-value function, a corresponding critic network is established, which is cascaded with the other critic networks according to the time sequence. Then, the continuous management strategy is obtained from the related action network. Moreover, a two-stage learning protocol including offline and online learning stages is provided for detailed implementation in real-time battery management. Numerical experimental examples are given to demonstrate the effectiveness of the developed algorithm.
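The idea of one time-independent action-value function per period, cascaded along the time sequence, can be sketched with a tabular stand-in for the critic networks. The state/action sizes and learning rates here are illustrative assumptions, not the paper's settings:

```python
import numpy as np

class TimeIndexedCritics:
    """One action-value approximator Q_t(s, a) per period of the day.

    Because price and load are periodic, each period t gets its own
    time-independent value function; the update for period t bootstraps
    from period t+1's critic, forming the cascade along the time axis.
    """
    def __init__(self, periods=24, n_states=10, n_actions=3,
                 alpha=0.1, gamma=0.95):
        self.q = np.zeros((periods, n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma
        self.periods = periods

    def update(self, t, s, a, r, s_next):
        t_next = (t + 1) % self.periods  # cascade to the next period's critic
        target = r + self.gamma * self.q[t_next, s_next].max()
        self.q[t, s, a] += self.alpha * (target - self.q[t, s, a])

    def act(self, t, s):
        return int(self.q[t, s].argmax())
```

Replacing each table slice with a neural critic, and adding an action network for the continuous battery command, recovers the structure the abstract outlines.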
An intrusion detection system (IDS) has become an important tool for ensuring security in the network. In recent times, machine learning (ML) and deep learning (DL) models can be applied for the effective identification of intrusions over the network. To resolve the security issues, this paper presents a new Binary Butterfly Optimization algorithm-based Feature Selection with DRL technique, called BBOFS-DRL, for intrusion detection. The proposed BBOFS-DRL model mainly accomplishes the recognition of intrusions in the network. To attain this, the BBOFS-DRL model initially designs the BBOFS algorithm based on the traditional butterfly optimization algorithm (BOA) to elect feature subsets. Besides, the DRL model is employed for the proper identification and classification of intrusions that exist in the network. Furthermore, the beetle antenna search (BAS) technique is applied to tune the DRL parameters for enhanced intrusion detection efficiency. To ensure the superior intrusion detection outcomes of the BBOFS-DRL model, a wide-ranging experimental analysis is performed against a benchmark dataset. The simulation results report the supremacy of the BBOFS-DRL model over recent state-of-the-art approaches.
In the fifth generation (5G) wireless system, a closed-loop power control (CLPC) scheme based on a deep Q learning network (DQN) is introduced to intelligently adjust the transmit power of the base station (BS), which can improve the user equipment (UE) received signal-to-interference-plus-noise ratio (SINR) to a target threshold range. However, the power control (PC) action selected by DQN does not accurately match the fluctuations of the wireless environment, because the experience replay characteristic of the conventional DQN scheme can leave the target deep neural network (DNN) insufficiently trained; as a result, the Q-value of a sub-optimal PC action can exceed that of the optimal one. To solve this problem, we propose an improved DQN scheme, in which we add an additional DNN to the conventional DQN and set a shorter training interval so that the DNN is fully trained. The proposed scheme thereby ensures that the Q-value of the optimal action remains maximal. After multiple episodes of training, the proposed scheme generates more accurate PC actions that match the fluctuations of the wireless environment, so the UE received SINR reaches the target threshold range faster and stays more stable. Simulation results prove that the proposed scheme outperforms the conventional schemes.
Funding: National Natural Science Foundation of China (Grant Nos. 61803348, 62173312, 51922009); Shanxi Province Key Laboratory of Quantum Sensing and Precision Measurement (Grant No. 201905D121001).
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52222215, 52072051), the Fundamental Research Funds for the Central Universities in China (Grant No. 2023CDJXY-025), and the Chongqing Municipal Natural Science Foundation of China (Grant No. CSTB2023NSCQ-JQX0003).
Abstract: New energy vehicles play a crucial role in green transportation, and the energy management strategy of hybrid power systems is essential for energy-efficient driving. This paper presents a state-of-the-art survey and review of reinforcement-learning-based energy management strategies for hybrid power systems. Additionally, it envisions the outlook for autonomous intelligent hybrid electric vehicles with reinforcement learning as the foundational technology. First, to provide a macro view of historical development, a brief history of deep learning, reinforcement learning, and deep reinforcement learning is presented as a timeline. Then, a comprehensive survey and review are conducted by collecting papers from mainstream academic databases. Enumerating most of the contributions along three main directions (algorithm innovation, powertrain innovation, and environment innovation) provides an objective review of the research status. Finally, to advance the application of reinforcement learning in autonomous intelligent hybrid electric vehicles, future research plans positioned as "Alpha HEV" are envisioned, integrating Autopilot and energy-saving control.
Funding: The Liaoning Province Applied Basic Research Program, 2023JH2/101600038.
Abstract: Facing the increasingly severe botnet problem on the Internet, effectively detecting botnet traffic in real time has become a critical problem. Although the existing deep Q-network (DQN) algorithm in deep reinforcement learning can solve the problem of real-time updating, its prediction results are consistently higher than the actual results. In botnet traffic detection, it performs well on the training set, with a high traffic-prediction accuracy; however, on the test set its accuracy declines, and it cannot adjust its prediction strategy in time based on new data samples, so on new datasets its accuracy drops significantly. Therefore, this paper proposes a botnet traffic detection system based on a double-layer DQN (DDQN). Two Q-values are designed to adjust the model in policy and action, respectively, to achieve real-time model updates and to improve the universality and robustness of the model across different datasets. Experiments show that, compared with the DQN model, the DDQN's Q-value is not overestimated, and the detection model improves the accuracy and precision of botnet traffic detection. Moreover, on botnet datasets other than the test set, the accuracy and precision of the DDQN model remain higher than those of DQN.
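The two-estimator idea behind the DDQN design can be sketched in tabular form: one table selects the greedy action, the other evaluates it, which damps the overestimation a single max-based update suffers from. States, actions, and rewards below are placeholders, not the paper's traffic features.

```python
import random

def double_q_update(qa, qb, s, a, r, s2, actions, alpha=0.1, gamma=0.9, rng=random):
    # Flip a coin: one table picks the argmax at s2, the other supplies
    # its value, so the same noisy estimate never both selects and scores.
    sel, ev = (qa, qb) if rng.random() < 0.5 else (qb, qa)
    a_star = max(actions, key=lambda x: sel.get((s2, x), 0.0))
    target = r + gamma * ev.get((s2, a_star), 0.0)
    sel[(s, a)] = sel.get((s, a), 0.0) + alpha * (target - sel.get((s, a), 0.0))

def q_estimate(qa, qb, s, a):
    # Act on the average of the two tables.
    return 0.5 * (qa.get((s, a), 0.0) + qb.get((s, a), 0.0))
```

With a terminal reward of 1 the averaged estimate approaches 1 rather than overshooting it, which is the "Q-value is not too high" property the abstract reports.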
Funding: This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under Grant No. IFPIP-1127-611-1443; the authors therefore acknowledge with thanks DSR technical and financial support.
Abstract: In the rapidly evolving landscape of today's digital economy, Financial Technology (Fintech) emerges as a transformative force, propelled by the dynamic synergy between Artificial Intelligence (AI) and algorithmic trading. Our in-depth investigation delves into the intricacies of merging Multi-Agent Reinforcement Learning (MARL) and Explainable AI (XAI) within Fintech, aiming to refine algorithmic trading strategies. Through meticulous examination, we uncover the nuanced interactions of AI-driven agents as they collaborate and compete within the financial realm, employing sophisticated deep learning techniques to enhance the clarity and adaptability of trading decisions. These AI-infused Fintech platforms harness collective intelligence to unearth trends, mitigate risks, and provide tailored financial guidance, benefiting individuals and enterprises navigating the digital landscape. Our research holds the potential to open fresh avenues for investment and asset management in the digital age. Additionally, our statistical evaluation yields encouraging results, with metrics such as Accuracy = 0.85, Precision = 0.88, and F1 score = 0.86, reaffirming the efficacy of our approach within Fintech and emphasizing its reliability.
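For reference, the reported metrics are related by the standard formulas; the confusion-matrix counts in the check below are illustrative, not the paper's data.

```python
def precision_recall_f1(tp, fp, fn):
    # Standard binary-classification metrics from confusion-matrix counts.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```

With a precision of 0.88, an F1 near 0.86 implies a recall slightly below the precision, consistent with the paper's trio of figures.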
Funding: Supported in part by the National Key R&D Program of China under Grant 2021YFB2011300 and the National Natural Science Foundation of China under Grant 52075262.
Abstract: This paper focuses on the development of a learning-based controller for a class of uncertain mechanical systems modeled by the Euler-Lagrange formulation. The considered system can depict the behavior of a large class of engineering systems, such as vehicular systems, robot manipulators, and satellites. All these systems are often characterized by highly nonlinear dynamics, heavy modeling uncertainties, and unknown perturbations, so accurate-model-based nonlinear control approaches become unavailable. Motivated by this challenge, a reinforcement learning (RL) adaptive control methodology based on the actor-critic framework is investigated to compensate for the uncertain mechanical dynamics. The approximation inaccuracies caused by RL and the exogenous unknown disturbances are circumvented via a continuous robust integral of the sign of the error (RISE) control approach. Different from a classical RISE control law, a tanh(·) function is utilized instead of a sign(·) function to obtain a smoother control signal. The developed controller requires very little prior knowledge of the dynamic model, is robust to unknown dynamics and exogenous disturbances, and achieves asymptotic output tracking. Eventually, co-simulations through ADAMS and MATLAB/Simulink on a three-degrees-of-freedom (3-DOF) manipulator and experiments on a real-time electromechanical servo system verify the performance of the proposed approach.
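The smooth-robust-term idea can be seen on a scalar toy plant: replacing sign(e) with tanh(e/ε) inside the integral keeps the disturbance rejection without chattering. The plant, gains, and disturbance below are invented for illustration and are far simpler than the Euler-Lagrange systems treated in the paper.

```python
import math

def simulate(d=0.7, k=4.0, beta=1.5, eps=0.05, dt=1e-3, T=8.0):
    # Plant x_dot = u + d with an unknown constant disturbance d; goal x -> 0.
    x, nu = 1.0, 0.0               # state and the integrated robust term
    for _ in range(int(T / dt)):
        e = x                       # tracking error for the zero reference
        u = -k * e - nu             # proportional feedback plus robust integral
        nu += beta * math.tanh(e / eps) * dt   # smooth replacement of sign(e)
        x += (u + d) * dt           # forward-Euler plant step
    return x, nu
```

Because tanh saturates like sign away from zero, the integral term ramps up until it absorbs the disturbance, while near zero the control stays continuous instead of chattering.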
Funding: Supported by the National Natural Science Foundation of China (No. 61871283).
Abstract: The Autonomous Underwater Glider (AUG) is a prevailing type of underwater intelligent Internet vehicle and occupies a dominant position in industrial applications, in which path planning is an essential problem. Due to the complexity and variability of the ocean, accurate environment modeling and flexible path planning algorithms are pivotal challenges. Traditional models mainly utilize mathematical functions, which are neither complete nor reliable, and most existing path planning algorithms depend on the environment and lack flexibility. To overcome these challenges, we propose a path planning system for underwater intelligent Internet vehicles. It applies digital twins and sensor data to map the real ocean environment to a virtual digital space, which provides a comprehensive and reliable environment for path simulation. We design a value-based reinforcement learning path planning algorithm and explore the optimal network structure parameters. The path simulation is controlled by a closed-loop model integrated into the terminal vehicle through edge computing. The integration of state input enriches the learning of the neural networks and helps to improve generalization and flexibility. The task-related reward function promotes rapid convergence of the training. The experimental results prove that our reinforcement-learning-based path planning algorithm has great flexibility and can effectively adapt to a variety of ocean conditions.
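A value-based planner in the same family as the paper's algorithm can be sketched with value iteration on a small grid "ocean" (the real system uses a neural network over digital-twin data; the grid, obstacle, and goal here are stand-ins).

```python
def plan(grid, goal, iters=60):
    # Value iteration: v[r][c] converges to the shortest obstacle-free
    # path length (in moves) from cell (r, c) to the goal cell.
    rows, cols = len(grid), len(grid[0])
    moves = ((-1, 0), (1, 0), (0, -1), (0, 1))
    v = [[0.0] * cols for _ in range(rows)]
    for _ in range(iters):
        for r in range(rows):
            for c in range(cols):
                if (r, c) == goal or grid[r][c] == 1:   # goal or obstacle
                    continue
                v[r][c] = min(
                    1.0 + v[r + dr][c + dc]
                    for dr, dc in moves
                    if 0 <= r + dr < rows and 0 <= c + dc < cols
                    and grid[r + dr][c + dc] == 0
                )
    return v

def extract_path(v, grid, start, goal, max_steps=50):
    # Greedy descent on the converged values yields a shortest path.
    moves = ((-1, 0), (1, 0), (0, -1), (0, 1))
    path, (r, c) = [start], start
    while (r, c) != goal and len(path) < max_steps:
        r, c = min(
            ((r + dr, c + dc) for dr, dc in moves
             if 0 <= r + dr < len(grid) and 0 <= c + dc < len(grid[0])
             and grid[r + dr][c + dc] == 0),
            key=lambda p: v[p[0]][p[1]],
        )
        path.append((r, c))
    return path
```

A learned critic plays the role of `v` in the paper, so changing currents amount to changing the costs rather than re-deriving a hand-built model.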
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61971057).
Abstract: In this paper, we propose a two-way deep reinforcement learning (DRL)-based resource allocation algorithm, which solves the problem of resource allocation in a cognitive downlink network based on the underlay mode. Secondary users (SUs) in the cognitive network are multiplexed by a new Power Domain Sparse Code Multiple Access (PD-SCMA) scheme, and the physical resources of the cognitive base station are virtualized into two types of slices: an enhanced mobile broadband (eMBB) slice and an ultra-reliable low-latency communication (URLLC) slice. We design a Double Deep Q Network (DDQN) to output the optimal codebook assignment scheme and simultaneously use a Deep Deterministic Policy Gradient (DDPG) network to output the optimal power allocation scheme. The objective is to jointly optimize the spectral efficiency of the system and the Quality of Service (QoS) of the SUs. Simulation results show that the proposed algorithm outperforms the CNDDQN algorithm and the modified JEERA algorithm in terms of spectral efficiency and QoS satisfaction. Additionally, compared with Power Domain Non-orthogonal Multiple Access (PD-NOMA) slices and Sparse Code Multiple Access (SCMA) slices, the PD-SCMA slices can dramatically enhance spectral efficiency and increase the number of accessible users.
Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea Government (MSIT) (No. RS2022-II220961).
Abstract: Currently, cybersecurity threats such as data breaches and phishing have been on the rise due to the many different attack strategies of cyber attackers, significantly increasing risks to individuals and organizations. Traditional security technologies such as intrusion detection have been developed to respond to these cyber threats. Recently, advanced integrated cybersecurity that incorporates Artificial Intelligence has been the focus. In this paper, we propose a response strategy using a reinforcement-learning-based cyber-attack-defense simulation tool to address continuously evolving cyber threats. Additionally, we have implemented an effective reinforcement-learning-based cyber-attack scenario using Cyber Battle Simulation, a cyber-attack-defense simulator. This scenario involves important security components such as node value, cost, firewalls, and services. Furthermore, we applied a new vulnerability assessment method based on the Common Vulnerability Scoring System. This approach can design an optimal attack strategy by considering the importance of attack goals, which helps in developing more effective response strategies. These attack strategies are evaluated by comparing their performance using a variety of reinforcement learning methods. The experimental results show that RL models demonstrate improved learning performance with the proposed attack strategy compared to the original strategies. In particular, the success rate of the Advantage Actor-Critic-based attack strategy improved by 5.04 percentage points, reaching 10.17%, which represents a 98.24% increase over the original scenario. Consequently, the proposed method can enhance security and risk management capabilities in cyber environments, improving the efficiency of security management and contributing to the development of security systems.
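The reported numbers are mutually consistent: a 5.04-percentage-point gain reaching 10.17% implies a baseline success rate of 5.13%, and 5.04/5.13 is the quoted ~98.24% relative increase. A quick check:

```python
def improvement(new_pct, gain_pp):
    # Split an absolute gain in percentage points into the implied baseline
    # and the relative (percent) increase over that baseline.
    base = new_pct - gain_pp
    return base, 100.0 * gain_pp / base
```

Keeping the two kinds of percentage apart (points of absolute gain vs. percent of relative gain) is what makes a 5-point improvement read as a near-doubling here.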
Funding: Supported by the National Natural Science Foundation of China (62073327, 62273350) and the Natural Science Foundation of Jiangsu Province (BK20221112).
Abstract: This article studies the adaptive optimal output regulation problem for a class of interconnected singularly perturbed systems (SPSs) with unknown dynamics based on reinforcement learning (RL). Taking into account the slow and fast characteristics of the system states, the interconnected SPS is decomposed into slow time-scale dynamics and fast time-scale dynamics through singular perturbation theory. For the fast time-scale dynamics with interconnections, we devise a decentralized optimal control strategy by selecting appropriate weight matrices in the cost function. For the slow time-scale dynamics with unknown system parameters, an off-policy RL algorithm with a convergence guarantee is given to learn the optimal control strategy from measurement data. By combining the slow and fast controllers, we establish the composite decentralized adaptive optimal output regulator and rigorously analyze the stability and optimality of the closed-loop system. The proposed decomposition design not only bypasses the numerical stiffness but also alleviates the high dimensionality. The efficacy of the proposed methodology is validated by a load-frequency control application of a two-area power system.
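For a scalar instance, the optimal controller that such a learner converges to can be computed model-based via the discrete-time Riccati recursion; the paper's off-policy algorithm reaches the same solution from measurement data without knowing (a, b). The system and cost values below are illustrative.

```python
def lqr_scalar(a, b, q, r, iters=200):
    # Value iteration on the discrete-time Riccati equation for the scalar
    # system x(k+1) = a x(k) + b u(k) with cost sum of q x^2 + r u^2.
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)   # optimal feedback gain, u = -k x
    return p, k
```

A data-driven off-policy scheme replaces the explicit (a, b) in this recursion with least-squares fits over trajectory data, but the fixed point it converges to is the same p.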
Funding: Supported by the Department of Electrical Engineering at the National Chin-Yi University of Technology.
Abstract: Mango fruit is one of the main fruit commodities that contributes to Taiwan's income. Implementing technology is an alternative for increasing the quality and quantity of mango plantation productivity. In this study, a Wireless Sensor Networks (WSNs)-based intelligent mango plantation monitoring system is developed that implements deep reinforcement learning (DRL) technology to carry out prediction tasks over three classes: "optimal," "sub-optimal," or "not-optimal" conditions, based on three parameters: humidity, temperature, and soil moisture. The key idea is to provide a precise decision-making mechanism in the real-time monitoring system. A value-function-based DRL model, the deep Q-network (DQN), is employed, which contributes to optimizing the future reward and providing precise decision recommendations for the agent and system behavior. The WSN experiment result indicates that the system's accuracy in capturing the real-time environment parameters is 98.39%. Meanwhile, the comparative accuracies of the proposed DQN, individual Q-learning, uniform coverage (UC), and Naïve Bayes classifier (NBC) models are 97.60%, 95.30%, 96.50%, and 92.30%, respectively. From these comparative experiments, it can be seen that the proposed DQN has the most optimal accuracy. Testing with 22 scenarios for "optimal," "sub-optimal," and "not-optimal" conditions was carried out to ensure the system runs well on real-world data; the accuracy on real-world data reaches 95.45%. The cost analysis shows that the system is low-cost compared to a conventional system.
Funding: Supported in part by the National Key R&D Program of China under Grant 2021YFE0206100, in part by the National Natural Science Foundation of China under Grant 62073321, in part by the National Defense Basic Scientific Research Program JCKY2019203C029, in part by the Science and Technology Development Fund, Macao SAR, under Grants FDCT-22-009-MISE, 0060/2021/A2 and 0015/2020/AMJ, and in part by the National Defense Basic Scientific Research Project (JCKY2020130C025).
Abstract: This paper presents a new optimal adaptive backstepping control approach for nonlinear systems under deception attacks via reinforcement learning. The existence of nonlinear terms in the studied system makes it very difficult to design the optimal controller using traditional methods. To achieve optimal control, an RL algorithm based on the critic-actor architecture is considered for the nonlinear system. Due to the significant security risks of network transmission, the system is vulnerable to deception attacks, which can make all the system states unavailable. By using the attacked states to design the coordinate transformation, the harm brought by unknown deception attacks is overcome. The presented control strategy ensures that all signals in the closed-loop system are semi-globally ultimately bounded. Finally, a simulation experiment demonstrates the effectiveness of the strategy.
Funding: The authors gratefully acknowledge the financial support of the National Natural Science Foundation of China (Grant No. 41972276), the Natural Science Foundation of Fujian Province (Grant No. 2020J06013), and the "Foal Eagle Program" Youth Top-notch Talent Project of Fujian Province, China (Grant No. 00387088).
Abstract: A suitable bearing capacity of the foundation is critical for the safety of civil structures. Sometimes foundation reinforcement is necessary, and an effective and environmentally friendly method would be the preferred choice. In this study, the potential application of enzyme-induced carbonate precipitation (EICP) was investigated for reinforcing a 0.6 m bedding layer on top of clay to improve the bearing capacity of the foundation underneath an underground cable duct. Laboratory experiments were conducted to determine the optimal operational parameters for the extraction of crude urease liquid and the optimal grain size range of sea sands to be used to construct the bedding layer. Field tests were planned based on an orthogonal experimental design to study the factors that significantly affect the bio-cementation effect on site. The dynamic deformation modulus, calcium carbonate content, and long-term ground stress variations were used to evaluate the bio-cementation effect and the long-term performance of the EICP-treated bedding layer. The laboratory test results showed that the optimal duration for the extraction of crude urease liquid is 1 h and the optimal dosage of soybean husk powder in the urease extraction solution is 100 g/L. The calcium carbonate production rate decreases significantly when the concentration of the cementation solution exceeds 0.5 mol/L. The site trial showed that the number of EICP treatments has the most significant impact on the effectiveness of the treatment, and the highest dynamic deformation modulus (Evd) of the EICP-treated bedding layer reached 50.55 MPa. The area with a better bio-cementation effect was found to carry higher ground stress, which validates that EICP treatment can improve the bearing capacity of the foundation by reinforcing the bedding layer. The field trial described and the analysis introduced in this paper provide a practical basis for applying EICP technology to the reinforcement of bedding layers in poor ground conditions.
Funding: Supported by the National Key R&D Program of China under Grant 2020YFB1807204 and the BUPT Excellent Ph.D. Students Foundation under Grant CX2022306.
Abstract: Network-assisted full duplex (NAFD) cell-free (CF) massive MIMO has drawn increasing attention in the evolution toward 6G. In this paper, we build an NAFD CF system in which the users and access points (APs) can flexibly select their duplex modes to increase the link spectral efficiency. We then formulate a joint flexible duplexing and power allocation problem to balance user fairness and system spectral efficiency, and further transform the problem into a probability optimization to accommodate short-term communications. In contrast with instant performance optimization, probability optimization is a sequential decision-making problem, so we reformulate it as a Markov Decision Process (MDP). We utilize a deep reinforcement learning (DRL) algorithm to search for the solution in a large state-action space, and propose an asynchronous advantage actor-critic (A3C)-based scheme to reduce the chance of converging to a suboptimal policy. Simulation results demonstrate that the A3C-based scheme is superior to the baseline schemes in terms of complexity, accumulated log spectral efficiency, and stability.
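The A3C worker update rests on n-step returns bootstrapped from the critic, with the advantage as the policy-gradient weight; a minimal sketch (the rewards and critic values below are placeholders, not the scheduling problem's actual quantities):

```python
def nstep_advantages(rewards, values, bootstrap, gamma=0.99):
    # Accumulate discounted returns backwards from the bootstrap value of the
    # last state, then subtract the critic baseline to get advantages.
    returns, g = [], bootstrap
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns, [ret - v for ret, v in zip(returns, values)]
```

In A3C, several workers compute these advantages on their own rollouts and apply gradients asynchronously to shared actor and critic parameters, which is what decorrelates the updates without a replay buffer.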
Abstract: In the era of an energy revolution, grid decentralization has emerged as a viable solution to meet the increasing global energy demand by incorporating renewables at the distributed level. Microgrids are considered a driving component for accelerating grid decentralization. To optimally utilize the available resources and address potential challenges, there is a need for an intelligent and reliable energy management system (EMS) for the microgrid. The artificial intelligence field has the potential to address the problems in EMS and can provide resilient, efficient, reliable, and scalable solutions. This paper presents an overview of existing conventional and AI-based techniques for energy management systems in microgrids. We analyze EMS methods for centralized, decentralized, and distributed microgrids separately. Then, we summarize machine learning techniques such as ANNs, federated learning, LSTMs, RNNs, and reinforcement learning for EMS objectives such as economic dispatch, optimal power flow, and scheduling. With the incorporation of AI, microgrids can achieve greater performance efficiency and reliability in managing a large number of energy resources. However, challenges such as data privacy, security, scalability, and explainability need to be addressed. To conclude, the authors state possible future research directions to explore the potential of AI-based EMS in real-world applications.
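One of the EMS objectives mentioned, economic dispatch, reduces in its simplest lossless, constraint-free form to merit order: fill demand from the cheapest units first. Unit names, capacities, and costs below are illustrative.

```python
def dispatch(units, demand):
    # units: (name, capacity_kW, cost_per_kWh); returns {name: output_kW}.
    out, remaining = {}, demand
    for name, cap, _cost in sorted(units, key=lambda u: u[2]):
        take = min(cap, remaining)   # cheapest units run at capacity first
        out[name] = take
        remaining -= take
    if remaining > 1e-9:
        raise ValueError("demand exceeds total capacity")
    return out
```

The AI-based methods surveyed matter precisely where this greedy picture breaks down: network constraints, storage coupling across time, and uncertain renewable output.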
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62231009, 61971126, 62261160576 and 61921004 and the Natural Science Foundation of Jiangsu Province under Grant BK20211511, and in part by the Jiangsu Province Frontier Leading Technology Basic Research Project under Grant BK20212002.
Abstract: In this paper, we investigate a downlink orthogonal frequency division multiplexing (OFDM) transmission system assisted by reconfigurable intelligent surfaces (RISs). Considering multiple antennas at the base station (BS) and multiple single-antenna users, the joint optimization of the precoder at the BS and the phase shift design at the RIS is studied to minimize the transmit power under a quality-of-service constraint. A deep reinforcement learning (DRL)-based algorithm is proposed, in which maximum ratio transmission (MRT) precoding is utilized at the BS and the twin delayed deep deterministic policy gradient (TD3) method is utilized for RIS phase shift optimization. Numerical results demonstrate that the proposed DRL-based algorithm can achieve a transmit power almost the same as the lower bound achieved by a manifold optimization (MO) algorithm, while incurring much less computation delay.
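The MRT precoder used at the BS has a closed form: the conjugate of the channel, normalized, which maximizes the received amplitude for a single-user link. A sketch with an illustrative channel vector:

```python
def mrt(h):
    # Maximum ratio transmission: w = h* / ||h|| (the matched filter).
    norm = sum(abs(x) ** 2 for x in h) ** 0.5
    return [x.conjugate() / norm for x in h]

def rx_amplitude(h, w):
    # |h^T w|: amplitude of the effective channel after precoding.
    return abs(sum(hi * wi for hi, wi in zip(h, w)))
```

Because MRT fixes the precoder in closed form, the learning problem the TD3 agent faces collapses to the RIS phase shifts alone, which keeps the action space manageable.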
Funding: Supported in part by the National Key Research and Development Program of China (2021YFB1714700) and the National Natural Science Foundation of China (62022061, 6192100028).
Abstract: In this paper, a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems. Motivated by the classical on-policy and off-policy algorithms of reinforcement learning, the online feedback relearning (FR) algorithm is developed, where the collected data include the influence of disturbance signals. The FR algorithm has better adaptability to environmental changes (such as control channel disturbances) than the off-policy algorithm, and higher computational efficiency and better convergence performance than the on-policy algorithm. Data processing based on experience replay is used for high data efficiency and convergence stability. Simulation experiments illustrate the convergence stability, optimality, and performance of the FR algorithm by comparison.
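The experience-replay step can be sketched as a bounded buffer sampled uniformly, which breaks the temporal correlation of the collected data (the capacity and batch size here are arbitrary):

```python
import random
from collections import deque

class ReplayBuffer:
    # Fixed-capacity FIFO store of transitions with uniform random sampling.
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, transition):
        self.buf.append(transition)        # oldest entry is evicted when full

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)
```

Reusing each stored transition in many minibatches is where the data efficiency comes from; the bounded capacity keeps the buffer tracking the current disturbance conditions rather than stale ones.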
Abstract: Reconfigurable intelligent surfaces (RISs) for wireless networks have drawn much attention in both the academic and industry communities. An RIS can dynamically control the phases of its reflection elements to steer the signal in a desired direction, thereby providing supplementary links for wireless networks. Most prior works on RIS-aided wireless communication systems consider continuous phase shifts, but the phase shifts of a practical RIS are discrete in hardware, so we focus on actual discrete phase shifts in this paper. Using advanced deep reinforcement learning (DRL), we jointly optimize the transmit beamforming matrix from the discrete Fourier transform (DFT) codebook at the base station (BS) and the discrete phase shifts at the RIS to maximize the received signal-to-interference-plus-noise ratio (SINR). Unlike traditional schemes, which usually use alternating optimization methods to solve for the transmit beamforming and phase shifts, the proposed DRL algorithm jointly designs the transmit beamforming and phase shifts as the output of the DRL neural network. Numerical results indicate that the proposed DRL approach can handle this complicated optimization problem with low computational complexity.
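Discrete phase shifts mean each element rounds its ideal co-phasing angle to a 2^b-entry codebook, so the coherent combining gain falls short of the element count N and recovers as the bit resolution grows. A sketch with made-up channel phases:

```python
import cmath
import math

def quantize_phase(theta, bits):
    # Round to the nearest entry of the uniform 2**bits phase codebook.
    step = 2 * math.pi / (2 ** bits)
    return round(theta / step) * step

def array_gain(channel_phases, bits):
    # Each element applies the quantized opposite of its channel phase;
    # perfect continuous control would give gain = len(channel_phases).
    total = sum(cmath.exp(1j * (quantize_phase(-p, bits) + p))
                for p in channel_phases)
    return abs(total)
```

This residual misalignment per element is exactly why discrete-phase designs cannot simply round the continuous solution without a performance check, motivating the joint optimization in the paper.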
Funding: Supported by the National Key R&D Program of China (2018AAA0101400), the National Natural Science Foundation of China (61921004, 62173251, U1713209, 62236002), the Fundamental Research Funds for the Central Universities, and the Guangdong Provincial Key Laboratory of Intelligent Decision and Cooperative Control.
Abstract: In this paper, a reinforcement-learning-based multi-battery energy storage system (MBESS) scheduling policy is proposed to minimize consumers' electricity cost. The MBESS scheduling problem is modeled as a Markov decision process (MDP) with unknown transition probability. However, the optimal value function is time-dependent and difficult to obtain because of the periodicity of the electricity price and residential load. Therefore, a series of time-independent action-value functions is proposed to describe every period of a day. To approximate each action-value function, a corresponding critic network is established and cascaded with the other critic networks according to the time sequence. Then, the continuous management strategy is obtained from the related actor network. Moreover, a two-stage learning protocol including offline and online learning stages is provided for detailed implementation in real-time battery management. Numerical experiments demonstrate the effectiveness of the developed algorithm.
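The per-period action-value idea can be made concrete on a toy day solved exactly by backward dynamic programming: one Q-table per time slot over (state-of-charge, action). The prices, load, and capacity are invented; the paper learns these functions with cascaded critic networks instead of enumerating them.

```python
PRICES = [1.0, 5.0, 1.0, 5.0]   # electricity price per slot (illustrative)
LOAD = 1.0                      # fixed residential load per slot
ACTIONS = (-1, 0, 1)            # discharge / idle / charge (energy units)
CAP = 2                         # usable battery capacity

def solve():
    # q[t][(soc, a)] = optimal cost-to-go: one action-value table per period,
    # mirroring the paper's time-indexed critics.
    T = len(PRICES)
    q = [{} for _ in range(T)]
    v = [{s: 0.0 for s in range(CAP + 1)} for _ in range(T + 1)]
    for t in reversed(range(T)):
        for soc in range(CAP + 1):
            for a in ACTIONS:
                soc2 = soc + a
                grid = LOAD + a                      # energy bought this slot
                if 0 <= soc2 <= CAP and grid >= 0:   # no export in this toy
                    q[t][(soc, a)] = PRICES[t] * grid + v[t + 1][soc2]
            v[t][soc] = min(c for (s, _a), c in q[t].items() if s == soc)
    return q, v

def act(q, t, soc):
    # Greedy action from the period-t table.
    feasible = {a: c for (s, a), c in q[t].items() if s == soc}
    return min(feasible, key=feasible.get)
```

The optimal policy buys extra energy in the cheap slots and discharges at the price peaks, cutting the day's cost from 12 (no battery) to 4 in this example.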
Abstract: An intrusion detection system (IDS) has become an important tool for ensuring security in the network. In recent times, machine learning (ML) and deep learning (DL) models have been applied effectively to identify intrusions over the network. To resolve the security issues, this paper presents a new Binary Butterfly Optimization algorithm-based Feature Selection with DRL technique, called BBOFS-DRL, for intrusion detection. The proposed BBOFS-DRL model mainly accomplishes the recognition of intrusions in the network. To attain this, the BBOFS-DRL model initially designs the BBOFS algorithm, based on the traditional butterfly optimization algorithm (BOA), to elect feature subsets. Besides, a DRL model is employed for the proper identification and classification of intrusions in the network. Furthermore, the beetle antenna search (BAS) technique is applied to tune the DRL parameters for enhanced intrusion detection efficiency. To verify the superior intrusion detection outcomes of the BBOFS-DRL model, a wide-ranging experimental analysis is performed against a benchmark dataset. The simulation results report the supremacy of the BBOFS-DRL model over recent state-of-the-art approaches.
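The feature-election step can be sketched as a wrapper search over binary masks; here exhaustive search plays the role the binary butterfly optimizer plays at scale, and the relevance scores and size penalty are invented stand-ins for a classifier-based fitness.

```python
from itertools import product

RELEVANCE = [0.9, 0.1, 0.8, 0.2]   # per-feature merit (illustrative)
PENALTY = 0.25                     # cost per selected feature

def fitness(mask):
    # Reward informative features, penalize subset size, as wrapper-style
    # feature-selection objectives typically do.
    return sum(r * m for r, m in zip(RELEVANCE, mask)) - PENALTY * sum(mask)

def best_subset(n_features):
    # Exhaustive search over non-empty binary masks; a metaheuristic such as
    # the binary BOA samples this same space when 2**n is too large.
    candidates = (m for m in product((0, 1), repeat=n_features) if any(m))
    return max(candidates, key=fitness)
```

With tens of features the 2^n masks cannot be enumerated, which is exactly where population-based searches like the binary butterfly algorithm earn their keep.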