Journal Articles
2,294 articles found
1. Elliptical encirclement control capable of reinforcing performances for UAVs around a dynamic target
Authors: Fei Zhang, Xingling Shao, Yi Xia, Wendong Zhang. Defence Technology (防务技术), SCIE/EI/CAS/CSCD, 2024, No. 2, pp. 104-119 (16 pages)
Most research on target encircling control focuses on moving along a circular orbit in an ideal environment free from external disturbances. However, elliptical encirclement with a time-varying observation radius may permit a more flexible and high-efficacy enclosing solution, whilst the non-orthogonal property between axial and tangential speed components, non-ignorable environmental perturbations, and strict assignment requirements make elliptical encircling control more challenging, and the relevant investigations are still open. Following this line, an appointed-time elliptical encircling control rule capable of reinforcing circumnavigation performance is developed to enable Unmanned Aerial Vehicles (UAVs) to move along a specified elliptical path within a predetermined reaching time. The remarkable merits of the designed strategy are that the relative distance control error is guaranteed to evolve within specified regions with designer-specified convergence behavior. Meanwhile, wind perturbations can be counteracted online based on an unknown system dynamics estimator (USDE) with only one regulating parameter and high computational efficiency. Lyapunov analysis demonstrates that all involved error variables are ultimately bounded, and simulations are implemented to confirm the usability of the suggested control algorithm.
Keywords: Elliptical encirclement, Reinforced performances, Wind perturbations, UAVs
2. Reinforcement Learning-Based Energy Management for Hybrid Power Systems: State-of-the-Art Survey, Review, and Perspectives
Authors: Xiaolin Tang, Jiaxin Chen, Yechen Qin, Teng Liu, Kai Yang, Amir Khajepour, Shen Li. Chinese Journal of Mechanical Engineering, SCIE/EI/CAS/CSCD, 2024, No. 3, pp. 1-25 (25 pages)
The new energy vehicle plays a crucial role in green transportation, and the energy management strategy of hybrid power systems is essential for ensuring energy-efficient driving. This paper presents a state-of-the-art survey and review of reinforcement learning-based energy management strategies for hybrid power systems. Additionally, it envisions the outlook for autonomous intelligent hybrid electric vehicles, with reinforcement learning as the foundational technology. First, to provide a macro view of historical development, a brief history of deep learning, reinforcement learning, and deep reinforcement learning is presented in the form of a timeline. Then, a comprehensive survey and review are conducted by collecting papers from mainstream academic databases. Enumerating most of the contributions along three main directions (algorithm innovation, powertrain innovation, and environment innovation) provides an objective review of the research status. Finally, to advance the application of reinforcement learning in autonomous intelligent hybrid electric vehicles, future research plans positioned as "Alpha HEV" are envisioned, integrating Autopilot and energy-saving control.
Keywords: New energy vehicle, Hybrid power system, Reinforcement learning, Energy management strategy
3. Double DQN Method for Botnet Traffic Detection System
Authors: Yutao Hu, Yuntao Zhao, Yongxin Feng, Xiangyu Ma. Computers, Materials & Continua, SCIE/EI, 2024, No. 4, pp. 509-530 (22 pages)
In the face of the increasingly severe botnet problem on the Internet, how to effectively detect botnet traffic in real time has become a critical problem. Although the existing deep Q-network (DQN) algorithm in deep reinforcement learning can solve the problem of real-time updating, its prediction results are always higher than the actual results. In botnet traffic detection, although it performs well on the training set, with a traffic prediction accuracy as high as %, its accuracy declines on the test set, and it cannot adjust its prediction strategy in time based on new data samples; on new datasets its accuracy declines significantly. Therefore, this paper proposes a botnet traffic detection system based on double-layer DQN (DDQN). Two Q-values are designed to adjust the model in policy and action, respectively, to achieve real-time model updates and improve the universality and robustness of the model across different data sets. Experiments show that, compared with the DQN model, the DDQN Q-value is not overestimated, and the detection model achieves improved accuracy and precision on botnet traffic. Moreover, on botnet data sets other than the test set, the accuracy and precision of the DDQN model remain higher than those of DQN.
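The overestimation issue the abstract describes is the motivation for the standard Double DQN target rule (the paper's "double-layer" design may differ in detail). A minimal sketch: the online network selects the greedy next action while the target network evaluates it, so noisy upward spikes in one network are not blindly maximized over.

```python
import numpy as np

def dqn_target(q_target, rewards, gamma):
    # Vanilla DQN: the target network both selects and evaluates
    # the next action, which biases targets upward (max over noise).
    return rewards + gamma * q_target.max(axis=1)

def double_dqn_target(q_online, q_target, rewards, gamma):
    # Double DQN: the online network selects the greedy action,
    # the target network evaluates it, curbing overestimation.
    greedy = q_online.argmax(axis=1)
    return rewards + gamma * q_target[np.arange(len(greedy)), greedy]

# Toy batch: 2 states, 3 actions, noisy target estimates (illustrative).
q_online = np.array([[1.0, 2.0, 0.5], [0.2, 0.1, 0.9]])
q_target = np.array([[1.1, 1.5, 3.0], [0.3, 0.2, 0.8]])
rewards = np.array([0.0, 1.0])
y_dqn = dqn_target(q_target, rewards, 0.99)
y_ddqn = double_dqn_target(q_online, q_target, rewards, 0.99)
```

Note that the Double DQN target can never exceed the vanilla one for the same batch, which is exactly the "Q-value is not too high" property the abstract reports.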
Keywords: DQN, DDQN, deep reinforcement learning, botnet detection, feature classification
4. Unleashing the Power of Multi-Agent Reinforcement Learning for Algorithmic Trading in the Digital Financial Frontier and Enterprise Information Systems
Authors: Saket Sarin, Sunil K. Singh, Sudhakar Kumar, Shivam Goyal, Brij Bhooshan Gupta, Wadee Alhalabi, Varsha Arya. Computers, Materials & Continua, SCIE/EI, 2024, No. 8, pp. 3123-3138 (16 pages)
In the rapidly evolving landscape of today's digital economy, Financial Technology (Fintech) emerges as a transformative force, propelled by the dynamic synergy between Artificial Intelligence (AI) and Algorithmic Trading. Our in-depth investigation delves into the intricacies of merging Multi-Agent Reinforcement Learning (MARL) and Explainable AI (XAI) within Fintech, aiming to refine Algorithmic Trading strategies. Through meticulous examination, we uncover the nuanced interactions of AI-driven agents as they collaborate and compete within the financial realm, employing sophisticated deep learning techniques to enhance the clarity and adaptability of trading decisions. These AI-infused Fintech platforms harness collective intelligence to unearth trends, mitigate risks, and provide tailored financial guidance, fostering benefits for individuals and enterprises navigating the digital landscape. Our research holds the potential to revolutionize finance, opening doors to fresh avenues for investment and asset management in the digital age. Additionally, our statistical evaluation yields encouraging results, with metrics such as Accuracy = 0.85, Precision = 0.88, and F1 Score = 0.86, reaffirming the efficacy of our approach within Fintech and emphasizing its reliability and innovative prowess.
Keywords: Neurodynamic Fintech, multi-agent reinforcement learning, algorithmic trading, digital financial frontier
5. Reinforcement learning based adaptive control for uncertain mechanical systems with asymptotic tracking
Authors: Xiang-long Liang, Zhi-kai Yao, Yao-wen Ge, Jian-yong Yao. Defence Technology (防务技术), SCIE/EI/CAS/CSCD, 2024, No. 4, pp. 19-28 (10 pages)
This paper focuses on the development of a learning-based controller for a class of uncertain mechanical systems modeled by the Euler-Lagrange formulation. The considered system can depict the behavior of a large class of engineering systems, such as vehicular systems, robot manipulators, and satellites. All these systems are often characterized by highly nonlinear characteristics, heavy modeling uncertainties, and unknown perturbations; therefore, accurate-model-based nonlinear control approaches become unavailable. Motivated by this challenge, a reinforcement learning (RL) adaptive control methodology based on the actor-critic framework is investigated to compensate for the uncertain mechanical dynamics. The approximation inaccuracies caused by RL and the exogenous unknown disturbances are circumvented via a continuous robust integral of the sign of the error (RISE) control approach. Different from a classical RISE control law, a tanh(·) function is utilized instead of a sign(·) function to acquire a smoother control signal. The developed controller requires very little prior knowledge of the dynamic model, is robust to unknown dynamics and exogenous disturbances, and can achieve asymptotic output tracking. Eventually, co-simulations through ADAMS and MATLAB/Simulink on a three-degrees-of-freedom (3-DOF) manipulator and experiments on a real-time electromechanical servo system are performed to verify the performance of the proposed approach.
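The tanh-for-sign substitution the abstract mentions can be illustrated numerically: sign(e) jumps by 2k across e = 0 (the source of control chattering), while tanh(e/ε) is continuous and approaches sign(e) as ε shrinks. The gain k and smoothing width ε below are illustrative, not the paper's values.

```python
import math

def rise_term_sign(e, k=1.0):
    # Classical RISE robust term: discontinuous at e = 0 (chattering).
    return k * (1.0 if e > 0 else -1.0 if e < 0 else 0.0)

def rise_term_tanh(e, k=1.0, eps=0.1):
    # Smoothed variant: tanh(e/eps) tends to sign(e) as eps -> 0,
    # but stays continuous, yielding a smoother control signal.
    return k * math.tanh(e / eps)

# Behavior across the origin: sign jumps by 2k, tanh barely moves.
jump_sign = rise_term_sign(1e-6) - rise_term_sign(-1e-6)
jump_tanh = rise_term_tanh(1e-6) - rise_term_tanh(-1e-6)
```

Far from the origin the two terms nearly coincide, which is why the smoothed law retains the robustness of the original RISE design.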
Keywords: Adaptive control, Reinforcement learning, Uncertain mechanical systems, Asymptotic tracking
6. A digital twins enabled underwater intelligent internet vehicle path planning system via reinforcement learning and edge computing
Authors: Jiachen Yang, Meng Xi, Jiabao Wen, Yang Li, Houbing Herbert Song. Digital Communications and Networks, SCIE/CSCD, 2024, No. 2, pp. 282-291 (10 pages)
The Autonomous Underwater Glider (AUG) is a prevailing kind of underwater intelligent internet vehicle and occupies a dominant position in industrial applications, in which path planning is an essential problem. Due to the complexity and variability of the ocean, accurate environment modeling and flexible path planning algorithms are pivotal challenges. Traditional models mainly utilize mathematical functions, which are not complete and reliable. Most existing path planning algorithms depend on the environment and lack flexibility. To overcome these challenges, we propose a path planning system for underwater intelligent internet vehicles. It applies digital twins and sensor data to map the real ocean environment to a virtual digital space, which provides a comprehensive and reliable environment for path simulation. We design a value-based reinforcement learning path planning algorithm and explore the optimal network structure parameters. The path simulation is controlled by a closed-loop model integrated into the terminal vehicle through edge computing. The integration of state input enriches the learning of neural networks and helps to improve generalization and flexibility. The task-related reward function promotes rapid convergence of the training. The experimental results prove that our reinforcement learning-based path planning algorithm has great flexibility and can effectively adapt to a variety of different ocean conditions.
Keywords: Digital twins, Reinforcement learning, Edge computing, Underwater intelligent internet vehicle, Path planning
7. Resource Allocation for Cognitive Network Slicing in PD-SCMA System Based on Two-Way Deep Reinforcement Learning
Authors: Zhang Zhenyu, Zhang Yong, Yuan Siyu, Cheng Zhenjie. China Communications, SCIE/CSCD, 2024, No. 6, pp. 53-68 (16 pages)
In this paper, we propose a two-way Deep Reinforcement Learning (DRL)-based resource allocation algorithm, which solves the problem of resource allocation in the cognitive downlink network based on the underlay mode. Secondary users (SUs) in the cognitive network are multiplexed by a new Power Domain Sparse Code Multiple Access (PD-SCMA) scheme, and the physical resources of the cognitive base station are virtualized into two types of slices: the enhanced mobile broadband (eMBB) slice and the ultra-reliable low-latency communication (URLLC) slice. We design a Double Deep Q-Network (DDQN) to output the optimal codebook assignment scheme and simultaneously use a Deep Deterministic Policy Gradient (DDPG) network to output the optimal power allocation scheme. The objective is to jointly optimize the spectral efficiency of the system and the Quality of Service (QoS) of the SUs. Simulation results show that the proposed algorithm outperforms the CNDDQN algorithm and the modified JEERA algorithm in terms of spectral efficiency and QoS satisfaction. Additionally, compared with Power Domain Non-orthogonal Multiple Access (PD-NOMA) slices and Sparse Code Multiple Access (SCMA) slices, PD-SCMA slices can dramatically enhance spectral efficiency and increase the number of accessible users.
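Pairing a DDQN (discrete codebook index) with a DDPG actor (continuous power level) means each decision step emits a hybrid discrete-plus-continuous action. A minimal sketch of how such a joint action could be assembled; the network outputs are random stubs and all names and shapes here are hypothetical, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def ddqn_codebook_choice(q_values):
    # Discrete head: DDQN-style greedy pick over candidate codebooks.
    return int(np.argmax(q_values))

def ddpg_power_choice(raw_action, p_max):
    # Continuous head: squash the DDPG actor output into [0, p_max].
    return p_max * (np.tanh(raw_action) + 1.0) / 2.0

# One decision step for a secondary user (stub network outputs).
q_values = rng.normal(size=8)          # 8 candidate codebooks (assumed)
raw_power = float(rng.normal())
codebook = ddqn_codebook_choice(q_values)
power = float(ddpg_power_choice(raw_power, p_max=1.0))
action = (codebook, power)             # joint discrete + continuous action
```

The two heads can be trained with their usual losses (TD error for the DDQN, deterministic policy gradient for the DDPG) against a shared reward such as spectral efficiency.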
Keywords: cognitive radio, deep reinforcement learning, network slicing, power-domain non-orthogonal multiple access, resource allocation
8. Optimal Cyber Attack Strategy Using Reinforcement Learning Based on Common Vulnerability Scoring System
Authors: Bum-Sok Kim, Hye-Won Suk, Yong-Hoon Choi, Dae-Sung Moon, Min-Suk Kim. Computer Modeling in Engineering & Sciences, SCIE/EI, 2024, No. 11, pp. 1551-1574 (24 pages)
Currently, cybersecurity threats such as data breaches and phishing have been on the rise due to the many different attack strategies of cyber attackers, significantly increasing risks to individuals and organizations. Traditional security technologies such as intrusion detection have been developed to respond to these cyber threats. Recently, advanced integrated cybersecurity that incorporates Artificial Intelligence has been the focus. In this paper, we propose a response strategy using a reinforcement-learning-based cyber-attack-defense simulation tool to address continuously evolving cyber threats. Additionally, we have implemented an effective reinforcement-learning-based cyber-attack scenario using Cyber Battle Simulation, which is a cyber-attack-defense simulator. This scenario involves important security components such as node value, cost, firewalls, and services. Furthermore, we applied a new vulnerability assessment method based on the Common Vulnerability Scoring System. This approach can design an optimal attack strategy by considering the importance of attack goals, which helps in developing more effective response strategies. These attack strategies are evaluated by comparing their performance using a variety of Reinforcement Learning methods. The experimental results show that RL models demonstrate improved learning performance with the proposed attack strategy compared to the original strategies. In particular, the success rate of the Advantage Actor-Critic-based attack strategy improved by 5.04 percentage points, reaching 10.17%, which represents a 98.24% increase over the original scenario. Consequently, the proposed method can enhance security and risk management capabilities in cyber environments, improving the efficiency of security management and significantly contributing to the development of security systems.
Keywords: Reinforcement learning, common vulnerability scoring system, cyber attack, cyber battle simulation
9. Adaptive Optimal Output Regulation of Interconnected Singularly Perturbed Systems With Application to Power Systems
Authors: Jianguo Zhao, Chunyu Yang, Weinan Gao, Linna Zhou, Xiaomin Liu. IEEE/CAA Journal of Automatica Sinica, SCIE/EI/CSCD, 2024, No. 3, pp. 595-607 (13 pages)
This article studies the adaptive optimal output regulation problem for a class of interconnected singularly perturbed systems (SPSs) with unknown dynamics based on reinforcement learning (RL). Taking into account the slow and fast characteristics among system states, the interconnected SPS is decomposed into slow time-scale dynamics and fast time-scale dynamics through singular perturbation theory. For the fast time-scale dynamics with interconnections, we devise a decentralized optimal control strategy by selecting appropriate weight matrices in the cost function. For the slow time-scale dynamics with unknown system parameters, an off-policy RL algorithm with a convergence guarantee is given to learn the optimal control strategy from measurement data. By combining the slow and fast controllers, we establish the composite decentralized adaptive optimal output regulator and rigorously analyze the stability and optimality of the closed-loop system. The proposed decomposition design not only bypasses the numerical stiffness but also alleviates the high dimensionality. The efficacy of the proposed methodology is validated by a load-frequency control application of a two-area power system.
Keywords: Adaptive optimal control, decentralized control, output regulation, reinforcement learning (RL), singularly perturbed systems (SPSs)
10. Value Function Mechanism in WSNs-Based Mango Plantation Monitoring System
Authors: Wen-Tsai Sung, Indra Griha Tofik Isa, Sung-Jung Hsiao. Computers, Materials & Continua, SCIE/EI, 2024, No. 9, pp. 3733-3759 (27 pages)
Mango fruit is one of the main fruit commodities that contributes to Taiwan's income. The implementation of technology is an alternative for increasing the quality and quantity of mango plantation productivity. In this study, a Wireless Sensor Networks (WSNs)-based intelligent mango plantation monitoring system is developed that implements deep reinforcement learning (DRL) technology to carry out prediction tasks over three classes, "optimal," "sub-optimal," or "not-optimal" conditions, based on three parameters: humidity, temperature, and soil moisture. The key idea is how to provide a precise decision-making mechanism in the real-time monitoring system. A value-function-based DRL model called deep Q-network (DQN) is employed, which contributes to optimizing the future reward and delivering precise decision recommendations for the agent and system behavior. The WSN experiment results indicate the system's accuracy in capturing the real-time environment parameters is 98.39%. Meanwhile, the comparative accuracies of the proposed DQN, individual Q-learning, uniform coverage (UC), and Naïve Bayes classifier (NBC) models are 97.60%, 95.30%, 96.50%, and 92.30%, respectively. From the results of the comparative experiment, it can be seen that the proposed DQN has the best accuracy. Testing with 22 test scenarios for "optimal," "sub-optimal," and "not-optimal" conditions was carried out to ensure the system runs well on real-world data. The accuracy generated from the real-world data reaches 95.45%. From the results of the cost analysis, the system provides a low-cost alternative compared to the conventional system.
Keywords: Intelligent monitoring system, deep reinforcement learning (DRL), wireless sensor networks (WSNs), deep Q-network (DQN)
11. A new optimal adaptive backstepping control approach for nonlinear systems under deception attacks via reinforcement learning
Authors: Wendi Chen, Qinglai Wei. Journal of Automation and Intelligence, 2024, No. 1, pp. 34-39 (6 pages)
In this paper, a new optimal adaptive backstepping control approach for nonlinear systems under deception attacks via reinforcement learning is presented. The existence of nonlinear terms in the studied system makes it very difficult to design the optimal controller using traditional methods. To achieve optimal control, an RL algorithm based on a critic-actor architecture is considered for the nonlinear system. Due to the significant security risks of network transmission, the system is vulnerable to deception attacks, which can make all the system states unavailable. By using the attacked states to design a coordinate transformation, the harm brought by unknown deception attacks is overcome. The presented control strategy ensures that all signals in the closed-loop system are semi-globally ultimately bounded. Finally, a simulation experiment is shown to prove the effectiveness of the strategy.
Keywords: Nonlinear systems, Reinforcement learning, Optimal control, Backstepping method
12. Field implementation of enzyme-induced carbonate precipitation technology for reinforcing a bedding layer beneath an underground cable duct [Cited by 7]
Authors: Kai Xu, Ming Huang, Jiajie Zhen, Chaoshui Xu, Mingjuan Cui. Journal of Rock Mechanics and Geotechnical Engineering, SCIE/CSCD, 2023, No. 4, pp. 1011-1022 (12 pages)
A suitable bearing capacity of the foundation is critical for the safety of civil structures. Sometimes foundation reinforcement is necessary, and an effective and environmentally friendly method would be the preferred choice. In this study, the potential application of enzyme-induced carbonate precipitation (EICP) was investigated for reinforcing a 0.6 m bedding layer on top of clay to improve the bearing capacity of the foundation underneath an underground cable duct. Laboratory experiments were conducted to determine the optimal operational parameters for the extraction of crude urease liquid and the optimal grain size range of sea sands to be used to construct the bedding layer. Field tests were planned based on orthogonal experimental design to study the factors that would significantly affect the bio-cementation effect on site. The dynamic deformation modulus, calcium carbonate content, and long-term ground stress variations were used to evaluate the bio-cementation effect and the long-term performance of the EICP-treated bedding layer. The laboratory test results showed that the optimal duration for the extraction of crude urease liquid is 1 h and the optimal usage of soybean husk powder in the urease extraction solution is 100 g/L. The calcium carbonate production rate decreases significantly when the concentration of the cementation solution exceeds 0.5 mol/L. The results of the site trial showed that the number of EICP treatments has the most significant impact on the effectiveness of EICP treatment, and the highest dynamic deformation modulus (Evd) of the EICP-treated bedding layer reached 50.55 MPa. The area with a better bio-cementation effect was found to take higher ground stress, which validates that the EICP treatment can improve the bearing capacity of the foundation by reinforcing the bedding layer. The field trial described and the analysis introduced in this paper can provide a practical basis for applying EICP technology to the reinforcement of bedding layers in poor ground conditions.
Keywords: Enzyme-induced carbonate precipitation (EICP), Plant-based urease, Underground cable duct, Foundation reinforcement
13. Joint Flexible Duplexing and Power Allocation with Deep Reinforcement Learning in Cell-Free Massive MIMO System [Cited by 5]
Authors: Danhao Deng, Chaowei Wang, Zhi Zhang, Lihua Li, Weidong Wang. China Communications, SCIE/CSCD, 2023, No. 4, pp. 73-85 (13 pages)
Network-assisted full duplex (NAFD) cell-free (CF) massive MIMO has drawn increasing attention in the 6G evolution. In this paper, we build an NAFD CF system in which the users and access points (APs) can flexibly select their duplex modes to increase the link spectral efficiency. Then we formulate a joint flexible duplexing and power allocation problem to balance user fairness and system spectral efficiency. We further transform the problem into a probability optimization to accommodate short-term communications. In contrast with instant performance optimization, the probability optimization is a sequential decision-making problem, and thus we reformulate it as a Markov Decision Process (MDP). We utilize a deep reinforcement learning (DRL) algorithm to search for the solution in a large state-action space, and propose an asynchronous advantage actor-critic (A3C)-based scheme to reduce the chance of converging to a suboptimal policy. Simulation results demonstrate that the A3C-based scheme is superior to the baseline schemes in terms of complexity, accumulated log spectral efficiency, and stability.
Keywords: cell-free massive MIMO, flexible duplexing, sum fair spectral efficiency, deep reinforcement learning, asynchronous advantage actor-critic
14. Survey on AI and Machine Learning Techniques for Microgrid Energy Management Systems [Cited by 2]
Authors: Aditya Joshi, Skieler Capezza, Ahmad Alhaji, Mo-Yuen Chow. IEEE/CAA Journal of Automatica Sinica, SCIE/EI/CSCD, 2023, No. 7, pp. 1513-1529 (17 pages)
In the era of an energy revolution, grid decentralization has emerged as a viable solution to meet the increasing global energy demand by incorporating renewables at the distributed level. Microgrids are considered a driving component for accelerating grid decentralization. To optimally utilize the available resources and address potential challenges, there is a need for an intelligent and reliable energy management system (EMS) for the microgrid. The artificial intelligence field has the potential to address the problems in EMS and can provide resilient, efficient, reliable, and scalable solutions. This paper presents an overview of existing conventional and AI-based techniques for energy management systems in microgrids. We analyze EMS methods for centralized, decentralized, and distributed microgrids separately. Then, we summarize machine learning techniques such as ANNs, federated learning, LSTMs, RNNs, and reinforcement learning for EMS objectives such as economic dispatch, optimal power flow, and scheduling. With the incorporation of AI, microgrids can achieve greater performance efficiency and more reliability in managing a large number of energy resources. However, challenges such as data privacy, security, scalability, and explainability need to be addressed. To conclude, the authors state possible future research directions to explore the potential of AI-based EMSs in real-world applications.
Keywords: Consensus, energy management system (EMS), reinforcement learning, supervised learning
15. Deep Reinforcement Learning Based Power Minimization for RIS-Assisted MISO-OFDM Systems [Cited by 1]
Authors: Peng Chen, Wenting Huang, Xiao Li, Shi Jin. China Communications, SCIE/CSCD, 2023, No. 4, pp. 259-269 (11 pages)
In this paper, we investigate the downlink orthogonal frequency division multiplexing (OFDM) transmission system assisted by reconfigurable intelligent surfaces (RISs). Considering multiple antennas at the base station (BS) and multiple single-antenna users, the joint optimization of the precoder at the BS and the phase shift design at the RIS is studied to minimize the transmit power under a certain quality-of-service constraint. A deep reinforcement learning (DRL)-based algorithm is proposed, in which maximum ratio transmission (MRT) precoding is utilized at the BS and the twin delayed deep deterministic policy gradient (TD3) method is utilized for RIS phase shift optimization. Numerical results demonstrate that the proposed DRL-based algorithm can achieve a transmit power almost the same as the lower bound achieved by the manifold optimization (MO) algorithm while incurring much less computation delay.
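TD3, the optimizer named in the abstract, stabilizes deterministic-policy learning with three ingredients: clipped double-Q (minimum of twin target critics), target policy smoothing noise, and delayed actor updates. A sketch of the target computation with stub target networks; all shapes and stub values are illustrative, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def td3_target(critic1, critic2, actor_target, s_next, r, gamma=0.99,
               noise_std=0.2, noise_clip=0.5):
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = np.clip(rng.normal(0.0, noise_std), -noise_clip, noise_clip)
    a_next = np.clip(actor_target(s_next) + noise, -1.0, 1.0)
    # Clipped double-Q: bootstrap from the minimum of the twin critics,
    # which counters the overestimation a single critic would accumulate.
    q_min = min(critic1(s_next, a_next), critic2(s_next, a_next))
    return r + gamma * q_min

# Stub target networks (state ignored for brevity).
critic1 = lambda s, a: 1.0 + 0.1 * a
critic2 = lambda s, a: 1.2 + 0.1 * a
actor_target = lambda s: 0.5
y = td3_target(critic1, critic2, actor_target, s_next=0.0, r=0.3)
```

The third ingredient, updating the actor (and the target networks) only every few critic steps, lives in the training loop rather than in the target itself.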
Keywords: deep reinforcement learning, OFDM, precoding, reconfigurable intelligent surface
16. A Data-Based Feedback Relearning Algorithm for Uncertain Nonlinear Systems [Cited by 1]
Authors: Chaoxu Mu, Yong Zhang, Guangbin Cai, Ruijun Liu, Changyin Sun. IEEE/CAA Journal of Automatica Sinica, SCIE/EI/CSCD, 2023, No. 5, pp. 1288-1303 (16 pages)
In this paper, a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems. Motivated by the classical on-policy and off-policy algorithms of reinforcement learning, the online feedback relearning (FR) algorithm is developed, where the collected data include the influence of disturbance signals. The FR algorithm has better adaptability to environmental changes (such as control channel disturbances) than the off-policy algorithm, and has higher computational efficiency and better convergence performance than the on-policy algorithm. Data processing based on experience replay is used for greater data efficiency and convergence stability. Simulation experiments are presented to illustrate the convergence stability, optimality, and performance of the FR algorithm by comparison.
Keywords: Data episodes, experience replay, neural networks, reinforcement learning (RL), uncertain systems
17. Discrete Phase Shifts Control and Beam Selection in RIS-Aided MISO System via Deep Reinforcement Learning [Cited by 1]
Authors: Dongting Lin, Yuan Liu. China Communications, SCIE/CSCD, 2023, No. 8, pp. 198-208 (11 pages)
Reconfigurable intelligent surface(RIS)for wireless networks have drawn lots of attention in both academic and industry communities.RIS can dynamically control the phases of the reflection elements to send the signal ... Reconfigurable intelligent surface(RIS)for wireless networks have drawn lots of attention in both academic and industry communities.RIS can dynamically control the phases of the reflection elements to send the signal in the desired direction,thus it provides supplementary links for wireless networks.Most of prior works on RIS-aided wireless communication systems consider continuous phase shifts,but phase shifts of RIS are discrete in practical hardware.Thus we focus on the actual discrete phase shifts on RIS in this paper.Using the advanced deep reinforcement learning(DRL),we jointly optimize the transmit beamforming matrix from the discrete Fourier transform(DFT)codebook at the base station(BS)and the discrete phase shifts at the RIS to maximize the received signal-to-interference plus noise ratio(SINR).Unlike the traditional schemes usually using alternate optimization methods to solve the transmit beamforming and phase shifts,the DRL algorithm proposed in the paper can jointly design the transmit beamforming and phase shifts as the output of the DRL neural network.Numerical results indicate that the DRL proposed can dispose the complicated optimization problem with low computational complexity. 展开更多
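The hardware constraint the abstract emphasizes is that a b-bit RIS element supports only 2^b phase levels in [0, 2π). A small sketch of the quantization step that maps any continuous phase (for example, one produced by a continuous-action agent) onto that feasible set; this is a generic construction, not the paper's specific network output layer.

```python
import numpy as np

def quantize_phase(theta, bits=2):
    # Discrete phase set for b-bit RIS hardware: 2**bits evenly spaced
    # levels in [0, 2*pi). Snap each phase to the nearest level.
    levels = 2 ** bits
    step = 2 * np.pi / levels
    return (np.round(np.asarray(theta, dtype=float) / step) % levels) * step

# A continuous phase vector snapped to the 2-bit set {0, pi/2, pi, 3*pi/2}.
theta_c = np.array([0.1, 1.6, 3.0, 5.9])
theta_d = quantize_phase(theta_c, bits=2)
```

With `bits=1` this collapses to the familiar binary {0, π} reflection model; larger `bits` trades hardware cost for beamforming resolution.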
Keywords: reconfigurable intelligent surface; discrete phase shifts; transmit beamforming; deep reinforcement learning
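For intuition about the discrete-phase problem the DRL agent solves, at toy sizes it can be brute-forced: enumerate the b-bit phase levels per RIS element and keep the combination maximizing the cascaded channel gain (DRL replaces this search, whose cost grows as 2^(bN), when the RIS is large). A minimal single-user, noise-free sketch; the channel values in the usage example are invented for illustration:

```python
import cmath
import itertools
import math

def cascaded_gain(h_bs_ris, h_ris_ue, phases):
    """|sum_n h1[n] * e^{j*phi_n} * h2[n]|: gain of the BS-RIS-UE cascade."""
    return abs(sum(h1 * cmath.exp(1j * phi) * h2
                   for h1, h2, phi in zip(h_bs_ris, h_ris_ue, phases)))

def best_discrete_phases(h_bs_ris, h_ris_ue, bits=2):
    """Exhaustive search over b-bit quantized phase shifts per element."""
    levels = [2 * math.pi * k / (1 << bits) for k in range(1 << bits)]
    return max(itertools.product(levels, repeat=len(h_bs_ris)),
               key=lambda ph: cascaded_gain(h_bs_ris, h_ris_ue, ph))
```

For example, with element channels `h1 = [1, 1j, -1]` and `h2 = [1, 1, 1]`, every channel phase is a multiple of π/2, so a 2-bit RIS can align all three terms and reach the continuous optimum gain of 3.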
Reinforcement learning-based scheduling of multi-battery energy storage system (Cited: 1)
18
Authors: CHENG Guangran, DONG Lu, YUAN Xin, SUN Changyin. 《Journal of Systems Engineering and Electronics》, SCIE EI CSCD, 2023, No. 1, pp. 117-128 (12 pages)
In this paper, a reinforcement learning-based multi-battery energy storage system (MBESS) scheduling policy is proposed to minimize the consumers' electricity cost. The MBESS scheduling problem is modeled as a Markov decision process (MDP) with unknown transition probability. However, the optimal value function is time-dependent and difficult to obtain because of the periodicity of the electricity price and residential load. Therefore, a series of time-independent action-value functions is proposed to describe every period of a day. To approximate each action-value function, a corresponding critic network is established, cascaded with the other critic networks according to the time sequence. The continuous management strategy is then obtained from the related action network. Moreover, a two-stage learning protocol including offline and online learning stages is provided for detailed implementation in real-time battery management. Numerical experimental examples are given to demonstrate the effectiveness of the developed algorithm.
Keywords: multi-battery energy storage system (MBESS); reinforcement learning; periodic value iteration; data-driven
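The idea of one time-indexed value function per period of the day can be illustrated in tabular form: keep a separate Q-table per time slot and cascade the bootstrap target through the next slot, wrapping at the end of the day. This is a toy tabular analogue of the paper's cascaded critic networks, not its actual algorithm; the two-price battery model in the test is invented for illustration:

```python
import random

def periodic_q_learning(T, states, actions, step, episodes=500,
                        alpha=0.2, gamma=0.9, seed=0):
    """One tabular Q per period t; slot t bootstraps from slot (t+1) mod T."""
    rng = random.Random(seed)
    Q = [{(s, a): 0.0 for s in states for a in actions} for _ in range(T)]
    for _ in range(episodes):
        s = rng.choice(states)
        for t in range(T):
            a = rng.choice(actions)            # uniform exploration (sketch only)
            r, s2 = step(t, s, a)              # environment: reward, next state
            nxt = max(Q[(t + 1) % T][(s2, b)] for b in actions)  # wrap at T
            Q[t][(s, a)] += alpha * (r + gamma * nxt - Q[t][(s, a)])
            s = s2
    return Q
```

Because each table is indexed by the time slot rather than by absolute time, the periodicity of price and load is captured with a fixed number of value functions.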
Feature Selection with Deep Reinforcement Learning for Intrusion Detection System (Cited: 1)
19
Authors: S. Priya, K. Pradeep Mohan Kumar. 《Computer Systems Science & Engineering》, SCIE EI, 2023, No. 9, pp. 3339-3353 (15 pages)
An intrusion detection system (IDS) has become an important tool for ensuring security in the network. In recent times, machine learning (ML) and deep learning (DL) models have been applied to identify network intrusions effectively. To resolve these security issues, this paper presents a new binary butterfly optimization algorithm-based feature selection with a DRL technique, called BBOFS-DRL, for intrusion detection. The proposed BBOFS-DRL model mainly accomplishes the recognition of intrusions in the network. To attain this, the BBOFS-DRL model first designs the BBOFS algorithm, based on the traditional butterfly optimization algorithm (BOA), to elect feature subsets. Besides, a DRL model is employed for the proper identification and classification of intrusions in the network. Furthermore, the beetle antenna search (BAS) technique is applied to tune the DRL parameters for enhanced intrusion detection efficiency. To verify the superior intrusion detection outcomes of the BBOFS-DRL model, a wide-ranging experimental analysis is performed on a benchmark dataset. The simulation results report the supremacy of the BBOFS-DRL model over recent state-of-the-art approaches.
Keywords: intrusion detection; security; reinforcement learning; machine learning; feature selection; beetle antenna search
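The wrapper-style feature selection the abstract describes, searching over binary masks and scoring each candidate subset with a fitness function, can be sketched with a simple single-bit-flip local search. This is a deliberately simplified stand-in for BBOFS, not the paper's algorithm (a faithful BOA would use fragrance-guided global and local moves); the fitness function in the test is a made-up toy:

```python
import random

def binary_feature_search(n_features, fitness, iters=300, seed=0):
    """Greedy single-bit-flip local search over binary feature masks."""
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in range(n_features)]
    best_fit = fitness(best)
    for _ in range(iters):
        cand = best[:]
        cand[rng.randrange(n_features)] ^= 1   # toggle one feature in/out
        cand_fit = fitness(cand)               # e.g. classifier accuracy
        if cand_fit > best_fit:                # keep only improving masks
            best, best_fit = cand, cand_fit
    return best, best_fit
```

In an IDS pipeline, `fitness` would typically be cross-validated detection accuracy minus a penalty on the number of selected features, so the search favors small, discriminative subsets.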
A Deep Reinforcement Learning-Based Power Control Scheme for the 5G Wireless Systems (Cited: 1)
20
Authors: Renjie Liang, Haiyang Lyu, Jiancun Fan. 《China Communications》, SCIE CSCD, 2023, No. 10, pp. 109-119 (11 pages)
In the fifth generation (5G) wireless system, a closed-loop power control (CLPC) scheme based on a deep Q learning network (DQN) is introduced to intelligently adjust the transmit power of the base station (BS), which can bring the user equipment (UE) received signal-to-interference-plus-noise ratio (SINR) into a target threshold range. However, the power control (PC) action selected by the DQN does not accurately match the fluctuations of the wireless environment, since the experience replay characteristic of the conventional DQN scheme can leave the target deep neural network (DNN) insufficiently trained. As a result, the Q-value of a sub-optimal PC action may exceed that of the optimal one. To solve this problem, we propose an improved DQN scheme: an additional DNN is added to the conventional DQN, with a shorter training interval to speed up training and ensure the DNN is fully trained. The proposed scheme thus ensures that the Q-value of the optimal action remains maximal. After multiple episodes of training, the proposed scheme generates more accurate PC actions that match the fluctuations of the wireless environment, so the UE received SINR reaches the target threshold range faster and stays more stable. The simulation results prove that the proposed scheme outperforms the conventional schemes.
Keywords: reinforcement learning; closed-loop power control (CLPC); signal-to-interference-plus-noise ratio (SINR)
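For contrast, the fixed-step CLPC baseline that such DQN schemes improve on can be sketched in a few lines: step the transmit power up when the measured SINR is below the target range and down when it is above. This is a textbook baseline, not the paper's DQN; the thresholds and step size are illustrative:

```python
def clpc_step(sinr_db, target_low_db, target_high_db, step_db=1.0):
    """Fixed-step closed-loop power control: return the power adjustment in dB."""
    if sinr_db < target_low_db:
        return +step_db      # SINR too low: raise transmit power
    if sinr_db > target_high_db:
        return -step_db      # SINR too high: lower power, reduce interference
    return 0.0               # inside the target range: hold power
```

The DQN approach replaces this fixed rule with a learned mapping from observed channel state to a discrete PC action, which is what allows it to track environment fluctuations rather than always stepping by a constant.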