Secure platooning control plays an important role in enhancing the cooperative driving safety of automated vehicles subject to various security vulnerabilities.This paper focuses on the distributed secure control issu...Secure platooning control plays an important role in enhancing the cooperative driving safety of automated vehicles subject to various security vulnerabilities.This paper focuses on the distributed secure control issue of automated vehicles affected by replay attacks.A proportional-integral-observer(PIO)with predetermined forgetting parameters is first constructed to acquire the dynamical information of vehicles.Then,a time-varying parameter and two positive scalars are employed to describe the temporal behavior of replay attacks.In light of such a scheme and the common properties of Laplace matrices,the closed-loop system with PIO-based controllers is transformed into a switched and time-delayed one.Furthermore,some sufficient conditions are derived to achieve the desired platooning performance by the view of the Lyapunov stability theory.The controller gains are analytically determined by resorting to the solution of certain matrix inequalities only dependent on maximum and minimum eigenvalues of communication topologies.Finally,a simulation example is provided to illustrate the effectiveness of the proposed control strategy.展开更多
Pupil dynamics are the important characteristics of face spoofing detection.The face recognition system is one of the most used biometrics for authenticating individual identity.The main threats to the facial recognit...Pupil dynamics are the important characteristics of face spoofing detection.The face recognition system is one of the most used biometrics for authenticating individual identity.The main threats to the facial recognition system are different types of presentation attacks like print attacks,3D mask attacks,replay attacks,etc.The proposed model uses pupil characteristics for liveness detection during the authentication process.The pupillary light reflex is an involuntary reaction controlling the pupil’s diameter at different light intensities.The proposed framework consists of two-phase methodologies.In the first phase,the pupil’s diameter is calculated by applying stimulus(light)in one eye of the subject and calculating the constriction of the pupil size on both eyes in different video frames.The above measurement is converted into feature space using Kohn and Clynes model-defined parameters.The Support Vector Machine is used to classify legitimate subjects when the diameter change is normal(or when the eye is alive)or illegitimate subjects when there is no change or abnormal oscillations of pupil behavior due to the presence of printed photograph,video,or 3D mask of the subject in front of the camera.In the second phase,we perform the facial recognition process.Scale-invariant feature transform(SIFT)is used to find the features from the facial images,with each feature having a size of a 128-dimensional vector.These features are scale,rotation,and orientation invariant and are used for recognizing facial images.The brute force matching algorithm is used for matching features of two different images.The threshold value we considered is 0.08 for good matches.To analyze the performance of the framework,we tested our model in two Face antispoofing datasets named Replay attack datasets and CASIA-SURF datasets,which were used because they contain the videos of the subjects in each sample having three modalities(RGB,IR,Depth).The CASIA-SURF datasets showed an 89.9%Equal Error Rate,while the Replay Attack datasets showed a 92.1%Equal Error Rate.展开更多
Various organizations store data online rather than on physical servers.As the number of user’s data stored in cloud servers increases,the attack rate to access data from cloud servers also increases.Different resear...Various organizations store data online rather than on physical servers.As the number of user’s data stored in cloud servers increases,the attack rate to access data from cloud servers also increases.Different researchers worked on different algorithms to protect cloud data from replay attacks.None of the papers used a technique that simultaneously detects a full-message and partial-message replay attack.This study presents the development of a TKN(Text,Key and Name)cryptographic algorithm aimed at protecting data from replay attacks.The program employs distinct ways to encrypt plain text[P],a user-defined Key[K],and a Secret Code[N].The novelty of the TKN cryptographic algorithm is that the bit value of each text is linked to another value with the help of the proposed algorithm,and the length of the cipher text obtained is twice the length of the original text.In the scenario that an attacker executes a replay attack on the cloud server,engages in cryptanalysis,or manipulates any data,it will result in automated modification of all associated values inside the backend.This mechanism has the benefit of enhancing the detectability of replay attacks.Nevertheless,the attacker cannot access data not included in any of the papers,regardless of how effective the attack strategy is.At the end of paper,the proposed algorithm’s novelty will be compared with different algorithms,and it will be discussed how far the proposed algorithm is better than all other algorithms.展开更多
Plug-in Hybrid Electric Vehicles(PHEVs)represent an innovative breed of transportation,harnessing diverse power sources for enhanced performance.Energy management strategies(EMSs)that coordinate and control different ...Plug-in Hybrid Electric Vehicles(PHEVs)represent an innovative breed of transportation,harnessing diverse power sources for enhanced performance.Energy management strategies(EMSs)that coordinate and control different energy sources is a critical component of PHEV control technology,directly impacting overall vehicle performance.This study proposes an improved deep reinforcement learning(DRL)-based EMSthat optimizes realtime energy allocation and coordinates the operation of multiple power sources.Conventional DRL algorithms struggle to effectively explore all possible state-action combinations within high-dimensional state and action spaces.They often fail to strike an optimal balance between exploration and exploitation,and their assumption of a static environment limits their ability to adapt to changing conditions.Moreover,these algorithms suffer from low sample efficiency.Collectively,these factors contribute to convergence difficulties,low learning efficiency,and instability.To address these challenges,the Deep Deterministic Policy Gradient(DDPG)algorithm is enhanced using entropy regularization and a summation tree-based Prioritized Experience Replay(PER)method,aiming to improve exploration performance and learning efficiency from experience samples.Additionally,the correspondingMarkovDecision Process(MDP)is established.Finally,an EMSbased on the improvedDRLmodel is presented.Comparative simulation experiments are conducted against rule-based,optimization-based,andDRL-based EMSs.The proposed strategy exhibitsminimal deviation fromthe optimal solution obtained by the dynamic programming(DP)strategy that requires global information.In the typical driving scenarios based onWorld Light Vehicle Test Cycle(WLTC)and New European Driving Cycle(NEDC),the proposed method achieved a fuel consumption of 2698.65 g and an Equivalent Fuel Consumption(EFC)of 2696.77 g.Compared to the DP strategy baseline,the proposed method improved the fuel efficiency variances(FEV)by 18.13%,15.1%,and 8.37%over the Deep QNetwork(DQN),Double DRL(DDRL),and original DDPG methods,respectively.The observational outcomes demonstrate that the proposed EMS based on improved DRL framework possesses good real-time performance,stability,and reliability,effectively optimizing vehicle economy and fuel consumption.展开更多
Network attack detection and mitigation require packet collection,pre-processing,feature analysis,classification,and post-processing.Models for these tasks sometimes become complex or inefficient when applied to real-...Network attack detection and mitigation require packet collection,pre-processing,feature analysis,classification,and post-processing.Models for these tasks sometimes become complex or inefficient when applied to real-time data samples.To mitigate hybrid assaults,this study designs an efficient forensic layer employing deep learning pattern analysis and multidomain feature extraction.In this paper,we provide a novel multidomain feature extraction method using Fourier,Z,Laplace,Discrete Cosine Transform(DCT),1D Haar Wavelet,Gabor,and Convolutional Operations.Evolutionary method dragon fly optimisation reduces feature dimensionality and improves feature selection accuracy.The selected features are fed into VGGNet and GoogLeNet models using binary cascaded neural networks to analyse network traffic patterns,detect anomalies,and warn network administrators.The suggested model tackles the inadequacies of existing approaches to hybrid threats,which are growing more common and challenge conventional security measures.Our model integrates multidomain feature extraction,deep learning pattern analysis,and the forensic layer to improve intrusion detection and prevention systems.In diverse attack scenarios,our technique has 3.5% higher accuracy,4.3% higher precision,8.5% higher recall,and 2.9% lower delay than previous models.展开更多
In this paper,a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems.Motivated by the classical on-policy and off-policy algorithms of reinforcement learni...In this paper,a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems.Motivated by the classical on-policy and off-policy algorithms of reinforcement learning,the online feedback relearning(FR)algorithm is developed where the collected data includes the influence of disturbance signals.The FR algorithm has better adaptability to environmental changes(such as the control channel disturbances)compared with the off-policy algorithm,and has higher computational efficiency and better convergence performance compared with the on-policy algorithm.Data processing based on experience replay technology is used for great data efficiency and convergence stability.Simulation experiments are presented to illustrate convergence stability,optimality and algorithmic performance of FR algorithm by comparison.展开更多
Continual learning(CL)studies the problem of learning to accumulate knowledge over time from a stream of data.A crucial challenge is that neural networks suffer from performance degradation on previously seen data,kno...Continual learning(CL)studies the problem of learning to accumulate knowledge over time from a stream of data.A crucial challenge is that neural networks suffer from performance degradation on previously seen data,known as catastrophic forgetting,due to allowing parameter sharing.In this work,we consider a more practical online class-incremental CL setting,where the model learns new samples in an online manner and may continuously experience new classes.Moreover,prior knowledge is unavailable during training and evaluation.Existing works usually explore sample usages from a single dimension,which ignores a lot of valuable supervisory information.To better tackle the setting,we propose a novel replay-based CL method,which leverages multi-level representations produced by the intermediate process of training samples for replay and strengthens supervision to consolidate previous knowledge.Specifically,besides the previous raw samples,we store the corresponding logits and features in the memory.Furthermore,to imitate the prediction of the past model,we construct extra constraints by leveraging multi-level information stored in the memory.With the same number of samples for replay,our method can use more past knowledge to prevent interference.We conduct extensive evaluations on several popular CL datasets,and experiments show that our method consistently outperforms state-of-the-art methods with various sizes of episodic memory.We further provide a detailed analysis of these results and demonstrate that our method is more viable in practical scenarios.展开更多
Reinforcement Learning(RL)based control algorithms can learn the control strategies for nonlinear and uncertain environment during interacting with it.Guided by the rewards generated by environment,a RL agent can lear...Reinforcement Learning(RL)based control algorithms can learn the control strategies for nonlinear and uncertain environment during interacting with it.Guided by the rewards generated by environment,a RL agent can learn the control strategy directly in a model-free way instead of investigating the dynamic model of the environment.In the paper,we propose the sampled-data RL control strategy to reduce the computational demand.In the sampled-data control strategy,the whole control system is of a hybrid structure,in which the plant is of continuous structure while the controller(RL agent)adopts a discrete structure.Given that the continuous states of the plant will be the input of the agent,the state–action value function is approximated by the fully connected feed-forward neural networks(FCFFNN).Instead of learning the controller at every step during the interaction with the environment,the learning and acting stages are decoupled to learn the control strategy more effectively through experience replay.In the acting stage,the most effective experience obtained during the interaction with the environment will be stored and during the learning stage,the stored experience will be replayed to customized times,which helps enhance the experience replay process.The effectiveness of proposed approach will be verified by simulation examples.展开更多
Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay ...Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network has been used to resolve the "curse of dimensionality" issue of the Q-table in reinforcement learning. When a robot is walking in an unknown environment, it collects experience data which is used for training a neural network;such a process is called experience replay.Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that in comparison with the existing methods, our method can converge to an optimal action strategy with less time and can explore a path in an unknown environment with fewer steps and larger average reward.展开更多
In this paper,a resilient distributed control scheme against replay attacks for multi-agent networked systems subject to input and state constraints is proposed.The methodological starting point relies on a smart use ...In this paper,a resilient distributed control scheme against replay attacks for multi-agent networked systems subject to input and state constraints is proposed.The methodological starting point relies on a smart use of predictive arguments with a twofold aim:1)Promptly detect malicious agent behaviors affecting normal system operations;2)Apply specific control actions,based on predictive ideas,for mitigating as much as possible undesirable domino effects resulting from adversary operations.Specifically,the multi-agent system is topologically described by a leader-follower digraph characterized by a unique leader and set-theoretic receding horizon control ideas are exploited to develop a distributed algorithm capable to instantaneously recognize the attacked agent.Finally,numerical simulations are carried out to show benefits and effectiveness of the proposed approach.展开更多
This paper presents learning-enabled barriercertified safe controllers for systems that operate in a shared environment for which multiple systems with uncertain dynamics and behaviors interact.That is,safety constrai...This paper presents learning-enabled barriercertified safe controllers for systems that operate in a shared environment for which multiple systems with uncertain dynamics and behaviors interact.That is,safety constraints are imposed by not only the ego system’s own physical limitations but also other systems operating nearby.Since the model of the external agent is required to impose control barrier functions(CBFs)as safety constraints,a safety-aware loss function is defined and minimized to learn the uncertain and unknown behavior of external agents.More specifically,the loss function is defined based on barrier function error,instead of the system model error,and is minimized for both current samples as well as past samples stored in the memory to assure a fast and generalizable learning algorithm for approximating the safe set.The proposed model learning and CBF are then integrated together to form a learning-enabled zeroing CBF(L-ZCBF),which employs the approximated trajectory information of the external agents provided by the learned model but shrinks the safety boundary in case of an imminent safety violation using instantaneous sensory observations.It is shown that the proposed L-ZCBF assures the safety guarantees during learning and even in the face of inaccurate or simplified approximation of external agents,which is crucial in safety-critical applications in highly interactive environments.The efficacy of the proposed method is examined in a simulation of safe maneuver control of a vehicle in an urban area.展开更多
Mobile Ad hoc NETworks (MANETs), characterized by the free move of mobile nodes are more vulnerable to the trivial Denial-of-Service (DoS) attacks such as replay attacks. A replay attacker performs this attack at anyt...Mobile Ad hoc NETworks (MANETs), characterized by the free move of mobile nodes are more vulnerable to the trivial Denial-of-Service (DoS) attacks such as replay attacks. A replay attacker performs this attack at anytime and anywhere in the network by interception and retransmission of the valid signed messages. Consequently, the MANET performance is severally degraded by the overhead produced by the redundant valid messages. In this paper, we propose an enhancement of timestamp discrepancy used to validate a signed message and consequently limiting the impact of a replay attack. Our proposed timestamp concept estimates approximately the time where the message is received and validated by the received node. This estimation is based on the existing parameters defined at the 802.11 MAC layer.展开更多
This paper suggests the use of zonotopes for the design of watermark signals.The proposed approach exploits the recent analogy found between stochastic and zonotopic-based estimators to propose a deterministic counter...This paper suggests the use of zonotopes for the design of watermark signals.The proposed approach exploits the recent analogy found between stochastic and zonotopic-based estimators to propose a deterministic counterpart to current approaches that study the replay attack in the context of stationary Gaussian processes.In this regard,the zonotopic analogous case where the control loop is closed based on the estimates of a zonotopic Kalman filter(ZKF)is analyzed.This formulation allows to propose a new performance metric that is related to the Frobenius norm of the prediction zonotope.Hence,the steadystate operation of the system can be related with the size of the minimal Robust Positive Invariant set of the estimation error.Furthermore,analogous expressions concerning the impact that a zonotopic/Gaussian watermark signal has on the system operation are derived.Finally,a novel zonotopically bounded watermark signal that ensures the attack detection by causing the residual vector to exit the healthy residual set during the replay phase of the attack is introduced.The proposed approach is illustrated in simulation using a quadruple-tank process.展开更多
The GB/T 27930-2015 protocol is the communication protocol between the non-vehicle-mounted charger and the battery management system (BMS) stipulated by the state. However, as the protocol adopts the way of broadcast ...The GB/T 27930-2015 protocol is the communication protocol between the non-vehicle-mounted charger and the battery management system (BMS) stipulated by the state. However, as the protocol adopts the way of broadcast communication and plaintext to transmit data, the data frame does not contain the source address and the destination address, making the Electric Vehicle (EV) vulnerable to replay attack in the charging process. In order to verify the security problems of the protocol, this paper uses 27,655 message data in the complete charging process provided by Shanghai Thaisen electric company, and analyzes these actual data frames one by one with the program written by C++. In order to enhance the security of the protocol, Rivest-Shamir-Adleman (RSA) digital signature and adding random numbers are proposed to resist replay attack. Under the experimental environment of Eclipse, the normal charging of electric vehicles, RSA digital signature and random number defense are simulated. Experimental results show that RSA digital signature cannot resist replay attack, and adding random numbers can effectively enhance the ability of EV to resist replay attack during charging.展开更多
Substation automation system uses IEC 61850 protocol for the data transmission between different equipment manufacturers. However, the IEC 61850 protocol lacks an authentication security mechanism, which will make the...Substation automation system uses IEC 61850 protocol for the data transmission between different equipment manufacturers. However, the IEC 61850 protocol lacks an authentication security mechanism, which will make the communication face four threats: eavesdropping, interception, forgery, and alteration. In order to verify the IEC 61850 protocol communication problems, we used the simulation software to build the main operating equipment in the IEC 61850 network environment of the communication system. We verified IEC 61850 transmission protocol security defects, under DoS attack and Reply attack. In order to enhance security agreement, an improved algorithm was proposed based on identity authentication (W-EAP, Whitelist Based ECC & AES Protocol). Experimental results showed that the method can enhance the ability to resist attacks.展开更多
The need of a fast and global communication service has increased the competition among mobile phone companies. Therefore, those companies started adapting new methods to satisfy the needs of their customers. The Info...The need of a fast and global communication service has increased the competition among mobile phone companies. Therefore, those companies started adapting new methods to satisfy the needs of their customers. The Information Technology system proposed in this work is believed to provide an effective trading and managing tool for mobile’s community in Jordan.展开更多
基金supported in part by the National Natural Science Foundation of China (61973219,U21A2019,61873058)the Hainan Province Science and Technology Special Fund (ZDYF2022SHFZ105)。
文摘Secure platooning control plays an important role in enhancing the cooperative driving safety of automated vehicles subject to various security vulnerabilities.This paper focuses on the distributed secure control issue of automated vehicles affected by replay attacks.A proportional-integral-observer(PIO)with predetermined forgetting parameters is first constructed to acquire the dynamical information of vehicles.Then,a time-varying parameter and two positive scalars are employed to describe the temporal behavior of replay attacks.In light of such a scheme and the common properties of Laplace matrices,the closed-loop system with PIO-based controllers is transformed into a switched and time-delayed one.Furthermore,some sufficient conditions are derived to achieve the desired platooning performance by the view of the Lyapunov stability theory.The controller gains are analytically determined by resorting to the solution of certain matrix inequalities only dependent on maximum and minimum eigenvalues of communication topologies.Finally,a simulation example is provided to illustrate the effectiveness of the proposed control strategy.
基金funded by Researchers Supporting Program at King Saud University (RSPD2023R809).
文摘Pupil dynamics are the important characteristics of face spoofing detection.The face recognition system is one of the most used biometrics for authenticating individual identity.The main threats to the facial recognition system are different types of presentation attacks like print attacks,3D mask attacks,replay attacks,etc.The proposed model uses pupil characteristics for liveness detection during the authentication process.The pupillary light reflex is an involuntary reaction controlling the pupil’s diameter at different light intensities.The proposed framework consists of two-phase methodologies.In the first phase,the pupil’s diameter is calculated by applying stimulus(light)in one eye of the subject and calculating the constriction of the pupil size on both eyes in different video frames.The above measurement is converted into feature space using Kohn and Clynes model-defined parameters.The Support Vector Machine is used to classify legitimate subjects when the diameter change is normal(or when the eye is alive)or illegitimate subjects when there is no change or abnormal oscillations of pupil behavior due to the presence of printed photograph,video,or 3D mask of the subject in front of the camera.In the second phase,we perform the facial recognition process.Scale-invariant feature transform(SIFT)is used to find the features from the facial images,with each feature having a size of a 128-dimensional vector.These features are scale,rotation,and orientation invariant and are used for recognizing facial images.The brute force matching algorithm is used for matching features of two different images.The threshold value we considered is 0.08 for good matches.To analyze the performance of the framework,we tested our model in two Face antispoofing datasets named Replay attack datasets and CASIA-SURF datasets,which were used because they contain the videos of the subjects in each sample having three modalities(RGB,IR,Depth).The CASIA-SURF datasets showed an 89.9%Equal Error Rate,while the Replay Attack datasets showed a 92.1%Equal Error Rate.
基金Deanship of Scientific Research at Majmaah University for supporting this work under Project Number R-2023-811.
文摘Various organizations store data online rather than on physical servers.As the number of user’s data stored in cloud servers increases,the attack rate to access data from cloud servers also increases.Different researchers worked on different algorithms to protect cloud data from replay attacks.None of the papers used a technique that simultaneously detects a full-message and partial-message replay attack.This study presents the development of a TKN(Text,Key and Name)cryptographic algorithm aimed at protecting data from replay attacks.The program employs distinct ways to encrypt plain text[P],a user-defined Key[K],and a Secret Code[N].The novelty of the TKN cryptographic algorithm is that the bit value of each text is linked to another value with the help of the proposed algorithm,and the length of the cipher text obtained is twice the length of the original text.In the scenario that an attacker executes a replay attack on the cloud server,engages in cryptanalysis,or manipulates any data,it will result in automated modification of all associated values inside the backend.This mechanism has the benefit of enhancing the detectability of replay attacks.Nevertheless,the attacker cannot access data not included in any of the papers,regardless of how effective the attack strategy is.At the end of paper,the proposed algorithm’s novelty will be compared with different algorithms,and it will be discussed how far the proposed algorithm is better than all other algorithms.
文摘Plug-in Hybrid Electric Vehicles(PHEVs)represent an innovative breed of transportation,harnessing diverse power sources for enhanced performance.Energy management strategies(EMSs)that coordinate and control different energy sources is a critical component of PHEV control technology,directly impacting overall vehicle performance.This study proposes an improved deep reinforcement learning(DRL)-based EMSthat optimizes realtime energy allocation and coordinates the operation of multiple power sources.Conventional DRL algorithms struggle to effectively explore all possible state-action combinations within high-dimensional state and action spaces.They often fail to strike an optimal balance between exploration and exploitation,and their assumption of a static environment limits their ability to adapt to changing conditions.Moreover,these algorithms suffer from low sample efficiency.Collectively,these factors contribute to convergence difficulties,low learning efficiency,and instability.To address these challenges,the Deep Deterministic Policy Gradient(DDPG)algorithm is enhanced using entropy regularization and a summation tree-based Prioritized Experience Replay(PER)method,aiming to improve exploration performance and learning efficiency from experience samples.Additionally,the correspondingMarkovDecision Process(MDP)is established.Finally,an EMSbased on the improvedDRLmodel is presented.Comparative simulation experiments are conducted against rule-based,optimization-based,andDRL-based EMSs.The proposed strategy exhibitsminimal deviation fromthe optimal solution obtained by the dynamic programming(DP)strategy that requires global information.In the typical driving scenarios based onWorld Light Vehicle Test Cycle(WLTC)and New European Driving Cycle(NEDC),the proposed method achieved a fuel consumption of 2698.65 g and an Equivalent Fuel Consumption(EFC)of 2696.77 g.Compared to the DP strategy baseline,the proposed method improved the fuel efficiency variances(FEV)by 18.13%,15.1%,and 8.37%over the Deep QNetwork(DQN),Double DRL(DDRL),and original DDPG methods,respectively.The observational outcomes demonstrate that the proposed EMS based on improved DRL framework possesses good real-time performance,stability,and reliability,effectively optimizing vehicle economy and fuel consumption.
文摘Network attack detection and mitigation require packet collection,pre-processing,feature analysis,classification,and post-processing.Models for these tasks sometimes become complex or inefficient when applied to real-time data samples.To mitigate hybrid assaults,this study designs an efficient forensic layer employing deep learning pattern analysis and multidomain feature extraction.In this paper,we provide a novel multidomain feature extraction method using Fourier,Z,Laplace,Discrete Cosine Transform(DCT),1D Haar Wavelet,Gabor,and Convolutional Operations.Evolutionary method dragon fly optimisation reduces feature dimensionality and improves feature selection accuracy.The selected features are fed into VGGNet and GoogLeNet models using binary cascaded neural networks to analyse network traffic patterns,detect anomalies,and warn network administrators.The suggested model tackles the inadequacies of existing approaches to hybrid threats,which are growing more common and challenge conventional security measures.Our model integrates multidomain feature extraction,deep learning pattern analysis,and the forensic layer to improve intrusion detection and prevention systems.In diverse attack scenarios,our technique has 3.5% higher accuracy,4.3% higher precision,8.5% higher recall,and 2.9% lower delay than previous models.
基金supported in part by the National Key Research and Development Program of China(2021YFB1714700)the National Natural Science Foundation of China(62022061,6192100028)。
文摘In this paper,a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems.Motivated by the classical on-policy and off-policy algorithms of reinforcement learning,the online feedback relearning(FR)algorithm is developed where the collected data includes the influence of disturbance signals.The FR algorithm has better adaptability to environmental changes(such as the control channel disturbances)compared with the off-policy algorithm,and has higher computational efficiency and better convergence performance compared with the on-policy algorithm.Data processing based on experience replay technology is used for great data efficiency and convergence stability.Simulation experiments are presented to illustrate convergence stability,optimality and algorithmic performance of FR algorithm by comparison.
基金supported in part by the National Natura Science Foundation of China(U2013602,61876181,51521003)the Nationa Key R&D Program of China(2020YFB13134)+2 种基金Shenzhen Science and Technology Research and Development Foundation(JCYJ20190813171009236)Beijing Nova Program of Science and Technology(Z191100001119043)the Youth Innovation Promotion Association,Chinese Academy of Sciences。
文摘Continual learning(CL)studies the problem of learning to accumulate knowledge over time from a stream of data.A crucial challenge is that neural networks suffer from performance degradation on previously seen data,known as catastrophic forgetting,due to allowing parameter sharing.In this work,we consider a more practical online class-incremental CL setting,where the model learns new samples in an online manner and may continuously experience new classes.Moreover,prior knowledge is unavailable during training and evaluation.Existing works usually explore sample usages from a single dimension,which ignores a lot of valuable supervisory information.To better tackle the setting,we propose a novel replay-based CL method,which leverages multi-level representations produced by the intermediate process of training samples for replay and strengthens supervision to consolidate previous knowledge.Specifically,besides the previous raw samples,we store the corresponding logits and features in the memory.Furthermore,to imitate the prediction of the past model,we construct extra constraints by leveraging multi-level information stored in the memory.With the same number of samples for replay,our method can use more past knowledge to prevent interference.We conduct extensive evaluations on several popular CL datasets,and experiments show that our method consistently outperforms state-of-the-art methods with various sizes of episodic memory.We further provide a detailed analysis of these results and demonstrate that our method is more viable in practical scenarios.
基金supported by Imperial College London,UK,King’s College London,UK and Engineering and Physical Sciences Research Council(EPSRC),UK.
文摘Reinforcement Learning(RL)based control algorithms can learn the control strategies for nonlinear and uncertain environment during interacting with it.Guided by the rewards generated by environment,a RL agent can learn the control strategy directly in a model-free way instead of investigating the dynamic model of the environment.In the paper,we propose the sampled-data RL control strategy to reduce the computational demand.In the sampled-data control strategy,the whole control system is of a hybrid structure,in which the plant is of continuous structure while the controller(RL agent)adopts a discrete structure.Given that the continuous states of the plant will be the input of the agent,the state–action value function is approximated by the fully connected feed-forward neural networks(FCFFNN).Instead of learning the controller at every step during the interaction with the environment,the learning and acting stages are decoupled to learn the control strategy more effectively through experience replay.In the acting stage,the most effective experience obtained during the interaction with the environment will be stored and during the learning stage,the stored experience will be replayed to customized times,which helps enhance the experience replay process.The effectiveness of proposed approach will be verified by simulation examples.
基金supported by the National Natural Science Foundation of China(61751210,61572441)。
文摘Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network has been used to resolve the "curse of dimensionality" issue of the Q-table in reinforcement learning. When a robot is walking in an unknown environment, it collects experience data which is used for training a neural network;such a process is called experience replay.Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that in comparison with the existing methods, our method can converge to an optimal action strategy with less time and can explore a path in an unknown environment with fewer steps and larger average reward.
文摘In this paper,a resilient distributed control scheme against replay attacks for multi-agent networked systems subject to input and state constraints is proposed.The methodological starting point relies on a smart use of predictive arguments with a twofold aim:1)Promptly detect malicious agent behaviors affecting normal system operations;2)Apply specific control actions,based on predictive ideas,for mitigating as much as possible undesirable domino effects resulting from adversary operations.Specifically,the multi-agent system is topologically described by a leader-follower digraph characterized by a unique leader and set-theoretic receding horizon control ideas are exploited to develop a distributed algorithm capable to instantaneously recognize the attacked agent.Finally,numerical simulations are carried out to show benefits and effectiveness of the proposed approach.
文摘This paper presents learning-enabled barriercertified safe controllers for systems that operate in a shared environment for which multiple systems with uncertain dynamics and behaviors interact.That is,safety constraints are imposed by not only the ego system’s own physical limitations but also other systems operating nearby.Since the model of the external agent is required to impose control barrier functions(CBFs)as safety constraints,a safety-aware loss function is defined and minimized to learn the uncertain and unknown behavior of external agents.More specifically,the loss function is defined based on barrier function error,instead of the system model error,and is minimized for both current samples as well as past samples stored in the memory to assure a fast and generalizable learning algorithm for approximating the safe set.The proposed model learning and CBF are then integrated together to form a learning-enabled zeroing CBF(L-ZCBF),which employs the approximated trajectory information of the external agents provided by the learned model but shrinks the safety boundary in case of an imminent safety violation using instantaneous sensory observations.It is shown that the proposed L-ZCBF assures the safety guarantees during learning and even in the face of inaccurate or simplified approximation of external agents,which is crucial in safety-critical applications in highly interactive environments.The efficacy of the proposed method is examined in a simulation of safe maneuver control of a vehicle in an urban area.
文摘Mobile Ad hoc NETworks (MANETs), characterized by the free move of mobile nodes are more vulnerable to the trivial Denial-of-Service (DoS) attacks such as replay attacks. A replay attacker performs this attack at anytime and anywhere in the network by interception and retransmission of the valid signed messages. Consequently, the MANET performance is severally degraded by the overhead produced by the redundant valid messages. In this paper, we propose an enhancement of timestamp discrepancy used to validate a signed message and consequently limiting the impact of a replay attack. Our proposed timestamp concept estimates approximately the time where the message is received and validated by the received node. This estimation is based on the existing parameters defined at the 802.11 MAC layer.
基金in part supported by the Margarita Salas grant from the Spanish Ministry of Universities funded by the European Union NexGenerationEUin part co-funded by the Spanish State Research Agency(AEI)and the European Regional Development Fund(ERFD)through the project SaCoAV(ref.MINECO PID2020-114244RBI00)。
文摘This paper suggests the use of zonotopes for the design of watermark signals.The proposed approach exploits the recent analogy found between stochastic and zonotopic-based estimators to propose a deterministic counterpart to current approaches that study the replay attack in the context of stationary Gaussian processes.In this regard,the zonotopic analogous case where the control loop is closed based on the estimates of a zonotopic Kalman filter(ZKF)is analyzed.This formulation allows to propose a new performance metric that is related to the Frobenius norm of the prediction zonotope.Hence,the steadystate operation of the system can be related with the size of the minimal Robust Positive Invariant set of the estimation error.Furthermore,analogous expressions concerning the impact that a zonotopic/Gaussian watermark signal has on the system operation are derived.Finally,a novel zonotopically bounded watermark signal that ensures the attack detection by causing the residual vector to exit the healthy residual set during the replay phase of the attack is introduced.The proposed approach is illustrated in simulation using a quadruple-tank process.
文摘The GB/T 27930-2015 protocol is the communication protocol between the non-vehicle-mounted charger and the battery management system (BMS) stipulated by the state. However, as the protocol adopts the way of broadcast communication and plaintext to transmit data, the data frame does not contain the source address and the destination address, making the Electric Vehicle (EV) vulnerable to replay attack in the charging process. In order to verify the security problems of the protocol, this paper uses 27,655 message data in the complete charging process provided by Shanghai Thaisen electric company, and analyzes these actual data frames one by one with the program written by C++. In order to enhance the security of the protocol, Rivest-Shamir-Adleman (RSA) digital signature and adding random numbers are proposed to resist replay attack. Under the experimental environment of Eclipse, the normal charging of electric vehicles, RSA digital signature and random number defense are simulated. Experimental results show that RSA digital signature cannot resist replay attack, and adding random numbers can effectively enhance the ability of EV to resist replay attack during charging.
文摘Substation automation system uses IEC 61850 protocol for the data transmission between different equipment manufacturers. However, the IEC 61850 protocol lacks an authentication security mechanism, which will make the communication face four threats: eavesdropping, interception, forgery, and alteration. In order to verify the IEC 61850 protocol communication problems, we used the simulation software to build the main operating equipment in the IEC 61850 network environment of the communication system. We verified IEC 61850 transmission protocol security defects, under DoS attack and Reply attack. In order to enhance security agreement, an improved algorithm was proposed based on identity authentication (W-EAP, Whitelist Based ECC & AES Protocol). Experimental results showed that the method can enhance the ability to resist attacks.
文摘The need of a fast and global communication service has increased the competition among mobile phone companies. Therefore, those companies started adapting new methods to satisfy the needs of their customers. The Information Technology system proposed in this work is believed to provide an effective trading and managing tool for mobile’s community in Jordan.