Funding: the National Natural Science Foundation of China (62076225, 62073300) and the Natural Science Foundation for Distinguished Young Scholars of Hubei (2019CFA081).
Abstract: Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention. Various constrained multi-objective optimization evolutionary algorithms (CMOEAs) have been developed using different algorithmic strategies, evolutionary operators, and constraint-handling techniques. The performance of CMOEAs may depend heavily on the operators used; however, it is usually difficult to select suitable operators for the problem at hand. Hence, improving operator selection is promising and necessary for CMOEAs. This work proposes an online operator selection framework assisted by deep reinforcement learning. The dynamics of the population, including convergence, diversity, and feasibility, are regarded as the state; the candidate operators are considered as actions; and the improvement of the population state is treated as the reward. By using a Q-network to learn a policy that estimates the Q-values of all actions, the proposed approach can adaptively select the operator that maximizes the improvement of the population according to the current state, and thereby improve algorithmic performance. The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems. The experimental results reveal that the proposed deep reinforcement learning-assisted operator selection significantly improves the performance of these CMOEAs, and the resulting algorithm obtains better versatility than nine state-of-the-art CMOEAs.
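The state/action/reward mapping described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a small linear function stands in for the Q-network, the operator names are hypothetical (the abstract does not list them), and the three state features are assumed to be already normalised scores for convergence, diversity, and feasibility.

```python
import random

# Hypothetical operator names; the abstract does not list the candidate operators.
OPERATORS = ["sbx_crossover", "de_rand_1", "polynomial_mutation"]


class LinearQ:
    """Tiny linear stand-in for a Q-network: Q(s, a) = w_a . s + b_a."""

    def __init__(self, n_features, n_actions, lr=0.1, gamma=0.9):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.b = [0.0] * n_actions
        self.lr = lr
        self.gamma = gamma

    def q_values(self, state):
        # One Q-value per candidate operator for the given population state.
        return [sum(wi * si for wi, si in zip(w, state)) + b
                for w, b in zip(self.w, self.b)]

    def select(self, state, epsilon, rng):
        # Epsilon-greedy: explore with probability epsilon, otherwise pick
        # the operator with the highest estimated Q-value.
        if rng.random() < epsilon:
            return rng.randrange(len(self.b))
        q = self.q_values(state)
        return max(range(len(q)), key=q.__getitem__)

    def update(self, state, action, reward, next_state):
        # One-step temporal-difference update; the reward is assumed to be
        # the measured improvement of the population state.
        target = reward + self.gamma * max(self.q_values(next_state))
        td_error = target - self.q_values(state)[action]
        for i, si in enumerate(state):
            self.w[action][i] += self.lr * td_error * si
        self.b[action] += self.lr * td_error
        return td_error
```

In use, a CMOEA generation loop would call `select` with the current population features, apply `OPERATORS[action]`, measure the improvement, and pass it to `update`. A real Q-network would replace the linear model and would typically add experience replay.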
Abstract: Hyperspectral (HS) image classification plays a crucial role in numerous areas, including remote sensing (RS), agriculture, and environmental monitoring. Optimal band selection in HS images is crucial for improving the efficiency and accuracy of image classification. This process involves selecting the most informative spectral bands, which reduces data volume. Focusing on these key bands also enhances the accuracy of classification algorithms, because redundant or irrelevant bands, which can introduce noise and lower model performance, are excluded. In this paper, we propose an approach for HS image classification using deep Q learning (DQL) and a novel multi-objective binary grey wolf optimizer (MOBGWO). We investigate the MOBGWO for optimal band selection to further enhance the accuracy of HS image classification. In the suggested MOBGWO, a new sigmoid function is introduced as a transfer function to modify the wolves' position. The primary objective of this classification is to reduce the number of bands while maximizing classification accuracy. To evaluate the effectiveness of our approach, we conducted experiments on publicly available HS image datasets, including the Pavia University, Washington Mall, and Indian Pines datasets. We compared the performance of our proposed method with several state-of-the-art deep learning (DL) and machine learning (ML) algorithms, including long short-term memory (LSTM), deep neural network (DNN), recurrent neural network (RNN), support vector machine (SVM), and random forest (RF). Our experimental results demonstrate that the hybrid MOBGWO-DQL significantly improves classification accuracy compared to traditional optimization and DL techniques. MOBGWO-DQL shows greater accuracy in classifying most categories in both datasets used. For the Indian Pines dataset, the MOBGWO-DQL architecture achieved a kappa coefficient (KC) of 97.68% and an overall accuracy (OA) of 94.32%. This was accompanied by the lowest root mean square error (RMSE) of 0.94, indicating very precise predictions with minimal error. In the case of the Pavia University dataset, the MOBGWO-DQL model demonstrated outstanding performance with the highest KC of 98.72% and an impressive OA of 96.01%. It also recorded the lowest RMSE at 0.63, reinforcing its accuracy in predictions. The results clearly demonstrate that the proposed MOBGWO-DQL architecture not only reaches a highly accurate model more quickly but also maintains superior performance throughout the training process.
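How a transfer function turns a continuous wolf position into a binary band mask can be sketched as follows. The abstract does not give the form of the new sigmoid, so the standard logistic function is used here as a stand-in. The function names, and the choice to express the two objectives as quantities to minimise, are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    """Numerically stable standard logistic function (a stand-in, not the paper's variant)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng):
    """Map a continuous position vector to a 0/1 band mask.

    Each band is kept (1) with probability sigmoid(x), where x is the
    corresponding component of the continuous position.
    """
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def objectives(mask, accuracy_fn):
    """Return the two objectives to minimise: fraction of bands kept and classification error.

    accuracy_fn is a hypothetical callable that trains or evaluates a
    classifier on the selected bands and returns an accuracy in [0, 1].
    """
    kept_fraction = sum(mask) / len(mask)
    return kept_fraction, 1.0 - accuracy_fn(mask)
```

A multi-objective optimizer would then compare candidate masks by Pareto dominance on the pair returned by `objectives`.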
Abstract: An intrusion detection system (IDS) has become an important tool for ensuring network security. In recent times, machine learning (ML) and deep learning (DL) models have been applied effectively to identify intrusions over the network. To resolve these security issues, this paper presents a new binary butterfly optimization algorithm based on feature selection with a DRL technique, called BBOFS-DRL, for intrusion detection. The proposed BBOFS-DRL model mainly accomplishes the recognition of intrusions in the network. To attain this, the BBOFS-DRL model first designs the BBOFS algorithm, based on the traditional butterfly optimization algorithm (BOA), to select feature subsets. In addition, a DRL model is employed for the proper identification and classification of intrusions in the network. Furthermore, the beetle antenna search (BAS) technique is applied to tune the DRL parameters for enhanced intrusion detection efficiency. To verify the intrusion detection outcomes of the BBOFS-DRL model, a wide-ranging experimental analysis is performed on a benchmark dataset. The simulation results show the superiority of the BBOFS-DRL model over recent state-of-the-art approaches.
Funding: supported by the Guangxi Key Laboratory Fund of Embedded Technology and Intelligent System under Grant No. 2018B-1, the Natural Science Foundation for Distinguished Young Scholars of Jiangsu Province under Grant No. BK20160034, the National Natural Science Foundation of China under Grants No. 61771488, No. 61671473 and No. 61631020, and in part by the Open Research Foundation of Science and Technology on Communication Networks Laboratory.
Abstract: High-frequency (HF) communication is one of the essential communication methods for military and emergency applications. However, selecting the communication frequency channel is always difficult because of the crowded spectrum, the time-varying channels, and malicious intelligent jamming. Existing frequency hopping, automatic link establishment, and some new anti-jamming technologies cannot completely solve these problems. In this article, we adopt deep reinforcement learning to address this challenge. First, the combination of the spectrum state and the channel gain state is defined as the complex environmental state, and the Markov property of the defined state is analyzed and proved. Then, considering that the spectrum state and the channel gain state are heterogeneous information, a new deep Q network (DQN) framework is designed, which contains multiple sub-networks to process the different kinds of information. Finally, to improve learning speed and efficiency, the optimization targets of the corresponding sub-networks are designed, and a heterogeneous information fusion deep reinforcement learning (HIF-DRL) algorithm is developed for frequency selection. Simulation results show that the proposed algorithm performs well in channel prediction, jamming avoidance, and frequency channel selection.
Funding: supported in part by the National Natural Science Foundation of China under Grants 62131005 and U19B2014, and in part by the National Key Research and Development Program of China under Grant 254.
Abstract: Unmanned aerial vehicle (UAV)-assisted communications have been considered as a solution for aerial networking in future wireless networks because of their low cost, high mobility, and swift deployment. This paper considers a UAV-assisted downlink transmission, where UAVs are deployed as aerial base stations to serve ground users. To maximize the average transmission rate among the ground users, this paper formulates a joint optimization problem of UAV trajectory design and channel selection, which is NP-hard and non-convex. To solve the problem, we propose a multi-agent deep Q-network (MADQN) scheme. Specifically, the UAVs act as agents that take actions based on their own observations in a distributed manner and share the same reward. To tackle tasks where experience is insufficient, we propose a multi-agent meta reinforcement learning algorithm that adapts quickly to new tasks. By pretraining on tasks with a similar distribution, the learning model can acquire general knowledge. Simulation results indicate that the MADQN scheme achieves higher throughput than fixed allocation. Furthermore, the proposed multi-agent meta reinforcement learning algorithm learns new tasks much faster than the MADQN scheme.
Funding: supported by the National Key R&D Program of China (Grant No. 2022YFB2403400) and the National Natural Science Foundation of China (Grant Nos. 11991021 and 12021001).
Abstract: With the rapid development of artificial intelligence in recent years, applying learning techniques to solve mixed-integer linear programming (MILP) problems has emerged as a burgeoning research domain. Apart from constructing end-to-end models directly, integrating learning approaches with modules of traditional MILP solution methods is also a promising direction. The cutting plane method is one of the fundamental algorithms used in modern MILP solvers, and selecting appropriate cuts from the set of candidate cuts is crucial for efficiency. Because they rely on expert knowledge and problem-specific heuristics, classical cut selection methods are not always transferable and often limit the scalability and generalizability of the cutting plane method. To provide a more efficient and generalizable strategy, we propose a reinforcement learning (RL) framework to enhance cut selection in the MILP solving process. First, we design feature vectors that incorporate the inherent properties of the MILP and computational information from the solver, and we represent MILP instances as bipartite graphs. Second, we choose weighted metrics to approximate the proximity of feasible solutions to the convex hull and use the learning method to determine the weight assigned to each metric. Third, a graph convolutional neural network with a self-attention mechanism is adopted to predict the values of the weighting factors. Finally, we formulate the cut selection process as a Markov decision process and use an RL method to train the model. Extensive experiments are conducted with SCIP, a leading open-source MILP solver. Results on both general and specific datasets validate the effectiveness and efficiency of the proposed approach.
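The weighted-metric scoring step in this abstract can be sketched in a few lines. The metric names (`efficacy`, `obj_parallelism`) and the fixed weights are illustrative assumptions; in the described framework the weights would be predicted by the graph network rather than hard-coded, and the abstract does not say which metrics are used.

```python
def score_cut(metrics, weights):
    """Weighted sum of per-cut quality metrics (metric names are illustrative)."""
    return sum(weights[name] * value for name, value in metrics.items())


def select_cuts(candidates, weights, k):
    """Return the names of the k highest-scoring candidate cuts."""
    ranked = sorted(
        candidates,
        key=lambda cut: score_cut(cut["metrics"], weights),
        reverse=True,
    )
    return [cut["name"] for cut in ranked[:k]]


# Hypothetical candidate cuts with made-up metric values.
candidates = [
    {"name": "c1", "metrics": {"efficacy": 0.9, "obj_parallelism": 0.1}},
    {"name": "c2", "metrics": {"efficacy": 0.2, "obj_parallelism": 0.8}},
    {"name": "c3", "metrics": {"efficacy": 0.5, "obj_parallelism": 0.5}},
]
weights = {"efficacy": 1.0, "obj_parallelism": 0.5}
chosen = select_cuts(candidates, weights, k=2)
```

With these weights the scores are 0.95, 0.60, and 0.75, so `chosen` holds `c1` and `c3`. Changing the weights changes the ranking, which is the quantity a learned policy would control.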
Abstract: In this paper, we investigate a backhaul framework that jointly considers topology construction and power adjustment for self-organizing UAV networks. To enhance the backhaul rate with limited information exchange and to avoid malicious power competition, we propose a deep reinforcement learning (DRL)-based method to construct the backhaul framework, in which each UAV makes decisions in a distributed manner. First, we decompose the backhaul framework into three submodules: transmission target selection (TS), total power control (PC), and multi-channel power allocation (PA). Then, the three submodules are solved by heterogeneous DRL algorithms with tailored rewards to regulate the UAVs' behaviors. In particular, TS is solved by deep Q-learning to construct a topology with fewer relays and to guarantee the backhaul rate. PC and PA are solved by deep deterministic policy gradient to match the traffic requirement with suitable fine-grained transmission power. As a result, malicious power competition is alleviated, and the backhaul rate is further enhanced. Simulation results show that, compared with DQL and the max-min method, the proposed framework achieves a system-level, all-around performance gain: higher backhaul rate, lower transmission power, and fewer hops.
Funding: This work was supported by the Soft Science Research Program of Guangdong Province under Grant 2020A1010020013, the National Defense Innovation Special Zone of Science and Technology Project under Grant 18-163-00-TS-006-038-01, and the National Natural Science Foundation of China under Grant 61673240.
Abstract: With the increasing demand for the automation of operations and processes in mechatronic systems, fault detection and diagnosis has become a major topic for guaranteeing process performance. Numerous studies apply artificial intelligence methods to fault detection and diagnosis. However, much of the focus has been on the detection of faults. For the diagnosis of faults, on the one hand, assumptions are required, which restricts the diagnosis range. On the other hand, different faults with similar symptoms cannot be distinguished, especially when the model is not trained on sufficient data. In this work, we propose a reinforcement learning system for fault detection and diagnosis. No assumption is required. Feature extraction is performed first. Then, with the features as the states of the environment, the agent interacts directly with the environment. The optimal policy, which determines the exact category, size, and location of the fault, is obtained by updating Q values. The method takes advantage of expert knowledge. When the features are unclear, an action is taken to obtain more information from the new state for further determination. We build a recurrent neural network with the long short-term memory architecture to approximate the Q values. The application to a motor is discussed. The experimental results show that the proposed method achieves a significant improvement over existing state-of-the-art methods of fault detection and diagnosis.
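The idea of obtaining a policy by updating Q values, including an action that gathers more information when the features are unclear, can be illustrated with the standard tabular Q-learning rule. This is a simplified sketch: the abstract approximates the Q values with an LSTM network, whereas a lookup table is used here, and the state and action names are hypothetical.

```python
from collections import defaultdict

# Hypothetical action set: two diagnoses plus an information-gathering action.
ACTIONS = ["diagnose_bearing", "diagnose_winding", "inspect"]


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard Q-learning update:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    """
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)
# A correct diagnosis from a clear state earns a reward of 1.
q_update(q, "clear", "diagnose_bearing", 1.0, "done")
# Inspecting from an unclear state earns nothing directly, but it leads to
# the clear state, so its value grows through the bootstrapped term.
inspect_value = q_update(q, "unclear", "inspect", 0.0, "clear")
```

After these two updates the `inspect` action in the unclear state has a positive value even though its immediate reward is zero, which is how an information-gathering step can become part of the learned policy.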
Abstract: Material selection has become a critical part of design for engineers because of the wide choice of materials that have similar properties and meet the product design specification. Statistical analysis alone makes it difficult to identify the ideal composition of the final composite. An integrated approach combining a statistical model and a micromechanical model is desired. In this paper, the natural fibre and polymer matrix resulting from a previous study are used to estimate mechanical properties such as density, Young's modulus, and tensile strength. Four levels of fibre loading are used to compare candidates for the optimum natural fibre reinforced polymer composite (NFRPC). The result of this analytical approach reveals that kenaf/polystyrene (PS) with 40% fibre loading is the ideal composite for automotive component applications. The ideal composite scores 1.156 g/cm³, 24.2 GPa, and 413.4 MPa for density, Young's modulus, and tensile strength, respectively. A suggestion for increasing Young's modulus is also presented. This work shows that the statistical model can be combined well with the analytical approach to choose the correct composite for automotive applications.
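Abstract: In recent years, industrial control networks have developed rapidly. Their digitalization, intelligence, and automation bring great benefits to industry, but they also face increasingly complex and varied attack threats. In the context of data element security, detecting and responding to threats in industrial control networks in a timely manner has become an urgent task. By continuously monitoring and analyzing the data flows in an industrial control network, the threat detection problem can be transformed into a time series anomaly detection problem. However, existing time series anomaly detection methods are limited by the quality of industrial control network datasets, and they are often sensitive to only a single type of anomaly while ignoring others. To address these problems, a deep reinforcement learning and data augmentation based threat detection method in industrial control networks (DELTA) is proposed. The method introduces a new data augmentation selection method for time series datasets, which chooses a suitable set of data augmentation operations for each baseline model to improve the quality of industrial control network time series datasets. It also uses deep reinforcement learning algorithms (A2C/PPO) to dynamically select candidate models from the baseline models at different time points, so that multiple types of anomaly detection models are used to overcome the sensitivity to a single anomaly type. Experimental comparisons with existing time series anomaly detection models show that, at an acceptable additional time cost, DELTA clearly improves accuracy and F1 score over all baseline models, which verifies the effectiveness and practicality of the method.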