In the IoT (Internet of Things) domain, the increased use of encryption protocols such as SSL/TLS, VPN (Virtual Private Network), and Tor has led to a rise in attacks leveraging encrypted traffic. While research on anomaly detection using AI (Artificial Intelligence) is actively progressing, the encrypted nature of the data poses challenges for labeling, resulting in data imbalance and biased feature extraction toward specific nodes. This study proposes a reconstruction error-based anomaly detection method using an autoencoder (AE) that utilizes packet metadata excluding specific node information. The proposed method omits biased packet metadata such as IP and port and trains the detection model using only normal data, leveraging a small amount of packet metadata. This makes it well-suited for direct application in IoT environments due to its low resource consumption. In experiments comparing feature extraction methods for AE-based anomaly detection, we found that using flow-based features significantly improves accuracy, precision, F1 score, and AUC (Area Under the Receiver Operating Characteristic Curve) score compared to packet-based features. Additionally, for flow-based features, the proposed method showed a 30.17% increase in F1 score and improved false positive rates compared to Isolation Forest and OneClassSVM. Furthermore, the proposed method demonstrated a 32.43% higher AUC when using packet features and a 111.39% higher AUC when using flow features, compared to previously proposed oversampling methods. This study highlights the impact of feature extraction methods on attack detection in imbalanced, encrypted traffic environments and emphasizes that the one-class method using AE is more effective for attack detection and reducing false positives compared to traditional oversampling methods.
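The reconstruction-error idea in the abstract above can be illustrated with a small sketch. It is not the authors' model: a rank-1 linear projection (the top principal direction, found by power iteration) stands in for the trained autoencoder, the two feature columns are hypothetical flow statistics, and the threshold rule (maximum error seen on normal training data) is an assumption chosen for simplicity.

```python
import math

def fit_linear_reconstructor(normal_rows, iters=100):
    """Fit a rank-1 linear encoder/decoder on normal data only (stand-in for an AE)."""
    n, d = len(normal_rows), len(normal_rows[0])
    mean = [sum(r[j] for r in normal_rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in normal_rows]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        # Power iteration on the covariance matrix: w = X^T (X v), then normalize.
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return mean, v

def reconstruction_error(row, mean, v):
    """Squared error between a row and its encode/decode round trip."""
    centered = [x - m for x, m in zip(row, mean)]
    code = sum(a * b for a, b in zip(centered, v))  # 1-D latent code
    return sum((a - code * b) ** 2 for a, b in zip(centered, v))

# Hypothetical flow features (no IP or port columns), normal traffic only.
normal_flows = [(float(t), 2.0 * t + 0.1 * (-1) ** t) for t in range(20)]
mean, direction = fit_linear_reconstructor(normal_flows)
# Assumed threshold rule: the largest error observed on normal training data.
threshold = max(reconstruction_error(r, mean, direction) for r in normal_flows)

def is_anomaly(row):
    """Flag a flow whose reconstruction error exceeds the normal-data threshold."""
    return reconstruction_error(row, mean, direction) > threshold
```

Because the model only ever sees normal flows, no attack labels are needed; a flow that does not follow the learned structure reconstructs poorly and is flagged.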
Encrypted traffic plays a crucial role in safeguarding network security and user privacy. However, encrypting malicious traffic can lead to numerous security issues, making the effective classification of encrypted traffic essential. Existing methods for detecting encrypted traffic face two significant challenges. First, relying solely on the original byte information for classification fails to leverage the rich temporal relationships within network traffic. Second, machine learning and convolutional neural network methods lack sufficient network expression capabilities, hindering the full exploration of traffic’s potential characteristics. To address these limitations, this study introduces a traffic classification method that utilizes time relationships and a higher-order graph neural network, termed HGNN-ETC. This approach fully exploits the original byte information and chronological relationships of traffic packets, transforming traffic data into a graph structure to provide the model with more comprehensive context information. HGNN-ETC employs an innovative k-dimensional graph neural network to effectively capture the multi-scale structural features of traffic graphs, enabling more accurate classification. We select the ISCXVPN and USTC-TK2016 datasets for our experiments. The results show that, compared with other state-of-the-art methods, our method obtains better classification results on different datasets, with an accuracy of about 97.00%. In addition, by analyzing the impact of varying input specifications on classification performance, we determine the optimal network data truncation strategy and confirm the model’s excellent generalization ability on different datasets.
Encrypted traffic classification has become a hot issue in network security research. The class imbalance problem of traffic samples often degrades the performance of machine learning based classifiers. Although the Generative Adversarial Network (GAN) method can generate new samples by learning the feature distribution of the original samples, it suffers from unstable training and mode collapse. To this end, a novel data augmentation approach called Graph CWGAN-GP is proposed in this paper. The traffic data is first converted into grayscale images as the input for the proposed model. Then, the minority class data is augmented with the proposed model, which is built by introducing conditional constraints and a new distance metric into a typical GAN. Finally, a classical deep learning model is adopted as a classifier to classify datasets augmented by the Conditional GAN (CGAN), Wasserstein GAN-Gradient Penalty (WGAN-GP), and Graph CWGAN-GP, respectively. Compared with state-of-the-art GAN methods, Graph CWGAN-GP can not only control the modes of the data to be generated, but also overcome the problem of unstable training and generate more realistic and diverse samples. The experimental results show that the classification precision, recall, and F1-score of the minority class in the balanced dataset augmented in this paper improve by more than 2.37%, 3.39%, and 4.57%, respectively.
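The first step described above, converting traffic data into grayscale images, can be sketched as follows. The abstract does not give the image size or padding rule, so the 28x28 shape, the truncation to the first 784 bytes, and the zero padding are assumptions made for illustration; the function name is hypothetical.

```python
def bytes_to_grayscale(payload: bytes, side: int = 28):
    """Map raw traffic bytes to a side x side grid of 0-255 pixel intensities.

    Assumed convention: keep the first side*side bytes and zero-pad shorter inputs.
    """
    size = side * side
    padded = payload[:size].ljust(size, b"\x00")
    # Each byte is already in 0..255, so it can be used directly as a pixel value.
    return [list(padded[r * side:(r + 1) * side]) for r in range(side)]
```

For example, `bytes_to_grayscale(bytes([255, 16, 3]))` yields a 28x28 grid whose first row starts with 255, 16, 3 followed by zeros.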
The rapidly increasing popularity of mobile devices has changed the methods with which people access various network services and increased network traffic markedly. Over the past few decades, network traffic identification has been a research hotspot in the field of network management and security monitoring. However, as more network services use encryption technology, network traffic identification faces many challenges. Although classic machine learning methods can solve many problems that cannot be solved by port- and payload-based methods, manually extracting features that are frequently updated is time-consuming and labor-intensive. Deep learning has good automatic feature learning capabilities and is an ideal method for network traffic identification, particularly encrypted traffic identification. Existing recognition methods based on deep learning primarily use supervised learning and rely on many labeled samples. However, in real scenarios, labeled samples are often difficult to obtain. This paper adjusts the structure of the auxiliary classifier generative adversarial network (ACGAN) so that it can use unlabeled samples for training, and uses the Wasserstein distance instead of the original cross entropy as the loss function to achieve semi-supervised learning. Experimental results show that, on the ISCX and USTC datasets, the proposed method achieves markedly higher identification accuracy than a convolutional neural network (CNN) based classifier when the number of labeled samples is small.
Traffic identification is becoming more important, yet more challenging, as related encryption techniques develop rapidly. Unlike recent deep learning methods that apply image processing to such encrypted traffic problems, in this paper we propose a method named Payload Encoding Representation from Transformer (PERT) to perform automatic traffic feature extraction using a state-of-the-art dynamic word embedding technique. Through traffic classification experiments on a public encrypted traffic dataset and our captured Android HTTPS traffic, we show that the proposed method achieves clearly better effectiveness than the compared baselines. To the best of our knowledge, this is the first time encrypted traffic classification with dynamic word embedding has been addressed.
While encryption technology safeguards the security of network communications, malicious traffic also uses encryption protocols to obscure its malicious behavior. To address the issues of traditional machine learning methods relying on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted malicious traffic, we propose an encrypted malicious traffic classification method that integrates global semantic features with local spatiotemporal features, called BERT-based Spatio-Temporal Features Network (BSTFNet). At the packet-level granularity, the model captures the global semantic features of packets through the attention mechanism of the Bidirectional Encoder Representations from Transformers (BERT) model. At the byte-level granularity, we first employ the Bidirectional Gated Recurrent Unit (BiGRU) model to extract temporal features from bytes, and then use the Text Convolutional Neural Network (TextCNN) model with multi-sized convolution kernels to extract local multi-receptive field spatial features. The fusion of features from both granularities serves as the final multidimensional representation of malicious traffic. Our approach achieves accuracy and F1-score of 99.39% and 99.40%, respectively, on the publicly available USTC-TFC2016 dataset, and effectively reduces sample confusion within the Neris and Virut categories. The experimental results demonstrate that our method has outstanding representation and classification capabilities for encrypted malicious traffic.
Traffic characterization (e.g., chat, video) and application identification (e.g., FTP, Facebook) are two of the more crucial tasks in encrypted network traffic classification. These two tasks are typically carried out separately by existing systems using separate models, significantly adding to the difficulty of network administration. Convolutional Neural Network (CNN) and Transformer are deep learning-based approaches for network traffic classification. CNN is good at extracting local features while ignoring long-distance information from the network traffic sequence, and Transformer can capture long-distance feature dependencies while ignoring local details. Based on these characteristics, a multi-task learning model (MTC) that combines Transformer and 1D-CNN for encrypted traffic classification is proposed. To compensate for the Transformer’s lack of local detail feature extraction capability and the 1D-CNN’s shortcoming of ignoring long-distance correlation information when processing traffic sequences, the model uses a parallel structure in which a feature fusion block fuses the features generated by the Transformer block and the 1D-CNN block with each other. This structure improves the representation of traffic features by both blocks and allows the model to perform well with both long and short sequences. The model handles multiple tasks simultaneously, which lowers the cost of training. Experiments reveal that on the ISCX VPN-nonVPN dataset, the model achieves an average F1 score of 98.25% and an average recall of 98.30% for the task of identifying applications, and an average F1 score of 97.94% and an average recall of 97.54% for the task of traffic characterization. When advanced models on the same dataset are chosen for comparison, the model produces the best results. To demonstrate generalization, we applied MTC to the CICIDS2017 dataset, where the model also achieved good results.
To address the problem that current encrypted traffic classification methods use only a single network framework, such as a convolutional neural network (CNN), recurrent neural network (RNN), or stacked autoencoder (SAE), and construct only a shallow network to extract features, which leads to low classification accuracy, an encrypted traffic classification framework based on the fusion of vision transformer and temporal features is proposed. A bottleneck transformer network (BoTNet) is used to extract spatial features and a bi-directional long short-term memory (BiLSTM) network is used to extract temporal features. After the two sub-networks are parallelized, early fusion is used in the framework to fuse their features. Finally, the encrypted traffic is identified through the fused features. The experimental results show that the BiLSTM and BoTNet fusion transformer (BTFT) model can enhance the performance of encrypted traffic classification by fusing multi-dimensional features. The accuracy of binary classification between virtual private network (VPN) and non-VPN traffic is 99.9%, and the accuracy of fine-grained twelve-class encrypted traffic classification reaches 97%.
VPNs are vital for safeguarding communication routes in the continually changing cybersecurity world. However, increasing network attack complexity and variety require increasingly advanced algorithms to recognize and categorize VPN network data. We present a novel VPN network traffic flow classification method utilizing Artificial Neural Networks (ANN). This paper aims to provide a reliable system that can identify virtual private network (VPN) traffic from intrusion attempts, data exfiltration, and denial-of-service assaults. We compile a broad dataset of labeled VPN traffic flows from various apps and usage patterns. Next, we create an ANN architecture that can handle encrypted communication and distinguish benign from dangerous actions. To effectively process and categorize encrypted packets, the neural network model has input, hidden, and output layers. We use advanced feature extraction approaches to improve the ANN’s classification accuracy by leveraging network traffic’s statistical and behavioral properties. We also use cutting-edge optimization methods to optimize network characteristics and performance. The suggested ANN-based categorization method is extensively tested and analyzed. Results show the model effectively classifies VPN traffic types. We also show that our ANN-based technique outperforms other approaches in precision, recall, and F1-score with 98.79% accuracy. This study improves VPN security and protects against new cyberthreats. Classifying VPN traffic flows effectively helps enterprises protect sensitive data, maintain network integrity, and respond quickly to security problems. This study advances network security and lays the groundwork for ANN-based cybersecurity solutions.
With the increasing proportion of encrypted traffic in cyberspace, the classification of encrypted traffic has become a core technology in network supervision. In recent years, many different solutions have emerged in this field. Most methods identify and classify traffic by extracting spatiotemporal characteristics of data flows or byte-level features of packets. However, due to changes in data transmission mediums, such as fiber optics and satellites, temporal features can exhibit significant variations due to changes in communication links and transmission quality. Additionally, partial spatial features can change due to reasons like data reordering and retransmission. Faced with these challenges, identifying encrypted traffic solely based on packet byte-level features is significantly difficult. To address this, we propose a universal packet-level encrypted traffic identification method, Combo Packet. This method utilizes convolutional neural networks to extract deep features of the current packet and its contextual information and employs spatial and channel attention mechanisms to select and locate effective features. Experimental data shows that Combo Packet can effectively distinguish between encrypted traffic service categories (e.g., File Transfer Protocol, FTP, and Peer-to-Peer, P2P) and encrypted traffic application categories (e.g., BitTorrent and Skype). Validated on the ISCX VPN-nonVPN dataset, it achieves classification accuracies of 97.0% and 97.1% for service and application categories, respectively. It also provides shorter training times and higher recognition speeds. The performance and recognition capabilities of Combo Packet are significantly superior to the existing classification methods mentioned.
In this paper, we propose a novel method to detect encrypted botnet traffic. During the traffic preprocessing stage, the proposed payload extraction method can identify a large amount of encrypted application traffic. It can filter out a large amount of non-malicious traffic, greatly improving the detection efficiency. A Sequential Probability Ratio Test (SPRT)-based method can find spatial-temporal correlations in suspicious botnet traffic and make an accurate judgment. Experimental results show that the false positive and false negative rates can be controlled within a certain range.
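The abstract above names the Sequential Probability Ratio Test without giving details, so the following is only a generic sketch of Wald's SPRT on binary observations, not the paper's detector. The observation model (each suspicious flow either matches a botnet-like pattern or not), the probabilities `p0` and `p1`, and the function name are all assumptions.

```python
import math

def sprt(observations, p0, p1, alpha=0.01, beta=0.01):
    """Wald's SPRT for Bernoulli observations.

    H0: match probability is p0 (benign); H1: match probability is p1 (botnet).
    alpha and beta are the target false positive and false negative rates.
    Returns (decision, number of observations consumed).
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this bound
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this bound
    llr = 0.0
    for n, matched in enumerate(observations, start=1):
        # Add the log-likelihood ratio contribution of this observation.
        if matched:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", len(observations)
```

The two bounds are what tie the test to error rates: tightening `alpha` or `beta` widens the gap between them, so more observations are needed before a decision, which is consistent with the abstract's claim that false positive and false negative rates can be kept within a chosen range.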
As an essential function of encrypted Internet traffic analysis, encrypted traffic service classification can support both coarse-grained network service traffic management and security supervision. However, the traditional plaintext-based Deep Packet Inspection (DPI) method cannot be applied to such a classification. Moreover, existing machine learning-based methods encounter two problems during feature selection: the excessive processing cost of complex features and Transport Layer Security (TLS) version discrepancy. In this paper, we consider differences between encryption network protocol stacks and propose a composite deep learning-based method for multiprotocol environments that uses a sliding multiple Protocol Data Unit (multiPDU) length sequence as features, fully utilizing the Markov property in a multiPDU length sequence and maintaining suitability for a TLS 1.3 environment. Control experiments show that both the Length-Sensitive (LS) composite deep learning model using a capsule neural network and the LS long short-term memory model achieve satisfactory effectiveness in F1-score and performance. Owing to faster feature extraction, our method is suitable for actual network environments and superior to state-of-the-art methods.
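The sliding length-sequence feature mentioned above can be illustrated with a minimal sketch. Only the windowing step is shown; the window width, stride, and function name are assumptions, and the Markov modelling and deep learning stages described in the abstract are not reproduced.

```python
def sliding_windows(lengths, width, stride=1):
    """Cut a sequence of PDU lengths into overlapping fixed-width windows."""
    return [lengths[i:i + width]
            for i in range(0, len(lengths) - width + 1, stride)]
```

For instance, `sliding_windows([100, 1500, 60, 1500, 40], 3)` produces three windows, each of which could serve as one input sample; because only lengths are used, the feature does not depend on decrypting any payload.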
The dark web is a shadow area hidden in the depths of the Internet, which is difficult to access through common search engines. Because of its anonymity, the dark web has gradually become a hotbed for a variety of cyber-crimes. Although some research based on machine learning or deep learning has been shown to be effective in analyzing dark web traffic in recent years, there are still pain points such as low accuracy, insufficient real-time performance, and limited application scenarios. To address the difficulties faced by existing automated dark web traffic analysis methods, a novel method named Dark-Forest is proposed to analyze the behavior of dark web traffic. In this method, a particle swarm optimization algorithm is first used to filter the redundant features of dark web traffic data, which can effectively shorten the training and inference time of the model to meet the real-time requirements of the dark web detection task. Then, the selected traffic features are analyzed and classified using the DeepForest model as a backbone classifier. Comparison experiments with current mainstream methods show that Dark-Forest combines the advantages of statistical machine learning and deep learning, and achieves an accuracy of 87.84%. This method not only outperforms baseline methods such as Random Forest, MLP, CNN, and the original DeepForest in both large-scale and small-scale dataset based learning tasks, but can also detect normal network traffic, tunnel network traffic, and anonymous network traffic, which may close the gap between different network traffic analysis tasks. Thus, it has a wider application scenario and higher practical value.
Recently, website fingerprinting (WF) attacks that eavesdrop on the web browsing activity of users by analyzing the observed traffic can endanger the data security of users even if the users have deployed encrypted proxies such as Tor. Several WF defenses have been proposed to counter passive WF attacks. However, the existing defense methods have several significant drawbacks in terms of effectiveness and overhead, which means that these defenses are rarely applied in the real world. The performance of the existing methods greatly depends on the number of dummy packets added, which increases overhead and hampers the user experience of web browsing. Inspired by the feature extraction of current WF attacks with deep learning networks, in this paper we propose TED, a lightweight WF defense method that effectively decreases the accuracy of current WF attacks. We apply the idea of adversarial examples, aiming to effectively disturb the accuracy of WF attacks with deep learning networks by precisely inserting a few dummy packets. The defense extracts the key features of similar websites through a feature extraction network with adapted Grad-CAM and applies the features to interfere with the WF attacks. The key features of traces are utilized to generate defense fractions that are inserted into the targeted trace to deceive WF classifiers. The experiments are carried out on public datasets from DF. Compared with several WF defenses, the experiments show that TED can efficiently reduce the effectiveness of WF attacks with minimal expenditure, reducing the accuracy by nearly 40% with less than 30% overhead.
Machine Learning (ML) techniques have been widely applied in recent traffic classification. However, the problems of both discriminator bias and class imbalance decrease the accuracy of ML based traffic classifiers. In this paper, we propose an accurate and extensible traffic classifier. Specifically, to address the discriminator bias issue, our classifier is built by making an optimal cascade of binary sub-classifiers, where each binary sub-classifier is trained independently with the discriminators used for identifying application-specific traffic. Moreover, to balance a training dataset, we apply the SMOTE algorithm to generate artificial training samples for minority classes. We evaluate our classifier on two datasets collected from different network border routers. Compared with previous multi-class traffic classifiers built in a one-time training process, our classifier achieves much higher F-Measure and AUC for each application.
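The SMOTE step mentioned above generates each artificial sample by interpolating between a minority-class sample and one of its nearest minority-class neighbours. The sketch below shows that core idea in plain Python; the function name, the choice of `k`, and the toy feature vectors are assumptions, and the paper's own settings are not given in the abstract.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation.

    minority: list of equal-length numeric tuples (needs at least two samples).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (squared Euclidean).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic
```

Every synthetic point lies on a line segment between two real minority samples, so the new data stays inside the region the minority class already occupies rather than duplicating existing samples.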
In the early days of IoT’s introduction, it was challenging to introduce encrypted communication because components such as CPUs and batteries lacked the performance needed to encrypt and decrypt data. Because IoT is applied and utilized in many important fields, a cyberattack on IoT can result in astronomical financial losses and human casualties. For this reason, the application of encrypted communication to IoT has been required, and it has become possible due to improvements in the computing performance of IoT devices and the development of lightweight cryptography. However, the application of encrypted communication in IoT has also made it possible to use encrypted communication channels to launch cyberattacks. The approach of extracting evidence of an attack based on the primary information of a network packet is no longer valid because critical information, such as the payload in a network packet, is encrypted. For this reason, technology that can detect cyberattacks over encrypted network traffic occurring in IoT environments is required. Therefore, this research proposes an encrypted cyberattack detection system for the IoT (ECDS-IoT) that derives valid features for cyberattack detection from the encrypted network traffic generated in the IoT environment and performs cyberattack detection based on the derived features. ECDS-IoT identifies identifiable information from encrypted traffic collected in IoT environments and extracts statistics-based features through statistical analysis of that information. ECDS-IoT learns the characteristics of normal data by training only on statistical features extracted from normal data, and detects cyberattacks based only on the normal data information it has learned. To evaluate the cyberattack detection performance of the proposed ECDS-IoT, this research used CICIoT2023, a dataset containing encrypted traffic generated by normal activity and seven categories of cyberattacks in the IoT environment, and experimented with cyberattack detection on encrypted traffic using the Autoencoder, RNN, GRU, LSTM, BiLSTM, and AE-LSTM algorithms. In this evaluation, ECDS-IoT achieved high performance, with accuracy 0.99739, precision 0.99154, recall 1.0, F1 score 0.99575, and ROC_AUC 0.99822 when using the AE-LSTM algorithm. As shown by the cyberattack detection results of ECDS-IoT, it is possible to detect most cyberattacks carried over encrypted traffic. Applying ECDS-IoT to IoT can effectively detect cyberattacks concealed in encrypted traffic, promoting the efficient operation of IoT and preventing financial and human damage caused by cyberattacks.
Traffic encryption brings new challenges to the identification of unknown encrypted traffic. Currently, machine learning is the most commonly used encrypted traffic recognition technology, but this method relies on expensive prior label information. Therefore, we propose a subspace clustering via graph auto-encoder network (SCGAE) to recognize unknown applications without prior label information. SCGAE adopts a graph encoder-decoder structure, which can comprehensively utilize feature and structure information to extract a discriminative embedding representation. Additionally, a self-supervised module is introduced, which uses the clustering labels as supervision to guide the learning of the graph encoder-decoder module. Finally, we obtain the self-expression coefficient matrix through the self-expression module and map it to the subspace for clustering. The results show that SCGAE performs better than all benchmark models in unknown encrypted traffic recognition.
Funding (autoencoder-based IoT anomaly detection study above): Supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2023-00235509, Development of Security Monitoring Technology Based Network Behavior against Encrypted Cyber Threats in ICT Convergence Environment).
Funding (HGNN-ETC study above): Supported in part by the National Key Research and Development Program of China (No. 2022YFB4500800) and the National Science Foundation of China (No. 42071431).
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61931004, 62072250) and the Talent Launch Fund of Nanjing University of Information Science and Technology (2020r061).
Abstract: Encrypted traffic classification has become a hot issue in network security research. The class imbalance problem of traffic samples often degrades the performance of machine learning-based classifiers. Although the Generative Adversarial Network (GAN) method can generate new samples by learning the feature distribution of the original samples, it suffers from unstable training and mode collapse. To this end, a novel data augmentation approach called Graph CWGAN-GP is proposed in this paper. The traffic data is first converted into grayscale images as the input for the proposed model. Then, the minority class data is augmented with our proposed model, which is built by introducing conditional constraints and a new distance metric into a typical GAN. Finally, a classical deep learning model is adopted as a classifier to classify datasets augmented by the Conditional GAN (CGAN), Wasserstein GAN-Gradient Penalty (WGAN-GP), and Graph CWGAN-GP, respectively. Compared with state-of-the-art GAN methods, Graph CWGAN-GP can not only control the modes of the data to be generated, but also overcome the problem of unstable training and generate more realistic and diverse samples. The experimental results show that the classification precision, recall, and F1-score of the minority class in the balanced dataset augmented in this paper improve by more than 2.37%, 3.39%, and 4.57%, respectively.
Funding: This work is supported by the Science and Technology Project of State Grid Jiangsu Electric Power Co., Ltd. under Grant No. J2020068.
Abstract: The rapidly increasing popularity of mobile devices has changed the methods with which people access various network services and has increased network traffic markedly. Over the past few decades, network traffic identification has been a research hotspot in the field of network management and security monitoring. However, as more network services use encryption technology, network traffic identification faces many challenges. Although classic machine learning methods can solve many problems that cannot be solved by port- and payload-based methods, manually extracting features that are frequently updated is time-consuming and labor-intensive. Deep learning has good automatic feature learning capabilities and is an ideal method for network traffic identification, particularly encrypted traffic identification. Existing recognition methods based on deep learning primarily use supervised learning and rely on many labeled samples. However, in real scenarios, labeled samples are often difficult to obtain. This paper adjusts the structure of the auxiliary classifier generative adversarial network (ACGAN) so that it can use unlabeled samples for training, and uses the Wasserstein distance instead of the original cross entropy as the loss function to achieve semi-supervised learning. Experimental results on the ISCX and USTC data sets show that, when the number of labeled samples is small, the proposed method achieves markedly better identification accuracy than a convolutional neural network (CNN) based classifier.
Abstract: Traffic identification is becoming more important, yet more challenging, as related encryption techniques develop rapidly. Unlike recent deep learning methods that apply image processing to solve such encrypted traffic problems, in this paper we propose a method named Payload Encoding Representation from Transformer (PERT) to perform automatic traffic feature extraction using a state-of-the-art dynamic word embedding technique. Through traffic classification experiments on a public encrypted traffic data set and our captured Android HTTPS traffic, we show that the proposed method achieves clearly better effectiveness than the compared baselines. To the best of our knowledge, this is the first time encrypted traffic classification has been addressed with dynamic word embedding.
Funding: This research was funded by the National Natural Science Foundation of China under Grant No. 61806171; the Sichuan University of Science & Engineering Talent Project under Grant No. 2021RC15; the Open Fund Project of Key Laboratory for Non-Destructive Testing and Engineering Computer of Sichuan Province Universities on Bridge Inspection and Engineering under Grant No. 2022QYJ06; the Sichuan University of Science & Engineering Graduate Student Innovation Fund under Grant No. Y2023115; and the Scientific Research and Innovation Team Program of Sichuan University of Science and Technology under Grant No. SUSE652A006.
Abstract: While encryption technology safeguards the security of network communications, malicious traffic also uses encryption protocols to obscure its malicious behavior. To address the issues of traditional machine learning methods relying on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted malicious traffic, we propose an encrypted malicious traffic classification method that integrates global semantic features with local spatiotemporal features, called BERT-based Spatio-Temporal Features Network (BSTFNet). At the packet-level granularity, the model captures the global semantic features of packets through the attention mechanism of the Bidirectional Encoder Representations from Transformers (BERT) model. At the byte-level granularity, we first employ the Bidirectional Gated Recurrent Unit (BiGRU) model to extract temporal features from bytes, and then use the Text Convolutional Neural Network (TextCNN) model with multi-sized convolution kernels to extract local multi-receptive-field spatial features. The fusion of features from both granularities serves as the final multidimensional representation of malicious traffic. Our approach achieves an accuracy and F1-score of 99.39% and 99.40%, respectively, on the publicly available USTC-TFC2016 dataset, and effectively reduces sample confusion within the Neris and Virut categories. The experimental results demonstrate that our method has outstanding representation and classification capabilities for encrypted malicious traffic.
Funding: Supported by the People's Public Security University of China central basic scientific research business program (No. 2021JKF206).
Abstract: Traffic characterization (e.g., chat, video) and application identification (e.g., FTP, Facebook) are two of the more crucial jobs in encrypted network traffic classification. These two activities are typically carried out separately by existing systems using separate models, significantly adding to the difficulty of network administration. Convolutional Neural Network (CNN) and Transformer are deep learning-based approaches for network traffic classification. CNN is good at extracting local features while ignoring long-distance information from the network traffic sequence, and Transformer can capture long-distance feature dependencies while ignoring local details. Based on these characteristics, a multi-task learning model that combines Transformer and 1D-CNN for encrypted traffic classification is proposed (MTC). In order to make up for the Transformer's lack of local detail feature extraction capability and the 1D-CNN's shortcoming of ignoring long-distance correlation information when processing traffic sequences, the model uses a parallel structure to fuse the features generated by the Transformer block and the 1D-CNN block with each other using a feature fusion block. This structure improves the representation of traffic features by both blocks and allows the model to perform well with both long and short sequences. The model handles multiple tasks simultaneously, which lowers the cost of training. Experiments reveal that on the ISCX VPN-nonVPN dataset, the model achieves an average F1 score of 98.25% and an average recall of 98.30% for the task of identifying applications, and an average F1 score of 97.94% and an average recall of 97.54% for the task of traffic characterization. When advanced models on the same dataset are chosen for comparison, the model produces the best results. To demonstrate generalization, we applied MTC to the CICIDS2017 dataset, where the model also achieved good results.
Funding: Supported by the Science and Technology Project of the Headquarters of State Grid Corporation of China (5700-202152186A-0-0-00).
Abstract: Current encrypted traffic classification methods use only a single network framework, such as a convolutional neural network (CNN), recurrent neural network (RNN), or stacked autoencoder (SAE), and construct only a shallow network to extract features, which leads to low classification accuracy. To address this problem, an encrypted traffic classification framework based on the fusion of vision transformer and temporal features is proposed. A bottleneck transformer network (BoTNet) is used to extract spatial features, and a bi-directional long short-term memory (BiLSTM) network is used to extract temporal features. After the two sub-networks are parallelized, the framework applies early fusion to combine their features. Finally, the encrypted traffic is identified through the fused features. The experimental results show that the BiLSTM and BoTNet fusion transformer (BTFT) model can enhance the performance of encrypted traffic classification by fusing multi-dimensional features. The accuracy of virtual private network (VPN) versus non-VPN binary classification is 99.9%, and the accuracy of fine-grained twelve-class encrypted traffic classification reaches 97%.
Abstract: VPNs are vital for safeguarding communication routes in the continually changing cybersecurity world. However, increasing network attack complexity and variety require increasingly advanced algorithms to recognize and categorize VPN network data. We present a novel VPN network traffic flow classification method utilizing Artificial Neural Networks (ANN). This paper aims to provide a reliable system that can distinguish virtual private network (VPN) traffic from intrusion attempts, data exfiltration, and denial-of-service assaults. We compile a broad dataset of labeled VPN traffic flows from various apps and usage patterns. Next, we create an ANN architecture that can handle encrypted communication and distinguish benign from dangerous actions. To effectively process and categorize encrypted packets, the neural network model has input, hidden, and output layers. We use advanced feature extraction approaches to improve the ANN's classification accuracy by leveraging network traffic's statistical and behavioral properties. We also use cutting-edge optimization methods to optimize network characteristics and performance. The suggested ANN-based categorization method is extensively tested and analyzed. Results show the model effectively classifies VPN traffic types. We also show that our ANN-based technique outperforms other approaches in precision, recall, and F1-score, with 98.79% accuracy. This study improves VPN security and protects against new cyberthreats. Classifying VPN traffic flows effectively helps enterprises protect sensitive data, maintain network integrity, and respond quickly to security problems. This study advances network security and lays the groundwork for ANN-based cybersecurity solutions.
Funding: Supported by the National Natural Science Foundation of China Youth Project (62302520).
Abstract: With the increasing proportion of encrypted traffic in cyberspace, the classification of encrypted traffic has become a core key technology in network supervision. In recent years, many different solutions have emerged in this field. Most methods identify and classify traffic by extracting spatiotemporal characteristics of data flows or byte-level features of packets. However, due to changes in data transmission mediums, such as fiber optics and satellites, temporal features can exhibit significant variations due to changes in communication links and transmission quality. Additionally, partial spatial features can change due to reasons like data reordering and retransmission. Faced with these challenges, identifying encrypted traffic solely based on packet byte-level features is significantly difficult. To address this, we propose a universal packet-level encrypted traffic identification method, Combo Packet. This method utilizes convolutional neural networks to extract deep features of the current packet and its contextual information and employs spatial and channel attention mechanisms to select and locate effective features. Experimental data shows that Combo Packet can effectively distinguish between encrypted traffic service categories (e.g., File Transfer Protocol, FTP, and Peer-to-Peer, P2P) and encrypted traffic application categories (e.g., BitTorrent and Skype). Validated on the ISCX VPN-nonVPN dataset, it achieves classification accuracies of 97.0% and 97.1% for service and application categories, respectively. It also provides shorter training times and higher recognition speeds. The performance and recognition capabilities of Combo Packet are significantly superior to the existing classification methods mentioned.
Funding: Supported by the National Basic Research Program of China (973 Program) under Grant No. 2011CB302903 and the Priority Academic Program Development of Jiangsu Higher Education Institutions under Grant No. YX002001.
Abstract: In this paper, we propose a novel method to detect encrypted botnet traffic. During the traffic preprocessing stage, the proposed payload extraction method can identify a large amount of encrypted application traffic. It can filter out a large amount of non-malicious traffic, greatly improving the detection efficiency. A Sequential Probability Ratio Test (SPRT)-based method can find spatial-temporal correlations in suspicious botnet traffic and make an accurate judgment. Experimental results show that the false positive and false negative rates can be controlled within a certain range.
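To show how a Sequential Probability Ratio Test bounds error rates, here is a minimal sketch of Wald's SPRT on binary observations. The abstract does not describe the observation model, so the setup is hypothetical: each observation is 1 when a flow shows some bot-like indicator and 0 otherwise, and `p0`, `p1`, `alpha`, and `beta` are illustrative values rather than the paper's parameters.

```python
import math


def sprt(observations, p0=0.1, p1=0.8, alpha=0.01, beta=0.01):
    """Wald's SPRT for Bernoulli observations.

    H0: P(obs = 1) = p0 (benign host); H1: P(obs = 1) = p1 (bot).
    alpha is the target false positive rate, beta the target false negative rate.
    Returns (decision, number_of_observations_used).
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this value
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this value
    llr = 0.0                              # cumulative log-likelihood ratio
    for n, obs in enumerate(observations, start=1):
        if obs:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", len(observations)


bot_like = sprt([1, 1, 1, 1])
benign_like = sprt([0] * 10)
mixed = sprt([1, 0])
```

The two thresholds are derived directly from `alpha` and `beta`, which is why the abstract can speak of keeping false positive and false negative rates within a chosen range; the test also stops as soon as the evidence is strong enough, so decisive traffic needs only a few observations.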
Funding: Supported by the General Program of the National Natural Science Foundation of China under Grant No. 62172093; the National Key R&D Program of China under Grant No. 2018YFB1800602; the 2019 Industrial Internet Innovation and Development Project, Ministry of Industry and Information Technology (MIIT), under Grant No. 6709010003; and the Ministry of Education-China Mobile Research Fund under Grant No. MCM20180506.
Abstract: As an essential function of encrypted Internet traffic analysis, encrypted traffic service classification can support both coarse-grained network service traffic management and security supervision. However, the traditional plaintext-based Deep Packet Inspection (DPI) method cannot be applied to such a classification. Moreover, existing machine learning-based methods encounter two problems during feature selection: complex feature overcost processing and Transport Layer Security (TLS) version discrepancy. In this paper, we consider differences between encryption network protocol stacks and propose a composite deep learning-based method in multiprotocol environments using a sliding multiple Protocol Data Unit (multiPDU) length sequence as features by fully utilizing the Markov property in a multiPDU length sequence and maintaining suitability with a TLS-1.3 environment. Control experiments show that both the Length-Sensitive (LS) composite deep learning model using a capsule neural network and the LS-long short time memory model achieve satisfactory effectiveness in F1-score and performance. Owing to faster feature extraction, our method is suitable for actual network environments and superior to state-of-the-art methods.
Funding: Funded by the Henan Provincial Key R&D and Promotion Special Project (Science and Technology Tackling) (212102210165); the National Social Science Foundation Key Project (20AZD114); the Henan Provincial Higher Education Key Research Project Program (20B520008); and the Public Security Behavior Scientific Research and Technological Innovation Project of the Chinese People's Public Security University (2020SYS08).
Abstract: The dark web is a shadow area hidden in the depths of the Internet, which is difficult to access through common search engines. Because of its anonymity, the dark web has gradually become a hotbed for a variety of cyber-crimes. Although some research based on machine learning or deep learning has been shown to be effective in the task of analyzing dark web traffic in recent years, there are still pain points such as low accuracy, insufficient real-time performance, and limited application scenarios. Aiming at the difficulties faced by the existing automated dark web traffic analysis methods, a novel method named Dark-Forest to analyze the behavior of dark web traffic is proposed. In this method, first, a particle swarm optimization algorithm is used to filter the redundant features of dark web traffic data, which can effectively shorten the training and inference time of the model to meet the real-time requirements of the dark web detection task. Then, the selected features of traffic are analyzed and classified using the DeepForest model as a backbone classifier. The comparison experiment with the current mainstream methods shows that Dark-Forest combines the advantages of statistical machine learning and deep learning, and achieves an accuracy rate of 87.84%. This method not only outperforms baseline methods such as Random Forest, MLP, CNN, and the original DeepForest in both large-scale and small-scale dataset based learning tasks, but also can detect normal network traffic, tunnel network traffic, and anonymous network traffic, which may close the gap between different network traffic analysis tasks. Thus, it has a wider application scenario and higher practical value.
Funding: Supported by the National Key R&D Program of China under Grant 2020YFB1006101, the Beijing Nova Program under Grant Z201100006820006, and the NSFC Project under Grant 61972039.
Abstract: Recently, website fingerprinting (WF) attacks that eavesdrop on the web browsing activity of users by analyzing the observed traffic can endanger the data security of users even if the users have deployed encrypted proxies such as Tor. Several WF defenses have been raised to counter passive WF attacks. However, the existing defense methods have several significant drawbacks in terms of effectiveness and overhead, which means that these defenses rarely apply in the real world. The performance of the existing methods greatly depends on the number of dummy packets added, which increases overheads and hampers the user experience of web browsing activity. Inspired by the feature extraction of current WF attacks with deep learning networks, in this paper we propose TED, a lightweight WF defense method that effectively decreases the accuracy of current WF attacks. We apply the idea of adversarial examples, aiming to effectively disturb the accuracy of WF attacks with deep learning networks while precisely inserting only a few dummy packets. The defense extracts the key features of similar websites through a feature extraction network with adapted Grad-CAM and applies the features to interfere with the WF attacks. The key features of traces are utilized to generate defense fractions that are inserted into the targeted trace to deceive WF classifiers. The experiments are carried out on public datasets from DF. Compared with several WF defenses, the experiments show that TED can efficiently reduce the effectiveness of WF attacks with minimal expenditure, reducing the accuracy by nearly 40% with less than 30% overhead.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61402485 and 61303061, and by the Open Fund from HPCL, No. 201513-01.
Abstract: Machine Learning (ML) techniques have been widely applied in recent traffic classification. However, the problems of both discriminator bias and class imbalance decrease the accuracy of ML-based traffic classifiers. In this paper, we propose an accurate and extensible traffic classifier. Specifically, to address the discriminator bias issue, our classifier is built by making an optimal cascade of binary sub-classifiers, where each binary sub-classifier is trained independently with the discriminators used for identifying application-specific traffic. Moreover, to balance a training dataset, we apply the SMOTE algorithm to generate artificial training samples for minority classes. We evaluate our classifier on two datasets collected from different network border routers. Compared with previous multi-class traffic classifiers built in a one-time training process, our classifier achieves much higher F-Measure and AUC for each application.
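The SMOTE step mentioned in this abstract can be sketched as follows. This is a simplified illustration of the core idea only (interpolating between a minority sample and one of its nearest minority neighbours); the function name `smote`, its parameters, and the toy data are hypothetical and do not reflect the authors' implementation or any particular library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation.

    Requires at least two minority samples. Each synthetic point lies on the
    line segment between a randomly chosen minority sample and one of its k
    nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class flow features.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=6)
```

Because every synthetic sample is a convex combination of two real minority samples, the new points stay inside the region the minority class already occupies, which balances the training set without simply duplicating records.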
Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00493, 5G Massive Next Generation Cyber Attack Deception Technology Development).
Abstract: In the early days of IoT, it was challenging to introduce encrypted communication because components such as CPUs and batteries lacked the performance needed to encrypt and decrypt data. Because IoT is applied and utilized in many important fields, a cyberattack on IoT can result in astronomical financial losses and human casualties. For this reason, encrypted communication has been required for IoT, and it has become possible due to improvements in the computing performance of IoT devices and the development of lightweight cryptography. The application of encrypted communication in IoT has, however, made it possible to use encrypted communication channels to launch cyberattacks. The approach of extracting evidence of an attack from the primary information of a network packet is no longer valid because critical information, such as the payload, is encrypted. For this reason, technology that can detect cyberattacks over encrypted network traffic in IoT environments is required. Therefore, this research proposes an encrypted cyberattack detection system for the IoT (ECDS-IoT) that derives valid features for cyberattack detection from the encrypted network traffic generated in the IoT environment and performs cyberattack detection based on the derived features. ECDS-IoT identifies identifiable information from encrypted traffic collected in IoT environments and extracts statistics-based features through statistical analysis of that information. ECDS-IoT learns what normal data looks like by training only on statistical features extracted from normal data, and it detects cyberattacks based only on that learned normal-data information. To evaluate the cyberattack detection performance of the proposed ECDS-IoT, we used CICIoT2023, a dataset containing encrypted traffic generated by normal activity and seven categories of cyberattacks in the IoT environment, and experimented with cyberattack detection on encrypted traffic using the Autoencoder, RNN, GRU, LSTM, BiLSTM, and AE-LSTM algorithms. In this evaluation, ECDS-IoT achieved high performance when using the AE-LSTM algorithm: accuracy 0.99739, precision 0.99154, recall 1.0, F1 score 0.99575, and ROC_AUC 0.99822. As the detection results of ECDS-IoT show, it is possible to detect most cyberattacks through encrypted traffic. Applied to IoT, ECDS-IoT can effectively detect cyberattacks concealed in encrypted traffic, promoting the efficient operation of IoT and preventing financial and human damage caused by cyberattacks.
Abstract: Traffic encryption brings new challenges to the identification of unknown encrypted traffic. Currently, machine learning is the most commonly used encrypted traffic recognition technology, but this method relies on expensive prior label information. Therefore, we propose a subspace clustering via graph auto-encoder network (SCGAE) to recognize unknown applications without prior label information. The SCGAE adopts a graph encoder-decoder structure, which can comprehensively utilize the feature and structure information to extract discriminative embedding representations. Additionally, a self-supervised module is introduced, which uses the clustering labels as a supervisor to guide the learning of the graph encoder-decoder module. Finally, we obtain the self-expression coefficient matrix through the self-expression module and map it to the subspace for clustering. The results show that SCGAE has better performance than all benchmark models in unknown encrypted traffic recognition.