With the increasing proportion of encrypted traffic in cyberspace, the classification of encrypted traffic has becomea core key technology in network supervision. In recent years, many different solutions have emerged...With the increasing proportion of encrypted traffic in cyberspace, the classification of encrypted traffic has becomea core key technology in network supervision. In recent years, many different solutions have emerged in this field.Most methods identify and classify traffic by extracting spatiotemporal characteristics of data flows or byte-levelfeatures of packets. However, due to changes in data transmission mediums, such as fiber optics and satellites,temporal features can exhibit significant variations due to changes in communication links and transmissionquality. Additionally, partial spatial features can change due to reasons like data reordering and retransmission.Faced with these challenges, identifying encrypted traffic solely based on packet byte-level features is significantlydifficult. To address this, we propose a universal packet-level encrypted traffic identification method, ComboPacket. This method utilizes convolutional neural networks to extract deep features of the current packet andits contextual information and employs spatial and channel attention mechanisms to select and locate effectivefeatures. Experimental data shows that Combo Packet can effectively distinguish between encrypted traffic servicecategories (e.g., File Transfer Protocol, FTP, and Peer-to-Peer, P2P) and encrypted traffic application categories (e.g.,BitTorrent and Skype). Validated on the ISCX VPN-non VPN dataset, it achieves classification accuracies of 97.0%and 97.1% for service and application categories, respectively. It also provides shorter training times and higherrecognition speeds. The performance and recognition capabilities of Combo Packet are significantly superior tothe existing classification methods mentioned.展开更多
While encryption technology safeguards the security of network communications,malicious traffic also uses encryption protocols to obscure its malicious behavior.To address the issues of traditional machine learning me...While encryption technology safeguards the security of network communications,malicious traffic also uses encryption protocols to obscure its malicious behavior.To address the issues of traditional machine learning methods relying on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted malicious traffic,we propose an encrypted malicious traffic classification method that integrates global semantic features with local spatiotemporal features,called BERT-based Spatio-Temporal Features Network(BSTFNet).At the packet-level granularity,the model captures the global semantic features of packets through the attention mechanism of the Bidirectional Encoder Representations from Transformers(BERT)model.At the byte-level granularity,we initially employ the Bidirectional Gated Recurrent Unit(BiGRU)model to extract temporal features from bytes,followed by the utilization of the Text Convolutional Neural Network(TextCNN)model with multi-sized convolution kernels to extract local multi-receptive field spatial features.The fusion of features from both granularities serves as the ultimate multidimensional representation of malicious traffic.Our approach achieves accuracy and F1-score of 99.39%and 99.40%,respectively,on the publicly available USTC-TFC2016 dataset,and effectively reduces sample confusion within the Neris and Virut categories.The experimental results demonstrate that our method has outstanding representation and classification capabilities for encrypted malicious traffic.展开更多
Encrypted traffic classification has become a hot issue in network security research.The class imbalance problem of traffic samples often causes the deterioration of Machine Learning based classifier performance.Altho...Encrypted traffic classification has become a hot issue in network security research.The class imbalance problem of traffic samples often causes the deterioration of Machine Learning based classifier performance.Although the Generative Adversarial Network(GAN)method can generate new samples by learning the feature distribution of the original samples,it is confronted with the problems of unstable training andmode collapse.To this end,a novel data augmenting approach called Graph CWGAN-GP is proposed in this paper.The traffic data is first converted into grayscale images as the input for the proposed model.Then,the minority class data is augmented with our proposed model,which is built by introducing conditional constraints and a new distance metric in typical GAN.Finally,the classical deep learning model is adopted as a classifier to classify datasets augmented by the Condition GAN(CGAN),Wasserstein GAN-Gradient Penalty(WGAN-GP)and Graph CWGAN-GP,respectively.Compared with the state-of-the-art GAN methods,the Graph CWGAN-GP cannot only control the modes of the data to be generated,but also overcome the problem of unstable training and generate more realistic and diverse samples.The experimental results show that the classification precision,recall and F1-Score of theminority class in the balanced dataset augmented in this paper have improved by more than 2.37%,3.39% and 4.57%,respectively.展开更多
Traffic characterization(e.g.,chat,video)and application identifi-cation(e.g.,FTP,Facebook)are two of the more crucial jobs in encrypted network traffic classification.These two activities are typically carried out se...Traffic characterization(e.g.,chat,video)and application identifi-cation(e.g.,FTP,Facebook)are two of the more crucial jobs in encrypted network traffic classification.These two activities are typically carried out separately by existing systems using separate models,significantly adding to the difficulty of network administration.Convolutional Neural Network(CNN)and Transformer are deep learning-based approaches for network traf-fic classification.CNN is good at extracting local features while ignoring long-distance information from the network traffic sequence,and Transformer can capture long-distance feature dependencies while ignoring local details.Based on these characteristics,a multi-task learning model that combines Transformer and 1D-CNN for encrypted traffic classification is proposed(MTC).In order to make up for the Transformer’s lack of local detail feature extraction capability and the 1D-CNN’s shortcoming of ignoring long-distance correlation information when processing traffic sequences,the model uses a parallel structure to fuse the features generated by the Transformer block and the 1D-CNN block with each other using a feature fusion block.This structure improved the representation of traffic features by both blocks and allows the model to perform well with both long and short length sequences.The model simultaneously handles multiple tasks,which lowers the cost of training.Experiments reveal that on the ISCX VPN-nonVPN dataset,the model achieves an average F1 score of 98.25%and an average recall of 98.30%for the task of identifying applications,and an average F1 score of 97.94%,and an average recall of 97.54%for the task of traffic characterization.When advanced models on the same dataset are chosen for comparison,the model produces the best results.To prove the generalization,we applied MTC to CICIDS2017 dataset,and our model also achieved good results.展开更多
Attacks on websites and network servers are among the most critical threats in network security.Network behavior identification is one of the most effective ways to identify malicious network intrusions.Analyzing abno...Attacks on websites and network servers are among the most critical threats in network security.Network behavior identification is one of the most effective ways to identify malicious network intrusions.Analyzing abnormal network traffic patterns and traffic classification based on labeled network traffic data are among the most effective approaches for network behavior identification.Traditional methods for network traffic classification utilize algorithms such as Naive Bayes,Decision Tree and XGBoost.However,network traffic classification,which is required for network behavior identification,generally suffers from the problem of low accuracy even with the recently proposed deep learning models.To improve network traffic classification accuracy thus improving network intrusion detection rate,this paper proposes a new network traffic classification model,called ArcMargin,which incorporates metric learning into a convolutional neural network(CNN)to make the CNN model more discriminative.ArcMargin maps network traffic samples from the same category more closely while samples from different categories are mapped as far apart as possible.The metric learning regularization feature is called additive angular margin loss,and it is embedded in the object function of traditional CNN models.The proposed ArcMargin model is validated with three datasets and is compared with several other related algorithms.According to a set of classification indicators,the ArcMargin model is proofed to have better performances in both network traffic classification tasks and open-set tasks.Moreover,in open-set tasks,the ArcMargin model can cluster unknown data classes that do not exist in the previous training dataset.展开更多
Machine Learning(ML) techniques have been widely applied in recent traffic classification.However, the problems of both discriminator bias and class imbalance decrease the accuracies of ML based traffic classifier. In...Machine Learning(ML) techniques have been widely applied in recent traffic classification.However, the problems of both discriminator bias and class imbalance decrease the accuracies of ML based traffic classifier. In this paper, we propose an accurate and extensible traffic classifier. Specifically, to address the discriminator bias issue, our classifier is built by making an optimal cascade of binary sub-classifiers, where each binary sub-classifier is trained independently with the discriminators used for identifying application specific traffic. Moreover, to balance a training dataset,we apply SMOTE algorithm in generating artificial training samples for minority classes.We evaluate our classifier on two datasets collected from different network border routers.Compared with the previous multi-class traffic classifiers built in one-time training process,our classifier achieves much higher F-Measure and AUC for each application.展开更多
Internet of Things(IoT)defines a network of devices connected to the internet and sharing a massive amount of data between each other and a central location.These IoT devices are connected to a network therefore prone...Internet of Things(IoT)defines a network of devices connected to the internet and sharing a massive amount of data between each other and a central location.These IoT devices are connected to a network therefore prone to attacks.Various management tasks and network operations such as security,intrusion detection,Quality-of-Service provisioning,performance monitoring,resource provisioning,and traffic engineering require traffic classification.Due to the ineffectiveness of traditional classification schemes,such as port-based and payload-based methods,researchers proposed machine learning-based traffic classification systems based on shallow neural networks.Furthermore,machine learning-based models incline to misclassify internet traffic due to improper feature selection.In this research,an efficient multilayer deep learning based classification system is presented to overcome these challenges that can classify internet traffic.To examine the performance of the proposed technique,Moore-dataset is used for training the classifier.The proposed scheme takes the pre-processed data and extracts the flow features using a deep neural network(DNN).In particular,the maximum entropy classifier is used to classify the internet traffic.The experimental results show that the proposed hybrid deep learning algorithm is effective and achieved high accuracy for internet traffic classification,i.e.,99.23%.Furthermore,the proposed algorithm achieved the highest accuracy compared to the support vector machine(SVM)based classification technique and k-nearest neighbours(KNNs)based classification technique.展开更多
The growing P2P streaming traffic brings a variety of problems and challenges to ISP networks and service providers.A P2P streaming traffic classification method based on sampling technology is presented in this paper...The growing P2P streaming traffic brings a variety of problems and challenges to ISP networks and service providers.A P2P streaming traffic classification method based on sampling technology is presented in this paper.By analyzing traffic statistical features and network behavior of P2P streaming,a group of flow characteristics were found,which can make P2P streaming more recognizable among other applications.Attributes from Netflow and those proposed by us are compared in terms of classification accuracy,and so are the results of different sampling rates.It is proved that the unified classification model with the proposed attributes can identify P2P streaming quickly and efficiently in the online system.Even with 1:50 sampling rate,the recognition accuracy can be higher than 94%.Moreover,we have evaluated the CPU resources,storage capacity and time consumption before and after the sampling,it is shown that the classification model after the sampling can significantly reduce the resource requirements with the same recognition accuracy.展开更多
The continual growth of the use of technological appliances during the COVID-19 pandemic has resulted in a massive volume of data flow on the Internet,as many employees have transitioned to working from home.Furthermo...The continual growth of the use of technological appliances during the COVID-19 pandemic has resulted in a massive volume of data flow on the Internet,as many employees have transitioned to working from home.Furthermore,with the increase in the adoption of encrypted data transmission by many people who tend to use a Virtual Private Network(VPN)or Tor Browser(dark web)to keep their data privacy and hidden,network traffic encryption is rapidly becoming a universal approach.This affects and complicates the quality of service(QoS),traffic monitoring,and network security provided by Internet Service Providers(ISPs),particularly for analysis and anomaly detection approaches based on the network traffic’s nature.The method of categorizing encrypted traffic is one of the most challenging issues introduced by a VPN as a way to bypass censorship as well as gain access to geo-locked services.Therefore,an efficient approach is especially needed that enables the identification of encrypted network traffic data to extract and select valuable features which improve the quality of service and network management as well as to oversee the overall performance.In this paper,the classification of network traffic data in terms of VPN and non-VPN traffic is studied based on the efficiency of time-based features extracted from network packets.Therefore,this paper suggests two machine learning models that categorize network traffic into encrypted and non-encrypted traffic.The proposed models utilize statistical features(SF),Pearson Correlation(PC),and a Genetic Algorithm(GA),preprocessing the traffic samples into net flow traffic to accomplish the experiment’s objectives.The GA-based method utilizes a stochastic method based on natural genetics and biological evolution to extract essential features.The PC-based method performs well in removing different features of network traffic.With a microsecond perpacket prediction time,the best model achieved an accuracy of more than 95.02 percent in the most demanding traffic classification task,a drop in accuracy of only 2.37 percent in comparison to the entire statistical-based machine learning approach.This is extremely promising for the development of real-time traffic analyzers.展开更多
Interact traffic classification is vital to the areas of network operation and management. Traditional classification methods such as port mapping and payload analysis are becoming increasingly difficult as newly emer...Interact traffic classification is vital to the areas of network operation and management. Traditional classification methods such as port mapping and payload analysis are becoming increasingly difficult as newly emerged applications (e. g. Peer-to-Peer) using dynamic port numbers, masquerading techniques and encryption to avoid detection. This paper presents a machine learning (ML) based traffic classifica- tion scheme, which offers solutions to a variety of network activities and provides a platform of performance evaluation for the classifiers. The impact of dataset size, feature selection, number of application types and ML algorithm selection on classification performance is analyzed and demonstrated by the following experiments: (1) The genetic algorithm based feature selection can dramatically reduce the cost without diminishing classification accuracy. (2) The chosen ML algorithms can achieve high classification accuracy. Particularly, REPTree and C4.5 outperform the other ML algorithms when computational complexity and accuracy are both taken into account. (3) Larger dataset and fewer application types would result in better classification accuracy. Finally, early detection with only several initial packets is proposed for real-time network activity and it is proved to be feasible according to the preliminary results.展开更多
Accurate and real-time classification of network traffic is significant to network operation and management such as QoS differentiation, traffic shaping and security surveillance. However, with many newly emerged P2P ...Accurate and real-time classification of network traffic is significant to network operation and management such as QoS differentiation, traffic shaping and security surveillance. However, with many newly emerged P2P applications using dynamic port numbers, masquerading techniques, and payload encryption to avoid detection, traditional classification approaches turn to be ineffective. In this paper, we present a layered hybrid system to classify current Internet traffic, motivated by variety of network activities and their requirements of traffic classification. The proposed method could achieve fast and accurate traffic classification with low overheads and robustness to accommodate both known and unknown/encrypted applications. Furthermore, it is feasible to be used in the context of real-time traffic classification. Our experimental results show the distinct advantages of the proposed classifi- cation system, compared with the one-step Machine Learning (ML) approach.展开更多
Traffic identification becomes more important,yet more challenging as related encryption techniques are rapidly developing nowadays.Unlike recent deep learning methods that apply image processing to solve such encrypt...Traffic identification becomes more important,yet more challenging as related encryption techniques are rapidly developing nowadays.Unlike recent deep learning methods that apply image processing to solve such encrypted traffic problems,in this pa⁃per,we propose a method named Payload Encoding Representation from Transformer(PERT)to perform automatic traffic feature extraction using a state-of-the-art dynamic word embedding technique.By implementing traffic classification experiments on a pub⁃lic encrypted traffic data set and our captured Android HTTPS traffic,we prove the pro⁃posed method can achieve an obvious better effectiveness than other compared baselines.To the best of our knowledge,this is the first time the encrypted traffic classification with the dynamic word embedding has been addressed.展开更多
With the introduction of 5G technology,the application of Internet of Things(IoT)devices is expanding to various industrial fields.However,introducing a robust,lightweight,low-cost,and low-power security solution to t...With the introduction of 5G technology,the application of Internet of Things(IoT)devices is expanding to various industrial fields.However,introducing a robust,lightweight,low-cost,and low-power security solution to the IoT environment is challenging.Therefore,this study proposes two methods using a data compression technique to detect malicious traffic efficiently and accurately for a secure IoT environment.The first method,compressed sensing and learning(CSL),compresses an event log in a bitmap format to quickly detect attacks.Then,the attack log is detected using a machine-learning classification model.The second method,precise re-learning after CSL(Ra-CSL),comprises a two-step training.It uses CSL as the 1st step analyzer,and the 2nd step analyzer is applied using the original dataset for a log that is detected as an attack in the 1st step analyzer.In the experiment,the bitmap rule was set based on the boundary value,which was 99.6%true positive on average for the attack and benign data found by analyzing the training data.Experimental results showed that the CSL was effective in reducing the training and detection time,and Ra-CSL was effective in increasing the detection rate.According to the experimental results,the data compression technique reduced the memory size by up to 20%and the training and detection times by 67%when compared with the conventional technique.In addition,the proposed technique improves the detection accuracy;the Naive Bayes model with the highest performance showed a detection rate of approximately 99%.展开更多
Aiming at the problem that the current encrypted traffic classification methods only use the single network framework such as convolutional neural network(CNN),recurrent neural network(RNN),and stacked autoencoder(SAE...Aiming at the problem that the current encrypted traffic classification methods only use the single network framework such as convolutional neural network(CNN),recurrent neural network(RNN),and stacked autoencoder(SAE),and only construct a shallow network to extract features,which leads to the low accuracy of encrypted traffic classification,an encrypted traffic classification framework based on the fusion of vision transformer and temporal features was proposed.Bottleneck transformer network(BoTNet)was used to extract spatial features and bi-directional long short-term memory(BiLSTM)was used to extract temporal features.After the two sub-networks are parallelized,the feature fusion method of early fusion was used in the framework to perform feature fusion.Finally,the encrypted traffic was identified through the fused features.The experimental results show that the BiLSTM and BoTNet fusion transformer(BTFT)model can enhance the performance of encrypted traffic classification by fusing multi-dimensional features.The accuracy rate of a virtual private network(VPN)and non-VPN binary classification is 99.9%,and the accuracy rate of fine-grained encrypted traffic twelve-classification can also reach 97%.展开更多
Aiming at the hysteretic characteristics of classification problem existed in current intemet traffic identification field, this paper investigates the traffic characteristic suitable for the on-line traffic classific...Aiming at the hysteretic characteristics of classification problem existed in current intemet traffic identification field, this paper investigates the traffic characteristic suitable for the on-line traffic classification, such as quality of service (QoS). By the theoretical analysis and the experimental observation, two characteristics (the ACK-Len ab and ACK-Len ha) were obtained. They are the data volume which first be sent by the communication parties continuously. For these two characteristics only depend on data's total length of the first few packets on the flow, network traffic can be classified in the early time when the flow arrived. The experiment based on decision tree C4.5 algorithm, with above 97% accuracy. The result indicated that the characteristics proposed can commendably reflect behavior patterns of the network application, although they are simple.展开更多
Classification of network traffic is the essential step for many network researches. However, with the rapid evolution of Internet applications the effectiveness of the port-based or payload-based identification appro...Classification of network traffic is the essential step for many network researches. However, with the rapid evolution of Internet applications the effectiveness of the port-based or payload-based identification approaches has been greatly diminished in recent years. And many researchers begin to turn their attentions to an alternative machine learning based method. This paper presents a novel machine learning-based classification model, which combines ensemble learning paradigm with co-training techniques. Compared to previous approaches, most of which only employed single classifier, multiple classifters and semi-supervised learning are applied in our method and it mainly helps to overcome three shortcomings: limited flow accuracy rate, weak adaptability and huge demand of labeled training set. In this paper, statistical characteristics of IP flows are extracted from the packet level traces to establish the feature set, then the classification model is crested and tested and the empirical results prove its feasibility and effectiveness.展开更多
The continuous emerging of peer-to-peer(P2P) applications enriches resource sharing by networks, but it also brings about many challenges to network management. Therefore, P2 P applications monitoring, in particular,P...The continuous emerging of peer-to-peer(P2P) applications enriches resource sharing by networks, but it also brings about many challenges to network management. Therefore, P2 P applications monitoring, in particular,P2 P traffic classification, is becoming increasingly important. In this paper, we propose a novel approach for accurate P2 P traffic classification at a fine-grained level. Our approach relies only on counting some special flows that are appearing frequently and steadily in the traffic generated by specific P2 P applications. In contrast to existing methods, the main contribution of our approach can be summarized as the following two aspects. Firstly, it can achieve a high classification accuracy by exploiting only several generic properties of flows rather than complicated features and sophisticated techniques. Secondly, it can work well even if the classification target is running with other high bandwidth-consuming applications, outperforming most existing host-based approaches, which are incapable of dealing with this situation. We evaluated the performance of our approach on a real-world trace. Experimental results show that P2 P applications can be classified with a true positive rate higher than 97.22% and a false positive rate lower than 2.78%.展开更多
P2P traffic has always been a dominant portion of Internet traffic since its emergence in the late 1990s. The method used to accurately classify P2P traffic remains a key problem for Internet Service Producers (ISPs...P2P traffic has always been a dominant portion of Internet traffic since its emergence in the late 1990s. The method used to accurately classify P2P traffic remains a key problem for Internet Service Producers (ISPs) and network managers. This paper proposes a novel approach to the accurate classification of P2P traffic at a fine-grained level, which depends solely on the number of special flows during small time intervals. These special flows, named Clustering Flows (CFs), are de- fined as the most frequent and steady flows generated by P2P applications. Hence we are able to classify P2P applications by detecting tlle appearance of corresponding CFs. Com- pared to existing approaches, our classifier can realise high classification accuracy by ex- ploiting only several generic properties of flows, instead of extracting sophisticated fea- tures from host behaviours or transport layer data. We validate our framework on a large set of P2P traffic traces using a Support Vector Machine (SVM). Experimental results show that our approach correctly classifies P2P ap- plications with an average true positive rate of above 98% and a negligible false positive rate of about 0.01%.展开更多
Network traffic classification,which matches network traffic for a specific class of different granularities,plays a vital role in the domain of network administration and cyber security.With the rapid development of ...Network traffic classification,which matches network traffic for a specific class of different granularities,plays a vital role in the domain of network administration and cyber security.With the rapid development of network communication techniques,more and more network applications adopt encryption techniques during communication,which brings significant challenges to traditional network traffic classification methods.On the one hand,traditional methods mainly depend on matching features on the application layer of the ISO/OSI reference model,which leads to the failure of classifying encrypted traffic.On the other hand,machine learning-based methods require human-made features from network traffic data by human experts,which renders it difficult for them to deal with complex network protocols.In this paper,the convolution attention network(CAT)is proposed to overcom those difficulties.As an end-to-end model,CAT takes raw data as input and returns classification results automatically,with engineering by human experts.In CAT,firstly,the importance of different bytes with an attention mechanism of network traffic is achieved.Then,convolution neural network(CNN)is used to learn features automatically and feed the output into a softmax function to get classification results.It enables CAT to learn enough information from network traffic data and ensure the classified accuracy.Extensive experiments on the public encrypted network traffic dataset ISCX2016 demonstrate the effectiveness of the proposed model.展开更多
Network traffic classification aims at identifying the application types of network packets. It is important for Internet service providers (ISPs) to manage bandwidth resources and ensure the quality of service for ...Network traffic classification aims at identifying the application types of network packets. It is important for Internet service providers (ISPs) to manage bandwidth resources and ensure the quality of service for different network applications However, most classification techniques using machine learning only focus on high flow accuracy and ignore byte accuracy. The classifier would obtain low classification performance for elephant flows as the imbalance between elephant flows and mice flows on Internet. The elephant flows, however, consume much more bandwidth than mice flows. When the classifier is deployed for traffic policing, the network management system cannot penalize elephant flows and avoid network congestion effectively. This article explores the factors related to low byte accuracy, and secondly, it presents a new traffic classification method to improve byte accuracy at the aid of data cleaning. Experiments are carried out on three groups of real-world traffic datasets, and the method is compared with existing work on the performance of improving byte accuracy. Experiment shows that byte accuracy increased by about 22.31% on average. The method outperforms the existing one in most cases.展开更多
基金the National Natural Science Foundation of China Youth Project(62302520).
文摘With the increasing proportion of encrypted traffic in cyberspace, the classification of encrypted traffic has becomea core key technology in network supervision. In recent years, many different solutions have emerged in this field.Most methods identify and classify traffic by extracting spatiotemporal characteristics of data flows or byte-levelfeatures of packets. However, due to changes in data transmission mediums, such as fiber optics and satellites,temporal features can exhibit significant variations due to changes in communication links and transmissionquality. Additionally, partial spatial features can change due to reasons like data reordering and retransmission.Faced with these challenges, identifying encrypted traffic solely based on packet byte-level features is significantlydifficult. To address this, we propose a universal packet-level encrypted traffic identification method, ComboPacket. This method utilizes convolutional neural networks to extract deep features of the current packet andits contextual information and employs spatial and channel attention mechanisms to select and locate effectivefeatures. Experimental data shows that Combo Packet can effectively distinguish between encrypted traffic servicecategories (e.g., File Transfer Protocol, FTP, and Peer-to-Peer, P2P) and encrypted traffic application categories (e.g.,BitTorrent and Skype). Validated on the ISCX VPN-non VPN dataset, it achieves classification accuracies of 97.0%and 97.1% for service and application categories, respectively. It also provides shorter training times and higherrecognition speeds. The performance and recognition capabilities of Combo Packet are significantly superior tothe existing classification methods mentioned.
基金This research was funded by National Natural Science Foundation of China under Grant No.61806171Sichuan University of Science&Engineering Talent Project under Grant No.2021RC15+2 种基金Open Fund Project of Key Laboratory for Non-Destructive Testing and Engineering Computer of Sichuan Province Universities on Bridge Inspection and Engineering under Grant No.2022QYJ06Sichuan University of Science&Engineering Graduate Student Innovation Fund under Grant No.Y2023115The Scientific Research and Innovation Team Program of Sichuan University of Science and Technology under Grant No.SUSE652A006.
文摘While encryption technology safeguards the security of network communications,malicious traffic also uses encryption protocols to obscure its malicious behavior.To address the issues of traditional machine learning methods relying on expert experience and the insufficient representation capabilities of existing deep learning methods for encrypted malicious traffic,we propose an encrypted malicious traffic classification method that integrates global semantic features with local spatiotemporal features,called BERT-based Spatio-Temporal Features Network(BSTFNet).At the packet-level granularity,the model captures the global semantic features of packets through the attention mechanism of the Bidirectional Encoder Representations from Transformers(BERT)model.At the byte-level granularity,we initially employ the Bidirectional Gated Recurrent Unit(BiGRU)model to extract temporal features from bytes,followed by the utilization of the Text Convolutional Neural Network(TextCNN)model with multi-sized convolution kernels to extract local multi-receptive field spatial features.The fusion of features from both granularities serves as the ultimate multidimensional representation of malicious traffic.Our approach achieves accuracy and F1-score of 99.39%and 99.40%,respectively,on the publicly available USTC-TFC2016 dataset,and effectively reduces sample confusion within the Neris and Virut categories.The experimental results demonstrate that our method has outstanding representation and classification capabilities for encrypted malicious traffic.
基金supported by the National Natural Science Foundation of China (Grants Nos.61931004,62072250)the Talent Launch Fund of Nanjing University of Information Science and Technology (2020r061).
文摘Encrypted traffic classification has become a hot issue in network security research.The class imbalance problem of traffic samples often causes the deterioration of Machine Learning based classifier performance.Although the Generative Adversarial Network(GAN)method can generate new samples by learning the feature distribution of the original samples,it is confronted with the problems of unstable training andmode collapse.To this end,a novel data augmenting approach called Graph CWGAN-GP is proposed in this paper.The traffic data is first converted into grayscale images as the input for the proposed model.Then,the minority class data is augmented with our proposed model,which is built by introducing conditional constraints and a new distance metric in typical GAN.Finally,the classical deep learning model is adopted as a classifier to classify datasets augmented by the Condition GAN(CGAN),Wasserstein GAN-Gradient Penalty(WGAN-GP)and Graph CWGAN-GP,respectively.Compared with the state-of-the-art GAN methods,the Graph CWGAN-GP cannot only control the modes of the data to be generated,but also overcome the problem of unstable training and generate more realistic and diverse samples.The experimental results show that the classification precision,recall and F1-Score of theminority class in the balanced dataset augmented in this paper have improved by more than 2.37%,3.39% and 4.57%,respectively.
基金supported by the People’s Public Security University of China central basic scientific research business program(No.2021JKF206).
文摘Traffic characterization(e.g.,chat,video)and application identifi-cation(e.g.,FTP,Facebook)are two of the more crucial jobs in encrypted network traffic classification.These two activities are typically carried out separately by existing systems using separate models,significantly adding to the difficulty of network administration.Convolutional Neural Network(CNN)and Transformer are deep learning-based approaches for network traf-fic classification.CNN is good at extracting local features while ignoring long-distance information from the network traffic sequence,and Transformer can capture long-distance feature dependencies while ignoring local details.Based on these characteristics,a multi-task learning model that combines Transformer and 1D-CNN for encrypted traffic classification is proposed(MTC).In order to make up for the Transformer’s lack of local detail feature extraction capability and the 1D-CNN’s shortcoming of ignoring long-distance correlation information when processing traffic sequences,the model uses a parallel structure to fuse the features generated by the Transformer block and the 1D-CNN block with each other using a feature fusion block.This structure improved the representation of traffic features by both blocks and allows the model to perform well with both long and short length sequences.The model simultaneously handles multiple tasks,which lowers the cost of training.Experiments reveal that on the ISCX VPN-nonVPN dataset,the model achieves an average F1 score of 98.25%and an average recall of 98.30%for the task of identifying applications,and an average F1 score of 97.94%,and an average recall of 97.54%for the task of traffic characterization.When advanced models on the same dataset are chosen for comparison,the model produces the best results.To prove the generalization,we applied MTC to CICIDS2017 dataset,and our model also achieved good results.
基金This work was supported by the National Natural Science Foundation of China(61871046).
文摘Attacks on websites and network servers are among the most critical threats in network security.Network behavior identification is one of the most effective ways to identify malicious network intrusions.Analyzing abnormal network traffic patterns and traffic classification based on labeled network traffic data are among the most effective approaches for network behavior identification.Traditional methods for network traffic classification utilize algorithms such as Naive Bayes,Decision Tree and XGBoost.However,network traffic classification,which is required for network behavior identification,generally suffers from the problem of low accuracy even with the recently proposed deep learning models.To improve network traffic classification accuracy thus improving network intrusion detection rate,this paper proposes a new network traffic classification model,called ArcMargin,which incorporates metric learning into a convolutional neural network(CNN)to make the CNN model more discriminative.ArcMargin maps network traffic samples from the same category more closely while samples from different categories are mapped as far apart as possible.The metric learning regularization feature is called additive angular margin loss,and it is embedded in the object function of traditional CNN models.The proposed ArcMargin model is validated with three datasets and is compared with several other related algorithms.According to a set of classification indicators,the ArcMargin model is proofed to have better performances in both network traffic classification tasks and open-set tasks.Moreover,in open-set tasks,the ArcMargin model can cluster unknown data classes that do not exist in the previous training dataset.
基金supported by the National Natural Science Foundation of China under Grant No.61402485National Natural Science Foundation of China under Grant No.61303061supported by the Open fund from HPCL No.201513-01
文摘Machine Learning(ML) techniques have been widely applied in recent traffic classification.However, the problems of both discriminator bias and class imbalance decrease the accuracies of ML based traffic classifier. In this paper, we propose an accurate and extensible traffic classifier. Specifically, to address the discriminator bias issue, our classifier is built by making an optimal cascade of binary sub-classifiers, where each binary sub-classifier is trained independently with the discriminators used for identifying application specific traffic. Moreover, to balance a training dataset,we apply SMOTE algorithm in generating artificial training samples for minority classes.We evaluate our classifier on two datasets collected from different network border routers.Compared with the previous multi-class traffic classifiers built in one-time training process,our classifier achieves much higher F-Measure and AUC for each application.
基金This work has supported by the Xiamen University Malaysia Research Fund(XMUMRF)(Grant No:XMUMRF/2019-C3/IECE/0007)。
文摘Internet of Things(IoT)defines a network of devices connected to the internet and sharing a massive amount of data between each other and a central location.These IoT devices are connected to a network therefore prone to attacks.Various management tasks and network operations such as security,intrusion detection,Quality-of-Service provisioning,performance monitoring,resource provisioning,and traffic engineering require traffic classification.Due to the ineffectiveness of traditional classification schemes,such as port-based and payload-based methods,researchers proposed machine learning-based traffic classification systems based on shallow neural networks.Furthermore,machine learning-based models incline to misclassify internet traffic due to improper feature selection.In this research,an efficient multilayer deep learning based classification system is presented to overcome these challenges that can classify internet traffic.To examine the performance of the proposed technique,Moore-dataset is used for training the classifier.The proposed scheme takes the pre-processed data and extracts the flow features using a deep neural network(DNN).In particular,the maximum entropy classifier is used to classify the internet traffic.The experimental results show that the proposed hybrid deep learning algorithm is effective and achieved high accuracy for internet traffic classification,i.e.,99.23%.Furthermore,the proposed algorithm achieved the highest accuracy compared to the support vector machine(SVM)based classification technique and k-nearest neighbours(KNNs)based classification technique.
基金supported by State Key Program of National Natural Science Foundation of China under Grant No.61072061111 Project of China under Grant No.B08004the Fundamental Research Funds for the Central Universities under Grant No.2009RC0122
文摘The growing P2P streaming traffic brings a variety of problems and challenges to ISP networks and service providers.A P2P streaming traffic classification method based on sampling technology is presented in this paper.By analyzing traffic statistical features and network behavior of P2P streaming,a group of flow characteristics were found,which can make P2P streaming more recognizable among other applications.Attributes from Netflow and those proposed by us are compared in terms of classification accuracy,and so are the results of different sampling rates.It is proved that the unified classification model with the proposed attributes can identify P2P streaming quickly and efficiently in the online system.Even with 1:50 sampling rate,the recognition accuracy can be higher than 94%.Moreover,we have evaluated the CPU resources,storage capacity and time consumption before and after the sampling,it is shown that the classification model after the sampling can significantly reduce the resource requirements with the same recognition accuracy.
文摘The continual growth of the use of technological appliances during the COVID-19 pandemic has resulted in a massive volume of data flow on the Internet,as many employees have transitioned to working from home.Furthermore,with the increase in the adoption of encrypted data transmission by many people who tend to use a Virtual Private Network(VPN)or Tor Browser(dark web)to keep their data privacy and hidden,network traffic encryption is rapidly becoming a universal approach.This affects and complicates the quality of service(QoS),traffic monitoring,and network security provided by Internet Service Providers(ISPs),particularly for analysis and anomaly detection approaches based on the network traffic’s nature.The method of categorizing encrypted traffic is one of the most challenging issues introduced by a VPN as a way to bypass censorship as well as gain access to geo-locked services.Therefore,an efficient approach is especially needed that enables the identification of encrypted network traffic data to extract and select valuable features which improve the quality of service and network management as well as to oversee the overall performance.In this paper,the classification of network traffic data in terms of VPN and non-VPN traffic is studied based on the efficiency of time-based features extracted from network packets.Therefore,this paper suggests two machine learning models that categorize network traffic into encrypted and non-encrypted traffic.The proposed models utilize statistical features(SF),Pearson Correlation(PC),and a Genetic Algorithm(GA),preprocessing the traffic samples into net flow traffic to accomplish the experiment’s objectives.The GA-based method utilizes a stochastic method based on natural genetics and biological evolution to extract essential features.The PC-based method performs well in removing different features of network traffic.With a microsecond perpacket prediction time,the best model achieved an accuracy of more than 95.02 percent in the most demanding traffic classification task,a drop in accuracy of only 2.37 percent in comparison to the entire statistical-based machine learning approach.This is extremely promising for the development of real-time traffic analyzers.
基金Supported by the National High Technology Research and Development Programme of China (No. 2005AA121620, 2006AA01Z232)the Zhejiang Provincial Natural Science Foundation of China (No. Y1080935 )the Research Innovation Program for Graduate Students in Jiangsu Province (No. CX07B_ 110zF)
文摘Interact traffic classification is vital to the areas of network operation and management. Traditional classification methods such as port mapping and payload analysis are becoming increasingly difficult as newly emerged applications (e. g. Peer-to-Peer) using dynamic port numbers, masquerading techniques and encryption to avoid detection. This paper presents a machine learning (ML) based traffic classifica- tion scheme, which offers solutions to a variety of network activities and provides a platform of performance evaluation for the classifiers. The impact of dataset size, feature selection, number of application types and ML algorithm selection on classification performance is analyzed and demonstrated by the following experiments: (1) The genetic algorithm based feature selection can dramatically reduce the cost without diminishing classification accuracy. (2) The chosen ML algorithms can achieve high classification accuracy. Particularly, REPTree and C4.5 outperform the other ML algorithms when computational complexity and accuracy are both taken into account. (3) Larger dataset and fewer application types would result in better classification accuracy. Finally, early detection with only several initial packets is proposed for real-time network activity and it is proved to be feasible according to the preliminary results.
基金Supported in part by the National 863 Project of China (No.2006AA01Z232)Zhejiang Natural Science Founda-tion (No.Y1080935)Research Innovation Program Project for Graduate Students in Jiangsu Province ( No.CX07B_110zF)
文摘Accurate and real-time classification of network traffic is significant to network operation and management such as QoS differentiation, traffic shaping and security surveillance. However, with many newly emerged P2P applications using dynamic port numbers, masquerading techniques, and payload encryption to avoid detection, traditional classification approaches turn to be ineffective. In this paper, we present a layered hybrid system to classify current Internet traffic, motivated by variety of network activities and their requirements of traffic classification. The proposed method could achieve fast and accurate traffic classification with low overheads and robustness to accommodate both known and unknown/encrypted applications. Furthermore, it is feasible to be used in the context of real-time traffic classification. Our experimental results show the distinct advantages of the proposed classifi- cation system, compared with the one-step Machine Learning (ML) approach.
文摘Traffic identification becomes more important,yet more challenging as related encryption techniques are rapidly developing nowadays.Unlike recent deep learning methods that apply image processing to solve such encrypted traffic problems,in this pa⁃per,we propose a method named Payload Encoding Representation from Transformer(PERT)to perform automatic traffic feature extraction using a state-of-the-art dynamic word embedding technique.By implementing traffic classification experiments on a pub⁃lic encrypted traffic data set and our captured Android HTTPS traffic,we prove the pro⁃posed method can achieve an obvious better effectiveness than other compared baselines.To the best of our knowledge,this is the first time the encrypted traffic classification with the dynamic word embedding has been addressed.
基金supported by a Korea Institute for Advancement of Technology(KIAT)Grant funded by theKorean Government(MOTIE)(P0008703,The Competency Development Program for Industry Specialists)the MSIT under the ICAN(ICT Challenge and Advanced Network ofHRD)program(No.IITP-2022-RS-2022-00156310)supervised by the Institute of Information Communication Technology Planning and Evaluation(IITP).
文摘With the introduction of 5G technology,the application of Internet of Things(IoT)devices is expanding to various industrial fields.However,introducing a robust,lightweight,low-cost,and low-power security solution to the IoT environment is challenging.Therefore,this study proposes two methods using a data compression technique to detect malicious traffic efficiently and accurately for a secure IoT environment.The first method,compressed sensing and learning(CSL),compresses an event log in a bitmap format to quickly detect attacks.Then,the attack log is detected using a machine-learning classification model.The second method,precise re-learning after CSL(Ra-CSL),comprises a two-step training.It uses CSL as the 1st step analyzer,and the 2nd step analyzer is applied using the original dataset for a log that is detected as an attack in the 1st step analyzer.In the experiment,the bitmap rule was set based on the boundary value,which was 99.6%true positive on average for the attack and benign data found by analyzing the training data.Experimental results showed that the CSL was effective in reducing the training and detection time,and Ra-CSL was effective in increasing the detection rate.According to the experimental results,the data compression technique reduced the memory size by up to 20%and the training and detection times by 67%when compared with the conventional technique.In addition,the proposed technique improves the detection accuracy;the Naive Bayes model with the highest performance showed a detection rate of approximately 99%.
基金supported by the Science and Technology Project of the Headquarters of State Grid Corporation of China(5700-202152186A-0-0-00)。
文摘Aiming at the problem that the current encrypted traffic classification methods only use the single network framework such as convolutional neural network(CNN),recurrent neural network(RNN),and stacked autoencoder(SAE),and only construct a shallow network to extract features,which leads to the low accuracy of encrypted traffic classification,an encrypted traffic classification framework based on the fusion of vision transformer and temporal features was proposed.Bottleneck transformer network(BoTNet)was used to extract spatial features and bi-directional long short-term memory(BiLSTM)was used to extract temporal features.After the two sub-networks are parallelized,the feature fusion method of early fusion was used in the framework to perform feature fusion.Finally,the encrypted traffic was identified through the fused features.The experimental results show that the BiLSTM and BoTNet fusion transformer(BTFT)model can enhance the performance of encrypted traffic classification by fusing multi-dimensional features.The accuracy rate of a virtual private network(VPN)and non-VPN binary classification is 99.9%,and the accuracy rate of fine-grained encrypted traffic twelve-classification can also reach 97%.
基金supported by the National Natural Science Foundation of China (60903130)
文摘Aiming at the hysteretic characteristics of classification problem existed in current intemet traffic identification field, this paper investigates the traffic characteristic suitable for the on-line traffic classification, such as quality of service (QoS). By the theoretical analysis and the experimental observation, two characteristics (the ACK-Len ab and ACK-Len ha) were obtained. They are the data volume which first be sent by the communication parties continuously. For these two characteristics only depend on data's total length of the first few packets on the flow, network traffic can be classified in the early time when the flow arrived. The experiment based on decision tree C4.5 algorithm, with above 97% accuracy. The result indicated that the characteristics proposed can commendably reflect behavior patterns of the network application, although they are simple.
基金Supported by the National Natural Science Foundation of China (Grant Nos.60525213 and 60776096)the National Basic Research Program of China (Grant No.2006CB303106)+2 种基金the National High-Tech Research & Development Program of China (Grant Nos.2007AA01Z236 and 2007AA01Z449)the Joint Funds of NSFC-Guangdong (Grant No.U0735001)the National Project of Scientific and Technical Supporting Programs (Grant No.2007BAH13B01)
文摘Classification of network traffic is the essential step for many network researches. However, with the rapid evolution of Internet applications the effectiveness of the port-based or payload-based identification approaches has been greatly diminished in recent years. And many researchers begin to turn their attentions to an alternative machine learning based method. This paper presents a novel machine learning-based classification model, which combines ensemble learning paradigm with co-training techniques. Compared to previous approaches, most of which only employed single classifier, multiple classifters and semi-supervised learning are applied in our method and it mainly helps to overcome three shortcomings: limited flow accuracy rate, weak adaptability and huge demand of labeled training set. In this paper, statistical characteristics of IP flows are extracted from the packet level traces to establish the feature set, then the classification model is crested and tested and the empirical results prove its feasibility and effectiveness.
基金supported by the National Natural Science Foundation of China(Nos.61170286 and 61202486)
文摘The continuous emerging of peer-to-peer(P2P) applications enriches resource sharing by networks, but it also brings about many challenges to network management. Therefore, P2 P applications monitoring, in particular,P2 P traffic classification, is becoming increasingly important. In this paper, we propose a novel approach for accurate P2 P traffic classification at a fine-grained level. Our approach relies only on counting some special flows that are appearing frequently and steadily in the traffic generated by specific P2 P applications. In contrast to existing methods, the main contribution of our approach can be summarized as the following two aspects. Firstly, it can achieve a high classification accuracy by exploiting only several generic properties of flows rather than complicated features and sophisticated techniques. Secondly, it can work well even if the classification target is running with other high bandwidth-consuming applications, outperforming most existing host-based approaches, which are incapable of dealing with this situation. We evaluated the performance of our approach on a real-world trace. Experimental results show that P2 P applications can be classified with a true positive rate higher than 97.22% and a false positive rate lower than 2.78%.
基金supported by the National Natural Science Foundation of China under Grants No.61170286,No.61202486
文摘P2P traffic has always been a dominant portion of Internet traffic since its emergence in the late 1990s. The method used to accurately classify P2P traffic remains a key problem for Internet Service Producers (ISPs) and network managers. This paper proposes a novel approach to the accurate classification of P2P traffic at a fine-grained level, which depends solely on the number of special flows during small time intervals. These special flows, named Clustering Flows (CFs), are de- fined as the most frequent and steady flows generated by P2P applications. Hence we are able to classify P2P applications by detecting tlle appearance of corresponding CFs. Com- pared to existing approaches, our classifier can realise high classification accuracy by ex- ploiting only several generic properties of flows, instead of extracting sophisticated fea- tures from host behaviours or transport layer data. We validate our framework on a large set of P2P traffic traces using a Support Vector Machine (SVM). Experimental results show that our approach correctly classifies P2P ap- plications with an average true positive rate of above 98% and a negligible false positive rate of about 0.01%.
基金This work was supported by the State Grid Science and Technology Project Research on Key Technologies and Applications of Self-Service Big Data Governance of Power Grid(5442YD180015).
文摘Network traffic classification,which matches network traffic for a specific class of different granularities,plays a vital role in the domain of network administration and cyber security.With the rapid development of network communication techniques,more and more network applications adopt encryption techniques during communication,which brings significant challenges to traditional network traffic classification methods.On the one hand,traditional methods mainly depend on matching features on the application layer of the ISO/OSI reference model,which leads to the failure of classifying encrypted traffic.On the other hand,machine learning-based methods require human-made features from network traffic data by human experts,which renders it difficult for them to deal with complex network protocols.In this paper,the convolution attention network(CAT)is proposed to overcom those difficulties.As an end-to-end model,CAT takes raw data as input and returns classification results automatically,with engineering by human experts.In CAT,firstly,the importance of different bytes with an attention mechanism of network traffic is achieved.Then,convolution neural network(CNN)is used to learn features automatically and feed the output into a softmax function to get classification results.It enables CAT to learn enough information from network traffic data and ensure the classified accuracy.Extensive experiments on the public encrypted network traffic dataset ISCX2016 demonstrate the effectiveness of the proposed model.
基金supported by the National Basic Research Program of China(2009CB320505)
文摘Network traffic classification aims at identifying the application types of network packets. It is important for Internet service providers (ISPs) to manage bandwidth resources and ensure the quality of service for different network applications However, most classification techniques using machine learning only focus on high flow accuracy and ignore byte accuracy. The classifier would obtain low classification performance for elephant flows as the imbalance between elephant flows and mice flows on Internet. The elephant flows, however, consume much more bandwidth than mice flows. When the classifier is deployed for traffic policing, the network management system cannot penalize elephant flows and avoid network congestion effectively. This article explores the factors related to low byte accuracy, and secondly, it presents a new traffic classification method to improve byte accuracy at the aid of data cleaning. Experiments are carried out on three groups of real-world traffic datasets, and the method is compared with existing work on the performance of improving byte accuracy. Experiment shows that byte accuracy increased by about 22.31% on average. The method outperforms the existing one in most cases.