Funding: Supported by Hainan Provincial National Science Foundation of China, 621MS0789.
Abstract: In the new network business, the danger of botnets should not be underestimated. Botnets often generate malicious domain names through DGAs to enable communication with command and control (C&C) servers and then receive commands from the botmaster, carrying out further attack activities. Therefore, a machine-learning-based system is designed that classifies DNS domain accesses into two classes. It can instantly detect DGA domain names, so that infected computers can be dealt with quickly to avoid spreading the virus and causing further damage. In the comparison, the bidirectional LSTM model slightly outperformed the unidirectional LSTM network and achieved 99% accuracy in the open dataset classification task.
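Before a domain name can be fed to an LSTM classifier of the kind described above, it has to be turned into a fixed-length numeric sequence. The abstract does not say how this is done, so the following is only a minimal sketch of one common approach, character-level integer encoding with zero padding. The vocabulary, the `UNKNOWN` index, and `MAX_LEN` are illustrative assumptions, not values from the paper.

```python
import string
from typing import List

# Assumed vocabulary: lowercase letters, digits, hyphen and dot.
# Index 0 is reserved for padding, so real characters start at 1.
VOCAB = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase + string.digits + "-.")}
UNKNOWN = len(VOCAB) + 1  # shared index for any character outside the vocabulary
MAX_LEN = 16              # hypothetical fixed sequence length


def encode_domain(domain: str, max_len: int = MAX_LEN) -> List[int]:
    """Map a domain name to a fixed-length list of integer character indices."""
    ids = [VOCAB.get(ch, UNKNOWN) for ch in domain.lower()]
    ids = ids[-max_len:]                      # keep the rightmost characters if too long
    return [0] * (max_len - len(ids)) + ids   # left-pad short names with zeros


if __name__ == "__main__":
    print(encode_domain("ab.c", 6))
```

A unidirectional LSTM would read such a sequence left to right only, while a bidirectional one also reads it right to left and combines both passes, which is the comparison the abstract reports.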
Funding: Our research was supported by the National Key Research and Development Program of China (Grant No. 2016YFB0801004), the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDC02030200), and the National Key Research and Development Program of China (Grant No. 2018YFC0824801).
Abstract: Command and control (C2) servers are used by attackers to operate communications. To perform attacks, attackers usually employ a Domain Generation Algorithm (DGA), which generates various network locations that serve as rendezvous points with their C2 servers. The detection of DGA domain names is one of the important technologies for command and control communication detection. Considering the randomness of DGA domain names, recent research in DGA detection has applied machine learning methods based on feature extraction, as well as deep learning architectures, to classify domain names. However, these methods are insufficient to handle wordlist-based DGA threats, which generate domain names by randomly concatenating dictionary words according to a special set of rules. In this paper, we propose a deep learning framework, ATT-CNN-BiLSTM, for identifying and detecting DGA domains to alleviate the threat. First, Convolutional Neural Network (CNN) and bidirectional Long Short-Term Memory (BiLSTM) layers are used to extract features from the domain name sequences. Second, an attention layer assigns a weight to each piece of deep information extracted from the domain names. Finally, the weighted features are passed to the output layer to complete the detection and classification tasks. Our extensive experimental results demonstrate the effectiveness of the proposed model, both on regular DGA domains and on DGAs that are hard to detect, such as wordlist-based and part-wordlist-based ones. To be precise, we obtained an F1 score of 98.79% for the detection task and a macro-average precision and recall of 83% for the classification task of DGA domain names.
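The attention step described above, weighting the extracted features before the output layer, can be illustrated in a few lines. The abstract does not specify the scoring function, so this sketch assumes a simple dot product between each feature vector and a query vector (which would be learned in a real model), followed by a softmax. The names `attention_pool`, `features`, and `query` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: List[List[float]],
                   query: List[float]) -> Tuple[List[float], List[float]]:
    """Weight T feature vectors by their similarity to a query vector.

    Returns the attention weights (one per position, summing to 1) and
    the weighted sum of the feature vectors (the context vector).
    """
    # Assumed scoring function: dot product of each feature vector with the query.
    scores = [sum(f_j * q_j for f_j, q_j in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Two positions with 2-dimensional features, standing in for CNN/BiLSTM outputs.
    w, c = attention_pool([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0])
    print(w, c)
```

In a model of this kind the context vector would be passed to the output layer. Positions whose features align with the query receive most of the weight, which is one way a network can focus on the informative parts of a wordlist-based domain name.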
Funding: Partially funded by the National Natural Science Foundation of China (No. 61272447), the National Entrepreneurship & Innovation Demonstration Base of China (No. C700011), and the Key Research & Development Project of Sichuan Province of China (No. 2018G20100).
Abstract: Botnets based on the Domain Generation Algorithm (DGA) mechanism pose great challenges to mainstream detection methods because of their strong concealment and robustness. However, the complexity of DGA families and the imbalance of samples continue to impede research on DGA detection. In existing work, the sample size of each DGA family is regarded as the most important determinant of the resampling proportion; thus, differences in the characteristics of various samples are ignored, and the optimal resampling effect is not achieved. In this paper, a Long Short-Term Memory-based Property and Quantity Dependent Optimization (LSTM.PQDO) method is proposed. This method takes advantage of LSTM to automatically mine the comprehensive features of DGA domain names. It iteratively updates the resampling proportion, taking both the original number and the characteristics of the samples into account, and heuristically searches for a better solution around the initial solution; thus, dynamic optimization of the resampling proportion is realized. The experimental results show that the LSTM.PQDO method achieves better performance than existing models in overcoming the difficulties of unbalanced datasets; moreover, it can serve as a reference for resampling tasks in similar scenarios.
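To make the starting point concrete, the sketch below shows the quantity-only baseline that the abstract criticizes: each family's resampling proportion is derived purely from its sample count. This is not the LSTM.PQDO method, whose update rule is not given in the abstract. The function name and the family labels are hypothetical, and the counts are made-up illustration values.

```python
from typing import Dict


def quantity_based_proportions(counts: Dict[str, int]) -> Dict[str, float]:
    """Return, per family, the oversampling factor needed to match the largest family.

    Only the sample count is used, so two families with the same count get the
    same factor regardless of how hard their domain names are to classify.
    """
    largest = max(counts.values())
    return {family: largest / n for family, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical family labels and counts, for illustration only.
    counts = {"family_a": 1000, "family_b": 250, "family_c": 250}
    print(quantity_based_proportions(counts))
```

Here `family_b` and `family_c` receive identical factors even if one of them is far harder to detect. According to the abstract, that is the limitation the property- and quantity-dependent optimization is meant to address, by adjusting such initial proportions iteratively.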