Botnets often use domain generation algorithms(DGA)to connect to a command and control(C2)server,which enables the compromised hosts connect to the C2 server for accessing many domains.The detection of DGA domains is ...Botnets often use domain generation algorithms(DGA)to connect to a command and control(C2)server,which enables the compromised hosts connect to the C2 server for accessing many domains.The detection of DGA domains is critical for blocking the C2 server,and for identifying the compromised hosts as well.However,the detection is difficult,because some DGA domain names look normal.Much of the previous work based on statistical analysis of machine learning relies on manual features and contextual information,which causes long response time and cannot be used for real-time detection.In addition,when a new family of DGA appears,the classifier has to be re-trained from the very beginning.This paper presents a deep learning approach based on bidirectional long short-term memory(Bi-LSTM)model for DGA domain detection.The classifier can extract features without the need for manual feature extraction,and the trainable model can effectively deal with new unknown DGA family members.In addition,the proposed model only needs the domain name without any additional context information.All domain names are preprocessed by bigram and the length of each processed domain name is set as a value longer than the most samples.Bidirectional LSTM model receives the encoded data and returns labels to check whether domain names are normal or not.Experiments show that our model outperforms state-of-the-art approaches and is able to detect new DGA families reliably.展开更多
基金This work was supported by the National Natural Science Foundation of China(NSFC)under the grant(No.U1836102).
文摘Botnets often use domain generation algorithms(DGA)to connect to a command and control(C2)server,which enables the compromised hosts connect to the C2 server for accessing many domains.The detection of DGA domains is critical for blocking the C2 server,and for identifying the compromised hosts as well.However,the detection is difficult,because some DGA domain names look normal.Much of the previous work based on statistical analysis of machine learning relies on manual features and contextual information,which causes long response time and cannot be used for real-time detection.In addition,when a new family of DGA appears,the classifier has to be re-trained from the very beginning.This paper presents a deep learning approach based on bidirectional long short-term memory(Bi-LSTM)model for DGA domain detection.The classifier can extract features without the need for manual feature extraction,and the trainable model can effectively deal with new unknown DGA family members.In addition,the proposed model only needs the domain name without any additional context information.All domain names are preprocessed by bigram and the length of each processed domain name is set as a value longer than the most samples.Bidirectional LSTM model receives the encoded data and returns labels to check whether domain names are normal or not.Experiments show that our model outperforms state-of-the-art approaches and is able to detect new DGA families reliably.