Abstract: Domain name generation algorithm (DGA) classification is an essential but challenging problem. Both feature-extracting machine learning (ML) methods and deep learning (DL) models, such as convolutional neural networks and long short-term memory networks, have been developed for it. However, the performance of these approaches varies with the type of DGA. Most features used by the ML methods characterize random-looking DGAs better than word-looking DGAs. To improve classification performance on word-looking DGAs, subword tokenization is employed for the DL models. Our experimental results show that subword tokenization provides excellent classification performance on word-looking DGAs. We then propose an integrated scheme that chooses an appropriate method for DGA classification depending on the nature of the DGAs. Results show that the integrated scheme outperforms existing ML and DL methods, as well as the subword DL methods.
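The two ideas in the abstract, subword tokenization and choosing a method by the nature of the domain, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy vocabulary `TOY_VOCAB`, the greedy longest-match segmentation, the coverage heuristic, the `threshold` value, and the method labels `"subword_dl"` and `"feature_ml"`. The abstract does not say how subwords are learned or how the integrated scheme makes its choice.

```python
from typing import List, Set

# Hypothetical toy vocabulary. A real system would learn subwords from data;
# the abstract does not specify the tokenizer, so this is only a stand-in.
TOY_VOCAB: Set[str] = {"secure", "login", "bank", "pay", "cloud", "mail", "shop", "online"}


def subword_tokenize(label: str, vocab: Set[str] = TOY_VOCAB) -> List[str]:
    """Greedy longest-match segmentation (an assumed, simplified tokenizer).

    Characters that start no vocabulary entry are emitted as single-character tokens.
    """
    tokens: List[str] = []
    i = 0
    while i < len(label):
        match = label[i]  # fallback: a single character
        # Try the longest candidate first; candidates have at least 2 characters.
        for j in range(len(label), i + 1, -1):
            if label[i:j] in vocab:
                match = label[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens


def word_coverage(label: str, vocab: Set[str] = TOY_VOCAB) -> float:
    """Fraction of characters that fall inside multi-character (vocabulary) tokens."""
    if not label:
        return 0.0
    covered = sum(len(t) for t in subword_tokenize(label, vocab) if len(t) > 1)
    return covered / len(label)


def choose_method(domain: str, threshold: float = 0.5) -> str:
    """Route a domain to a hypothetical classifier family.

    The routing rule is an illustrative assumption, not the paper's scheme:
    word-looking names go to a subword-based DL model, others to a
    feature-based ML model.
    """
    label = domain.lower().split(".")[0]  # leftmost label only, for simplicity
    return "subword_dl" if word_coverage(label) >= threshold else "feature_ml"
```

For example, `subword_tokenize("securebanklogin")` splits the label into dictionary-like pieces, while a random-looking label such as `"xkqzjvbtw"` falls back to single characters, so `choose_method` routes the two domains differently.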