Abstract
To address the heavy computation and low precision of current DGA (domain generation algorithm) malicious domain name detection methods, a DGA malicious domain name detection framework is proposed. First, the character statistical features and N-Gram model features of domain names are analyzed, and a combination of highly discriminative domain name features is extracted. Then, different machine learning models, such as naive Bayes, multilayer perceptron, and XGBoost (extreme gradient boosting), are trained on datasets of normal domain names and DGA malicious domain names, and the trained models are used to detect malicious domain names. Experimental results show that the N-Gram model features of domain names yield better precision and recall than the statistical features. The multilayer perceptron achieves relatively high precision and a relatively low false positive rate, and its AUC (area under curve) value is higher than those of the naive Bayes and XGBoost models.
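The two feature families named in the abstract can be illustrated with a minimal sketch. The abstract does not list the exact features used, so the specific statistics below (length, digit ratio, vowel ratio, Shannon entropy), the bigram default, and the function names `char_stats` and `char_ngrams` are assumptions made for illustration, not the authors' feature set.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def char_stats(label: str) -> dict:
    """Character-level statistical features for one domain label.

    The chosen statistics are illustrative assumptions, not the
    paper's actual feature combination.
    """
    label = label.lower()
    n = len(label)
    if n == 0:
        raise ValueError("empty domain label")
    counts = Counter(label)
    # Shannon entropy of the character distribution, in bits.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "length": n,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
        "entropy": entropy,
    }


def char_ngrams(label: str, n: int = 2) -> Counter:
    """Counts of overlapping character n-grams in one domain label."""
    label = label.lower()
    return Counter(label[i:i + n] for i in range(len(label) - n + 1))


if __name__ == "__main__":
    for name in ("google", "xjw3k9qzt1"):
        print(name, char_stats(name), dict(char_ngrams(name)))
```

Either feature dictionary could then be vectorized and passed to a classifier such as naive Bayes, a multilayer perceptron, or XGBoost; that training step is omitted here because the abstract gives no hyperparameters or dataset details.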
Authors
JIANG Hongling (蒋鸿玲)
DAI Junwei (戴俊伟)
School of Information Management, Beijing Information Science & Technology University, Beijing 100192, China
Source
Journal of Beijing Information Science and Technology University (《北京信息科技大学学报(自然科学版)》)
2019, No. 5, pp. 45-50 (6 pages)
Funding
Beijing Information Science and Technology University Research Fund (1925023)