Abstract
Attackers often use domain generation algorithms (DGAs) to generate large numbers of random domain names for transmitting malware control commands. Traditional DGA detection methods suffer from high computational cost and low detection accuracy, problems that machine learning and deep learning methods can greatly alleviate. First, features are extracted from DGA domain names and legitimate domain names along three dimensions: basic features, linguistic features, and statistical features. Machine learning algorithms are then trained on the resulting feature set. In parallel, a long short-term memory (LSTM) network takes the embedding vectors of domain name strings as input and extracts deep features for domain name detection. The trained models are compared using precision, recall, F1-score, ROC curves, and AUC values to obtain a better-performing DGA domain name detection model.
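As a rough illustration of the feature-extraction step described above, the following Python sketch computes a few hand-crafted features for a single domain name. The abstract does not list the paper's actual features, so the specific choices here (label length, digit ratio, vowel ratio, Shannon entropy) and the names `shannon_entropy` and `domain_features` are assumptions made for illustration only, not the authors' implementation.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(s: str) -> float:
    """Shannon entropy (bits per character) of the character distribution in s."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def domain_features(domain: str) -> dict:
    """Illustrative features for one domain name (assumed, not from the paper).

    Simplification: only the leftmost label is analysed, so
    "google.com" is reduced to "google".
    """
    label = domain.lower().split(".")[0]
    n = len(label)
    letters = [ch for ch in label if ch.isalpha()]
    return {
        # basic feature: label length
        "length": n,
        # statistical feature: share of digits in the label
        "digit_ratio": sum(ch.isdigit() for ch in label) / n if n else 0.0,
        # linguistic feature: share of vowels among the letters
        "vowel_ratio": (
            sum(ch in VOWELS for ch in letters) / len(letters) if letters else 0.0
        ),
        # statistical feature: character entropy, often higher for random-looking names
        "entropy": shannon_entropy(label),
    }
```

Feature dictionaries of this kind could be stacked into a matrix and passed to any standard classifier. The LSTM branch, by contrast, would consume the raw character sequence rather than such hand-crafted features.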
Authors
Zhou Jingying (周婧莹)
Li Yu (黎宇)
Zeng Chuxuan (曾楚轩)
China Unicom Guangdong Branch, Guangzhou 510000, China
Source
Designing Techniques of Posts and Telecommunications (《邮电设计技术》)
2024, No. 8, pp. 13-17 (5 pages)
Keywords
Domain name generation algorithm
Machine learning
Deep learning
Domain name detection