Abstract
Existing malicious domain name detection models mainly build classifiers from character and word features, which easily leads to false negatives for newly emerging forged domain names and new variants. To address this, a forged domain name detection algorithm based on multi-scale convolution and an attention mechanism is proposed. First, a Transformer encoder improved with a long short-term memory (LSTM) network captures fine-grained multi-scale features of the domain name string. Then, an attention mechanism fuses the multi-scale features to extract deeper feature information along the spatial and temporal-sequence dimensions. Finally, a reinforcement learning algorithm is introduced to optimize the model end to end. In tests on several open-source forged domain name datasets, the proposed model achieves 98.03% Accuracy, 97.91% Precision, 2.01% FPR, 1.55% FNR, and 98.18% F1-score on the binary classification of legitimate versus forged domain names, and it performs comparably well on the multi-class classification of forged domain names from multiple families.
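The five metrics reported above all derive from one binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; it assumes that forged domain names are the positive class, and the function name `binary_metrics` and the counts in the usage line are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    Assumption: "forged" is the positive class, so
      tp = forged domains flagged as forged
      fp = legitimate domains flagged as forged
      tn = legitimate domains passed as legitimate
      fn = forged domains passed as legitimate (missed detections)
    The sketch assumes every denominator is non-zero.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)       # recall = 1 - FNR
    fpr = fp / (fp + tn)          # false positive rate
    fnr = fn / (fn + tp)          # false negative rate
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "fnr": fnr,
        "f1": f1,
    }


# Hypothetical counts, chosen only to illustrate the call.
metrics = binary_metrics(tp=90, fp=10, tn=80, fn=20)
```

Under these definitions the reported figures are mutually consistent: an FNR of 1.55% implies a recall of 98.45%, and the harmonic mean of 97.91% Precision and 98.45% recall is about 98.18%, which matches the reported F1-score.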
Authors
MA Wei (马伟)
XIE Liping (谢莉萍)
HUI Qiaojuan (惠巧娟)
Department of Information and Computer Science, Xinhua College of Ningxia University, Yinchuan, Ningxia 750021, China; School of Information Engineering, Yinchuan University of Science and Technology, Yinchuan, Ningxia 750021, China
Source
Chinese Journal of Electron Devices (《电子器件》)
CAS
2024, Issue 4, pp. 922-928 (7 pages)
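Funding
Ningxia Natural Science Foundation (2022AAC03642, 2023AAC03388)
Ningxia Department of Education Industry-Education Integration Project (18SFZY29)
Ningxia Higher Education Scientific Research Project (NYG2024288)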