Abstract
To address the limited detection precision and coverage of existing malicious domain name detection methods, a malicious domain name detection algorithm based on BERT and hierarchical attention is proposed. First, BERT is used to build a word-vector matrix that carries contextual semantic information. Then, a bi-directional long short-term memory network (Bi-LSTM) is used to obtain vector representations of the characters and of the words contained in the domain name's uniform resource locator (URL); a global attention mechanism over the whole URL distinguishes the importance of different words, and a local attention mechanism within each word distinguishes the importance of different characters. Finally, a softmax classifier separates legitimate domain names from malicious ones. In tests on multiple datasets, the proposed method achieves a precision of 96.49%, a recall of 96.27%, a false positive rate (FPR) of 3.90%, and an F1-score of 94.13%. Compared with current mainstream malicious domain name detection algorithms, it covers a wider detection range while maintaining high detection precision.
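The two attention levels described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the BERT embeddings and Bi-LSTM encoders are omitted, plain lists of floats stand in for their hidden states, and the dot-product scoring, the function names (`attention_pool`, `encode_url`), and the context vectors `char_context` and `word_context` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weighted sum of vectors, weights = softmax(dot(vector, context)).

    Dot-product scoring is an assumption; the abstract does not state
    which scoring function the paper uses.
    """
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def encode_url(words, char_context, word_context):
    """Hierarchical pooling over a URL.

    words: list of words, each a list of character vectors
    (stand-ins for Bi-LSTM hidden states).
    """
    # Local attention: pool the characters of each word into one word vector.
    word_vectors = [attention_pool(chars, char_context)[0] for chars in words]
    # Global attention: pool the word vectors into one URL vector.
    return attention_pool(word_vectors, word_context)


if __name__ == "__main__":
    # Toy URL with two "words"; every character is a 2-dimensional vector.
    toy_words = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
    url_vector, word_weights = encode_url(toy_words, [1.0, 0.0], [0.0, 1.0])
    print(url_vector, word_weights)
```

In a full model the resulting URL vector would be passed to a softmax classifier over the two classes (legitimate, malicious), and the context vectors would be learned during training rather than fixed by hand.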
Authors
张凤
张微
魏金花
ZHANG Feng; ZHANG Wei; WEI Jin-hua (School of Information Engineering, Yinchuan University of Science and Technology, Yinchuan 750003, China)
Source
《中国电子科学研究院学报》
PKU Core (Peking University core journal list)
2022, No. 3, pp. 290-296 (7 pages)
Journal of China Academy of Electronics and Information Technology
Funding
Ningxia Higher Education Scientific Research Project (NGY2020115).