Abstract
Existing detection methods for domain names produced by domain generation algorithms (DGAs) generally suffer from weak feature extraction and a high feature-information compression ratio, which lead to loss of feature information, destruction of feature structure, and poor detection performance. To address these problems, this study proposes a DGA domain name detection method based on double branch feature extraction and an adaptive capsule network. First, the original samples are reconstructed through sample cleaning and dictionary construction to generate a reconstructed sample set. Second, the reconstructed samples are processed by a double branch feature extraction network, in which a sliced pyramid network extracts local features of the domain name, a Transformer extracts global features, and lightweight attention fuses the domain name features at different levels. Then, an adaptive capsule network computes the importance coefficients of the domain name feature maps, converts the textual domain name features into vector features, and computes a text-feature-based classification probability through feature transfer. Meanwhile, a multilayer perceptron processes the statistical features of the domain name to compute a statistical-feature-based classification probability. Finally, the two classification probabilities, obtained from these different perspectives, are combined to perform domain name detection. Extensive experiments show that the proposed method achieves leading results in both DGA domain name detection and DGA domain name family classification: the F1-score increases by 0.76% to 5.57% for DGA domain name detection, and the macro-averaged F1-score increases by 1.79% to 3.68% for DGA domain name family classification.
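The abstract names the pipeline stages but gives no implementation details. As a rough illustration only, the sketch below shows one common way that sample cleaning, dictionary construction, statistical feature extraction, and the final combination of two classification probabilities might look in Python. Every function name, the cleaning rules, the three statistical features, and the weighted-average fusion are assumptions made for this example, not the authors' design; the neural branches (sliced pyramid network, Transformer, capsule network, multilayer perceptron) are not modeled.

```python
import math
from collections import Counter

def clean(domain):
    # Assumed cleaning rules: trim whitespace, lowercase, drop a trailing dot.
    return domain.strip().lower().rstrip(".")

def build_dictionary(domains):
    # Map each character seen in the corpus to a positive id; 0 is reserved.
    chars = sorted({ch for d in domains for ch in d})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(domain, char_to_id, max_len=64):
    # Fixed-length id sequence: truncate, then right-pad with 0.
    # Characters missing from the dictionary also map to 0 in this sketch.
    ids = [char_to_id.get(ch, 0) for ch in domain[:max_len]]
    return ids + [0] * (max_len - len(ids))

def statistical_features(domain):
    # Hypothetical features: length, digit ratio, character entropy (bits).
    n = len(domain)
    if n == 0:
        return [0.0, 0.0, 0.0]
    counts = Counter(domain)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    digit_ratio = sum(ch.isdigit() for ch in domain) / n
    return [float(n), digit_ratio, entropy]

def fuse(p_text, p_stat, weight=0.5):
    # Assumed fusion rule: weighted average of the two probability vectors.
    return [weight * a + (1 - weight) * b for a, b in zip(p_text, p_stat)]
```

In this sketch, `encode` would feed the text branch and `statistical_features` the statistical branch, while `fuse` stands in for the final merging step; the paper's actual combination rule may differ.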
Authors
YANG Hong-Yu (杨宏宇)
ZHANG Tao (章涛)
ZHANG Liang (张良)
CHENG Xiang (成翔)
HU Ze (胡泽)
(College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China; College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China; School of Information, The University of Arizona, Tucson, AZ 85721, USA; School of Information Engineering, Yangzhou University, Yangzhou 225127, China)
Source
Journal of Software (《软件学报》)
EI
CSCD
Peking University Core Journals (北大核心)
2024, Issue 8, pp. 3626-3646 (21 pages)
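Funding
National Natural Science Foundation of China (62201576, U1833107)
Fundamental Research Funds for the Central Universities (3122022050)
Open Fund of the Information Security Evaluation Center of Civil Aviation University of China (ISECCA-202202)
Discipline Funding of Civil Aviation University of China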
Keywords
DGA domain name detection
deep learning
double branch feature extraction network
sliced pyramid network
adaptive capsule network