Abstract
To detect malicious PDF and DOCX documents more accurately and quickly, a deep-learning-based visual detection method for malicious documents is proposed. The method uses a Markov model to convert a document's byte sequence into a three-channel color image, yielding a visual representation that better separates malicious from benign documents, and then classifies the extracted visual features with the widely used EfficientNet-B0 model. Using fine-tuning, a transfer learning technique, weights pretrained for ImageNet classification are applied to the training of EfficientNet-B0, which speeds up convergence of the detection model and shortens training time. Experiments on two datasets show that the model converges faster than the same model trained from randomly initialized weights, and that its detection accuracy reaches 99.80% for malicious PDF documents and 98.14% for malicious DOCX documents, outperforming models such as ResNet34 and MobileNetV2. Compared with the mainstream malicious document detection tools Wepawet and PJScan, the proposed method achieves better overall detection performance, which further confirms its effectiveness for malicious document detection.
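The byte-to-image step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three channels are constructed, so the scheme below is an assumption made only for illustration: the byte sequence is split into three equal segments, and each segment contributes one 256x256 first-order byte-transition (Markov) matrix. The function names `markov_channel` and `bytes_to_markov_image` are hypothetical and do not come from the paper.

```python
import numpy as np


def markov_channel(data: bytes) -> np.ndarray:
    """Return a 256x256 uint8 image of first-order byte-transition probabilities."""
    counts = np.zeros((256, 256), dtype=np.float64)
    arr = np.frombuffer(data, dtype=np.uint8)
    if arr.size >= 2:
        # Count every adjacent byte pair (current byte -> next byte).
        np.add.at(counts, (arr[:-1], arr[1:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Row-normalize to transition probabilities; rows with no data stay zero.
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # Scale probabilities in [0, 1] to pixel intensities in [0, 255].
    return np.round(probs * 255).astype(np.uint8)


def bytes_to_markov_image(data: bytes) -> np.ndarray:
    """Build a 256x256x3 image from a byte sequence.

    ASSUMPTION (not from the paper): one channel per third of the byte sequence.
    """
    n = len(data)
    bounds = [0, n // 3, 2 * n // 3, n]
    channels = [markov_channel(data[bounds[i]:bounds[i + 1]]) for i in range(3)]
    return np.stack(channels, axis=-1)


if __name__ == "__main__":
    image = bytes_to_markov_image(b"ABAB" * 30)
    print(image.shape, image.dtype)
```

An array of this shape can be passed to an image classifier after resizing and normalization. The channel construction used in the paper may differ from this sketch.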
Authors
Huang Kun (黄昆)
Xu Yang (徐洋)
Zhang Sicong (张思聪)
Li Kezi (李克资)
Key Laboratory of Information and Computing Science of Guizhou Province, Guizhou Normal University, Guiyang 550001, China
Source
Electronic Measurement Technology (《电子测量技术》)
Peking University Core Journal (北大核心)
2022, No. 18, pp. 126-133 (8 pages)
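Funding
Special Fund of the Central Government for Guiding Local Science and Technology Development (黔科中引地〔2018〕4008)
Guizhou Provincial Science and Technology Program Project (黔科合支撑[2020]2Y013号)
Guizhou Provincial Graduate Research Fund (黔教合YJSKYJJ〔2021〕102)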