Malware Detection Based on API Sequences and Pre-trained Models
Abstract: Existing malware detection methods are limited in feature expressiveness and cannot capture the global semantic information of API sequences. In addition, malware datasets usually contain large amounts of unlabeled data that cannot be used directly for supervised learning. To address these problems, this paper draws on pre-training techniques from natural language processing and proposes a malware detection method based on API call sequences and a pre-trained model. First, a tokenizer is built from the raw API sequences. Next, a dynamic masked sequence model is built on the BERT model and pre-trained without supervision, which also yields a global dynamic encoding representation of the API sequences. Finally, this encoding is used to construct the detection model. Experimental results show that the proposed method detects malware effectively.
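The first two steps of the abstract (building a tokenizer from raw API sequences, then masking tokens for unsupervised pre-training) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the masking ratio or replacement rule, so the 15% selection rate and the 80/10/10 split below are borrowed from the original BERT recipe, and "dynamic" is taken to mean that the mask is re-sampled every time a sequence is drawn. The function names, the special-token list, and the sample API names are all hypothetical. The label value `-100` follows the common convention of marking positions the loss should ignore.

```python
import random

# Special tokens in the usual BERT style (an assumption; the paper's
# tokenizer details are not given in the abstract).
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(sequences):
    """Assign an integer id to every distinct API name, after the special tokens."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for seq in sequences:
        for api in seq:
            if api not in vocab:
                vocab[api] = len(vocab)
    return vocab


def encode(seq, vocab):
    """Turn one API call sequence into ids, wrapped in [CLS] ... [SEP]."""
    unk = vocab["[UNK]"]
    return [vocab["[CLS]"]] + [vocab.get(api, unk) for api in seq] + [vocab["[SEP]"]]


def dynamic_mask(ids, vocab, rng, mask_prob=0.15):
    """Sample a fresh mask for one sequence.

    Returns (inputs, labels). labels[i] holds the original id at selected
    positions and -100 everywhere else.
    """
    n_special = len(SPECIAL_TOKENS)
    inputs = list(ids)
    labels = [-100] * len(ids)
    for i, tok in enumerate(ids):
        # Never mask special tokens; select other tokens with probability mask_prob.
        if tok < n_special or rng.random() >= mask_prob:
            continue
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = vocab["[MASK]"]  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(n_special, len(vocab))  # 10%: random API id
        # remaining 10%: keep the original token
    return inputs, labels


if __name__ == "__main__":
    # Illustrative API call traces (hypothetical sample data).
    traces = [
        ["NtCreateFile", "NtWriteFile", "NtClose"],
        ["RegOpenKeyExW", "RegSetValueExW", "NtClose"],
    ]
    vocab = build_vocab(traces)
    ids = encode(traces[0], vocab)
    rng = random.Random()
    # Calling dynamic_mask once per epoch gives the model a different
    # masked view of the same trace each time.
    for epoch in range(3):
        print(epoch, dynamic_mask(ids, vocab, rng, mask_prob=0.5))
```

In a real pipeline the `(inputs, labels)` pairs would feed a BERT-style encoder trained to predict the original ids at the selected positions, and the encoder's output would then serve as the sequence representation for the downstream detection model.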
Authors: DOU Jian-min; SHI Zhi-bin; YU Meng-yang; HUO Shuai; ZHANG Shu-juan (School of Data Science and Technology, North University of China, Taiyuan 030051, China)
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2024, No. 4, pp. 974-981 (8 pages)
Funding: Shanxi Province Basic Research Program (20210302123018).
Keywords: malware detection; pre-trained model; unsupervised learning; dynamic masking; software call sequence; model fine-tuning; encoding representation