PDF document detection model based on system calls and data provenance

Cited by: 1

Abstract: Traditional static and dynamic detection methods cannot cope with malicious PDF document attacks that rely on heavy obfuscation and unknown techniques. To address this, a new detection model based on system calls and data provenance, called NtProvenancer, is proposed. First, a system call tracing tool collects the system call records generated while the document executes. Next, data provenance technology is used to build a data provenance graph from those system calls. Then, the key point algorithm of the graph extracts system call feature segments for detection. The experimental dataset consists of 528 benign PDF documents and 320 malicious ones. Tests were carried out on Adobe Reader, and comparative experiments replaced the proposed key point algorithm with Term Frequency-Inverse Document Frequency (TF-IDF) and with the rarity algorithm of PROVDETECTOR. The results show that NtProvenancer outperforms the comparison models on several metrics, including precision and F1 score. Under the optimal parameter setting, the average time per document is 251.51 ms in the training stage and 60.55 ms in the detection stage, the false alarm rate is below 5.22%, and the F1 score reaches 0.989, showing that NtProvenancer is an efficient and practical PDF document detection model.
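The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record format `(process, syscall, object)`, the function names, the sample traces, and the rule for edge direction are all assumptions made for illustration. The scoring step shows the plain TF-IDF baseline named in the abstract, not the proposed key point algorithm, whose details the abstract does not give.

```python
import math
from collections import Counter, defaultdict

# Hypothetical record format: (process, syscall, object). Real traces from a
# system call tracing tool carry many more fields; this is illustrative only.
def build_provenance_graph(records):
    """Return an adjacency map: node -> list of (syscall, neighbour) edges."""
    graph = defaultdict(list)
    for process, syscall, obj in records:
        # Assumption: a read moves data from the object into the process;
        # every other call is treated as flowing from process to object.
        if syscall == "NtReadFile":
            graph[obj].append((syscall, process))
        else:
            graph[process].append((syscall, obj))
    return dict(graph)

def tf_idf(traces):
    """Score each system call name per trace with plain TF-IDF."""
    n = len(traces)
    doc_freq = Counter()
    for trace in traces:
        # Count each system call once per trace for the document frequency.
        doc_freq.update({syscall for _, syscall, _ in trace})
    scores = []
    for trace in traces:
        counts = Counter(syscall for _, syscall, _ in trace)
        total = sum(counts.values())
        scores.append({
            s: (c / total) * math.log(n / doc_freq[s])
            for s, c in counts.items()
        })
    return scores

# Made-up traces: one benign-looking, one that spawns a child process.
benign = [
    ("AcroRd32.exe", "NtReadFile", "doc.pdf"),
    ("AcroRd32.exe", "NtWriteFile", "cache.tmp"),
]
malicious = [
    ("AcroRd32.exe", "NtReadFile", "doc.pdf"),
    ("AcroRd32.exe", "NtCreateUserProcess", "cmd.exe"),
]

graph = build_provenance_graph(malicious)
scores = tf_idf([benign, malicious])
```

In this toy input, `NtReadFile` appears in both traces, so its inverse document frequency is zero and it carries no weight, while the process-creation call that appears only in the second trace receives a positive score. A rarity-based or graph-based selector would replace `tf_idf` at that step.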

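Authors: LEI Jingwei; YI Peng; CHEN Xiang; WANG Liang; MAO Ming (Information Engineering University, Zhengzhou, Henan 450001, China)

Affiliation: PLA Strategic Support Force Information Engineering University

Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2022, No. 12, pp. 3831-3840 (10 pages)

Funding: National Defense Science and Technology Innovation Special Zone Project.

Keywords: PDF document detection; system call; data provenance; key point algorithm; feature segment

Classification: TP309.5 [Automation and Computer Technology: Computer System Architecture]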
