An automatical IOC extraction method for reducing dependency on threat intelligence labeling
Abstract To address increasingly severe cyber threats, cyber attacks must be analyzed in depth. Indicators of Compromise (IOCs), an essential part of Cyber Threat Intelligence (CTI), span the entire cyber attack lifecycle and accurately describe the key information (attack behaviors, threat entities, etc.) at each attack stage. Extracting IOCs from CTI supports cyber defense, tracing, and countermeasures. Existing IOC extraction methods have made great progress with machine learning or deep learning, but they require a large investment in manually labeling CTI for training and are less effective when labeled CTI is limited. To tackle this challenge, this paper proposes Automatical IOC Extraction based on Less labeled data (L-AIE), a novel IOC extraction method that reduces labeling cost while maintaining extraction accuracy. L-AIE uses fine-grained word tokenization to obtain enough information from less CTI. A Context Layer and a Combination Layer extract sufficient context for IOC entities that have been split into subwords. In addition, during training, L-AIE uses an additional Relation Layer to widen the differences between IOC categories. Experiments show that L-AIE depends less on the amount of labeled data and also outperforms the other compared methods. With only approximately 10% of the training data used in previous experiments, L-AIE achieves a macro F1 score of 87.54%, more than 20% higher than other methods. When the amount of training data is reduced further, L-AIE's extraction results are affected less than half as much as those of the other models.
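The abstract reports results as a macro F1 score, which averages the per-category F1 values so that rare IOC categories count as much as frequent ones. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The category names (`IP`, `URL`, `HASH`) and the per-token scoring are assumptions made for illustration; the paper may score at the entity level.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was not the gold label
            fn[g] += 1  # gold label g was missed
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
gold = ["IP", "IP", "URL", "HASH"]
pred = ["IP", "URL", "URL", "HASH"]
score = macro_f1(gold, pred)
print(round(score, 4))
```

In this toy case the `IP` and `URL` categories each score 2/3 and `HASH` scores 1, so the macro average is 7/9. Because every category carries equal weight, a model that fails on a low-frequency IOC category is penalized as heavily as one that fails on a common category.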
Authors YU Jian (余坚); WANG Jun-Feng (王俊峰); CHEN Man-Man (陈熳熳); FANG Zhi-Yang (方智阳) (College of Computer Science (College of Software), Sichuan University, Chengdu 610065, China; School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China)
Source Journal of Sichuan University (Natural Science Edition) (《四川大学学报(自然科学版)》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2024, No. 4, pp. 14-26 (13 pages)
Funding National Natural Science Foundation of China (U2133208); National Key Research and Development Program of China (2022YFB3305200); Sichuan University-Luzhou Municipal People's Government Strategic Cooperation Project (2022CDLZ-5).
Keywords Cyber threat; Cyber threat intelligence; Indicator of compromise; Few-shot learning